Thanks to everyone who has chimed in already; all these suggestions are super helpful! After spending some time digesting all the posts so far (and ingesting holiday food), I now have some more refined ideas about what we should consider for libtest 2.0. I'll try to summarize so you all have an opportunity to disagree.
There are many ways to design a testing framework; see in particular the Catch (C++) and pytest examples that have been highlighted elsewhere in this thread as particularly good ones. However, reading through this thread again, along with the various issues and PRs linked in the original post, I have realized that what we really want is solid infrastructure for implementing a test harness (so that a Catch or pytest clone could be implemented in, and integrated with, Rust). I believe that is what @nrc was hinting at in his first response when he said:
> this model does not support the division between test harness and test framework, afaict
In particular, we should be thinking about what would be needed to facilitate someone writing their own test driver, and having it integrate nicely with cargo test and the like. I’d like to echo my original post here, and say that such an integration requires two distinct components: one for running/declaring tests, and one for collecting test results. How the tests are written and organized should be up to a crate’s chosen test driver.
For example, we could/should provide a default Rust test driver (shipped with rustup) that conforms pretty much exactly to what libtest offers today (i.e., #[test] annotations on FnOnce() -> () functions, running all tests in a flat namespace, and supporting the same runtime flags as today's libtest). Users could then write new testing frameworks with more sophisticated test declarations (e.g., test suites, BDD-style specs, property-based testing), arrangement (e.g., setup/teardown, fixtures, parameterization, serial execution), better runtime flags (e.g., regex test name matching), and so on.
More concretely, I believe libtest 2.0 should consist of two components: a test driver and a test output formatter. The former is chosen by the crate author by declaring it in dev-dependencies and selecting it crate-wide using something akin to declaring the global system allocator (except it would be crate-local rather than truly global). The user cannot change the test driver without modifying the source, partially because the test driver defines how tests are declared and specified. The latter is chosen by the user invoking cargo test, and should work with any test driver the crate may have chosen. The exact mechanism for choosing the formatter isn't clear to me yet, but changing it should require neither source changes nor changes to Cargo.toml, only changes in the user's environment. Ideally it would be selectable with a runtime flag to the test binary (probably --format), but it would likely be acceptable for a recompile to be needed when the formatter changes.
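To make the allocator analogy concrete, here is a minimal sketch of what selecting a driver could look like. Everything here is hypothetical: the #[test_driver] attribute, the BddDriver type, and the my_bdd_framework crate are placeholders I made up for illustration, not a proposed final API.

```rust
// Hypothetical sketch only: none of these names exist today.
// `my_bdd_framework` stands in for a test-driver crate listed in
// the crate's dev-dependencies.
use my_bdd_framework::BddDriver;

// Analogous to `#[global_allocator] static ALLOC: MyAlloc = MyAlloc;`,
// but scoped to this crate's test builds rather than truly global.
#[test_driver]
static TEST_DRIVER: BddDriver = BddDriver::new();
```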
I believe a concrete plan for making progress would need to answer the following questions:
- For test drivers:
  - How does the crate author select a test driver? I propose something akin to selecting the global system allocator, as outlined in the original post (and sketched above).
  - How does the chosen test driver interact with the test binary's main? I think this can be absorbed into the trait we require the chosen test driver's "main" struct to implement (see the sketch after this list).
  - How does a test driver enable users to declare tests? This is the trickiest point in my mind. Procedural macros alone aren't sufficient (afaict), since #[test] couldn't be implemented solely using those. In particular, the missing piece is that (again, afaict) a custom attribute macro cannot generate identifiers that are discoverable at runtime. This is similar to why Rocket cannot automatically register routes just from request guards, but instead requires them to be explicitly named when mounting (see what the route attributes generate here). @jan_hudec aired the idea of a "registry", which could work if we carefully decide what information is included in it, but I'd be in favor of a slightly more flexible model where a macro has a way of declaring code that should run in the resulting test binary's main.
- For test result collectors:
  - How does the collector receive test results from the driver? I continue to believe that an event stream, as highlighted in the original post, is the way to go here. Having a crate-local (thread-safe?) on_test_event function seems like a decent option; it could then dispatch to the chosen formatter either through dynamic linking, some kind of lookup among multiple compiled-in implementations, or something else (see the event sketch after this list).
  - How does the collector receive flags and other options? This one is also a little tricky, given that the command-line parameters are shared with the test driver. How do we deal with overlapping options/flags? With positional arguments allowed by both? Giving them both access to a raw &[&'static str] seems like punting on what will certainly become an issue.
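To make the driver-side questions more concrete, here is a rough sketch of the shape such a trait could take. Again, every name here (TestCase, TestDriver, the exact signatures) is an assumption of mine, not a proposed API; the point is only that the generated main would reduce to a call into a crate-chosen implementation, fed by whatever registration mechanism we settle on.

```rust
/// Hypothetical sketch. A single runnable test as the driver sees it;
/// how tests are *declared* (attributes, fixtures, suites, ...) is
/// entirely up to the framework built on top of the driver.
pub struct TestCase {
    pub name: &'static str,
    pub run: fn() -> Result<(), String>,
}

/// The trait a crate's chosen "main" struct would implement. The test
/// binary's generated main() would boil down to something like
/// `std::process::exit(DRIVER.main(&registered_tests, &args))`, where
/// `registered_tests` is produced by the registry/declaration mechanism
/// discussed above.
pub trait TestDriver {
    /// Run the given tests with the test binary's command-line
    /// arguments, returning a process exit code.
    fn main(&self, tests: &[TestCase], args: &[String]) -> i32;
}
```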
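Similarly, for the collector side, here is a sketch of what the event stream could look like. The particular event set and the on_test_event signature are illustrative assumptions, not a design.

```rust
/// Hypothetical sketch of the events a driver would emit as a run
/// proceeds; the real set would need to be nailed down carefully.
pub enum TestEvent {
    SuiteStarted { num_tests: usize },
    TestStarted { name: String },
    TestPassed { name: String },
    TestFailed { name: String, message: String },
    TestIgnored { name: String },
    SuiteFinished { passed: usize, failed: usize, ignored: usize },
}

/// The crate-local hook the driver calls for every event. The chosen
/// formatter sits behind it, whether linked in statically, picked from
/// several compiled-in implementations, or loaded dynamically.
pub fn on_test_event(event: &TestEvent) {
    // A trivial stand-in formatter; a real collector would dispatch
    // to the user-selected formatter here.
    match event {
        TestEvent::TestPassed { name } => println!("ok {name}"),
        TestEvent::TestFailed { name, message } => println!("FAILED {name}: {message}"),
        _ => {}
    }
}
```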
With the questions above resolved, we should be in a place where pytest, Catch, xUnit, and the like can all be implemented outside of libtest itself, while still integrating nicely with existing tooling and remaining orthogonal to the much-requested custom test output formats.