While I agree with @quodlibetor that it’d be neat to be able to specify other mains beyond that of test, I think trying to tackle that in the same breath as custom test frameworks will cause us to make no progress. As far as I can tell, my proposal above could be extended to what @quodlibetor proposes, but I think we should deal with this problem in two phases: first, get custom test frameworks to a good place and gain some experience from that process; then, see whether we want to expand that to also include arbitrary compile “modes” where libraries can provide their own main()s.
I think this is a very important point, and probably something we all agree on. I believe my proposal also provides this.
I think filtering should be handled entirely by the test runner. This does mean that crates with different runners will accept different arguments to cargo test, which is a little unfortunate, but I think otherwise we’d be constraining runners too much. I don’t have any concrete examples, but I suspect that forcing all test frameworks to support the same flags would bring some pain. We might want to consider requiring that the first positional argument be a filter for which tests to run, but even then we have questions like “are regexes supported?”.
This is related, and also very tricky. I think we’ll simply want to stipulate that cargo test passes all of its flags on to the test runner, and the test runner decides what to do with them. Maybe it makes sense to specify some flags that must be supported to promote consistency, but I don’t have a good sense of what those would be yet.
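To make the division of labor concrete, here’s a very rough sketch of what “cargo test forwards its arguments verbatim and the runner decides what they mean” could look like. Every name here (TestCase, run_tests, the substring-filter convention) is hypothetical and only meant for illustration, not a concrete API:

```rust
// All of the names below are hypothetical; this only illustrates the
// division of labor between cargo test and a custom runner.
struct TestCase {
    name: &'static str,
    run: fn() -> bool,
}

// The runner receives whatever arguments `cargo test` was invoked with
// and interprets them however it likes. Here the first positional
// argument (if any) is treated as a plain substring filter.
fn run_tests(args: &[String], tests: &[TestCase]) {
    let filter = args.iter().find(|a| !a.starts_with('-'));
    for test in tests {
        if let Some(f) = filter {
            if !test.name.contains(f.as_str()) {
                continue;
            }
        }
        let ok = (test.run)();
        println!("test {} ... {}", test.name, if ok { "ok" } else { "FAILED" });
    }
}

fn main() {
    // In this sketch, cargo would pass its remaining flags through as-is.
    let args: Vec<String> = std::env::args().skip(1).collect();
    let tests = [
        TestCase { name: "parse::empty", run: || true },
        TestCase { name: "parse::nested", run: || true },
    ];
    run_tests(&args, &tests);
}
```

Whether the filter is a substring, a regex, or something else entirely would be the runner’s call under this model; cargo wouldn’t need to know.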
Absolutely, though I am of two minds about how we actually go about that. My thinking with TestEvent was that there should be some common set of events that we expect basically every test runner to want to report, and “standardized” formatters would deal only with those events. If a runner is very unusual, it probably doesn’t make sense to try to fit it with whatever formatter the user might think of using; such runners will likely bypass the output formatter and produce their own output directly. However, I expect that to be relatively rare – I think most testing frameworks can get away with a very narrow set of events.
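To give a feel for what I mean by a “narrow set of events”, here’s a sketch. The names and shape are purely illustrative, not the exact form of anything proposed:

```rust
// Purely illustrative; not the exact shape of the proposal.
enum TestEvent {
    SuiteStarted { num_tests: usize },
    TestPassed { name: String },
    TestFailed { name: String, message: String },
    TestIgnored { name: String },
    SuiteFinished { passed: usize, failed: usize },
}

// A formatter only ever sees these events, so the same formatter can be
// reused with any runner that emits them.
trait OutputFormatter {
    fn handle(&mut self, event: &TestEvent);
}

// A minimal "pretty" formatter as an example.
struct Pretty;

impl OutputFormatter for Pretty {
    fn handle(&mut self, event: &TestEvent) {
        match event {
            TestEvent::TestPassed { name } => println!("test {} ... ok", name),
            TestEvent::TestFailed { name, message } => {
                println!("test {} ... FAILED: {}", name, message)
            }
            _ => {}
        }
    }
}
```

The point is that a JSON formatter, a TAP formatter, and so on could all be written once against that small event set and work with any runner that sticks to it.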
I’ve been wrestling with this as well, and don’t have a good, clean answer. I think the mechanism we use for introducing custom test frameworks into Rust should be general enough to also support benchmarking (I believe both my suggestion and @quodlibetor’s do, for example), even though the forms those runners take will be very different. It’s true that it’ll probably be tricky to have test output formatters integrate with benchmarks, but I think that’s fine.
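For illustration only, a benchmark “runner” under the same kind of mechanism might look something like the sketch below (all names made up). Note that it simply prints its own output rather than going through any TestEvent-style formatter, which is the kind of bypass I mentioned above:

```rust
use std::time::Instant;

// Hypothetical: a benchmark runner plugging into the same entry point a
// test runner would, while producing entirely different output and
// bypassing any formatter.
struct Benchmark {
    name: &'static str,
    run: fn(),
}

fn run_benches(_args: &[String], benches: &[Benchmark]) {
    for bench in benches {
        let iters: u32 = 1_000;
        let start = Instant::now();
        for _ in 0..iters {
            (bench.run)();
        }
        let per_iter = start.elapsed() / iters;
        // The runner owns its output format; no formatter involved.
        println!("bench {}: ~{:?}/iter over {} iters", bench.name, per_iter, iters);
    }
}
```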