Past, present, and future for Rust testing

That requires defining some data structure for this runtime access. How much metadata should be in there? &[fn()] is not enough, because you want metadata to support attributes like #[should_panic(expected = "some message")], or even just the name of the function. The structure in rust/src/libtest/lib.rs (as of 1.22.1) is somewhat complex and has changed over time, so stabilizing it would be tricky.
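
For concreteness, here is a minimal sketch of the kind of per-test metadata such a registry entry would carry; these are illustrative types only, not libtest's actual (and unstable) TestDescAndFn:

// Illustrative only; real metadata would likely need more fields.
pub enum ShouldPanic {
    No,
    Yes,
    YesWithMessage(&'static str), // #[should_panic(expected = "...")]
}

pub struct TestDescriptor {
    pub name: &'static str,        // the test function's name/path
    pub ignored: bool,             // #[ignore]
    pub should_panic: ShouldPanic,
    pub test_fn: fn(),
}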

This is why I mentioned generating code: you can use the usual proc macro APIs to inspect attributes and other info about the crate’s items, then generate whatever data structure is appropriate for a given test harness.

Well, an alternate harness that would want different metadata would have to process the attributes itself anyway, no?

So it shouldn’t really matter whether it does so for each item separately or in one place after all are collected. And I think doing it separately would be easier to implement. I can also imagine that being completely global, for which I can imagine other uses like registering extensions (one crate would define an array of factories to which other crates could register—if that happens somewhere deep in the dependency chain, it is better if the final user does not have to register them manually at the start of main).

Maybe later an extensible metadata format could be developed so the extended harnesses could reuse parts of the standard one, but for the first experiments I don’t think it’s a big obstacle.

Thinking about it more, it would also be possible—and probably simplest—to use a kind of “inverted objects” for this. That is, the #[test] macro would register the test method with its name, and #[should_panic(expected = "some message")] would register a separate record in a separate registry; the runner would then match them up by function pointer or something. That would allow implementing—and reusing—the individual attributes independently.
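
A rough sketch of that idea, with hand-written statics standing in for what the macros would generate (all names here are made up):

// Each attribute fills its own registry; the runner joins them afterwards.
static TESTS: &[(&str, fn())] = &[("it_works", it_works)];
static SHOULD_PANIC: &[(fn(), &str)] = &[(it_works, "some message")];

fn it_works() {
    panic!("some message");
}

// Look up the expected panic message for a test by its function pointer.
fn expected_panic_for(test: fn()) -> Option<&'static str> {
    SHOULD_PANIC
        .iter()
        .find(|(f, _)| *f as usize == test as usize) // compare by address
        .map(|(_, msg)| *msg)
}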

Sorry to crash this thread – but what about doc tests? Should they be able to use custom test runners as well in some capacity? (relevant tweet)

Yes, py.test's fixtures support parameterization. The feature in pytest has a typo or uses a different language's spelling (I'm not sure which), so the method is called "parametrize". It's extremely convenient, and uses Python syntax equivalent to what you suggest.

FWIW, your idea of type parameterization is one of the things I was thinking of when I described using something like Rocket's request guards as a possible implementation direction. If nothing else, Rocket proves that the concept is already implementable on nightly.

I’d agree that having type parametrization for tests would be immensely useful, e.g., like Theories which combine type and value parameterization. I see a major use case for libraries exposing traits. The library authors could also expose a set of type-parameterized tests which assert invariants on the implementations of the library’s users (e.g. if I add something to a list, I should also be able to retrieve it).
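
A hedged sketch of what a library could ship for that; the List trait and the test function are made up for illustration:

// The library exposes a trait plus type-parameterized invariant tests.
pub trait List<T>: Default {
    fn push(&mut self, item: T);
    fn get(&self, index: usize) -> Option<&T>;
}

// Invariant: anything pushed can be retrieved again.
pub fn push_then_get_roundtrip<L: List<u32>>() {
    let mut list = L::default();
    list.push(42);
    assert_eq!(list.get(0), Some(&42));
}

// A downstream crate would instantiate it for its own implementation:
// #[test]
// fn my_list_roundtrips() { push_then_get_roundtrip::<MyList>(); }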

Just FYI, simple test fixtures and parameterization can be emulated in stable Rust (to some extent) with macros (sorry for the shameless self-plug, but I think it illustrates what can be done with the current state of affairs). Though proper support by a test harness API would be far more useful and easier to develop and maintain.
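
For reference, the general shape of such macro-based value parameterization on stable looks roughly like this (a sketch of the approach, not the linked crate's actual API):

// One generated #[test] per listed case, grouped in a module.
macro_rules! parameterized_test {
    ($name:ident, $($case:ident: $input:expr => $expected:expr),+ $(,)?) => {
        mod $name {
            $(
                #[test]
                fn $case() {
                    assert_eq!(super::double($input), $expected);
                }
            )+
        }
    };
}

fn double(x: i32) -> i32 { x * 2 }

parameterized_test!(double_works,
    zero: 0 => 0,
    positive: 2 => 4,
    negative: -3 => -6,
);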

As the maintainer of Criterion.rs, I’d like to add my thumbs-up to creating a stable benchmark-harness API. I have no idea what it should look like at this point (I suspect any serious candidate will require extensions to the macro system) but I want to make a few comments based on that experience. Some of these comments will apply to test harnesses as well.

The current way to use different benchmark harnesses is unsatisfactory for a few reasons.

First, it increases the burden on the user in that they need to separately disable the standard harness for each benchmark executable before they can use a different one, and they need to either write their own main function or use a macro from the benchmark library to do the same. I’m not sure how this burden can be reduced but it’s something we should think about.
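
To make that burden concrete: with harness = false set for the bench target, a Criterion-based benchmark today has to supply its own main, roughly like this (simplified from Criterion's documentation; exact macro names may differ between versions):

use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn fibonacci(n: u64) -> u64 {
    match n {
        0 | 1 => 1,
        n => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

fn fib_benchmark(c: &mut Criterion) {
    c.bench_function("fib 20", |b| b.iter(|| fibonacci(black_box(20))));
}

// These macros generate the main() that the disabled default harness
// would otherwise have provided.
criterion_group!(benches, fib_benchmark);
criterion_main!(benches);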

Second, it appears that it’s only possible to replace the benchmark harness for benchmarks defined in the benches/ directory - not those defined in the src/ directory. Even if there is a way to disable the standard harness for the default benchmark executable, cargo only generates one executable that contains tests and benchmarks, so doing so would presumably disable tests. I’ve already had a user raise an issue requesting support for benchmarks in src/, as the user wanted to benchmark crate-private code which wasn’t available to benchmarks in benches/ (which are considered to be part of a separate crate). Cargo should build separate executables for tests and benchmarks and make it possible to change the harness for benchmarks in src/.

A third point is that some users will probably want to use different benchmark harnesses for different benchmark executables. If nothing else, cargo bench runs the src/ test executable as well as the benchmark one. There’s only one place to provide command-line arguments to these executables, and the standard harness panics (halting the cargo bench run) when it sees an argument it doesn’t recognize. That means I can’t define additional command-line options for Criterion-based benchmark executables without breaking cargo bench.

Thanks to everyone who has chimed in already; all these suggestions are super helpful! After spending some time digesting all the posts so far, and ingesting holiday food, I now have some more refined ideas for what we should think about for libtest 2.0. I'll try to summarize so you all have an opportunity to disagree :stuck_out_tongue:

There are many ways to design a testing framework — see in particular the Catch (C++) and pytest examples highlighted elsewhere in this thread for particularly good ones. However, reading through this thread again, and the various issues and PRs linked in the original post, I have realized that what we really want is solid infrastructure for implementing a test harness (so that a Catch or pytest clone could be implemented in, and integrated with, Rust). I believe that is what @nrc was hinting at in his first response when he said

this model does not support the division between test harness and test framework, afaict

In particular, we should be thinking about what would be needed to facilitate someone writing their own test driver, and having it integrate nicely with cargo test and the like. I'd like to echo my original post here, and say that such an integration requires two distinct components: one for running/declaring tests, and one for collecting test results. How the tests are written and organized should be up to a crate's chosen test driver.

For example, we could/should provide a default Rust test driver (shipped with rustup), which conforms pretty much exactly to what libtest has today (i.e., #[test] annotations on FnOnce() -> (), running all tests in a flat namespace, supporting the same runtime flags as today's libtest). Users could then write new testing frameworks with more sophisticated test declarations (e.g., test suites, BDD, PBT, etc.), arrangement (e.g., setup/teardown, fixtures, parameterization, serial execution), better runtime flags (e.g., regex test name matching), etc.

More concretely, I believe libtest 2.0 should consist of two components: a test driver, and a test output formatter. The former is chosen by the crate by declaring it in dev-dependencies, and setting it globally inside the crate using something akin to the global system allocator (except it would be crate-local). The user cannot change the test driver without modifying the source, partially because the test driver defines how tests are declared and specified. The latter is chosen by the user invoking cargo test, and should work for any test driver the crate may have chosen. The exact mechanism for choosing the formatter isn't clear to me yet, but changing it should not require source changes, nor changes to Cargo.toml, just changes in the user's environment. Ideally it should be possible to change with a runtime flag to the test binary (probably --format), but it would likely be acceptable for a recompile to be needed if the formatter is changed.

I believe a concrete plan for making progress would need to answer the following questions:

  • For test drivers:
    • How does the crate author select a test driver? I propose something akin to selecting the global system allocator as outlined in the original post.
    • How does the chosen test driver interact with the test binary's main? I think this can be absorbed into the trait we require the chosen test driver's "main" struct to implement.
    • How does a test driver enable users to declare tests? This is the trickiest point in my mind. Procedural macros alone aren't sufficient (afaict), since #[test] couldn't be implemented solely using those. In particular, the missing piece is that (again, afaict) a custom attribute macro cannot generate identifiers that are discoverable at runtime. This is similar to why Rocket cannot automatically register routes just from request guards, but instead requires them to be explicitly named when mounting (see what the route attributes generate here). @jan_hudec aired the idea of a "registry", which could work if we carefully decide what information is included in the registry, but I'd be in favor of a slightly more flexible model where a macro has a way of declaring code that should be run in a resulting test binary's main?
  • For test result collectors:
    • How does the collector receive test results from the driver? I continue to believe that an event stream as highlighted in the original post is the way to go here. Having a crate-local (thread-safe?) on_test_event function seems like a decent option (a rough sketch follows after this list). It could then dispatch to the chosen formatter either with dynamic linking, some kind of lookup among multiple compiled-in implementations, or something else.
    • How does the collector receive flags and other options? This one is also a little tricky given that the command-line parameters are shared with the test driver. How do we deal with overlapping options/flags? Positional arguments allowed by both? Giving them both access to a &[&'static str] seems like punting what will certainly become an issue.
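
To make the event-stream bullet concrete, here is a minimal sketch; the TestEvent shape and the on_test_event name are assumptions for illustration, not a proposed final API:

use std::time::Duration;

// Events the test driver would emit as tests run.
pub enum TestEvent {
    Started { name: String },
    Passed { name: String, duration: Duration },
    Failed { name: String, message: String },
}

// Crate-local hook the driver calls for every event; a real implementation
// would forward to whichever formatter the user selected.
pub fn on_test_event(event: &TestEvent) {
    match event {
        TestEvent::Started { name } => eprintln!("test {} ...", name),
        TestEvent::Passed { name, .. } => eprintln!("test {} ... ok", name),
        TestEvent::Failed { name, message } => {
            eprintln!("test {} ... FAILED: {}", name, message)
        }
    }
}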

With the questions above resolved, we should be in a place where pytest, Catch C++, xUnit, and the like can all be implemented externally from libtest itself, while still integrating nicely with both the existing tooling, and orthogonally to the much-requested custom test output formats.

Hello

I like one thing about the current situation: it is simple and trivial to use. Catch and its sections were mentioned. Sometimes you need to read the code many times to know which different test scenarios will run in the end (especially if you nest these sections), and with which values. Also, when you look at tests in different C++ projects, each one is different and you need to adjust.

I’m actually looking forward to the ? support in tests. And I wanted to see (though I haven’t written any RFC for it yet) a quite natural extension for basic fixtures in the current implementation: a test function could take a parameter, provided this parameter implements Default. With a special case of the parameter being self, the test could actually be a method of the harness itself.

struct Tests { ... }

impl Default for Tests {
    fn default() -> Self { ... }
}

impl Tests {
    #[test]
    fn do_stuff(&mut self) -> Result<(), Error> {
        self.it_works()?;
        Ok(())
    }
}

The good thing about it is that one part (the ? support) is already in the pipeline and the other part feels just obvious.

Anyway, maybe this isn’t the exact place to discuss it, is it?

Dumping my thoughts on this. TLDR: I think we also need to care about use cases like cargo-fuzz, and try to make all the Cargo.toml goodness for test frameworks work well as well.

The problems

Ideally, a solution for this would involve

  • being able to specify a custom framework that takes #[test]
  • being able to specify a custom framework that takes #[bench]
  • being able to specify a custom framework that takes custom attributes, e.g. #[fuzz]

This will probably make the most sense as a proc-macro style crate.

The compiler should do all the “collect up all the attributes” work for the framework, and provide an endpoint from which the test framework can provide the output AST for the program.

Cargo currently allows you to specify [[test]] and [[bench]] entries. Custom test runners should be usable with these, and also should be able to make their own.

For example, currently cargo-fuzz does a whole bunch of hacks by creating a second nested crate for storing fuzz targets, which would ideally be obviated by this.

A rough solution

Like custom derives, test frameworks can be specified via attributes:


#[test_framework]
fn framework(x: TestHelper) -> TokenStream {
    let mut body = vec![];
    for func in x.iter_functions("test") {
        let name = parse(func).name;
        body.push(quote_expr!($name()));
    }
    quote_item! {
        fn main() {
            $(body);+
        }
    }
}

(I haven’t written proc macros in a while, so this is pseudocode)

We can also make this more like a regular proc macro where it takes in a TokenStream, spits out a TokenStream, and you need a library to do the collecting of attributes for you. But we may want to add a &TestContext parameter through which you can get additional info and tweak parameters (e.g. one could add extra Cargo dependencies through this, which would be nice but not necessary for cargo-fuzz).
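
As an illustration of the "library does the collecting" variant, something along these lines could be built today with syn and quote (the function name and the idea of parsing the crate from a string are assumptions; written against syn 2.x with the "full" feature):

use proc_macro2::TokenStream;
use quote::quote;
use syn::{File, Item};

// Collect every top-level function annotated with #[test] and emit a
// main() that calls each one in turn (nested modules ignored for brevity).
fn generate_test_main(crate_source: &str) -> syn::Result<TokenStream> {
    let ast: File = syn::parse_str(crate_source)?;
    let test_fns: Vec<_> = ast
        .items
        .iter()
        .filter_map(|item| match item {
            Item::Fn(f) if f.attrs.iter().any(|a| a.path().is_ident("test")) => {
                Some(f.sig.ident.clone())
            }
            _ => None,
        })
        .collect();

    Ok(quote! {
        fn main() {
            #( #test_fns(); )*
        }
    })
}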

A test runner crate will have a test_runner=true key in its Cargo.toml.

Crates using a custom test runner can declare it like so:

[testing.test]
cratename = version


[testing.fuzz]
cratename = version

Creating a runner called test or bench will override cargo test and cargo bench respectively. Creating a runner called something else will provide a new kind of test, which can be run with cargo test --kind foo. cargo test --kind foo will run the harness on the crate itself as well as any .rs files found under foo/.

If you wish to specify targets for this kind of test, you can do it like so:

[[testing.target]]
kind = "fuzz"
path = "foo.rs"
name = "foo"

The [[test]] and [[bench]] keys are sugar for [[testing.target]] with kind = "test" and kind = "bench" respectively.

I’ve been thinking about posting a pre-RFC on this subject myself. I have to thank @Manishearth for saving me the trouble - this is exactly what I was going to suggest, only more fleshed out. The only thing I’d add is that the test::black_box function would have to be stabilized as well for benchmarking.

This already covers pretty much everything I need for Criterion, so I’m ready to support it as-is.

With regards to the wishlist of features discussed earlier in this thread (JSON output, customizable formatters, test generation) - I’d suggest that rather than trying to design and standardize the One True Test Framework, this should be left to the Crates.io ecosystem, for all the same reasons that Rust leaves so much else to the ecosystem already. If desired, there can always be something like the HTTP crate added later to improve interoperability. For now, I’d say let’s stabilize the minimum possible interface necessary to allow community experimentation.

Just a question… any plans on making it possible to combine them, e.g. letting tests/a.rs use one harness and tests/b.rs another?

Not in my proposal – in my proposal you’d use different folders for these – but this can be made to work as well if it’s something folks want.

I didn’t have the need, but now there’s just the one default harness, so there’s not much to choose from. But I can imagine I could want two or three different types of tests for the same crate, but would prefer to put all of them into tests, just for clarity.

If you declare these test files in Cargo.toml instead of relying on auto-detection, it looks like this:

[[test]]
name = "a"

[[test]]
name = "b"

Today, you can add harness = false to either of these sections so that the corresponding file is compiled as a normal program with a fn main() {} function instead of with rustc --test and the built-in test harness. It’s not hard to imagine extending that syntax to harness = "foo" if/when there’s Cargo support for custom test harnesses.

https://doc.rust-lang.org/cargo/reference/manifest.html#configuring-a-target

I’d like to write down an idea I’ve had during yesterday’s dev-tools meeting.

It seems to me that at least some groundbreaking test frameworks were implemented completely without any language support. The two examples that come to mind are JUnit for Java and py.test for Python. These frameworks crucially rely on the ability to inspect code at run time. Specifically, it is possible to implement the “collect tests” behavior in Python and Java using only the languages’ built-in mechanisms for reflection.

So it seems to me that we actually lack some language feature which would allow implementing such frameworks completely independently. However, it may be the case that special-casing the tests would work better in practice, I don’t really know :slight_smile:

As a completely unrealistic strawman proposal, imagine a new kind of procedural macro, which acts as if it was called as custom_macro!(code for the whole crate). Not sure if this is practical, but it is at least a fun thing to do :slight_smile:

We don't! Proc macros can cover this case, kinda, if you use an inner attr on the crate. Even if not, such a feature is pretty minor and easy to add.

What we're really missing is the cargo integration; cargo handles all the dependency tracking and stuff and overall makes it nice. Like I mentioned, cargo-fuzz creates a second fuzzing crate because redoing this work itself is extremely nontrivial.

I’d like to make a slightly more concrete proposal, which I think addresses the issues raised thus far.

Creating a test runner (/harness)

A test runner is any type that implements the following trait:

use std::sync::mpsc;

#[non_exhaustive]
enum TestEvent {
    Success { name: String },
    Failure { name: String, error: String },
}

trait TestRunner: Default {
    type TestComponent;
    fn run(self, results: mpsc::Sender<TestEvent>);
    fn register(&mut self, component: Self::TestComponent);
}

In the case of the current test runner, that implementation may look like:

#[derive(Default)]
pub struct DefaultTestRunner(Vec<(String, Box<dyn FnOnce()>)>);

impl TestRunner for DefaultTestRunner {
    type TestComponent = (String, Box<dyn FnOnce()>);
    fn run(self, res: mpsc::Sender<TestEvent>) {
        for (test_nm, test_fn) in self.0 {
            test_fn(); // should catch panics and send TestEvent::Failure instead
            res.send(TestEvent::Success { name: test_nm }).unwrap();
        }
    }
    fn register(&mut self, test: Self::TestComponent) {
        self.0.push(test);
    }
}

It may also include a number of procedural macros (probably attributes), such as:

fn test_decorator(ecx: &mut ExtCtxt, sp: Span, meta_item: &MetaItem, annotated: Annotatable) -> Vec<Annotatable> {
    // NOTE: this macro does not currently exist:
    register_test_component!(/* ... */);

    let mut items = Vec::new();
    items.push(annotated);
    items
}

#[plugin_registrar]
pub fn plugin_registrar(reg: &mut Registry) {
    reg.register_syntax_extension(
        Symbol::intern("test"),
        SyntaxExtension::MultiModifier(Box::new(test_decorator))
    );
}

The register_test_component macro is the magical thing that would have to be added. It doesn’t technically have to be a macro, but whatever. Specifically, what it does is accept an expression generated at compile time, which it will evaluate in the test main() at runtime and then pass to the chosen test runner’s register method.
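
To illustrate (purely hypothetically, since neither macro exists yet), a #[test] attribute from such a runner crate could expand to something like:

// Hypothetical expansion of:
//
//     #[test]
//     fn my_test() { assert_eq!(2 + 2, 4); }
//
// The attribute leaves the function in place...
fn my_test() {
    assert_eq!(2 + 2, 4);
}

// ...and registers an expression that the generated test main() will
// evaluate at runtime and pass to DefaultTestRunner::register.
register_test_component!((
    "my_test".to_string(),
    Box::new(my_test) as Box<dyn FnOnce()>,
));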

Note also that you can register components that aren’t necessarily just a single test. They can be whatever set of types the test runner cares about (suite, block, test-generating function, any of the above, etc.).

Choosing a test runner

To choose a test runner, the user would add, say, test_runner_foo to their dev-dependencies, and then this to their src/lib.rs (or whatever other crate they want to use that test runner in):

#[macro_use]
#[use_test_runner(DefaultTestRunner)]
extern crate test_runner_foo;

Note that this would select the indicated test runner for the entire crate. Since integration tests (files in tests/*.rs) are basically their own crates (they extern crate the crate-under-test (CUT)), they can choose whether or not they want to use the same test runner as the CUT.

What happens at compile time?

If a crate is compiled with cfg(test), the following main is generated:

fn main() {
    let mut runner = crate_test_runner!()::default();
    // for every expression registered at compile time with register_test_component!:
    for expr in registered_test_components!() {
        runner.register(expr);
    }
    
    let (tx, rx) = mpsc::channel();
    let runner = thread::spawn(move || runner.run(tx));
    for event in rx {
        match event {
            /* initially, just mimic current test output
               eventually we'll want the ability to forward
               to a customizeable formatter. */
        }
    }
    runner.join();
}

What about benchmarks?

A benchmark harness could be implemented the exact same way. You’d add another annotation (#[bench]), make TestComponent an enum of Test and Benchmark, collect them separately, and choose in run which ones to run depending on whether benchmarks or tests were requested.
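
A small sketch of that enum; the names and the bencher type are placeholders, not part of the proposal:

// Placeholder for whatever timing handle the runner would expose.
pub struct Bencher;

// A runner supporting both would set `type TestComponent = Component;`.
pub enum Component {
    Test {
        name: String,
        func: Box<dyn FnOnce()>,
    },
    Benchmark {
        name: String,
        func: Box<dyn FnMut(&mut Bencher)>,
    },
}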

Would it be possible to somehow not restrict the list to just tests and benchmarks, but allow providing more dynamically? E.g. if someone decides to add #[fuzz] or #[quickcheck], or whatever else, could they do it without going through the RFC process and modifying the compiler?

Yes. I believe the scheme above would allow arbitrary annotations to be registered for a test runner. The restriction is that it only works when we (or rather, the compiler) controls the main function, which happens when you instruct it to compile tests with rustc --test (which sets cfg(test) and builds in the test harness).