Dynamic tests, revisited

So this came up in the context of "Dynamic set of tests" (la10736/rstest issue #39), which refers back to the earlier Pre-RFC: Dynamic tests.

Right now, generating a dynamic set of tests in Rust requires switching to a custom test runner, for example datatest.

Never mind that it requires nightly (any new proposal would likewise live on nightly for a while); it seems a pity one has to switch runners just to generate dynamic tests.

It seems there's an "easy" way to add support for the notion of dynamic tests without committing to a specific mechanism for generating them. That is, something like datatest to provide a specific way to generate the dynamic tests is great, but there's no reason this couldn't work with the default test runner.

To see this, consider the following test file:

#[test]
fn maybe_it_works() {
    let result = 2 + 2;
    assert_eq!(result, 4);
}

Right now it is expanded to (tweaked a bit so I could compile it as a standalone binary):

#![feature(test)]
#![feature(core_panic)]
#![feature(rustc_attrs)]
extern crate core;
extern crate test;
#[rustc_test_marker = "maybe_it_works"]
pub const __MAYBE_IT_WORKS: test::TestDescAndFn = test::TestDescAndFn {
    desc: test::TestDesc {
        name: test::StaticTestName("maybe_it_works"),
        ignore: false,
        ignore_message: ::core::option::Option::None,
        compile_fail: false,
        no_run: false,
        should_panic: test::ShouldPanic::No,
        test_type: test::TestType::IntegrationTest,
    },
    testfn: test::StaticTestFn(|| test::assert_test_result(maybe_it_works())),
};
fn maybe_it_works() {
    let result = 2 + 2;
    assert_eq!(result, 4);
}

pub fn main() -> () {
    extern crate test;
    // An array with hard-wired entries, one per `#[test]`.
    test::test_main_static(&[&__MAYBE_IT_WORKS])
}

Now, suppose that the code generated by #[test] used a global vector of TestDescAndFn instead of a hard-wired array. This would allow an API for "insert test into the vector" to generate dynamic tests in whatever way one desires (e.g., using something like datatest or any other method). Add a way to register a #[generate_tests] function, and with a small tweak to the way things are done today we get:

#![feature(test)]
#![feature(core_panic)]
#![feature(rustc_attrs)]
extern crate core;
extern crate test;
// Result of expanding `#[test]` are identical to today:
#[rustc_test_marker = "maybe_it_works"]
pub const __MAYBE_IT_WORKS: test::TestDescAndFn = test::TestDescAndFn {
    desc: test::TestDesc {
        name: test::StaticTestName("maybe_it_works"),
        ignore: false,
        ignore_message: ::core::option::Option::None,
        compile_fail: false,
        no_run: false,
        should_panic: test::ShouldPanic::No,
        test_type: test::TestType::IntegrationTest,
    },
    testfn: test::StaticTestFn(|| test::assert_test_result(maybe_it_works())),
};
fn maybe_it_works() {
    let result = 2 + 2;
    assert_eq!(result, 4);
}
pub fn main() -> () {
    extern crate test;

    // Generated main program is modified:

    // 1. Use a vector instead of an array for the tests.
    let mut tests: std::vec::Vec<test::TestDescAndFn> = vec![];

    // 2. Generate a push for each static test (annotated by `#[test]`).
    //    Can still use a simple array if there are no registered generators.
    tests.push(__MAYBE_IT_WORKS);

    // 3. Invoke any registered test generator (annotated by `#[generate_tests]`, see below).
    maybe_it_works_for_some_values(&mut tests);

    // 4. Need to call `test_main` instead of `test_main_static`, of course.
    test::test_main(&vec![], tests, None);
}
// A function marked as `#[generate_tests]` is not modified, just invoked:
fn maybe_it_works_for_some_values(tests: &mut std::vec::Vec<test::TestDescAndFn>) {
    for x in 0..10 {
        // Provide a better API here, something like `add_test!(...)`
        // with the same flags as today's `#[test]`.
        tests.push(
            test::TestDescAndFn {
                desc: test::TestDesc {
                    name: test::DynTestName(format!("maybe_it_works_for_{x}")),
                    ignore: false,
                    ignore_message: ::core::option::Option::None,
                    compile_fail: false,
                    no_run: false,
                    should_panic: test::ShouldPanic::No,
                    test_type: test::TestType::IntegrationTest,
                },
                testfn: test::DynTestFn(Box::new(move || test::assert_test_result(maybe_it_works_for(x)))),
            }
        );
    }
}
// The actual dynamic test (not annotated by `#[test]`).
fn maybe_it_works_for(x: i32) {
    let result = x + x;
    assert_eq!(result, 2 * x);
}

That is, add #[generate_tests] and add_test! (or whatever names are chosen for them), making dynamic test generation orthogonal to the choice of a test runner.

Some points:

  • If there are no dynamic tests, the generated code would be exactly the same as today - that is, dynamic tests would be a "zero-cost abstraction": you only pay for them if/when you actually use them.

  • It does not change any existing types or function signatures, so there is zero impact on existing code such as custom harnesses. It would make any dynamic-test-generation work done in them redundant, but they would continue to work.

  • Either (A) expose a lot of types (TestDescAndFn and everything it uses) as in the example above, or (B) use a global variable and hide the details in an add_test!(...) macro that pushes onto it, taking the same flags as #[test].
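
To make option (B) concrete, here is a minimal sketch of what the hidden global plus an add_test! macro could look like. All names here (DynTest, REGISTRY, add_test!, generate) are placeholders for whatever API would be bikeshedded, not an existing interface:

```rust
use std::sync::Mutex;

// A stand-in for test::TestDescAndFn; the real thing would carry the
// full set of `#[test]` flags (ignore, should_panic, ...).
struct DynTest {
    name: String,
    run: Box<dyn Fn() -> bool + Send>,
}

// The global vector of tests, hidden behind the macro (option B).
static REGISTRY: Mutex<Vec<DynTest>> = Mutex::new(Vec::new());

// What an `add_test!(...)` invocation might expand to.
macro_rules! add_test {
    ($name:expr, $f:expr) => {
        REGISTRY.lock().unwrap().push(DynTest {
            name: $name.to_string(),
            run: Box::new($f),
        })
    };
}

// A `#[generate_tests]`-style generator, same shape as
// `maybe_it_works_for_some_values` above.
fn generate(n: i32) {
    for x in 0..n {
        add_test!(format!("maybe_it_works_for_{x}"), move || x + x == 2 * x);
    }
}

fn registered_names() -> Vec<String> {
    REGISTRY.lock().unwrap().iter().map(|t| t.name.clone()).collect()
}

fn main() {
    generate(3);
    // The generated `main` would hand the vector to `test::test_main`;
    // here we just run the entries directly.
    for t in REGISTRY.lock().unwrap().iter() {
        println!("{}: {}", t.name, if (t.run)() { "ok" } else { "FAILED" });
    }
}
```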

Overall it seems like a small change (other than the inevitable bikeshedding for deciding on an exact API).

Is this something that would be favorably looked upon - e.g., worth creating an RFC for, pull requests, etc.? I'm not sure what the proper procedure for such things is.

What would the test entry points look like? Currently, having static test cases as entry points has significant benefits: all existing test cases are known and can be found automatically. An IDE can run a specific test on click, cargo test can filter for the required tests, and the nextest test runner can find all tests and run them in parallel. If tests can be created via arbitrary logic, it's quite likely that those use cases will stop working.

So, slight caveat to this: while it's currently true that the compiler knows the full set of tests with just a check[1], the only way to get at that list (even with nightly) is a run (more specifically, cargo test -- --list). Adding dynamic test generation to that would have no impact on the interface, though it would make test discovery take longer in order to do the dynamic generation. We still get a complete list of tests before running any of them, so nothing changes beyond the extra step to get that list.

Rather than support for runtime test registration, though, I'd personally prefer extensions to the procedural macro API such that they can just generate the #[test]s as part of check --tests, registering rerun-if-changed predicates on any external file/directories utilized.

We can't access it currently, but I would like it to be possible in the future to get access to the test list after the check portion of the build is complete without requiring the build portion. This is useful to external test runners (e.g. nextest or IDE test discovery) in workspace environments, where they'd be able to discover all tests but then only build and run one of potentially many test binaries if the only filter-included tests are in that one. (Note that cargo test filters always build and run all test binary targets, and the individual test binaries do the filtering.)


  1. Also, remember that check can require running arbitrary code via procedural macros.

@CAD97 : Thanks for the explanation about listing the tests. Yes, anything using cargo test -- --list should just keep working. If instead one peeks at the binaries (metadata) without running them, then naively one will only see the static tests (which is still something).

I didn't understand this though:

Rather than support for runtime test registration, though, I'd personally prefer extensions to the procedural macro API such that they can just generate the #[test]s as part of check --tests, registering rerun-if-changed predicates on any external file/directories utilized.

What does "they can just generate the #[test]s as part of check --tests" mean? How would the code do something like a glob during check --tests, which (if I understand correctly) doesn't actually run any code?

FWIW, la10736/rstest (a fixture-based test framework for Rust) already generates a list of #[test] cases based on data known statically at compile time. That's great, but it still requires modifying this list whenever on-disk (or in-database) test cases are added or deleted, followed by recompiling the tests.

Instead of emitting code to do the glob, have the procedural macro itself evaluate the glob. Procedural macros run arbitrary Rust code to generate token streams; they are, essentially, code generation.

Instead of writing something like

#[test_generator]
fn generator() -> Result<Vec<Box<dyn Test>>> {
    glob("./corpus/*.in")
        .map_ok(|p| Box::new(move || test_it(&p)))
        .collect()
}

fn test_it(p: &Path) { /* … */ }

you'd write

#[test_glob("./corpus/*.in")]
fn test_it(p: &Path) { /* … */ }

with a proc macro of very roughly

fn test_glob(attr: TokenStream, item: TokenStream) -> TokenStream {
    let test: FnItem = item.parse()?;
    let test_name = &test.name;

    let glob_lit: StrLit = attr.parse()?;
    let mut out = item;

    for path in glob(&glob_lit.value()) {
        let path = path?;
        let path = path.as_str()?;
        let gen_name = combine(test_name, path);
        out.extend(quote! {
            #[test]
            fn #gen_name() {
                #test_name(Path::new(#path))
            }
        });
    }

    out
}

This would also need some way for the proc macro to tell the compiler to add some directory as a rebuild watch; buildscripts can already do this via rerun-if-changed directives. The test generation itself is actually doable on stable today; it's the communication with the compiler about scheduling rebuilds that isn't.
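
For reference, this is how a buildscript registers rebuild triggers today. The `tests/corpus` paths are just examples; note also the caveat that for a directory this fires on content changes to any file inside it, not only on files being added or removed:

```rust
// build.rs sketch: Cargo re-runs the buildscript (and rebuilds the crate)
// whenever a path named in a rerun-if-changed directive changes.
fn rerun_if_changed(path: &str) -> String {
    format!("cargo:rerun-if-changed={path}")
}

fn main() {
    // Watch the whole corpus directory...
    println!("{}", rerun_if_changed("tests/corpus"));
    // ...or a single manifest file listing the test inputs.
    println!("{}", rerun_if_changed("tests/corpus/list.txt"));
}
```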

Yes, this means recompiling the test target when the snapshots change. But that's also the behavior for tests integrated with the libtest harness today, so it's not really a regression on that point. For common cases like this, we could even imagine special proc macro kinds which only generate an array of inputs to give to a function and aren't able to impact compilation otherwise, making incremental compilation (theoretically) easier.

But tbh it'd probably also be satisfactory enough if the check-only way of listing tests just listed any test generators as well; then whatever integration can display the generator as a test group and/or know it needs to request a full compile and run of that target to get the full listing. That honestly does seem sufficient.

My original motivation was to suggest a KISS implementation with a minimal impact. More complex changes that enhance the metadata of the crate to list test generators and/or impact the way cargo build tracks dependencies, to make things a bit easier for the user, would probably have a much lower chance of getting adopted (and would take much longer to be implemented).

If it is possible to implement this as a normal crate, so much the better! It seems you are saying #[test_glob(...)] should be possible to implement in such a crate, today, on stable, as long as one creates/updates build.rs to print some rerun-if-changed triggers?

Note that listing the matched glob files in rerun-if-changed isn't the right thing to do - one only wants to rerun the tests, not recompile, if the files change. You only want to recompile if the list of files changed.

I think one can solve this by having each #[test_glob(...)] generate a .tg file containing the glob pattern and the list of paths it matched, modifying the timestamp only if the content changed.

Also, provide a fixed function rerun_if_test_globs to invoke from build.rs which would scan for such .tg files and emit a rerun-if-changed for each one, in addition to a rerun-if-changed for each (relevant) directory used in the glob pattern, to detect adding/removing test files.

This still leaves open the question of bootstrapping - in a clean directory, there would be no .tg files so no rerun-if statements would be generated. I guess rerun_if_test_globs would also need to scan the sources for #[test_glob(...)] directives and generate rerun-if-changed for all the relevant glob directories, even if no .tg files exist.

So far this is tricky, but seems possible. The user will have to use a build.rs file and call the rerun_if_test_globs in it, but this seems an acceptable overhead.
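
A hypothetical rerun_if_test_globs along these lines might look like the following. The function, the .tg layout, and the `target/test-globs` location are all assumptions for illustration, not an existing API:

```rust
use std::path::Path;

// Scan a directory for `.tg` files (each recording a glob pattern and its
// matched paths) and emit one rerun-if-changed directive per file found.
// A real version would also scan the sources for `#[test_glob(...)]`
// attributes to handle the clean-build bootstrap case described above.
fn rerun_if_test_globs(tg_dir: &Path) -> Vec<String> {
    let mut directives = Vec::new();
    if let Ok(entries) = std::fs::read_dir(tg_dir) {
        for entry in entries.flatten() {
            let path = entry.path();
            if path.extension().map_or(false, |e| e == "tg") {
                directives.push(format!("cargo:rerun-if-changed={}", path.display()));
            }
        }
    }
    directives
}

fn main() {
    // Called from build.rs.
    for d in rerun_if_test_globs(Path::new("target/test-globs")) {
        println!("{d}");
    }
}
```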

However, I am not clear on how such an approach would only trigger rebuilding of tests (#[cfg(test)]) as opposed to rebuilding other configurations of the crate. Is this even possible?

It should be possible to pick a consistent known folder path in the target directory and watch that; then the creation of a new .tg file would trigger a buildscript rerun.

This scans the entire directory for any modifications, so it will trigger a rebuild when test data changes, unfortunately. Being able to only monitor file creation/deletion but not mtime is not currently possible.

If it's from the buildscript, it's not possible. That's the main reason I would like to be able to do it directly from the proc macro; if the proc macro itself is gated behind #[cfg(test)], its requested change tracking will only impact the builds it's cfg'd into.

This should be possible by only informing the compiler about the directories traversed so that it detects changes to the set of files rather than the content of the files. Unfortunately, that's not possible with rerun-if-changed (passing a directory asks for changes to the content of any file within it) and hasn't been proposed for proc-macros yet (only change tracking of a single file has been).

It seems that actually generating the tests dynamically ends up being much simpler to implement, easier to use, and more robust.

Part of the attraction of dynamic tests (for me at least) is that I can easily (and quickly!) test new cases without having to recompile. Not only that, removing test cases will not cause the tests to fail.

In contrast, compiling the list of tests into the binary forces one to recompile more often. Neglecting to do so will cause the test binary to fail if test cases were removed (a spurious failure) - more worryingly, the binary will silently ignore added test cases (silently missing coverage).

What is the advantage of compiling the list of tests into the binary (using procedural macros or whatever other method)? Is it only the need to run cargo test -- --list?

There is also the -Zpanic-abort-tests option, which allows testing with -Cpanic=abort by running each test in a separate process that can abort on panics without crashing the test harness itself. Spawning a new process for every test requires the list of tests to be static to avoid inconsistencies. And in fact libtest will error today if there are any dynamic tests produced using its unstable API when -Zpanic-abort-tests is in use.

It seems we need some nice use cases here. What would be the usage where generating tests using a macro is insufficient and you would want a more complex runtime system?

@bjorn3 - good example, thanks for bringing it up. I think in my proposal the list of tests would appear to be static to the code, since I'm using the standard API to tell it about the dynamic tests. But it is worth looking into further.

@afetisov - In my case I have a program which takes a set of input files and flags, does some very non-trivial processing, and emits some output files. A test case is basically a directory with the input and expected output files. Test cases are added or removed during development, and ad-hoc test cases are used for debugging. Having to recompile the test program every time the list of test cases changes is a pain. Having dynamic tests would be much more convenient for us.

Why would you need to recompile? You should just write a single test function, which traverses the directory and checks all input test fixtures vs their expected output. If your test fixtures follow a predictable naming structure, it should not be an issue.

I admit that it's not very convenient if you just want to check a single test fixture, but I don't see how it would be more convenient with dynamic generation. One option would be to make the test function read an environment variable or a special file in that directory, and filter fixtures based on their contents (if set). A better option would be to allow passing program arguments from the test runner to specific test cases, but afaik there is no API for that, and I'm not sure what it would look like.

In other words, basically all dynamic behaviour you would want to do is already implementable. The question is, do we need some more declarative approach.
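
The environment-variable filtering idea above can be sketched in a few lines. The variable name (`FIXTURE`) and fixture names are assumptions; in a real test the loop body would load each fixture's input and compare against its expected output:

```rust
// Keep only the fixtures whose name contains the filter, or all of them
// if no filter is set.
fn select<'a>(all: &[&'a str], filter: Option<&str>) -> Vec<&'a str> {
    match filter {
        Some(f) => all.iter().copied().filter(|name| name.contains(f)).collect(),
        None => all.to_vec(),
    }
}

fn main() {
    // In the single-#[test] approach, `all` would come from traversing
    // the fixture directory.
    let fixtures = ["add_small", "add_large", "mul_small"];
    let filter = std::env::var("FIXTURE").ok();
    for name in select(&fixtures, filter.as_deref()) {
        println!("checking {name}");
    }
}
```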

I've written many tests that do things like this. This technique loses the following benefits you get from the built-in test harness:

  • running test cases in parallel
  • running test cases selected by command line
  • catching panics and letting other tests finish, so you can compare which succeeded vs. which failed, rather than only observing the first failure
  • capturing println! output and displaying it only if the test fails (this is highly useful for producing diagnostics to understand the failure, in complex situations where the info doesn't fit in a single assert)
  • capturing the panic handler output, so each panic is reported along with the test failure rather than asynchronously before it

All of these can be reimplemented inside of a single #[test], but with costs:

  • an internally parallel test case can't cooperate with --test-threads or produce lines of reporting as sub-tests pass/fail
  • the filter cannot be passed on the regular command line or it will filter out the container test
  • since println! capturing is unstable/internal, you'd have to write your own entire printing destination and macros
  • installing a custom panic handler which prints to said special destination if active

The reason I'd like to see dynamic tests is so that I can write tests that have all the benefits of the built-in test harness and are generated by any procedure I care to write, rather than having to pick one or the other, or go to lengths to get (a half-baked version of) both.

The problem is that your test generator could be nondeterministic. The -Cpanic=abort test runner does roughly

fn main() -> Result<Vec<TestResult>> {
    let args = Args::parse()?;
    let test_list = get_tests();
    Ok(match args {
        Args::AbortHarness(args) => {
            test_list
                .iter()
                .enumerate()
                .filter(|(_, desc)| args.filter(desc))
                .map(|(i, _)| exec!("{self} --run-test-index {i}"))
                .collect()
        }
        Args::RunTestIndex(args) => {
            vec![test_list[args.index].run_test()]
            // this will abort on a panic indicating test failure
        }
    })
}

On a unix platform, you might fork the process to run the test without running startup code again. That's not portable, though, so we just launch the test harness again with different arguments. If get_tests isn't deterministic, weird things will happen; the harness and child process(es) won't agree on what tests are at what index. Even if you use an exact and unique name filter instead of index to find the test to run, you can still have a test show up to the harness but not to the child process. If you're looking at the filesystem to generate the list of tests in get_tests, this is a trivial TOCTOU error.

(Off-topic, but I do hope that eventually we can use RPC to run multiple tests in a process until one fails and the process dies, spinning up a new child worker process to replace it. Namely, because this would allow for optional perfect stdout/stderr capture even if tests spawn threads or otherwise bypass the Rust-API-level thread local stdout/stderr hijack. Processes are expensive to create on Windows, or at least so I'm told. We discussed process-per-test as a way to get perfect stdout/err capture back when custom test harnesses were first getting implemented.)

You are right that if the list of dynamic tests is too dynamic, the abort harness will fail. But if you want to be strict about it, this is true in general. Even without the abort harness, there's a short time window between expanding the glob and running the test; if something deletes the test files during that window, the test will fail. And of course tests take a finite amount of time to run, so if a test file is deleted mid-run, again the test would fail.

So I think in general the rule should be that one shouldn’t modify the test data during the execution of the test program, regardless of whether one is using the abort harness. This seems like a very reasonable restriction.

You are right however that in addition to the above, we’d need to add a second one saying test generation should be deterministic in general. For example, if one is doing property-based tests where tests are generated at random, this should be done in a reproducible way by using a random seed controlled by command line options or environment variables or something along these lines.

This again is reasonable because you would want to be able to re-run the failing tests when fixing bugs. And if you don't care about reproducibility, it is enough that the generator produces the same tests in the same order for the abort harness to work.
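
The seed-controlled scheme could look like the sketch below. The `TEST_SEED` variable name is an assumption, and the tiny LCG (with Knuth's MMIX constants) just stands in for whatever property-testing RNG is actually used; the point is only that harness and -Cpanic=abort children derive the identical test list from the same seed:

```rust
// A minimal linear congruential generator, deterministic for a given seed.
struct Lcg(u64);

impl Lcg {
    fn next(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0
    }
}

// Generate `n` test-case names reproducibly from a seed: same seed,
// same names, same order - so parent and child processes agree.
fn generated_case_names(seed: u64, n: usize) -> Vec<String> {
    let mut rng = Lcg(seed);
    (0..n)
        .map(|i| format!("prop_case_{i}_input_{}", rng.next() % 1000))
        .collect()
}

fn main() {
    let seed = std::env::var("TEST_SEED")
        .ok()
        .and_then(|s| s.parse().ok())
        .unwrap_or(0);
    for name in generated_case_names(seed, 3) {
        println!("{name}");
    }
}
```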

I think this is therefore more of a documentation than an implementation issue.