Past, present, and future for Rust testing

jonhoo · December 8, 2017, 4:42am

Moved from https://users.rust-lang.org/t/past-present-and-future-for-rust-testing/14293.

Support for a custom testing harness has been a long-standing item on the Rust wishlist. Many a bug and PR has been closed or punted on with the reasoning that "this will be fixed by custom test harness support". This post tries to summarize the areas in which a custom test harness would be useful, the discussion so far, what features it would need to support, open questions, and some thoughts on how it may be implemented. It does so in the hope that this might help lay a foundation for further discussion, and clarify what will need to be resolved before continuing.

What is this, and why now?

See the IRC discussion from #rust-libs starting here (it's not very long). Long story short: hopefully this "summary" of the current state of affairs and proposals is useful for a meeting that may happen at the work week next week. And if not, it might still be useful as a starting point for improving libtest.

Why a custom test harness?

Configurable test output formats. This is probably the largest single category of complaints about the current standard test harness. Everything from how to format the output of assert! to supporting standardized machine-readable output formats like TAP, mozlog, and xUnit XML. Several users have also requested support for JSON output (#2234, #45923, #46450) for IDE integration.
Stabilizing benchmarking. The second most sought-after aspect of this is the eventual stabilization of built-in benchmarking (i.e., #[bench]). This has seen a lot of discussion (summarized below), but overall the wish is that benchmark "tests" in one form or another should be supported on stable.
Grouping tests. The test harness that ships with Rust by default considers all tests equal. Every function annotated with #[test] will be run as a test, no matter where it is, and independently of any surrounding code or tests. There is no support for setup/teardown code for tests (#18043), nor a notion of test "suites" that are related (and may have shared setup/teardown).
Finer control over test execution. Tests can currently be ignored (#[ignore]), allowed to fail (#[allow_fail]), or they can be skipped entirely by passing a test name filter to cargo test. While this suffices for simple use-cases, there are plenty of situations where better control over which tests run when, and how, would be desirable (e.g., #45765, #42684, #43155, #46408, and #46417).
Test generation. The #[test] annotation only allows a static set of tests that are defined in code. However, there are several cases where dynamic generation of tests would be welcome (e.g., for parameterized tests (1, 2), tests with multiple fixtures, or other kinds of dynamically generated tests).

There has been significant previous discussion on this topic, in terms of desired features, goals, obstacles, and implementation approach. This post represents a very condensed summary of those discussions. I encourage interested readers to go read the linked threads. At a high level, I'll refer to @alexcrichton's comment on cargo#2790:

Most of what we've been thinking has been along the lines of separating the concern of test frameworks and test harnesses. That is, a test framework is the syntax that you actually define a test. I believe the descriptor crate would fall in this category, but not the expector crate. Our thinking is that we don't actually deal with test frameworks at all for now and instead punt them to the eventual stability of plugins. The one interface for --test to the compiler would be the #[test] attribute. In other words, any test framework would in the end "compile down" to a bunch of no-argument functions with a #[test] attribute (like we have today).

On the other hand, though, the test harness is something we consider is responsible for running these functions. The test harness today for example runs them in parallel, captures output, etc. It should be the case that any test harness is capable of being plugged into any test framework, so we'd just need a standard interface between the two. We're also thinking that test harnesses encompass use cases like running in IDEs, producing XML/JUnit output, etc.

Relationship to benchmarks

Let me first point you to 1, 2, 3, 4, 5, and 6.

These threads are mostly about stabilizing #[bench] by pulling things out to separate crates. These have mostly been closed or punted on, but there seems to be general consensus that benchmarking will see progress with custom testing frameworks.

It is unclear what the full story is here --- do we want to treat test and bench as separate, or as part of the same problem (and thus should share a solution)? Personally, I think benchmarking and testing are sufficiently different that they shouldn't be handled through the same mechanism (i.e., #[test] and #[bench] should be handled by different runners). That said, as observed below, it may very well be that we want their output to be unified. This suggests that we may want two interfaces for tests/benchmarks: one for running, and one for formatting.

Amidst a discussion on stabilizing #[bench], @nikomatsakis observes:

Another aspect that hasn't been much discussed here is that I think we should stabilize a standard output format for writing the results. It ought to be JSON, not text. It ought to be setup to give more data in the future (e.g., it'd be great to be able to get individual benchmarking results). We've already got tools building on the existing text format, but this is a shifty foundation.

He continues by pointing out that the output of benchmarking is already used by other tools (like @BurntSushi's cargo-benchcmp), and it'd be neat to have a single format all of these tools could consume. This suggests that there should be at least some connection between testing and benchmarking (namely their output format).

On the topic of benchmarking, he also observes that:

I think part of this is that the current runner basically forces you to have closures that execute very quickly, which means I can't write benchmarks that process a large amount of data. This all seems eminently fixable by building on the existing APIs.

That same thread has pointers to JMH as the gold standard for microbenchmarks. We may want to draw some inspiration from that in designing a benchmarking interface.

@tomaka suggests that different benchmark types should just be different annotations:

Since procedural macros are finally slowly getting stable, alternative test harnesses or benchmarkers should probably simply be procedural macros.

In other words if you want to use the default bencher you use #[bench], and if you want to use a third party library you add a dependency to awesome_library and you use #[awesome_library_bench].

@alkis lists a number of further requests for benchmarking here.

Implementation thoughts

Separating runners and formatters

PR#45923 and PR#46450 suggest providing JSON test output as a stream of test events. Specifically:

{ "type": "suite", "event": "started", "test_count": "1" }
{ "type": "test", "event": "started", "name": "f" }
{ "type": "test", "event": "failed", "name": "f" }
{ "type": "suite", "event": "failed", "passed": 0, "failed": 1, "allowed_fail": 0, "ignored": 0,  "measured": 0, "filtered_out": "0" }
{ "type": "test_output", "name": "f", "output": "thread 'f' panicked at 'assertion failed: `(left == right)`
  left: `3`,
 right: `4`', f.rs:3:1
note: Run with `RUST_BACKTRACE=1` for a backtrace.
" }

Leaving the fact that it's JSON aside for a second, this kind of event streaming seems like a solid foundation on which to build both test runners (emit events as they occur), and output formatters (decide how you want to show each event/set of events). This could also be extended to benchmarks:

{ "type": "benchmark", "event": "started", "name": "b" }
{ "type": "benchmark", "event": "finished", "name": "b", "runtime": 300, "iterations": 100 }

By introducing this abstraction, test runners and output formatters can be nicely decoupled. This also addresses a concern that was raised in RFC1284: a user may want to change testing output at runtime with flags (for example based on whether the consumer is a human or a machine). This kind of design would let a crate pick its test runner (perhaps because it needs a particular feature), while the user can independently pick the formatter. @tomjakubowski suggests something like --format=tap. The thread also has more requests for pluggable output formats further down. This design also allows features that might otherwise be tricky, such as exposing test results in a streaming fashion.
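As a rough sketch of how this decoupling might look in Rust, the same event stream can be handed to two different formatters. The `Event` enum, the `Formatter` trait, and the `Pretty` and `Tap` types are all assumptions made up for illustration; none of them is an existing or proposed libtest API.

```rust
use std::fmt::Write;

/// Events a runner might emit; the variants mirror the JSON lines above.
/// (Hypothetical type, not part of libtest.)
#[derive(Debug, Clone, PartialEq)]
enum Event {
    SuiteStarted { test_count: usize },
    TestStarted { name: String },
    TestFinished { name: String, passed: bool },
    SuiteFinished { passed: usize, failed: usize },
}

/// Hypothetical formatter interface: one callback per event.
trait Formatter {
    fn on_event(&mut self, event: &Event, out: &mut String);
}

/// Human-oriented output, loosely modelled on today's text output.
struct Pretty;

impl Formatter for Pretty {
    fn on_event(&mut self, event: &Event, out: &mut String) {
        match event {
            Event::SuiteStarted { test_count } => {
                writeln!(out, "running {} tests", test_count).unwrap()
            }
            Event::TestStarted { .. } => {}
            Event::TestFinished { name, passed } => {
                let status = if *passed { "ok" } else { "FAILED" };
                writeln!(out, "test {} ... {}", name, status).unwrap()
            }
            Event::SuiteFinished { passed, failed } => {
                writeln!(out, "{} passed; {} failed", passed, failed).unwrap()
            }
        }
    }
}

/// TAP-style output: a plan line, then one numbered result per test.
struct Tap {
    next: usize,
}

impl Formatter for Tap {
    fn on_event(&mut self, event: &Event, out: &mut String) {
        match event {
            Event::SuiteStarted { test_count } => writeln!(out, "1..{}", test_count).unwrap(),
            Event::TestFinished { name, passed } => {
                self.next += 1;
                let status = if *passed { "ok" } else { "not ok" };
                writeln!(out, "{} {} - {}", status, self.next, name).unwrap()
            }
            _ => {}
        }
    }
}

/// Feeds one event stream to whichever formatter the user picked.
fn render(events: &[Event], formatter: &mut dyn Formatter) -> String {
    let mut out = String::new();
    for event in events {
        formatter.on_event(event, &mut out);
    }
    out
}
```

The runner only ever produces `Event` values, so swapping `Pretty` for `Tap` needs no change to the crate under test.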

Choosing a test runner

Crates may want to pick a specific test runner because they provide a particular feature (like test setup/teardown), or perhaps because they integrate deeply with some other external system. This process should be relatively painless. To quote @alexcrichton:

my hope is that you just drop a [dev-dependency] in your Cargo.toml and that's almost all you need to do

This has much the same flavor as changing the global, default allocator (RFC1974, #27389), and we may be able to borrow the solution from there. For example, one could imagine an interface like:

#[test_runner]
static RUNNER: MyTestRunner = MyTestRunner;

use std::test::TestRunner;

struct MyTestRunner;
impl TestRunner for MyTestRunner {
    // ...
}

API for users

From https://internals.rust-lang.org/t/pre-rfc-stabilize-bench-bencher-and-black-box/4565/19:

Last time Alex and I worked through a custom test framework design, we landed on punting test definitions entirely to procedural macros, likely compiling down to today's #[test] fns. For benchmarks there would possibly need to be some extensions to the test crate's APIs since it itself is responsible for running today's benchmarks and isn't extensible to other approaches.

@brson

This seems pretty reasonable to me. In fact, using just #[test] as it exists today is probably sufficient. Additional ways of specifying tests (e.g., stainless) can use macros to produce #[test] annotated functions, but this does not require special support in libtest.

One addition that might be nice for supporting fixtures and parameterized tests is to introduce an annotation like #[test_generator], which is a function that dynamically generates tests to run. Something like:

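#[test_generator]
fn test_many() -> impl Iterator<Item = (String, Box<dyn FnOnce()>)> {
    // Yield a (name, test body) pair so each item matches the declared type.
    (0..10).map(|i| (format!("test_one_{}", i), Box::new(move || test_one(i)) as Box<dyn FnOnce()>))
}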

Currently, the calls to test_one would need to be made directly from test_many, but this both forces those tests to be serialized, and causes errors in one generated test to also fail all the others. With a test generator-like construct, these could be run independently by the runner.
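To make the isolation point concrete, here is a minimal sketch of what a runner could do with generated tests on today's stable Rust. `test_many` stands in for the hypothetical `#[test_generator]` function, and `test_one` and `run_generated` are names invented for this example. The sketch runs the tests one after another; a real runner could equally hand each closure to its own thread.

```rust
use std::panic;

/// A generated test: a name plus a boxed body.
type GeneratedTest = (String, Box<dyn FnOnce()>);

/// Hypothetical stand-in for the `test_one` helper; it rejects one input
/// so that exactly one generated test fails.
fn test_one(i: u32) {
    assert!(i != 3, "input {} is rejected", i);
}

/// Plays the role of the hypothetical `#[test_generator]` function.
fn test_many() -> impl Iterator<Item = GeneratedTest> {
    (0..5u32).map(|i| {
        let body: Box<dyn FnOnce()> = Box::new(move || test_one(i));
        (format!("test_one_{}", i), body)
    })
}

/// Runs every generated test in isolation and returns (name, passed) pairs.
/// A panic in one body is caught, so the remaining tests still run.
fn run_generated(tests: impl Iterator<Item = GeneratedTest>) -> Vec<(String, bool)> {
    tests
        .map(|(name, body)| {
            let passed = panic::catch_unwind(panic::AssertUnwindSafe(body)).is_ok();
            (name, passed)
        })
        .collect()
}
```

With this shape, the failure of `test_one_3` is reported on its own instead of aborting the loop that would otherwise live inside a single `#[test]` function.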

Another feature that is often requested is support for test suites. Here, I'd like to echo @brson from this comment:

As to some of the requirements you mention, nesting of tests we see as being done through the existing module system. Rust tests already form a hierarchy through their module structure and naming, they just aren't represented or presented that way in the test runner. The way we've been envisioning this nesting working is as a feature of the test runner (or harness): they would take the flat list of test names and turn it into a tree based on their names, which already reflect the module hierarchy.

To support arbitrary test names I might suggest further test attributes like:
 #[test_name = "some spec"]
 mod {
     #[test]
     #[test_name = "some spec"]
     fn mytest() { }
 }
If we end up accumulating a lot of new requirements for features that require communication between the frontend test frameworks and the backend test runner, and therefore a lot of new annotations to support them, then this plan could start looking pretty messy. But I'm still hopeful that most test framework features can be expressed either as syntactic sugar over a simple test definition that we mostly already have; or as features of the program actually executing the tests; without a whole lot of new information necessary to communicate between the two.
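The "flat list of names turned into a tree" idea from the quote can be sketched in a few lines. The `Node` type and both functions are assumptions for illustration; the only input is the list of `module::path::test_name` strings that the harness already has.

```rust
use std::collections::BTreeMap;

/// One node of the test tree; inner nodes are modules, leaves are tests.
#[derive(Debug, Default, PartialEq)]
struct Node {
    children: BTreeMap<String, Node>,
}

/// Builds a tree from flat `module::path::test_name` strings.
fn build_tree(names: &[&str]) -> Node {
    let mut root = Node::default();
    for name in names {
        let mut node = &mut root;
        // Walk (and create) one level per path segment.
        for segment in name.split("::") {
            node = node.children.entry(segment.to_string()).or_default();
        }
    }
    root
}

/// Renders the tree with two spaces of indentation per level.
fn render(node: &Node, depth: usize, out: &mut String) {
    for (name, child) in &node.children {
        out.push_str(&"  ".repeat(depth));
        out.push_str(name);
        out.push('\n');
        render(child, depth + 1, out);
    }
}
```

Because the tree is derived purely from names, a runner could offer suite-style grouping without any new annotation.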

API for the runner

This is where a lot of the trickiness (and bikeshedding) will arise. Some questions:

How does the runner learn about command-line arguments?
How are tests and suites communicated to the runner? An iterator? Consecutive method invocations?
How does the runner asynchronously report test results? Futures? Channels? Some abstracted form of Writer?
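To give the last question something concrete to argue about, here is one possible shape using a standard-library channel. It is only a sketch: `Outcome`, `TestFn`, `run_all`, and the two sample tests are invented for this example, and a real design would also need to carry captured output, timing, and so on.

```rust
use std::panic;
use std::sync::mpsc;
use std::thread;

/// Result of a single test (hypothetical type).
#[derive(Debug, PartialEq)]
enum Outcome {
    Passed,
    Failed,
}

/// Tests are plain no-argument functions, as with today's #[test].
type TestFn = fn();

/// Spawns one thread per test and returns the receiving end of a channel
/// on which results arrive as soon as each test finishes.
fn run_all(tests: Vec<(&'static str, TestFn)>) -> mpsc::Receiver<(&'static str, Outcome)> {
    let (tx, rx) = mpsc::channel();
    for (name, test) in tests {
        let tx = tx.clone();
        thread::spawn(move || {
            let outcome = match panic::catch_unwind(test) {
                Ok(()) => Outcome::Passed,
                Err(_) => Outcome::Failed,
            };
            // The receiver may already be gone; ignore that case.
            let _ = tx.send((name, outcome));
        });
    }
    // The original sender is dropped here, so the receiver's iterator ends
    // once every test thread has reported.
    rx
}

/// Sample tests used to exercise the sketch.
fn sample_pass() {}

fn sample_fail() {
    panic!("boom");
}
```

A formatter could consume the receiver with `for (name, outcome) in rx` and print each result as it arrives, which is the streaming behaviour discussed earlier.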

API for the formatter

The interface here is likely much more straightforward. Probably all that is needed is a number of methods that are called when an event occurs (e.g., suite starts, test starts, test finishes, etc.), and it's up to the formatter to eventually write its formatted results to some designated output. Some questions:

Does the formatter always write to STDOUT? If not, how is it informed about the correct output? A Write perhaps?
How does the test runner actually use a formatter? If the user puts tap-format in their [dev-dependencies] and gives --format=tap, how does the test binary end up feeding events from the test runner to the formatter? Does it recompile to link against tap-format specifically, or does it always link against all formatters specified in [dev-dependencies] and dynamically switch between them depending on the provided --format flag?

Open questions:

@brson points out that we'd like to support other types of tests from cargo in the same framework:
We'll probably want support for setup/teardown for doctests.
How do we distinguish between tests with the same name (potentially in different suites that run in parallel)? Including filename+line would help with this, but we probably also don't want to include that with every event.
If the test runner dictates what command-line arguments are available, and how they're interpreted (e.g., regex test name filtering), users may be surprised when they move between crates and cargo test behaves differently. Do we need a third layer that chooses which tests to run? Should certain filters/flags be supported across all runners?
Do we want additional annotations, such as whether some tests must run serially (1, 2), to be a feature of the test runner or a feature of libtest? If the former, how are annotations communicated to the runner? If the latter, how is this enforced when we don't have control over how tests are run? From the linked threads:

Discussed at the dev-tools meeting today, this problem can be solved today using a mutex (I filed #43155 to document that). Given that this solution feels somewhat ad hoc and there is a workaround, we would prefer to punt on adding this in favour of custom test runners or some other long-term solution.

@nrc
How are command-line arguments to libtest/the test runner/the formatter handled and distributed? This comment, and the following discussion, brings up a good point about flags and options to the test runner through cargo. What should, e.g., cargo test --verbose do with a custom testing framework? It's likely that we'd want cargo to forward nearly all options and flags (especially unknown ones) to the test runner (currently realized by including them after --).
@ekiwi brings up some interesting points about testing on embedded platforms:
- the test harness should be able to be made up of at least two different binaries: one that runs on the host pc and another one that runs on the target
- there needs to be a mechanism to be able to compile only a certain number of tests, so that we can make sure that they still fit on the target
It's unclear that we actually want to solve this with this new custom test framework design, but the resulting utest-rs crate may be worth checking out.
@jsgf asks for the ability to "generate a list of all the tests/benchmarks without actually running them (machine parseable)". How might we do this? Perhaps by having a "dummy" runner that just emits each test it is given without running it?
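For reference, the mutex workaround that @nrc mentions in the quote above can be sketched as follows. The names `SERIAL`, `ACTIVE`, `serial`, and `touches_global_state` are invented for this example, and the `const` form of `Mutex::new` assumes a reasonably recent stable compiler.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Mutex, MutexGuard};
use std::time::Duration;

/// Global lock shared by every test that touches process-wide state.
static SERIAL: Mutex<()> = Mutex::new(());

/// Counts how many "serial" tests are inside their critical section.
static ACTIVE: AtomicUsize = AtomicUsize::new(0);

/// Takes the global lock for the rest of the calling test.
fn serial() -> MutexGuard<'static, ()> {
    // A test that panicked while holding the lock poisons it. The guarded
    // data is `()`, so it is safe to recover and carry on.
    SERIAL.lock().unwrap_or_else(|poisoned| poisoned.into_inner())
}

/// Body of a test that must not overlap with other such tests. It returns
/// how many serial tests were active while it ran, which should always be 1.
fn touches_global_state() -> usize {
    let _guard = serial();
    let concurrent = ACTIVE.fetch_add(1, Ordering::SeqCst) + 1;
    std::thread::sleep(Duration::from_millis(5));
    ACTIVE.fetch_sub(1, Ordering::SeqCst);
    concurrent
}
```

This works under the default parallel runner, but every affected test has to remember to call `serial()`, which is the ad hoc feel the quote refers to.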

Other threads:

Issues · rust-lang/cargo · GitHub Add option for printing test output summary.
PR#44813 and issue #43381:

The testing framework should have a logging/reporting interface and the teamcity service messages could be one such listener (the console and crate cargo-test-junit being other implementations of test listeners).

@gilescope
crates.io: Rust Package Registry Pulls out all of the rust testing, including benchmarking.

Passes --extern test=…/target/…/libtest-….rlib to rustc so that extern crate test uses that crate instead of the standard library one. Since that crate doesn’t use #[unstable], this works fine on stable.

It's not clear that this solves anything, it just makes the same testing code that currently exists in std externally accessible. But it is cool that it works.
PR#46417 and issue #46408: Test name filters do not allow for disambiguating tests with overlapping names. More generally, the current test name filtering is extremely limited, and would be something that a custom test runner would need to be able to augment.
meeting-minutes/weekly-meetings/2015-03-24.md at 64c85df3b86d2c19183257892294d4ad80cdd0a8 · rust-lang/meeting-minutes · GitHub Some short internal discussion about libtest and benchmarks:
Issues · rust-lang/rust · GitHub Request for support of "run test near this line" for editors.

/cc @dtolnay @alexcrichton @nrc @nikomatsakis @llogiq @QuietMisdreavus @steveklabnik

jonhoo · December 8, 2017, 4:43am

This post was moved from https://users.rust-lang.org/t/past-present-and-future-for-rust-testing/14293. See initial discussion there.

jonhoo · December 8, 2017, 4:48am

Reply to @nrc (Past, present, and future for Rust testing, #4 by nrc):

I think if it is easy enough to install a test runner using rustup, then that would be OK

Yes, I completely agree with this! If the default test runner can easily be bundled with rustup then that's a much better solution. It would also allow the test runner to be maintained externally, which makes it less likely to stagnate.

my feeling is that at some point a test runner is going to want an annotation that someone else isn’t going to want

Yup, I think you're right about that. This is why I think it'd be reasonable to come up with some small set of annotations that we expect every runner to support (possibly just #[test], but I think setup/teardown will be common enough that it's worth picking/supporting it), and then allow test runners to manage additional annotations.

I don’t think we should throw it away or anything. I share the goal of being able to do benchmarking on stable via a ‘built-in feeling’ mechanism. I just think the way to do that is to stabilise an API and provide an external tool to use it.

Yeah, that seems reasonable. I think a benchmark runner could fall in a similar category as the test runner, in that it could be maintained externally and distributed with rustup.

euclio · December 8, 2017, 5:11am

Great summary! Thanks for the writeup. Perhaps slightly orthogonal to this discussion, but I’d also like to see the collection of test coverage get some love. There are tools like cargo-travis and this tutorial but it’s still a pain to get reliable coverage for all crates in a workspace.

matklad · December 8, 2017, 8:53am

Absolutely great write up, thanks @jonhoo !

I have two points to make

I think setup/teardown will be common enough that it’s worth picking/supporting it

I am a huge fan of the py.test framework, which uses a better approach for test fixtures than JUnit-style setup/teardown. I do not want to argue that py.test is the best (though it is), but I want to suggest that setup/teardown might not be the ideal approach to fixtures, especially in a language with RAII. While it's undeniably true that setup/teardown will be common, because everyone is used to them, I would not be happy if Rust bakes them in.
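To illustrate the RAII point, here is a small sketch of a fixture whose teardown is simply its `Drop` implementation. `ScratchDir` and `writes_a_file` are made-up names for this example; in a real crate the test function would carry `#[test]` and return nothing.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// A fixture that owns a scratch directory. Creating it is the "setup",
/// dropping it is the "teardown".
struct ScratchDir {
    path: PathBuf,
}

impl ScratchDir {
    fn new(name: &str) -> std::io::Result<ScratchDir> {
        // Include the process id so parallel test binaries do not collide.
        let path = std::env::temp_dir().join(format!("{}-{}", name, std::process::id()));
        fs::create_dir_all(&path)?;
        Ok(ScratchDir { path })
    }

    fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for ScratchDir {
    fn drop(&mut self) {
        // Runs on normal return and while unwinding from a failed assertion.
        let _ = fs::remove_dir_all(&self.path);
    }
}

/// Test body. It returns the directory path only so the cleanup can be
/// observed from outside in this sketch.
fn writes_a_file() -> PathBuf {
    let dir = ScratchDir::new("raii-fixture-demo").expect("create scratch dir");
    let file = dir.path().join("data.txt");
    fs::write(&file, "hello").expect("write file");
    assert_eq!(fs::read_to_string(&file).expect("read file"), "hello");
    dir.path().to_path_buf()
}
```

Each test constructs exactly the fixtures it needs, and no separate teardown hook has to be registered with the runner.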

PR#45923 and PR#46450 suggest providing JSON test output as a stream of test events. Specifically:

I have high hopes for those PRs. I think it is important to have a single standard format for test events, such that all test runners support it (or, at least, have an incentive to support it): it won't be super convenient to ask users to add something to their Cargo.toml just to make their IDE work.

SimonSapin · December 8, 2017, 10:27am

Unlike nrc (in https://users.rust-lang.org/t/past-present-and-future-for-rust-testing/14293/4) I think procedural macros are the way to go. Rust does not have runtime introspection, so collecting for example functions with a #[test] attribute necessarily involves generating code.

Macros 2.0 already enables defining new attributes for per-item processing. #[test] (or #[foolib_test]) could be such an attribute. What’s missing is some way to find all tests in a crate, and generate code in fn main() {…} based on that. To do that, there could be:

Some way in a proc macro to traverse all items in a crate. This may not be a good fit for a TokenStream-level API where the tokens mod foo; don’t have the semantics associated to finding the module’s source in some other file.
Make proc macro attributes like #[foolib_test] have side effects to “register” tests in some kind of crate-global registry. This would require some way to declare intra-crate dependencies so that proc macros using that registry are executed after those adding to it.

This crate-global registry mechanism for proc macros could be general-purpose, not limited to test harnesses. For example, Servo/Stylo have a DSL for defining CSS properties, and generate code based on that. At the moment this works by writing a .rs file in a build script, but proc macros would provide a better debugging experience (correct source span/location in compiler error messages).

oli-obk · December 8, 2017, 10:34am

What I’m missing are compilation tests. If you have a library that makes a safe abstraction, you will want to test that using it wrongly actually produces an error.

While we have the compiletest crate, it doesn’t integrate very nicely.

Another issue is changing compiler output. Changing rules around lifetimes might still error, but with a different message.

matklad · December 8, 2017, 10:47am

We do have compiletest on stable though: https://github.com/rust-lang/rust/pull/43949

SimonSapin · December 8, 2017, 10:51am

https://doc.rust-lang.org/nightly/rustdoc/documentation-tests.html#attributes has an example of a compile_fail doctest.

epage · December 8, 2017, 7:06pm

I want to reaffirm this. py.test's use of reusable fixtures and setup/teardown using dependency injection is amazing.

In pytest, you define what fixtures you need by the parameter names of your function. A fixture can be a simple function or it can be Python's equivalent of RAII which is how you handle setup/tearDown. Each fixture can also accept fixtures.

Benefits

Decouple fixture initialization from test organization unlike suite-wide setup/teardown
Easy composition of fixtures
While custom command line flags aren't automatically fixtures, it's trivial to convert them to fixtures
Fixtures can determine if a dependent test should be skipped (for example, based on missing command line flags needed to initialize for hardware in your system)

Not sure the best way, if any, this should adapt to Rust but pytest is one of the best testing systems I've seen.

johnthagen · December 8, 2017, 8:19pm

We should also look to the fabulous Catch C++ framework for some inspiration. It's a joy to use, doesn't have some of the boilerplate of a setup/teardown approach, and it's designed for a systems programming language with RAII.

Here's its docs on SECTIONs.

Most test frameworks have a class-based fixture mechanism. That is, test cases map to methods on a class and common setup and teardown can be performed in setup() and teardown() methods (or constructor/ destructor in languages, like C++, that support deterministic destruction).

While Catch fully supports this way of working there are a few problems with the approach. In particular the way your code must be split up, and the blunt granularity of it, may cause problems. You can only have one setup/ teardown pair across a set of methods, but sometimes you want slightly different setup in each method, or you may even want several levels of setup (a concept which we will clarify later on in this tutorial).

Catch takes a different approach (to both NUnit and xUnit) that is a more natural fit for C++ and the C family of languages.

TEST_CASE( "vectors can be sized and resized", "[vector]" ) {

    std::vector<int> v( 5 );
    
    REQUIRE( v.size() == 5 );
    REQUIRE( v.capacity() >= 5 );
    
    SECTION( "resizing bigger changes size and capacity" ) {
        v.resize( 10 );
        
        REQUIRE( v.size() == 10 );
        REQUIRE( v.capacity() >= 10 );
    }
    SECTION( "resizing smaller changes size but not capacity" ) {
        v.resize( 0 );
        
        REQUIRE( v.size() == 0 );
        REQUIRE( v.capacity() >= 5 );
    }
    SECTION( "reserving bigger changes capacity but not size" ) {
        v.reserve( 10 );
        
        REQUIRE( v.size() == 5 );
        REQUIRE( v.capacity() >= 10 );
    }
    SECTION( "reserving smaller does not change size or capacity" ) {
        v.reserve( 0 );
        
        REQUIRE( v.size() == 5 );
        REQUIRE( v.capacity() >= 5 );
    }
}

notriddle · December 8, 2017, 9:01pm

So… like this?

#[test_suite]
fn vec_resize() {
    // Mirror `std::vector<int> v( 5 )`: five elements, not just capacity for five.
    let mut v = vec![0; 5];
    assert!(v.len() == 5);
    assert!(v.capacity() >= 5);
    test!("resizing bigger changes size and capacity", {
        v.resize(10, 0);
        assert!(v.len() == 10);
        assert!(v.capacity() >= 10);
    });
    test!("resizing smaller changes size but not capacity", {
        v.resize(0, 0);
        assert!(v.len() == 0);
        assert!(v.capacity() >= 5);
    });
    // * notriddle elides the rest of the Vec test suite.
}

That does seem very nice and usable, and surprisingly Rustic. More importantly, it can be desugared to N functions, one for each test! declaration (which is what I’d want anyway, because that’s friendlier to borrowck).
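For illustration, the desugared form might look roughly like this: the shared prefix becomes a setup function that is re-run for every section, and each section becomes its own function. The function names are invented, and in a real crate each section function would carry `#[test]`.

```rust
/// Shared prefix of the suite. It runs afresh for every section, as in Catch.
fn setup() -> Vec<i32> {
    let v = vec![0; 5];
    assert_eq!(v.len(), 5);
    assert!(v.capacity() >= 5);
    v
}

/// Desugared from the "resizing bigger" section.
fn vec_resize_bigger() {
    let mut v = setup();
    v.resize(10, 0);
    assert_eq!(v.len(), 10);
    assert!(v.capacity() >= 10);
}

/// Desugared from the "resizing smaller" section.
fn vec_resize_smaller() {
    let mut v = setup();
    v.resize(0, 0);
    assert_eq!(v.len(), 0);
    assert!(v.capacity() >= 5);
}
```

Since every section owns its own `v`, the borrow checker never sees two sections mutating the same vector.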

kornel · December 8, 2017, 9:28pm

Please make test output format possible to change without any changes to the source code.

I would like human-readable output when I test from CLI, but also machine-readable outputs for integrations with CI servers and IDEs. If that depends on annotations in the source code, or changes to Cargo.toml, it’ll be hard to work with.

jsgf · December 8, 2017, 10:18pm

One area I’d like to address is controlling execution, esp for tests which affect global state. They’re relatively rare in pure Rust, but we’ve been hitting it a lot in Rust/C++ integration testing.

In particular we’d like control of:

only ever invoke one test at a time per process instance
completely serialized execution
run each test in its own sub-process (either parallel or sequentially)

Ideally the correct test execution mode could be derived from constraints on each test. For example

#[test(process_isolated)]
fn my_test() {
// change global vars without affecting anything else
}

#[test(singleton)]
fn other_test() {
// we know we're the only test running in this process instance
}

#[test(serialized = 1)]
fn in_sequence() {
// run one at a time in a specific sequence
}

In this case, singleton tests would only run when specifically requested on the command line, otherwise everything else gets run as now - except serialized tests are always run exclusively to other things.

In general I wouldn’t expect this to be used too much, and using it means something strange is going on. But we definitely have C++ libraries with global state which work very badly with the default runner.

Also these libraries need to be initialized from the main thread, before any other threads are started up, so we also need an init phase:

#[test(init)]
fn init_foo() {
// Not a test, but set up some state from main thread. Panicking is not a test failure, but
// does prevent any test from producing results
}

I’ve been thinking about doing a library which simply allows a normal executable to emulate the standard framework (ie parses the same command-line options, generates the same output to terminals and files), which would allow us to write normal executables as compatible test binaries. That would give us maximum flexibility but would lose all the convenience of the current generated frameworks. But maybe we can recover all that with proc macros?

quodlibetor · December 8, 2017, 11:37pm

I strongly agree with this. py.test "fixtures" are basically dependency injection managed by the py.test runner. They're very similar to Rocket's request guards, except for how they're defined.

A direct translation of pytest's fixtures might look like:

#[cfg(test)]
mod test {
    #[test_fixture]
    fn simple_data() -> String {
        "A fixture that returns simple data and is called each time it's requested".into()
    }

    #[test_fixture]
    fn complex_data(simple_data: String) -> String {
        simple_data.to_kebab_case()
    }

    // This uses the names of functions that the test_fixture attribute was applied to
    #[test]
    fn uses_data(simple_data: String, complex_data: String) {
        assert!(simple_data.starts_with(&"A fixture"));
        // you would never compare fixtures to each other in a test, but for our purposes
        assert_eq!(simple_data.to_kebab_case(), complex_data);
    }

    // pytest has a built-in `request` fixture that allows you to do things after tests finish,
    // this is how you would do a setup/teardown pair.
    //
    // autouse means the same thing as a #[begin] would mean in the current
    // proposal
    #[test_fixture(autouse="true")]
    fn begin(request: ::std::test::TestRequest) {
        setup_database(env::get("DB_URL"));
        request.register_finalizer(|| {
            reset_database(env::get("DB_URL"))
        })
    }
}

A more rusty solution might look more like Rocket's request guards (based on the types of arguments rather than their names), but it'd probably be nice to have something like a test_fixture macro to reduce boilerplate even in that case.

The big advantage of pytest fixtures, in my experience:

they encourage clarity in test design where each test states what it requires
they're efficient: you only end up actually doing the required work to run each test, without requiring crazy refactoring to make things more efficient.
DI is just super nice
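As a rough idea of the type-based variant mentioned above, here is a sketch that works on stable Rust without any macros. The `Fixture` trait, `run_with`, and the two data types are assumptions for this example; a macro or runner would presumably generate the `run_with` call for each test and arity.

```rust
/// Hypothetical trait: a type that knows how to build itself for a test.
trait Fixture: Sized {
    fn setup() -> Self;
}

struct SimpleData(String);
struct ComplexData(String);

impl Fixture for SimpleData {
    fn setup() -> Self {
        SimpleData("a simple fixture".to_string())
    }
}

impl Fixture for ComplexData {
    fn setup() -> Self {
        // Fixtures can depend on other fixtures.
        let SimpleData(s) = SimpleData::setup();
        ComplexData(s.replace(' ', "-"))
    }
}

/// Injects two fixtures, selected purely by the argument types of `test`.
fn run_with<A: Fixture, B: Fixture>(test: impl FnOnce(A, B)) {
    test(A::setup(), B::setup());
}

/// The test states what it needs in its signature and nothing more.
fn uses_data(simple: SimpleData, complex: ComplexData) {
    assert!(simple.0.starts_with("a simple"));
    assert_eq!(simple.0.replace(' ', "-"), complex.0);
}
```

Calling `run_with(uses_data)` lets type inference pick the fixtures, so no fixture names need to be matched as in pytest.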

Centril · December 9, 2017, 1:42am

Nice work summarizing things.

There seems to be a heavy focus here on unit testing, and while that is nice, discussion regarding the needs of property based testing (PBT) libraries seems lacking. Currently, the two larger libraries providing PBT functionality are the crates proptest and quickcheck. These may be helped by having a notion of “number of tests passed” (which is different from number of #[test]s passed).
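To show what "number of tests passed" means here, consider this toy property checker. It is only a sketch: `Lcg`, `PropertyReport`, and `check_property` are invented for this example and are not how proptest or quickcheck work internally.

```rust
/// Tiny deterministic generator so the sketch needs no external crates.
struct Lcg(u64);

impl Lcg {
    fn next_u32(&mut self) -> u32 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 32) as u32
    }
}

/// What a single property would like to report to the runner: how many
/// generated cases passed, and the first failing input if there was one.
#[derive(Debug, PartialEq)]
struct PropertyReport {
    cases_passed: u32,
    counterexample: Option<u32>,
}

/// Checks `property` against `cases` generated inputs, stopping at the
/// first failure.
fn check_property(cases: u32, property: impl Fn(u32) -> bool) -> PropertyReport {
    let mut rng = Lcg(42);
    for passed in 0..cases {
        let input = rng.next_u32();
        if !property(input) {
            return PropertyReport {
                cases_passed: passed,
                counterexample: Some(input),
            };
        }
    }
    PropertyReport {
        cases_passed: cases,
        counterexample: None,
    }
}
```

A single `#[test]` wrapping `check_property` can only report pass or fail today; an event-based runner could carry `cases_passed` and the counterexample through to the formatter.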

I want Rust to have excellent support for PBT surpassing even Haskell and Erlang in the future.

quodlibetor · December 9, 2017, 10:34pm

I realized my previous comment could be read as me not wanting any progress unless it looks like py.test. Mostly I was hoping to demonstrate (without requiring folks to read through some Python libs’ docs) that there are reasonable alternatives to the xUnit standard. (And maybe to inspire some enterprising developer to port pytest to rust to see if it’s possible.)

That said, something maybe more useful to the discussion:

I would prefer if there weren’t any required attributes (no test, nor setup/teardown)
That would mean test-lib authors would be free to write their own collectors (e.g. test attribute macros) and runners. Which would also require that the built-in proc macro system was powerful enough to do that.
Given that, it would be fine for the current libtest to grow more opinionated (xUnit, Catch C++, pytest, whatever)
To bootstrap an ecosystem it might be nice for the libtest test runner to depend on itself as a library, so that any default collection strategies (test, begin/end) can be picked up and enhanced by other test libs.
I really like the direction the original summary went in: clearly defined inputs and outputs, plus a way to define your test runner as a dev-dependency, combined with proc macros that can keep state across calls, seem like all you really need.
Again, to reiterate, I would love for someone to port py.test to rust, but that isn’t intended as stop energy towards this proposal.

I’m extremely excited to see progress in this area, thanks for all the work!

gnzlbg · December 12, 2017, 7:41am

Nice write up!

One question: by setup/teardown do you also mean process setup/teardown? It is not 100% clear to me whether this is what you mean.

One thought: a lot of people seem to only want to customize the output format, so maybe we can have one API for plugging in different test or bench runners, all of which produce JSON, and a separate API for plugging in a formatter (so the JSON formatter does nothing, but a human-readable formatter transforms the JSON into the output we have today). That way, those who want, say, a specific XML output for some CI system, or maybe even a different type of JSON for an IDE, only have to plug in a relatively small component, and that component would work with all test runners that users might want to use.

This might be overengineering, but it at least makes sense to me that “how to run tests” and “how to display test results” are two orthogonal problems.
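
A rough sketch of that split, assuming the runner emits structured events and a formatter is a small separate component. TestEvent, Formatter, Human, JsonLines and run_all are hypothetical names, not an existing libtest API, and the JSON shape is invented for illustration.

```rust
/// Events a runner might emit; the names are illustrative only.
#[derive(Debug)]
enum TestEvent {
    Started { name: String },
    Finished { name: String, passed: bool },
}

/// A formatter only turns events into text and knows nothing about
/// how tests are run.
trait Formatter {
    fn format(&self, event: &TestEvent) -> String;
}

struct Human;
struct JsonLines;

impl Formatter for Human {
    fn format(&self, event: &TestEvent) -> String {
        match event {
            TestEvent::Started { name } => format!("test {} ...", name),
            TestEvent::Finished { name, passed } => {
                format!("test {} ... {}", name, if *passed { "ok" } else { "FAILED" })
            }
        }
    }
}

impl Formatter for JsonLines {
    fn format(&self, event: &TestEvent) -> String {
        // Names are assumed not to need JSON escaping in this sketch.
        match event {
            TestEvent::Started { name } => {
                format!(r#"{{"event":"started","name":"{}"}}"#, name)
            }
            TestEvent::Finished { name, passed } => {
                format!(r#"{{"event":"finished","name":"{}","passed":{}}}"#, name, passed)
            }
        }
    }
}

// Two stand-in tests that report success as a bool.
fn always_passes() -> bool { true }
fn always_fails() -> bool { false }

/// A runner produces events; any formatter can consume them.
fn run_all(tests: &[(&str, fn() -> bool)], out: &dyn Formatter) -> Vec<String> {
    let mut lines = Vec::new();
    for (name, test) in tests {
        lines.push(out.format(&TestEvent::Started { name: name.to_string() }));
        let passed = test();
        lines.push(out.format(&TestEvent::Finished { name: name.to_string(), passed }));
    }
    lines
}
```

Swapping &Human for &JsonLines changes only the output text; run_all and the tests stay untouched, which is the orthogonality argued for above.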

matthieum · December 14, 2017, 5:43pm

@matklad It seems to me that the py.test “features” could be used not only for DI, but also to inject different values (for parameterization) or am I mistaken?

Is it possible in py.test to get something like:

 #[test_fixture]
 fn make_length(index: usize) -> usize {
     1usize << index
 }

 #[test(make_length(0..4))]
 fn test_lengths(make_length: usize) { ... }

And have test_lengths be invoked with multiple lengths?

Could it possibly be adapted to type parameterization?

 #[test_types]
 type Unsigned = (u8, u16, u32, u64, u128);

 #[test_fixture]
 fn make_type<T: Default>() -> T { T::default() }

 #[test(make_type(Unsigned))]
 fn test_type<T>(t: T) { ... }
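
Independent of what py.test offers, both kinds of parameterization can be approximated on today's Rust with ordinary functions and a declarative macro. The sketch below is only a guess at what such attributes might desugar to; for_each_unsigned!, check_lengths and default_is_zero are hypothetical names, not an existing API.

```rust
/// Mirrors the make_length fixture from the snippet above.
fn make_length(index: usize) -> usize {
    1usize << index
}

/// Value parameterization: run one check per generated length and
/// return how many lengths were checked.
fn check_lengths(indices: std::ops::Range<usize>) -> usize {
    let mut checked = 0;
    for index in indices {
        let length = make_length(index);
        assert!(length.is_power_of_two());
        checked += 1;
    }
    checked
}

/// Type parameterization: a generic check instantiated per type.
fn default_is_zero<T: Default + PartialEq + From<u8>>() -> bool {
    T::default() == T::from(0u8)
}

/// Hypothetical stand-in for #[test_types]: expands a generic check
/// once for every listed type and collects the results.
macro_rules! for_each_unsigned {
    ($check:ident) => {
        vec![
            ("u8", $check::<u8>()),
            ("u16", $check::<u16>()),
            ("u32", $check::<u32>()),
            ("u64", $check::<u64>()),
            ("u128", $check::<u128>()),
        ]
    };
}
```

for_each_unsigned!(default_is_zero) yields one named result per type, so a harness could report each instantiation as its own test case rather than as a single pass or fail.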

jan_hudec · December 14, 2017, 7:49pm

THIS. The macro #[test] is special, but this is really the only bit that can't be reproduced with macros 2.0. So if we fill in that bit, we can go wild prototyping better test suites as crates without having to touch the compiler.

Actually, it would be simpler to create a registry that would be accessible at runtime. That way there would be no need to define dependencies between macros and running them in multiple passes. The compiler would put the items aside, the linker would collect them and the test runner would simply iterate an array.

I would suggest something like this:

First a registry array would be defined. Something like:

#[registry]
static test_functions: &[&fn()];

and then each function would be annotated like:

#[register_to(test_functions)]
fn test_function() { … }

And the compiler and linker would conspire to simply make test_functions an array of all the appropriately annotated test functions. Code in main would then simply iterate that array.
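
Until such an attribute exists, the end result can be written by hand. The sketch below shows roughly what the compiler and linker would have to produce and how a runner would consume it; the array is filled in manually here, which is exactly the step #[register_to] would automate. TestEntry, TEST_FUNCTIONS and run_registry are hypothetical names.

```rust
use std::panic;

/// One registry entry: a name plus the function to call.
struct TestEntry {
    name: &'static str,
    run: fn(),
}

fn adds_up() {
    assert_eq!(2 + 2, 4);
}

fn always_fails() {
    assert_eq!(1, 2, "intentional failure for the sketch");
}

/// Hand-written stand-in for the array that #[registry] plus
/// #[register_to(test_functions)] would assemble automatically.
static TEST_FUNCTIONS: &[TestEntry] = &[
    TestEntry { name: "adds_up", run: adds_up },
    TestEntry { name: "always_fails", run: always_fails },
];

/// The runner simply iterates the array, catching panics so that one
/// failing test does not stop the others.
fn run_registry(entries: &[TestEntry]) -> Vec<(&'static str, bool)> {
    entries
        .iter()
        .map(|entry| (entry.name, panic::catch_unwind(entry.run).is_ok()))
        .collect()
}
```

Note that the runner needs no knowledge of how the entries got there, so the collection mechanism and the runner can evolve separately.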

Now, within a single crate this should be simple to implement, as the Rust compiler does have all of the information at some point. But I would actually see some uses for this being cross-crate global. Even that can be done. On ELF platforms it can be done by putting the array members (references to the annotated items) in a special section and having the linker collect them. And on platforms that don't have special sections, it can still be done using the mechanism provided for C++ static constructors, i.e. there would be a bit of code before main that would collect the pointers. It would mean some code would run before main, but it would be an implementation detail hidden in the standard library, so it should still not cause the sort of trouble the fully general mechanism causes for C++.

I even think that the mechanism could be prototyped currently with macros and the cpp crate.

Note that this tool would also be reused for things like the output formatters. The

could similarly desugar to #[register_to(test_runners)], and then main would look in the test_runners array and pick one by comparing something in it to a command-line option, falling back to either the first one or a default. If the mechanism were global (i.e. if the attribute permitted a qualified name like #[register_to(::test::runners)]), merely adding the formatter crate to Cargo.toml would make the formatter available for selection.
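
The selection step could look something like the sketch below, again with a hand-written array standing in for what the registry mechanism would collect. RunnerEntry, TEST_RUNNERS, select_runner and the two summary functions are hypothetical, and the option value is passed in directly rather than parsed from the command line.

```rust
/// Hypothetical registry entry for a runner or output formatter.
struct RunnerEntry {
    name: &'static str,
    summarize: fn(usize, usize) -> String,
}

fn plain(passed: usize, failed: usize) -> String {
    format!("{} passed; {} failed", passed, failed)
}

fn json(passed: usize, failed: usize) -> String {
    format!(r#"{{"passed":{},"failed":{}}}"#, passed, failed)
}

/// Stand-in for the array #[register_to(test_runners)] would fill.
static TEST_RUNNERS: &[RunnerEntry] = &[
    RunnerEntry { name: "plain", summarize: plain },
    RunnerEntry { name: "json", summarize: json },
];

/// Picks the entry whose name matches the requested option, falling
/// back to the first registered entry if there is no match.
fn select_runner<'a>(
    entries: &'a [RunnerEntry],
    requested: Option<&str>,
) -> Option<&'a RunnerEntry> {
    requested
        .and_then(|wanted| entries.iter().find(|entry| entry.name == wanted))
        .or_else(|| entries.first())
}
```

With a cross-crate registry, a formatter crate would only need to contribute one more RunnerEntry for its name to become selectable.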
