Past, present, and future for Rust testing

I’m not really up to date on the discussion, so I’ll hold off my comments for now. Here are some of my notes on the topic, though:

@killercup Regarding dynamic test generation, do you want the TestHarness thing to be something we provide, or something some custom framework provides?

Because if we take the generic approach and make output formatters into a crate then you can easily do this with a custom test framework. Would be good to have a reference impl since that pattern is so common.

Do you mean the “test harness” or TestRegistry, @Manishearth? Let me try be a bit more specific in what I imagine we could do (this is just an API straw-man of course, open to discussion).

The test harness is what the test framework generates and the observable interface a test binary exposes – implemented as its main function for example. Its job is to start the test runner that executes your tests in the correct way (with necessary setup, isolation, order, iterations, etc.).

Aside: I propose that the test runner’s output is initially a stream of JSON documents (that cargo test can consume and either output verbatim or in a human-readable manner).

I expect that specific TestRegistry type in my example to be something the test framework provides as well. Its basically meant as a way to interact with the test runner at test-run-time.

TestRegistry, sorry.

I expect that specific TestRegistry type in my example to be something the test framework provides as well. Its basically meant as a way to interact with the test runner at test-run-time.

Cool, this makes sense.

Aside: I propose that the test runner’s output is initially a stream of JSON documents (that cargo test can consume and either output verbatim or in a human-readable manner).

I'm a bit wary of this; we don't know what kind of flexibility folks will want here. We could perhaps try to figure this through the erfc, but really I think we should leave this unspecified and try to iterate on standardized output formats as a library instead (which we also use in libtest)

For example, benches and cargo-fuzz will not want the same output format as this. cargo-fuzz can't work at all if the output is constrained.

It’s worth discussing trait level design-by-contract or trait integration test consisting of two features:

  • permit #[test] on inherent and trait methods that do not require arguments and
  • provide a #[required_of_client] attribute for trait impls.

Right now, you could add macros that generate test code to test trait impls, but they must be called manually. If you wanted to enforce it, then you’d need something more:

pub trait Foo {
    fn foo(&self);
}

#[cfg(test)]
pub trait FooTester : Foo {
    #[cfg(test)]
    fn test_new() -> Box<Self>;
}

#[cfg(test)]
mod test {

#[require_of_client]
impl<F: Foo> FooTester for F;

impl<F: Foo> F {
    #[test]
    fn do_foo_test() {
        let me = test_new();
        ...
    }
}

} // mod test

In this, the #[require_of_client] demands the impl exist but does not provide test_new itself. Also F::do_foo_test() is an inherent method for any Foo that only exists here and becomes a test.

You might need the #[test] to appear inside the FooTester trait sometimes trait too. I used Box<Self> here so that test_new need not be Sized, but maybe some use cases still need it, like tests in no_std.

This doesn’t seem to have come up, so I want to mention that the biggest features I find to be missing from the existing test support are “expected failure” and “not supported” test outcomes.

“Expected failure” means “this test has indeed failed, but it is due to a known bug which will be fixed in due course.”

“Not supported” means “due to details of the environment in which the test is being run, we can’t even try to test this particular thing.”

It needs to be possible to produce both of these outcomes from code, within or adjacent to the test function; in sufficiently gory situations it may even be necessary to inspect the details of a failure before deciding whether it is due to a known bug or environmental problem. Declarative annotations are desirable for the easy cases, but they are not enough.

@JeffBurdges and @zackw the issues you raise are important ones, but ones that we hope for individual testing frameworks to solve. In a sense, the goal is to write infrastructure such that people can develop good testing frameworks with features like the ones you mention, rather than building them into Rust. Essentially, to do testing you’d pick a testing crate you liked, and stick that in your Cargo.toml somewhere, and then you could use whatever features that testing framework supports (such as #[allow_fail], #[expect_fail], whatever else you might want).

“Leave it to the crates” doesn’t entirely work for expected failure and unsupported states, because they are part of the protocol between the “test harness” and the “test formatter”. That protocol has to be standardized and there’s only one place to do it.

A lot of the details, like how you mark a test unsupported and how flexibly and granularly you can do that, can be left to the crates, though.

It’s looking like custom test frameworks will be handled in two steps – one where we just introduce how you have alternative execution contexts (that is, providing a feature that allows effectively allows depending on crates that introduce new “special” main() functions that operates on your code in interesting ways), and a later one that specifies a protocol between test runners specifically and a standardized test output formatter. These two are fairly orthogonal, except in that the latter will be much easier to design and implement given the former.

All that said, you are completely right that expected failures is something that such a protocol would need to include. I think (?) @killercup is writing up a proposal for a testing protocol separately from the custom test framework RFC, so he may be able to provide more insight there.

Yes, if #[test] becomes in effect a proc macro then yes I guess a testing crate could easily support methods.

I’d think #[requite_of_client] is a pure trait system feature that goes nowhere near crate controlled build logic though, but I suppose impl<F: Foo> FooTester { fn test_new() -> Box<Self> { panic!() } } comes close enough, provided you can enforce running it.

For those interested, my new proposal that is based on @Manishearth’s is here: https://github.com/jonhoo/rfcs/blob/alt-exec-context/text/0000-alternative-execution-contexts.md

Stabilizing a small and limited proc-macro (akin to custom-derive) is the right way to go, I think. This is just about the most minimal stable-API surface I can think of that does what we need. Then these good ideas about testing from other commentators can be implemented out-of-tree without needing to go through the whole RFC process - and, importantly, without needing to delay the implementation of this feature.

Regarding the questions:

  • I think the compiler would have to pass a path as well as the token stream. Only alternative I can think of is that the compiler could put these annotated functions in a special namespace, but that seems too magical to me.
  • Visibility is tricky. I see two options - either we require (and add a lint to remind users) that all ancestors of a marked element should be at least pub(crate) or the compiler could ignore visibility rules in this case. It must already be doing that for #[test], yes? I tend to favor the latter approach because it means less hassle for users and less change to how users have already learned to expect #[test] and #[bench] to behave.
  • I don’t have a strong opinion on cfg(test) vs other config features.

Thumbs-up from me.

1 Like

Summarizing the differences between my proposal and Jon’s:

  • Jon’s base API is &mut [TokenStream] -> TokenStream, mine is TokenStream -> TokenStream. His API pre-includes the fact that it will collect test attrs and take the output to replace main(); mine assumes nothing and instead allows for a Rust-maintained helper crate
  • The cargo integration is mostly the same, except for some minor points. All of these points can be worked into the other proposal:
    • Jon’s cargo stuff allows for multiple test frameworks (execution contexts) in a single crate, and for users to select between the two. Mine doesn’t say anything about this but implicitly allows them, however users get both foisted upon them which will be problematic since there’s no way of separating them in the target declaration.
    • My cargo stuff allows for frameworks to declare that they do not run in “lib mode”/“unit test mode”. Jon’s doesn’t. I think this is important for frameworks like fuzzers.
    • Jon’s proposal doesn’t talk about how [[test]] can be generalized. But my proposal for that is compatible.
    • Jon’s proposal allows for multiple folders, mine doesn’t.
  • The exact syntax and stuff of the cargo integration otherwise differs a lot, but IMO the exact syntax is minor and we can go either way
  • Jon calls them “execution contexts”, I call them “test frameworks”. I kinda like Jon’s name though it may confuse folks. Not that “test frameworks” isn’t confusing (benches and fuzzes aren’t tests).

I still prefer the “stabilize the simplest thing and provide everything else as a helper crate” path. This is exactly what was done with proc-macros, and we should do this here as well. The more stable APIs we have to maintain the worse it is, and TokenStream -> TokenStream is significantly simpler (from the POV of the complexity of the internal consumer), whereas the &mut [TokenStream] -> TokenStream has a lot more internal complexity to maintain as stable; and if we need to make this flexible in the future we may not be able to without introducing new APIs. For example, one use case not covered would be ones where you need to keep the old main around as well, e.g. cases where you want to do some whole-binary instrumentation and some setup/teardown (so you still call main(), but you rename it and wrap it in your actual main function). With the simple API case, this option is still available to people, though they may need to write their own clean_entry_point-like thing.

However, the cfg() story does become easier with Jon’s proposal. In my proposal testing crates may have to explicitly declare what attributes to cfg out in their Cargo.toml. Which isn’t too bad imo; it would be nice to have such a feature for regular proc macros too.

The RFC as specified doesn't actually require that the output replaces main(). In fact, it could just as easily keep its defined name, and that way could call a user-defined main() if so desired. I believe this also address the later point you make about wanting to wrap a user main().

I'm not exactly sure why you think they won't work in that mode? Is there a reason to preclude that use-case? Even a fuzzer could be applied the lib case, you just wouln't have all that many things tagged as #[fuzz]?

I may be misunderstanding, but I believe the execution context sets do the same thing?

Although @Centril pointed out just now that it'll be a little tedious to always specify the folder if you generally want the same folder for, say, benchmarks, in most cases. It'd be totally reasonable for the execution context to also define its default folders in some way. This is a minor point though.

Agreed.

I'm not sure I understand? Benches and fuzzers aren't tests, and therefore I'd argue that it is confusing to call them test frameworks. Which execution contexts avoids?

I'm having a hard time making up my mind about this one. While it is true that conceptually we are "supporting less" by providing a full-crate rewrite macro, I think it's also exposing a very wide API that we'll then need to deal with for the foreseeable future. This is a very powerful form of macro, that will enable people to essentially come up with their own "version" of Rust that looks completely different. Maybe this is a good thing, but I'm not so sure. We really don't want to end up in the world of crates choosing their own Rust flavors with vastly different syntax, as it starts to split the ecosystem. I agree that my proposed rewrite is much more limited, but it still sufficiently flexible to support all the use-cases brought up so far in this thread (I think), while not just saying "do whatever you want!". Ultimately, I wonder if the lang team might want to have a gander at this too, but regardless I think we should think carefully about how wide of an API it's a good idea to expose.

That all said, this is (for now) an eRFC, so picking one and moving forward is probably fine no matter which one we pick. They will both require mostly the same components.

1 Like

Oh, also, another missing thing is path handling; the API needs to provide paths for each TokenStream. In my helper crate I handle that by passing in the path to fold_function (or fold_item)

Yup. I think the list of TokenStreams provided to my proposed function would also need to be given a list of paths.

1 Like

You missed a negative, I said "not that 'test frameworks' is not confusing", i.e. "test frameworks" is confusing.

Agreed.

The RFC as specified doesn’t actually require that the output replaces main(). In fact, it could just as easily keep its defined name, and that way could call a user-defined main() if so desired. I believe this also address the later point you make about wanting to wrap a user main().

You need to pass in the existing main somewhere? Because if you intend to wrap main you need to rename it.

This is a very powerful form of macro

It's not very different from using proc macros on entire modules.

We really don’t want to end up in the world of crates choosing their own Rust flavors with vastly different syntax, as it starts to split the ecosystem.

I do not have this fear, really. You already can do custom DSL stuff via proc macros or bang macros. And the stuff as currently defined only lets you apply a macro to the topmost crate as part of cargo test (or cargo execute); this has no impact on cargo build, and definitely no impact on the ecosystem because you can't apply these macros to crates that are dependencies.

Also nobody will want to rewrite all the test-collecting infra, so the helper crate will become the de facto API anyway (this is already the case with syn, for example), except it need not be maintained as a stable-forever crate which is great.

That all said, this is (for now) and eRFC, so picking one and moving forward is probably fine no matter which one we pick. They will both require mostly the same components.

Fair. From that POV too I'd prefer to settle on the more generic API, so that we can experiment on the specific API in the helper crate, and if we really like it we can try and stabilize that API instead.

I really like @jonhoo’s proposal; It may be limited, but still has a lot of flexibility and is user friendly to write runners with and to use for runner-users. I think supporting good defaults and especially supporting multiple folders is super important wherefore I’m for @jonhoo’s proposal currently.

FWIW I’m going to include multiple folders in my proposal as well. All the stuff from the “cargo integration” section is probably stuff we’ll include in both proposals.

The main difference is that I provide a very simple API and provide the fleshed out API in a helper crate, whereas Jon’s proposal provides a restricted API kind of in between the two directly.

2 Likes

On discussing more with @jonhoo I think allowing crates to opt-out of unit-testing mode isn’t necessary. I still think it’s useful since it might make “test sets” work well, but that’s no big deal. We think that allowing frameworks to specify if they can be run all at once or if you must specify a single target might be helpful.

Jon also agrees that their proposal needs something mapping to [[test]]. Neither of us care about the specifics i think.

So with that (i.e. including potential changes we seem okay about), both of our proposals are identical, aside from the “what API do we expose” question, and bikesheddable syntax which I think neither of us care about much.

2 Likes