Past, present, and future for Rust testing

We’re going to be discussing this today at the devtools meeting anyway, but I think I prefer the specifics of Jon’s Cargo proposal (the syntax, I mean), modified so that contexts can “specify if they can be run all at once or if you must specify a single target”.

Not 100% sold on the execution-contexts name, but not sold on “testing frameworks” either.

Maybe “post build somethingortheother”? [postbuild.context.test], etc.

I think I like the general gist of the proposals, but I have some qualms. I don’t want to add a whole new category of ‘thing’ here; in particular, any kind of special-case proc macro seems like a really bad idea, not least because proc macros are very unstable.

I think that it also puts the test framework at kind of the wrong level of integration - it feels like it is extra-linguistic, when it could be more part of the language.

I haven’t had time to work through any ideas seriously, so I apologise for the half-formed suggestions; hopefully we can integrate them into the existing proposals.

I imagine that a test framework should be a crate, and that a client crate which wants to use it would have it as a dev-dependency in Cargo and a #[cfg(test)] extern crate in the Rust code. That test crate can contain macros, including attribute macros which the client can use. Thus any program transformation is done entirely like usual macro expansion.

We extend the test attribute to allow it to be used more flexibly, e.g., #[test::foo], #[test(...)], etc. The expectation is that the test framework macros output items marked with test attributes (I realise we want to apply this to things which are not strictly tests, but frameworks are free to use their own attributes and expand them to test; test doesn’t have to be user-facing).

The test runner itself is indicated by a trait; the trait is a lang item in core or std or wherever. Any type which implements that trait is treated as a test runner by the compiler. There might be multiple test runners for any crate. When the compiler is building in test mode, any test runners in any depended-upon crates are given (via a run_tests method) a data structure representing all items marked with any test attribute in the crate being tested. Strawman for that: Vec<TestFixture<T, U>>, where

struct TestFixture<T, U> {
    tokens: TokenStream,
    span: Span,
    attributes: TokenStream,
    // Plain function pointer, present when the annotated item is a
    // function with signature T -> U.
    fn_ptr: Option<fn(T) -> U>,
}

run_tests also takes some context object which it can use for emitting test events. I think that run_tests is the only method a test runner trait would need.

If the TestFixture item is a function with the signature T -> U, then a pointer to it is stored in the fn_ptr field. That gives the test runner a way to run test functions. T and U are associated types on the TestRunner trait.

I believe that this approach doesn’t need any Cargo integration, since running the compiler in the same way as cargo test today would run any test frameworks which are depended upon. However, we probably want to add some convenience arguments for running only one test runner, etc.

Appendix

Test runner trait

trait TestRunner {
    type TestArg;
    type TestResult;

    fn run_tests(
        &self,
        tests: &[TestFixture<Self::TestArg, Self::TestResult>],
        context: &mut TestContext,
    );
}
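To make the strawman concrete, here is a self-contained sketch of how a framework might implement such a trait. All names here are hypothetical, and the TestFixture and TestContext types are simplified stand-ins for whatever the compiler would actually provide (in particular, fixtures are built by hand rather than collected from attributes):

```rust
// Simplified stand-in for the compiler-provided fixture type.
struct TestFixture<T, U> {
    name: &'static str,         // stand-in for tokens/span metadata
    fn_ptr: Option<fn(T) -> U>, // present when the item is a function T -> U
}

// Simplified stand-in for the context object used to emit test events.
struct TestContext {
    passed: usize,
    failed: usize,
}

trait TestRunner {
    type TestArg;
    type TestResult;

    fn run_tests(
        &self,
        tests: &[TestFixture<Self::TestArg, Self::TestResult>],
        context: &mut TestContext,
    );
}

// A trivial runner: every test takes () and returns a pass/fail bool.
struct SimpleRunner;

impl TestRunner for SimpleRunner {
    type TestArg = ();
    type TestResult = bool;

    fn run_tests(&self, tests: &[TestFixture<(), bool>], context: &mut TestContext) {
        for test in tests {
            if let Some(f) = test.fn_ptr {
                if f(()) {
                    context.passed += 1;
                } else {
                    println!("test {} failed", test.name);
                    context.failed += 1;
                }
            }
        }
    }
}

fn always_passes(_: ()) -> bool { true }
fn always_fails(_: ()) -> bool { false }

fn main() {
    let tests = [
        TestFixture { name: "always_passes", fn_ptr: Some(always_passes) },
        TestFixture { name: "always_fails", fn_ptr: Some(always_fails) },
    ];
    let mut ctx = TestContext { passed: 0, failed: 0 };
    SimpleRunner.run_tests(&tests, &mut ctx);
    println!("{} passed; {} failed", ctx.passed, ctx.failed);
}
```

Note how the single TestArg/TestResult pair per runner already shows the restriction discussed below: this runner can only drive () -> bool tests.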

This kind of API doesn’t help cases like quickcheck and fuzzing, where tests may take different arguments.

Also, cargo integration is necessary – your current proposal handles it for writing tests inline, but that’s not everything. In particular, the ability to write multiple targets in a tests/benches/examples folder is exclusive to tests. When we needed this kind of functionality for cargo-fuzz we had to hack it together by creating a dummy crate and requiring users to explicitly do boilerplate generation. There has to be a way to extend this for your own kind of target; i.e. one should be able to have fuzzes/ or criterion-benches/.

I don’t think proc macros being unstable is that much of an issue here. Both proposed APIs do not overstep the bounds of what is already stable as custom derive, and I’m fine with waiting this out as we stabilize proc macros. In my proposal this is exactly the same format as attr proc macros, except attr proc macros need to be explicitly applied.


This kind of API doesn’t help cases like quickcheck and fuzzing, where tests may take different arguments

True, but I imagine that there must be an encoding into trait objects or something so that this can be handled; the question is whether that encoding is more effort (for test framework authors) than having to handle everything in a proc macro style.

Also, cargo integration is necessary – your current proposal handles it for writing tests inline, but that’s not everything.

I wonder if we could restrict the Cargo integration to Cargo-run tests? It seems like we're leaking across abstraction boundaries by making Cargo aware of how rustc is building tests.

I don’t think proc macros being unstable is that much of an issue here

I object to having a new 'kind' of proc macro, when the current system is good enough for the 'expansion' stage of the test framework. The hard bit is making the test framework aware of the test inputs, but that seems unrelated to the expansion - ideally I'd like the 'running' step to be entirely free of macro-like stuff (i.e., no TokenStreams), but I don't see a way that can be done.

I'm not sure what you mean here?

I don't really see one, but I haven't thought of this that much either.

Note that for most framework authors the proc macro style will be quite simple, it will mostly just be "collect all the paths using TestCollector, and then generate a main function calling them". The main function generation can be done through the simple application of a quote macro. We can provide additional helpers that make this easy, e.g. providing an API like the one you envision.
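For illustration, the "generate a main calling all collected paths" step can be sketched with plain string-based codegen standing in for a quote! invocation. The function name and the test paths are hypothetical, and a real framework would work on token streams rather than strings:

```rust
// Hypothetical simplified stand-in for quote!-based codegen: given the
// paths of the collected test functions, emit the source of a main()
// that calls each one in turn.
fn generate_main(test_paths: &[&str]) -> String {
    let mut body = String::new();
    for path in test_paths {
        body.push_str(&format!("    {}();\n", path));
    }
    format!("fn main() {{\n{}}}\n", body)
}

fn main() {
    let generated = generate_main(&["tests::parses_empty", "tests::parses_nested"]);
    print!("{}", generated);
}
```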

I think if the question is effort, the helper crate solution gives us a lot more flexibility to provide semver-stable (i.e. can be changed by bumping the crate version) APIs at different levels that trade off various levels of control and effort; as opposed to stabilizing one restrictive way of doing it.

I dispute this :slightly_smiling_face:


Went to bed and then this discussion happened :stuck_out_tongue:

Instead of answering directly to the arguments above, let me try a higher-level response. First, I agree with @nrc that there’s an advantage to keeping this proposed new feature as “uninteresting” as possible. If we can avoid introducing new concepts and powerful generalizations, that’d be good, as it means we can do more targeted experimentation in the eRFC context. However, as @Manishearth argued above, and as it seems @nrc sort of conceded to, we will likely need at least some new mechanism to be able to develop something like #[test] out-of-tree.

I believe @nrc’s original proposal is actually not that far from mine. Specifically, if we make the #[execution_context] function take a Vec<ExecutionContextAnnotatedItem> (where ExecutionContextAnnotatedItem is basically @nrc’s TestFixture, but with a more general name), then the interface becomes a lot simpler. Basically, an execution context is allowed to gather up annotated items and produce a main(). If it wants to modify source trees, it would do so with proc macros separately. Unfortunately, I don’t believe @nrc’s proposed TestFixture is quite sufficient. In particular, the Option<Fn(...)> is a neat idea, but it limits the generated main() to calling functions; it wouldn’t be able to, say, refer to an annotated static variable, an annotated module or type, etc. This is why I believe the path would also need to be exposed.

On a somewhat orthogonal note, I think Option<fn()> would be sufficient; T and U shouldn’t be required to be a single type for a given runner, and arguments + returns could be faked with thread locals. It’s a little bit ugly, but better than the single-argument-type + single-return-type restriction.
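As a sketch of that thread-local workaround (all names here are hypothetical), a test whose real logic takes an argument and returns a result can be wrapped in a plain fn() that smuggles both through thread-local slots:

```rust
use std::cell::RefCell;

// Hypothetical thread-local slots used to fake an argument and a return
// value for a test function whose exposed signature is fn().
thread_local! {
    static TEST_ARG: RefCell<Option<u32>> = RefCell::new(None);
    static TEST_RESULT: RefCell<Option<bool>> = RefCell::new(None);
}

// The "real" test logic takes a u32 and returns a bool...
fn is_even(n: u32) -> bool {
    n % 2 == 0
}

// ...but the wrapper the runner sees is a plain fn().
fn is_even_wrapper() {
    let arg = TEST_ARG
        .with(|a| a.borrow_mut().take())
        .expect("argument not set");
    let result = is_even(arg);
    TEST_RESULT.with(|r| *r.borrow_mut() = Some(result));
}

fn main() {
    // The runner (e.g. a quickcheck-style driver) sets the argument,
    // invokes the unit-signature function, then reads the result back.
    TEST_ARG.with(|a| *a.borrow_mut() = Some(4));
    is_even_wrapper();
    let result = TEST_RESULT.with(|r| r.borrow_mut().take());
    assert_eq!(result, Some(true));
}
```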

As for integration with Cargo, I’m with @Manishearth. I don’t think the current proposals leak too much between rustc and cargo. It is true that rustc will need to “integrate” the chosen execution context implementation with the crate-under-test, but that’s about it as far as I can tell? I also think this integration is the key to making this all work nicely. I also think, as I’ve outlined before, that we shouldn’t think specifically about this as being just for tests.

That ... that doesn't sound like a clean API :frowning: . Also I'm not quite sure how that would work either, I can't seem to come up with a way to make that work that doesn't involve making the user reimplement a bunch of stuff.

Yeah, I agree. Paths would be better. I updated my proposed RFC to address some points from this discussion in 106b2c7bc1ba9011cba67f17cf7bb242c12cdde1.


Minutes from today’s dev-tools meeting on testing @ https://github.com/nrc/dev-tools-team/blob/master/minutes/meeting-notes-2018-01-23.md


@jonhoo Just realized, your RFC merges the dependencies of execution contexts with dev-dependencies. We can’t do this; we need a separation between compile time (build-dependencies) and runtime dependencies (especially in the presence of cross-compilation, though that’s not so important for execution contexts)

Also, for cargo-fuzz ideally we need the ability to insert a #![no_main], which cannot be done with the restricted API :frowning:

Also realized that the way you pick and choose contexts from a crate providing multiple contexts is incompatible with the current design which has a with_attrs block. I changed the syntax to #[post_build_context(test, attributes(foo, bar))], where the first thing is the context name (and attribute), and the rest are extra attributes. This is more in line with how custom derive works.

Incorporating feedback from the meeting, the working draft is https://github.com/Manishearth/rfcs/blob/post-build-contexts/text/0000-erfc-post-build-contexts.md

Will be posted in a few days, once the dev tools team has put in any final feedback.

Come to think of it “post build context” can be misleading, but “execution context” doesn’t really explain what it’s about either. Alternate name ideas?

Btw, you are using the old RFC format in the draft =)

I think the problem is that anything other than post build will be a bad common term for some sort of post build context that goes out of the norm of test/verify/benchmarking… You could order those three under “quality assurance” but that is less apt than post build, so I think that is a good name for now - we can certainly iterate on that after implementing the eRFC and before stabilization.


No, like I responded to your other comment about this, that's on purpose. It says:

(As an eRFC I'm excluding the "how to teach this" for now; when we have more concrete ideas we can figure out how to frame it.)

"how to teach this" is another name for the "guide-level explanation" section (it's the older name of that section).

So instead of having a guide-level explanation / reference explanation split, I'm having a "detailed design" that's a bit of both with a lot of gaps. eRFCs cannot be actually put on the path for stabilization till a second, complete RFC is merged. That RFC will have all the sections.

I don't understand why that can't be done? Surely it makes sense that the dependencies (i.e., runtime dependencies) of a context must be used when compiling a binary based on it? And so the right thing to do is combine those deps with the dev-dependencies + dependencies of the crate-under test (since that crate is being "tested")?

Why is that necessary? We could always get around this with another Cargo.toml variable.

I don't think that's true? The name of the context is the name of the annotated function. There is no need to also give a name in the annotation. I also don't see why we need a "primary" attribute for a context?

I still don't understand how "post build" is accurate? It doesn't actually happen after a normal build, but rather instead of a normal build. What specifically is the objection to execution context (a way in which to execute the crate)?

From your e-mail:

Why do you believe this is desirable/necessary? I'm not directly opposed to it, I just don't see the point?

Didn't we come to the conclusion that this probably isn't necessary nor sufficient? The flag we would need is one that says "can only execute one thing".

I don't see how restricting to a single folder removes the ambiguity? For example, the user could declare two targets that both have the same folder. Either we check for and disallow conflicts, in which case specifying multiple folders is fine, or we need to support having multiple contexts for the same set of files (which, tbh, I think is totally doable).

No? The dependencies will be compile-time. Your context is a crate that works on token streams, so it will have compile-time dependencies like syn. On the other hand, the generated harness will have dependencies like libfuzzer-sys and whatever we use for output formatting. These are different things. Your harness has no need for syn, and your harness-generation-macro has no need for libfuzzer-sys.

For example, the thing that generates the test harness is libsyntax, but the harness itself has dependencies in libtest. Different things.

It's necessary because libFuzzer replaces main and takes over the program. It feels weird to split the codegen across cargo flags and a proc macro; it would make more sense to provide a turn_off_main() function to the proc macro but this is exactly the kind of API surface expansion I was worried about and hence I proposed a whole-crate proc macro. I'm fine with not exposing this functionality and instead having cargo-fuzz users write it explicitly, but I'm not fine with putting codegen modifying flags in Cargo.toml.

oh it's the name of the annotated function. I hadn't picked up on that. My bad. Will swap back; though I'll use attributes not with_attrs to match proc_macro.

It's "post build" in the CI pipeline sense; tests/benches/fuzzing are considered "post build" because you do them after a successful build.

Yes, under the hood it is a different way of executing things but I suspect that's not the model people have of cargo test and cargo bench.

Alignment with proc macros, which do the same thing. It also signals to cargo that this crate should not be cross compiled (otherwise you get weird errors).

Yeah, I did that bit in a hurry. My bad. I forgot the "not sufficient" part.

You misunderstand, I removed the ability to specify multiple folders for a context, not multiple contexts for a folder. The ambiguity is that if you have the same file in both you no longer can pick one.

I also didn't find it necessary, really; if you really need multiple folders using the same context, import the context twice. And I don't think you'd really need that.

I'm proposing that the harness dependencies are merged with the crate-under-test dev-dependencies?

But that's always the intention of a context? It should always not use any user-defined main, and use whatever main is defined by the TokenStream returned by the context function. If that TokenStream does not contain a main, the crate would not have a main (as desired for libFuzzer).

So, I'm not sure I agree here. First, it doesn't actually follow a regular build, at least not in the lib case. It's instead of a regular build. For things like tests/ where it does follow a regular build, I still don't think people would think of it as a "normal build" having happened. A normal build to me would be building everything as it is in my source tree (e.g., including any binaries). But maybe this is just overly bikeshedd-y. How about just "build contexts"?

So we're changing from apply-to-lib to something like never_exits? I also think this annotation has to be on each context, not on the crate as a whole. For example, a crate could define two context functions, one which generates a main that never terminates, and one that does not.

I don't understand. That is how I understood you, but I don't see what you mean by "if you have the same file in both you no longer can pick one". If the user's Cargo.toml lists, e.g., both slow_tests/ and fast_tests/ for a given context, that seems perfectly fine? How could the two folders have the same file? And even if they did through a hardlink or something, why would that be a problem?

I'm not sure I fully understand the objection here. The proposed feature is to let you have some context that will execute files in multiple directories, rather than restricting it to a single one. So that you can, for example, split your test files into multiple directories for organizational purposes, while still using the same context for all of them.

Regarding dependencies:

The way I see it, there are two new kinds of dependencies:

  1. I am crate_under_test and I depend on the context_expander proc macro crate to emit some context’s test/bench/whatever code

  2. I am context_expander's emitted code, and I depend on context_runtime.

This split is analogous to something like lalrpop (build-time parser generator) and lalrpop_util (runtime functions used by the parser generator’s emitted code).

For (1), I think we want the crate_under_test to have something like what Manish has illustrated in the latest revision of the eRFC:

[post-build.context.test]
provider = { context_expander = "1.0", context = "test" }
folder = "tests/"

For (2) it isn’t clear to me which crate’s Cargo.toml this dependency information should be declared in:

  • On the one hand, with custom derives today, if the custom derive emits code that depends on some crate, then the crate_using_custom_derive needs to add that dependency to its Cargo.toml.

    # crate_under_test/Cargo.toml
    [dev-dependencies.context.test] # or whatever section
    context_runtime = "0.1.0"
    
    
  • On the other hand, it is the context_expander crate that knows which crates its emitted code depends upon. It seems like it would be easier for the context_expander crate to maintain the dependency list, since it is more familiar with them, than for crate_under_test. However, this is unlike the dependency declarations with custom derives.

    # context_expander/Cargo.toml
    [emitted-dependencies]
    context_runtime = "0.1.0"
    
    

Regarding compiletest:

Although compiletest is explicitly called out in the eRFC, it wasn’t clear to me how it would be supported, since it requires allowing code to fail to compile. But because compiletest is explicitly called out in the eRFC, I think it is important to have an idea of how it would work here.

Since this eRFC proposes append-only tokens, a naive implementation would not work because appending more tokens to some code that fails to compile does not guarantee the resulting code will compile.

After discussing this with @jonhoo on irc, we came up with:

  • Write the compiletest macros within a proc macro:

    compiletest! {
        ...
    }
    
  • This proc macro writes the ... to an external file, and replaces it with this in the source:

    #[context(compiletest)] // or whatever attrs we end up using
    const COMPILE_TEST_CASE_1234: &'static str = "path/to/the/external/file";
    // alternatively, pretend that `Path::new` is `const`:
    // const COMPILE_TEST_CASE_1234: &'static Path = Path::new("...");
    
  • Finally, the compiletest context expander crate knows how to gather all these external file paths and do the right thing with them.
