Finer-grained way to choose which unit-tests are compiled


#1

“Internal” unit tests have, in some cases, many usability advantages over “external” unit tests. The main one for me personally is that they can be generated by macros and procedural macros, which can directly insert tests for the functionality being added that exercise the actual types. For example, impl_x! or #[derive(Foo)] can insert #[cfg(test)] tests automatically.
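A minimal sketch of that pattern, assuming a hypothetical Double trait and impl_double! macro (neither is from the thread): each invocation emits the impl and, right next to it, a #[cfg(test)] module that exercises the concrete type it was invoked with. Plain macro_rules! cannot concatenate identifiers, so the test module's name is passed in.

```rust
pub trait Double {
    fn double(self) -> Self;
}

// Hypothetical impl macro: every invocation generates the impl *and* a
// unit-test module for the same concrete type, so the two never drift apart.
macro_rules! impl_double {
    ($ty:ty, $test_mod:ident) => {
        impl Double for $ty {
            fn double(self) -> Self {
                self * 2
            }
        }

        // Only compiled by `cargo test`, but always compiled as part of
        // this crate, which is the cost discussed in this thread.
        #[cfg(test)]
        mod $test_mod {
            use super::Double;

            #[test]
            fn doubles() {
                assert_eq!((3 as $ty).double(), 6 as $ty);
            }
        }
    };
}

impl_double!(u8, tests_u8);
impl_double!(i64, tests_i64);
```

Because the tests are produced by the same macro expansion as the impl, they cannot be moved to tests/*.rs without also moving or duplicating the macro.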

With cargo test pattern one can control which unit tests are run, but not which tests are compiled. This can introduce compilation time problems if a crate has many unit tests.

The current ways of controlling which tests are compiled have big downsides:

  • split tests into different files: tests/subset_a.rs - this has the downside that the implementation and the tests become separated and that crates might need to export macros just for testing purposes.

  • add feature flags, and #[cfg(all(test, feature = "sub_tests_A"))] instead of just #[cfg(test)]: this is problematic because one cannot add “dev” features, so these testing-only features become part of the API of a crate. They also recompile the whole crate every time, even though only the tests must be recompiled when they change.

  • use a cfg flag #[cfg(test_subset_A)] passed via RUSTFLAGS="--cfg test_subset_A" cargo test. Same problems as above, but worse, because now all crates in the dependency graph of the dev target must be recompiled.
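For reference, a minimal sketch of how the last two workarounds gate a test module; the add function and module names are hypothetical, while the feature and cfg names are the ones from the list above.

```rust
pub fn add(a: u32, b: u32) -> u32 {
    a + b
}

// Compiled by every `cargo test`.
#[cfg(test)]
mod tests {
    #[test]
    fn add_small() {
        assert_eq!(super::add(1, 2), 3);
    }
}

// Feature-flag workaround: compiled only by `cargo test --features sub_tests_A`.
// The feature has to be declared under [features] in Cargo.toml, which is why
// it becomes part of the crate's public API.
#[cfg(all(test, feature = "sub_tests_A"))]
mod sub_tests_a {
    #[test]
    fn add_large() {
        assert_eq!(super::add(u32::MAX - 1, 1), u32::MAX);
    }
}

// cfg-flag workaround: compiled only with
// RUSTFLAGS="--cfg test_subset_A" cargo test. RUSTFLAGS applies to every crate
// Cargo builds, so changing it rebuilds the whole dependency graph.
#[cfg(all(test, test_subset_A))]
mod subset_a_cfg {
    #[test]
    fn add_zero() {
        assert_eq!(super::add(0, 0), 0);
    }
}
```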

I would like to be able to control, in a finer-grained way, which internal tests are compiled and run, for example to compile and run only the tests of a particular submodule, or of a submodule and its submodules.


#2

Interesting! On the surface of it, this looks like a hard problem to solve: there’s little difference between the tests and the other code. For example, there are often #[cfg(test)] modules, which contain helper structs, methods, etc. And selectively compiling test modules looks pretty much like selectively compiling any modules!

To put it another way, the distinguishing feature of “unit” tests is that they are in the same translation unit as the code. This explains why “they also recompile the whole crate every time, even though only the tests must be recompiled when they change”. It looks like the proper solution here is incremental compilation?
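A small illustration of why unit tests have to live in the same translation unit, using hypothetical names: a #[cfg(test)] module is part of the crate being compiled, so it can call private items through super::, which an integration test in tests/*.rs (a separate crate) cannot do.

```rust
// Private helper: not visible outside this crate.
fn checked_lanes(len: usize) -> Option<usize> {
    if len.is_power_of_two() {
        Some(len)
    } else {
        None
    }
}

pub fn lanes(len: usize) -> usize {
    checked_lanes(len).expect("lane count must be a power of two")
}

// Same crate, same translation unit: the private helper is reachable.
#[cfg(test)]
mod tests {
    use super::checked_lanes;

    #[test]
    fn rejects_non_power_of_two() {
        assert_eq!(checked_lanes(3), None);
        assert_eq!(checked_lanes(4), Some(4));
    }
}
```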

This can introduce compilation time problems if a crate has many unit tests.

It would be interesting to see how large “time to compile tests” is relative to “time to compile code”. I would naively assume that most of the compilation time for cargo test --lib --no-run is spent compiling the actual code, not the tests.


#3

Check out the stdsimd crate, cd stdsimd/crates/stdsimd, time cargo build, time cargo test.

I am working on the third workaround to this problem: splitting the test TU into crates using #[path =...] plus macro hacks, so that the tests are compiled only in those crates and not in the main one. We’ll see if that works.

Looks like the proper solution here is incremental compilation?

Incremental compilation actually makes this way worse! Often I need to run cargo clean after small edits because, with incremental compilation… compilation never finishes.

I suspect that compiling the crates requires a lot of memory, and that incremental compilation makes this much, much worse. @michaelwoerister


#4

So, the times don’t seem to differ much for me, though I’ve measured a slightly different thing to make sure I don’t count the dependencies and integration tests:

# all in the root folder of repository
$ cargo build && cargo test --no-run # to make sure we've fetched everything from the network

$ rm -rf target/debug/incremental target/debug/deps/*simd* # clean everything except the local crate
$ time cargo build --package stdsimd
   Compiling coresimd v0.0.3 (file:///home/matklad/projects/stdsimd/crates/coresimd)
   Compiling stdsimd v0.0.3 (file:///home/matklad/projects/stdsimd/crates/stdsimd)
    Finished dev [unoptimized + debuginfo] target(s) in 12.37 secs
cargo build --package stdsimd  12.57s user 0.30s system 102% cpu 12.537 total

$ rm -rf target/debug/incremental target/debug/deps/*simd* # clean everything except the local crate
$ time cargo test --no-run --lib --package stdsimd 
   Compiling coresimd v0.0.3 (file:///home/matklad/projects/stdsimd/crates/coresimd)
   Compiling stdsimd v0.0.3 (file:///home/matklad/projects/stdsimd/crates/stdsimd)
    Finished dev [unoptimized + debuginfo] target(s) in 12.90 secs
cargo test --no-run --lib --package stdsimd  13.07s user 0.38s system 102% cpu 13.078 total

Am I measuring the right thing?

EDIT: perhaps I should be checking coresimd and not stdsimd?

EDIT: I see it now, hehe 🙂

λ time cargo build --lib --package coresimd
   Compiling coresimd v0.0.3 (file:///home/matklad/projects/stdsimd/crates/coresimd)
    Finished dev [unoptimized + debuginfo] target(s) in 12.9 secs
cargo build --lib --package coresimd  12.23s user 0.29s system 102% cpu 12.252 total

λ time cargo test --no-run --lib --package coresimd
   Compiling coresimd v0.0.3 (file:///home/matklad/projects/stdsimd/crates/coresimd)
   Compiling simd-test-macro v0.1.0 (file:///home/matklad/projects/stdsimd/crates/simd-test-macro)
   Compiling stdsimd-test v0.1.0 (file:///home/matklad/projects/stdsimd/crates/stdsimd-test)
   Compiling stdsimd v0.0.3 (file:///home/matklad/projects/stdsimd/crates/stdsimd)
    Finished dev [unoptimized + debuginfo] target(s) in 208.79 secs
cargo test --no-run --lib --package coresimd  226.25s user 4.91s system 110% cpu 3:28.96 total


#5

Yeah, you weren’t building any tests. Use -p coresimd -p stdsimd --manifest-path=crates/stdsimd/Cargo.toml, or, as I suggested initially, just cd crates/coresimd and compare cargo build with cargo test in there.

The way the crate is organized is a bit… special, because it must work as a standalone crate as well as be able to compile inside core:: and std::.

EDIT: FWIW this was the second attempt: https://github.com/rust-lang-nursery/stdsimd/pull/376. You can check out the changes in ci/run.sh to get an idea of what it does.

EDIT2: 200s vs 10s sounds about right. It is basically the difference between “I can work productively” and “this is no fun at all”. But the problem is the one you mention: everything is in a single TU. Keeping the code together while splitting the tests into different TUs is doable (I did it in that PR), but not in a sane way.

Something like “test-only” features -> #[cfg(test = "foo")] would make that approach “cleaner”.


#6

Interesting! So it seems like another important factor here is macros. Specifically, a single macro generates both the tests and the implementation, which forces the tests to be in the same TU.

Just to throw an idea into the mix: perhaps it’s possible to use some sort of code generation that doesn’t have the single-TU constraint? For example, a Python script could find all simd_f_ty! invocations and, for each invocation, generate a corresponding simd_f_ty_test! invocation inside some tests/foo.rs file.
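A minimal sketch of that generator, written in Rust rather than Python to keep one language here. It assumes each simd_f_ty!(...) invocation fits on a single line and that simd_f_ty_test! accepts the same arguments; both are assumptions, and generate_tests is a hypothetical name. A real generator would need to handle multi-line invocations (e.g. via a proper parser).

```rust
/// Scans Rust source text and, for every single-line `simd_f_ty!(...)`
/// invocation, emits the matching `simd_f_ty_test!(...)` invocation.
/// The result could be written to a file such as tests/foo.rs.
fn generate_tests(src: &str) -> String {
    const IMPL_MACRO: &str = "simd_f_ty!(";
    const TEST_MACRO: &str = "simd_f_ty_test!(";

    let mut out = String::new();
    for line in src.lines() {
        // Ignore indentation; only lines that start with the macro count.
        if let Some(args) = line.trim_start().strip_prefix(IMPL_MACRO) {
            out.push_str(TEST_MACRO);
            out.push_str(args); // keeps the arguments and closing `);`
            out.push('\n');
        }
    }
    out
}
```

As the reply below points out, this keeps only the invocations together with the code; the test bodies themselves would live in the simd_f_ty_test! definition, away from the implementation.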


#7

Yes, this is the source of the issue. As mentioned in the OP:

“Internal” unit tests have, in some cases, many usability advantages over “external” unit tests. The main one for me personally is that they can be generated by macros and procedural macros, which can directly insert tests for the functionality being added that exercise the actual types. For example, impl_x! or #[derive(Foo)] can insert #[cfg(test)] tests automatically.

Keeping the tests and the code close to each other is a property that I’d like to maintain.

Generating the tests the way you propose would require either separating the tests from the implementation code (e.g. into a template that generates tests/*.rs files), in which case I might as well put them directly in tests/*.rs files, or parsing the Rust source code, removing the test code while compiling the crate, and adding that test code to some tests/*.rs file.

Basically, if we remove the constraint that the implementation code and the test code must be close to each other, solving this problem is trivial. The interesting issue is solving the problem without lifting the constraint.

I have a new solution that adds another layer of indirection: a macro that expands the test macros. The library crate defines those macros to expand to nothing, so it generates no tests. Then I add multiple crates that define those macros to expand the tests, so that the tests only get expanded there. Right now these other crates still need to recompile the original crate, but I am tweaking that to pull it in as a dependency and to use #[path = "…"] mod …; to include only the test modules. All of this is just really painful.
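A minimal single-file sketch of that indirection, under stated assumptions: the modules library and subset_a stand in for the separate crates (the real setup uses separate crates plus #[path]), and the names Double, impl_x! and x_tests! are hypothetical. It relies on the fact that a macro name emitted by a macro_rules! expansion is resolved where the outer macro is invoked, so each "crate" decides what x_tests! means.

```rust
pub trait Double {
    fn double(self) -> Self;
}

// Emits the impl, then calls `x_tests!`, which this macro does not define.
// Whichever `x_tests!` is in scope at the invocation site gets used.
macro_rules! impl_x {
    ($ty:ty, $name:ident) => {
        impl crate::Double for $ty {
            fn double(self) -> Self {
                self * 2
            }
        }
        x_tests!($ty, $name);
    };
}

// Stands in for the library crate: the test hook expands to nothing.
mod library {
    macro_rules! x_tests {
        ($ty:ty, $name:ident) => {};
    }
    impl_x!(u8, tests_u8);
}

// Stands in for one test-subset crate: the same hook expands to real tests.
// (Different types are used here only because a single file cannot implement
// the same trait for `u8` twice; separate crates would not have that issue.)
mod subset_a {
    macro_rules! x_tests {
        ($ty:ty, $name:ident) => {
            #[cfg(test)]
            mod $name {
                use crate::Double;

                #[test]
                fn doubles() {
                    assert_eq!((3 as $ty).double(), 6 as $ty);
                }
            }
        };
    }
    impl_x!(i64, tests_i64);
}
```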

There should be a way to keep tests close to the code, and still be able to generate different TUs for different test subsets.


#8

That sounds like a bug, actually.

Incremental compilation requires a bit more memory because of dependency tracking. It might also require more memory because it splits the program into more CGUs, but since each CGU is smaller and can be freed after being processed, memory consumption might actually decrease. It depends on the structure of the program.

I haven’t had time to look into this in more detail, but here are some thoughts:

  • with incremental compilation, or with a sufficiently high number passed via -Ccodegen-units, the compiler will put each mod into its own CGU, so making the test-generating macros wrap things in a mod might make a difference.
  • I don’t know where we are with custom test runners, but ideally, if some kind of test filter is applied, tests that are filtered out should already be removed at the AST level. The compiler would then take care of compiling only what is actually needed.
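Such a filter would need a selection rule that matches the OP's request (the tests of one submodule, or of a submodule and all its submodules). A minimal sketch of that predicate, assuming tests are identified by paths like simd::x86::tests::add; Scope and is_selected are hypothetical and not part of rustc or libtest. Unlike the plain substring match of cargo test PATTERN, it respects :: boundaries, so simd::x86 does not also select simd::x86_64.

```rust
/// How far a hypothetical module filter reaches.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Scope {
    /// Only tests defined directly in the named module.
    Module,
    /// Tests in the named module and in all of its submodules.
    Recursive,
}

/// Decides whether the test at `test_path` (e.g. "simd::x86::tests::add")
/// falls under `module` (e.g. "simd::x86") for the given scope.
fn is_selected(test_path: &str, module: &str, scope: Scope) -> bool {
    // Drop the last path segment, which is the test function's name.
    let test_module = test_path.rsplit_once("::").map_or("", |(m, _)| m);
    match scope {
        Scope::Module => test_module == module,
        Scope::Recursive => {
            // Require a `::` boundary after the prefix.
            let in_submodule = test_module
                .strip_prefix(module)
                .map_or(false, |rest| rest.starts_with("::"));
            test_module == module || in_submodule
        }
    }
}
```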