TL;DR - The nightly compiler can do incremental ThinLTO now and we’d like to know what the benefits are for (1) compilation time and (2) run-time performance of artifacts generated this way.
For a few weeks now, the nightly compiler has supported combining incremental re-compilation and ThinLTO. This should be good news for anyone who regularly re-compiles things that need good runtime performance even for testing, like benchmarks and soft real-time applications.
Incremental compilation has always supported release builds, but the resulting binaries were often 2-3 times slower than their non-incrementally built counterparts. The reason is that incremental compilation splits crates into many small compilation units that are optimized in isolation, so LLVM misses many inter-procedural optimizations that it can do for a regular build. ThinLTO was invented to alleviate exactly this problem: after each compilation unit is optimized, ThinLTO does an analysis pass that takes note of what could be inlined into other compilation units. Using this information, it then does another pass, doing optimizations across compilation unit boundaries.
This is similar to what we already do for regular release builds, since ThinLTO is the default for those. However, incremental compilation partitions the crate in a more fine-grained way, and we'd like to know if and how much this affects runtime performance (we hope not much) and how much compilation time we can save by compiling incrementally. We have some idea about the latter from perf.rust-lang.org (compilation is 2-5 times as fast for small changes, depending on the project) but know very little about the former. This is where you come in:
How can I help?
It would be great to have two data points for as many real-world projects as possible:
- How does incremental compilation affect compile times for release builds?
- How do incrementally compiled programs perform at runtime, compared to non-incrementally built ones?
For that, you need a project that either contains benchmarks or has some other way of measuring runtime performance. If your project uses `cargo bench`, you can do something like the following:
```sh
# Make sure we have the latest nightly
rustup update nightly

# Make sure all dependencies are downloaded and
# we start with a fresh `target` directory
cargo +nightly build && cargo +nightly clean

# Get the non-incremental baseline, collect the values in baseline.txt
# Note the build time here, it should be displayed as something like:
#    Finished release [optimized] target(s) in 5.84s
CARGO_INCREMENTAL=0 cargo +nightly bench | tee ./baseline.txt

# Make a small change in a program somewhere, something you'd likely do in
# between two benchmark runs.
#
# USER INTERACTION REQUIRED HERE

# Build again non-incrementally, in order to see how long re-compiling takes.
# You don't have to wait for the benchmarks to finish.
CARGO_INCREMENTAL=0 cargo +nightly bench

# Clear the target directory for good measure
cargo +nightly clean

# Build incrementally now and run the benchmarks. The build time, again, should
# show up as something like:
#    Finished release [optimized] target(s) in 6.04s
CARGO_INCREMENTAL=1 cargo +nightly bench | tee ./incremental.txt

# Again, make a small change in a program somewhere, something you'd likely do in
# between two benchmark runs.
#
# USER INTERACTION REQUIRED HERE

# Now run `cargo bench` again and take note of the build time.
CARGO_INCREMENTAL=1 cargo +nightly bench

# If you don't have it yet, install cargo-benchcmp
cargo install cargo-benchcmp

# Print a comparison of the benchmark results
cargo benchcmp ./baseline.txt ./incremental.txt
```
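Cargo prints the build time on stderr, so it does not end up in `baseline.txt` or `incremental.txt`. If you'd rather not copy the numbers by hand, here is a minimal POSIX shell sketch that turns the `Finished ... in 5.84s` lines into a single speedup figure. The function names `build_secs` and `speedup` are made up for this example, and `build_secs` assumes Cargo reports the duration in plain seconds; builds longer than a minute are printed in a form like `1m 05s`, which it simply skips.

```sh
# Hypothetical helpers, not part of Cargo or rustup.

# Read Cargo output on stdin and print the duration from its
# "Finished ... in 5.84s" line. Assumes the plain-seconds format;
# lines like "... in 1m 05s" produce no output.
build_secs() {
    sed -n 's/.*Finished.* in \([0-9][0-9]*\.[0-9]*\)s$/\1/p'
}

# Print how many times faster the second build was than the first:
#   speedup NON_INCREMENTAL_SECS INCREMENTAL_SECS
speedup() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", base / new }'
}
```

For example, after making your small change, `incr=$(CARGO_INCREMENTAL=1 cargo +nightly bench --no-run 2>&1 | build_secs)` captures the rebuild time without running the benchmarks. Do the same with `CARGO_INCREMENTAL=0` to get `nonincr`; `speedup "$nonincr" "$incr"` then prints a value above 1 if the incremental rebuild was faster.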
If your project does not use `cargo bench`, you'll have to adapt the above as needed. The most interesting questions are: Is building with `CARGO_INCREMENTAL=1` faster than with `CARGO_INCREMENTAL=0`, and does the resulting program have roughly the same performance in both modes?
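If your project measures its own runtime, for example by timing a release binary on a fixed input, one rough stand-in for `cargo benchcmp` is to record one `name seconds` pair per line for each build mode and compare the two files. The sketch below assumes exactly that whitespace-separated format; the function name `compare_timings` and the file layout are hypothetical. It prints the incremental/non-incremental ratio for every name present in both files.

```sh
# Hypothetical helper: compare two files of "name seconds" lines.
#   $1 = timings from the CARGO_INCREMENTAL=0 build (baseline)
#   $2 = timings from the CARGO_INCREMENTAL=1 build
# Prints "name ratio"; a ratio above 1.00 means the incrementally
# built program was slower. Names missing from the baseline are skipped.
compare_timings() {
    awk '
        NR == FNR { base[$1] = $2; next }   # first file: remember baseline times
        ($1 in base) && base[$1] > 0 {      # second file: compare against baseline
            printf "%s %.2f\n", $1, $2 / base[$1]
        }
    ' "$1" "$2"
}
```

Usage would look like `compare_timings ./baseline-times.txt ./incremental-times.txt`, where both file names are placeholders for wherever your own harness writes its numbers.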
Looking forward to seeing your results!