Incremental Compilation Beta

michaelwoerister · February 3, 2017, 2:34pm

Over the last few weeks incremental compilation has reached a level of stability and performance where we think it is ready for more widespread testing. We invite everyone who is already using the nightly version of Rust to switch the feature on and enjoy the sometimes substantially reduced compile times. If you are using Cargo, you can opt into incremental compilation by setting the CARGO_INCREMENTAL environment variable:

CARGO_INCREMENTAL=1 cargo <command>

Cargo will take care of choosing and allocating an incremental cache directory. If you are invoking the compiler directly, you need to specify the cache directory explicitly:

rustc -Zincremental=<path> <other arguments>

The compiler will create a directory at the given path and store any intermediate artifacts there. You can use the same directory for all your projects but, in order to avoid thrashing, we recommend keeping separate caches for debug and release builds, for example /tmp/rustc-incr-debug and /tmp/rustc-incr-release. Note that this is something you don't have to worry about if you are going through Cargo.
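A minimal sketch of the "one cache per profile" recommendation, written as a small Rust helper that assembles the rustc arguments without running anything. The `Profile` enum and the helper names are hypothetical; the `-Zincremental=<path>` flag and the two /tmp paths are taken from the text above, and `-g` / `-O` are the usual rustc switches for debuginfo and optimization.

```rust
use std::path::PathBuf;

/// Hypothetical build profile, used only to pick a cache directory.
#[derive(Clone, Copy)]
enum Profile {
    Debug,
    Release,
}

/// Separate cache directories so debug and release builds do not thrash each other.
fn cache_dir(profile: Profile) -> PathBuf {
    match profile {
        Profile::Debug => PathBuf::from("/tmp/rustc-incr-debug"),
        Profile::Release => PathBuf::from("/tmp/rustc-incr-release"),
    }
}

/// Assemble the arguments for a direct rustc invocation (nothing is executed here).
fn rustc_args(profile: Profile, source: &str) -> Vec<String> {
    let mut args = vec![format!("-Zincremental={}", cache_dir(profile).display())];
    match profile {
        Profile::Debug => args.push("-g".to_string()),
        Profile::Release => args.push("-O".to_string()),
    }
    args.push(source.to_string());
    args
}

fn main() {
    for profile in [Profile::Debug, Profile::Release] {
        println!("rustc {}", rustc_args(profile, "src/main.rs").join(" "));
    }
}
```

The same effect can of course be had with two shell aliases; the point is only that the cache path is derived from the profile and never shared between the two.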

How much of a compile time reduction can you expect? The next section should give you an idea.

Performance

Let's take a look at how the Beta version of incremental compilation compares to the Alpha version from last September. We'll be using regex 0.2.1 as our test crate. The charts below show how long a recompile takes after various changes have been made in the source code. The first chart shows timings for debug builds (unoptimized + debuginfo):

Regex update compile times (debug)

(All timings done on a quad-core CPU without hyper-threading)

As you can see, incremental compilation has gotten faster across the board and in some cases -- like when compile::Compiler::new() is changed -- the more fine-grained dependency tracking of the Beta version really pays off. You can also see that an incremental build takes only 35-50% of the time a regular build would. Next up, timings for release builds (optimized + no debuginfo):

Regex update compile times (optimized)

Release builds follow the same trend: the Beta is always faster than the Alpha. This also goes to show that incremental compilation interacts very well with optimization. In most cases you will even save more time because the artifacts that are being re-used are that much more expensive. Recompiling in release mode will often be five times as fast with incremental compilation.

Does this mean that incremental compilation is always a win? Let's take a look at the worst case scenario for incrementality: when there is nothing cached yet. The following charts show compile times for the initial build of various crates, starting again with debug builds:

Compile times for initial build (debug)

As was already the case for the Alpha version, the initial incremental build can be slightly slower than a regular build (syntax and futures). This is because dependency tracking is not entirely free. Incremental builds can also be faster (as in the regex case here) because of the smaller amount of inlining and increased parallelism in this mode.

For release builds the picture looks different again:

Compile times for initial build (optimized)

For codegen-heavy crates such as regex and syntax the inlining and parallelism effects are much more pronounced and the incremental build is much faster than a regular build in these cases. (Note that you can get the same effect in a regular build by setting the number of codegen units to roughly the number of modules in your crate, e.g. with rustc -Ccodegen-units=140). For the futures crate, not much has changed between debug and release builds since this is a code base that mostly consists of generic definitions for which no machine code is generated. However, these kinds of crates are usually the ones that already compile comparatively quickly.

In summary: there is sometimes a small up-front price to pay for incrementality, but that's an investment that should pay for itself the first time you recompile after a change.

What's new since the Alpha release?

Development since September has focused on two major areas:

Stability and Testing - We have invested a lot of energy into building up a strong suite of regression tests for incremental compilation. This includes a ton of hand-written tests (with the very much appreciated help of @oldmanmike, @MathieuBordere, @eulerdisk, and @flodiebold), as well as the rust-icci effort which constitutes a suite of continuously tested code bases, for each of which we incrementally compile substantial amounts of the project's git history. With every new nightly version of Rust we re-compile thousands of change sets and at each step we check, bit for bit, that the LLVM IR and object code generated by incremental compilation is the same as the code generated by from-scratch builds. It's not a fool-proof method, but in addition to the handwritten tests, it gives us confidence that there are no gaping holes in our dependency tracking.
Improved dependency tracking granularity - There have also been some very important changes to the compiler's internals that allow for much more accurate dependency tracking:
- #37660 made sure that changing one method of an impl does not result in all methods of that impl being invalidated.
- #37918 allows function signatures to be tracked independently from function bodies, which means that call-sites of a function no longer get invalidated when only the body of that function changes. This has been implemented in a heroic effort by @flodiebold!
- #38944 changes how and where machine code is generated for generic functions, allowing for much more re-use in code bases that invoke lots of generic code across different modules.
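To make the signature/body split from #37918 more concrete, here is a toy model. It is an illustration only and says nothing about the compiler's actual data structures: the `Function` struct, `fingerprint`, and `example_program` are all hypothetical names. Each function's fingerprint covers its own signature and body plus the signatures (but not the bodies) of the functions it calls, so editing a callee's body leaves the caller's fingerprint unchanged.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hash any hashable value with the standard library's default hasher.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Toy stand-in for a function item: source text of signature and body,
/// plus the names of the functions it calls.
struct Function {
    signature: &'static str,
    body: &'static str,
    callees: Vec<&'static str>,
}

/// Fingerprint of everything a function's cached output depends on in this model:
/// its own signature and body, and only the *signatures* of its callees.
fn fingerprint(name: &str, program: &HashMap<&'static str, Function>) -> u64 {
    let function = &program[name];
    let callee_signatures: Vec<&str> = function
        .callees
        .iter()
        .map(|callee| program[*callee].signature)
        .collect();
    hash_of(&(function.signature, function.body, callee_signatures))
}

/// Two-function program where `caller` calls `callee`.
fn example_program(
    callee_signature: &'static str,
    callee_body: &'static str,
) -> HashMap<&'static str, Function> {
    let mut program = HashMap::new();
    program.insert(
        "callee",
        Function { signature: callee_signature, body: callee_body, callees: vec![] },
    );
    program.insert(
        "caller",
        Function { signature: "fn caller()", body: "callee(1);", callees: vec!["callee"] },
    );
    program
}

fn main() {
    let before = example_program("fn callee(x: u32)", "println!(\"{}\", x);");
    let body_edit = example_program("fn callee(x: u32)", "println!(\"x = {}\", x);");
    let sig_edit = example_program("fn callee(x: u64)", "println!(\"{}\", x);");

    // A body-only edit of the callee leaves the caller reusable.
    let reusable = fingerprint("caller", &before) == fingerprint("caller", &body_edit);
    println!("caller reusable after body edit: {}", reusable);

    // A signature edit of the callee invalidates the caller as well.
    let reusable = fingerprint("caller", &before) == fingerprint("caller", &sig_edit);
    println!("caller reusable after signature edit: {}", reusable);
}
```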

Still Room for Improvement

Incremental compilation is not implemented to its full potential yet. Here are some areas that we will work on going forward:

We still only cache object files and LLVM IR, so performance gains are mostly to be expected for crates where the compiler spends a lot of time generating those. Libraries that mostly contain generic definitions won't profit from incremental compilation yet. (Note though that users of those generic definitions will profit). Progress on cache expansion is tracked on the type-checking milestone.
We have temporarily turned off fine-grained cross-crate dependency tracking, since the current implementation can cause quadratic compile time blow up for some crates. Multi-crate projects still work without any problems, incremental compilation will just be a bit less effective when changes are made to non-leaf crates. Progress on this front is tracked on the cross-crate milestone.
Runtime performance of incrementally built code is expected to be worse than that of regularly built code. We therefore don't recommend building release binaries in incremental mode.

Incremental Compilation Rollout Plan

The current release marks a version where we invite those who are already using nightly Rust to enable incremental compilation for their non-production builds and enjoy the shorter compile times. There are still a few steps to go before we make the feature available on the stable channel:

Right now, we are in the "opt-in" phase. Enable the feature as described at the beginning of this post.
Next, when we feel that we have reached all correctness and performance goals, we will enable incremental builds by default for dev builds on nightly. That is, when you run cargo build (without the --release) you'll get incremental compilation unless you explicitly opt out.
When default-on-nightly has shown to be reliable, we'll make incremental compilation available in the stable compiler.

Until then...

Want to make incremental compilation better?

If you are interested in lending a hand, there are quite a few things you can do without investing tons of time:

Use incremental compilation and tell us about any crashes or bugs you run into. This will combine business with pleasure, since incremental compilation should almost always be a win for your compile times during development, and often a substantial one.
If you do any benchmarking in your projects, we'd be very interested in seeing how much of an impact incremental compilation has on the runtime performance of your code. Note that incremental compilation works just as well for optimized code as it does for debug builds. The compile time reductions will even be greater for the former.
If you know of a project that would be a good fit for our continuous testing, tell us about it. We are looking for realistic, open-source code bases that don't take too long to compile and that have an interesting mix of changes in their git history (i.e. small, uncommon changes that are likely to uncover holes in our dependency tracking).
If you would like to develop a fuzzing tool that generates and evolves a bunch of Rust code and tests it against incremental compilation, you'd be sure to have our attention and support. (This would also make a great GSoC project)

You can reach us here in the Rust Internals Forum, on the #rustc IRC channel, or by opening an issue on Github.

Nashenas88 · February 3, 2017, 9:42pm

Does the compiler build system (x.py) support this yet? I mainly use an 8yo macbook pro, and this would be SO useful (currently ~40 minutes for rebuilds after a git pull/rebase).

SimonSapin · February 3, 2017, 10:41pm

I think I’ve seen --incremental in ./x.py --help, but I’m on mobile atm so you’ll have to check yourself

SimonSapin · February 3, 2017, 10:44pm

Use incremental compilation and tell us about any crashes or bugs you run into.

Properly trying incremental compilation in Servo is blocked on rust-lang/rust issue #39208 ("serialize dep graph" with CARGO_INCREMENTAL=1 takes 26 minutes and 18 GB RSS).

brson · February 3, 2017, 11:05pm

Great writeup. Thanks @michaelwoerister.

sgrif · February 3, 2017, 11:07pm

No but seriously, Diesel would probably be a good candidate here if it's not already. We tend to exercise parts of the compiler that not a ton of crates do, especially WRT performance.

nikomatsakis · February 4, 2017, 12:05am

You can do ./x.py --incremental – though the compiler has its own internal complications that limit its effectiveness. (In particular, the cross-crate tracking would be helpful here.)

michaelwoerister · February 4, 2017, 12:49am

I’ll look into it!

michaelwoerister · February 4, 2017, 12:52am

This issue should be gone in the most recent nightly. So whenever Servo updates to that, you should see some gains.

GolDDranks · February 4, 2017, 4:39am

I’m running into a bug: tried to build the release version of Tokei with it, but got a bunch of linker errors: https://github.com/Aaronepower/tokei/issues/102

Without the incremental flag it builds successfully.

SimonSapin · February 4, 2017, 11:05am

Indeed!

My benchmark is: build Servo, run touch components/style/lib.rs (so nothing other than a timestamp actually changes), then build again. With -C codegen-units=4 on a fast machine, this second build takes 147 seconds. With -C codegen-units=4 and CARGO_INCREMENTAL=1 (for both builds) on the same machine, the second build takes 108 seconds.

So incremental compilation cuts 26% of the time. On one hand this is significant, on the other hand I kind of hoped for more since no source code has actually changed.

michaelwoerister · February 4, 2017, 2:11pm

Thanks for the report! I opened an issue on Github: https://github.com/rust-lang/rust/issues/39534

michaelwoerister · February 4, 2017, 2:21pm

This sounds like you are not getting any re-use, just the speed-up from the increased number of codegen units. If you specify -Zincremental-info via RUSTFLAGS, can you look for lines like incremental: re-using 0 out of 1 modules?
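For scanning a long build log for those lines, a small sketch follows. It assumes the line has exactly the shape quoted above (`incremental: re-using N out of M modules`); real output may carry extra text, and `parse_reuse` is a hypothetical helper name.

```rust
/// Parse a line of the form `incremental: re-using N out of M modules`
/// into `(reused, total)`. Returns `None` for any other line.
fn parse_reuse(line: &str) -> Option<(u32, u32)> {
    let rest = line.trim().strip_prefix("incremental: re-using ")?;
    let (reused, rest) = rest.split_once(" out of ")?;
    let total = rest.strip_suffix(" modules")?;
    Some((reused.parse().ok()?, total.parse().ok()?))
}

fn main() {
    // Sample log: one reuse line as quoted in the post, one unrelated line.
    let log = "incremental: re-using 0 out of 1 modules\nsome other compiler output";
    for line in log.lines() {
        if let Some((reused, total)) = parse_reuse(line) {
            println!("{} of {} modules re-used", reused, total);
        }
    }
}
```

A crate that reports zero re-used modules on a rebuild is getting no benefit from the cache, which is the case being asked about here.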

aochagavia · February 4, 2017, 5:15pm

I would be interested in the fuzzing tool, but don’t have enough time to do a full GSOC project. Is there anyone who would like to mentor?

SimonSapin · February 4, 2017, 6:05pm

I’m skeptical this is just codegen units, since the non-incremental result was with -C codegen-units=4 on a 4-core (8-thread) machine. With -C codegen-units=200 non-incremental, I get 161 seconds, slower than 147 seconds for 4 units.

Back to incremental builds, -Zincremental-info says everything is reused: incremental: re-using 1252 out of 1252 modules. Here is the output with RUSTFLAGS="-Zincremental-info -Ztime-passes" for the largest crate (script) https://gist.github.com/anonymous/996ddd7dae5da732c02e517f368a08bf Extract:

time: 3.367; rss: 830MB	expansion
time: 2.508; rss: 1086MB	compute_incremental_hashes_map
time: 16.472; rss: 1607MB	load_dep_graph
time: 7.488; rss: 2957MB	item-bodies checking
time: 7.160; rss: 4005MB	translation
time: 17.666; rss: 3945MB	serialize dep graph
time: 9.442; rss: 2305MB	linking
    Finished dev [unoptimized + debuginfo] target(s) in 79.13 secs
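To rank the passes in an extract like the one above, a sketch of a small parser may help. It assumes each line looks like `time: <seconds>; rss: <n>MB<whitespace><pass name>`, as in the extract; the `PassTiming` struct and both function names are hypothetical, and lines that do not match (such as the `Finished` line) are skipped.

```rust
/// One parsed `-Ztime-passes` line, e.g. `time: 16.472; rss: 1607MB	load_dep_graph`.
#[derive(Debug, Clone, PartialEq)]
struct PassTiming {
    seconds: f64,
    rss_mb: u64,
    name: String,
}

/// Parse a single line; returns `None` if it does not have the expected shape.
fn parse_line(line: &str) -> Option<PassTiming> {
    let rest = line.trim().strip_prefix("time:")?;
    let (seconds, rest) = rest.split_once(';')?;
    let rest = rest.trim_start().strip_prefix("rss:")?;
    // Pass names may contain spaces ("serialize dep graph"), so split on the unit.
    let (rss, name) = rest.split_once("MB")?;
    Some(PassTiming {
        seconds: seconds.trim().parse().ok()?,
        rss_mb: rss.trim().parse().ok()?,
        name: name.trim().to_string(),
    })
}

/// Parse every matching line and sort the passes from slowest to fastest.
fn slowest_first(output: &str) -> Vec<PassTiming> {
    let mut passes: Vec<PassTiming> = output.lines().filter_map(parse_line).collect();
    passes.sort_by(|a, b| b.seconds.total_cmp(&a.seconds));
    passes
}

fn main() {
    let extract = [
        "time: 3.367; rss: 830MB\texpansion",
        "time: 2.508; rss: 1086MB\tcompute_incremental_hashes_map",
        "time: 16.472; rss: 1607MB\tload_dep_graph",
        "time: 7.488; rss: 2957MB\titem-bodies checking",
        "time: 7.160; rss: 4005MB\ttranslation",
        "time: 17.666; rss: 3945MB\tserialize dep graph",
        "time: 9.442; rss: 2305MB\tlinking",
        "    Finished dev [unoptimized + debuginfo] target(s) in 79.13 secs",
    ]
    .join("\n");

    for pass in slowest_first(&extract) {
        println!("{:>8.3}s {:>6}MB  {}", pass.seconds, pass.rss_mb, pass.name);
    }
}
```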

alexcrichton · February 4, 2017, 7:52pm

@michaelwoerister I’ve tested this out on Cargo and unfortunately get an ICE when compiling tests (but normal builds work great!). Would it be useful to open an issue for this and post details, or are we still at early enough stages that such large projects end up being too large to reduce in a reasonable amount of time?

nikomatsakis · February 4, 2017, 9:17pm

The enormous time spent in load_dep_graph and serialize_dep_graph is certainly a problem. We have a pending PR that will help, but I’d like to dig more into it.

UPDATE: it landed, actually.

SimonSapin · February 5, 2017, 1:32pm

This helps. I built Rust master locally since that change is not in Nightly yet. Rebuilding the script crate with incremental compilation after touch components/script/lib.rs now takes 52 seconds instead of 79. Output like before: https://gist.github.com/anonymous/a6baf2e4794acfc7961a3a4bf51464e1 and extract (2+ seconds lines):

   Compiling script v0.0.1 (file:///home/simon/servo/components/script)
time: 2.681; rss: 797MB	expansion
time: 2.314; rss: 1102MB	compute_incremental_hashes_map
time: 2.014; rss: 1153MB	load_dep_graph
time: 7.397; rss: 1942MB	item-bodies checking
time: 7.088; rss: 3274MB	translation
time: 8.728; rss: 3140MB	serialize dep graph
time: 8.303; rss: 1662MB	linking
    Finished dev [unoptimized + debuginfo] target(s) in 52.24 secs

Rebuilding all of Servo with incremental compilation after touch components/style/lib.rs now takes 81 seconds instead of 108 seconds.

Edit: “instead” here means compared to rustc from a few days ago, not compared to non-incremental compilation.

Edit 2: my second benchmark is therefore: 81 seconds with incremental compilation vs 117 without, which is ~30% less time.
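For reference, the percentages quoted in this thread follow from a plain relative-reduction formula, (before - after) / before. A short sketch, using only the timings reported above (`reduction_percent` is a hypothetical helper name):

```rust
/// Relative reduction in build time, in percent.
fn reduction_percent(before_secs: f64, after_secs: f64) -> f64 {
    (before_secs - after_secs) / before_secs * 100.0
}

fn main() {
    // (label, seconds before, seconds after), all taken from the posts above.
    let measurements = [
        ("Servo, first benchmark (non-incremental vs incremental)", 147.0, 108.0),
        ("Servo, second benchmark (non-incremental vs incremental)", 117.0, 81.0),
        ("script crate (older rustc vs rustc master, both incremental)", 79.13, 52.24),
    ];
    for (label, before, after) in measurements {
        println!("{}: {:.1}% less time", label, reduction_percent(before, after));
    }
}
```

Note that the third row compares two compiler versions rather than incremental against non-incremental, so it measures the effect of the dep-graph fix alone.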

eulerdisk · February 5, 2017, 9:24pm

That’s interesting. I wonder if, in general, recompiling after a very small change (for a project with a big dependency graph to [de]serialize) will still be relatively slow because of the cost of managing all the incremental state. Is it an absolute limiting factor, or is it only a problem with the current implementation?

michaelwoerister · February 6, 2017, 8:31am

@alexcrichton Yes, please open an issue. We do some testing of compiling and running unit tests but not a lot. We should change that!
