Parallel codegen has landed, and I’d like to enable it by default for certain build configurations.
Summary of proposal
- Set codegen-units=2 by default when using rustc with --opt-level=0 or --opt-level=1.
- Set codegen-units=1 (the current default) when using rustc with --opt-level=2 (a.k.a. -O) or --opt-level=3.
- When bootstrapping rustc for a development build, use codegen-units=4 for all stages.
- When bootstrapping rustc for a release build, use codegen-units=4 for stage0 and codegen-units=1 for stage1 and stage2.
- Use codegen-units=4 for crate unit tests, and codegen-units=1 for other types of tests.
General info
Building with parallel codegen enabled (-C codegen-units=2 or higher) usually results in faster compile times but slightly worse performance in the generated code. Parallel codegen speeds up the build by splitting the crate into several parts and running some LLVM passes on those parts in parallel. However, dividing up the crate this way prevents some inlining and other optimizations across the boundaries between parts. Both effects become more significant (relative to codegen-units=1) at higher optimization levels. More time is spent in LLVM, so there is more benefit to parallelizing that work. There are also more optimizations that LLVM would ordinarily perform but can’t because of the compilation-unit boundaries introduced by parallel codegen. Finally, parallel codegen introduces overhead that grows with the number of codegen units, and in some cases this makes the overall build slower. This happens mainly for small crates built with --opt-level=0 and codegen-units >= 4.
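The following toy model in Rust illustrates why splitting a crate costs inlining. It assumes that functions are numbered and assigned to codegen units round-robin. That is a deliberate simplification and not how rustc actually partitions a crate. The names `unit_of` and `cross_unit_calls` are hypothetical. The model counts call edges whose caller and callee land in different units. Each unit is optimized separately, so in this model those calls cannot be inlined.

```rust
/// Toy partitioning (an assumption, not rustc's real algorithm):
/// function `func` goes to unit `func % codegen_units`.
/// `codegen_units` must be at least 1.
fn unit_of(func: usize, codegen_units: usize) -> usize {
    func % codegen_units
}

/// Counts (caller, callee) edges that cross a codegen-unit boundary.
/// In this model, each such call is an inlining opportunity that
/// LLVM can no longer take, because it optimizes each unit on its own.
fn cross_unit_calls(calls: &[(usize, usize)], codegen_units: usize) -> usize {
    calls
        .iter()
        .filter(|&&(caller, callee)| {
            unit_of(caller, codegen_units) != unit_of(callee, codegen_units)
        })
        .count()
}
```

With one unit no call crosses a boundary. As the unit count grows, more calls do, which mirrors the trend described above.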
User code
When rustc is compiling user code, it should use two codegen units for low optimization levels (0 and 1), and one codegen unit for high optimization levels (2 and 3). This gives a 10-20% reduction in build time at level 0, and up to 30% at level 1.
At low optimization levels, parallel codegen has a less noticeable effect on the performance of the compiled code. The slowdown comes mostly from lost inlining opportunities, and at low optimization levels LLVM does less inlining regardless. In particular, at level 0 inlining is almost completely disabled, so the slowdown from parallel codegen should be negligible. Conversely, the compiler should avoid parallel codegen at high optimization levels. By choosing level 2 or 3, the user has clearly expressed a preference for fast generated code over fast compile times.
The compiler should default to two codegen units rather than more, because higher settings noticeably slow down compilation of small crates at low optimization levels. In particular, I have observed compile times increase by 10% with four codegen units. With two codegen units the worst increase is 3%, and such slowdowns occur only on crates that take less than a second to build anyway.
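The proposed default for user code fits in a small match. This is a minimal Rust sketch, not actual rustc code, and both function names are hypothetical. It also assumes that an explicit -C codegen-units value on the command line always takes precedence over the default. The proposal implies this but does not state it.

```rust
/// Hypothetical helper: the proposed default number of codegen units
/// for a given --opt-level when the user has not set one explicitly.
fn default_codegen_units(opt_level: u8) -> usize {
    match opt_level {
        // Levels 0 and 1: little inlining is lost, so use two units.
        0 | 1 => 2,
        // Levels 2 (-O) and 3: the user prefers fast generated code.
        _ => 1,
    }
}

/// Assumption: an explicit `-C codegen-units=N` overrides the default.
fn effective_codegen_units(opt_level: u8, explicit: Option<usize>) -> usize {
    explicit.unwrap_or_else(|| default_codegen_units(opt_level))
}
```

For example, `effective_codegen_units(0, None)` yields 2, while `effective_codegen_units(3, Some(4))` respects the user's explicit choice of 4.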
rustc and tests
The build system for Rust itself should use four codegen units for all stages of development builds, and for crate unit tests. The build system should use one codegen unit for stage1 and stage2 of release builds, and for other types of tests.
Building with four codegen units reduces the overall bootstrapping time by 35%. However, the Rust components of the resulting compiler will take about 25% longer to run. Much of the time spent compiling is still in LLVM code, so the actual effect on compile times with that compiler will be a 5-20% increase, depending on the code being compiled and the optimization level.
Release builds should use parallel codegen only when building stage0. The compiler built during stage1 and the libraries built during stage2 are distributed as part of the release, so they should be as highly optimized as possible. Using parallel codegen for stage0 still gives a roughly 10% reduction in bootstrapping time.
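The bootstrap policy in the two paragraphs above can be written as a lookup on build type and stage. The following is a minimal Rust sketch. The real build system is Makefile-based, so the `BuildType` enum and `bootstrap_codegen_units` function are hypothetical illustrations of the rule, not its implementation.

```rust
/// Hypothetical representation of the ./configure "build type" setting.
#[derive(Clone, Copy, Debug, PartialEq)]
enum BuildType {
    Development,
    Release,
}

/// Codegen units to use when building a given bootstrap stage.
fn bootstrap_codegen_units(build_type: BuildType, stage: u8) -> usize {
    match (build_type, stage) {
        // Development builds favor fast bootstrapping at every stage.
        (BuildType::Development, _) => 4,
        // Release stage0 is not shipped, so speed it up as well.
        (BuildType::Release, 0) => 4,
        // Release stage1/stage2 output is distributed: optimize fully.
        (BuildType::Release, _) => 1,
    }
}
```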
Parallel codegen should be enabled for crate unit tests, but not for other types of tests such as run-pass and run-make. Crate tests spend most of their time compiling the crate with --test, so they benefit the most from parallel codegen. compiletest-based tests already run in parallel, so there is nothing to gain there. run-make tests (and also compiletest-based tests) usually consist of single-module crates, which gain no benefit from parallel codegen. Enabling parallel codegen for crate tests alone provides a 25% reduction in the time taken by make check, with most of the benefit coming from shorter libsyntax and librustc build times.
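The test policy is a two-way split, shown here as a Rust sketch. The `TestKind` variants are hypothetical labels for the test categories named in the text, and `Other` stands in for any remaining suite.

```rust
/// Hypothetical classification of the test suites run by make check.
#[derive(Clone, Copy, Debug, PartialEq)]
enum TestKind {
    CrateUnit, // a crate compiled with --test: large, benefits most
    RunPass,   // compiletest-based: already run in parallel
    RunMake,   // usually single-module crates
    Other,
}

fn test_codegen_units(kind: TestKind) -> usize {
    match kind {
        TestKind::CrateUnit => 4,
        // No benefit elsewhere, so keep a single codegen unit.
        _ => 1,
    }
}
```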
The “build type” setting (development vs. release) will be set with a ./configure flag. The default will be “development”, on the theory that people building from source are more likely to be doing compiler development than building a copy of rustc to install permanently. (People who do want to install rustc would more likely use an official release instead of building from source.) Alternatively, the build system could use the new “release channel” flag to choose between development mode (--release-channel=source) and release mode (any other setting).
The ./configure script will also need a separate flag to set the number of codegen units to a custom value (including 1, to disable parallel codegen). Users who want more complicated behavior can get it by overriding Makefile variables (such as RUSTFLAGS).
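Here is a rough Rust sketch of how the configure options above might be interpreted. It follows the release-channel alternative: --release-channel=source means development, and any other channel means release. The default is development when no channel is given. The custom-count flag's name is not given in the text, so `--codegen-units=N` below is a hypothetical placeholder. `parse_configure_args` and `ConfigureOptions` are likewise illustrative names.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum BuildType {
    Development,
    Release,
}

#[derive(Debug, PartialEq)]
struct ConfigureOptions {
    build_type: BuildType,
    // None means "use the per-stage default"; Some(1) disables parallel codegen.
    codegen_units: Option<usize>,
}

fn parse_configure_args(args: &[&str]) -> Result<ConfigureOptions, String> {
    // Default to a development build, as proposed.
    let mut opts = ConfigureOptions { build_type: BuildType::Development, codegen_units: None };
    for arg in args {
        if let Some(channel) = arg.strip_prefix("--release-channel=") {
            opts.build_type = if channel == "source" {
                BuildType::Development
            } else {
                BuildType::Release
            };
        } else if let Some(value) = arg.strip_prefix("--codegen-units=") {
            // Hypothetical flag name; reject non-numbers and zero.
            let n: usize = value
                .parse()
                .map_err(|_| format!("invalid --codegen-units value: {value}"))?;
            if n == 0 {
                return Err("--codegen-units must be at least 1".to_string());
            }
            opts.codegen_units = Some(n);
        }
        // Other arguments are ignored in this sketch.
    }
    Ok(opts)
}
```

Keeping the custom count as an `Option` lets the build system fall back to the per-stage defaults unless the user asked for something specific.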
Note: Currently, codegen-units >= 2 produces libraries that can’t be used when linking with LTO. This means that building stage2 libraries with codegen-units=4 will break the LTO tests. I think this limitation will not be too hard to fix, and I plan to do so in the next few days.
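Until the LTO limitation is lifted, a build script could reject such inputs up front instead of failing later in the link. The following Rust sketch shows a hypothetical pre-link check of that kind. The function name and the idea of tracking each library's codegen-unit count are assumptions for illustration only.

```rust
/// Hypothetical pre-link check: each entry is (library name, codegen units
/// it was built with). With the current limitation, LTO only works when
/// every input library was built with a single codegen unit.
fn check_lto_inputs(libs: &[(&str, usize)]) -> Result<(), String> {
    for &(name, units) in libs {
        if units >= 2 {
            return Err(format!(
                "{name} was built with codegen-units={units}; LTO currently requires codegen-units=1"
            ));
        }
    }
    Ok(())
}
```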
Build automation
I’m not sure if the buildbot hardware will handle multithreaded builds well. If not, the builders may need to override the default behavior described above to get good performance. (If it does work well, build times could be reduced by up to 25%.)