Default settings for parallel codegen


#1

Parallel codegen has landed, and I’d like to enable it by default for certain build configurations.

Summary of proposal

  • Set codegen-units=2 by default when using rustc with --opt-level=0 or --opt-level=1.
  • Set codegen-units=1 (the current default) when using rustc with --opt-level=2 (a.k.a. -O) or --opt-level=3.
  • When bootstrapping rustc for a development build, use codegen-units=4 for all stages.
  • When bootstrapping rustc for a release build, use codegen-units=4 for stage0 and codegen-units=1 for stage1 and stage2.
  • Use codegen-units=4 for crate unit tests, and use codegen-units=1 for other types of tests.

General info

Building with parallel codegen enabled (-C codegen-units=2 or higher) usually results in faster compile times but slightly slower generated code. Parallel codegen speeds up the build by running some LLVM passes on several parts of the crate in parallel, but dividing the crate up this way prevents some inlining and other optimizations. Both effects become more significant (relative to codegen-units=1) at higher optimization levels: more time is spent in LLVM, so there is more benefit to parallelizing that work, and there are more optimizations that LLVM would ordinarily perform but can’t because of the compilation-unit boundaries introduced by parallel codegen. Finally, parallel codegen introduces overhead that grows with the number of codegen units, and in some cases this makes the build slower overall. This happens mainly for small crates built with --opt-level=0 and codegen-units >= 4.

User code

When rustc is compiling user code, it should use two codegen units at low optimization levels (0 and 1) and one codegen unit at high optimization levels (2 and 3). This reduces build times by 10-20% at level 0 and by up to 30% at level 1.

At low optimization levels, parallel codegen has a less noticeable effect on the performance of the compiled code. The slowdown comes mostly from lost inlining opportunities, and at low optimization levels LLVM does less inlining regardless. In particular, inlining is almost completely disabled at level 0, so the slowdown from parallel codegen should be negligible. Conversely, the compiler should avoid parallel codegen at high optimization levels, because a user who enables optimization level 2 or 3 has clearly expressed a preference for fast generated code over fast compile times.

The compiler should default to two codegen units because higher settings noticeably slow down compilation of small crates at low optimization levels. In particular, I have observed 10% increases in compile times with four codegen units. With two codegen units the worst increase is 3%, and slowdowns occur only for crates that take less than a second to build either way.
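A minimal sketch of the proposed default for user code, written as a standalone Rust function. `default_codegen_units` is a hypothetical name, not part of rustc; it only restates the rule above (two units at levels 0 and 1, one unit at levels 2 and 3) and assumes nothing about how rustc actually stores its options.

```rust
/// Hypothetical helper: the codegen-units default proposed for user code.
/// This is not a rustc function; it only encodes the rule described above.
fn default_codegen_units(opt_level: u8) -> u32 {
    match opt_level {
        // Low optimization levels: LLVM does little inlining anyway,
        // so splitting the crate costs almost nothing at runtime.
        0 | 1 => 2,
        // Levels 2 and 3 keep the current default of a single unit.
        _ => 1,
    }
}

fn main() {
    for level in 0..=3u8 {
        println!(
            "--opt-level={} -> codegen-units={}",
            level,
            default_codegen_units(level)
        );
    }
}
```

Two units rather than four reflects the small-crate measurements above: the worst observed slowdown drops from about 10% to 3%.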

rustc and tests

The build system for Rust itself should use four codegen units for all stages of development builds, and for crate unit tests. The build system should use one codegen unit for stage1 and stage2 of release builds, and for other types of tests.

Building with four codegen units reduces the overall bootstrapping time by 35%. However, the Rust components of the resulting compiler will take about 25% longer to run. (Much of the compiler's running time is still spent in LLVM code, so the actual effect on build times when using that compiler will be a 5-20% increase, depending on the code being compiled and the optimization level.)

Release builds should use parallel codegen only when building stage0. The compiler built during stage1 and the libraries built during stage2 are distributed as part of the release, so they should be as highly optimized as possible. Using parallel codegen for stage0 still gives a roughly 10% reduction in bootstrapping time.
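The bootstrap policy can be sketched the same way. `BuildType` and `bootstrap_codegen_units` are hypothetical names for illustration only; the real build system is Makefile-based, so this merely restates the stage-by-stage rule in code.

```rust
/// Build type chosen at ./configure time (development is the proposed default).
#[derive(Clone, Copy, Debug, PartialEq)]
enum BuildType {
    Development,
    Release,
}

/// Hypothetical helper: codegen units used for each bootstrap stage.
fn bootstrap_codegen_units(build: BuildType, stage: u8) -> u32 {
    match (build, stage) {
        // Development builds favour build speed at every stage.
        (BuildType::Development, _) => 4,
        // Release builds parallelize only stage0, which is not shipped.
        (BuildType::Release, 0) => 4,
        // stage1 and stage2 output is distributed, so optimize fully.
        (BuildType::Release, _) => 1,
    }
}

fn main() {
    for &build in &[BuildType::Development, BuildType::Release] {
        for stage in 0..=2u8 {
            println!(
                "{:?} stage{}: codegen-units={}",
                build,
                stage,
                bootstrap_codegen_units(build, stage)
            );
        }
    }
}
```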

Parallel codegen should be enabled for crate unit tests, but not for other types of tests such as run-pass and run-make. Crate tests spend most of their time compiling the crate with --test, so they benefit most from parallel codegen. compiletest-based tests already run in parallel, so there is nothing to gain there, and run-make tests (like other compiletest-based tests) usually consist of single-module crates, which gain no benefit from parallel codegen. Enabling parallel codegen for crate tests alone reduces the time taken by make check by 25%, with most of the benefit coming from shorter libsyntax and librustc build times.
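The test policy as a sketch. `TestKind` lists only the suites named above, and both it and `test_codegen_units` are hypothetical names, not part of compiletest or the Makefiles.

```rust
/// Test suites mentioned in the proposal (illustrative, not exhaustive).
#[derive(Clone, Copy, Debug)]
enum TestKind {
    /// A library crate compiled with --test (e.g. libsyntax, librustc).
    CrateUnitTest,
    /// compiletest run-pass suite.
    RunPass,
    /// run-make tests.
    RunMake,
}

/// Hypothetical helper: codegen units per kind of test.
fn test_codegen_units(kind: TestKind) -> u32 {
    match kind {
        // Large crates compiled with --test benefit most.
        TestKind::CrateUnitTest => 4,
        // These already run in parallel and use small single-module crates.
        TestKind::RunPass | TestKind::RunMake => 1,
    }
}

fn main() {
    for &kind in &[TestKind::CrateUnitTest, TestKind::RunPass, TestKind::RunMake] {
        println!("{:?}: codegen-units={}", kind, test_codegen_units(kind));
    }
}
```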

The “build type” setting (development vs. release) will be set with a ./configure flag. The default will be “development”, on the theory that people building from source are more likely to be doing compiler development than building a copy of rustc to install permanently. (People who do want to install rustc would more likely use an official release instead of building from source.) Alternatively, the build system could use the new “release channel” flag to choose between development mode (--release-channel=source) and release mode (any other setting).
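A sketch of the alternative in which the release channel decides the build type. The mapping (`source` means development, anything else means release) comes from the text above; `build_type_for_channel` is a hypothetical helper, and `nightly` in `main` is only an example of a non-source value.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum BuildType {
    Development,
    Release,
}

/// Hypothetical mapping: the "source" channel selects a development
/// build; every other channel value selects a release build.
fn build_type_for_channel(channel: &str) -> BuildType {
    if channel == "source" {
        BuildType::Development
    } else {
        BuildType::Release
    }
}

fn main() {
    for &channel in &["source", "nightly"] {
        println!("{} -> {:?}", channel, build_type_for_channel(channel));
    }
}
```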

The ./configure script will also need a separate flag to set the number of codegen units to a custom value (including 1, to disable parallel codegen). Users who want more complicated behavior can get it by overriding Makefile variables (such as RUSTFLAGS).
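One way the custom override could be resolved, as a sketch. The flag name `--codegen-units=N`, the rule that the last occurrence wins, and rejecting 0 are all assumptions for illustration; the text above only says that a separate ./configure flag will exist and that setting it to 1 disables parallel codegen.

```rust
/// Parse a hypothetical `--codegen-units=N` configure argument.
/// Returns Ok(None) when the flag is absent, Err for an invalid value.
fn parse_codegen_units(args: &[&str]) -> Result<Option<u32>, String> {
    let mut result = None;
    for arg in args {
        if let Some(value) = arg.strip_prefix("--codegen-units=") {
            let n: u32 = value
                .parse()
                .map_err(|_| format!("invalid codegen-units value: {}", value))?;
            if n == 0 {
                return Err("codegen-units must be at least 1".to_string());
            }
            // If the flag is repeated, the last occurrence wins.
            result = Some(n);
        }
    }
    Ok(result)
}

/// An explicit value overrides the build-type default.
fn effective_codegen_units(custom: Option<u32>, default_units: u32) -> u32 {
    custom.unwrap_or(default_units)
}

fn main() {
    let args = ["--prefix=/usr/local", "--codegen-units=1"];
    match parse_codegen_units(&args) {
        Ok(custom) => println!("codegen-units = {}", effective_codegen_units(custom, 4)),
        Err(e) => eprintln!("configure: {}", e),
    }
}
```

Falling back to the build-type default only when no value is given keeps the override simple; anything more elaborate can still go through Makefile variables such as RUSTFLAGS.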

Note: Currently codegen-units >= 2 produces libraries that can’t be used when linking with LTO. This means that building stage2 libraries with codegen-units=4 will break LTO tests. I think this limitation will not be too hard to fix, and I plan to do so in the next few days.

Build automation

I’m not sure if the buildbot hardware will handle multithreaded builds well. If not, the builders may need to override the default behavior described above to get good performance. (If it does work well, build times could be reduced by up to 25%.)


#2

For the distinction between ‘release’ and ‘development’ builds I suggest tying this to the --release-channel configure flag I’m adding. The ‘source’ channel could maybe be renamed to ‘development’.


#3

Big +1 for all of this. I agree with brson: it makes sense to tie the compilation options/mode to the channel.

In addition to the parallel codegen settings, I would like to build development builds of rustc at -O1 and release builds at -O3, so that development builds are as fast to build as possible and release builds are of the highest possible quality.

We should also consider setting the number of codegen threads inside a crate as an attribute (as suggested by Jack).

Finally, we need a Cargo flag to pass the number of codegen threads to rustc.


#4

This all sounds great to me! The builders are all fairly large instances on EC2 with plenty of cores, so I don’t think we’ll run into too many problems there.


#5

The default should be tailored to people who are building Rust for themselves or packaging it for others. People working on the compiler can be expected to pass configure flags, but that can’t be expected of distribution packagers and users who are not familiar with the build system. With a development default, distribution packages will end up performing poorly, and that will reflect poorly on Rust.