It's that time of year again and we've got a small gift for all y'all for the holidays! The parallel compiler working group has implemented a plan for you to test out a build of rustc which has far more parallelism than the current rustc does today. To cut straight to the chase, the perf improvements are looking great and we're curious to compare two nightly compilers against each other:
- nightly-2019-12-18 - this compiler has more parallelism
- nightly-2019-12-17 - this compiler has less parallelism
You can acquire, test and run these compilers with:
$ rustup update nightly-2019-12-18
$ rustup update nightly-2019-12-17
$ cargo +nightly-2019-12-18 build
$ cargo +nightly-2019-12-17 build
(etc)
What is parallel rustc?
But wait, you may be saying, isn't rustc already parallel? You're correct: rustc already has internal parallelism when it comes to codegen units and LLVM. The compiler, however, is not parallel at all when it's typechecking, borrow-checking, or running other static analyses on your crate. These frontend passes of the compiler are completely serial today. A compiler that can run nearly every single step in parallel has been in development for quite some time now.
Enabling parallelism in rustc, however, drastically changes internal data structures (think using Arc instead of Rc). For this reason previous builds of rustc do not have the ability to support frontend parallelism. A special nightly build, nightly-2019-12-18, has been prepared which has support for parallelism compiled in. This support is experimental and we're still evaluating it, though, so the commit has already been reverted and subsequent nightlies are back to how they were previously.
What information to gather?
The parallel compiler working group is keen to get widespread feedback on the parallel mode of the compiler. We're interested in basically any feedback you have to offer, but here are some specifics to help you get started:
- Have you found a bug? Please report it!
  - For example, did rustc crash?
  - deadlock?
  - produce a nondeterministic result?
  - exhibit any other weirdness when compiling?
- Is parallel rustc faster?
  - When comparing, please compare nightly-2019-12-18 (parallel) and nightly-2019-12-17 (not parallel)
  - Is a full build faster?
  - Is a full release build faster?
  - Is a check build faster?
  - How about incremental builds?
  - Single-crate builds?
- How does parallelism look to you?
  - Did rustc get slower from trying to be too parallel?
Time measuring tools like the time shell built-in as well as /usr/bin/time are extra useful here because they give insight into a number of statistics we're interested in watching, for example kernel time, user time, wall time, context switches, etc. If you've got info, we're happy to review it!
Some example commands to compare are:
# full build
$ cargo clean && time cargo +nightly-2019-12-18 build
$ cargo clean && time cargo +nightly-2019-12-17 build
# full release build
$ cargo clean && time cargo +nightly-2019-12-18 build --release
$ cargo clean && time cargo +nightly-2019-12-17 build --release
# full check
$ cargo clean && time cargo +nightly-2019-12-18 check
$ cargo clean && time cargo +nightly-2019-12-17 check
# ... (etc)
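If you'd rather not type each pair of commands by hand, the comparisons above can be wrapped in a small loop. This is only a sketch: the function name compare_builds and the DRY_RUN variable are made up for illustration, and it assumes a POSIX shell with /usr/bin/time available (the -p flag prints real, user and sys times).

```shell
# Hypothetical helper, not part of any official tooling.
# Runs a clean build, release build and check with both nightlies.
# Set DRY_RUN=1 to print the commands instead of running them.
compare_builds() {
    for mode in "build" "build --release" "check"; do
        for toolchain in nightly-2019-12-18 nightly-2019-12-17; do
            echo "== $toolchain: cargo $mode =="
            if [ "${DRY_RUN:-0}" = 1 ]; then
                echo "cargo clean && /usr/bin/time -p cargo +$toolchain $mode"
            else
                # $mode is left unquoted on purpose so that
                # "build --release" splits into two arguments.
                cargo clean && /usr/bin/time -p cargo "+$toolchain" $mode
            fi
        done
    done
}
```

Running `compare_builds 2>&1 | tee timings.txt` from your project directory should leave you with a file you can paste into a report. For more detail, such as context switches, GNU time's -v flag (Linux) or BSD time's -l flag (macOS) can be swapped in for -p.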
When you report data it'd also be very helpful if you indicated what your system looks like. For example:
- What OS do you have? (Windows/Mac/Linux)
- How many CPUs do you have? (cores/threads/etc)
- How much memory do you have?
We've already seen some widely varying data across different system layouts. For example, 28-thread machines have shown very different performance characteristics than 8-thread machines. Most testing has happened on Linux so far, so we're very interested to get more platforms into the pipeline too!
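The system details asked for above can be gathered with a few standard commands. The sketch below is one way to do it on Linux and macOS; the function name system_summary is made up for illustration, and Windows users will need to fill in the details by hand.

```shell
# Hypothetical helper: print OS, logical CPU count and total memory.
system_summary() {
    os=$(uname -s)
    echo "OS: $os"
    case "$os" in
        Linux)
            echo "CPUs: $(getconf _NPROCESSORS_ONLN)"
            # MemTotal is reported by the kernel in kB.
            echo "Memory: $(awk '/^MemTotal:/ {print $2, $3}' /proc/meminfo)"
            ;;
        Darwin)
            echo "CPUs: $(sysctl -n hw.logicalcpu)"
            echo "Memory: $(sysctl -n hw.memsize) bytes"
            ;;
        *)
            # Anything else: report by hand.
            echo "CPUs: unknown (please fill in by hand)"
            echo "Memory: unknown (please fill in by hand)"
            ;;
    esac
}
```

Calling `system_summary` prints three lines that can be pasted alongside your timings. Note that the CPU count here is the number of logical CPUs (threads), not physical cores.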
Known issues
- The compiler will max out at a parallelism of 4. We've hit some issues with rustc scaling to many threads, which cause slowdowns for various reasons. We're working on a solution and have a number of ideas for how to solve this. If you've got a 128-core system and only 4 cores are in use, fear not! We'll soon be able to make use of everything.
- If you pass -jN (which defaults to the number of cores you have) Cargo may end up spawning more than N rustc processes. No more than N should actually be doing work, but it may be the case that more processes are spawned. We plan to fix this before shipping parallel rustc.
Thanks in advance for helping us out! We hope to turn at least some parallelism on by default early next year (think January) with full parallelism coming soon after. That all depends on the feedback we get from this thread, though, and we'd like to weed out any issues before we turn this on by default!