Evaluating pipelined rustc compilation

Unfortunately, you’ll need to re-run the test: the environment variable is CARGO_BUILD_PIPELINING, not CARGO_BUILD_PIPELINED (see Alex C.’s correction later in this thread). :frowning:

Results from some of my own repos, all showing some gains.

Processor info:

Processor Name: Intel Core i7
Processor Speed: 3.1 GHz
Number of Processors: 1
Total Number of Cores: 4
L2 Cache (per Core): 256 KB
L3 Cache: 8 MB

Repository: https://github.com/nlopes/avro-schema-registry

cargo +nightly build

Finished dev [unoptimized + debuginfo] target(s) in 3m 02s

717.53s user 56.80s system 424% cpu 3:02.31 total

CARGO_BUILD_PIPELINING=true cargo +nightly build

Finished dev [unoptimized + debuginfo] target(s) in 2m 53s

723.05s user 57.20s system 450% cpu 2:53.19 total

cargo +nightly build --release

Finished release [optimized] target(s) in 6m 15s

2185.83s user 66.93s system 599% cpu 6:15.75 total

CARGO_BUILD_PIPELINING=true cargo +nightly build --release

Finished release [optimized] target(s) in 5m 41s

2137.35s user 62.52s system 644% cpu 5:41.46 total


Repository: https://github.com/nlopes/arq

cargo +nightly build

Finished dev [unoptimized + debuginfo] target(s) in 28.39s

106.63s user 10.62s system 412% cpu 28.429 total

CARGO_BUILD_PIPELINING=true cargo +nightly build

Finished dev [unoptimized + debuginfo] target(s) in 24.68s

95.99s user 10.19s system 429% cpu 24.715 total

cargo +nightly build --release

Finished release [optimized] target(s) in 47.06s

253.93s user 11.03s system 562% cpu 47.088 total

CARGO_BUILD_PIPELINING=true cargo +nightly build --release

Finished release [optimized] target(s) in 45.88s

259.32s user 11.13s system 589% cpu 45.911 total

Tried it on a small pure rust library: https://github.com/ethereumproject/evm-rs

Full crate graph

mode     pipelined  time (s)  change
debug    no         13.28     -
debug    yes        12.65     4.75%
release  no         15.52     -
release  yes        13.81     11.02%
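The change column appears to be the reduction relative to the non-pipelined time, i.e. (old − new) / old. A quick check against the release row (the function name is mine):

```python
def reduction(old_s: float, new_s: float) -> float:
    """Percent reduction in build time relative to the baseline."""
    return (old_s - new_s) / old_s * 100

# release row from the table above: 15.52 s -> 13.81 s
print(f"{reduction(15.52, 13.81):.2f}%")  # → 11.02%
```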

Incremental Builds

touch src/lib.rs before each build:

mode     pipelined  time (s)  change
debug    no         0.63      -
debug    yes        0.63      -
release  no         1.77      -
release  yes        1.77      -

find . -name '*.rs' | xargs touch before each build: same; any difference is within the margin of error.

Impressive improvements on full graph builds, thanks for your work!

@alexcrichton I think the correct configuration in ~/.cargo/config is

[build]
pipelining = true # since nightly-2019-05-17

Otherwise, I got this error:

error: invalid configuration for key `alias.build`
expected a list, but found a table for `alias.build` in /home/lzutao/.cargo/config

Great work – tried it in a private repository (on a 2016 MBP, 2.9 GHz Intel Core i7):

Dev build went from 3m22.914s to 2m35.61s (35%). Release build went from 6m42.843s to 6m25.439s (4%).

Incremental dev build went from 0m16.556s to 0m16.238s (2%). Incremental release build went from 1m20.420s to 1m12.734s (12%).

On MBP 13 (with just 2 cores + 2 HT cores) it didn’t do much.

  • debug without: 6m39.514s
  • debug with: 6m39.335s
  • release without: 6m34.476s
  • release with: 6m34.876s

In my case the debug build is slower than release because I use [profile.dev] opt-level = 1, and llvm-dsymutil is just that slow on macOS.

Thanks again everyone for continuing to post data!

If you’re curious I’ve been collating the data here so far in a spreadsheet which shows the following statistics:

  • Across the board there appear to be no regressions. The few regressions reported here I can’t reproduce locally; they may have been PIPELINING/PIPELINED confusion (sorry!) or just normal variance. In either case there hasn’t yet been a reliably reproducible regression!
  • Build speeds can get up to almost 2x faster in some cases
  • Across the board there’s an average 10% reduction in build times. The standard deviation, though, is pretty large, and it seems to confirm that you either see large-ish reductions or very little.

In any case, to me at least this is some very strong data that we should continue on the stabilization path for this cargo/rustc feature!


Development branch of nvimpam, slightly outdated commit & dependencies, beefy machine

Build Type    Default  Pipelined  Difference
debug         13.2s    12s        9% faster
release       55.2s    43.8s      20.6% faster
debug incr    1.2s     1.2s       no change
release incr  24.5s    24.5s      no change

Updated numbers w/ the right env var, still not a big change for Firefox debug builds. These are the averages for 6 builds each, using a small build script to automate things.

erahm@shetland:~/dev/mozilla-unified$ ./build.sh
info: using existing install for 'nightly-x86_64-unknown-linux-gnu'
info: default toolchain set to 'nightly-x86_64-unknown-linux-gnu'

  nightly-x86_64-unknown-linux-gnu unchanged - rustc 1.36.0-nightly (7d5aa4332 2019-05-16)

rustc 1.36.0-nightly (7d5aa4332 2019-05-16)
cargo 1.36.0-nightly (c4fcfb725 2019-05-15)
Initial run to populate ccache...done
Baseline run without pipelining
===============================
CARGO_BUILD_PIPELINING=
done
clobbering...done
building...done
clobbering...done
building...done
clobbering...done
building...done
clobbering...done
building...done
clobbering...done
building...done
clobbering...done
building...===> multitime results
1: -r "echo done && echo -n clobbering... && ./mach clobber && echo done && echo -n building..." -q ./mach -l build.log build
            Mean        Std.Dev.    Min         Median      Max
real        494.185     4.303       488.568     492.967     500.837     
user        4410.067    11.823      4396.210    4407.836    4430.236    
sys         183.863     1.372       181.518     184.094     185.909     

Running with pipelining
=======================
CARGO_BUILD_PIPELINING=true
done
clobbering...done
building...done
clobbering...done
building...done
clobbering...done
building...done
clobbering...done
building...done
clobbering...done
building...done
clobbering...done
building...===> multitime results
1: -r "echo done && echo -n clobbering... && ./mach clobber && echo done && echo -n building..." -q ./mach -l build.log build
            Mean        Std.Dev.    Min         Median      Max
real        482.320     8.123       472.841     482.124     492.503     
user        4480.873    13.970      4454.818    4484.566    4495.999    
sys         183.942     1.294       182.310     184.074     185.812     
info: using existing install for 'stable-x86_64-unknown-linux-gnu'
info: default toolchain set to 'stable-x86_64-unknown-linux-gnu'

  stable-x86_64-unknown-linux-gnu unchanged - rustc 1.34.1 (fc50f328b 2019-04-24)

We’re looking at maybe a 2.4% (12s) improvement on average, but with a std dev of 8s, this is basically a wash.
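As a sanity check of the quoted figure against the multitime `real` means above:

```python
# multitime 'real' means from the runs above, in seconds
baseline, pipelined = 494.185, 482.320
diff = baseline - pipelined
print(f"{diff:.1f}s faster ({diff / baseline * 100:.1f}%)")  # → 11.9s faster (2.4%)
```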

That’s for a full build, right? What about with the following:

  • After a full build, just touch main.rs/lib.rs and try again
  • After a full build, touch all the src files in the main package and try again

Thanks for working on this! It's great! I posted about this a few months ago so I'm pleased to see you've implemented more or less exactly what I was thinking about.

I'm glad that there are no perf regressions, but I'd be surprised to see much improvement on machines with <=4 cores. But the 10+ core machines often end up starving when there's a bottleneck crate in the middle of the dep graph.

Even though the .rmeta is being used as the interface, I'm assuming that it has internal implementation details, since Rust relies a lot on its inlining, and it also needs to make sure the SVH is propagated through properly. I'm going to implement pipelining in Buck when I get the chance, and since Buck is entirely content/content-hash driven, it will demonstrate this one way or the other.


For Firefox there’s not really a main lib, so I’m not sure what we’d want to mess with. I currently have ~10,000 .rs files and ~400 lib|main.rs files under my tree.

What about “Release” builds? My understanding is that bigger gains are expected there.

Tried building websocat, multiple times both with CARGO_BUILD_PIPELINING=false and CARGO_BUILD_PIPELINING=true and saw no difference, both in release and in debug mode.

Is there a cheat sheet for how this calls rustc? It would be good to potentially pull this into the Bazel Rust rules.
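For reference, a hedged sketch of what the pipelined invocation looks like (flag spellings are from the nightly at the time, and the crate name is a placeholder; `cargo build -v` shows the authoritative form):

```
# Roughly what cargo runs per crate under pipelining (sketch, not verbatim):
# rustc emits metadata alongside the final artifact and announces when the
# .rmeta is ready, so dependent crates can start compiling early.
rustc --crate-name foo src/lib.rs \
    --emit=dep-info,metadata,link \
    --error-format=json \
    -Z emit-artifact-notifications
# rustc prints a JSON artifact notification for the .rmeta file,
# which cargo uses as the signal to unblock dependents.
```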

Firefox “release” (as in non-debug), basically same results: 12s (2.7%) w/ stdev of 7s.

For quinn (146 targets to build):

$ cargo +nightly clean && cargo +nightly build
real	0m36.728s
user	3m33.460s
sys	0m17.615s
$ cargo +nightly clean && CARGO_BUILD_PIPELINING=true cargo +nightly build
       33.34 real       223.06 user        17.78 sys
$ cargo +nightly clean && time cargo +nightly build --release
real	1m11.022s
user	9m18.169s
sys	0m19.508s
$ cargo +nightly clean && CARGO_BUILD_PIPELINING=true time cargo +nightly build --release
       64.22 real       579.72 user        19.39 sys

So that’s a ~10% speedup for both release builds and debug builds (macOS laptop).
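The “~10%” figure can be double-checked by normalizing the `time(1)` output above to seconds (helper names are mine):

```python
import re

def to_seconds(t: str) -> float:
    """Parse time(1) output such as '1m11.022s' into seconds."""
    m = re.fullmatch(r"(?:(\d+)m)?([\d.]+)s", t)
    return int(m.group(1) or 0) * 60 + float(m.group(2))

def speedup_pct(old: float, new: float) -> float:
    return (old - new) / old * 100

# quinn wall-clock times from above
print(round(speedup_pct(to_seconds("0m36.728s"), 33.34), 1))  # debug → 9.2
print(round(speedup_pct(to_seconds("1m11.022s"), 64.22), 1))  # release → 9.6
```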


Does this work with the compiler bootstrap? Has anyone profiled it?


Compiling Servo with time ./mach build --dev (which pretty much calls cargo build) with rustc 1.36.0-nightly (50a0defd5 2019-05-21) on a desktop CPU with 4 cores / 8 threads.

  • First after cargo fetch and cargo clean. This involves 626 crates and a significant amount of C and C++ code.
    real	9m30,827s	user	47m59,862s	sys	2m20,221s
    
  • Then after touch components/style/lib.rs. This rebuilds 10 crates, mostly sequentially, because that part of the dependency graph is narrow.
    real	1m18,871s	user	1m3,592s	sys	0m9,204s
    

Same again with export CARGO_BUILD_PIPELINING=true:

  • Two full builds:

    real	9m22,346s	user	48m32,564s	sys	2m20,378s
    real	9m28,264s	user	48m24,739s	sys	2m20,078s
    
  • Two incremental builds:

    real	1m22,314s	user	1m5,825s	sys	0m9,117s
    real	1m19,668s	user	1m5,520s	sys	0m9,466s
    

Any difference seems to be within noise level. I expect that incremental compilation (which is enabled by default) with a perfect cache would cause ~zero time to be spent in LLVM, which reduces or cancels the benefit of pipelining.

In real day-to-day work, with an actual code change and not just touch, the incremental cache hit rate would not be 100%, but I expect it would still be pretty high, since there is so much code and we’re typically only touching a small part at a time.
