Evaluating pipelined rustc compilation

Unfortunately, you’ll need to re-run the test: the environment variable is CARGO_BUILD_PIPELINING, not CARGO_BUILD_PIPELINED (see Alex C.’s correction later in this thread). :frowning:

Results from some of my own repos, all showing some gains.

Processor info:

Processor Name: Intel Core i7
Processor Speed: 3.1 GHz
Number of Processors: 1
Total Number of Cores: 4
L2 Cache (per Core): 256 KB
L3 Cache: 8 MB

Repository: https://github.com/nlopes/avro-schema-registry

cargo +nightly build

Finished dev [unoptimized + debuginfo] target(s) in 3m 02s

717.53s user 56.80s system 424% cpu 3:02.31 total

CARGO_BUILD_PIPELINING=true cargo +nightly build

Finished dev [unoptimized + debuginfo] target(s) in 2m 53s

723.05s user 57.20s system 450% cpu 2:53.19 total

cargo +nightly build --release

Finished release [optimized] target(s) in 6m 15s

2185.83s user 66.93s system 599% cpu 6:15.75 total

CARGO_BUILD_PIPELINING=true cargo +nightly build --release

Finished release [optimized] target(s) in 5m 41s

2137.35s user 62.52s system 644% cpu 5:41.46 total


Repository: https://github.com/nlopes/arq

cargo +nightly build

Finished dev [unoptimized + debuginfo] target(s) in 28.39s

106.63s user 10.62s system 412% cpu 28.429 total

CARGO_BUILD_PIPELINING=true cargo +nightly build

Finished dev [unoptimized + debuginfo] target(s) in 24.68s

95.99s user 10.19s system 429% cpu 24.715 total

cargo +nightly build --release

Finished release [optimized] target(s) in 47.06s

253.93s user 11.03s system 562% cpu 47.088 total

CARGO_BUILD_PIPELINING=true cargo +nightly build --release

Finished release [optimized] target(s) in 45.88s

259.32s user 11.13s system 589% cpu 45.911 total

Tried it on a small pure rust library: https://github.com/ethereumproject/evm-rs

Full crate graph

mode     pipelined  time (s)  change
debug    no         13.28     -
debug    yes        12.65     4.75%
release  no         15.52     -
release  yes        13.81     11.02%
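The change column appears to be the reduction relative to the non-pipelined time, i.e. (old − new) / old. A quick check against the release row (the function name is mine):

```python
def reduction(old_s: float, new_s: float) -> float:
    """Percent reduction in build time relative to the baseline."""
    return (old_s - new_s) / old_s * 100

# release row from the table above: 15.52 s -> 13.81 s
print(f"{reduction(15.52, 13.81):.2f}%")  # → 11.02%
```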

Incremental Builds

touch src/lib.rs before each build:

mode     pipelined  time (s)  change
debug    no         0.63      -
debug    yes        0.63      -
release  no         1.77      -
release  yes        1.77      -

find . -name '*.rs' | xargs touch before each build: same; any difference is within the margin of error.

Impressive improvements on full graph builds, thanks for your work!

@alexcrichton I think the correct configuration in ~/.cargo/config is

[build]
pipelining = true # since nightly-2019-05-17

Otherwise, I got this error:

error: invalid configuration for key `alias.build`
expected a list, but found a table for `alias.build` in /home/lzutao/.cargo/config

Great work – tried it in a private repository (on a 2016 MBP, 2.9 GHz Intel Core i7):

Dev build went from 3m22.914s to 2m35.61s (35%). Release build went from 6m42.843s to 6m25.439s (4%).

Incremental dev build went from 0m16.556s to 0m16.238s (2%). Incremental release build went from 1m20.420s to 1m12.734s (12%).

On MBP 13 (with just 2 cores + 2 HT cores) it didn’t do much.

  • debug without: 6m39.514s
  • debug with: 6m39.335s
  • release without: 6m34.476s
  • release with: 6m34.876s

In my case the debug build is slower than release because I use [profile.dev] opt-level = 1, and llvm-dsymutil is just that slow on macOS.

Thanks again everyone for continuing to post data!

If you’re curious I’ve been collating the data here so far in a spreadsheet which shows the following statistics:

  • Across the board there appear to be no regressions. The few regressions reported here I can’t reproduce locally; they may have been PIPELINING/PIPELINED confusion (sorry!) or just normal variance. In either case there hasn’t yet been a reliably reproducible regression!
  • Build speeds can get up to almost 2x faster in some cases
  • Across the board there’s an average 10% reduction in build times. The standard deviation, though, is pretty large, and it seems to confirm that you either see large-ish reductions or very little.

In any case, to me at least this is some very strong data that we should continue on the stabilization path for this cargo/rustc feature!


Development branch of nvimpam, slightly outdated commit & dependencies, beefy machine

Build Type    Default  Pipelined  Difference
debug         13.2s    12s        9% faster
release       55.2s    43.8s      20.6% faster
debug incr    1.2s     1.2s       no change
release incr  24.5s    24.5s      no change

Updated numbers w/ the right env var, still not a big change for Firefox debug builds. These are the averages for 6 builds each, using a small build script to automate things.

erahm@shetland:~/dev/mozilla-unified$ ./build.sh
info: using existing install for 'nightly-x86_64-unknown-linux-gnu'
info: default toolchain set to 'nightly-x86_64-unknown-linux-gnu'

  nightly-x86_64-unknown-linux-gnu unchanged - rustc 1.36.0-nightly (7d5aa4332 2019-05-16)

rustc 1.36.0-nightly (7d5aa4332 2019-05-16)
cargo 1.36.0-nightly (c4fcfb725 2019-05-15)
Initial run to populate ccache...done
Baseline run without pipelining
===============================
CARGO_BUILD_PIPELINING=
done
clobbering...done
building...done
clobbering...done
building...done
clobbering...done
building...done
clobbering...done
building...done
clobbering...done
building...done
clobbering...done
building...===> multitime results
1: -r "echo done && echo -n clobbering... && ./mach clobber && echo done && echo -n building..." -q ./mach -l build.log build
            Mean        Std.Dev.    Min         Median      Max
real        494.185     4.303       488.568     492.967     500.837     
user        4410.067    11.823      4396.210    4407.836    4430.236    
sys         183.863     1.372       181.518     184.094     185.909     

Running with pipelining
=======================
CARGO_BUILD_PIPELINING=true
done
clobbering...done
building...done
clobbering...done
building...done
clobbering...done
building...done
clobbering...done
building...done
clobbering...done
building...done
clobbering...done
building...===> multitime results
1: -r "echo done && echo -n clobbering... && ./mach clobber && echo done && echo -n building..." -q ./mach -l build.log build
            Mean        Std.Dev.    Min         Median      Max
real        482.320     8.123       472.841     482.124     492.503     
user        4480.873    13.970      4454.818    4484.566    4495.999    
sys         183.942     1.294       182.310     184.074     185.812     
info: using existing install for 'stable-x86_64-unknown-linux-gnu'
info: default toolchain set to 'stable-x86_64-unknown-linux-gnu'

  stable-x86_64-unknown-linux-gnu unchanged - rustc 1.34.1 (fc50f328b 2019-04-24)

We’re looking at maybe a 2.4% (12s) improvement on average, but with a std dev of 8s, this is basically a wash.
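As a sanity check of the quoted figure against the multitime `real` means above:

```python
# multitime 'real' means from the runs above, in seconds
baseline, pipelined = 494.185, 482.320
diff = baseline - pipelined
print(f"{diff:.1f}s faster ({diff / baseline * 100:.1f}%)")  # → 11.9s faster (2.4%)
```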

That’s for a full build, right? What about with the following:

  • After a full build, just touch main.rs/lib.rs and try again
  • After a full build, touch all the src files in the main package and try again

Thanks for working on this! It's great! I posted about this a few months ago so I'm pleased to see you've implemented more or less exactly what I was thinking about.

I'm glad that there are no perf regressions, but I'd be surprised to see much improvement on machines with <=4 cores. But the 10+ core machines often end up starving when there's a bottleneck crate in the middle of the dep graph.

Even though the .rmeta is being used as the interface, I'm assuming that it has internal implementation details, since Rust relies a lot on its inlining, and it also needs to make sure the SVH is propagated through properly. I'm going to implement pipelining in Buck when I get the chance, and since Buck is entirely content/content-hash driven, it will demonstrate this one way or the other.


For Firefox there’s not really a main lib, so I’m not sure what we’d want to mess with. I currently have ~10,000 .rs files and ~400 lib|main.rs files under my tree.

What about “Release” builds? My understanding is that bigger gains are expected there.

Tried building websocat, multiple times both with CARGO_BUILD_PIPELINING=false and CARGO_BUILD_PIPELINING=true and saw no difference, both in release and in debug mode.

Is there a cheat sheet for how this calls rustc? It would be good to potentially pull this into the Bazel Rust rules.
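For reference, a hedged sketch of what the pipelined invocation looks like (flag spellings are from the nightly at the time, and the crate name is a placeholder; `cargo build -v` shows the authoritative form):

```
# Roughly what cargo runs per crate under pipelining (sketch, not verbatim):
# rustc emits metadata alongside the final artifact and announces when the
# .rmeta is ready, so dependent crates can start compiling early.
rustc --crate-name foo src/lib.rs \
    --emit=dep-info,metadata,link \
    --error-format=json \
    -Z emit-artifact-notifications
# rustc prints a JSON artifact notification for the .rmeta file,
# which cargo uses as the signal to unblock dependents.
```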

Firefox “release” (as in non-debug), basically same results: 12s (2.7%) w/ stdev of 7s.

For quinn (146 targets to build):

$ cargo +nightly clean && cargo +nightly build
real	0m36.728s
user	3m33.460s
sys	0m17.615s
$ cargo +nightly clean && CARGO_BUILD_PIPELINING=true cargo +nightly build
       33.34 real       223.06 user        17.78 sys
$ cargo +nightly clean && time cargo +nightly build --release
real	1m11.022s
user	9m18.169s
sys	0m19.508s
$ cargo +nightly clean && CARGO_BUILD_PIPELINING=true time cargo +nightly build --release
       64.22 real       579.72 user        19.39 sys

So that’s a ~10% speedup for both release builds and debug builds (macOS laptop).
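The “~10%” figure can be double-checked by normalizing the `time(1)` output above to seconds (helper names are mine):

```python
import re

def to_seconds(t: str) -> float:
    """Parse time(1) output such as '1m11.022s' into seconds."""
    m = re.fullmatch(r"(?:(\d+)m)?([\d.]+)s", t)
    return int(m.group(1) or 0) * 60 + float(m.group(2))

def speedup_pct(old: float, new: float) -> float:
    return (old - new) / old * 100

# quinn wall-clock times from above
print(round(speedup_pct(to_seconds("0m36.728s"), 33.34), 1))  # debug → 9.2
print(round(speedup_pct(to_seconds("1m11.022s"), 64.22), 1))  # release → 9.6
```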


Does this work with the compiler bootstrap? Has anyone profiled it?


Compiling Servo with time ./mach build --dev (which pretty much calls cargo build) with rustc 1.36.0-nightly (50a0defd5 2019-05-21) on a desktop CPU with 4 cores / 8 threads.

  • First after cargo fetch and cargo clean. This involves 626 crates and a significant amount of C and C++ code.
    real	9m30,827s	user	47m59,862s	sys	2m20,221s
    
  • Then after touch components/style/lib.rs. This rebuilds 10 crates, mostly sequentially, because that part of the dependency graph is narrow.
    real	1m18,871s	user	1m3,592s	sys	0m9,204s
    

Same again with export CARGO_BUILD_PIPELINING=true:

  • Two full builds:

    real	9m22,346s	user	48m32,564s	sys	2m20,378s
    real	9m28,264s	user	48m24,739s	sys	2m20,078s
    
  • Two incremental builds:

    real	1m22,314s	user	1m5,825s	sys	0m9,117s
    real	1m19,668s	user	1m5,520s	sys	0m9,466s
    

Any difference seems to be within noise level. I expect that incremental compilation (which is enabled by default) with a perfect cache would cause ~zero time to be spent in LLVM, which reduces or cancels the benefit of pipelining.

In real day-to-day work, with an actual code change and not just touch, the incremental cache hit rate would not be 100%, but I expect it would still be pretty high, since there is so much code and we’re typically only touching a small part at a time.
