Unfortunately, you’ll need to re-run the test. The environment variable is CARGO_BUILD_PIPELINING (as Alex C. mentioned in a comment later in this thread correcting the mistake).
Some of my own repos here, all with some gains.
Processor info:
Processor Name: | Intel Core i7 |
---|---|
Processor Speed: | 3.1 GHz |
Number of Processors: | 1 |
Total Number of Cores: | 4 |
L2 Cache (per Core): | 256 KB |
L3 Cache: | 8 MB |
Repository: https://github.com/nlopes/avro-schema-registry
cargo +nightly build
Finished dev [unoptimized + debuginfo] target(s) in 3m 02s
717.53s user 56.80s system 424% cpu 3:02.31 total
CARGO_BUILD_PIPELINING=true cargo +nightly build
Finished dev [unoptimized + debuginfo] target(s) in 2m 53s
723.05s user 57.20s system 450% cpu 2:53.19 total
cargo +nightly build --release
Finished release [optimized] target(s) in 6m 15s
2185.83s user 66.93s system 599% cpu 6:15.75 total
CARGO_BUILD_PIPELINING=true cargo +nightly build --release
Finished release [optimized] target(s) in 5m 41s
2137.35s user 62.52s system 644% cpu 5:41.46 total
Repository: https://github.com/nlopes/arq
cargo +nightly build
Finished dev [unoptimized + debuginfo] target(s) in 28.39s
106.63s user 10.62s system 412% cpu 28.429 total
CARGO_BUILD_PIPELINING=true cargo +nightly build
Finished dev [unoptimized + debuginfo] target(s) in 24.68s
95.99s user 10.19s system 429% cpu 24.715 total
cargo +nightly build --release
Finished release [optimized] target(s) in 47.06s
253.93s user 11.03s system 562% cpu 47.088 total
CARGO_BUILD_PIPELINING=true cargo +nightly build --release
Finished release [optimized] target(s) in 45.88s
259.32s user 11.13s system 589% cpu 45.911 total
Tried it on a small pure rust library: https://github.com/ethereumproject/evm-rs
Full crate graph
mode | pipelined | time (s) | change |
---|---|---|---|
debug | no | 13.28 | - |
debug | yes | 12.65 | 4.75% |
release | no | 15.52 | - |
release | yes | 13.81 | 11.02% |
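For anyone wanting to compare their own numbers the same way, here is a minimal sketch of how the "change" column can be computed. It assumes the column is the reduction relative to the non-pipelined time; the function name `percent_reduction` is only illustrative. The last digit may differ slightly from the table if the original figures were computed from unrounded timings.

```python
def percent_reduction(baseline_s: float, pipelined_s: float) -> float:
    """Return the build-time reduction as a percentage of the baseline time."""
    return (baseline_s - pipelined_s) / baseline_s * 100.0


# Timings in seconds, copied from the full crate graph table.
debug = percent_reduction(13.28, 12.65)
release = percent_reduction(15.52, 13.81)

print(f"debug: {debug:.2f}%  release: {release:.2f}%")
```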
Incremental Builds
touch src/lib.rs before each build:
mode | pipelined | time (s) | change |
---|---|---|---|
debug | no | 0.63 | - |
debug | yes | 0.63 | - |
release | no | 1.77 | - |
release | yes | 1.77 | - |
find . -name '*.rs' | xargs touch before each build: same, difference within margin of error.
Impressive improvements on full graph builds, thanks for your work!
@alexcrichton I think the correct configuration in ~/.cargo/config is
[build]
pipelining = true # since nightly-2019-05-17
Otherwise, I got this error:
error: invalid configuration for key `alias.build`
expected a list, but found a table for `alias.build` in /home/lzutao/.cargo/config
Great work – tried it in a private repository (on a 2016 MBP, 2.9 GHz Intel Core i7):
- Dev build went from 3m22.914s to 2m35.61s (35%)
- Release build went from 6m42.843s to 6m25.439s (4%)
- Incremental dev build went from 0m16.556s to 0m16.238s (2%)
- Incremental release build went from 1m20.420s to 1m12.734s (12%)
On MBP 13 (with just 2 cores + 2 HT cores) it didn’t do much.
- debug without: 6m39.514s
- debug with: 6m39.335s
- release without: 6m34.476s
- release with: 6m34.876s
In my case the debug build is slower than release, because I use [profile.dev] opt-level = 1, plus llvm-dsymutil is that slow on macOS.
Thanks again everyone for continuing to post data!
If you’re curious I’ve been collating the data here so far in a spreadsheet which shows the following statistics:
- Across the board there appear to be no regressions. The slowdowns reported here I can’t reproduce locally; they may have been PIPELINING/PIPELINED confusion (sorry!) or may just be normal variance. In either case there hasn’t yet been a reliably reproducible regression!
- Build speeds can get up to almost 2x faster in some cases
- Across the board there’s an average 10% reduction in build times. The standard deviation, though, is pretty large, and it seems to confirm that you’re either seeing large-ish reductions or very little.
In any case, to me at least this is some very strong data that we should continue on the stabilization path for this cargo/rustc feature!
Development branch of nvimpam, slightly outdated commit & dependencies, beefy machine
Build Type | Default | Pipelined | Difference |
---|---|---|---|
debug | 13.2s | 12s | 9% faster |
release | 55.2s | 43.8s | 20.6% faster |
debug incr | 1.2s | 1.2s | no change |
release incr | 24.5s | 24.5s | no change |
Updated numbers w/ the right env var, still not a big change for Firefox debug builds. These are the averages for 6 builds each, using a small build script to automate things.
erahm@shetland:~/dev/mozilla-unified$ ./build.sh
info: using existing install for 'nightly-x86_64-unknown-linux-gnu'
info: default toolchain set to 'nightly-x86_64-unknown-linux-gnu'
nightly-x86_64-unknown-linux-gnu unchanged - rustc 1.36.0-nightly (7d5aa4332 2019-05-16)
rustc 1.36.0-nightly (7d5aa4332 2019-05-16)
cargo 1.36.0-nightly (c4fcfb725 2019-05-15)
Initial run to populate ccache...done
Baseline run without pipelining
===============================
CARGO_BUILD_PIPELINING=
done
clobbering...done
building...done
clobbering...done
building...done
clobbering...done
building...done
clobbering...done
building...done
clobbering...done
building...done
clobbering...done
building...===> multitime results
1: -r "echo done && echo -n clobbering... && ./mach clobber && echo done && echo -n building..." -q ./mach -l build.log build
Mean Std.Dev. Min Median Max
real 494.185 4.303 488.568 492.967 500.837
user 4410.067 11.823 4396.210 4407.836 4430.236
sys 183.863 1.372 181.518 184.094 185.909
Running with pipelining
=======================
CARGO_BUILD_PIPELINING=true
done
clobbering...done
building...done
clobbering...done
building...done
clobbering...done
building...done
clobbering...done
building...done
clobbering...done
building...done
clobbering...done
building...===> multitime results
1: -r "echo done && echo -n clobbering... && ./mach clobber && echo done && echo -n building..." -q ./mach -l build.log build
Mean Std.Dev. Min Median Max
real 482.320 8.123 472.841 482.124 492.503
user 4480.873 13.970 4454.818 4484.566 4495.999
sys 183.942 1.294 182.310 184.074 185.812
info: using existing install for 'stable-x86_64-unknown-linux-gnu'
info: default toolchain set to 'stable-x86_64-unknown-linux-gnu'
stable-x86_64-unknown-linux-gnu unchanged - rustc 1.34.1 (fc50f328b 2019-04-24)
We’re looking at maybe a 2.4% (12s) improvement on average, but with a std dev of 8s, this is basically a wash.
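As a rough sanity check on that reading, the arithmetic can be redone from the multitime summary lines. This is only a sketch: comparing the saving against the larger of the two standard deviations is a crude noise check chosen here for illustration, not a proper significance test, and the variable names are made up.

```python
# Summary statistics for "real" time (seconds), copied from the multitime output.
baseline_mean, baseline_sd = 494.185, 4.303
pipelined_mean, pipelined_sd = 482.320, 8.123

# Absolute and relative saving of the pipelined runs over the baseline runs.
saving_s = baseline_mean - pipelined_mean
saving_pct = saving_s / baseline_mean * 100.0

# Crude noise check: the saving expressed in units of the noisier run's std dev.
noise_ratio = saving_s / max(baseline_sd, pipelined_sd)

print(f"saving: {saving_s:.1f}s ({saving_pct:.1f}%), "
      f"{noise_ratio:.1f}x the larger std dev")
```

With only 6 runs per configuration, a saving of about one to two standard deviations is hard to distinguish from run-to-run variance, which matches the "basically a wash" reading.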
That’s for a full-build right? What about with the following:
- After a full build, just touch main.rs/lib.rs and try again
- After a full build, touch all the src files in the main package and try again
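The two incremental scenarios above could be scripted roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, the project is assumed to keep its sources under `src/`, and the only Cargo interface used is the `CARGO_BUILD_PIPELINING` environment variable and the `cargo +nightly build [--release]` invocations already shown in this thread.

```python
import os
import subprocess
import time
from pathlib import Path
from typing import Dict, Iterable, List


def cargo_command(release: bool) -> List[str]:
    """Build the cargo invocation used elsewhere in this thread."""
    cmd = ["cargo", "+nightly", "build"]
    if release:
        cmd.append("--release")
    return cmd


def build_env(pipelining: bool) -> Dict[str, str]:
    """Copy the current environment and set the pipelining switch explicitly."""
    env = dict(os.environ)
    env["CARGO_BUILD_PIPELINING"] = "true" if pipelining else "false"
    return env


def timed_build(project: Path, pipelining: bool, release: bool = False) -> float:
    """Run one build in `project` and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cargo_command(release), cwd=project,
                   env=build_env(pipelining), check=True)
    return time.perf_counter() - start


def touch(paths: Iterable[Path]) -> None:
    """Bump the modification time of each file so cargo sees it as changed."""
    for path in paths:
        path.touch()


def incremental_timings(project: Path, entry: Path, pipelining: bool) -> Dict[str, float]:
    """Time both incremental scenarios after a warm-up full build."""
    timed_build(project, pipelining)  # full build first, result discarded
    results = {}
    touch([entry])  # scenario 1: only main.rs / lib.rs
    results["touch entry point"] = timed_build(project, pipelining)
    touch(sorted((project / "src").rglob("*.rs")))  # scenario 2: every source file
    results["touch all sources"] = timed_build(project, pipelining)
    return results
```

Calling `incremental_timings(Path("."), Path("src/lib.rs"), pipelining=True)` and again with `pipelining=False` would give the two sets of numbers to compare; repeating each a few times helps separate real differences from noise.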
Thanks for working on this! It's great! I posted about this a few months ago so I'm pleased to see you've implemented more or less exactly what I was thinking about.
I'm glad that there are no perf regressions, but I'd be surprised to see much improvement on machines with <=4 cores. But the 10+ core machines often end up starving when there's a bottleneck crate in the middle of the dep graph.
Even though the .rmeta is being used as the interface, I'm assuming that it has internal implementation details, since Rust relies a lot on its inlining, and it also needs to make sure the SVH is propagated through properly. I'm going to implement pipelining in Buck when I get the chance, and since Buck is entirely content/content-hash driven, it will demonstrate this one way or the other.
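To make the bottleneck point concrete, here is a toy model of a build schedule. It is a sketch, not how Cargo schedules work: it assumes unlimited cores, an acyclic graph, and that each crate splits cleanly into a "frontend" phase (after which metadata is available) and a "codegen" phase. All crate names and durations are hypothetical.

```python
from typing import Dict, List, Tuple

# crate name -> (frontend seconds, codegen seconds, dependency names)
Graph = Dict[str, Tuple[float, float, List[str]]]


def build_time(graph: Graph, pipelined: bool) -> float:
    """Return the total build time of the toy model with unlimited cores."""
    ready: Dict[str, float] = {}  # when dependents of a crate may start
    done: Dict[str, float] = {}   # when a crate is fully compiled

    def visit(name: str) -> None:
        if name in done:
            return
        frontend, codegen, deps = graph[name]
        for dep in deps:
            visit(dep)
        # A crate starts once every dependency is "ready".
        start = max((ready[dep] for dep in deps), default=0.0)
        done[name] = start + frontend + codegen
        # Pipelined: dependents wait only for metadata. Otherwise: full compile.
        ready[name] = start + frontend if pipelined else done[name]

    for name in graph:
        visit(name)
    return max(done.values())


# Hypothetical narrow chain: c depends on b, which depends on a.
chain: Graph = {
    "a": (2.0, 3.0, []),
    "b": (2.0, 3.0, ["a"]),
    "c": (2.0, 3.0, ["b"]),
}

print(build_time(chain, pipelined=False), build_time(chain, pipelined=True))
```

In this model the chain goes from 15 to 9 time units because codegen of each crate overlaps with the frontend work of the next. On a machine with few cores that overlap competes with other crates for CPU, which is consistent with the small gains reported on 2- and 4-core laptops.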
For Firefox there’s not really a main lib, so I’m not sure what we’d want to mess with. I currently have ~10,000 .rs files and ~400 lib|main.rs files under my tree.
What about “Release” builds? My understanding is that bigger gains are expected there.
Tried building websocat multiple times, both with CARGO_BUILD_PIPELINING=false and CARGO_BUILD_PIPELINING=true, and saw no difference, both in release and in debug mode.
Is there a cheat sheet for how this calls rustc? It would be good to potentially pull this into the Bazel Rust rules.
Firefox “release” (as in non-debug), basically same results: 12s (2.7%) w/ stdev of 7s.
For quinn (146 targets to build):
$ cargo +nightly clean && cargo +nightly build
real 0m36.728s
user 3m33.460s
sys 0m17.615s
$ cargo +nightly clean && CARGO_BUILD_PIPELINING=true cargo +nightly build
33.34 real 223.06 user 17.78 sys
$ cargo +nightly clean && time cargo +nightly build --release
real 1m11.022s
user 9m18.169s
sys 0m19.508s
$ cargo +nightly clean && CARGO_BUILD_PIPELINING=true time cargo +nightly build --release
64.22 real 579.72 user 19.39 sys
So that’s a ~10% speedup for both release builds and debug builds (macOS laptop).
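Since the timings in this thread come in several formats (`0m36.728s` from the shell builtin, plain seconds from `/usr/bin/time`, and comma decimals such as `9m30,827s` in some locales), a small parser makes them comparable. This is a sketch; the function name and the accepted formats are assumptions based only on the outputs pasted here.

```python
import re

# Optional "<minutes>m" prefix, then seconds with "." or "," as decimal separator.
_DURATION = re.compile(r"^(?:(\d+)m)?(\d+(?:[.,]\d+)?)s$")


def parse_duration(text: str) -> float:
    """Convert a shell `time` duration such as '1m11.022s' into seconds."""
    match = _DURATION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes = int(match.group(1) or 0)
    seconds = float(match.group(2).replace(",", "."))
    return minutes * 60 + seconds


# quinn wall-clock times from this post: baseline in shell format,
# pipelined already in seconds.
debug_base, debug_pipe = parse_duration("0m36.728s"), 33.34
release_base, release_pipe = parse_duration("1m11.022s"), 64.22

debug_gain = (debug_base - debug_pipe) / debug_base * 100.0
release_gain = (release_base - release_pipe) / release_base * 100.0

print(f"debug: {debug_gain:.1f}%  release: {release_gain:.1f}%")
```

Both figures come out a little under 10%, in line with the "~10%" estimate.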
Does this work with the compiler bootstrap? Has anyone profiled it?
Compiling Servo with time ./mach build --dev (which pretty much calls cargo build) with rustc 1.36.0-nightly (50a0defd5 2019-05-21) on a desktop CPU with 4 cores / 8 threads.
- First after cargo fetch and cargo clean. This involves 626 crates and a significant amount of C and C++ code.
  real 9m30,827s user 47m59,862s sys 2m20,221s
- Then after touch component/style/lib.rs. This rebuilds 10 crates, mostly sequentially because that part of the dependency graph is narrow.
  real 1m18,871s user 1m3,592s sys 0m9,204s
Same again with export CARGO_BUILD_PIPELINING=true:
- Two full builds:
  real 9m22,346s user 48m32,564s sys 2m20,378s
  real 9m28,264s user 48m24,739s sys 2m20,078s
- Two incremental builds:
  real 1m22,314s user 1m5,825s sys 0m9,117s
  real 1m19,668s user 1m5,520s sys 0m9,466s
Any difference seems to be within noise level. I expect that incremental compilation (which is enabled by default) with a perfect cache would cause ~zero time to be spent in LLVM, which reduces or cancels the benefit of pipelining.
In real day-to-day work with an actual code change and not just touch, the incremental cache hit rate would not be 100%, but I expect it should still be pretty high since there is so much code and we’re typically only touching a small part at a time.