Evaluating pipelined rustc compilation

Compiling Noria master as of 668785857874a10254cd6d2ba1ea764db842528d:


On my laptop, Intel i7-8650U (8 HT cores):

| build | default | pipelined |
| --- | --- | --- |
| debug | 4m 48s | 4m 43s |
| debug (incr) | 55s | 1m 03s |
| release | 14m 00s | 13m 56s |
| release (incr) | 2m 45s | 2m 33s |

On a desktop machine, Intel Xeon E3-1240 v5 (8 HT cores):

| build | default | pipelined |
| --- | --- | --- |
| debug | 4m 42s | 4m 15s |
| debug (incr) | 1m 02s | 54s |
| release | 11m 39s | 11m 33s |
| release (incr) | 2m 28s | 2m 44s |

On a server machine, Intel Xeon E5-2660 v3 (40 HT cores across 2 NUMA nodes):

| build | default | pipelined |
| --- | --- | --- |
| debug | 2m 48s | 2m 45s |
| debug (incr) | 47s | 38s |
| release | 5m 09s | 5m 07s |
| release (incr) | 1m 40s | 1m 40s |

If anyone wants to replicate, this is the diff I applied for the incremental tests:

diff --git a/noria/src/data.rs b/noria/src/data.rs
index 26c99ed9..f2747e57 100644
--- a/noria/src/data.rs
+++ b/noria/src/data.rs
@@ -51,6 +51,7 @@ impl fmt::Display for DataType {
             DataType::Int(n) => write!(f, "{}", n),
             DataType::BigInt(n) => write!(f, "{}", n),
             DataType::Real(i, frac) => {
+               {}
                 if i == 0 && frac < 0 {
                     // We have to insert the negative sign ourselves.
                     write!(f, "-0.{:09}", frac.abs())

I have some numbers for lark, but they seem crazy (compare the plain debug and release builds).

Sorry for the presentation, but there are too many numbers and I’m not sure how to show them properly.

# cargo build  1176.61s user 88.65s system 879% cpu 2:23.93 total
touch src/main.rs
# cargo build  4.01s user 1.82s system 100% cpu 5.815 total
find . -name '*.rs' | xargs touch
# cargo build  20.69s user 6.73s system 122% cpu 22.452 total

# cargo build --release  1055.34s user 62.75s system 807% cpu 2:18.53 total
touch src/main.rs
# cargo build --release  1.76s user 0.50s system 101% cpu 2.222 total
find . -name '*.rs' | xargs touch
# cargo build --release  418.70s user 13.84s system 618% cpu 1:09.97 total

# CARGO_BUILD_PIPELINING=true cargo build  1180.08s user 93.53s system 981% cpu 2:09.72 total
touch src/main.rs
# CARGO_BUILD_PIPELINING=true cargo build  9.65s user 2.19s system 100% cpu 11.825 total
find . -name '*.rs' | xargs touch
# CARGO_BUILD_PIPELINING=true cargo build  20.84s user 6.72s system 135% cpu 20.384 total

# CARGO_BUILD_PIPELINING=true cargo build --release  1302.40s user 80.71s system 1251% cpu 1:50.51 total
touch src/main.rs
# CARGO_BUILD_PIPELINING=true cargo build --release  1.25s user 0.42s system 102% cpu 1.639 total
find . -name '*.rs' | xargs touch
# CARGO_BUILD_PIPELINING=true cargo build --release  559.51s user 20.39s system 1322% cpu 43.861 total

For Zola (the `next` branch):

cargo clean && cargo build

real 2m11.074s
user 16m23.520s
sys 1m3.213s

After find . -name '*.rs' | xargs touch:

real 0m18.012s
user 0m16.792s
sys 0m5.348s

cargo clean && CARGO_BUILD_PIPELINING=true cargo build

real 2m8.212s
user 16m52.040s
sys 1m4.869s

After find . -name '*.rs' | xargs touch:

real 0m17.894s
user 0m18.153s
sys 0m6.028s

cargo clean && cargo build --release

real 4m41.741s
user 44m40.577s
sys 1m11.651s

After find . -name '*.rs' | xargs touch:

real 0m49.509s
user 4m32.972s
sys 0m5.965s

cargo clean && CARGO_BUILD_PIPELINING=true cargo build --release

real 4m37.233s
user 45m19.495s
sys 1m10.882s

After find . -name '*.rs' | xargs touch:

real 0m38.826s
user 5m30.089s
sys 0m6.621s

It doesn’t look like there are any noticeable changes except for the incremental --release build.


For my real-world project, which has a lot of dependency crates (~227) including a database ORM (diesel) and tokio, here are the results:

| Build Type | Default | Pipelined | Difference |
| --- | --- | --- | --- |
| Debug | 7m 17s | 7m 04s | 2.97% :arrow_up: |
| Release | 13m 37s | 12m 30s | 8.20% :arrow_double_up: |
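The Difference column above follows directly from the two timings. A small sketch for checking it (the `to_seconds` and `improvement` helpers are my own names, not anything from cargo):

```python
import re

def to_seconds(t: str) -> float:
    """Parse timings like '7m 17s', '54s', or '41.42s' into seconds."""
    m = re.fullmatch(r"(?:(\d+)m\s*)?([\d.]+)s", t.strip())
    if not m:
        raise ValueError(f"unrecognized timing: {t!r}")
    minutes = int(m.group(1) or 0)
    return minutes * 60 + float(m.group(2))

def improvement(default: str, pipelined: str) -> float:
    """Percent improvement of the pipelined build over the default build."""
    d, p = to_seconds(default), to_seconds(pipelined)
    return (d - p) / d * 100

print(f"{improvement('7m 17s', '7m 04s'):.2f}%")    # 2.97%
print(f"{improvement('13m 37s', '12m 30s'):.2f}%")  # 8.20%
```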

Compiled on Windows 10 x64 with an Intel Core 2 Quad Q9550 @ 2.83 GHz.

I will try this again on my Linux machine with the same spec.

Note: most of the compilation time is in the diesel crate; I'm using diesel v1.3.3 with postgres.

Another note: I'm using these settings in my Cargo.toml for this project workspace:

[profile.release]
opt-level = 3
debug = false
lto = true # maybe this slows things down?
debug-assertions = false
codegen-units = 16
panic = 'unwind'
incremental = false
overflow-checks = false

Another project I can think of which hits the sweet spot here is compilation of wasm-bindgen’s macro:

| Full build | Default | Pipelined | Difference |
| --- | --- | --- | --- |
| debug | 10.28s | 7.88s | 23% faster |
| release | 19.25s | 10.49s | 46% faster |

after touching all *.rs files

| Incremental build | Default | Pipelined |
| --- | --- | --- |
| debug | 2.15s | 2.00s |
| release | 10.04s | 6.35s |

This is a crate where syn takes quite some time to codegen, but its metadata comes out quickly, so other crates can make progress quickly too.


Another data point, which matches the expectations: I ran this for volta and these were the results:

$ hyperfine \
    --prepare 'cargo +nightly build --all && fd ".*.rs" | xargs touch' \
    'cargo +nightly build --all --release' \
    'env CARGO_BUILD_PIPELINED=true cargo +nightly build --all --release'
Benchmark #1: cargo +nightly build --all --release
  Time (mean ± σ):     20.924 s ±  0.803 s    [User: 90.686 s, System: 2.266 s]
  Range (min … max):   20.243 s … 22.306 s    10 runs
 
Benchmark #2: env CARGO_BUILD_PIPELINED=true cargo +nightly build --all --release
  Time (mean ± σ):     20.601 s ±  0.584 s    [User: 90.473 s, System: 2.216 s]
  Range (min … max):   20.145 s … 21.903 s    10 runs
 
Summary
  'env CARGO_BUILD_PIPELINED=true cargo +nightly build --all --release' ran
    1.02 ± 0.05 times faster than 'cargo +nightly build --all --release'
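For what it's worth, hyperfine's summary ratio follows from the two reported means, and the ± term appears to come from propagating the two relative standard deviations in quadrature. A quick check, using the values from the output above:

```python
# Sanity-check hyperfine's summary line from the reported means and
# standard deviations of the two benchmarks.
mean_default, sd_default = 20.924, 0.803  # Benchmark #1
mean_piped, sd_piped = 20.601, 0.584      # Benchmark #2

ratio = mean_default / mean_piped
# relative errors combined in quadrature, scaled back to the ratio
sigma = ratio * ((sd_default / mean_default) ** 2
                 + (sd_piped / mean_piped) ** 2) ** 0.5

print(f"{ratio:.2f} ± {sigma:.2f}")  # 1.02 ± 0.05
```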

I ran it in a variety of modes, and all came out like this: effectively no change. We have a couple of internal libraries, but there’s nothing like syn in the mix, so pipelining just doesn’t affect this codebase, which is exactly what I expected given its dynamics.

Ran this on a private code base with about 268 dependencies, using a 6-core/12-thread Intel i7-8750H on Windows 10. I’ll try to report back with incremental compilation numbers when I get a chance.

| Mode | Default | Pipelined | Difference |
| --- | --- | --- | --- |
| Debug | 110 | 90 | 18.18% faster |
| Release | 202 | 187 | 7.43% faster |

Doesn’t seem to be much difference for Firefox debug builds. Maybe 3.5% faster, but that’s in the noise.

$ unset CARGO_BUILD_PIPELINED && ./mach build && \
  ./mach clobber --full && time ./mach -l build.log build > /dev/null && \
  export CARGO_BUILD_PIPELINED=true && \
  ./mach clobber --full && time ./mach -l build.log build > /dev/null

real	6m41.159s
user	39m14.728s
sys	2m31.432s

real	6m27.846s
user	40m7.269s
sys	2m33.312s

Building rust-analyzer (about 250 crates including dependencies), this commit, on a quad-core laptop:

  • debug: 1m 57s
  • debug, pipelined: 1m 50s
  • release: 6m 29s
  • release, pipelined: 6m 49s (20 seconds slower)
λ rustc --version
rustc 1.36.0-nightly (73a3a90d2 2019-05-17)

I cannot reproduce your results on my side.

I use a 16-core (32-hyperthread) machine with 64 GB RAM, and:

$ cargo --version
cargo 1.36.0-nightly (c4fcfb725 2019-05-15)
$ rustc --version
rustc 1.36.0-nightly (73a3a90d2 2019-05-17)

wasmtime@67edb00f29b62864b00179fe4bfa99bc29973285

I tried `build.pipelined = true` in ~/.cargo/config and the CARGO_BUILD_PIPELINED=true environment variable, but the debug build time is the same: 42.79s with pipelining and 43.25s without.

Is there any way to see that the pipelining option is really activated? Somewhere in CARGO_LOG=debug output, or anywhere else?

Running cargo build --release I got 5m 48s without pipelining and 6m 05s with, but I don’t really trust these measurements because my laptop went into thermal throttling.


In other words, @alexcrichton, is it CARGO_BUILD_PIPELINING (as I assume) or CARGO_BUILD_PIPELINED? You mention both in the first post (and also in the .cargo/config file). I suspect that some of the negative results posted here should be taken with a grain of salt.

@chriskrycho @erahm

Thanks, indeed: CARGO_BUILD_PIPELINED does nothing, and CARGO_BUILD_PIPELINING works.
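For the config-file spelling: by Cargo's usual convention an environment variable CARGO_BUILD_<KEY> maps to the `<key>` entry under `[build]`, so the equivalent of CARGO_BUILD_PIPELINING=true would presumably be (an assumption based on that convention, not confirmed in this thread):

```toml
# ~/.cargo/config — assumed config-file equivalent of
# CARGO_BUILD_PIPELINING=true (note: "pipelining", not "pipelined")
[build]
pipelining = true
```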

Can you report the times, then? :laughing:

@matklad Rust analyzer

| Full Build Type | Default | Pipelined |
| --- | --- | --- |
| debug | 1m 18s | 1m 01s |
| release | 2m 54s | 2m 21s |

wasmtime

| Full Build Type | Default | Pipelined |
| --- | --- | --- |
| debug | 41.42s | 35.95s |
| release | 1m 01s | 49.25s |

With CARGO_BUILD_PIPELINED instead of CARGO_BUILD_PIPELINING I cannot see any difference at all.


Trust-DNS

from the root, all crates, on my older laptop

Nightly

cargo +nightly build --release

1256.42s user 33.73s system 678% cpu 3:10.17 total

Nightly Pipelined

CARGO_BUILD_PIPELINING=true cargo +nightly build --release

1265.26s user 31.58s system 696% cpu 3:06.32 total

I’ll be enabling it in CI on nightly for testing purposes.

By the way, maybe the idea is the same as for a drug trial: the first group of sick people gets the real drug, the second group gets no drug, and the third gets a fake drug (a placebo) but thinks it is the real one.


FYI, for any confusion around the environment variable name, here is the cargo test for reference:


I guess the really big gains would come if we switched from timestamp-based to content-based caching: that would allow cache hits on a crate when its dependencies changed only in implementation and not in interface (which is presumably roughly what the metadata file contains)?

For the pants engine at sha 5f2d91d8, 8 cores / 16 hyperthreads:

| Scenario | Default | Pipelined | Improvement |
| --- | --- | --- | --- |
| clean, debug | 3m 06s | 3m 03s | 1.6% |
| one-line interface change to the deepest workspace crate, debug | 0m 24s | 0m 24s | 0% |
| clean, release | 9m 12s | 8m 53s | 3.4% |
| one-line interface change to the deepest workspace crate, release | 2m 45s | 2m 37s | 4.8% |


Metadata files also contain the MIR for generic functions, as well as an SVH (strict version hash), which depends on the contents of the whole crate.

Oh dear, sorry for the confusion, all! I’ve updated the OP to mention that the option is indeed CARGO_BUILD_PIPELINING, not “pipelined”.
