Help us benchmark incremental compilation!


#42

Thanks everyone for the great feedback so far! This is very valuable to us.

Some remarks that might be of interest:

  • The initial incremental build is faster than the from-scratch build because it uses all your CPU cores during optimization and code generation (the last part of the compilation pipeline). The price you pay for making this possible is decreased runtime performance.
  • It is expected that the runtime performance of incrementally compiled programs is worse. The main reason for this is that the compiler can do less inlining. That being said, functions marked with #[inline] will be available for inlining even in incremental mode. So doing some profiling and then marking hot functions #[inline] might help quite a bit.
  • ThinLTO will very probably be compatible with incremental compilation. My guess is that it will lead to longer compile times compared to non-LTO incremental compilation but the resulting binaries should have pretty good runtime performance.
  • The steps in the original post do not contain a measurement for non-incrementally rebuilding only the main crate, i.e. with dependencies already built. That means that build times from step 3 are not directly comparable to the times from steps 5 and 6. The build times from step 3 and step 4 are comparable, however. @GolDDranks had the right idea with their steps 3.1 and 3.2. Those are the equivalents of steps 5 and 6.
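To illustrate the #[inline] remark above, here is a minimal sketch. The function names (`cell_index`, `sum_row`) are hypothetical and not taken from any project in this thread, and whether the attribute actually pays off depends on what profiling shows for your code.

```rust
/// Hypothetical hot helper: maps (row, col) on a 9x9 grid to a flat index.
/// Marking it `#[inline]` keeps it available for inlining at its call
/// sites, which (per the remark above) still applies in incremental mode.
#[inline]
pub fn cell_index(row: usize, col: usize) -> usize {
    row * 9 + col
}

/// Hypothetical caller with a tight loop. If `cell_index` is not inlined,
/// every cell access pays for a function call.
pub fn sum_row(grid: &[u8; 81], row: usize) -> u32 {
    (0..9).map(|col| u32::from(grid[cell_index(row, col)])).sum()
}
```

The attribute is only a hint, so it is probably worth re-running the benchmarks after each change rather than annotating everything up front.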

#43

I'm not a compiler team member, but I’ve been bitten by the poor accuracy of perf once already. Currently, a 20% difference in memory consumption (max-rss) and a 10-15% difference in cycles is within the margin of error (this data is from a pull request that changed nothing). This is too high. At the very least, perf should do three runs and show an error estimate (min, max, mean, delta) or similar.

Otherwise we end up making performance decisions based on “noise”.

Ideally, perf would include benchmarks that go beyond benchmarking the compiler and cover other applications (Servo, Tokio servers, standard HPC benchmarks ported to Rust: SPEC, Graph 500, HPC Challenge, …). Otherwise we might end up with a language that is great for writing fast compilers, but not that great for everything else.
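As a rough sketch of the "three runs plus an error estimate" idea above: the code below summarizes repeated measurements of one metric and only treats two sets of runs as different when their ranges do not overlap. The names (`RunSummary`, `summarize`, `ranges_disjoint`) and the non-overlap rule are assumptions made for illustration. This is not how perf works today.

```rust
/// Summary of repeated measurements of one metric (e.g. cycles or max-rss).
#[derive(Debug, Clone, PartialEq)]
pub struct RunSummary {
    pub min: f64,
    pub max: f64,
    pub mean: f64,
    /// (max - min) as a percentage of the mean; a crude noise estimate.
    pub spread_pct: f64,
}

/// Returns `None` for an empty sample set. Assumes positive measurements,
/// so the mean is non-zero.
pub fn summarize(samples: &[f64]) -> Option<RunSummary> {
    if samples.is_empty() {
        return None;
    }
    let min = samples.iter().copied().fold(f64::INFINITY, f64::min);
    let max = samples.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let mean = samples.iter().sum::<f64>() / samples.len() as f64;
    let spread_pct = (max - min) / mean * 100.0;
    Some(RunSummary { min, max, mean, spread_pct })
}

/// Conservative check: the two sets of runs differ only if their
/// [min, max] ranges do not overlap at all.
pub fn ranges_disjoint(baseline: &RunSummary, candidate: &RunSummary) -> bool {
    candidate.min > baseline.max || candidate.max < baseline.min
}
```

With three baseline runs of 100, 110 and 90, the spread is 20% of the mean, so a candidate that lands anywhere between 90 and 110 would be reported as noise rather than as a regression or an improvement.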


#44

Project: https://github.com/cloudfoundry/jvmkill

Build command: cargo +nightly build --release -p jvmkill

Touch command: touch jvmkill/src/lib.rs

Results:

  1. 352.87s user 9.93s system 197% cpu 3:03.90 total
  2. 429.39s user 11.11s system 496% cpu 1:28.64 total
  3. 2.94s user 0.99s system 95% cpu 4.127 total
  4. 3.06s user 0.95s system 101% cpu 3.942 total

Without CARGO_INCREMENTAL=1:

3.1. 5.01s user 0.88s system 99% cpu 5.893 total

3.2. 5.03s user 0.89s system 99% cpu 5.919 total


#45

cargo commit c212f30d.

$ unset CARGO_INCREMENTAL
$ rustc +nightly --version --verbose
rustc 1.23.0-nightly (02004ef78 2017-11-08)
binary: rustc
commit-hash: 02004ef78383cb174a41df7735a552823fa10b90
commit-date: 2017-11-08
host: x86_64-apple-darwin
release: 1.23.0-nightly
LLVM version: 4.0

# point 3
time cargo +nightly build --release
498.21s user 21.87s system 191% cpu 4:31.41 total

# point 4
CARGO_INCREMENTAL=1 time cargo +nightly build --release
617.95s user 27.10s system 526% cpu 2:02.53 total

# point 5
$ touch lib.rs 
$ CARGO_INCREMENTAL=1 time cargo +nightly build --release
17.30s user 2.16s system 94% cpu 20.628 total

# point 6
# Add "let _x = 5;" to src/cargo/lib.rs
$ CARGO_INCREMENTAL=1 time cargo +nightly build --release
17.86s user 2.36s system 94% cpu 21.387 total

#46

strs commit 34ed3262

  1. 158,37s user 1,32s system 150% cpu 1:45,89 total
  2. 159,05s user 1,52s system 302% cpu 53,025 total
  3. 11,69s user 0,31s system 119% cpu 10,069 total
  4. 13,56s user 0,29s system 129% cpu 10,688 total

How I normally run with LTO:

  1. 275,98s user 1,72s system 160% cpu 2:53,49 total

#47

Like @GolDDranks, I’m finding it hard to assess the magnitude of the QoL improvement here when all of the non-incremental benchmarks include the cost of building all dependencies.


#48

Wow! I was afraid that I wouldn’t be able to benefit from this due to working in a workspace, but it turns out that incremental sees through path dependencies just fine! This is incredible.

$ rustc +nightly --version --verbose
rustc 1.23.0-nightly (79cfce3d3 2017-11-12)
binary: rustc
commit-hash: 79cfce3d35cc8c2716c65650389d79551c2ea6ea
commit-date: 2017-11-12
host: x86_64-unknown-linux-gnu
release: 1.23.0-nightly
LLVM version: 4.0

$ cargo +nightly build --release
$ rm -rf target && time cargo +nightly build --release
real    6m29.245s
user    15m56.946s
sys     0m7.118s

$ touch src/lib.rs && time cargo +nightly build --release
real    0m0.472s
user    0m0.408s
sys     0m0.071s

$ # That's because this is a workspace!
$ touch src/structure/lib.rs && time cargo +nightly build --release
   (this time 7 packages in the source tree get recompiled...)

real    1m27.044s
user    1m35.722s
sys     0m0.424s

$ vim src/structure/core/lattice.rs   # add 'let () = ();' to a function
$ time cargo +nightly build --release
real    1m29.811s
user    1m38.614s
sys     0m0.485s

$ rm -rf target && time CARGO_INCREMENTAL=1 cargo +nightly build --release
real    4m7.995s
user    15m40.315s
sys     0m8.075s

$ touch src/lib.rs && time CARGO_INCREMENTAL=1 cargo +nightly build --release
real    0m0.526s
user    0m0.448s
sys     0m0.083s

$ touch src/structure/lib.rs && time CARGO_INCREMENTAL=1 cargo +nightly build --release
real    0m8.799s
user    0m10.860s
sys     0m0.459s

$ vim src/structure/core/lattice.rs   # add another 'let () = ();' to the function
$ time CARGO_INCREMENTAL=1 cargo +nightly build --release
real    0m11.471s
user    0m14.872s
sys     0m0.554s

Here it is again all table-like:

action                        non-incremental  incremental
rm -rf target                 389.245s         247.995s
touch root lib.rs             0.472s           0.526s
touch subcrate lib.rs         87.044s          8.799s
add let () = (); in subcrate  89.811s          11.471s
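The table values are the `real` lines from the transcript converted to seconds (6m29.245s becomes 389.245s). A small sketch of that conversion, in case anyone wants to script it; `parse_real_time` is a hypothetical helper and only handles the `<minutes>m<seconds>s` form printed by bash's `time`, with either `.` or `,` as the decimal separator.

```rust
/// Parses a bash-style `real` value such as "6m29.245s" or "0m1,694s"
/// into seconds. Returns `None` if the input does not match that shape.
pub fn parse_real_time(s: &str) -> Option<f64> {
    let (minutes, rest) = s.trim().split_once('m')?;
    let seconds = rest.strip_suffix('s')?;
    let minutes: f64 = minutes.parse().ok()?;
    // Some locales print a decimal comma, so normalize it before parsing.
    let seconds: f64 = seconds.replace(',', ".").parse().ok()?;
    Some(minutes * 60.0 + seconds)
}

/// How many times faster the incremental rebuild was.
pub fn rebuild_ratio(non_incremental: &str, incremental: &str) -> Option<f64> {
    Some(parse_real_time(non_incremental)? / parse_real_time(incremental)?)
}
```

For the "touch subcrate lib.rs" row, `rebuild_ratio("1m27.044s", "0m8.799s")` comes out at roughly 9.9, i.e. the incremental rebuild took about a tenth of the time.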

#49

fuzzy-pickles at abf5ae5b0f3090eb6b73ecd08a29e3023257d045:

$ cargo +nightly build --release
    Finished release [optimized] target(s) in 70.54 secs
$ rm -rf target/
$ time cargo +nightly build --release
    Finished release [optimized] target(s) in 65.70 secs
real	1m5.919s
user	1m9.412s
sys	0m2.051s
$ rm -rf target/
$ CARGO_INCREMENTAL=1 time cargo +nightly build --release
    Finished release [optimized] target(s) in 44.38 secs
       44.57 real        98.13 user         3.01 sys
$ touch src/lib.rs
$ CARGO_INCREMENTAL=1 time cargo +nightly build --release
    Finished release [optimized] target(s) in 8.74 secs
        8.97 real         9.51 user         1.26 sys
# Added let _a = 1+1;
$ CARGO_INCREMENTAL=1 time cargo +nightly build --release
    Finished release [optimized] target(s) in 20.29 secs
       20.50 real        23.55 user         1.47 sys
$ rustc +nightly --version
rustc 1.23.0-nightly (e97ba8328 2017-11-25)
$ cargo +nightly --version
cargo 0.24.0-nightly (abd137ad1 2017-11-12)

#50

Tested with this script on a small codebase (~1k loc) without external dependencies.

full build

real	0m1,694s
user	0m1,645s
sys	0m0,057s

full incremental build

real	0m1,612s
user	0m2,778s
sys	0m0,095s

touch lib.rs

real	0m0,822s
user	0m0,742s
sys	0m0,084s

add noop

real	0m0,186s
user	0m0,174s
sys	0m0,011s

The touch and noop timings vary: I’ve seen them reversed, and also both at 0,8s real in the same run.

Performance takes a nosedive though:

name                              bench_non_incr.txt ns/iter  bench_incr.txt ns/iter  diff ns/iter   diff %  speedup 
easy_sudokus_solve                618,866                     1,249,210                    630,344  101.85%   x 0.50 
easy_sudokus_solve_at_most_100    639,702                     1,296,359                    656,657  102.65%   x 0.49 
easy_sudokus_solve_one            617,948                     1,250,965                    633,017  102.44%   x 0.49 
easy_sudokus_solve_unique         641,940                     1,291,450                    649,510  101.18%   x 0.50 
hard_sudokus_solve                15,041,161                  37,084,056                22,042,895  146.55%   x 0.41 
hard_sudokus_solve_at_most_100    32,144,725                  85,086,005                52,941,280  164.70%   x 0.38 
hard_sudokus_solve_one            14,924,220                  39,948,986                25,024,766  167.68%   x 0.37 
hard_sudokus_solve_unique         32,019,912                  79,508,623                47,488,711  148.31%   x 0.40 
medium_sudokus_solve              904,421                     2,662,662                  1,758,241  194.41%   x 0.34 
medium_sudokus_solve_at_most_100  1,163,545                   3,316,556                  2,153,011  185.04%   x 0.35 
medium_sudokus_solve_one          903,868                     2,878,149                  1,974,281  218.43%   x 0.31 
medium_sudokus_solve_unique       1,164,439                   3,343,948                  2,179,509  187.17%   x 0.35
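For anyone reading the last two columns: the numbers are consistent with "diff %" being the change relative to the non-incremental time and "speedup" being non-incremental divided by incremental. A minimal sketch of that arithmetic (the function names are made up for illustration):

```rust
/// Change from the non-incremental to the incremental time, as a
/// percentage of the non-incremental time. Positive means slower.
pub fn diff_percent(non_incr_ns: f64, incr_ns: f64) -> f64 {
    (incr_ns - non_incr_ns) / non_incr_ns * 100.0
}

/// Ratio of non-incremental to incremental time. Below 1.0 means the
/// incrementally compiled binary is slower.
pub fn speedup(non_incr_ns: f64, incr_ns: f64) -> f64 {
    non_incr_ns / incr_ns
}
```

Plugging in the first row (618,866 and 1,249,210 ns/iter) gives about 101.85% and about 0.50, matching the table: the incrementally compiled benchmark runs at roughly half the speed.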