Help us benchmark incremental compilation!

These numbers are for Alacritty with lto = true in the release profile. They were generated with LTO enabled since the project enables it by default.

  1. 763.81s user 13.75s system (Fresh build)
  2. 776.68s user 18.35s system (Clean build)
  3. (CARGO_INCREMENTAL=1 and clean target)

Oops, apparently this doesn’t work with LTO.

error: can't perform LTO when compiling incrementally

Starting over without the LTO flag, but this basically means incremental compilation won't be usable by default for Alacritty.

  1. 626.46s user 17.02s system (Fresh build)
  2. 630.91s user 18.37s system (Clean build)
  3. 589.52s user 19.27s system (CARGO_INCREMENTAL=1 and clean target)
  4. 12.40s user 2.18s system (CARGO_INCREMENTAL=1 and touch src/main.rs)
  5. 55.97s user 1.88s system (CARGO_INCREMENTAL=1 and noop change in fn main())

Regarding the LTO issue, it would be nice if this either worked with incremental compilation or if there could be an additional build profile. For Alacritty, the use cases would be:

  • Release: LTO, optimizations
  • Dev: Want optimizations, incremental build
  • Debug: No optimizations; helpful for viewing complete call stacks when digging into perf issues.
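A hedged sketch of how these three cases could be approximated in Cargo.toml today (hypothetical values; Cargo has no custom named profiles, so only the built-in dev and release profiles are available):

```toml
# Hypothetical approximation of the three profiles described above.

[profile.release]
opt-level = 3
lto = true          # full LTO plus optimizations for shipped binaries

# "Dev": optimized but without LTO, so incremental compilation stays usable.
[profile.dev]
opt-level = 2

# "Debug" would correspond to the dev profile at its default opt-level = 0,
# which keeps call stacks complete when digging into perf issues; with only
# two built-in profiles, you would have to toggle opt-level by hand.
```

With a third profile, switching between these workflows would not require editing Cargo.toml at all.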

From what I hear it is indeed planned to make this work with ThinLTO, yes. I’ll be very interested to see combined compiletime/runtime benchmarks once all that is ready. :slight_smile:

Yeah, I suggested this a while ago too. This would perfectly capture common workflows. You would use dev most of the time while developing. When you encounter a bug, you want to preserve all the debugging information and so you use debug to find the bug (due to no optimizations, this runs way slower, so you don't want it during normal development). And then when releasing you use the release build.


Results for livesplit-core

Test           Non-Incremental   Incremental
Full Rebuild   2m02.637s         1m23.640s
Touch          0m23.350s         0m09.660s
Small Change   0m23.043s         0m09.780s

Results for https://github.com/geeny/linux-hub-sdk/, specifically commit 009f3bf

Test           Non-incremental   Incremental
Full rebuild   3:11.08 total     2:29.43 total
Touch          30.142 total      1.902 total
Small change   30.183 total      2.181 total

TL;DR: 15x-ish speedup. Awesome :slight_smile:

Rust and Cargo versions used:

$ rustc +nightly --version
rustc 1.23.0-nightly (2be4cc040 2017-11-01)
$ cargo +nightly --version
cargo 0.24.0-nightly (859c2305b 2017-10-29)

NOTES: the file I touched was src/lib.rs, and the noop was to add let _x = 5; to the new() method of src/interface/sdk.rs.

EDIT3: Originally, I reported no speedup from incremental compilation; however, I was not properly exporting the incremental flag. My results above now reflect the corrected setup. Previous edits were removed.

rustc 1.23.0-nightly (2be4cc040 2017-11-01)

regular:     2m11.174s real     5m3.467s user        20.46s sys
incremental:     61.13 real       323.72 user        23.16 sys
touch:            6.15 real        16.88 user         0.80 sys
trivial change:   3.63 real         2.64 user         0.85 sys
$ cargo --version
cargo 0.24.0-nightly (859c2305b 2017-10-29)
$ rustc --version
rustc 1.23.0-nightly (2be4cc040 2017-11-01)

bindgen

Fresh release build

$ rm -rf target/
$ time cargo +nightly build --release
<snip>
    Finished release [optimized] target(s) in 168.22 secs

real	2m48.579s
user	4m33.232s
sys	0m3.316s

Fresh incremental+release build

$ rm -rf target/
$ CARGO_INCREMENTAL=1 time cargo +nightly build --release
<snip>
    Finished release [optimized] target(s) in 44.76 secs

real	0m45.112s
user	4m41.256s
sys	0m3.988s

touch src/lib.rs and incremental rebuild

$ touch src/lib.rs
$ time cargo +nightly build --release
   Compiling bindgen v0.31.3 (file:///home/fitzgen/rust-bindgen)
    Finished release [optimized] target(s) in 8.29 secs

real	0m8.639s
user	0m8.220s
sys	0m0.504s

Add a public no-op method and incremental rebuild

$ time cargo +nightly build --release
   Compiling bindgen v0.31.3 (file:///home/fitzgen/rust-bindgen)
    Finished release [optimized] target(s) in 9.47 secs

real	0m9.820s
user	0m11.708s
sys	0m0.564s

Really impressed and surprised that fresh incremental builds are so much faster than fresh non-incremental builds!

I second that we should consider adding a --debug flag to Cargo for this workflow, to guarantee no runtime regressions due to incremental compilation. However, if ThinLTO really is as good as promised, maybe that won’t be an issue? We’ll have to wait and see. :stuck_out_tongue:

Slight nit: instead of CARGO_INCREMENTAL=1 time cargo +nightly build --release, use time CARGO_INCREMENTAL=1 cargo +nightly build --release (the former invokes /usr/bin/time, while the latter uses the shell's time keyword).
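As a small illustration (using sh -c as a stand-in for the cargo invocation), the environment assignment reaches the child process with either ordering; the placement only changes which time implementation does the measuring:

```shell
# A prefixed env assignment is exported to the command and its descendants,
# so cargo would see CARGO_INCREMENTAL=1 in both orderings; only the timer
# (shell keyword `time` vs. /usr/bin/time) differs.
CARGO_INCREMENTAL=1 sh -c 'echo "incremental=$CARGO_INCREMENTAL"'
# prints: incremental=1
```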

That being said, here are the numbers for sccache (rev cbb72b80df248eed18b399da7e0d77cbaedb3b07):

$ rustc +nightly --version
rustc 1.23.0-nightly (2be4cc040 2017-11-01)
$ time cargo +nightly build --release
    Finished release [optimized + debuginfo] target(s) in 298.96 secs

real	4m59.237s
user	14m25.412s
sys	0m13.580s
$ time CARGO_INCREMENTAL=1 cargo +nightly build --release
    Finished release [optimized + debuginfo] target(s) in 159.19 secs

real	2m39.467s
user	16m7.767s
sys	0m18.661s
$ touch src/main.rs; time CARGO_INCREMENTAL=1 cargo +nightly build --release
    Finished release [optimized + debuginfo] target(s) in 9.58 secs

real	0m9.855s
user	0m9.314s
sys	0m0.580s
# add println to main
$ time CARGO_INCREMENTAL=1 cargo +nightly build --release
    Finished release [optimized + debuginfo] target(s) in 42.21 secs

real	0m42.489s
user	3m31.391s
sys	0m2.315s

Is it only me, or are the instructions missing the measurements for non-fresh non-incremental builds? Measurement 3 contains all the work for building the crate itself and all the dependencies, and then there's the incremental stuff, but the non-fresh incremental builds can hardly be compared to the full build time, only to the single-crate (dependencies already built) time. I added touching and trivial changes in non-incremental mode as numbers 3.1 and 3.2 to account for this.

I tried compiling the rls crate which I have been playing with lately.

Non-incremental builds

3. rm -rf target && CARGO_INCREMENTAL=0 time cargo +nightly build --release

358.96 real      1126.37 user        40.12 sys

3.1. touch src/main.rs && CARGO_INCREMENTAL=0 time cargo +nightly build --release

81.81 real        80.89 user         0.85 sys

3.2. adding a println! statement and doing CARGO_INCREMENTAL=0 time cargo +nightly build --release

89.60 real        86.64 user         1.57 sys

Incremental builds

4. rm -rf target && CARGO_INCREMENTAL=1 time cargo +nightly build --release

194.50 real      1158.00 user        43.83 sys

5. touch src/main.rs && CARGO_INCREMENTAL=1 time cargo +nightly build --release

8.11 real         7.27 user         0.92 sys

6. adding a println! statement and doing CARGO_INCREMENTAL=1 time cargo +nightly build --release

8.01 real         7.32 user         0.86 sys

I’d like to get a confirmation from @aturon: is the omission of touch and trivial-change measurements for non-incremental builds in the instructions just an oversight, or is there some reason not to be interested in that comparison?

Edit: I almost forgot to say: the incremental builds seem to provide HUGE wins. So big kudos to the implementors! I already said this on Twitter, but… wow, I feel like this is unlocking a superpower in Rust!

I made a todo list! It’s at https://github.com/gdox/agenda (Shameless self promotion)

It uses about six subcrates and a very weird parser, but anyway.

  1. Clean
real	1m29.321s
user	2m22.800s
sys	0m1.016s
  2. Clean, inc
real	0m54.704s
user	2m45.228s
sys	0m1.036s
  3. Clean, touch
real	0m1.834s
user	0m1.724s
sys	0m0.112s
  4. Clean, method change
real	0m2.542s
user	0m3.172s
sys	0m0.120s

What I find most interesting is that CARGO_INCREMENTAL apparently allows for more parallelism: even though the initial compilation takes more CPU time, it finishes about 50% faster in wall-clock time.

For amethyst, https://github.com/amethyst/amethyst/.

  1. time cargo +nightly build --release
real	2m15.877s
user	7m33.305s
sys	0m15.524s
  2. CARGO_INCREMENTAL=1 time cargo +nightly build --release
      105.19 real       520.84 user        17.94 sys
  3. CARGO_INCREMENTAL=1 time cargo +nightly build --release
        9.93 real        11.42 user         2.65 sys
  4. CARGO_INCREMENTAL=1 time cargo +nightly build --release
       10.20 real        11.70 user         2.58 sys

For skim, commit eb60583

  • point 3

    268.11s user 1.46s system 133% cpu 3:22.68 total

  • point 4 (incremental)

    300.31user 3.45system 1:33.42elapsed 325%CPU (0avgtext+0avgdata 1596688maxresident)k 0inputs+137232outputs (0major+722874minor)pagefaults 0swaps

  • point 5 (touch main.rs)

    3.60user 0.35system 0:03.92elapsed 100%CPU (0avgtext+0avgdata 1412688maxresident)k 0inputs+39736outputs (0major+96947minor)pagefaults 0swaps

  • point 6 (add no-op)

    5.89user 0.35system 0:05.97elapsed 104%CPU (0avgtext+0avgdata 1391664maxresident)k 0inputs+40040outputs (0major+103578minor)pagefaults 0swaps

$ rustc +nightly -V
rustc 1.23.0-nightly (e340996ff 2017-11-02)

For https://github.com/reproto/reproto, this is a multi-module project.

Touching and modifications are done in the core module, which everything else (an additional 8 modules) depends on.

$> rm -rf target
$> time cargo +nightly build --release
...
cargo +nightly build --release  904.66s user 10.51s system 214% cpu 7:05.93 total
$> export CARGO_INCREMENTAL=1
$> rm -rf target
$> time cargo +nightly build --release
...
cargo +nightly build --release  1133.48s user 14.63s system 321% cpu 5:56.75 total
$> export CARGO_INCREMENTAL=1
$> touch core/src/lib.rs
$> time cargo +nightly build --release
...
cargo +nightly build --release  28.76s user 2.30s system 153% cpu 20.287 total
$> export CARGO_INCREMENTAL=1
$> *edit function*
$> time cargo +nightly build --release
...
cargo +nightly build --release  30.07s user 2.18s system 155% cpu 20.725 total

Project is an Ethereum client by Parity: https://github.com/paritytech/parity

$ git show | head -1
commit 39e27076ad1547b6db6fa0b7e46ebcfdb14ea0f1

$ rustc --version --verbose
rustc 1.21.0 (3b72af97e 2017-10-09)
binary: rustc
commit-hash: 3b72af97e42989b2fe104d8edbaee123cdf7c58f
commit-date: 2017-10-09
host: x86_64-unknown-linux-gnu
release: 1.21.0
LLVM version: 4.0

$ time cargo +nightly build --release
real    18m34.595s
user    42m51.428s
sys     0m51.128s

$ time CARGO_INCREMENTAL=1 cargo +nightly build --release
real    8m51.282s
user    44m34.888s
sys     0m53.996s

$ touch util/network/src/lib.rs
$ time CARGO_INCREMENTAL=1 cargo +nightly build --release
real    1m0.223s
user    1m1.328s
sys     0m2.864s

$ echo "fn _dummy() {}" >> util/network/src/lib.rs
$ time CARGO_INCREMENTAL=1 cargo +nightly build --release
real    0m59.437s
user    1m0.364s
sys     0m2.576s

My machine is

$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                12
On-line CPU(s) list:   0-11
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz
Stepping:              2
CPU MHz:               1210.917
CPU max MHz:           3700,0000
CPU min MHz:           1200,0000
BogoMIPS:              6996.24
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              15360K
NUMA node0 CPU(s):     0-11
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts

It belongs to the company I work for. I can run some commands if that would help.

This is for one of our projects at work. It has a fair number of deps (internal and especially external).

3) NO CARGO_INCREMENTAL - full
    Finished release [optimized] target(s) in 220.66 secs

real	3m40.889s
user	15m22.129s
sys	0m11.130s


4) CARGO_INCREMENTAL - full
    Finished release [optimized] target(s) in 177.1 secs

real	2m57.240s
user	15m48.252s
sys	0m12.928s


5) CARGO_INCREMENTAL - touch
    Finished release [optimized] target(s) in 8.9 secs

real	0m8.322s
user	0m7.764s
sys	0m0.556s


6) CARGO_INCREMENTAL - edit (was a different file than lib.rs)
    Finished release [optimized] target(s) in 8.15 secs

real	0m8.383s
user	0m8.043s
sys	0m0.555s

For rustling-ontology: (Probabilistic parser for entity detection) https://github.com/snipsco/rustling-ontology commit: 3969500

2 - cargo +nightly build --release 1200.62s user 12.21s system 103% cpu 19:30.76 total

3 - cargo +nightly build --release 664.10s user 11.36s system 221% cpu 5:05.49 total

4 - cargo +nightly build --release 1.06s user 0.42s system 83% cpu 1.767 total

5 - cargo +nightly build --release 2.96s user 0.39s system 127% cpu 2.638 total

(noop change: fn _dummy() {} in one function of the main library)

For youtube3-util:

3: non-incremental: 5:24

4: fresh incremental: 3:33

5: “touch” incremental: 2.46s

6: trivial change incremental: 2.45s

Thanks everyone for the great feedback so far! This is very valuable to us.

Some remarks that might be of interest:

  • The initial incremental build is faster than the from-scratch build because it uses all your CPU cores during optimization and code generation (the last part of the compilation pipeline). The price you pay for making this possible is decreased runtime performance.
  • It is expected that the runtime performance of incrementally compiled programs is worse. The main reason for this is that the compiler can do less inlining. That being said, functions marked with #[inline] will be available for inlining even in incremental mode, so doing some profiling and then marking hot functions #[inline] might help quite a bit.
  • ThinLTO will very probably be compatible with incremental compilation. My guess is that it will lead to longer compile times compared to non-LTO incremental compilation but the resulting binaries should have pretty good runtime performance.
  • The steps in the original post do not contain a measurement for non-incrementally rebuilding only the main crate, i.e. with dependencies already built. That means that build times from step 3 are not directly comparable to the times from steps 5 and 6. The build times from steps 3 and 4 are comparable, however. @GolDDranks had the right idea with their steps 3.1 and 3.2; those are the equivalent of 5 and 6.
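To make the #[inline] suggestion above concrete, here is a minimal sketch (the function name and workload are made up for illustration): marking a hot function #[inline] keeps its body available for inlining in other crates even when the defining crate is compiled incrementally.

```rust
// Hypothetical hot function; profile first, then annotate. #[inline] keeps
// the body available for cross-crate inlining under incremental compilation.
#[inline]
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    // 1.0 * 3.0 + 2.0 * 4.0 = 11
    println!("dot = {}", dot(&[1.0, 2.0], &[3.0, 4.0]));
}
```

Note that #[inline] is only a hint and adds codegen work per downstream crate, so it is worth annotating only functions that profiling shows to be hot.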