These numbers are for Alacritty with release.lto = true, since the project enables LTO by default.
763.81s user 13.75s system (Fresh build)
776.68s user 18.35s system (Clean build)
(CARGO_INCREMENTAL=1 and clean target)
Oops, apparently this doesn’t work with LTO.
error: can't perform LTO when compiling incrementally
Starting over without LTO flag, but this basically means incremental won’t be usable by default for Alacritty.
626.46s user 17.02s system (Fresh build)
630.91s user 18.37s system (Clean build)
589.52s user 19.27s system (CARGO_INCREMENTAL=1 and clean target)
12.40s user 2.18s system (CARGO_INCREMENTAL=1 and touch src/main.rs)
55.97s user 1.88s system (CARGO_INCREMENTAL=1 and noop change in fn main())
Regarding the LTO issue, it would be nice if LTO either worked with incremental compilation, or if there could be an additional build profile. For Alacritty, the use cases would be:
Release: LTO, optimizations
Dev: Want optimizations, incremental build
Debug: No optimizations; helpful for viewing complete call stacks when digging into perf issues.
From what I hear, it is indeed planned to make this work with ThinLTO, yes. I’ll be very interested to see combined compile-time/runtime benchmarks once all that is ready.
Yeah, I suggested this a while ago too. This would perfectly capture common workflows. You would use dev most of the time while developing. When you encounter a bug, you want to preserve all the debugging information and so you use debug to find the bug (due to no optimizations, this runs way slower, so you don't want it during normal development). And then when releasing you use the release build.
NOTES: the file I touched was src/lib.rs, and the noop was to add let _x = 5; to the new() method of src/interface/sdk.rs.
EDIT3: Originally, I reported no speedup from incremental; however, I was not properly exporting the incremental flag. My results above now reflect this. Previous edits were removed.
regular: 2m11.174s real 5m3.467s user 20.46s sys
incremental: 61.13 real 323.72 user 23.16 sys
touch: 6.15 real 16.88 user 0.80 sys
trivial change: 3.63 real 2.64 user 0.85 sys
I second that we should consider adding a --debug flag to Cargo for this workflow, to guarantee no runtime regressions due to incremental compilation. However, if ThinLTO really is as good as promised, maybe that won’t be an issue? We’ll have to wait and see.
Slight nit: instead of CARGO_INCREMENTAL=1 time cargo +nightly build --release, use time CARGO_INCREMENTAL=1 cargo +nightly build --release (the former uses /usr/bin/time, the latter uses the shell builtin time)
That being said, here are the numbers for sccache (rev cbb72b80df248eed18b399da7e0d77cbaedb3b07):
$ time cargo +nightly build --release
Finished release [optimized + debuginfo] target(s) in 298.96 secs
real 4m59.237s
user 14m25.412s
sys 0m13.580s
$ time CARGO_INCREMENTAL=1 cargo +nightly build --release
Finished release [optimized + debuginfo] target(s) in 159.19 secs
real 2m39.467s
user 16m7.767s
sys 0m18.661s
$ touch src/main.rs; time CARGO_INCREMENTAL=1 cargo +nightly build --release
Finished release [optimized + debuginfo] target(s) in 9.58 secs
real 0m9.855s
user 0m9.314s
sys 0m0.580s
# add println to main
$ time CARGO_INCREMENTAL=1 cargo +nightly build --release
Finished release [optimized + debuginfo] target(s) in 42.21 secs
real 0m42.489s
user 3m31.391s
sys 0m2.315s
Is it only me, or are the instructions missing the measurements for non-fresh non-incremental builds? Step 3 includes all the work for building the crate itself and all its dependencies, and then there are the incremental steps, but the non-fresh incremental builds can hardly be compared to the full build time, only to the single-crate (dependencies already built) time. I added touching and trivial changes in non-incremental mode as steps 3.1 and 3.2 to account for this.
I tried compiling the rls crate which I have been playing with lately.
5. touch src/main.rs && CARGO_INCREMENTAL=1 time cargo +nightly build --release
8.11 real 7.27 user 0.92 sys
6. adding a println! statement and doing CARGO_INCREMENTAL=1 time cargo +nightly build --release
8.01 real 7.32 user 0.86 sys
I’d like to get confirmation from @aturon: is the absence of touch and trivial-change measurements for non-incremental builds in the instructions just an omission, or is there some reason not to make that comparison?
Edit: I almost forgot to say: the incremental builds seem to provide HUGE wins. So big kudos to the implementors! I already said this on Twitter, but… wow, I feel this is like unlocking a superpower in Rust!
It uses about six subcrates and a very weird parser, but anyway.
Clean
real 1m29.321s
user 2m22.800s
sys 0m1.016s
Clean, inc
real 0m54.704s
user 2m45.228s
sys 0m1.036s
Clean, touch
real 0m1.834s
user 0m1.724s
sys 0m0.112s
Clean, method change
real 0m2.542s
user 0m3.172s
sys 0m0.120s
What I find most interesting is that CARGO_INCREMENTAL apparently allows for more parallelism: even though the initial compilation takes more CPU time, it finishes in about 39% less wall-clock time (0m54.7s versus 1m29.3s).
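One way to put a number on the parallelism observation is to divide user CPU time by wall-clock time, which gives the average number of busy cores. The sketch below does that for the "Clean" and "Clean, inc" runs above. The helper names `parse_time` and `avg_cores` are made up for this illustration, and the parser only handles the `XmY.YYYs` format that the shell's `time` builtin printed in this post.

```rust
/// Parse a duration such as "1m29.321s" (shell `time` format) into seconds.
/// Returns None for anything that does not match that shape.
fn parse_time(s: &str) -> Option<f64> {
    let s = s.trim().strip_suffix('s')?;
    let (min, sec) = s.split_once('m')?;
    let min: f64 = min.parse().ok()?;
    let sec: f64 = sec.parse().ok()?;
    Some(min * 60.0 + sec)
}

/// Average number of busy cores: user CPU time divided by wall-clock time.
fn avg_cores(real: &str, user: &str) -> Option<f64> {
    Some(parse_time(user)? / parse_time(real)?)
}

fn main() {
    // (label, real, user) copied from the two clean builds reported above.
    let runs = [
        ("non-incremental", "1m29.321s", "2m22.800s"),
        ("incremental", "0m54.704s", "2m45.228s"),
    ];
    for (label, real, user) in runs {
        let cores = avg_cores(real, user).expect("well-formed time output");
        println!("{}: {:.2} cores busy on average", label, cores);
    }
}
```

With these inputs the ratio comes out at about 1.6 cores for the non-incremental build and about 3.0 for the incremental one, which fits the pattern of more total CPU time but less waiting.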
Touching and modifications are done in the core module, which everything else (an additional 8 modules) depends on.
$> rm -rf target
$> time cargo +nightly build --release
...
cargo +nightly build --release 904.66s user 10.51s system 214% cpu 7:05.93 total
$> export CARGO_INCREMENTAL=1
$> rm -rf target
$> time cargo +nightly build --release
...
cargo +nightly build --release 1133.48s user 14.63s system 321% cpu 5:56.75 total
$> export CARGO_INCREMENTAL=1
$> touch core/src/lib.rs
$> time cargo +nightly build --release
...
cargo +nightly build --release 28.76s user 2.30s system 153% cpu 20.287 total
$> export CARGO_INCREMENTAL=1
$> *edit function*
$> time cargo +nightly build --release
...
cargo +nightly build --release 30.07s user 2.18s system 155% cpu 20.725 total
This is for one of our projects at work. It has a fair number of deps (internal and especially external).
3) NO CARGO_INCREMENTAL - full
Finished release [optimized] target(s) in 220.66 secs
real 3m40.889s
user 15m22.129s
sys 0m11.130s
4) CARGO_INCREMENTAL - full
Finished release [optimized] target(s) in 177.1 secs
real 2m57.240s
user 15m48.252s
sys 0m12.928s
5) CARGO_INCREMENTAL - touch
Finished release [optimized] target(s) in 8.9 secs
real 0m8.322s
user 0m7.764s
sys 0m0.556s
6) CARGO_INCREMENTAL - edit (was a different file than lib.rs)
Finished release [optimized] target(s) in 8.15 secs
real 0m8.383s
user 0m8.043s
sys 0m0.555s
Thanks everyone for the great feedback so far! This is very valuable to us.
Some remarks that might be of interest:
The initial incremental build is faster than the from-scratch build because it uses all your CPU cores during optimization and code generation (the last part of the compilation pipeline). The price you pay for making this possible is decreased runtime performance.
It is expected that the runtime performance of incrementally compiled programs is worse. The main reason for this is that the compiler can do less inlining. That being said, functions marked with #[inline] will be available for inlining even in incremental mode. So doing some profiling and then making hot functions #[inline] might help quite a bit.
ThinLTO will very probably be compatible with incremental compilation. My guess is that it will lead to longer compile times compared to non-LTO incremental compilation but the resulting binaries should have pretty good runtime performance.
The steps in the original post do not contain a measurement for non-incrementally rebuilding only the main crate, i.e. with dependencies already built. That means that build times from step 3 are not directly comparable to the times from steps 5 and 6. The build times from step 3 and step 4 are comparable, however. @GolDDranks had the right idea with their steps 3.1 and 3.2. Those are the equivalents of steps 5 and 6.
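To illustrate the #[inline] remark above, here is a minimal sketch. The function `lerp_u8` and the pixel-blending scenario are invented for the example and do not come from any project in this thread. Which functions are actually hot has to come from profiling, and whether the attribute helps in a given case is something to measure rather than assume.

```rust
/// Hypothetical hot helper: linear interpolation between two 8-bit values.
/// Marking it #[inline] keeps it available for inlining into its callers
/// even when the crate is compiled incrementally.
#[inline]
pub fn lerp_u8(a: u8, b: u8, t: u8) -> u8 {
    // Widen to u32 so the intermediate products cannot overflow.
    let (a, b, t) = (a as u32, b as u32, t as u32);
    ((a * (255 - t) + b * t) / 255) as u8
}

/// Blend a foreground over a background, channel by channel.
/// This is the kind of tight loop where a non-inlined call could show up in a profile.
pub fn blend(bg: &[u8], fg: &[u8], alpha: u8) -> Vec<u8> {
    bg.iter()
        .zip(fg.iter())
        .map(|(&b, &f)| lerp_u8(b, f, alpha))
        .collect()
}

fn main() {
    let bg = [0u8, 0, 0];
    let fg = [255u8, 128, 0];
    println!("{:?}", blend(&bg, &fg, 128));
}
```

The attribute is a hint to the compiler, not a guarantee, so comparing runtime numbers before and after adding it is the only way to know whether it paid off.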