Intuition for rustc current incremental compilation?

matklad · December 3, 2020, 11:52am

I wonder if there's any write-up about what current incremental compilation can and can't do? I want to understand it better, so that I know which changes are "fast" and which changes are "slow", and to optimize the structure of my project for fast incremental compiles.

For example, at the moment I can't explain the following: changing a (non-doc) comment in rust-analyzer's middle-end crate and rebuilding the analyzer in "debug" mode takes 10s. I would expect this no-op change to be effectively a no-op tbh.

Here's what I am doing.

At this line I added a // ... comment, and I add or remove the last . (so the line numbers stay intact). I then build the leaf crate using

λ export RUSTC_BOOTSTRAP=1
λ rustc --version
rustc 1.49.0-beta.2 (bd26e4e54 2020-11-24)
λ cargo build -p rust-analyzer --bin rust-analyzer -Ztimings=info

There's debug=0 in Cargo.toml profile.

With all that, this is what I am seeing:

   Compiling hir v0.0.0 (/home/matklad/projects/rust-analyzer/crates/hir)
   Compiling ide_db v0.0.0 (/home/matklad/projects/rust-analyzer/crates/ide_db)
   Completed hir v0.0.0 in 0.7s
   Compiling assists v0.0.0 (/home/matklad/projects/rust-analyzer/crates/assists)
   Compiling completion v0.0.0 (/home/matklad/projects/rust-analyzer/crates/completion)
   Compiling ssr v0.0.0 (/home/matklad/projects/rust-analyzer/crates/ssr)
   Completed ide_db v0.0.0 in 0.9s
   Completed ssr v0.0.0 in 0.5s
   Completed completion v0.0.0 in 0.6s
   Compiling ide v0.0.0 (/home/matklad/projects/rust-analyzer/crates/ide)
   Completed assists v0.0.0 in 0.9s
   Compiling rust-analyzer v0.0.0 (/home/matklad/projects/rust-analyzer/crates/rust-analyzer)
   Completed ide v0.0.0 in 0.8s
   Completed rust-analyzer v0.0.0 in 1.6s
   Completed rust-analyzer v0.0.0 bin "rust-analyzer" in 5.8s
    Finished dev [unoptimized] target(s) in 10.06s

The hir is the middle crate being edited, the rust-analyzer is the final leaf binary, and the other crates are some front-end IDE bits.

I am surprised that middle crates take up to a second to compile, as those should be no-ops...

michaelwoerister · December 3, 2020, 1:19pm

Unless something has changed since March, incremental compilation in rustc starts after HIR construction and ends before the linking step. That means that parsing, macro expansion, and name resolution are executed unconditionally, and that the linker will always be invoked with the full set of object files (i.e. linking is not incremental). This results in a certain amount of fixed overhead once cargo decides that rustc must be invoked.

There are also a number of additional inefficiencies that become more prominent the less work actually has to be redone. For example, crate metadata is always entirely rebuilt, so that even if nothing changes, everything that goes into crate metadata has to be at least loaded from the incr. on-disk cache. The same is true for the dependency graph and the on-disk cache itself: they are also entirely re-created for each succeeding rustc invocation.

These are all just general guesses though. Self-profiling would give you a clearer picture of where the compiler actually spends its time (self-profiling can also collect data on overhead introduced by incremental compilation, like time spent loading or persisting things from/to disk).

matklad · December 3, 2020, 2:54pm

Yeah, I think this is what I am observing here: monomorphization_collector_graph_walk and generate_crate_metadata are the biggest time-sinks according to -Ztime-passes.

https://gist.github.com/matklad/013d1fbb690e90727dc3f5c676b69610

passes.txt

  Compiling hir v0.0.0 (/home/matklad/projects/rust-analyzer/crates/hir)
time: 0.000; rss: 153MB	check_unused_macros
time: 0.000; rss: 153MB	maybe_building_test_harness
time: 0.000; rss: 153MB	maybe_create_a_macro_crate
time: 0.000; rss: 160MB	blocked_on_dep_graph_loading
time: 0.000; rss: 160MB	prepare_outputs
time: 0.000; rss: 189MB	dep_graph_tcx_init
time: 0.000; rss: 198MB	looking_for_derive_registrar
time: 0.000; rss: 198MB	looking_for_entry_point
time: 0.000; rss: 198MB	looking_for_plugin_registrar

(Output truncated here; the full file is in the gist linked above.)

michaelwoerister · December 3, 2020, 3:30pm

Yes, the monomorphization collector is a monolithic pass that causes a lot of MIR to be loaded (from upstream crates and the incr. cache). Somehow subdividing the work there, so that it can be made incremental might speed certain scenarios up quite a bit.
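As a rough illustration of what "subdividing the work" could buy, here is a toy sketch. It is not how the monomorphization collector works; the Collector type, its methods, and the use of an item's text as a cache key are all assumptions made up for the example. It only contrasts a pass that redoes everything on each run with one that caches one unit of work per item.

```rust
use std::collections::HashMap;

/// Hypothetical toy pass: "work" for an item is just measuring its length.
struct Collector {
    cache: HashMap<String, usize>, // per-item results kept between runs
    work_done: usize,              // how many items were actually processed
}

impl Collector {
    fn new() -> Self {
        Collector { cache: HashMap::new(), work_done: 0 }
    }

    /// Monolithic: every run processes every item again.
    fn walk_monolithic(&mut self, items: &[&str]) -> usize {
        let mut total = 0;
        for item in items {
            self.work_done += 1;
            total += item.len();
        }
        total
    }

    /// Subdivided: one cached unit of work per item, reused when the item is unchanged.
    fn walk_subdivided(&mut self, items: &[&str]) -> usize {
        let mut total = 0;
        for item in items {
            if let Some(&cached) = self.cache.get(*item) {
                total += cached;
                continue;
            }
            self.work_done += 1;
            let result = item.len();
            self.cache.insert(item.to_string(), result);
            total += result;
        }
        total
    }
}

fn main() {
    let mut c = Collector::new();
    c.walk_subdivided(&["fn a", "fn b", "fn c"]);
    c.walk_subdivided(&["fn a", "fn b2", "fn c"]); // only "fn b2" is new work
    println!("subdivided work units: {}", c.work_done);
    c.walk_monolithic(&["fn a", "fn b2", "fn c"]); // all three again
    println!("after one monolithic run: {}", c.work_done);
}
```

In this toy, the second subdivided run only pays for the one item that changed, while the monolithic run pays for all of them regardless.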

bjorn3 · December 3, 2020, 8:59pm

The spans stored in, for example, MIR are relative to the start of the source file, which means that any change to a file will invalidate all functions defined after the changed position.

In my experience most of the time attributed to the monomorphization collector is actually from the optimized_mir queries, which can easily be invalidated by a changed span.
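To make the span point concrete, here is a minimal toy sketch. It is not rustc's actual data structures or hashing scheme: it assumes an item's "fingerprint" is simply its text plus, optionally, its byte offset from the start of the file, and the names Fingerprint, fingerprints and changed are made up for illustration. Growing a comment by one character shifts every later offset, so every later item looks changed even though its text is identical.

```rust
/// Hypothetical fingerprint: the item's text and, optionally, its byte offset in the file.
type Fingerprint<'a> = (&'a str, Option<usize>);

/// Compute one fingerprint per chunk of the file, in order.
fn fingerprints<'a>(chunks: &[&'a str], with_offset: bool) -> Vec<Fingerprint<'a>> {
    let mut offset = 0usize;
    let mut out = Vec::new();
    for chunk in chunks {
        let position = if with_offset { Some(offset) } else { None };
        out.push((*chunk, position));
        offset += chunk.len();
    }
    out
}

/// Indices of the items whose fingerprint differs between two versions of the file.
fn changed(before: &[Fingerprint<'_>], after: &[Fingerprint<'_>]) -> Vec<usize> {
    before
        .iter()
        .zip(after)
        .enumerate()
        .filter(|(_, (a, b))| a != b)
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    // Same file twice; the only edit is one extra '.' in the comment.
    let before = ["fn a() {}\n", "// ..\n", "fn b() {}\n", "fn c() {}\n"];
    let after = ["fn a() {}\n", "// ...\n", "fn b() {}\n", "fn c() {}\n"];

    let with = changed(&fingerprints(&before, true), &fingerprints(&after, true));
    let without = changed(&fingerprints(&before, false), &fingerprints(&after, false));

    println!("offsets included: changed items {:?}", with);
    println!("offsets ignored:  changed items {:?}", without);
}
```

With offsets included, items 1, 2 and 3 are reported as changed; with offsets ignored, only item 1 (the comment itself) is.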

You should probably use -Zself-profile. This is a much more reliable way of attributing the time to specific actions than -Ztime-passes for queries.

michaelwoerister · December 4, 2020, 9:15am

Yes, the performance of incremental compilation depends a lot on the form of the dependency graph and unnecessary/accidental dependencies can have a big impact, with span information being a particularly bad case because it also changes quite frequently (see https://github.com/rust-lang/rust/issues/47389).

system · March 4, 2021, 9:15am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
