Slow incremental compilation when changing small things or comments

Get a big crate. Change let x = "foo"; to let x = "bar";, or add a comment. Watch as your life slips between your fingers after you type cargo check, or even worse with cargo build.

One would think that this should be instant, especially with incremental compilation, but it's not. It is even worse when changing a crate that other crates in a workspace depend on.

Is it possible that this kind of change could be special-cased in incremental compilation?

Changing a constant and adding a comment differ a lot in their potential consequences.

In many cases, most of the time in long incremental builds is spent linking. The easiest way to improve on that is to switch to dynamic linking during development, which is much faster: https://robert.kra.hn/posts/2022-09-09-speeding-up-incremental-rust-compilation-with-dylibs/


> In many cases, most of the time in long incremental builds is spent linking

Not in check builds, where the problem also occurs.

One of the tricky bits of incremental compilation is that some things depend on spans (line and column) of source text: panic sites and debugger information. For example,

fn foo() {
    // delete this comment line?
}

fn bar() {
    panic!("hello world");
}

If you change the height of foo’s text, then the panic message produced by bar() changes its line number. Because that implicit string literal has changed, other things that depend on it have to be recomputed too. So, if you avoid span changes (e.g. by using more, smaller files, so that fewer things come after the code you edit), you can reduce the rebuild time. In my previous testing (unfortunately, I can’t recall where I might have written down the results) this was, I think, something like a 10% speedup in the scenario I was testing.

It might also help to disable debug info to further reduce dependence on line numbers (and reduce the work of compilation in general), but I haven’t tried this.
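In case anyone wants to try that experiment, a minimal sketch as a Cargo profile setting (debug is a standard Cargo profile key; whether it actually helps here is untested, as said above):

[profile.dev]
# Skip emitting debug info in dev builds. This removes one consumer of
# line/column spans and reduces codegen work, at the cost of backtrace
# detail and debugger support.
debug = false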

There is probably also potential to make the compiler better at partitioning this kind of thing from other queries that don't need to be recomputed, but I don't know the details of the compiler's subsystems, so I can't comment on how feasible that is. I know that it wouldn't be a matter of "well, don't recompute that when you don't need to", because the query system already handles that automatically for everything; it would involve changing what queries exist and how they depend on each other (probably breaking queries up into smaller ones).


This is being tracked in "Downstream dependencies of a crate are rebuilt despite the changes not being public-facing" (rust-lang/cargo issue #14604 on GitHub).


IIRC there was discussion of changing spans for proc-macro expansion caching. I'm unsure if it's limited to that case or will help in more of these cases.

@davidlattimore has done some investigation into incremental compilation performance. IIRC one of the problems is with the query system.

The span issue, mentioned by @kpreid, is something I've thought a bit about before. I'd really like it if spans could be made relative to the named item that contains them. So for example in the code given above, the panic message would contain a span relative to the start of the function bar and a DefId of bar, or some stable ID derived from the item path. At runtime, or when a panic occurs, the actual line number could be looked up by calling some function, passing the relative span, the DefId and a reference to a table that maps DefIds to their spans. Structured like this, when you make an edit, all that would need to change would be the function you edited and the table mapping DefIds to spans.
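A minimal sketch of that idea (every name here is hypothetical; none of this is an actual rustc API):

use std::collections::HashMap;

// Hypothetical stable identifier for a named item, e.g. derived from its path.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct DefId(u64);

// A span stored relative to the start of the item that contains it.
#[derive(Clone, Copy)]
struct RelativeSpan {
    lines_from_item_start: u32,
    column: u32,
}

// Table mapping each item to the absolute line where it starts. Editing one
// function changes only that function and this table; the relative spans
// baked into every other item stay identical, so nothing else needs
// recomputing.
struct SpanTable {
    item_start_lines: HashMap<DefId, u32>,
}

impl SpanTable {
    // Resolve a relative span to an absolute line number, e.g. at the
    // moment a panic actually occurs.
    fn absolute_line(&self, item: DefId, span: RelativeSpan) -> Option<u32> {
        self.item_start_lines
            .get(&item)
            .map(|start| start + span.lines_from_item_start)
    }
}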

A related issue is the size of the codegen unit. At the moment, even in debug builds, a lot of functions are packed together into a codegen unit. At least for a non-optimised (dev) build, it'd be ideal if each function were a separate codegen unit. That way, when a single function gets changed, only that one function needs to be recompiled. If that changed function was inlined into another function, then it too would need to be recompiled, but other functions that weren't changed shouldn't need to be. I think I recall @bjorn3 mentioning that the cranelift backend compiles each function separately.

Another thing related to incremental compilation that I've thought a bit about is whether it's primarily pull-based or push-based. In a pull-based model (which is what exists now), the compiler starts by effectively asking what it needs in order to build the binary. It parses everything, then runs queries, reusing cache hits from previous runs. One issue with this is that if you have a very large tree of queries, you need to traverse the tree right down to the leaves before you can determine that those leaves, and thus their parents in the tree, haven't actually changed. Another problem is that some things don't lend themselves to queries like this at all because they always change. An example is the list of monomorphised items, i.e. the list of all functions that need to be passed to codegen. Any code edit might have changed this list, so it doesn't make sense to cache it. Recomputing it from scratch on every compile takes time and is something that often shows up in -Ztime-passes.
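To caricature the pull-based shape (hypothetical names, nothing rustc-specific): even on a full cache hit, the "is this up to date?" question has to recurse all the way down to the leaves.

use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct QueryKey(u64);

struct PullCache {
    // For each query: its cached result and the queries it depended on.
    cached: HashMap<QueryKey, (String, Vec<QueryKey>)>,
}

impl PullCache {
    // Pull model: a cached result can only be trusted after validating
    // every dependency, transitively, down to the leaf inputs.
    fn is_up_to_date(&self, key: QueryKey, changed_inputs: &[QueryKey]) -> bool {
        if changed_inputs.contains(&key) {
            return false;
        }
        match self.cached.get(&key) {
            Some((_result, deps)) => deps
                .iter()
                .all(|dep| self.is_up_to_date(*dep, changed_inputs)),
            None => false,
        }
    }
}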

The alternative model (although potentially both models can be used together) is push-based. In a push-based model, the compiler starts by determining what inputs have changed, e.g. it looks at all its input files and finds that just one file has changed. It then reparses just that one file. Taking the parsed items, it pushes these changes through the stages of the compiler. Only the items that have actually changed need to be pushed. So once you get to, say, the list of monomorphised items, rather than computing it from scratch, you've got some deltas adding, removing or redefining some functions.
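The push-based counterpart, in the same hypothetical style: each stage consumes deltas and emits deltas, so items that didn't change are never even visited.

use std::collections::HashSet;

// Deltas describing how one stage's output changed, e.g. the list of
// monomorphised functions that gets handed to codegen.
enum Delta {
    Added(String),
    Removed(String),
    Redefined(String),
}

// Push model: keep the previous list of mono items and apply only the
// deltas pushed down from earlier stages, instead of recomputing the
// whole list from scratch on every build.
fn apply_deltas(mono_items: &mut HashSet<String>, deltas: Vec<Delta>) -> Vec<String> {
    let mut needs_codegen = Vec::new();
    for delta in deltas {
        match delta {
            Delta::Added(item) => {
                mono_items.insert(item.clone());
                needs_codegen.push(item);
            }
            Delta::Removed(item) => {
                mono_items.remove(&item);
            }
            Delta::Redefined(item) => {
                needs_codegen.push(item);
            }
        }
    }
    // Only these items need recompiling; an incremental linker could be
    // handed exactly this set rather than every object in the crate.
    needs_codegen
}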

A push-based model is also ideal for integrating with an incremental linker, since it can pass just the bits that have changed to the linker rather than passing everything and making the linker figure out what has changed. I've been writing a linker called Wild with the plan to make it incremental, so this has been on my mind.


Cranelift compiles one function at a time; however, cg_clif currently still compiles and caches a single object file for each codegen unit, the same way as cg_llvm. A single object file for each individual mono item would likely have too much overhead. In the future I may add caching for individual functions, however.


That's a really interesting point! For a long time I've assumed that pull-based would be ideal, but you're right that given clear dependency information, push-based has the potential to eliminate a lot of "is this up to date" checks.


Note that there's no need to re-architect the compiler to accomplish that output for the benefit of an incremental linker. We can instead keep everything as is except the monomorphization/codegen steps, where we can check whether the DefId of the item being generated has (transitively) changed and, if not, not evaluate it further. This doesn't bring all of the theoretical performance benefit, as some redundant work still happens before those stages, but it could significantly cut the work the linker has to do, without as big an engineering lift to get it working in the first place. I think it makes sense to attempt that first; once codegen and Wild are working correctly with each other, we can go back and reduce redundant work one stage at a time, earlier and earlier in the process. For the record, I don't think that parsing the full crate is ever a significant drain, but I would love it if we ever got to the point of "we only reparse files that got edited and take it from there".
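A rough sketch of that gate, with hypothetical names (the real compiler would track this through its query fingerprints rather than an explicit call-graph walk; the sketch also assumes an acyclic call graph for brevity):

use std::collections::{HashMap, HashSet};

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct DefId(u64);

// Has this item, or anything it (transitively) calls, changed since the
// last build? No memoisation and no cycle handling, to keep it short.
fn transitively_changed(
    item: DefId,
    changed: &HashSet<DefId>,
    callees: &HashMap<DefId, Vec<DefId>>,
) -> bool {
    changed.contains(&item)
        || callees.get(&item).map_or(false, |deps| {
            deps.iter()
                .any(|dep| transitively_changed(*dep, changed, callees))
        })
}

// The gate before codegen: only items whose inputs changed are evaluated
// further and handed to the backend (and, later, an incremental linker).
fn items_to_codegen(
    all_items: &[DefId],
    changed: &HashSet<DefId>,
    callees: &HashMap<DefId, Vec<DefId>>,
) -> Vec<DefId> {
    all_items
        .iter()
        .copied()
        .filter(|item| transitively_changed(*item, changed, callees))
        .collect()
}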


Can we upstream this, or some version of this, into a cargo subcommand?

The unfortunate thing is that in this version you have to add your dependencies in a special way (it basically just creates a wrapper crate)... but surely cargo could do this transparently with your normally added dependencies.
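For anyone who hasn't read the linked post, the wrapper-crate trick looks roughly like this (the crate name and dependency are placeholders; crate-type = ["dylib"] is standard Cargo):

# Cargo.toml of a hypothetical wrapper crate. The heavy dependencies move
# here and get compiled once into a dynamic library, so the final link of
# the main binary has far less to do on each incremental build.
[package]
name = "deps-wrapper"   # placeholder name
version = "0.1.0"
edition = "2021"

[lib]
crate-type = ["dylib"]

[dependencies]
# The main crate then depends on deps-wrapper, whose lib.rs re-exports
# these (e.g. pub use serde;), instead of depending on them directly.
serde = "1"

This is exactly the manual step that cargo could in principle perform transparently.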