Idea: Improving parallelism with metadata generation improvements

Recent timing visualization work has made the compilation process more visible, and it also reveals the potential for more parallelism across the whole build process.

This is just me thinking aloud. Personally, I think that to maximize parallelism, the whole build process for a crate graph should be split into five stages:

  • Stage 1: Generate compiler plugins (proc-macros etc.) and process build.rs scripts.
  • Stage 2: Generate the absolute minimum metadata, just enough for downstream crates to use, with as little validation as possible.
  • Stage 3: Generate full metadata, roughly equivalent to the cargo check process.
  • Stage 4: Generate codegen artifacts, roughly equivalent to cargo build minus cargo check minus linking.
  • Stage 5: Generate linking artifacts.

The idea is to run stages 3 and 4 fully in parallel, i.e. without waiting for dependency crates to complete. Instead, it may be actually more feasible if we can start from the root crate, since that's the crate the developer is working on, so it is the one most likely to contain an error.

This way, I think the processors won't sit idle at all during stages 3 and 4, which may improve the overall compilation time.
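
To make the ordering concrete, here is a rough sketch of the five stages as a task graph. Everything in it is an assumption for illustration: the `Stage` names, the `prerequisites` rules (in particular, that stage 2 of a crate needs stage 2 of its direct dependencies, and that linking needs the dependencies' own final artifacts), and the "wave" model with unlimited processors. It is not how cargo or rustc schedule work today.

```rust
use std::collections::HashMap;

/// The five stages from the list above (names are hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Stage {
    Plugins = 1,
    MinimalMetadata,
    FullMetadata,
    Codegen,
    Link,
}

type Task = (&'static str, Stage);
type DepGraph = HashMap<&'static str, Vec<&'static str>>;

/// Assumed prerequisite rules: stages 3 and 4 only need this crate's own
/// earlier stage, never a dependency's stage 3 or 4.
fn prerequisites(krate: &'static str, stage: Stage, deps: &DepGraph) -> Vec<Task> {
    let direct = deps.get(krate).cloned().unwrap_or_default();
    match stage {
        Stage::Plugins => vec![],
        Stage::MinimalMetadata => {
            let mut p = vec![(krate, Stage::Plugins)];
            p.extend(direct.iter().map(|d| (*d, Stage::MinimalMetadata)));
            p
        }
        Stage::FullMetadata => vec![(krate, Stage::MinimalMetadata)],
        Stage::Codegen => vec![(krate, Stage::FullMetadata)],
        Stage::Link => {
            let mut p = vec![(krate, Stage::Codegen)];
            p.extend(direct.iter().map(|d| (*d, Stage::Link)));
            p
        }
    }
}

/// Assigns each task the earliest "wave" in which it could run, assuming
/// unlimited processors. Panics on a cycle or on a dependency that is not
/// itself a key of `deps`.
fn schedule(deps: &DepGraph) -> HashMap<Task, usize> {
    const STAGES: [Stage; 5] = [
        Stage::Plugins,
        Stage::MinimalMetadata,
        Stage::FullMetadata,
        Stage::Codegen,
        Stage::Link,
    ];
    let mut pending: Vec<Task> = Vec::new();
    for &krate in deps.keys() {
        for stage in STAGES {
            pending.push((krate, stage));
        }
    }
    let mut done: HashMap<Task, usize> = HashMap::new();
    let mut wave = 0;
    while !pending.is_empty() {
        // A task is ready once all of its prerequisites finished in an earlier wave.
        let (ready, blocked): (Vec<Task>, Vec<Task>) = pending
            .into_iter()
            .partition(|(k, s)| prerequisites(*k, *s, deps).iter().all(|p| done.contains_key(p)));
        assert!(!ready.is_empty(), "dependency cycle or unknown crate");
        for task in ready {
            done.insert(task, wave);
        }
        pending = blocked;
        wave += 1;
    }
    done
}
```

With a two-crate graph where `app` depends on `lib`, this model puts `app`'s full metadata in the same wave as `lib`'s codegen, which is the overlap the proposal is after.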

Instead, it may be actually more feasible if we can start from the root crate.

This may be reasonable even in isolation: we can go at least until name resolution without needing crate dependencies, and even perform resolution partially.

The crate loader is in this case the single interception point in the compiler at which we need to block and wait for crates' metadata files (or full library files) to be ready.

The primary question is how much time it will save, since the initial compilation stages are usually relatively fast.

More than that, we have all the import/macro resolution indeterminacy infra ready.
I suspect that extern crate items and paths mostly just need to return indeterminate resolutions until the respective crate files are ready.
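
A toy model of what that could look like, purely as a sketch: the `Resolution` enum, `Resolver` struct, and `crate_ready` method are hypothetical names made up for this post and are not rustc's actual resolver types. The point is only that a lookup into a crate whose metadata has not arrived yet reports "indeterminate" instead of blocking or failing, and is retried later.

```rust
use std::collections::{HashMap, HashSet};

/// Outcome of resolving a path into an extern crate (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum Resolution {
    /// The crate's metadata is loaded and the item exists.
    Determined(String),
    /// The crate's metadata is not ready yet; retry in a later round.
    Indeterminate,
    /// The crate's metadata is loaded and has no such item.
    Failed,
}

/// Hypothetical resolver state: crate name -> exported item names.
struct Resolver {
    loaded: HashMap<String, HashSet<String>>,
}

impl Resolver {
    fn new() -> Self {
        Resolver { loaded: HashMap::new() }
    }

    /// Called when a dependency's metadata file becomes available.
    fn crate_ready(&mut self, krate: &str, items: &[&str]) {
        let set = items.iter().map(|s| s.to_string()).collect();
        self.loaded.insert(krate.to_string(), set);
    }

    /// Never blocks: an unloaded crate yields `Indeterminate`.
    fn resolve(&self, krate: &str, item: &str) -> Resolution {
        match self.loaded.get(krate) {
            None => Resolution::Indeterminate,
            Some(items) if items.contains(item) => {
                Resolution::Determined(format!("{}::{}", krate, item))
            }
            Some(_) => Resolution::Failed,
        }
    }
}
```

In this model the only real error is `Failed`, which can only be reported once the crate's metadata has actually been loaded.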

I wonder if dependencies even have to be compiled with LLVM. If I have a dependency that has functions a(), b(), c(), but I only call dep::a(), then the .dylib for the dep wastefully compiles b and c only to dead-code-eliminate them later.

So maybe the procedure should be to build MIR-only crates for everything up to the final product, then dead code eliminate on MIR, then call LLVM on what's left? (with granular caching of course).
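
As a rough illustration of the "dead code eliminate first" step, here is a plain reachability pass over a call graph. The string-keyed graph and the `reachable` function are stand-ins invented for this example; real MIR has no such interface, and things like trait objects, function pointers, and generics would make the actual analysis considerably harder.

```rust
use std::collections::{BTreeSet, HashMap};

/// Returns every function reachable from `roots` through `calls`
/// (caller -> callees). Only these would need to be handed to LLVM.
fn reachable<'a>(
    calls: &HashMap<&'a str, Vec<&'a str>>,
    roots: &[&'a str],
) -> BTreeSet<&'a str> {
    let mut live = BTreeSet::new();
    let mut stack: Vec<&'a str> = roots.to_vec();
    while let Some(func) = stack.pop() {
        // `insert` returns false if we have already visited this function.
        if live.insert(func) {
            if let Some(callees) = calls.get(func) {
                stack.extend(callees.iter().copied());
            }
        }
    }
    live
}
```

For the a(), b(), c() example above, starting from a root that only calls dep::a(), the live set contains dep::a and whatever it calls, while dep::b and dep::c never reach codegen.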


There's been some exploration of this idea in https://github.com/rust-lang/rust/issues/38913
