Idea: Improving parallelism with metadata generation improvements

crlf0710 · October 17, 2019, 3:14am

Recent timing visualization work has made the compilation process more visible, and it also reveals room for more parallelism across the whole build.

This is just me thinking aloud. Personally, I think that to maximize parallelism, the build process for a crate graph should be split into five stages:

Stage 1: Build compiler plugins such as proc macros, and run build scripts (build.rs).
Stage 2: Generate the absolute minimum metadata that downstream crates need, with as little validation as possible.
Stage 3: Generate full metadata, roughly equivalent to what cargo check does.
Stage 4: Generate codegen artifacts, roughly equivalent to cargo build minus cargo check and minus linking.
Stage 5: Link the final artifacts.

The idea is to run stages 3 and 4 fully in parallel, i.e. without waiting for dependency crates to finish them. Instead, it may actually be more feasible to start from the root crate, since that's the crate the developer is working on and the one most likely to contain an error.

This way, I think the processors would never sit idle during stages 3 and 4, which may improve overall compilation time.
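A toy model may help show why this split could pay off. The sketch below is a hypothetical Rust illustration, not anything in cargo or rustc: it assumes unlimited cores, a crate graph given as `deps[i]` (indices of crate i's direct dependencies, acyclic), made-up per-stage costs, and it ignores stages 1 and 5. It compares a baseline where each crate waits for its dependencies' full (stage 3) metadata against the proposal, where only the cheap stage 2 sits on the dependency chain.

```rust
/// Hypothetical per-crate costs (arbitrary time units) for stages 2 to 4.
#[derive(Clone, Copy, Debug)]
pub struct StageCosts {
    pub minimal_metadata: u32, // stage 2
    pub check: u32,            // stage 3
    pub codegen: u32,          // stage 4
}

/// Time at which crate `c` passes its "gate": the latest gate among its
/// dependencies plus `step`. Memoized so shared dependencies are visited once.
fn gate_time(c: usize, deps: &[Vec<usize>], step: u32, memo: &mut [Option<u32>]) -> u32 {
    if let Some(t) = memo[c] {
        return t;
    }
    let ready = deps[c]
        .iter()
        .map(|&d| gate_time(d, deps, step, memo))
        .max()
        .unwrap_or(0);
    let t = ready + step;
    memo[c] = Some(t);
    t
}

/// Baseline: a crate's stages 2+3 start only after every dependency has
/// finished stage 3; its codegen (stage 4) follows its own stage 3.
pub fn makespan_baseline(deps: &[Vec<usize>], cost: StageCosts) -> u32 {
    let mut memo = vec![None; deps.len()];
    let step = cost.minimal_metadata + cost.check;
    (0..deps.len())
        .map(|c| gate_time(c, deps, step, &mut memo) + cost.codegen)
        .max()
        .unwrap_or(0)
}

/// Proposal: only stage 2 waits on dependencies; stages 3 and 4 of a crate
/// start right after its own stage 2, in parallel with other crates.
pub fn makespan_proposed(deps: &[Vec<usize>], cost: StageCosts) -> u32 {
    let mut memo = vec![None; deps.len()];
    (0..deps.len())
        .map(|c| gate_time(c, deps, cost.minimal_metadata, &mut memo) + cost.check + cost.codegen)
        .max()
        .unwrap_or(0)
}
```

For a three-crate chain (`vec![vec![], vec![0], vec![1]]`) with costs 1/4/10, the proposed schedule finishes at 17 time units versus 25 for the baseline, because only stage 2 is serialized along the chain. The numbers are invented; the point is only where the serialization sits.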

petrochenkov · October 17, 2019, 8:32am

Instead, it may be actually more feasible if we can start from the root crate.

This may be reasonable even in isolation: we can get at least as far as name resolution without needing crate dependencies, and even perform resolution partially. See https://rust-lang.zulipchat.com/#narrow/stream/195180-t-compiler.2Fwg-pipelining/topic/Starting.20crate.20build.20before.20the.20dependencies.20are.20ready

In this case, the crate loader is the single interception point in the compiler where we need to block and wait for crates' metadata files (or full library files) to be ready.
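A minimal sketch of such a blocking point, assuming a hypothetical `MetadataRegistry` shared between threads (rustc's actual crate loader is not structured like this): the build side calls `publish` when a crate's metadata file has been written, and the loader calls `wait_for` the first time it needs that crate, sleeping on a `Condvar` instead of polling.

```rust
use std::collections::HashSet;
use std::sync::{Condvar, Mutex};

/// Hypothetical registry of crates whose metadata is ready to be loaded.
#[derive(Default)]
pub struct MetadataRegistry {
    ready: Mutex<HashSet<String>>,
    changed: Condvar,
}

impl MetadataRegistry {
    pub fn new() -> Self {
        Self::default()
    }

    /// Called by the producer once `krate`'s metadata file exists.
    pub fn publish(&self, krate: &str) {
        let mut ready = self.ready.lock().unwrap();
        ready.insert(krate.to_string());
        // Wake every loader thread; each re-checks the crate it waits for.
        self.changed.notify_all();
    }

    /// The single blocking point: returns once `krate`'s metadata is ready.
    pub fn wait_for(&self, krate: &str) {
        let mut ready = self.ready.lock().unwrap();
        // Loop guards against spurious wakeups and unrelated publishes.
        while !ready.contains(krate) {
            ready = self.changed.wait(ready).unwrap();
        }
    }

    /// Non-blocking query, e.g. to decide whether a resolution is indeterminate.
    pub fn is_ready(&self, krate: &str) -> bool {
        self.ready.lock().unwrap().contains(krate)
    }
}
```

Everything before the first `wait_for` call (parsing, expansion of local macros, partial resolution) could in principle proceed while dependencies are still building.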

The primary question is how much time this would save, since the initial compilation stages are usually relatively fast.

petrochenkov · October 17, 2019, 8:39am

Moreover, we already have all the infrastructure for indeterminate import/macro resolution in place.
I suspect that extern crate items and paths mostly just need to return indeterminate resolutions until the respective crate files are ready.
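To make the idea concrete, here is a toy sketch, assuming hypothetical `Resolution`, `resolve`, and `resolve_pending` names and paths of the simple form `crate::item` (rustc's resolver works on real module trees, not strings). A path into a crate that is not loaded yet resolves as `Indeterminate` and stays pending; each time more crates arrive, pending paths are retried until they settle.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Resolution {
    Determined,    // the crate is loaded and exports the item
    Indeterminate, // the crate's metadata is not ready yet; retry later
    Failed,        // the crate is loaded but has no such item
}

/// Hypothetical view of loaded crates: crate name -> exported item names.
pub type LoadedCrates = HashMap<String, Vec<String>>;

/// Resolve a toy path like "serde::Serialize" against the crates loaded so far.
pub fn resolve(path: &str, loaded: &LoadedCrates) -> Resolution {
    let (krate, item) = match path.split_once("::") {
        Some(parts) => parts,
        None => return Resolution::Failed, // toy model only handles crate::item
    };
    match loaded.get(krate) {
        None => Resolution::Indeterminate,
        Some(items) if items.iter().any(|i| i == item) => Resolution::Determined,
        Some(_) => Resolution::Failed,
    }
}

/// Retry every pending path; settled ones are returned and removed,
/// indeterminate ones stay in `pending` for the next round.
pub fn resolve_pending(pending: &mut Vec<String>, loaded: &LoadedCrates) -> Vec<(String, Resolution)> {
    let mut settled = Vec::new();
    pending.retain(|path| match resolve(path, loaded) {
        Resolution::Indeterminate => true,
        res => {
            settled.push((path.clone(), res));
            false
        }
    });
    settled
}
```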

kornel · October 17, 2019, 2:22pm

I wonder if dependencies even have to be compiled with LLVM. If I have a dependency with functions a(), b(), and c(), but I only call dep::a(), then the .dylib for the dependency wastefully compiles b and c only to dead-code-eliminate them later.

So maybe the procedure should be to build MIR-only crates for everything up to the final product, then perform dead-code elimination on MIR, then call LLVM on what's left (with granular caching, of course)?
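The dead-code elimination step described above is essentially a reachability pass over the call graph. A minimal sketch, assuming a hypothetical call graph keyed by function path: starting from the final product's roots (e.g. `main`), only functions reachable from them would need to be handed to LLVM. It ignores generics, trait objects, and function pointers, which a real MIR-level pass would have to account for.

```rust
use std::collections::{BTreeSet, HashMap};

/// Return every function reachable from `roots` in `call_graph`
/// (caller -> callees). Anything not returned is dead and need not be codegened.
pub fn reachable_functions<'a>(
    call_graph: &HashMap<&'a str, Vec<&'a str>>,
    roots: &[&'a str],
) -> BTreeSet<&'a str> {
    let mut reachable = BTreeSet::new();
    let mut worklist: Vec<&'a str> = roots.to_vec();
    while let Some(f) = worklist.pop() {
        // Skip functions we have already visited.
        if !reachable.insert(f) {
            continue;
        }
        if let Some(callees) = call_graph.get(f) {
            worklist.extend(callees.iter().copied());
        }
    }
    reachable
}
```

With `main -> dep::a -> dep::helper` and an unrelated `dep::b -> dep::c`, only `main`, `dep::a`, and `dep::helper` survive; `dep::b` and `dep::c` never reach LLVM.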

wesleywiser · October 17, 2019, 2:35pm

There's been some exploration of this idea in the tracking issue for MIR-only rlibs (rust-lang/rust issue #38913 on GitHub).

system · January 15, 2020, 2:35pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
[pre-RFC] Generate "headers" for greater parallelism	8	2278	March 25, 2019
Evaluating pipelined rustc compilation compiler	76	21407	October 16, 2019
Exploring Crate Graph Build Times with `cargo build -Ztimings` cargo	37	15976	December 22, 2024
Help test out ThinLTO! compiler	52	16474	March 25, 2019
Let's talk about parallel codegen compiler	49	9850	March 25, 2019

Idea: Improving parallelism with metadata generation improvements

Related topics