The recent timing visualization work has made the compilation process more visible. It also reveals the potential for more parallelism across the whole build process.
This is just me thinking aloud. Personally, I think that to maximize parallelism, the whole build process for a crate graph should be split into five stages:
- Stage 1: Generate compiler plugins (proc-macros etc.) and process build.rs scripts etc.
- Stage 2: Generate the absolute minimum metadata, just enough for downstream crates to use, with validation reduced as much as possible.
- Stage 3: Generate metadata, roughly equivalent to cargo check.
- Stage 4: Generate codegen artifacts, roughly equivalent to the part of cargo build minus cargo check minus linking.
- Stage 5: Generate linking artifacts.
The idea is to have stages 3 and 4 run fully in parallel, i.e. with no need to wait for dependency crates to complete. It may actually work better to start from the root crate, since that's the crate the developer is working on and it's the most likely to contain an error.
This way, I think the processors won't sit idle at all during stages 3 and 4, which may improve the overall compilation time.
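To make the idea a bit more concrete, here is a toy model in Rust. It has nothing to do with how cargo or rustc are actually implemented: the `Task` struct, the `chain` helper, and all the costs are made up for illustration. It assumes a simple chain of crates, unlimited workers, that stage 4 of a crate only waits for that crate's own stage 3, and it ignores stages 1 and 5 entirely. It compares the critical path when a crate has to wait for its dependency's full metadata against the case where it only waits for the minimal stage 2 metadata.

```rust
/// One unit of work: a single stage of a single crate (hypothetical model).
struct Task {
    cost: u32,        // made-up time units
    deps: Vec<usize>, // indices of tasks that must finish first
}

/// Length of the critical path, i.e. the earliest finish time of the whole
/// graph with unlimited workers. Tasks must be listed in topological order.
fn critical_path(tasks: &[Task]) -> u32 {
    let mut finish = vec![0u32; tasks.len()];
    for (i, t) in tasks.iter().enumerate() {
        let start = t.deps.iter().map(|&d| finish[d]).max().unwrap_or(0);
        finish[i] = start + t.cost;
    }
    finish.into_iter().max().unwrap_or(0)
}

/// Builds a chain of `n` crates where crate k depends on crate k - 1.
/// Each crate gets three tasks: stage 2 (minimal metadata), stage 3 (full
/// metadata) and stage 4 (codegen), stored at indices 3k, 3k + 1, 3k + 2.
///
/// staged == true:  a crate's stage 2 waits only for its dependency's stage 2.
/// staged == false: a crate's stage 2 waits for its dependency's stage 3.
fn chain(n: usize, min_meta: u32, full_meta: u32, codegen: u32, staged: bool) -> Vec<Task> {
    let mut tasks = Vec::with_capacity(3 * n);
    for k in 0..n {
        let base = 3 * k;
        let stage2_deps = if k == 0 {
            vec![]
        } else if staged {
            vec![base - 3] // dependency's stage 2
        } else {
            vec![base - 2] // dependency's stage 3
        };
        tasks.push(Task { cost: min_meta, deps: stage2_deps });
        // Assumption: stage 3 needs this crate's own stage 2.
        tasks.push(Task { cost: full_meta, deps: vec![base] });
        // Assumption: stage 4 needs this crate's own stage 3.
        tasks.push(Task { cost: codegen, deps: vec![base + 1] });
    }
    tasks
}

fn main() {
    // Three crates; stage costs of 1, 4 and 5 units are arbitrary.
    let waiting = critical_path(&chain(3, 1, 4, 5, false));
    let staged = critical_path(&chain(3, 1, 4, 5, true));
    println!("wait for full metadata: {waiting}"); // prints 20
    println!("wait for stage 2 only:  {staged}"); // prints 12
}
```

With these made-up numbers the chain goes from 20 units down to 12, because only the cheap stage 2 work is serialized along the dependency chain. Real gains would depend on how cheap stage 2 can actually be made and on how many cores are available.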