Let’s take an example of three compilation units, A, B, and C, and look at how the C/C++ and Rust build systems manage them.
Suppose both B and C depend on A: in C/C++, they are files `a.c`, `b.c`, and `c.c`, with `b.c` and `c.c` including A’s header; in Rust, they are crates `a`, `b`, and `c`, where `b` and `c` do `extern crate a;` and also specify `a` as a dependency in their `Cargo.toml`.
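As a concrete sketch of the Rust side (the crate contents here are invented purely for illustration), crate `b` would look something like this, and crate `c` would be analogous:

```rust
// a/src/lib.rs -- crate `a` (hypothetical contents)
pub fn answer() -> u32 {
    42
}
```

```rust
// b/src/lib.rs -- crate `b`, which depends on `a`
// (b/Cargo.toml would contain:  [dependencies]  a = { path = "../a" })
extern crate a;

pub fn double_answer() -> u32 {
    a::answer() * 2
}
```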
Now, what happens when you compile the three compilation units on a 4-core machine?
In the C/C++ world, `make` launches three compiler subprocesses simultaneously, and the build takes only as long as the slowest compilation unit.
In the Rust world, `cargo` launches one `rustc` instance to compile `a` and waits for it to finish before it starts compiling `b` and `c` simultaneously.
This means that Rust achieves worse parallelism here.
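As a toy illustration of the difference (the compile times are made up, and each unit is assumed to take the same time), the wall-clock time under the two schedules can be modeled like this:

```rust
fn main() {
    // Hypothetical compile times in seconds, one per compilation unit.
    let (a, b, c) = (1.0_f64, 1.0_f64, 1.0_f64);

    // C/C++ with `make -j`: a.c, b.c and c.c all compile in parallel,
    // so the wall-clock time is that of the slowest unit.
    let make_wall_time = a.max(b).max(c);

    // Cargo today: `a` must finish before `b` and `c` can even start.
    let cargo_wall_time = a + b.max(c);

    println!("make  : {make_wall_time} s"); // prints 1 s
    println!("cargo : {cargo_wall_time} s"); // prints 2 s
}
```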
In the C/C++ world, you don’t need anything other than the headers to depend on some compilation unit. Isn’t something similar true for Rust as well? To compile a crate, you should only need its dependencies’ type/struct/enum/trait definitions, their (pub) function declarations, their reexports, their macro source code, and their constant definitions, no? All of this should already be available before LLVM codegen, maybe even before MIR is generated, no?
For the purposes of this thread, I’d like to call this info “header info”, just like the headers in C/C++.
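To make this concrete, here is a sketch (all items invented for illustration) of a crate whose “header info”, in the sense above, is everything except the function body at the end:

```rust
// Hypothetical crate `a`: everything a *dependent* crate needs in order to
// type-check against it is in the signatures, not in the bodies.

/// Part of the "header info": the struct definition.
pub struct Point {
    pub x: f64,
    pub y: f64,
}

/// Part of the "header info": the trait definition.
pub trait Norm {
    fn norm(&self) -> f64;
}

/// Part of the "header info": a constant definition.
pub const ORIGIN: Point = Point { x: 0.0, y: 0.0 };

/// Part of the "header info": macro source code is needed verbatim.
#[macro_export]
macro_rules! point {
    ($x:expr, $y:expr) => {
        $crate::Point { x: $x, y: $y }
    };
}

impl Norm for Point {
    /// Only the signature above is "header info"; this body is only
    /// needed when crate `a` itself is compiled down to machine code.
    fn norm(&self) -> f64 {
        (self.x * self.x + self.y * self.y).sqrt()
    }
}
```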
If this is the case, I’d like to propose:
- A mode for rustc where it dumps the “header info” for a crate in a binary format
- Augmenting rustc to read that “header info” of its dependencies and use it for compilation
- Changing cargo to split up the compilation of each crate into those two steps. This will lead to much greater parallelism in the dependency graph, as only the (comparatively lightweight) “header info” steps depend on each other; the later, heavier steps only need the relevant “header info” steps to be done (see the sketch after this list).
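Purely as an illustration of the resulting job graph (nothing here is an existing cargo or rustc interface; all names are hypothetical), the dependency edges for the `a`/`b`/`c` example above would roughly look like this:

```rust
use std::collections::HashMap;

/// The two per-crate build steps under this proposal (hypothetical).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Step {
    /// Step 1: dump only the "header info" of a crate.
    HeaderInfo(&'static str),
    /// Step 2: the full compilation down to object code.
    Codegen(&'static str),
}

fn main() {
    use Step::{Codegen, HeaderInfo};

    // Map from each build step to the steps it has to wait for.
    let mut waits_on: HashMap<Step, Vec<Step>> = HashMap::new();

    // The lightweight "header info" steps still form the original chain...
    waits_on.insert(HeaderInfo("a"), vec![]);
    waits_on.insert(HeaderInfo("b"), vec![HeaderInfo("a")]);
    waits_on.insert(HeaderInfo("c"), vec![HeaderInfo("a")]);

    // ...but the heavy second steps only need "header info" to be done:
    // that of their dependencies and (assuming the intermediate results
    // mentioned in the drawbacks below are reused) that of the crate
    // itself. They no longer wait on each other, so all three can run
    // in parallel.
    waits_on.insert(Codegen("a"), vec![HeaderInfo("a")]);
    waits_on.insert(Codegen("b"), vec![HeaderInfo("b"), HeaderInfo("a")]);
    waits_on.insert(Codegen("c"), vec![HeaderInfo("c"), HeaderInfo("a")]);

    for (step, deps) in &waits_on {
        println!("{step:?} waits on {deps:?}");
    }
}
```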
Advantages
- The builds now execute in parallel to a higher degree. This should lead to faster from-scratch builds of multi-crate systems.
Drawbacks
- For smaller crates, this might actually be harmful, as there is some overhead in starting new processes, generating the “header info”, and parsing the source code multiple times. Of course, this can be somewhat alleviated by storing some intermediate results on disk that can be reused in the second step…
- New feature = more complexity, etc.
- This is somewhat less useful once cargo build caches are available, but the feature is still useful, e.g. when you have no internet, lots of private code, or a non-mainstream platform, or when you are working on the compiler itself (note how everything waits for the `rustc` crate to be compiled).
Unresolved questions
- Will this approach actually work? Maybe some info about dependencies is required that is only available after LLVM codegen has happened?
- Maybe some earlier representations can be used instead? The more of the compilation that can be put into the second stage, the more of it will be parallelizable.