[pre-RFC] Generate "headers" for greater parallelism


#1

Let's take an example of three compilation units, A, B, and C, and look at how C/C++ and Rust build systems manage them.

Suppose both B and C depend on A such that in C/C++ they are files a.c, b.c and c.c, including A’s header, and in Rust, they are crates a, b and c, where b and c do extern crate a;, also specifying a in their Cargo.toml.

Now what happens when you compile the three compilation units on a 4 core machine?

In the C/C++ world, make launches 3 subprocesses simultaneously, and the build takes only as long as the slowest compilation unit needs to compile.

In the Rust world, cargo launches one rustc instance to compile a and waits for it to finish before it starts compiling b and c simultaneously.

This means that Rust has worse parallelism here.

In the C/C++ world, you don’t need anything other than the headers to depend on some compilation unit. Isn’t something similar true for Rust as well? In Rust too, to compile a crate you should only need the following from its dependencies: the type/struct/enum/trait definitions, the (pub) function declarations, things like reexports, the macro source code, and the constant definitions, no? All of this should already be available before LLVM codegen, maybe even before MIR is available, no?

For the scope of this thread, I’d like to call this info “header info”, just like the headers in C/C++.
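To make the idea concrete, here is a minimal sketch of which parts of a small crate would count as “header info” under the assumption made above. The module `a` stands in for a crate so that everything fits in one file, and all names (`Point`, `ORIGIN`, `Norm`, `shift`) are made up for illustration. Whether the function bodies can really be left out is exactly the open question of this thread.

```rust
// Stand-in for crate `a`; a module is used so the sketch fits in one file.
mod a {
    // "Header info": the type definition.
    pub struct Point {
        pub x: i32,
        pub y: i32,
    }

    // "Header info": the constant definition.
    pub const ORIGIN: Point = Point { x: 0, y: 0 };

    // "Header info": the trait definition.
    pub trait Norm {
        fn norm1(&self) -> i32;
    }

    // "Header info": the fact that this impl exists.
    impl Norm for Point {
        // Assumed NOT to be header info: the body of a non-generic method.
        fn norm1(&self) -> i32 {
            self.x.abs() + self.y.abs()
        }
    }

    // "Header info": the signature. The body is assumed not to be needed
    // by dependents, since this function is neither generic nor inline.
    pub fn shift(p: &Point, dx: i32) -> Point {
        Point { x: p.x + dx, y: p.y }
    }
}

fn main() {
    use a::Norm;
    // A dependent only relies on the signatures and definitions above.
    let p = a::shift(&a::ORIGIN, -3);
    println!("{}", p.norm1());
}
```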

If this is the case, I’d like to propose:

  1. A mode for rustc where it dumps the “header info” for a crate in a binary format
  2. Augmenting rustc to read that “header info” of its dependencies and use it for compilation
  3. Changing cargo to split up the compilation of each crate into those two steps. This will lead to much greater parallelism in the dependency graph: only the (comparatively lightweight) “header info” steps depend on each other, while the later steps only need the “header info” steps of their own crate and of its dependencies to be done.
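As a rough illustration of the expected gain, the following sketch compares wall-clock time for the a/b/c example under both models. It is only a toy model: the `Cost` numbers are invented, it assumes there are enough cores for everything that is ready to run, and it ignores the overhead of the extra step.

```rust
// Hypothetical per-crate costs in arbitrary time units; the numbers used
// below are made up for illustration and are not measurements.
#[derive(Clone, Copy)]
struct Cost {
    header: u32,  // time to produce the "header info"
    codegen: u32, // time for the rest of the compilation
}

/// Current model: dependents start only after `a` is fully compiled.
fn wall_time_today(a: Cost, dependents: &[Cost]) -> u32 {
    let a_done = a.header + a.codegen;
    let slowest = dependents
        .iter()
        .map(|d| d.header + d.codegen)
        .max()
        .unwrap_or(0);
    a_done + slowest
}

/// Proposed model: dependents start as soon as `a`'s "header info" exists,
/// while the rest of `a` compiles in parallel.
fn wall_time_split(a: Cost, dependents: &[Cost]) -> u32 {
    let a_done = a.header + a.codegen;
    let slowest = dependents
        .iter()
        .map(|d| a.header + d.header + d.codegen)
        .max()
        .unwrap_or(0);
    a_done.max(slowest)
}

fn main() {
    let a = Cost { header: 2, codegen: 8 };
    let b = Cost { header: 2, codegen: 8 };
    let c = Cost { header: 1, codegen: 5 };
    println!("today: {}", wall_time_today(a, &[b, c])); // today: 20
    println!("split: {}", wall_time_split(a, &[b, c])); // split: 12
}
```

With these invented numbers the build drops from 20 to 12 time units. The gain shrinks as the “header info” step takes up a larger share of each crate's total compile time.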

Advantages

  • Builds execute in parallel to a higher degree. This should lead to faster from-scratch builds of multi-crate systems.

Drawbacks

  • For smaller crates, this might actually be harmful, as there is some overhead in starting new processes, generating the “header info”, and parsing the source code multiple times. Of course, this can be somewhat alleviated by storing some intermediate results on disk that can be reused in the second step…
  • new feature = more complexity, etc etc
  • This is somewhat less useful once cargo build caches are available, but this feature is still useful e.g. when you have no internet, lots of private code, non-mainstream platforms, or are working on the compiler (note how everything waits for the rustc crate to be compiled).

Unresolved questions

  • Will this approach actually work? Maybe some info about dependencies is required that is only available after LLVM codegen has happened?
  • Maybe some earlier representations can be used instead? The more of the compilation that can be moved into the second stage, the more of it will be parallelizable.

#2

These “headers” would need MIR, because of generics and #[inline]. If b or c calls an inline function or instantiates a generic function with specific type arguments, the compiler needs its MIR to generate an inline copy or monomorphize the generic.
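A small sketch of the two cases, using modules `a` and `b` as stand-ins for the crates so that it fits in one file (the function names are made up for illustration):

```rust
// Stand-in for crate `a`.
mod a {
    /// Generic: no machine code can be emitted for this function until a
    /// caller picks a concrete `T`, so the caller's crate needs the body.
    pub fn largest<T: PartialOrd + Copy>(items: &[T]) -> Option<T> {
        let mut best = *items.first()?;
        for &item in items {
            if item > best {
                best = item;
            }
        }
        Some(best)
    }

    /// `#[inline]`: the body may be copied into callers in other crates,
    /// so again the caller's crate needs more than the signature.
    #[inline]
    pub fn double(x: u32) -> u32 {
        x * 2
    }
}

// Stand-in for crate `b`, which depends on `a`.
mod b {
    pub fn run() -> (Option<u8>, u32) {
        // `largest::<u8>` is instantiated here, in the dependent crate.
        (crate::a::largest(&[3u8, 9, 4]), crate::a::double(21))
    }
}

fn main() {
    println!("{:?}", b::run()); // (Some(9), 42)
}
```

Within a single file the compiler sees everything anyway; the point is only that, across real crate boundaries, `b` cannot be compiled from the signatures of `largest` and `double` alone.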


#3

It sounds like what you want is saving the artifacts of incremental compilation caching. I would rather not call these artifacts “headers” since they’re binary rather than source, and it’d probably be a bad idea for them to be structured the same way as precompiled C/C++ header files, but as incremental compilation becomes more stable and more incremental I think it would be a good idea to explore persisting and publishing whatever intermediate artifacts do exist in order to speed up cold builds. Perhaps publishing to crates.io could someday autogenerate these artifacts for the last few stable versions (assuming we gain a way to specify what versions of Rust the crate supports so it knows what to skip).

My only concern with actually doing this once it becomes feasible is that, in my experience, the “more parallel” building of C++ is a double-edged sword that tends to produce painfully cryptic compiler or linker error messages if it goes awry. Essentially, C++ has a build-time advantage here only because it forces the programmer to spend time keeping header and source files in sync, and becomes very unhelpful if you mess up, so it’s not necessarily a fair comparison. I assume Rust would be far less vulnerable to this because it has a proper module system, completely removing the need for header files and their associated hacks like include guards and forward declarations, but it’s something to watch out for.


#4

Could the artifacts of the new cargo check be used somehow? AIUI, it’s roughly like a pre-compiled header. So you’d do this check “build” for everything, then build each crate for real using the check results of its dependencies. The final target would still need to serialize, much like any normal link stage. Since this would process each crate twice (check and build), it’s probably slower in total CPU time, but possibly more parallel.


#5

This is similar to ideas we’ve had in the past to make rlibs contain only MIR and delay code generation until later. The later code generation is delayed, the more opportunities there are to merge monomorphizations. Any changes here are heavily impacted by incremental compilation.

cc @michaelwoerister


#6

This proposal sounds a lot like what we would get if we had rlibs without machine code (as Brian mentioned), which is probably where we are headed. For this to also work without a major compile time degradation for leaf crates, we’ll need incremental compilation though. We’ll get there :)


#7

I think the thread got continued in https://github.com/rust-lang/rust/issues/38913


#8

Correct me if I’m wrong, but is the term you’re looking for “metadata”?