Rust has amazing potential to generate impressively fast projects. It seems like every week, we hear about new advancements in browsers, search engines, web servers, and more - all driven by the safety and performance that Rust provides.
At the same time, compile times are a common complaint about the Rust compiler. As projects grow, the compile-time pains grow with them. This is doubly true for programmers coming to Rust from dynamic languages and from languages with faster compile times. As one data point, building Servo takes me 37 mins 20 secs; for 10 mins 14 secs of that, the build seemingly stalls while the crates that block the remainder of the build finish, with my CPU cores sitting mostly idle. This story isn’t unique to Servo, but rather is becoming more common as Rust users build larger projects. And we’re hearing this more and more as new companies pick up Rust and consider using it for their products.
We’ve been thinking a lot this year about what second editions of current tools look like, which got me thinking about what a second edition of the compiler might look like. What if we leveraged modern Rust performance techniques to build the compiler itself? Can we use parallelism, laziness, and the network effect of crates.io to push compile times down?
The current Rust compiler is single-threaded. We currently rely on Cargo to spawn multiple rustc instances to compile a project’s dependencies. This doesn’t help with single project compile times. To help address our raw compiler performance, we can introduce parallelism into the compiler itself.
Rust naturally has a unit of compilation that can be parallelized: the function. Rust’s type system is what is known as a modular type system. That is, each function can be checked in relative isolation, needing only the signatures of the functions it uses rather than repeatedly working through their bodies, as is the case with, e.g., C++ templated functions.
We can use this modularity to our advantage. A second edition compiler would be able to type check, borrow check, and potentially convert to lower-level IR at function granularity. This lets the compiler spread out the workload across available cores.
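To make the idea concrete, here is a minimal sketch of function-granularity checking. It assumes a toy "type checker" in which a signature is just a parameter count and a body is just a list of calls; the names `Signature`, `FnDef`, `check_body`, and `check_all_parallel` are hypothetical and are not part of rustc. The point is only that once signatures are collected, each body can be checked on a separate thread without looking at any other body.

```rust
use std::collections::HashMap;
use std::thread;

/// Toy signature (an assumption for this sketch): just the parameter count.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Signature {
    pub arity: usize,
}

/// Toy function definition: the body is a list of (callee, argument count) calls.
#[derive(Debug, Clone)]
pub struct FnDef {
    pub name: String,
    pub sig: Signature,
    pub calls: Vec<(String, usize)>,
}

/// Check one body using only the shared signature table.
/// No other function's body is ever inspected.
pub fn check_body(def: &FnDef, sigs: &HashMap<String, Signature>) -> Result<(), String> {
    for (callee, argc) in &def.calls {
        match sigs.get(callee) {
            None => return Err(format!("{}: unknown function `{}`", def.name, callee)),
            Some(sig) if sig.arity != *argc => {
                return Err(format!(
                    "{}: `{}` expects {} argument(s), got {}",
                    def.name, callee, sig.arity, argc
                ));
            }
            Some(_) => {}
        }
    }
    Ok(())
}

/// Collect signatures sequentially, then check bodies in parallel chunks.
/// Results come back in the same order as `defs`.
pub fn check_all_parallel(defs: &[FnDef], workers: usize) -> Vec<Result<(), String>> {
    let sigs: HashMap<String, Signature> =
        defs.iter().map(|d| (d.name.clone(), d.sig)).collect();
    let workers = workers.max(1);
    // Ceiling division; at least 1 so that `chunks` never receives 0.
    let chunk = ((defs.len() + workers - 1) / workers).max(1);

    thread::scope(|s| {
        let handles: Vec<_> = defs
            .chunks(chunk)
            .map(|part| {
                let sigs = &sigs;
                s.spawn(move || part.iter().map(|d| check_body(d, sigs)).collect::<Vec<_>>())
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("checker thread panicked"))
            .collect()
    })
}
```

Calling `check_all_parallel(&defs, 4)` returns one `Result` per function. A real compiler would need a work-stealing scheduler and far richer signatures, but the shape of the dependency (bodies depend on signatures, not on other bodies) is the same.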
This helps us parallelize the front “half” of the compiler, but what about LLVM? When we look at single project compile times, depending on the project, trans and LLVM can account for half of the build time. Incremental compilation may help during recompiles, but what about first compiles? There are a couple possibilities here to use LLVM in parallel:
A relatively recent development on the LLVM side is ThinLTO, a way of doing codegen with smaller codegen units in parallel inside LLVM while maintaining the performance benefits of LTO (unlike the current codegen units functionality in the compiler). Anything to reduce the time in LLVM would be a win, assuming the output code is still of high quality.
We currently have the capability to do multiple codegen units in parallel. Unfortunately, one drawback of using this functionality is that using multiple codegen units loses optimization opportunities, like inlining, between the units. Ideally, we could use dependency information to pick units that would not benefit from optimization across codegen boundaries, allowing the work to be done in parallel with minimal impact to the speed of the output code. This would allow us to enable multiple codegen units by default.
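One simple way to picture "using dependency information to pick units" is to group functions so that no call edge crosses a unit boundary. The sketch below does this with a union-find over a call graph, treating calls as undirected edges. It is an illustration under strong assumptions (functions are numbered `0..n`, calls are the only dependency that matters), and `partition_into_units` is a hypothetical name, not how rustc partitions code today.

```rust
use std::collections::BTreeMap;

/// Find the representative of `x`, compressing the path as we go.
fn find(parent: &mut [usize], x: usize) -> usize {
    let mut root = x;
    while parent[root] != root {
        root = parent[root];
    }
    let mut cur = x;
    while parent[cur] != root {
        let next = parent[cur];
        parent[cur] = root;
        cur = next;
    }
    root
}

/// Group `n` functions (numbered 0..n) into units such that no call edge
/// crosses a unit boundary. Every index in `calls` must be less than `n`.
pub fn partition_into_units(n: usize, calls: &[(usize, usize)]) -> Vec<Vec<usize>> {
    let mut parent: Vec<usize> = (0..n).collect();
    for &(caller, callee) in calls {
        let a = find(&mut parent, caller);
        let b = find(&mut parent, callee);
        if a != b {
            parent[a] = b; // merge the two groups
        }
    }
    // Bucket every function under its group's representative.
    let mut units: BTreeMap<usize, Vec<usize>> = BTreeMap::new();
    for f in 0..n {
        let root = find(&mut parent, f);
        units.entry(root).or_default().push(f);
    }
    units.into_values().collect()
}
```

In practice most programs form one large connected component, so a real heuristic would have to cut some edges, presumably the ones least likely to matter for inlining. The sketch only shows the easy case, where the split costs nothing.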
One of the ways that incremental compilation hopes to gain performance is by not redoing work we don’t need to do. We can apply this philosophy to the first compile as well. We shouldn’t build something we don’t need in the output artifact.
Lazy whole-program compilation
Currently, a compilation unit is the crate. This means that for a first compile of a project, all dependencies are fetched and fully compiled before the main crate is compiled. This ignores the fact that it’s possible (and highly likely) a lot of code being built is never used by your project.
Instead, we could approach compilation as whole-program (sometimes called whole-world) with a focus on only building the code that we need to. We could lazily pull in the definitions from dependencies as we use them rather than always building them.
Much of rustc’s optimization comes from LLVM, leaving it to do tasks like dead code elimination. We can avoid LLVM doing work it doesn’t need to by shaking our own trees and removing all dead code before handing off the code to LLVM for codegen.
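As a rough illustration of shaking our own trees, the sketch below walks a call graph from an entry point and keeps only what is reachable; everything else never needs to be type checked, translated, or handed to LLVM. The graph representation and the `reachable` function are assumptions made for the example, not an existing rustc interface.

```rust
use std::collections::{HashMap, HashSet};

/// Return the set of functions reachable from `root` by following calls.
/// Anything not in the returned set is dead code for this program.
pub fn reachable<'a>(
    call_graph: &HashMap<&'a str, Vec<&'a str>>,
    root: &'a str,
) -> HashSet<&'a str> {
    let mut live = HashSet::new();
    let mut worklist = vec![root];
    while let Some(f) = worklist.pop() {
        // `insert` returns false if `f` was already visited.
        if live.insert(f) {
            if let Some(callees) = call_graph.get(f) {
                worklist.extend(callees.iter().copied());
            }
        }
    }
    live
}
```

Because the walk is driven by uses, it also models the lazy approach: a dependency's function is only ever looked at when something live calls it.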
Modern compilers aren’t just start-to-finish codegen machines. Often compilers also have to do double duty as ways to drive IDE experiences like goto-def, code completion, and refactoring. To do this efficiently, a compiler needs to be able both to recover efficiently from user error (as the code is often incomplete while it is being typed in the IDE) and to respond to user requests on the order of milliseconds. The latter of these two requires a type-checker that’s more incremental and lazy, recalculating the minimum amount needed to answer the query at hand.
One technique used by some powerful scripting engines is to parse only enough to get a function’s signature and to know the start and end of its body. While I doubt doing this by itself would grant significant gains, as parsing is not often the dominating time in compilation, it could be coupled with techniques like whole-program compilation (above) to prevent doing even unnecessary parsing.
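Here is a minimal sketch of that technique: record the signature text, then skip over the body by counting braces without parsing what is inside. The `LazyFn` type and `skim_fn` function are hypothetical, and the sketch deliberately assumes there are no braces inside string literals, character literals, or comments, which a real lexer would have to handle.

```rust
/// A function whose body has been located but not yet parsed.
#[derive(Debug, PartialEq)]
pub struct LazyFn<'a> {
    /// Everything before the opening brace, trimmed.
    pub signature: &'a str,
    /// The body text, including its outer braces.
    pub body: &'a str,
}

/// Skim the first function in `src`: keep the signature and find the
/// matching closing brace of the body by counting nesting depth.
/// Returns `None` if there is no body or the braces never balance.
pub fn skim_fn(src: &str) -> Option<LazyFn<'_>> {
    let open = src.find('{')?;
    let mut depth = 0usize;
    for (i, c) in src[open..].char_indices() {
        match c {
            '{' => depth += 1,
            '}' => {
                // Cannot underflow: the scan starts on '{', so depth >= 1 here.
                depth -= 1;
                if depth == 0 {
                    let end = open + i + 1;
                    return Some(LazyFn {
                        signature: src[..open].trim(),
                        body: &src[open..end],
                    });
                }
            }
            _ => {}
        }
    }
    None
}
```

The body slice can be kept around and parsed in full only if the function turns out to be needed.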
The current plan for incremental compilation is to cover the frontend to accommodate this case. I believe this can be coupled with a lazy approach to maximize our potential for IDEs.
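To sketch what "lazy and incremental" could mean for an IDE query, the example below memoizes a per-function result, computes it only when asked, and throws away only the entry for the function that was edited. `QueryStore` and its methods are made up for illustration, and counting tokens stands in for a genuinely expensive analysis such as type checking.

```rust
use std::collections::HashMap;

/// Demand-driven store: a result is computed on first request and reused
/// until the function it belongs to is edited.
#[derive(Default)]
pub struct QueryStore {
    bodies: HashMap<String, String>, // function name -> source text
    cached: HashMap<String, usize>,  // function name -> memoized result
    /// How many times real work was done (for demonstration only).
    pub recomputations: usize,
}

impl QueryStore {
    /// Record an edit and invalidate only that function's cached result.
    pub fn set_body(&mut self, name: &str, body: &str) {
        self.bodies.insert(name.to_string(), body.to_string());
        self.cached.remove(name);
    }

    /// Stand-in for an expensive analysis: counts whitespace-separated tokens.
    /// Returns `None` for an unknown function.
    pub fn token_count(&mut self, name: &str) -> Option<usize> {
        if let Some(&hit) = self.cached.get(name) {
            return Some(hit); // answered from cache, no work done
        }
        let count = self.bodies.get(name)?.split_whitespace().count();
        self.recomputations += 1;
        self.cached.insert(name.to_string(), count);
        Some(count)
    }
}
```

Editing one function leaves every other cached answer intact, which is the property an IDE needs in order to respond in milliseconds. Real queries depend on one another, so a real design would also have to track those dependencies when invalidating.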
A common refrain from fans of Rust is just how nice it is to use cargo and crates.io. Here, crates can be added and separately optimized, and all consumers benefit. Being more bite-sized helps more people contribute, too. We could do this with the compiler itself, offering modular parts of it as separate crates on crates.io (in much the same way as projects like Servo are separate crates that combine to create a web engine).
We have had attempts to spin off libsyntax as a separate crate, though those efforts have proven difficult. Still, I’m encouraged by just how powerful it is to be able to share crates and work on them independently. Contributors can target their optimization efforts more narrowly, and see the results of their experiments much more quickly than if they had to wait for the compiler to build itself.
We’ve also seen a number of tools want to be able to consume parts of the compiler: linting tools, procedural macros, IDE tools, you name it. These tools can drive additional improvements in the crates that make up the compiler.
I recognize that these proposals may be difficult, may require a rethink of our current approach, and may also be a bit naive. There may even be better improvements than the ones I mention above. That’s awesome! It’s my hope that this might kick off the conversation of dreaming big, and then we can go after that dream.