Laziness in the compiler

In Towards a second edition of the compiler, @jntrnr proposed laziness in the compiler:

Laziness

One of the ways that incremental compilation hopes to gain performance is by not redoing work we don’t need to do. We can apply this philosophy to the first compile as well. We shouldn’t build something we don’t need in the output artifact.

Lazy whole-program compilation

Currently, a compilation unit is the crate. This means that for a first compile of a project, all dependencies are fetched and fully compiled before the main crate is compiled. This ignores the fact that it’s possible (and highly likely) a lot of code being built is never used by your project.

Instead, we could approach compilation as whole-program (sometimes called whole-world) with a focus on only building the code that we need to. We could lazily pull in the definitions from dependencies as we use them rather than always building them.

Tree-shaking

Much of rustc’s optimization comes from LLVM, leaving it to do tasks like dead code elimination. We can avoid LLVM doing work it doesn’t need to by shaking our own trees and removing all dead code before handing off the code to LLVM for codegen.

Frontend laziness

Modern compilers aren’t just start-to-finish codegen machines. Often compilers also have to do double duty as ways to drive IDE experiences like goto-def, code completion, and refactoring. To do this efficiently, a compiler needs both to be able to efficiently recover from user error (as often the code is incomplete when being typed in the IDE), and to respond to user requests on the order of milliseconds. The latter of these two requires a type-checker that’s more incremental and lazy, recalculating the minimum amount to be able to answer the query at hand.

One technique used by some powerful scripting engines is to parse only enough to get a function’s signature and to know the start and end of its body. While I doubt doing this by itself would grant significant gains, as parsing is not often the dominating time in compilation, it could be coupled with techniques like whole-program compilation (above) to avoid even the parsing that turns out to be unnecessary.

The current plan for incremental compilation is to cover the frontend to accommodate this case. I believe this can be coupled with a lazy approach to maximize our potential for IDEs.

I wonder what happened afterwards.

Rust is known for its slow compilation speed as of this writing. I think a big reason is the ease of importing external crates using Cargo—users usually only need to write a few hundred lines of code and import some popular crates to make the compiler compile hundreds of thousands of lines of code. So, not fully compiling code that is not used by the user's crate (or more aggressively, the binary) should drastically improve the overall compilation time.

Based on my limited knowledge, this would probably be feasible at some intermediate representation level.

To be clear, I am not being rigorous here, and I am simply trying to bring up this topic. I look forward to your more insightful discussions below :slightly_smiling_face:.

The biggest difference between 2017 and now is that the RLS project (IDE language server based on rustc) has been dropped in favor of rust-analyzer (separate intelligence engine reusing some components but focused on the IDE use case instead of compilation).

The compiler's also gotten meaningfully faster since we started properly keeping track in 2019 — around 30% faster on average for realistic workloads.

4 Likes

My impression is that the speed improvement in the compiler has been done by steady and incremental changes. Adding laziness, on the other hand, would probably require major refactoring.

On another note, the LSP no longer relying on rustc should lift some of the burden when considering laziness.

The query infrastructure (laziness, memoization, etc) has already been moved closer to the compilation start several times since 2017.

Unfortunately, the pattern I've observed is that gains from "not redoing work" are often counterbalanced by costs of the query infrastructure, i.e. figuring out whether we need to redo the work or not.

8 Likes

I doubt that stuff like "start from the crate root and determine which parts of dependencies we actually need" would be beneficial. Dependency graphs are typically wide, not deep (and if yours isn't, it's an issue to fix). This means that the compiler can immediately start building all leaf crates in parallel, for major compile time reduction. If you try to determine which dependencies are necessary, you'd have to start from the crate root, stalling all compilation until the root's metadata is available. Similarly you'd stall while moving down the dependency tree, and you're likely to need all leaf dependencies anyway.

It would also significantly change the compilation model, since at the moment rustc and cargo are entirely decoupled. Rustc knows how to compile a single crate; cargo knows how to resolve, fetch, and link dependencies. Piecewise lazy compilation would require a major change to that model, and the old model must be supported anyway, because of integration with alternative build systems and the need to support distributed build servers.

Also, stuff like sys-crates, exported C FFI functions in dependencies, and build scripts means that many crates must be compiled regardless of downstream requirements.

Oh, and macros. Proc-macro crates would need to be built before any of their dependents in any case; otherwise you can't get precise metadata from the dependent crates. And I have no idea how to deal with macro_rules! macros, which can also change metadata, but aren't neatly separated from the code, and can be declared and used anywhere.

Overall, looks like way way more trouble than it's worth.

---

The issue of a library shipping code that its dependents often never use is, imho, that library's problem. It should feature-gate such rarely used functionality. In that case the code will be compiled only if it's likely to be actually used.

3 Likes

I think you are right in that all the crates would probably need to be compiled. But, maybe they only need to be fully compiled to one of the IRs?

The reasoning is that LLVM is slow and contributes to a large portion of compilation time, and I think that it may be possible to drastically reduce LLVM's workload (and perhaps do more on higher IR levels).

There's a cost to doing so though, even independent of the developer cost of manually doing subcrate dependency tracking (and associated risk of using functionality without declaring a dependency). Specifically, if I later determine that a feature is now necessary, enabling it requires a clean rebuild of that dependency (and anything downstream of it); nonlocal dependencies are compiled without incremental since they're assumed to change infrequently, and while feature gates SHOULD be API additive, there's no such restriction on unexposed impl details.

For those reasons, it's generally recommended to only feature gate meaningful coarse chunks of a library crate; typically entire modules' worth of functionality and/or interop support with an upstream crate. The compilation/pipelining cost of smaller chunks of functionality typically isn't worth the cost of feature gating it.

IF the compiler did such tracking automatically, though, it could potentially be different.

This would generally fall under the banner of "MIR only RLIBs," IIUC. It's not that simple, though, because LLVM is (a significant part of) the slow part. Separate compilation of crates is "embarrassingly parallel," and slamming everything into one translation unit and then splitting it back into codegen units makes that parallelism harder to recover. Plus, now you need to track incremental for what you've already compiled from upstream if you want to avoid redoing that work, whereas with fully separate compilation it's just ambiently known what is available from the upstream libs (anything nongeneric and noninline plus maybe shared generics (opt-in, unstable)).

And especially on Windows/MSVC, codegen isn't necessarily the only slow part; for binaries with a lot of crates/CGUs, linking can also take a significant amount of time, and is typically redone from scratch for every build.

The high-level reality is that architectural improvements to benefit clean compiles typically hurt incremental and vice versa. While clean builds are still important, what really matters the most is low-to-medium optimization incremental builds, since that's what bottlenecks development iteration time. It's perhaps a bit reductive, but if you're only making a clean fullopt fatLTO build twice a month for distribution, it doesn't matter much if it takes more compute time to produce it.

XKCD #1205

1 Like

The high-level reality is that architectural improvements to benefit clean compiles typically hurt incremental and vice versa.

Well said. I think trying to achieve both is not as good as letting the user choose one, probably incremental for debug mode and clean compilation for release mode. Although that would probably make the compiler much more complex as a whole.

And especially on Windows/MSVC, codegen isn't necessarily the only slow part; for binaries with a lot of crates/CGUs, linking can also take a significant amount of time, and is typically redone from scratch for every build.

That is itself another major topic.

Also keep in mind that Zig, which does lazy compilation, reports no errors for unused code. I'm not sure if there's any good way to avoid this?

4 Likes

And I think the biggest improvements here would need an even more drastic architectural change. Loading all the incremental compilation data is itself a huge cost, so really fast incremental might require doing something like daemonizing the compiler, similar to how R-A is a process that just stays running.

1 Like

Maybe not all that drastic? https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/Rustc.20compiler.20daemon

Note that this is far from obviously desired. We already do not monomorphize functions that are never called, where "never" is determined after MIR optimizations. This leads to rather undesirable behavior: some errors appear only in debug builds! Also see this issue:

Going more lazy in the manner described above seems likely to cause more problems of this kind.

5 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.