Towards a second edition of the compiler


#1

Rust has amazing potential to generate impressively fast projects. It seems like every week, we hear about new advancements in browsers, search engines, web servers, and more - all driven by the safety and performance that Rust provides.

At the same time, compile times are a common complaint for the Rust compiler. As projects grow, the compile time pains also grow. This is doubly true for programmers coming to Rust from dynamic languages, and from languages with faster compile times. Just to as one data point, building Servo for me takes 37 mins 20 secs, 10 mins 14 secs of which the build seemingly stalls as the crates that block the remainder of the build finish, with my CPU cores going mostly idle as it works. This story isn’t unique to Servo, but rather is becoming more common as Rust users build larger projects. And we’re hearing this more and more as new companies pick up Rust and consider it using it for their products.

We’ve been thinking a lot this year about what second editions of current tools looks like, which got me thinking about what a second edition of the compiler might look like. What if we leveraged modern Rust performance techniques to build the compiler itself? Can we use parallelism, laziness, and the network effect of crates.io to push compiler times down?

Parallelism

The current Rust compiler is single-threaded. We currently rely on Cargo to spawn multiple rustc instances to compile a project’s dependencies. This doesn’t help with single project compile times. To help address our raw compiler performance, we can introduce parallelism into the compiler itself.

Rust naturally has a unit of compilation that can be parallelized: the function. Rust’s type system is known as a modular type system. That is, each function can be checked in relative isolation, needing only the function signatures rather than repeatedly working through function bodies, as is the case with eg C++ templated functions.

We can use this modularity to our advantage. A second edition compiler would be able to type check, borrow check, and potentially convert to lower-level IR at function granularity. This lets the compiler spread out the workload across available cores.

This helps us parallelize the front “half” of the compiler, but what about LLVM? When we look at single project compile times, depending on the project, trans and LLVM can account for half of the build time. Incremental compilation may help during recompiles, but what about first compiles? There are a couple possibilities here to use LLVM in parallel:

ThinLTO

A relatively recent development on the LLVM side is ThinLTO, a way of doing codegen with smaller codegen units in parallel inside LLVM while maintaining the performance benefits of LTO (unlike the current codegen units functionality in the compiler). Anything to reduce the time in LLVM would be a win, assuming the output code is still of high quality.

Parallel codegen

We currently have the capability to do multiple codegen units in parallel. Unfortunately, one drawback of using this functionality is that using multiple codegen units loses optimization opportunities, like inlining, between the units. Ideally, we could use dependency information to pick units that would not benefit from optimization across codegen boundaries, allowing the work to be done in parallel with minimal impact to the speed of the output code. This would allow us to run with this on by default.

Laziness

One of the ways that incremental compilation hopes to gain performance is by not redoing work we don’t need to do. We can apply this philosophy to the first compile as well. We shouldn’t build something we don’t need in the output artifact.

Lazy whole-program compilation

Currently, a compilation unit is the crate. This means that for a first compile of a project, all dependencies are fetched and fully compiled before the main crate is compiled. This ignores the fact that it’s possible (and highly likely) a lot of code being built is never used by your project.

Instead, we could approach compilation as whole-program (sometimes called whole-world) with a focus on only building the code that we need to. We could lazily pull in the definitions from dependencies as we use them rather than always building them.

Tree-shaking

Much of rustc’s optimization comes from LLVM, leaving it to do tasks like dead code elimination. We can avoid LLVM doing work it doesn’t need to by shaking our own trees and removing all dead code before handing off the code to LLVM for codegen.

Frontend laziness

Modern compilers aren’t just start-to-finish codegen machines. Often compilers also have to do double duty as ways to drive IDE experiences like goto-def, code completion, and refactoring. To do this efficiently, a compiler needs to be both be able to efficiently recover from user error (as often the code is incomplete when being typed in the IDE), and able to respond to user requests on the order of milliseconds. The latter of these two requires a type-checker that’s more incremental and lazy, recalculating the minimum amount to be able to answer the query at hand.

One technique used by some powerful scripting engines is to parse only enough to get a function’s signature and to know the start and end of its body. While I doubt doing this by itself would grant significant gains, as parsing is not often the dominating time in compilation, it could be coupled with techniques like whole-program compilation (above) to prevent doing even unnecessary parsing.

The current plan for incremental compilation is to cover the frontend to accommodate this case. I believe this can be coupled with a lazy approach to maximize our potential for IDEs.

Moving to crates.io

A common refrain from fans of Rust is just how nice it is to use cargo and crates.io. Here, crates can get be added, separately optimized, and all consumers can benefit. Being more bite-sized helps more people contribute, too. We could do this with the compiler itself, offering modular parts of it as separate crates on crates.io (in much the same way as projects like Servo are separate crates that combine to create a web engine).

We have had attempts to spin off libsyntax as a separate crate, though those efforts have proven difficult. Still, I’m encouraged by just how powerful it is to be able to share crates and work on them independently. Contributors can focus optimization efforts in a more focused way, and see the results of their experiments much more quickly than having to wait for the compiler to build itself.

We’ve also seen a number of tools want to be able to consume parts of the compiler. Whether linting tools, procedural macros, IDE tools, you name it. These tools can drive additional improvements in the crates that make up the compiler.

Conclusion

I recognize that these proposals may be difficult, may require a rethink of our current approach, and may also be a bit naive. There may even been better improvements than I mention above. That’s awesome! It’s my hope that this might kick off the conversation of dreaming big, and then we can go after that dream.


What if we didn't compile dead code?
#2

This sounds like a cool idea. Though, I’m curious, is the idea that this would be independent of the current compiler or an attempt to morph the current compiler into a form that follows all of these ideas?


#3

+ :100: to this!

I would like to expand a bit on the topic of libsyntax, because I feel like there’s a lot of room for improvement there for IDE support :slight_smile:

Tools like RLS, rustfmt and clippy should have convenient access to the tree representation of the source code. This tree should not be an abstract syntax tree, because in these tools, you want to modify source code, and so access to comments, whitespaces, punctuation and such is crucial. The current AST looks like a very tough to work with on the source code level. It has some spans for sure, but it does not faithfully mirror the tree structure of source code. QSelf is a good example of mismatch between the current AST structure and the original source code.

If you are interested, I have done an experimental implementation of syntax tree in Rust, which I think would be a good fit for IDEs and such: https://github.com/matklad/fall#tree-model :slight_smile:

One technique used by some powerful scripting engines is to parse only enough to get a function’s signature and to know the start and end of its body.

It’s definitely true that this won’t help compiler’s performance a lot. However, this feature is crucial for great parser recovery, which is a must for IDEs. It’s also already implemented in my experimental thing: https://github.com/matklad/fall#grammar :wink:

However, parser performance is important regardless of the compiler speed as a whole! In IDE, you can do a lot of nifty things as the user types, which need AST, but not types or name resolution. For example, you can hook into “join lines” handler to automatically remove commas when, after joining two lines, trailing comma and a closing brace are on the single line. Here’s the code for this in IntelliJ Rust: https://github.com/intellij-rust/intellij-rust/blob/master/src/main/kotlin/org/rust/ide/actions/RsJoinLinesHandler.kt#L34. Crucially, it needs the syntax tree. So, it should be possible to get a syntax tree for a file without compiling anything (the file might not even be part of the project, think about some scratch in-memory only rust buffer), and the time to parse the tree may be a part of a latency of keystroke! To make parsing fast, you can incrementalize the lexer (it’s easy) and parsing of block contents (this is more complex and less important).

Let’s rewrite libsyntax in Rust and put it on crates.io! :slight_smile:


#4

Question 1: is it truely worth trying to use the same syntax tree/parsing code for the compiler as for IDEs, rustfmt, etc.? As matklad says, requirements differ quite a bit.

Question 2: though LLVM is the only option (as far as I am aware) available in the near future to generate well-optimised code, is it the best option for the typical edit-compile-run testing/dev cycle? There’s already been talk of using Cretonne as an alternative backend. (Of course, speed ups to LLVM builds are still welcome.)


#5

Question 1

I think it is desirable, but not necessary. An interesting example here is Dart, which uses different parsers for the VM and language server (source: the last paragraph of http://mrale.ph/blog/2017/01/08/the-fear-of-dart-mirrors.html). Given that Rust has rather powerful macros, I think that perhaps having a macro oblivious IDE-only parser and a full token-tree based parser in the compiler (perhaps even a generated LR flavor) is not such a terrible idea.


#6

I’d’ve thought being able to handle macros correctly in the IDE would be one of the advantages of sharing code between both?


#7

Sure, sharing code would be ideal! It is not necessary though, and may or may not result in more overall complexity (features are somewhat orthogonal, so you might get a + b complexity with duplication and a * b complexity with unified approach).

Handling macros correctly consists of two things:

  • understanding the effects of the expanded code
  • understanding the contents of not expanded code (that is, that 1 + 2 in println!("{}", 1 + 2) is an expression, so that smart stuff like join lines works).

The former task should live in the compiler, and not in the IDE-frontend, so it needn’t use syntax tree at all, as long as there’s a way to go from internal compiler abstract data structures to surface syntax, and back (span <-> DefId bidirectional function).

The latter is important to have inside IDE, but is difficult, because macro definition can be anywhere, and in general can depend on the build system. That is, if you delete Cargo.toml from your project, you still need to parse macros somehow, despite the fact that you have no idea how macro definitions and usages connect. However, if you do have a definition of macro, it would be nice to parse it contents as expressions, items, e.t.c. This can be handled by injecting appropriate syntax tree after the main parse. That is, when IDE sees a file with println!("{}", 1 + 2), it first parses this as a token tree. Then, if a reasonable project structure is present, IDE asks (asynchronously) compiler: “Can you describe what is "{}", 1 + 2?” Compiler answers: “Sure, the first four characters are an expression, then there are two macro-specific tokens, and the next 5 characters are another expression”. IDE frontend then executes an expression parser again for this two fragments and injects the resulting trees into the main tree of the file.


#8

Oh, of course procedural macros are a different matter. But I think there’s no reason in principle why declarative macros (macro_rules!) couldn’t be handled directly?


#9

While Rust generally follows this model, you can also have stuff like impl Trait with specialisation enabled.

Also, last time I was proposing more parallelism I was informed that rustc typecheck heavily relies on caches internally. So splitting up into multiple threads may make your compilation faster, but it will draw more battery and cause more heat on mobile devices. So while I generally agree with the feature, it should be opt-in or cargo should check whether the system is on battery or not or something.


#10

Couldn’t the current stripped-down syntax-tree (T2) be generated with a transformation on a more complete tree (T1)?

A Parser would generate T1 (and by only providing information on syntax, which isn’t expected to change at a whim, unlike T2, could then be put in a stable userland crate along with T1). A transformation-step would then transform T1 into T2, adding all the additional semantic meaning to T2, that we have in current AST.

In the stable crate one could then provide modifying Visitors for T1 to do all kinds of source code transformations on the tree with whitespace, comments and all.

But then again I have no real experience with compiler engineering. :man_shrugging:

So, what’s wrong about my thinking here (apart from inefficiently generating trees that are dropped shortly after when just compiling)?


#11

macro by example requires name resolution, which requires knowledge of a build system (to know crate roots and cfg flags), and so is quite a sizable part of the compiler, not only the parser. In principle, it’s possible to handle everything on the IDE side (that’s what we do in IntelliJ) but if one want’s to split IDE and compiler, I would say that all syntax should live in the IDE, and all semantics, including all macros, should live in the compiler.

This will work perfectly in a language without macros. If you have macros, you’ll sometimes need to parse AST not from the source code, but from bits and pieces produced by other macro expansions. So the T1 parser must handle both parsing raw text, and parsing in-memory results of previous expansions. This could be done as well, but is more difficult.


#12

Ah, I see. Should have guessed it’d be macros. :sweat_smile: I knew there had to be something, 'cause otherwise the great minds behind rustc would have gone that path long ago. :yum:


#13

@ahmedcharles - I think these are good questions. There’s lots of good work in the current compiler, and lots of hours has gone into making it. I’m not proposing we throw it out or anything. That said, I hope that we can take a step back and think about what the fastest compiler might look like in terms of its architecture and capability and see what it would take to get there.


#14

@matklad - Indeed! I didn’t want to jam a bunch of IDE stuff in the proposal, but there are good reasons to be as lazy as possible, and that’s another one of them. I think just in general not doing work we don’t need to do is going to be a big save, and that’s especially important when we ask the compiler to be more interactive, error-recovering, and whatnot.

On the libsyntax front, yes I think that’s another area I think we can do a bit better on. I can empathize with the desire to not lock down the API/ABI for working with the compiler’s structures, but there is a definite need to standardize them so other folks can use them.

Another example here, in addition to Dart, is that the TypeScript compiler (at least last I checked) uses the same parser for serving IDE and compilation. Both are pretty darn fast. For me, it feels like so long as you build it with both use cases in mind from the start, you can keep both pretty fast.


#15

@dhardy - yes, cretonne is on the radar as a possible backend, though that’s still a ways out. Right now, the compiler, std lib, etc are all geared towards creating the fastest output binary. There definitely is a need for a “-O1” mode that aims to be both a fast compile and a reasonably fast output.

Some projects, like Jai (Jonathan Blow’s new language) I believe have a simple, custom codegen that can be used for fast edit-compile-run cycles. His videos show off him building and running his new game, which is currently at roughly 55kloc, in less than a second. The game runs at playable speeds. This dramatically increases his ability to test ideas and debug. Should we do something like that for Rust? It’s an interesting idea, at least.


#16

Yeah, sorry for derailing the thread quite a bit :stuck_out_tongue: That said, “IDE support” can make edit-compile cycle much faster by removing compile part altogether in some cases.

Exactly! Just to make sure, my point wasn’t “we need a separate parser for IDE” (I mostly was answering a question by @dhardy if they have to be the same). Ideally, IDE and compiler should use the same parser, which is super fast, incremental, aggressively recovers from errors and provides superb error messages.

That said, given that somebody tried to push libsyntax to crates.io and failed, and that recently rustfmt moved in-tree because it was easier than moving libsyntax out, I fear that the best way forward is to design a fresh syntax tree and a parser (syntax tree data structure itself being massively more important then the actual parser code) from the start, use it in RLS/rustfmt/clippy, and, if successful, move rustc over to it.


#17

Rust’s crate-based compilation model does seem inherently inimical to fast compilation. Also, compilers tend to get slower with age. Put these together and the situation is challenging.

Here’s a question. Imagine you have a program that consists of two crates, A and B, where B depends on A. Do Cargo and rustc currently finish building A before they start building B? If so, could they instead overlap the back-end compilation (codegen) of A with the front-end compilation (parsing, checking, etc.) of B?


#18

Yes, currently Cargo will wait for rustc to finish building A before it starts building B.

We already have a cargo check and rustc --emit=metadata which does checking, generate just enough for dependents, and skips code generation entirely. We could imagine a mode where rustc first generates A’s metadata, then somehow signals to Cargo that that metadata is ready, then continues with codegen while Cargo starts compiling B in parallel. But this doesn’t exist yet.


#19

That idea has come up several times in discussion and MIR only rlibs are a big step towards that (also see my earlier thread about this). There has been no official decision by the compiler team to get this going though.


#20

@jntrnr thank you for proposing all this! Beyond the concrete benefits you mentioned, I firmly believe this is the Right Way ™ / elegant thing to do.

In too many languages, the compile is a rather monolithic program divorced from the larger ecosystem (Rustc today even is far from the worst offender). I think we’ll see unprecedented itch-scratching in hard-to-foresee ways.

Certainly the overall framework for cached, parallel, lazy, persisted, etc compilation has uses many uses beyond the compiler. I would love to see that factored out.