Next steps for reducing overall compilation time


#1

Continuing the discussion from 2019 Strategy for Rustc and the RLS, let’s talk about ways to improve compilation time in a big way (over a 1-2 year time horizon, but with incremental wins along the way).

What are the big changes that make sense? What data can we gather to validate whether they make sense? What are the dependencies between them? Can we make these changes in a way that yields gains as we go, and not all at the end?


#2

Let me re-post a list of ideas from an earlier post, slightly extended:

  • End-to-end queries
    • Right now, parsing + macro-expansion + name-resolution are not part of the query system, so incremental compilation can’t do anything with them
    • Setting up the incremental infrastructure from the very start of compilation would let us begin to change that
    • This would be very useful for RLS integration of queries, I think
    • However, this “session” code is grungy and old, and I think this is probably non-trivial. We sketched out some plans at the Rust All Hands; I’d have to go back to our notes, but I remember the conclusion was “this will be work”.
    • This is also one place where the libsyntax2 story definitely intersects
  • Parallelize the compiler (tracking issue)
    • @Zoxc did awesome work enabling us to parallelize the compiler
    • But the last mile isn’t done! There were various refactorings and architectural changes we wanted to make as well.
    • It is not integrated with incremental, either
    • We’ve not done much measuring of perf
  • Multicrate sessions
    • The “full vision” for queries is that we should be able to compile many crates at once, as I talked about earlier
    • This might allow us to skip a lot of work
  • MIR-only rlibs (tracking issue)
    • Right now, our rlibs contain actual executable code
    • The idea here is that we just store MIR in the rlibs so that we can do a better job reducing duplicates and so forth
  • Erasing types where possible
    • We’ve often discussed (most recently here) the idea of erasing types (to reduce monomorphization overhead) when we can tell it’s safe. This is a big job but a worthwhile one.
  • Optimizations:
    • Optimizing at the MIR level saves work for LLVM, plus we can do things it can’t. This might start with inlining, propagating values we know to be immutable because of &T types, and CSE- and copy-prop-like passes that should eliminate a lot of memcpys (see the sketch after this list).
    • The main goal here is improving compilation time: LLVM probably does most of this already, if not all of it, but it takes longer to do it, and it does it post-monomorphization (hence: many times).
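
To make that last point concrete, here is a minimal sketch of what a MIR-level copy-propagation pass could look like, over a toy three-address IR. The `Stmt` type and pass structure are invented for illustration and are not rustc’s actual MIR; real copy propagation also has to handle reassignment, borrows, and control flow, all of which this sketch ignores.

```rust
use std::collections::HashMap;

// Toy IR: either a copy `dst = src` or some use of a variable.
#[derive(Debug)]
enum Stmt {
    Copy { dst: String, src: String },
    Use { arg: String },
}

// Rewrite uses of copies to refer to the original variable, so that a
// later dead-store pass can delete the now-unused copy statements.
fn copy_propagate(body: &mut [Stmt]) {
    let mut replace: HashMap<String, String> = HashMap::new();
    for stmt in body.iter_mut() {
        match stmt {
            Stmt::Copy { dst, src } => {
                // Chase chains of copies: `b = a; c = b` resolves `c` to `a`.
                let root = replace
                    .get(src.as_str())
                    .cloned()
                    .unwrap_or_else(|| src.clone());
                replace.insert(dst.clone(), root);
            }
            Stmt::Use { arg } => {
                if let Some(root) = replace.get(arg.as_str()) {
                    *arg = root.clone();
                }
            }
        }
    }
}
```

Running this over `b = a; use(b)` rewrites the use to `a`, leaving `b = a` dead.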

#3

When it comes to parallelizing the compiler, here are some of my ideas and plans:

  • Test the existing code. Currently you can build rustc with Rayon support, which parallelizes some things. However, this support is not tested. We need a way for end-users to easily try out and test this. We should also do crater runs.
  • Extend the test suite to handle non-deterministic error message orderings so we can actually run the test suite with Rayon support enabled.
  • Run compiler passes in parallel. Rather than using Rayon internally within the compiler passes, which has high overhead, try to run the passes themselves in parallel (see the sketch below). This is a bit hard to do due to the early exits spread around the passes to cut down on error messages. There are also implicit dependencies between passes which would need to be removed.
  • Look for opportunities to parallelize serial parts of the compiler. This would mostly be in the front end, parsing, macro expansion, procedural macros, name resolution, etc.
  • Create a rustc server which cargo can use to run compilation sessions instead of spawning multiple rustc processes. This server would use a single Rayon thread pool and allow the compilation sessions to efficiently share CPU cores. There’s likely some overlap with multicrate sessions here.
  • Use Rayon to generate LLVM bitcode and run LLVM optimizations instead of the current custom thread pool used to run LLVM optimizations. This would be a major simplification of the code and would make the LLVM bitcode generation run in parallel.

Note that incremental compilation and parallel compilation can both be used at the same time, but there is a race condition which can cause non-ideal error message orderings.
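
As a rough illustration of the pass-level parallelism idea above, here is a hedged sketch; the `Module` type and the pass functions are hypothetical stand-ins, not rustc’s actual pass API:

```rust
use rayon::prelude::*;

struct Module { /* AST, spans, etc. */ }

// Hypothetical independent analyses; in rustc, the implicit dependencies
// between such passes would need to be removed first.
fn lint_pass(_m: &Module) { /* cheap, local checks */ }
fn liveness_pass(_m: &Module) { /* an unrelated analysis */ }

fn run_passes(modules: &[Module]) {
    // Two passes with no dependency between them run concurrently...
    rayon::join(
        || modules.iter().for_each(lint_pass),
        || modules.iter().for_each(liveness_pass),
    );

    // ...and a single pass can also fan out across modules.
    modules.par_iter().for_each(lint_pass);
}
```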


#4

Another way incremental performance can be improved is to split compiler passes into queries operating on modules, so that we only run a pass when its module changes instead of running it over the entire crate. This is useful for cheap passes where having a query per item would have too much overhead. I did try to introduce this for some passes, but ran into some issues; this probably needs some attention from @michaelwoerister. I also have a local branch which applies this to a good number of passes, some of them in a very questionable way. This approach is very applicable to lints in particular, since they are usually cheap and local.

Doing this would also allow running the passes on the modules in parallel, without per-item overhead. I think this approach is probably one of the easiest ways to improve performance.
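
A minimal sketch of the per-module idea, assuming a fingerprint-based change check (the names and the hashing scheme here are illustrative; rustc’s real dependency tracking is more involved):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Cache of per-module results keyed by a fingerprint of the module's
// contents; the pass is skipped entirely when the fingerprint matches.
struct ModulePassCache {
    last_run: HashMap<String, u64>, // module name -> fingerprint
}

impl ModulePassCache {
    fn run_lints(&mut self, module: &str, contents: &str) {
        let mut hasher = DefaultHasher::new();
        contents.hash(&mut hasher);
        let fingerprint = hasher.finish();

        if self.last_run.get(module) == Some(&fingerprint) {
            return; // module unchanged since last run: nothing to do
        }
        // ... actually run the (cheap, local) lints over `contents` ...
        self.last_run.insert(module.to_string(), fingerprint);
    }
}
```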


#5

Is it possible to have a distributed compiler, like distcc? Rather than just parallelizing on one machine, this would be much better and could be considered the next step after parallelizing. (I’m dreaming of using 10 Raspberry Pis for a speed-up, or even 100 Amazon/GCloud/Azure VMs…)

What shall we call it? drustc or distrustc? Or perhaps we can simply make rustc compatible with distcc rather than reinvent the wheel?


#6

distrust

n. Lack of trust or confidence.

:rofl:


#7

Looks like a good name for a blockchain currency… Maybe we can create one where you gain credit by sharing your host to serve compile requests?


#8

It is possible, yes. We’ve talked about it in rust-lang/rust#47518.

Another related idea is “pre-compiling” crates.io crates for a certain set of architectures and so forth.


#9

How much of the generated MIR code can be reused if one downloads a crate? And are there possible security issues with trusting that the compiled MIR code correctly corresponds with the source code?


#10

There are a lot of question marks here. In particular, the generated code depends very much on the precise versions of all the dependencies and so forth.

I’m not aware of any security holes but I wouldn’t be surprised if I’m overlooking things. (Modulo the obvious thing of needing to trust the builder.)

Personally, I think I lean towards improving the compiler vs. caching results, but I’m not sure. And it’s at least plausible to do some of both. (Or maybe make it easy for projects to set up their own caching…?)


#11

Can you JIT the type system?


#12

My long-term plan for Chalk is to compile trait matching (and possibly other operations which we lower to logical predicates) to a byte-code and interpret it. This is pretty close to a JIT. I probably wouldn’t actually want to JIT the code itself, though; hey, never say never.
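
For a sense of the shape of such a design, here is a toy sketch of lowering a predicate to a small instruction set and interpreting it. The opcodes are invented for illustration and bear no relation to Chalk’s actual design:

```rust
use std::collections::HashSet;

// A toy byte-code for one lowered predicate. `ProveFact` succeeds if the
// fact is already known; `RequireFact` fails the program if it is not.
enum Op {
    ProveFact(&'static str),
    RequireFact(&'static str),
    Succeed,
    Fail,
}

// Straight-line interpreter; a real solver would also need backtracking,
// unification, and a goal stack on top of this loop. The point is just
// that a flat, pre-lowered program can be executed without re-walking
// the trait environment each time.
fn solve(program: &[Op], facts: &HashSet<&str>) -> bool {
    for op in program {
        match op {
            Op::ProveFact(f) => {
                if facts.contains(f) {
                    return true;
                }
            }
            Op::RequireFact(f) => {
                if !facts.contains(f) {
                    return false;
                }
            }
            Op::Succeed => return true,
            Op::Fail => return false,
        }
    }
    false
}
```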


#13
<(Zero, One) as Plus<(Zero, One)>>::Result == (One, Zero)