Rust Compiler Performance Working Group


#1

Today, I am pleased to announce the new Rust Compiler Performance Working Group!

The purpose of this working group is threefold:

  • MEASURE The group will keep watch over how fast the Rust compiler is in the most common usage scenarios. This includes defining which scenarios those are and then, for each of them, defining a benchmark that allows us to measure how well we are doing. Eventually, we will have a compiler performance dashboard (based on perf.rlo) that (hopefully) will show compilation time graph lines steeply plummeting towards zero as Rust version numbers increase.
  • COORDINATE The group will keep track of the compiler’s implementation, as far as its performance is concerned, and the manifold opportunities and ideas for making it faster. To this end, as the group’s lead, I will curate and keep up-to-date the Compiler Performance Tracking Issue that serves as a landing page for all things compiler performance and will make discoverable the many things people do in order to make the Rust compiler faster and more efficient.
  • COMMUNICATE The group acts as a communication channel for its members, and to its members. There is the @rust-lang/WG-compiler-performance GitHub team for bringing relevant GitHub issues to the group’s attention. And there is the WG-compiler-performance Gitter channel that members of the group and interested individuals are invited to join.

The working group consists of what I think is a healthy mix of old hands who know the ins and outs of the Rust compiler; of more recent contributors who have nonetheless already left their mark on our compile-time graphs; of people who have experience building large and complicated Rust projects; and of people who know about profiling code and measuring performance in general. If you think you can contribute and are interested in joining, let me know in the comments below or drop me a line on the Gitter channel!

Let’s make this a great year for compiling Rust programs! :)


#2

Compilation times have been one of my biggest pet peeves with rust. This is fantastic!!!


#3

It might be interesting to research how to make Rust friendlier for distributed compiles. I was working with the vendors of a distributed compilation tool on this, and they commented that rustc was looking inherently unfriendly for distribution compared to C++.

There are a couple of reasons for this.

First it is hard to discover the minimal set of .rs files (and potentially others via include_str!) that will be needed by any given compile (and thus sent to remote workers).

Second, unlike C++, which can distribute compiles of several individual files to hundreds of workers and then link them all together on the host machine, Rust appears to invoke one copy of rustc per crate, which includes linking. This puts on the distribution tool not only the burden of sending input files to workers, but also the burden of discovering and transmitting all the input files needed by the linker itself, which is completely unnecessary for distributed C++ compiles.

Because of this, rather than amortizing away the cost of compiling individual source files, Rust can only amortize away the cost of compiling individual crates. In a small project, this might be <50 crates, so the benefits are hard to realize…
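To put rough numbers on the amortization point, here is a toy model (my own illustration, not a measurement) under the simplifying assumption of equally sized, independent translation units; `best_case_speedup` is just a made-up name:

```python
# Toy model: with W workers and N independent, equally sized translation
# units, the best-case speedup from distributing compilation is min(N, W).
# Dependency edges and uneven unit sizes only make the real number worse.
def best_case_speedup(translation_units, workers):
    return min(translation_units, workers)

# A C++ project split into 2000 .cpp files can keep hundreds of workers busy;
# a Rust project with 50 crates cannot use more than 50 of them.
print(best_case_speedup(2000, 400))  # → 400
print(best_case_speedup(50, 400))    # → 50
```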

Distribution makes C++'s compile times a solved problem for my company, so it would be awesome if the Rust toolchain could become friendlier for this solution.


#4

Regarding distributed compilation, have you taken a look at https://github.com/rust-lang/rust/issues/47518 ? It wouldn’t be much of a stretch to go from multiprocess to multi-machine.


#5

Also see this modified approach, regarding distributed builds: https://github.com/rust-lang/rust/issues/47518#issuecomment-359145672

In general, supporting distributed builds in Rust will be a challenge. Depending on your use case though, something like sccache might fit the bill.


#6

I was working with the vendors of a distributed compilation tool

Could you say which vendor it was?

This is by design.

C++'s .cpp files are translation units.

Rust’s .rs files are not translation units. The Rust translation unit is the crate.

First it is hard to discover the minimal set of .rs files (and potentially others via include_str!) that will be needed by any given compile (and thus sent to remote workers).

Because the Rust translation unit is the crate:

  1. trying to find the minimal set of .rs files is an operation that is not necessary for building Rust source code,
  2. the minimal set of .rs files is not merely hard to find: it is impossible to find, at least until build.rs has been compiled and executed.

Second, unlike C++ which can distribute compiles of several individual files to hundreds of workers and then link them all together on the host machine, rust appears to invoke one copy of rustc per crate which includes linking.

Because Rust’s translation unit is the crate… one copy of rustc is invoked per crate. This might sound a bit like “duh”, but that’s what a translation unit basically means.


So which vendor of distributed compilation systems tries to develop one for a programming language without bothering to learn what the translation unit for that language actually is? That’s the real issue, and it is an easy one to fix.


#7

…wasn’t the point that, while all of this behavior is “by design,” it’s that design itself that makes it harder to distribute Rust builds? Learning what a Rust TU is (which I suspect they already know) doesn’t change that at all.


#8

…wasn’t the point that, while all of this behavior is “by design,” it’s that design itself that makes it harder to distribute Rust builds?

This design makes it actually way easier to do distributed builds. We give you a tool to compile a TU, and the TU dependency graph (which, btw, C++ does not give you).

The only problem here is that they were trying to compile .rs files independently from each other, which does not make any sense.

It’s as if I tried to make a C++ distributed build tool that “compiles” .cpp files independently from the .h files they include, and then complained that I can’t compile a .cpp file without reading its headers. You would tell me: “yeah, that’s by design, that’s not how it’s supposed to be done, don’t try to do that, it can never work”.

This is the exact same thing. I can imagine a new Rust developer with a C++ background thinking: “if a .cpp file is a C++ translation unit, maybe a .rs file is a Rust translation unit?”. That’s a fair question for a beginner to ask, but I definitely expect it to be the first question somebody working on a distributed build system for Rust would ask and clarify before proceeding to actually build anything.


#9

I mean, this is what incremental compilation does. So it makes some sense.


#10

You’re missing the point. Rust TUs are much larger and there are far fewer of them. This makes distributed builds less useful: the “hundreds of workers” @olson-dan mentioned now have nothing to do.

Besides, the compiler already parallelizes within crates even without incremental compilation, as @steveklabnik mentioned! It’s not unreasonable to want that to work across machines. There has even been mention of doing exactly this.


#11

What you mention is subtly different: you are talking about doing this within rustc. I agree that it makes sense to do this, and many other things, within rustc (there was a thread about doing distributed codegen somewhere).

But as an external vendor trying to build a distributed compilation system for Rust? Sure, they could write their own Rust compiler that does this, but good luck with that.


#12

C++ has exactly the same problem, and the solution is exactly the same in both C++ and Rust. What do you do in C++ when a single translation unit significantly dominates your build time?

You split it into smaller translation units.

EDIT: IMO doing distributed builds, codegen, and whatnot is all nice to have. I am not against any of it. But using distributed builds for a project with 50 translation units is not something anybody would suggest is a good idea in C++, and I don’t think it would be a particularly good idea for a Rust project.


#13

This is absurd. Someone brings up real obstacles to doing usefully-distributed builds of Rust code and you start nitpicking what a TU is.

The first post you responded to said “rather than amortizing away the cost of compiling individual source files, rust can only amortize away the cost of compiling individual crates.” They clearly know what a Rust TU is and are talking about the effects of that design.

There’s no reason to assume they’re just confused when we already see ways we could improve the situation beyond “oh just split up your crates into more crates.”


#15

My other response to this was incorrect. I thought I had read somewhere that the “module” was the TU for Rust, but that is not the case.

However, I think you’re missing the point of the person who was explaining the TU. I think the confusion comes from the C/C++ world seeing the “Crate” as equivalent to “Overall Project” or “Large Monolithic Library”, whereas the intent of crates is to be used for very small things as well, building up larger crates by using/importing other smaller crates. If this philosophy is followed, then the TUs of Rust are rather small, and parallelizing (even remotely) becomes much more effective.

So, think of it this way: if someone created some HUGE .cpp file with tons of headers that contained classes and implementations and everything, that would be one TU, and it wouldn’t parallelize well. You’d break it up into .cpp files with one (or perhaps a couple of related) classes per file. So, in C++, if you have too big a TU, you break it up.

In Rust, if you have a Crate that is too huge (too much in it). You break it into smaller, dependent crates thereby creating smaller TUs. Cargo produces the TU dependency graph easily and each TU can be compiled separately (even remotely).
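To illustrate the dependency-graph point, here is a minimal sketch (my own hypothetical `schedule_waves` helper, not an existing Cargo feature) of turning a crate dependency map, such as one derived from `cargo metadata`, into waves of crates that could be compiled in parallel, possibly on remote workers:

```python
# Sketch: compute the "waves" of crates that can be compiled in parallel.
# `deps` maps each crate name to the set of crates it depends on.
def schedule_waves(deps):
    remaining = {crate: set(d) for crate, d in deps.items()}
    waves = []
    while remaining:
        # Crates whose dependencies are all compiled can run concurrently.
        ready = sorted(c for c, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle")
        waves.append(ready)
        for c in ready:
            del remaining[c]
        for d in remaining.values():
            d.difference_update(ready)
    return waves

# Example: `app` depends on `core` and `utils`; `utils` depends on `core`.
print(schedule_waves({
    "app": {"core", "utils"},
    "utils": {"core"},
    "core": set(),
}))
# → [['core'], ['utils'], ['app']]
```

The more, smaller crates a project has, the wider each wave gets, which is exactly the property a distributed build wants.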

I think that is the point the person was trying to make. They weren’t trying to nitpick on what a TU is. They were trying to explain that a TU in Rust is not a single .rs file, just like in C++ a TU is not just a .cpp (it’s the .cpp and all its .h files).

In both cases, if an individual TU is too big, you break it up.


#16

Even “smaller, dependent crates” are larger than single source files. If Rust projects tended to have as many crates as C++ projects have source files, we wouldn’t be having this conversation to begin with.

The same reasoning applies to incremental compilation. If crates were as small as C++ source files, Cargo’s existing logic around rebuilding crates would be enough.


#17

I’m going to go out on a limb here and assert that somewhere between one file (plus its headers, which are generally numerous and nested) and one crate (which consists of numerous files) there is pretty much an equivalence. Perhaps, on average, Rust crates tend to be larger than individual *.cpp files and their dependent .h files, but that can often be attributed to a failure to appropriately subdivide the problem, just as with *.cpp: if someone creates a .cpp file with too much in it, they’ve created a monster TU. A monster TU is a monster TU, and there is an effective method in both cases for making a TU smaller.

Now, this doesn’t mean it should be the equivalent of one class per crate (that would likely be ridiculous), but it does mean a crate should consist only of closely interoperating structs/impls/macros (aka classes). If that is the case, it is unlikely that parallelizing further, down to the equivalent of individual classes, would buy much in terms of parallel compilation performance; due to linking, it would likely even be a net negative.


#18

The compiler can split up a single crate into what we call “codegen units” (and does so by default these days), each of which is then translated separately into an object file. Crates, on the other hand, have a semantic meaning in Rust because they constitute boundaries that make a difference for trait coherence. There’s some interaction between the two concepts that restricts what we can do, but since codegen units are considered an implementation detail, we also have some flexibility in what goes where.

I’d personally like to see us supporting distributed builds in the future. We’ll probably need to get creative to also support them efficiently and in a scalable way though. I’m looking forward to that :)


#19

Oh, so, was I correct in thinking that the true TU is the module? Or is it even smaller than that?


#20

The vendor I was working with maintains a distributed build tool for C++ that is provided privately to their licensees. I was looking for ways to make Rust compiles use this tool and was working with their support on the project, but as we observed how Rust builds, it didn’t seem like it could go anywhere.

I don’t fault any of Rust’s choices at this point… they all seem like generally good choices, but they have the side effect of making distribution both harder to perform and less effective than C++.

Distribution performed in a different way than a C++ tool would is an interesting concept but it would be cool to be able to leverage the existing tooling in this area. I already run into resistance deploying Rust at all and I imagine there would be similar resistance if I had to deploy rust_compile_server to every dev machine or something.

Mostly I’m hoping people are just thinking about this. Some of the issues linked show that it is on the radar, but not necessarily in the way I was hoping. This is fine, but, you know, something like “provide a fast/easy way to enumerate the input files (source and otherwise) needed by a crate” would already be helpful for working with existing distribution tooling.
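For what it’s worth, rustc can already emit a Makefile-style dependency file via `--emit=dep-info`. Here is a sketch of how a distribution tool might consume one; the `parse_dep_info` helper and the sample paths are made up for illustration:

```python
# Sketch: parse the Makefile-style .d file that `rustc --emit=dep-info`
# writes, to enumerate the files a crate compilation actually read.
# (Real .d files also contain rules with empty prerequisite lists for each
# source file; those are skipped below.)
def parse_dep_info(text):
    inputs = set()
    # Join lines continued with a trailing backslash.
    text = text.replace("\\\n", " ")
    for line in text.splitlines():
        target, sep, prereqs = line.partition(":")
        if sep and prereqs.strip():
            inputs.update(prereqs.split())
    return sorted(inputs)

# Hypothetical sample content, including an include_str! data file.
sample = "target/debug/mycrate: src/main.rs src/util.rs data/config.txt\n"
print(parse_dep_info(sample))
# → ['data/config.txt', 'src/main.rs', 'src/util.rs']
```

One caveat: as discussed above, this list is only known after build.rs has run and macros have expanded, so a tool would have to obtain it from a prior or partial compilation rather than statically.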


#21

when we already see ways we could improve the situation beyond “oh just split up your crates into more crates.”

The tracking issue about compiling a single crate with multiple rustc processes is not as positive as you are, although @michaelwoerister sounds very positive here. If I understand the issue correctly (and michael, please correct me): everything inside a crate can depend on everything else, so every process needs to have the whole crate in memory (which increases memory consumption linearly), a lot of frontend work must be done twice (parsing, macro expansion, …), and extra work must be done to join the results. What can be done in parallel is LLVM codegen, which Rust already does.

Of all the approaches to speeding up compile times, compiling a crate using multiple rustc processes simultaneously is the one that looks least promising to me (a lot of work for unknown wins).


If Rust projects tended to have as many crates as C++ projects have source files, we wouldn’t be having this conversation to begin with.

This is a cultural issue.

FWIW my Rust programs compile way quicker than my equivalent C++ programs, maybe because, just as in C++, I structure my code in such a way that it can be compiled in isolation (using crates in the 1-5kLOC range).

I really don’t know what the people writing 30kLOC crates (rustc, I am looking at you) are actually expecting. They write code that must be compiled serially, and then complain that it is not compiled in parallel, suffering huge compile times in the meantime. Maybe one day the magical FTL compiler will arrive and code that must be compiled serially will be compiled in parallel.