Rust Compiler Performance Working Group

For regular compilation the compiler defaults to splitting each crate into as many CGUs as there are hardware threads. For incremental compilation we have two CGUs per module, one for the generic code and one for the non-generic code. However, any of this can change at any time. Especially in the incremental case I suspect that there is lots of room for improvement.

It depends on what you mean by "the whole crate". The AST, yes, but everything else is computed on demand, which means that only some of the type information would be duplicated.

In general it would be similar to how C/C++ duplicates work done for header files.

To be clear though: that GH issue is a description of one possible approach. It's the beginning of a discussion, not something that we plan to execute in exactly that form.


I think you are looking for cargo metadata (JSON output) and cargo tree (pretty-print) here.

If you find that cargo metadata is not enough for your needs, the best idea would be to file a Cargo bug, or ask for help in the Cargo IRC channel.
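For anyone who wants to consume that output from a tool, here is a minimal sketch in Rust, assuming `cargo` is on the PATH. The helper names `cargo_metadata_command` and `cargo_metadata_json` are hypothetical; the only real interface used is `cargo metadata --format-version 1`, which prints a JSON document on stdout.

```rust
use std::process::Command;

/// Builds (but does not run) the `cargo metadata` invocation.
/// Hypothetical helper, named for illustration only.
fn cargo_metadata_command() -> Command {
    let mut cmd = Command::new("cargo");
    // `--format-version 1` pins the JSON schema version of the output.
    cmd.args(["metadata", "--format-version", "1"]);
    cmd
}

/// Runs the command in the current directory and returns the raw JSON
/// text from stdout. Parsing is left to whatever JSON library you prefer.
fn cargo_metadata_json() -> std::io::Result<String> {
    let output = cargo_metadata_command().output()?;
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}
```

Calling `cargo_metadata_json()` inside a package directory should give you the dependency graph as one JSON string, which is the machine-readable counterpart of what cargo tree pretty-prints.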

This is missing a critical point: you can't have circular dependencies between Rust crates. This can make splitting up a crate very, very hard.

Servo has a huge crate that takes minutes to build no matter how much money we throw at CPUs (current rustc has limited parallelism). The open bug about splitting it up just turned 4 years old. This task is now proposed as a Google Summer of Code project.

Rust's crates and modules system has many advantages, but there's no use denying that it is a downside when it comes to parallel and distributed compilation.


First off, I want to say thanks for starting this thread. I think compile times are one of the biggest practical issues facing large scale deployments of Rust. I also think that Rust has the potential to compile faster than C++. Rust doesn't have nasty lumps of textual header includes, and it only has to typecheck its generic classes one time no matter how many specializations it has.

In the "small dependencies" thread a few months ago, I made a post that inexpertly covered the same ground that's being covered here. I can't add much to the technical discussion, but as someone who regularly has to compile extremely large systems on a cluster, I can explain my needs as a user.

First and foremost, relying on the developer to have the good judgment (or "culture") to create small translation units is a failure, in the same sense that relying on the user to manually allocate and deallocate memory correctly is a failure. The system should just make the creation of small and efficient TUs the easiest and most obvious thing to do.

The crate is both a unit of compilation and a unit of distribution, and these two are in tension with each other. Whatever happens to make compilation better, I hope that cleanly separating compilation from distribution is one of those things.

Besides, even if you expect your devs to have perfect judgment (which you shouldn't), you can't expect them to have perfect foresight. Sometimes crates will unpredictably grow in size beyond their original purpose, and it's difficult to break them apart after the fact. (I give a real world "user scenario" of this kind of problem in the linked post.)

One other thing I want to emphasize is that cold build times are at least as important as incremental. Cold builds are just a daily fact of life, and tackling the hard problem of making cold builds fast also tends to make incremental faster.

Finally, if there's any further service I can do in this effort, I am happy to help. Perhaps I could help get Rust compiling well with Bazel. I've given the problem some thought already, and I think the impedance mismatch between how Cargo does it and how Bazel does it makes that problem difficult.


@SimonSapin in C and C++ you can have circular dependencies between translation units, but this does not imply that the whole language can be used in the circular dependent parts of the translation units. In fact, if the interface of two dependent TUs depend on each other, there is actually very little that you can do there: e.g. types can only be used behind pointers, templates/inline functions and member functions cannot easily have circular dependencies because they must be implemented in headers, etc. In many cases (not all) circular dependencies in C and C++ are at least problematic and often require workarounds.

Rust has ways of expressing similar things. It might be easier or harder to do than in C or C++ (depending on the case), but I think it is worth pointing out that splitting code into TUs in C and C++ is not always easy either. Books have been written about how to build large C and C++ applications such that their builds do scale, idioms have been discovered (PIMPL wasn't a thing 20 years ago), etc.
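As one illustration of "similar things" in Rust, here is a minimal sketch of breaking a would-be cycle by moving the shared interface into a lower layer and using it behind a pointer (`&mut dyn Trait`). Each `mod` stands in for a separate crate, and every name (`interface`, `engine`, `app`, `Logger`, `MemoryLogger`) is hypothetical; this is one possible arrangement, not a recommendation from the compiler team.

```rust
// Each `mod` below stands in for a separate crate. All names are hypothetical.

/// Lowest layer: holds only the interface both sides agree on.
mod interface {
    pub trait Logger {
        fn log(&mut self, message: &str);
    }
}

/// Depends only on `interface`, never on `app`, so there is no cycle.
mod engine {
    use super::interface::Logger;

    /// Uses the logger behind a pointer, without knowing its concrete type.
    pub fn run(steps: u32, logger: &mut dyn Logger) -> u32 {
        for step in 0..steps {
            logger.log(&format!("step {}", step));
        }
        steps
    }
}

/// Top layer: depends on both `interface` and `engine`.
mod app {
    use super::interface::Logger;

    #[derive(Default)]
    pub struct MemoryLogger {
        pub lines: Vec<String>,
    }

    impl Logger for MemoryLogger {
        fn log(&mut self, message: &str) {
            self.lines.push(message.to_owned());
        }
    }

    /// Wires the concrete logger into the engine and returns what was logged.
    pub fn run_with_memory_logger(steps: u32) -> Vec<String> {
        let mut logger = MemoryLogger::default();
        super::engine::run(steps, &mut logger);
        logger.lines
    }
}
```

With this layout `engine` and `app` can be built as separate crates, and a change to `MemoryLogger` does not force `engine` to be recompiled. The cost is the dynamic dispatch on `log`, much like the indirection PIMPL introduces in C++.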

IMO the main difference is cultural. The C and C++ communities actively dealt with the problem instead of living with the hopes that the compilers would become infinitely better some day. Maybe because the C and C++ communities are not really tightly connected to their compiler developers.

The Rust community is tightly connected to the Rust compiler developers, and I think this is for the better, but if the compiler developers say that "incremental compilation/parallel code gen/distributed builds/..." will fix all of Rust's compile-time problems then people just say "Rust compiles slowly because it is a young language" and live with it instead of actively exploring how the language can be used to improve compilation times.

I think it would be more helpful to use this close relationship with compiler developers to come up with strategies/guidelines that make code compile fast today. As the compiler gets better over time, some of these strategies might become deprecated, or who knows, some might even get language support. Otherwise it gives the impression that either you can live with long compile-times, or you can't use Rust today - maybe in N years, maybe never, who knows. But this impression is wrong: there are many, many things one can do to improve Rust's compile-times today, and if somebody has a compile-time problem, we should have a resource that we can point them to that says: these are all the things you can do to make your code compile fast.
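To give one concrete example of the kind of strategy meant here, below is a minimal sketch of the "non-generic inner function" pattern, which keeps the generic part of a function as thin as possible so that less code is instantiated per concrete type. The function `normalized_extension` is hypothetical and chosen only for illustration; how much this helps will depend on the codebase.

```rust
use std::path::Path;

/// Generic, caller-friendly entry point. Only this thin shim is
/// instantiated once per concrete `P` (e.g. `&str`, `String`, `PathBuf`).
pub fn normalized_extension<P: AsRef<Path>>(path: P) -> Option<String> {
    // The body that does the real work is not generic, so it is
    // compiled once no matter how many types `P` is instantiated with.
    fn inner(path: &Path) -> Option<String> {
        let ext = path.extension()?.to_str()?;
        Some(ext.to_ascii_lowercase())
    }
    inner(path.as_ref())
}
```

Callers still get the convenient generic signature, for instance `normalized_extension("photo.JPG")` or `normalized_extension(some_path_buf)`, while the bulk of the code lives in `inner`.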

@masonk

The system should just make the creation of small and efficient TUs the easiest and most obvious thing to do.

I agree. But this is a hard problem. It is way easier to provide "system" support for it if the users have already discovered the idioms / patterns that work for this. This is basically what happened with C++ modules. They are just a way of enforcing what users of really large builds were already doing (linking TUs hierarchically instead of linearly at the end, using compilation firewalls, using PCHs, not using macros in ways that break PCHs, etc.).

The crate is both a unit of compilation and a unit of distribution

I think this is a crates.io issue. There are many crates that are split into sub-crates, where these sub-crates (or some of them at least) are not intended to be used by anybody in isolation. However, crates.io forces you to publish each of these sub-crates as a standalone crate. I'd like to be able to publish a workspace as a single crates.io "crate", and have it include all the crates in my workspace, without "publishing" those independently. Obviously these sub-crates would still be uploaded to crates.io, but maybe they could be "hidden" or somehow marked as not intended for distribution, or...


I'm new to Rust so please excuse my ignorance if this already exists.

For the issue you described with publishing all crates it might be worth looking at how Go handles internal packages. https://golang.org/s/go14internal

Mainly as a mechanism to indicate the package is not intended for external use. From what you've described, designating a crate as internal-use-only seems very useful.


That's a great idea. It's of course very reasonable to expect large builds to receive some manual interventions and best practices. I would be delighted to read this non-existent doc. In particular I would like the doc to explain the problem from the point of view of the compiler (what it needs to accomplish) as well as the solution (how we as developers can arrange our code to make the compiler's task easier).

That's an excellent idea indeed. There are many compiler settings already today that allow one to make a trade-off between runtime performance and compilation times. We should have a chapter about structuring your codebase with compilation times in mind and what compiler settings there are (cc @steveklabnik).

This should be a separate work, rather than a chapter of the book. I think it'd be great though!

This echoes the call for more intermediate (or advanced) level Rust guides/tips from the 2018 roadmap.

  • Reduce compilation time
  • Reduce binary size
  • Improve runtime
  • How to structure a large codebase
  • etc

Exactly!

Any suggestions of where something like this could be hosted?

https://doc.rust-lang.org/#the-rust-bookshelf

that is, we just add more books. A "tuning performance of rust code" book would be awesome. The process would basically be:

  • start an mdbook somewhere
  • ping me when it's good enough to put in-tree, and we can make it so.

The performance of the last Nightly is awful:

I replied to your original post. I'd be interested in profiling your use case.

I'd be interested in contributing, but am unsure where to start. I don't have any experience with profiling code and would like to learn how to do so.

Does the @rust-lang/WG-compiler-performance team still exist? GitHub doesn't highlight this name, and the link in the OP goes 404. I also cannot find a team list on GitHub…

Yes, and they've moved from Gitter to Discord. There's a #wg-compiler-performance channel. I don't know if there's a repo on GitHub. I think not.

I learned that one can only ping GitHub teams of an org if one is in some team of the same org. So strange…
