I currently use Rust for almost everything, including software development and data analysis, and it works pretty well. But Cargo builds the dependencies separately for every project, which is annoying because:
- It slows down the build. Even a small project may take a long time to build, even with full parallelization.
- For my use case, I write a lot of data analysis scripts and ad hoc projects in Rust, and they blow up my disk space really quickly.
- When I need to clean the build output or make a release build, Cargo starts from scratch again and again.
So I think it would be really useful to have a system that shares identical crate builds and speeds up the build.
ccache is a good example of a similar system for other programming languages: it checks whether two builds are identical by examining a hash of the source code and a hash of the compiler parameters.
The basic assumption behind this idea is that developers use a certain set of crates, with similar configurations and features, more often than others. It’s true that the Rust build system allows the root crate to pass configuration and feature settings down to its dependencies, which may make the dependency binaries differ. But an infrastructure that caches the most commonly used <crate, version, cfg, feature…> combinations would still speed up the build.
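To make the <crate, version, cfg, feature…> tuple concrete, here is a minimal sketch of a cache key. The `CrateKey` type and its fields are hypothetical and are not Cargo’s actual metadata format; the sketch only illustrates that two builds requesting the same features in a different order should map to the same cache entry.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeSet;
use std::hash::{Hash, Hasher};

/// Hypothetical cache key (an assumption for illustration, not Cargo's real format).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct CrateKey {
    name: String,
    version: String,
    // BTreeSet keeps entries sorted, so the order in which features
    // or cfgs were requested does not change the key.
    features: BTreeSet<String>,
    cfgs: BTreeSet<String>,
}

impl CrateKey {
    fn new(name: &str, version: &str, features: &[&str], cfgs: &[&str]) -> Self {
        CrateKey {
            name: name.to_string(),
            version: version.to_string(),
            features: features.iter().map(|f| f.to_string()).collect(),
            cfgs: cfgs.iter().map(|c| c.to_string()).collect(),
        }
    }

    /// Collapse the key into a single number usable as a cache file name.
    /// DefaultHasher is not guaranteed to be stable across Rust releases,
    /// so a real cache would need a fixed, documented hash function.
    fn fingerprint(&self) -> u64 {
        let mut hasher = DefaultHasher::new();
        self.hash(&mut hasher);
        hasher.finish()
    }
}
```

A real key would also have to cover the compiler version, the profile, and the keys of all transitive dependencies, which is what the per-unit metadata discussed later in this post already appears to do.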
I did some crude statistics on the crates on crates.io. It seems that 1) some crates are used far more commonly than others, and 2) each commonly used crate has commonly used feature sets. For example:
472 "*","features"
480 "^0.2.21","features"
506 "^0.1","features" [""]
711 "^0.1","features"
1004 "*","features" [""]
8460 "^0.2","features"

309 ">= 0.3.0","features" [""]
356 "^1.0","features" ["derive"]
368 "^1.0.0","features"
1039 "^1.0.2","features"
1538 "^0.9","features"
1846 "^1","features"
1920 "^0.8","features"
8952 "^1.0","features"
Both libraries could use the cache efficiently.
Distinguishing different binaries
Cargo is already able to hash different binaries. The metadata for each unit seems to fit the needs of a binary cache perfectly: it mixes in the compilation parameters and the hashes of all dependencies. With the current Cargo infrastructure, it doesn’t seem like too much work to add a cache layer between the compiler and Cargo.
Some crates have a build.rs script, but the system will not cache any build script, and the metadata also seems to reflect changes in build script output.
One challenging issue is that the cache needs to use a limited amount of disk space. It’s generally bad to put everything in the cache when most of it won’t be used again. I suppose an LRU cache with a limited size would fit this compiler cache perfectly, because the most commonly used crate + cfg combinations will always stay fresh.
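As a rough illustration of the size-limited LRU idea, the sketch below tracks artifacts by a numeric key and their size in bytes, and evicts the least recently used entries once the byte budget is exceeded. The `LruCache` type and its method names are hypothetical; the prototype mentioned below has no cache management at all, so none of this reflects existing code.

```rust
use std::collections::VecDeque;

/// Hypothetical size-bounded LRU index over cached artifacts.
struct LruCache {
    capacity_bytes: u64,
    used_bytes: u64,
    // (key, size in bytes); front = least recently used, back = most recent.
    entries: VecDeque<(u64, u64)>,
}

impl LruCache {
    fn new(capacity_bytes: u64) -> Self {
        LruCache { capacity_bytes, used_bytes: 0, entries: VecDeque::new() }
    }

    fn contains(&self, key: u64) -> bool {
        self.entries.iter().any(|&(k, _)| k == key)
    }

    /// Record a cache hit: move the entry to the most-recent position.
    /// Returns false if the key is not cached.
    fn touch(&mut self, key: u64) -> bool {
        let pos = self.entries.iter().position(|&(k, _)| k == key);
        match pos.and_then(|p| self.entries.remove(p)) {
            Some(entry) => {
                self.entries.push_back(entry);
                true
            }
            None => false,
        }
    }

    /// Add (or replace) an artifact and return the keys evicted to stay
    /// within the byte budget. An artifact larger than the whole budget
    /// ends up evicting itself.
    fn insert(&mut self, key: u64, size_bytes: u64) -> Vec<u64> {
        let pos = self.entries.iter().position(|&(k, _)| k == key);
        if let Some((_, old_size)) = pos.and_then(|p| self.entries.remove(p)) {
            self.used_bytes -= old_size;
        }
        self.entries.push_back((key, size_bytes));
        self.used_bytes += size_bytes;

        let mut evicted = Vec::new();
        while self.used_bytes > self.capacity_bytes {
            match self.entries.pop_front() {
                Some((old_key, old_size)) => {
                    self.used_bytes -= old_size;
                    evicted.push(old_key);
                }
                None => break,
            }
        }
        evicted
    }
}
```

The linear scans keep the sketch short; a real implementation would pair a hash map with the queue and would have to persist access times on disk, since separate Cargo invocations do not share memory.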
I have implemented a prototype without any cache management features, and it seems to work pretty well for me. It speeds up the build a lot once I have compiled a few different crates.
I am not sure whether there are any further issues with this change. I basically want to write a pre-RFC, but I don’t know whether there are any guidelines for writing RFCs.
Any thoughts about this idea?
[update] I’ve opened this PR to Cargo.
It seems I am not the only one who really wants this. Both the PR and this thread have mentioned RUSTC_WRAPPER. My understanding is that it can avoid recompiling the same artifacts, but it still needs to copy the artifacts to the target dir because this is how Cargo works today. (Tell me if I am wrong.)
According to the discussion below, I think it’s reasonable for the first version to only share the dependency binaries, controlled by the environment variable CARGO_SHARED_TARGET_DIR. If this variable is passed to Cargo, Cargo will try to use the shared dependencies (if the binary is already present) or produce them (if no binary is available).
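The use-or-produce decision could look roughly like the sketch below. Only the variable name CARGO_SHARED_TARGET_DIR comes from the proposal; the `Plan` enum, the function names, and the flat directory layout are assumptions made for illustration and do not describe how the PR is actually implemented.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Hypothetical outcome of looking up one dependency artifact.
#[derive(Debug, PartialEq)]
enum Plan {
    /// The shared directory already holds the artifact: reuse it.
    Reuse(PathBuf),
    /// Shared directory is configured but the artifact is missing:
    /// build it and place the output there for later projects.
    BuildInto(PathBuf),
    /// No shared directory configured: behave as Cargo does today.
    BuildLocal,
}

/// Read the proposed variable; an unset or empty value disables sharing.
fn shared_dir_from_env() -> Option<PathBuf> {
    env::var_os("CARGO_SHARED_TARGET_DIR")
        .filter(|value| !value.is_empty())
        .map(PathBuf::from)
}

/// Decide what to do for one artifact. `artifact_name` stands in for a
/// file name that already embeds the unit's metadata hash (assumed layout).
fn plan_dependency(shared_dir: Option<&Path>, artifact_name: &str) -> Plan {
    match shared_dir {
        None => Plan::BuildLocal,
        Some(dir) => {
            let candidate = dir.join(artifact_name);
            if candidate.exists() {
                Plan::Reuse(candidate)
            } else {
                Plan::BuildInto(candidate)
            }
        }
    }
}
```

A caller would combine the two, for example `plan_dependency(shared_dir_from_env().as_deref(), name)`. The sketch ignores concurrent builds writing the same artifact, which a real implementation would need to handle with locking or atomic renames.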
Some people also want cache management/GC, but as @Eh2406 mentioned, it’s hard to design this part so that it works for most people.
If this works, I probably also want to make Cargo track some usage information for each artifact, which may be useful for a separate GC/cache management program, Cargo plugin, etc.