Is a shared build cache a good fit for Cargo?


#1

Does a shared build cache for unrelated projects fit into the scope of Cargo?

Basically, I’d like Cargo to search for and link against precompiled crates in a shared build cache. And by extension, that raises the question: how would such a cache get populated? In my scenario, I’d actually prefer that the project-defined artifacts NOT land in the cache (but rather in a project-local target dir), whereas I suspect most build-cache scenarios would want the final artifacts in the cache too. I could imagine options to tweak where Cargo outputs artifacts, or alternatively a separate cache manager (maybe a Cargo plugin?) that manages a cache based on some cache manifest. I think some of these ideas overlap with use cases of (but aren’t solved by) Cargo workspaces; at the other end of the spectrum, a distributed build cache seems ideal, but it’s beyond the scope of my current needs.

I’m looking to understand how this might fit into or alongside Cargo before I dive deeper into it and possibly contribute to a solution. The rest of my post is just context on my motivation for such a feature.


I’m adding Rust as a supported language to a lambda-like PaaS that focuses on hosting algorithms and ML projects as services (because I think there is some interesting ML work happening in the Rust community). Fundamentally, you start with a (WIP) template, transform input into output with arbitrary code, and publish it to an API. I’ve hit a wall in actually landing Rust support because we’re very sensitive to build times (currently capped at 2 minutes) due to their impact on user experience, especially the first experience.

The build environment, the build command, and some boilerplate interop code are fixed, but otherwise the project is user-managed (including Cargo.toml). There are 3 specific cases where we want to minimize user-perceived build time (in order of priority):

  1. First build of a newly created project (from a template that depends on hyper) should be <10 seconds
  2. After deploying rustc upgrades, users should not experience the long rebuild
  3. A lot of projects will add overlapping deps; it’s not a deal-breaker, but ideally we wouldn’t need to compile commonly added deps more than once

Currently, we keep a compiled copy of the template project and just copy its Cargo.lock and target dir into new projects. Anecdotally, that seems like a viable solution to #1. However, it leaves us recompiling every project as part of deploying rustc upgrades to address #2. Recompiling the same versions of dependency crates again and again is a bit painful. That could be partially mitigated by first recompiling the template project and replacing the target dir, but it sounds like a pretty lousy hack (and solves even less once the template’s Cargo.lock changes). It’s probably good enough to ship minimum viable Rust support, but I’m super hesitant about keeping up with the six-week release train until there is a robust cache story - I think the whole process is greatly improved by a shared build cache.

(Sorry if this seems like a re-post of a topic I recently made in the users forum; originally I was looking for immediate hacks, now I’m trying to get a sense of how this aligns with cargo as I consider contributing to a solution).


#2

I think so, but it’ll be a pretty difficult problem and will probably require some fairly deep Cargo integration. Having this kind of thing is pretty important for people doing lots of large-scale builds, particularly if it’s distributed.

Probably only @alexcrichton can tell you exactly what needs to happen, but at the least the cached artifacts are going to have to take into account various environmental factors that can lead to incompatible code generation.

For example, for every crate revision Cargo builds, it might pair the artifact with a hash covering things like the compiler version, the target spec, defined cfgs, optimization flags, etc. Only when the hash matches can a cached artifact be considered compatible.

When we do go down the path of adding local caching I’d like to at least have a plan for extending it to distributed caching.


#3

I’m interested in working on such a thing. (Whether I get around to it is another question :wink:)


#4

One other thing to consider when designing a binary cache for Cargo is the issue of binary distribution of cargo packages.

As an example, if Rust wanted to distribute a hypothetical “Rust platform” (à la the Haskell platform) that includes a bunch of best-in-class crates, we would want to compile all those crates ahead of time and distribute binary versions. Such a binary distribution would need to be integrated with Cargo so that it understands the binaries we’ve installed correspond to crates.io packages. Depending on the design of the cache, installation of such pre-compiled crates could be a matter of inserting them into the cache.


#5

Cargo’s had requests of this nature from time to time, but unfortunately nothing like this exists today. The closest approximation (as you’ve found) is to share the target directory: as long as Cargo and the compiler haven’t changed, it’ll reuse all of the previous artifacts.

One of the catches to caching Rust crates is that without a stable ABI, each build must link against exactly the same versions of its upstream dependencies. That is, it’s impossible to provide a canonical artifact for a crate foo v0.0.1 unless it has zero dependencies. Those dependencies can be crates like libstd or any upstream crates.io crate, and if anything about them changes you need to recompile locally.

With lock files this can become unwieldy quickly. For example, hyper has 13 dependencies, and projects may update one or more of them over time. Each permutation of hyper and its dependencies would need to be cached separately. This would work OK for your “copy/paste Cargo.lock” scenario, but perhaps less so for compiler upgrades, as projects may have started to diverge in what they’re actually building.

All that’s not to say that caching isn’t possible! Simply that you may not get as much bang for your buck as you’d intended. My suspicion is that it would be possible to decouple this from Cargo, however, in the form of a ccache-like solution: a compiler proxy that manages a cache elsewhere. I also suspect that plugging in a distcc-like solution would work very well for Rust, since any Rust compiler can produce code for any target - you can just ship it the artifacts. Not that this is an easy project, unfortunately :frowning:


#6

I can accept that compiler changes and upstream dependency changes imply recompiling deps. Sharing the target directory really only hits a wall for me because I need the project’s own artifact to NOT reside in the cache location. That creates two problems that Cargo doesn’t currently solve (and it seems that decoupled solutions would require reinventing much of Cargo). To shrink the scope, while staying flexible for future work on more ambitious cache projects, would these two features be reasonable?

  1. Enabling cargo to search additional locations for dependencies, e.g. something like a CARGO_SEARCH_PATH and/or a setting in .cargo/config, but falling back to building and outputting to CARGO_TARGET_DIR if no fingerprint match is found. I assume CARGO_SEARCH_PATH would follow the same layout as the current target dir (i.e. deps/build/.fingerprint).

  2. A way to independently control the output location of dependencies versus the project artifact(s). I’m concerned this requirement may be somewhat unique to my scenario. I tried to work around it with Docker volume-mounting hacks to separate the two, but that falls apart (cross-volume renames, and the single fingerprint location). So even with a Cargo search-path feature, without a way to independently control the output location I’d probably resort to a script that builds, moves deps/build/.fingerprint to a shared location, and then updates the hard-coded paths in every fingerprint file - definitely not elegant.


#7

Yeah, we could bake in some Cargo-specific logic here and there, but I’d be concerned about how complicated that’d be when we’d perhaps get greater benefit from pursuing a more general solution. Cargo could perhaps evolve plugin support enough to enable this externally, but that may also be difficult :frowning:


#8

A Merkle DAG solves this just fine: include the final hashes of the deps when computing the hash of the current crate.


#9

But the target directory already is a cache. Once incremental compilation is implemented, there will be ways to integrate Cargo’s and rustc’s caching, but until then I think just looking at Cargo is perfectly fine.


#10

Hi!

I’m trying out “cargo install xxx” for the first time, and when installing racer and rustfmt I see that many deps - the same packages at the same versions - are being rebuilt, which certainly takes a long time. Is there any easy way to cache built dependencies for cargo install, or for cargo in general?


#11

FYI, @alexcrichton, @glandium and I sat down two weeks ago and hashed out a plan for using sccache as a compiled-crate cache for Cargo. The initial use case will be CI builds, which should immediately benefit Rust, Servo and Firefox builds, but I’ve done some work on sccache recently to make it suitable for local development, so if we get it all working it should be possible to use sccache for local Rust development as well.

We’re also working on making it possible for people doing local Firefox builds to get cache hits from the cache populated by our CI builders, so once we sort that out it’s feasible that we could do the same for Rust projects. It doesn’t seem unreasonable that at the end of that work we could provide a public crate cache usable by anyone doing Rust development, though we might need some additional work to make that work well. The “shared cache for CI builds of a single project” use case is a lot smaller in scope!

I’m going to start working on the sccache implementation for caching Rust compilation in Q1 2017 (with my primary focus being to make sure we don’t regress Firefox build times significantly as we add a giant pile of Rust to the build), and Alex has already enabled sccache for Linux/OS X Rust builds in CI, so we should be able to test there fairly easily. I expect the Servo folks will be pretty interested in testing this as well.