[IDEA] cargo command for build cache within containers

Hello fellow rustaceans!

The TLDR is that there's no way to prefetch dependencies with Cargo, so within Docker images dependencies have to be fetched and built whenever the project source changes.


I've recently run into an inefficiency with Cargo when building Rust packages inside Docker images. The naive process is to start from the official Rust Docker image with all the tooling, add all of the source code with something like COPY . ., and then use RUN cargo install --path . or something similar. Whenever I change my project source, the COPY step and every step after it are invalidated, so on the next build all of the dependencies have to be fetched and compiled again, even if they haven't changed.
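For reference, the naive setup described above might look like the following minimal sketch. The rust:1 tag and the /app working directory are assumptions for illustration, not something the post specifies.

```dockerfile
# Naive build: any source change invalidates everything from COPY onward.
FROM rust:1
WORKDIR /app

# Manifests and source are copied together, so this layer is rebuilt
# whenever any file in the build context changes.
COPY . .

# Dependencies are fetched and compiled in this step on every rebuild.
RUN cargo install --path .
```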

In other languages, like JS for example, this is solved by copying over the project manifest first, installing the dependencies, and then copying over the source. Because of this ordering, changes to the project source don't invalidate the step in which the dependencies are fetched and built, so build time drops dramatically after the first build.
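A rough sketch of that pattern for a Node project, assuming npm with a committed package-lock.json (the node:20 tag and /app path are illustrative choices):

```dockerfile
FROM node:20
WORKDIR /app

# Only the manifest and lockfile are copied here, so this layer and the
# install step below stay cached until the dependency set changes.
COPY package.json package-lock.json ./
RUN npm ci

# Source changes only invalidate the layers from this point on.
COPY . .
```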

The best solution I can foresee is to add a Cargo subcommand like cargo prefetch, whose purpose would be to fetch all of the dependencies defined in Cargo.lock and build them with the release profile by default, without requiring access to anything in the src directory. If this subcommand existed, one could copy over the project config, run cargo prefetch, copy over the rest of the source, and finally run the build command of choice.

My workaround at the moment is basically just copying over all the project config and a src directory with a boilerplate main.rs, running a preliminary release build, then copying over all the actual source and running the build again.
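A minimal sketch of that workaround, assuming a single binary crate with its entry point at src/main.rs, no build script, and no workspace (the image tag and paths are illustrative):

```dockerfile
FROM rust:1
WORKDIR /app

# Copy only the project config; this layer changes only when the
# manifest or lockfile changes.
COPY Cargo.toml Cargo.lock ./

# Stub out the binary target so Cargo accepts the manifest.
RUN mkdir src && echo 'fn main() {}' > src/main.rs

# Preliminary build: compiles every dependency plus the stub binary.
RUN cargo build --release --locked

# Bring in the real source; only the layers below are rebuilt on changes.
COPY src ./src

# Bump the mtime so Cargo does not treat the stub build as up to date
# (a common precaution, since copied files can keep older timestamps),
# then build the real binary on top of the cached dependencies.
RUN touch src/main.rs && cargo build --release --locked
```

Note that any edit to Cargo.toml, even one that doesn't touch dependencies, still invalidates the dependency layer in this layout.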

I have considered the solution of just not developing within Docker images and instead only building within a Docker image when the project's mostly done, but it's just not tenable for me. Many of the projects I work on require complex infrastructure which demands the app be tested inside a compose setup, so testing builds inside of Docker images is a must.

If a cargo command already exists for this please let me know. I looked but might have missed it. Otherwise, I think this could be a useful command to add to cargo, and not one that I think would require groundbreaking work.

Please let me know your thoughts on this. I haven't been working with Rust for long, only about a year or so, so I'd love to learn about any design patterns or policies Rust employs that would rule out such a subcommand. Anyway, thanks for reading!


Does it not work for you to copy Cargo.lock, and then run cargo fetch?

Partially. It fetches the dependencies but doesn't build them. The problem here is that usually building dependencies takes far longer than fetching them, so even if I fetch all the dependencies beforehand it still takes 2 minutes to compile ≈300 crates on my hardware every time the source changes.

Perhaps though the existence of the fetch subcommand means the proposed subcommand should be named something like prebuild or dep_build.

"Build with a dummy src" is the usual workaround. The tricky part is that docker-style caching is only file-level granular, and Cargo.toml contains both information about upstream/dependencies and the local package, so it's not possible to say whether you need to rebuild upstream based only on looking at file mtime.

Additionally, cargo will refuse to do anything if you don't have any src at all, because it doesn't know how to interpret a misconfigured manifest, and a manifest without any local crates/targets is misconfigured. This holds even for commands that could otherwise function, like cargo metadata, or a hypothetical cargo build --only-deps.

If you're going to need to stub out the local build targets anyway in order to satisfy coarse file-based change tracking, there's little benefit to a command which only builds upstream, since the non-stub presence of the local targets' code will be tracked into your caching layer anyway.

The proper solution is a slightly smarter caching layer which can cooperate with Cargo's caching. The way the typical caching action for GitHub Actions CI works is by removing the local targets' build artifacts from the target cache (something which is easily knowable) and submitting what remains for the persistent cache (keyed on the lockfile). This gets correct caching behavior by getting Cargo to do the hard work instead of reimplementing a worse version of it.

Ideally for development, you'd use a different scheme altogether which allows you to take advantage of incremental compilation, both when adding/changing upstream dependencies and for changes to local code.

[aside] nix probably fixes this, somehow [/aside]


Hi, a lot of people have already asked for this (see rust-lang/cargo issue #2644, "cargo build --dependencies-only"). The problem is not straightforward because of how Cargo works. As one example, Cargo.lock alone is not enough to compile dependencies, since you also need to know the correct compiler flags.

I have documented the current state of "fast" Docker builds using Cargo here: "Better support of Docker layer caching in Cargo" (HackMD). TLDR: Currently the best workaround is to use cargo-chef.
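For readers who haven't used it, a cargo-chef setup looks roughly like the sketch below. The stage names, the rust:1 tag, and installing cargo-chef with cargo install are illustrative choices; the prepare and cook subcommands are the tool's own.

```dockerfile
FROM rust:1 AS chef
WORKDIR /app
RUN cargo install cargo-chef --locked

# Planner stage: reduce the project to a recipe describing its dependencies.
FROM chef AS planner
COPY . .
RUN cargo chef prepare --recipe-path recipe.json

# Builder stage: the recipe only changes when the dependency set changes,
# so the cook layer stays cached across ordinary source edits.
FROM chef AS builder
COPY --from=planner /app/recipe.json recipe.json
RUN cargo chef cook --release --recipe-path recipe.json

# Build the real project on top of the cached dependency artifacts.
COPY . .
RUN cargo build --release
```

This is essentially the "build with a dummy src" approach with the stubbing automated.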


Just throwing in a link to a recent series of fasterthanlime articles that gives a glimpse of how you'd do this using Nix: part 9 covers the Nix build methodology, and part 11 has the final Docker build flake. In my limited experience with github:ipetkov/crane, the dependency caching is done at the Nix derivation level, not necessarily using Docker layers. When you change a source file, it still recomputes the dependency list, but each built dependency is cached separately, so you get lots of cache hits.

I imagine the crane library is internally using tricks similar to those already mentioned.