Cargo fingerprinting

Hi!

I'm working on integrating nix into cargo; see libnix cargo-nix-backend for what I have done so far.

My current quest is to support faster builds, but I struggle a lot with .fingerprint and the way nix evaluates builds. I can build a complete Rust project from one nix build command, and I love this very much because nix then does so much of the heavy lifting.

The fingerprint concept in the legacy cargo backend works because it uses the dep-info from the last build, and in nix I don't have such a concept at all.

options

As a result there are a few options:

  1. rework the dep-info file-list scanner so it runs once per project instead of once per unit, as it does now
  2. only for the interactive speedup: split the nix toolchain into many smaller parts and make cargo the scheduler, while keeping the already working toolchain for deployments where only the nix files are delivered

1. dep-info scanner per project

As a start, I looked into dcchut/cargo-files on GitHub ("List all source files in a cargo crate"). But what I actually want is:

  • from one mkDerivation containing all the src
  • generate file-list(s) per unit
  • support macros
  • support features like cfg (which apply on a per-unit basis)

And all of this in one mkDerivation, so I'd need to create a set of unit configurations which are then executed once. The result would be fed into the src filter for the other units. A unit is taken from the cargo nomenclature: the smallest possible set of files that can be compiled.
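To make "a set of unit configurations" concrete, here is a minimal sketch that turns one unit description into a dep-info-only rustc argument list, mirroring the rustc invocation quoted later in this thread. The `Unit` struct and `dep_info_args` helper are hypothetical names for illustration; the flags themselves (`--crate-name`, `--edition`, `--crate-type`, `--emit=dep-info`, `--cfg`, `--out-dir`) are ordinary rustc flags, and features are passed as `--cfg feature="..."`, the way Cargo passes them. It only builds the arguments; it does not run rustc or handle build scripts and proc-macros.

```rust
/// Hypothetical description of one compilation unit.
#[derive(Debug, Clone)]
struct Unit {
    crate_name: String,
    edition: String,
    crate_type: String, // e.g. "lib" or "bin"
    root: String,       // crate root, e.g. "src/lib.rs"
    features: Vec<String>,
}

/// Build rustc arguments that only emit dep-info for `unit`.
/// No -L / --extern flags are added, so external crates will fail to resolve.
fn dep_info_args(unit: &Unit, out_dir: &str) -> Vec<String> {
    let mut args = vec![
        "--crate-name".to_string(),
        unit.crate_name.clone(),
        format!("--edition={}", unit.edition),
        unit.root.clone(),
        "--crate-type".to_string(),
        unit.crate_type.clone(),
        "--emit=dep-info".to_string(),
    ];
    // Sort and dedup features so the same feature set always yields the same args.
    let mut features = unit.features.clone();
    features.sort();
    features.dedup();
    for f in features {
        args.push("--cfg".to_string());
        args.push(format!("feature=\"{f}\""));
    }
    args.push("--out-dir".to_string());
    args.push(out_dir.to_string());
    args
}

fn main() {
    let unit = Unit {
        crate_name: "cargo_files_core".to_string(),
        edition: "2024".to_string(),
        crate_type: "lib".to_string(),
        root: "cargo-files-core/src/lib.rs".to_string(),
        features: vec![],
    };
    println!("rustc {}", dep_info_args(&unit, "target/depinfo").join(" "));
}
```

Because the feature list is normalized, two units that differ only in feature order produce identical argument lists, which keeps any derived hash stable.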

I'm not sure that is feasible, as it would mean replicating, in a separate tool, the work rustc does to create a dep-info file per unit.

Please think like a nix toolchain developer: if we had a sorted list of files as an input, we would know exactly what to recompile. If the list changes, recompile; if the list is the same but a file changed, also recompile, because the src input is serialized into a NAR whose hash is used to match the previous build.
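The reasoning above can be sketched as a per-unit key computed over the sorted file list together with each file's contents: changing the list or any listed file changes the key, while files outside the list have no effect. This is only an analogy. Nix hashes a NAR serialization, whereas this sketch uses std's `DefaultHasher`, which is deterministic within one build of a program but not guaranteed stable across Rust releases. `unit_input_key` is a hypothetical name.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical per-unit cache key over (path, contents) pairs.
/// Sorting by path makes the key independent of discovery order.
fn unit_input_key(files: &[(&str, &[u8])]) -> u64 {
    let mut sorted = files.to_vec();
    sorted.sort_by(|a, b| a.0.cmp(b.0));
    let mut h = DefaultHasher::new();
    // Hashing the file count means adding or removing a file changes the key.
    sorted.len().hash(&mut h);
    for (path, contents) in sorted {
        path.hash(&mut h);
        contents.hash(&mut h);
    }
    h.finish()
}

fn main() {
    let lib: &[u8] = b"mod parser;";
    let parser: &[u8] = b"pub fn parse() {}";
    let key = unit_input_key(&[("src/lib.rs", lib), ("src/parser.rs", parser)]);
    println!("unit key: {key:016x}");
}
```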

This is an issue with the current toolchain I've created: when building cargo with the nix backend, changing one line of source code currently means copying the source code 8 times and parsing and compiling it 8 times, where 2 compiles, cargo (lib) and cargo (bin), would have sufficed.

Here is an example:

[nixos@nixos:~/cargo]$ touch src/cargo/lib.rs
[nixos@nixos:~/cargo]$ time CARGO_BACKEND=nix ./cargo build
❄❄❄  snowflake edition ❄❄❄
Using 'nix' backend to build crates
   Compiling cargo
   Compiling cargo
   Compiling cargo-credential
   Compiling cargo-platform
   Compiling cargo-util
   Compiling crates-io
   Compiling cargo-util-schemas
   Compiling rustfix
   Compiling cargo-credential-libsecret
   Compiling cargo
...
real    0m48.824s
user    0m14.409s
sys     0m5.242s

2. rework toolchain

As a last resort, I would split the toolchain and hand over store paths as inputs to mkDerivations. I don't like this idea much, because it would mean there are now two toolchains to support in the nix backend: a) the interactive one and b) the complete one, which can be shipped into nixpkgs and such. There are other reasons not to like it either, such as the dep-file discussion in "Proposal: make cargo output dep-info".

summary

If you have ideas on how to make option 1 work, please teach me!

@alexcrichton ideas?

This feels like it is all about Nix-specific tooling, and I'm having a hard time seeing its relevance to Cargo. We don't expose a dep-info scanner, nor do we have a CARGO_BACKEND variable. Cargo does process dep-info files on a per-build-unit basis, but that is because the output from one build unit can impact build units further down the line.
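Why the output of one build unit matters downstream can be illustrated with a simplified model: each unit's fingerprint hashes its own inputs plus the fingerprints of the units it depends on, so a change in a library also changes the fingerprint of everything built on top of it. This is a sketch of the general idea only, not Cargo's actual implementation; `BuildUnit` and `fingerprints` are hypothetical names, and units are assumed to be listed in dependency order.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical build unit: a name, a hash of its own inputs, and its dependencies.
struct BuildUnit<'a> {
    name: &'a str,
    own_inputs: u64,
    deps: Vec<&'a str>,
}

/// Fingerprint each unit from its own inputs plus its dependencies' fingerprints.
/// Units must be listed in dependency order; an unknown dependency panics.
fn fingerprints<'a>(units: &[BuildUnit<'a>]) -> HashMap<&'a str, u64> {
    let mut fp: HashMap<&'a str, u64> = HashMap::new();
    for unit in units {
        let mut h = DefaultHasher::new();
        unit.own_inputs.hash(&mut h);
        for dep in &unit.deps {
            // A changed dependency fingerprint changes this unit's fingerprint too.
            fp[dep].hash(&mut h);
        }
        fp.insert(unit.name, h.finish());
    }
    fp
}

fn main() {
    let units = [
        BuildUnit { name: "cargo-util", own_inputs: 1, deps: vec![] },
        BuildUnit { name: "cargo (lib)", own_inputs: 2, deps: vec!["cargo-util"] },
        BuildUnit { name: "cargo (bin)", own_inputs: 3, deps: vec!["cargo (lib)"] },
    ];
    for (name, fp) in fingerprints(&units) {
        println!("{name}: {fp:016x}");
    }
}
```

Changing only the binary's inputs leaves the library fingerprints untouched, while changing a leaf library invalidates every unit above it.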

Why 8 times there?


BTW, Cargo gets the dep-info files from rustc; Cargo doesn't generate them from scratch. That said, Cargo does have its own way to track source files in memory for each package, but it is not as accurate as rustc's dep-info files.

Thanks for your effort.

Hi @weihanglo, thanks for your interest!

It is 8 times because of the root-level targets (as I call them) which are:

  1. cargo-credential
  2. cargo-platform
  3. cargo-util
  4. crates-io
  5. cargo-util-schemas
  6. rustfix
  7. cargo-credential-libsecret
  8. cargo (counted only once, although it involves building build.rs, executing build.rs, and the final build plus linking)

Regarding the dep-info internals: I've read the src/cargo/core/compiler/fingerprint/mod.rs source code a couple of times, and I don't want to waste everyone's time here with clueless questions. I'm very much aware of the cost of gathering the file list(s) per root crate: it is 50% of the overall build cost for each, because of all the work rustc does. The way the nix backend is currently implemented forces me down a different path than the one the legacy backend takes.

motivation

I dream of a parser, run by cargo right after 'cargo build' and outside of a mkDerivation, that analyzes all the dependencies based on the central Cargo.toml. In Cargo terms, this might be a workspace with several projects. In rustc terms, we need to identify the file inputs for each unit, and these depend on macros and features, which impose different file sets.

I'm currently looking into extending crates.io: Rust Package Registry

I will try this implementation now:

As I said in https://x.com/joschelboschel/status/2003095033751191576, it is not as fast as cargo's legacy backend, but it might be far better than not having it.

update

Since rustc records the list of used files in a dep-info file, for example target/debug/deps/cargo_files_core-37d81df09a67365d.d, I hoped gathering this list would be easy.

I did many experiments for lib/bin builds! A very helpful finding is this:

The command below creates a correct cargo_files_core-37d81df09a67365d.d even when it is stripped of all -L and --extern references. Of course, it errors out with many error[E0433]: failed to resolve errors:

$ rustc --crate-name cargo_files_core --edition=2024 cargo-files-core/src/lib.rs --crate-type lib --emit=dep-info --check-cfg 'cfg(docsrs,test)' --check-cfg 'cfg(feature, values())' -C metadata=51e15e9c045fed19 -C extra-filename=-37d81df09a67365d --out-dir /home/nixos/cargo-files/target/debug/deps
$ cat /home/nixos/cargo-files/target/debug/deps/cargo_files_core-37d81df09a67365d.d 
/home/nixos/cargo-files/target/debug/deps/cargo_files_core-37d81df09a67365d.d: cargo-files-core/src/lib.rs cargo-files-core/src/parser.rs

cargo-files-core/src/lib.rs:
cargo-files-core/src/parser.rs:
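Because the .d file uses Makefile syntax, extracting the file list is mostly string handling. Below is a minimal sketch, assuming the simple format shown above: a `target: dep dep ...` rule followed by empty `dep:` rules. It handles backslash-escaped spaces and skips comment lines (rustc can emit lines such as `# env-dep:`), but ignores other Makefile escapes; `parse_dep_info` and `split_deps` are hypothetical helper names.

```rust
use std::collections::BTreeSet;

/// Split a Makefile-style dependency list on unescaped spaces,
/// turning "\ " back into a literal space.
fn split_deps(s: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut cur = String::new();
    let mut chars = s.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '\\' if chars.peek() == Some(&' ') => {
                cur.push(' ');
                chars.next();
            }
            ' ' | '\t' => {
                if !cur.is_empty() {
                    out.push(std::mem::take(&mut cur));
                }
            }
            _ => cur.push(c),
        }
    }
    if !cur.is_empty() {
        out.push(cur);
    }
    out
}

/// Collect every source file named on the right-hand side of a rule.
fn parse_dep_info(contents: &str) -> BTreeSet<String> {
    let mut files = BTreeSet::new();
    for line in contents.lines() {
        let line = line.trim_end();
        // Skip blank lines and comments.
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // Empty rules like "src/lib.rs:" have no ": " and are skipped.
        if let Some((_target, deps)) = line.split_once(": ") {
            files.extend(split_deps(deps));
        }
    }
    files
}

fn main() {
    let sample = "target/debug/deps/demo.d: src/lib.rs src/parser.rs\n\nsrc/lib.rs:\nsrc/parser.rs:\n";
    for file in parse_dep_info(sample) {
        println!("{file}");
    }
}
```

Collecting into a BTreeSet gives the sorted, deduplicated list that a per-unit src filter would need.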

I haven't looked into build.rs yet, but I fear it will be the biggest challenge.
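One existing hook that may help with build.rs: build scripts can tell Cargo which files they depend on by printing `cargo:rerun-if-changed=PATH` (or `cargo::rerun-if-changed=PATH` in newer Cargo versions) to stdout. A backend that captures build-script output could treat those paths as the script's file inputs. The sketch assumes stdout is already captured as a string; `rerun_if_changed_paths` is a hypothetical helper. It ignores `rerun-if-env-changed`, and it does not model the case where a script prints no rerun directives, in which Cargo reruns it when any file in the package changes.

```rust
/// Extract the paths a build script declared via rerun-if-changed.
fn rerun_if_changed_paths(stdout: &str) -> Vec<String> {
    let mut paths = Vec::new();
    for line in stdout.lines() {
        // Try "cargo::" before "cargo:", otherwise "cargo::x" would strip to ":x".
        let rest = line
            .strip_prefix("cargo::")
            .or_else(|| line.strip_prefix("cargo:"));
        if let Some(path) = rest.and_then(|r| r.strip_prefix("rerun-if-changed=")) {
            paths.push(path.to_string());
        }
    }
    paths
}

fn main() {
    let out = "cargo:rerun-if-changed=build.rs\ncargo:rustc-cfg=foo\n";
    println!("{:?}", rerun_if_changed_paths(out));
}
```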

thought experiment FUSE

Maybe we can come up with a new way to generate nix input hashes. Normally, a mkDerivation's src points to a directory. This directory is converted into a NAR, and the resulting hash is basically the input hash.

Maybe we could delay that hashing and wrap the input in a FUSE module that tracks file and directory access, and then, after the mkDerivation's build finishes, feed the resulting input hash back into the equation.

This way we would not have to copy any files into the builder but could use a read-only mapping, and we also would not need to decide up front which files should go into the builder.
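The FUSE idea can be simulated in-process to check the reasoning: serve files from a read-only snapshot, record every path that is read, and derive the input key only from the recorded paths after the "build". A real implementation would intercept filesystem calls; here `TrackedSource` is a hypothetical in-memory stand-in, and the hash is std's `DefaultHasher` rather than a NAR hash. Failed lookups are recorded too, since a build that probes for a missing file depends on its absence.

```rust
use std::cell::RefCell;
use std::collections::hash_map::DefaultHasher;
use std::collections::{BTreeMap, BTreeSet};
use std::hash::{Hash, Hasher};

/// Hypothetical read-only source tree that records which paths were accessed.
struct TrackedSource {
    files: BTreeMap<String, Vec<u8>>,
    accessed: RefCell<BTreeSet<String>>,
}

impl TrackedSource {
    fn new(files: BTreeMap<String, Vec<u8>>) -> Self {
        TrackedSource { files, accessed: RefCell::new(BTreeSet::new()) }
    }

    /// Read a file, recording the access even if the file is missing.
    fn read(&self, path: &str) -> Option<&[u8]> {
        self.accessed.borrow_mut().insert(path.to_string());
        self.files.get(path).map(|v| v.as_slice())
    }

    /// Input key computed after the build from the accessed paths only.
    fn input_key(&self) -> u64 {
        let mut h = DefaultHasher::new();
        for path in self.accessed.borrow().iter() {
            path.hash(&mut h);
            // Hashes presence or absence, plus the contents when present.
            self.files.get(path).hash(&mut h);
        }
        h.finish()
    }
}

fn main() {
    let mut files = BTreeMap::new();
    files.insert("src/lib.rs".to_string(), b"mod parser;".to_vec());
    files.insert("README.md".to_string(), b"docs".to_vec());
    let src = TrackedSource::new(files);
    let _ = src.read("src/lib.rs");
    println!("input key: {:016x}", src.input_key());
}
```

Files the build never touched, such as README.md here, cannot affect the key, which is exactly the property that would avoid rebuilding units whose inputs did not change.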

flakes

Note: The problem to be solved here is pretty much the same as for flakes in general; see the discussion in NixOS/nix issue #7107 on GitHub, "Flakes: Not including untracked files is confusing for unaware users." In flakes this is usually handled with 'git add', after which the staging area is copied to the builder. This is done for two reasons: git has already computed the checksums of the files, and it only copies tracked files, ignoring huge build artefacts like target/ for Rust or any other files that happen to be in the project root.
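To illustrate what "git has already computed the checksums" means: in the default SHA-1 object format, git identifies each tracked file by a blob ID, the SHA-1 of the header `blob <size>\0` followed by the file contents. The self-contained sketch below recomputes that ID, which should match what `git hash-object` prints for the same file. The SHA-1 is a plain textbook implementation for illustration only, not for security use; `sha1` and `git_blob_id` are hypothetical helper names.

```rust
/// Textbook SHA-1 (FIPS 180-4); for illustration, not for security.
fn sha1(data: &[u8]) -> [u8; 20] {
    let mut h: [u32; 5] = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0];
    // Pad: 0x80, zeros up to 56 mod 64, then the bit length as a big-endian u64.
    let bit_len = (data.len() as u64).wrapping_mul(8);
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&bit_len.to_be_bytes());

    for block in msg.chunks(64) {
        // Message schedule: 16 words from the block, expanded to 80.
        let mut w = [0u32; 80];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]]);
        }
        for i in 16..80 {
            w[i] = (w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16]).rotate_left(1);
        }
        let [mut a, mut b, mut c, mut d, mut e] = h;
        for (i, wi) in w.iter().enumerate() {
            let (f, k): (u32, u32) = match i {
                0..=19 => ((b & c) | (!b & d), 0x5A827999),
                20..=39 => (b ^ c ^ d, 0x6ED9EBA1),
                40..=59 => ((b & c) | (b & d) | (c & d), 0x8F1BBCDC),
                _ => (b ^ c ^ d, 0xCA62C1D6),
            };
            let temp = a.rotate_left(5).wrapping_add(f).wrapping_add(e).wrapping_add(k).wrapping_add(*wi);
            e = d;
            d = c;
            c = b.rotate_left(30);
            b = a;
            a = temp;
        }
        for (hv, v) in h.iter_mut().zip([a, b, c, d, e]) {
            *hv = hv.wrapping_add(v);
        }
    }
    let mut out = [0u8; 20];
    for (i, word) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

/// Git blob ID: SHA-1 over "blob <len>\0" followed by the contents, as hex.
fn git_blob_id(contents: &[u8]) -> String {
    let mut obj = format!("blob {}\0", contents.len()).into_bytes();
    obj.extend_from_slice(contents);
    sha1(&obj).iter().map(|b| format!("{b:02x}")).collect()
}

fn main() {
    println!("{}", git_blob_id(b"hello\n"));
}
```

`git ls-files -s` lists the blob ID git has recorded for each staged file, so a tool could reuse those IDs instead of rehashing unchanged files.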

I am still confused by this. Cargo has no backends, and that directory does not exist; see cargo/src/cargo/core/compiler at master in rust-lang/cargo on GitHub.

What are you confused about exactly? If it's the missing files, check out this repository: nixcloud/cargo ("The Rust package manager") on GitHub.

Ok so that is a fork that has not renamed itself.

Interesting, isn't that a trademark violation? (Not that that should be the first approach to try to deal with this. More diplomatic approaches should be taken first.)

Do pull requests require a rename? (There is no PR yet, but I would like to contribute this.) Before even approaching the Rust community, I wanted to have something worth looking at. I also think I have made remarkable progress with this prototype, but turning it into code for a PR will take some more effort.

In addition, I explicitly wrote a WARNING section in the nixcloud/cargo repository on GitHub.

My current effort is to get the project active and well established so that we can hopefully join forces!

The Rust Innovation Lab wrote me this:

Thank you for reaching out about hosting your project with the Rust Foundation in our Rust Innovation Lab!

The Lab has been created in order to provide fiscal hosting for open source projects within the Rust ecosystem that meet specific requirements. Projects should be active and well-established, with meaningful impact, clear governance structures, appropriate licensing, and existing or in-progress external funding.

You can find the full eligibility criteria here, which also links to plenty of other info about the Rust Innovation Lab. If, after reviewing our documentation, you're confident your project would be a good fit, then just reply back to me and I'll let you know about the next steps.

Abi Broom

Director of Operations

Rust Foundation

There is a common perception that open source should start with code, but we've found it is best to start with alignment: a common understanding of the problem, agreement that it is in scope, and an agreed-upon solution. For this purpose, we explicitly ask people to coordinate with us in Issues first; see Working on Cargo in the Cargo Contributor Guide.

Thanks for your guidance. As requested, I've added the post "New rust backend: libnix", and I'll send an email to the foundation soon.
