My current quest is to support faster builds, but I struggle a lot with .fingerprint and the way nix evaluates builds. I can build a complete rust project from one nix build command, and I love this very much as nix then does so much of the heavy lifting.
The fingerprint concept in the legacy cargo backend works because it uses the dep-info from the last build and in nix I don't have such a concept at all.
options
As a result there are a few options:
rework the dep-info file-list scanner so it runs once per project and not as it is now per unit
only for the interactive speedup: split the nix toolchain into many smaller parts and make cargo the scheduler, while keeping the already working toolchain for deployments where only the nix files are delivered
support features like cfg (which are on a per-unit basis)
And all of this in one mkDerivation, so I'd need to create a set of unit configurations which are then executed once. The result will be fed into the src filter for the other units. A unit is taken from the cargo nomenclature: smallest possible set of files which can be compiled.
Not sure if that is feasible, as it would mean replicating rustc's effort of creating a dep-info per unit in a separate tool.
Please think like a nix toolchain developer: if we have a sorted list of files as an input, we know exactly what to recompile. If the list changes, recompile. If the list is the same but a file in it changed, recompile as well, because the list determines which src input is serialized into a NAR, and the hash of that NAR is what gets matched against the previous build.
This is an issue in the toolchain I've created: changing one line in the source code and then building cargo with the nix backend currently means copying the source code 8 times and parsing and compiling it 8 times, where sometimes 2 compiles, cargo(lib) and cargo(bin), would have sufficed.
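To make the recompile rule concrete, here is a minimal Python sketch. It is an assumption-laden illustration, not the nix backend's code: `input_fingerprint` and `needs_rebuild` are hypothetical names, and the hash is a simplified stand-in for a NAR hash, not the real NAR serialization.

```python
import hashlib
from typing import Dict, Optional


def input_fingerprint(files: Dict[str, bytes]) -> str:
    """Hash a unit's inputs: the sorted path list plus each file's bytes.

    Simplified stand-in for a NAR hash (assumption), not real NAR output.
    """
    h = hashlib.sha256()
    for path in sorted(files):
        # Length-prefix every field so adjacent fields cannot run together.
        for field in (path.encode("utf-8"), files[path]):
            h.update(len(field).to_bytes(8, "little"))
            h.update(field)
    return h.hexdigest()


def needs_rebuild(previous: Optional[str], files: Dict[str, bytes]) -> bool:
    """Rebuild if there is no previous build or the fingerprint differs."""
    return previous != input_fingerprint(files)
```

Because the paths are sorted before hashing, the order in which files are discovered does not matter; adding or removing a path, or changing the bytes of a listed file, changes the fingerprint.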
Here an example:
[nixos@nixos:~/cargo]$ touch src/cargo/lib.rs
[nixos@nixos:~/cargo]$ time CARGO_BACKEND=nix ./cargo build
❄❄❄ snowflake edition ❄❄❄
Using 'nix' backend to build crates
Compiling cargo
Compiling cargo
Compiling cargo-credential
Compiling cargo-platform
Compiling cargo-util
Compiling crates-io
Compiling cargo-util-schemas
Compiling rustfix
Compiling cargo-credential-libsecret
Compiling cargo
...
real 0m48.824s
user 0m14.409s
sys 0m5.242s
2. rework toolchain
As a last resort I would split the toolchain and hand over store paths as inputs to mkDerivations. I don't like this idea so much because it would mean there are now two toolchains to support in the nix backend: a) the interactive one and b) the complete one which can be shipped into nixpkgs and such. There are other reasons not to like this either, like the dep file discussion: Proposal: make cargo output dep-info
summary
If you have ideas on how to make 1.) work, please teach me!
This feels like this is all about Nix specific tooling. I'm having a hard time seeing what the relevance for Cargo is. We don't expose a dep-info scanner nor do we have a CARGO_BACKEND variable. Cargo does process dep-info files on a per-build-unit basis but that is because the output from one build unit can impact build units down the line.
BTW, Cargo gets the dep-info files from rustc, Cargo doesn't generate them from scratch. That said, Cargo does have its own way to track source files in-memory for each package, but not as accurate as rustc's dep-info files.
It is 8 times because of the root-level targets (as I call them) which are:
cargo-credential
cargo-platform
cargo-util
crates-io
cargo-util-schemas
rustfix
cargo-credential-libsecret
cargo (count this only once, but it is the build.rs build, the build.rs execution and the final build + linking)
In regards to your dep-info internals: I've read the src/cargo/core/compiler/fingerprint/mod.rs source code a couple of times and I wouldn't want to waste the time of you all here with clueless questions. I'm very much aware of the cost it takes to gather the filelist(s) per root-crate. It is 50% of the overall build cost for each because of all the things rustc does. The way the nix backend is implemented at the moment forces me to go a different path than what the legacy backend does.
motivation
I dream of a cargo run parser which executes right after 'cargo build' outside of a mkDerivation, then analyzes all the dependencies based on the central Cargo.toml. In Cargo terms: This might be a workspace with several projects. In rustc terms: we need to identify the file inputs for each unit and these are using macros and features imposing different file sets.
Since rustc logs the list of used files in a dep-info file, for example target/debug/deps/cargo_files_core-37d81df09a67365d.d, I hoped gathering this list would be easy.
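For illustration, a rough Python sketch of collecting that file list from a dep-info file. It assumes the Makefile-style layout rustc emits (rules of the form `target: dep dep ...`, spaces in paths escaped as `\ `, and `#` comment lines such as `# env-dep:`) and ignores edge cases like Windows drive letters. `parse_dep_info` and the `SAMPLE` text are hypothetical, not taken from a real build.

```python
import re
from typing import List, Set


def parse_dep_info(text: str) -> List[str]:
    """Return the sorted, de-duplicated source files named in a .d file."""
    files: Set[str] = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments such as "# env-dep:..."
        _target, sep, deps = line.partition(": ")
        if not sep:
            continue  # "path:" rule without prerequisites
        # Split on spaces that are not escaped, then unescape "\ ".
        for token in re.split(r"(?<!\\) ", deps):
            if token:
                files.add(token.replace("\\ ", " "))
    return sorted(files)


# Hypothetical dep-info content for a tiny crate (assumption).
SAMPLE = (
    "target/debug/deps/demo-0123.d: src/lib.rs src/my\\ mod.rs\n"
    "\n"
    "target/debug/deps/libdemo-0123.rlib: src/lib.rs src/my\\ mod.rs\n"
    "\n"
    "src/lib.rs:\n"
    "src/my\\ mod.rs:\n"
    "\n"
    "# env-dep:CARGO_PKG_NAME=demo\n"
)
```

The resulting sorted list is the kind of input a src filter could use to decide which files a unit needs.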
I did many experiments for lib/bin builds! A very helpful finding is this:
The command below will create a correct cargo_files_core-37d81df09a67365d.d even when it is stripped of any -L and --extern references. It errors out of course with many error[E0433]: failed to resolve errors:
I haven't looked into build.rs yet, but I fear it will be the biggest challenge.
thought experiment FUSE
Maybe we can come up with a new way to generate nix input hashes. The normal way is that a mkDerivation's src points to a directory. This directory is converted into a NAR and the resulting hash is the input hash basically.
Maybe we could delay that hashing and wrap the input in a FUSE module which tracks file/directory access and, after the mkDerivation build, returns the input hash into the equation.
This way we would not have to copy any files into the builder, but use a read only mapping and we also would not need to qualify the files which should get into the builder.
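The thought experiment can be simulated without FUSE. The Python sketch below is purely illustrative: `TrackedSource` is a hypothetical in-memory stand-in for a read-only mount that logs accesses, and the hash is a simplified substitute for a NAR hash.

```python
import hashlib
from typing import Dict, Set


class TrackedSource:
    """In-memory stand-in for a read-only FUSE mount that logs file reads."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self._files = files
        self.accessed: Set[str] = set()

    def read(self, path: str) -> bytes:
        data = self._files[path]  # raises KeyError for missing files
        self.accessed.add(path)
        return data

    def input_hash(self) -> str:
        """Hash only the files the build actually read, in sorted order."""
        h = hashlib.sha256()
        for path in sorted(self.accessed):
            for field in (path.encode("utf-8"), self._files[path]):
                h.update(len(field).to_bytes(8, "little"))
                h.update(field)
        return h.hexdigest()
```

In this model a change to a file the build never read leaves the input hash untouched. One open point the sketch makes visible: the hash is only known after the build has run, so matching a previous build up front would need the access list recorded by that earlier build.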
flakes
Note: The problem to be solved here is pretty much the same as for flakes in general, see the discussion at Flakes: Not including untracked files is confusing for unaware users. · Issue #7107 · NixOS/nix · GitHub. In flakes it is usually done using 'git add', and then the staging area is copied to the builder. This is done for two reasons: git has already computed the checksums of the files, and only tracked files are copied, which ignores huge build artefacts like target/ for rust or any other files which happen to be in the project root.
Interesting, isn't that a trademark violation? (Not that that should be the first approach to try to deal with this. More diplomatic approaches should be taken first.)
Do pull requests require a rename? (There is no PR yet, but I would like to contribute this.) But before even approaching the rust community I wanted to have something which is worth looking at. I also think that I made remarkable progress in this prototype, but turning it into code fit for a PR needs some more effort.
My current effort is to get active and well established so that we can hopefully join forces!
The Rust Innovation Lab wrote me this:
Thank you for reaching out about hosting your project with the Rust Foundation in our Rust Innovation Lab!
The Lab has been created in order to provide fiscal hosting for open source projects within the Rust ecosystem that meet specific requirements. Projects should be active and well-established, with meaningful impact, clear governance structures, appropriate licensing, and existing or in-progress external funding.
You can find the full eligibility criteria here, which also links to plenty of other info about the Rust Innovation Lab. If, after reviewing our documentation, you're confident your project would be a good fit, then just reply back to me and I'll let you know about the next steps.
There is a common perception that open source should start with code, but we've found it is best to start with alignment, which starts with a common understanding of the problem, agreement that it is in scope, and an agreed-upon solution. For this purpose, we explicitly ask people to coordinate with us in Issues first, see Working on Cargo - Cargo Contributor Guide