I work on an open source project that cares a lot about producing binaries the community can reproduce by building them themselves. Beyond that, we want historical binaries to stay reproducible for a long time (years?). However, I've run into a Cargo issue that makes this difficult.
Cargo hashes package metadata and passes the result to rustc, which uses it to compute the crate disambiguator, which in turn feeds the disambiguator suffix on each symbol. In practice I've found that symbol names (especially when every symbol name in a crate changes) affect the final compiled output, even if those symbols are optimized out or eventually stripped from the final binary, though I can't point to a particular reason why.
Cargo includes the crate's source URL in the metadata hash it computes. This is a problem for long-term reproducibility: git source URLs can be relocated or become unavailable, and changing the URL in Cargo.toml produces a different metadata hash even if the same commit hash is available at the new URL. Even the URL of the crates.io registry depends on the stability of the British Indian Ocean Territory (the home of the `.io` TLD).
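To make the failure mode concrete, here is a minimal sketch of a hash that mixes in the source URL. It is not Cargo's actual algorithm: `PackageId`, its fields, and `metadata_hash` are hypothetical names, and `DefaultHasher` merely stands in for whatever hasher Cargo uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the package identity that gets hashed.
/// The real set of hashed fields in Cargo is larger and differs in detail.
#[derive(Hash)]
struct PackageId<'a> {
    name: &'a str,
    version: &'a str,
    source_url: &'a str, // the field this issue is about
    commit: &'a str,
}

/// Toy metadata hash: every field, including the URL, feeds the result.
/// Assumption: DefaultHasher is only a placeholder for the real hasher.
fn metadata_hash(pkg: &PackageId<'_>) -> u64 {
    let mut hasher = DefaultHasher::new();
    pkg.hash(&mut hasher);
    hasher.finish()
}
```

With this model, two `PackageId` values that share `name`, `version`, and `commit` but differ in `source_url` hash differently, so moving a repository changes every symbol suffix derived from the hash even though the code is byte-for-byte identical.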
As far as I can tell, symbol mangling v2 does not change this, as it's out of scope of that RFC. I can see two potential solutions to this problem:
- Evaluate whether mixing a crate's source URL into its metadata hash is really necessary. Alternatively, for git sources, could we mix in the ref instead, and for registry packages some package checksum?
- (The Lazy Option) In the future, provide some Cargo mechanism to override git dependencies without changing the URL used in the metadata hash. I believe this is already possible for registry packages, but not for git dependencies.
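The first option could look roughly like the sketch below. Everything here is hypothetical (the `Source` enum, its fields, and `location_independent_hash` are invented for illustration, and `DefaultHasher` is a placeholder); it also assumes the git ref has already been resolved to a commit hash. The point is only that the hash depends on what the content is rather than where it was fetched from.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical model of where a dependency comes from.
/// `url` is kept for fetching but deliberately never hashed.
#[allow(dead_code)]
enum Source<'a> {
    Git { url: &'a str, commit: &'a str },
    Registry { url: &'a str, checksum: &'a str },
}

/// Toy hash that identifies a package by its content, not its location.
fn location_independent_hash(name: &str, version: &str, source: &Source<'_>) -> u64 {
    let mut hasher = DefaultHasher::new();
    name.hash(&mut hasher);
    version.hash(&mut hasher);
    match source {
        // Git: mix in the resolved commit, ignore the URL.
        Source::Git { commit, .. } => {
            "git".hash(&mut hasher);
            commit.hash(&mut hasher);
        }
        // Registry: mix in the package checksum, ignore the URL.
        Source::Registry { checksum, .. } => {
            "registry".hash(&mut hasher);
            checksum.hash(&mut hasher);
        }
    }
    hasher.finish()
}
```

Under this scheme a repository that moves to a new host keeps the same hash as long as the commit is unchanged, while a different commit still yields a different hash.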