Though, that may require validating it is correct. If you edit source files within the $CARGO_HOME/registry/src/ directory cargo currently does not notice (there are even crates that edit their own source files from build.rs).
Sandboxing would help with that. And if you don't run builds in a fresh CI environment, an SBOM doesn't help anything against malicious modification anyway.
Given that the author field is discouraged nowadays, including it doesn't sound like a good idea.
What may be more relevant to include is the publisher, though that isn't available anywhere locally currently, only from the crates.io API.
A number of people publish from CI, so that may not be the most accurate either.
An SBOM (software bill of materials) is a list of all components and dependencies used to build a piece of software.
Also note that this means it shouldn't include just Cargo Rust dependencies in the dependency tree; it also needs mechanisms to describe other-language dependencies that are pulled in and built in build.rs, such as C & C++ dependencies in a -sys crate. Take openssl as an example.
External language dependencies are a black hole in Cargo today; ideally we need some metadata to explicitly describe them in Cargo.toml, from both an inclusion and a licensing perspective.
Another similar component is static libraries linked in (from crates, or from local system).
It's still attached to some account. What are we looking to get from it? Blame? Monitoring for oddities?
Not just "build" dependencies, but tools like protobuf and other executable code generation tools are also involved here.
imo we shouldn't be speculatively hashing something "because someone somewhere might find it useful".
Thinking more on this, we should evaluate what we fingerprint to decide what we should include. For example, for local / mutable units we record the file paths and env variables. For file paths, callers can hash according to their needs. For env vars, we should include the values, as those can be too transient to recover later.
To make this (and the overlaid build graph) easier, I think we should consider tracking units/crates rather than packages within the graph.
Hashes
- registry dependencies: using the crate hash (same as what's in the lockfile) seems best.
- git dependencies: include the git commit hash
- path dependencies: include the path, then postprocessing tools can choose to do hashing as needed.
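To make the three cases above concrete, here is a rough sketch of how a per-dependency record could look. The `SourceId` type and `known_hash` method are hypothetical names made up for illustration; nothing here is Cargo's actual data model or the RFC's format.

```rust
use std::path::PathBuf;

/// Hypothetical record of where a dependency's source came from.
/// This is an illustration only, not Cargo's real representation.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SourceId {
    /// Registry dependency: carries the crate checksum, as in the lockfile.
    Registry { checksum: String },
    /// Git dependency: carries the commit hash.
    Git { commit: String },
    /// Path dependency: carries only the path; hashing is left to
    /// postprocessing tools.
    Path { path: PathBuf },
}

impl SourceId {
    /// Returns the hash that is already known without reading any files,
    /// or `None` for path dependencies.
    pub fn known_hash(&self) -> Option<&str> {
        match self {
            SourceId::Registry { checksum } => Some(checksum.as_str()),
            SourceId::Git { commit } => Some(commit.as_str()),
            SourceId::Path { .. } => None,
        }
    }
}
```

The point of the split is that only the path case needs any extra work from a downstream tool, and only if that tool actually wants a hash.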
Though, that may require validating it is correct. If you edit source files within the $CARGO_HOME/registry/src/ directory cargo currently does not notice.
I think this issue is out of scope. SBOMs are not supposed to handle compromised build environments.
External dependencies
This is deliberately left in the future possibilities section, as getting agreement without a specification for external dependencies seems difficult enough. Some crates like openssl have split the foreign source into an openssl-src crate that would provide some visibility into the external dependency.
It may be useful to have a list of all additional libraries linked in via build.rs's rustc-link-lib.
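As a sketch of what collecting that list might involve: build scripts request linking by printing `cargo:rustc-link-lib=...` lines (or `cargo::rustc-link-lib=...` in the newer syntax), where the value has the form `[KIND[:MODIFIERS]=]NAME[:RENAME]`. The `linked_libs` function below is a hypothetical helper that pulls the library names out of captured build script output; it is not how Cargo does it internally.

```rust
/// Hypothetical helper: extract library names from build script stdout.
/// Recognizes both the `cargo:` and `cargo::` directive prefixes.
pub fn linked_libs(build_script_stdout: &str) -> Vec<String> {
    build_script_stdout
        .lines()
        .filter_map(|line| {
            let line = line.trim();
            line.strip_prefix("cargo::rustc-link-lib=")
                .or_else(|| line.strip_prefix("cargo:rustc-link-lib="))
        })
        .map(|value| {
            // Drop an optional `KIND[:MODIFIERS]=` prefix.
            let name = value.rsplit('=').next().unwrap_or(value);
            // Drop an optional `:RENAME` suffix.
            name.split(':').next().unwrap_or(name).to_string()
        })
        .collect()
}
```

This only sees what the build script tells Cargo about; anything a script links through other means would still be invisible.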
When should this run?
The RFC as written suggests using the build.sbom config key, which could be activated by the CARGO_BUILD_SBOM environment variable (almost everything in Cargo's config can be controlled by environment variables). As @epage suggested, it might make more sense to be in [profile], or possibly on by default as @bjorn3 suggested.
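For anyone unfamiliar with how the config key and the environment variable relate: Cargo derives the variable name by uppercasing the key, replacing dots and dashes with underscores, and prefixing `CARGO_`. A small sketch of that mapping follows; `config_key_to_env` is a made-up name for illustration, not a Cargo API.

```rust
/// Hypothetical helper showing how a Cargo config key maps to its
/// environment variable name, e.g. `build.sbom` -> `CARGO_BUILD_SBOM`.
pub fn config_key_to_env(key: &str) -> String {
    let mapped: String = key
        .chars()
        .map(|c| match c {
            // Dots and dashes both become underscores.
            '.' | '-' => '_',
            other => other.to_ascii_uppercase(),
        })
        .collect();
    format!("CARGO_{mapped}")
}
```

So a key under [profile] would get a `CARGO_PROFILE_...` variable by the same rule, if the setting ended up there instead.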
No additional callbacks will have to be introduced if the file is already placed by the time RUSTC_WORKSPACE_WRAPPER runs.
It seems reasonable that the file could be written by Cargo before the corresponding rustc invocation to produce each binary.
There are a number of issues in cargo auditable that stem from limitations of various parts of cargo for SBOM use cases:
This last one is preventing adoption of cargo auditable by Debian, so it is a pretty big deal.
Catching up, sorry for the late reply.
Some people and/or regulations will require timestamps, others won't. It'd be nice to give people the choice.
This is an honest question. Where is this discouraged? The current German SBOM technical guideline (only available in German right now) even says that contact information for the author should be included. I don't think this is reasonable for most projects and I've lobbied for a change here. For non-open-source things having an author might also make sense.

imo we shouldn't be speculatively hashing something "because someone somewhere might find it useful".
That's a good point. I'll forward this discussion to some relevant people. Maybe we can get some clarification on their thinking.

Some people and/or regulations will require timestamps, others won't. It'd be nice to give people the choice.
For myself, I consider the format proposed for cargo to be a source format for SBOMs
- So it doesn't need to be configurable based on SBOM requirements
- Information that can be derived from another source should be
So from this
- We don't need bit-for-bit reproducibility, only the format generated from it, so timestamps should be fine
- If the timestamp is for when the binary or cargo's SBOM was created, the filesystem could be asked that, so we might not need timestamps

This is an honest question. Where is this discouraged? The current German SBOM technical guideline (only available in German right now) even says that contact information for the author should be included. I don't think this is reasonable for most projects and I've lobbied for a change here. For non-open-source things having an author might also make sense.
Hmm, looks like it's only indirectly, in that crates.io no longer shows it.
The authors field has a couple of flaws:
- The meaning is undefined, so it can't be assumed to be project contacts
- It is immutable within a version which breaks down for project handoffs, name changes, email address changes, etc
If a tool wants to put requirements on authors for final artifacts, that is a separate concern, independent of cargo's operation. I would also say that this is information that can be derived from other sources and so it is not essential for cargo's SBOM according to my own personal idea I mentioned earlier in this post.
SBOM (Software Bill of Materials). Why should it contain IP addresses, hostnames, authors, temperature, ... It should only contain the dependencies and information about the build.

Why should it contain IP addresses, hostnames, authors, temperature, ... It should only contain the dependencies and information about the build.
I'm not sure what point you're trying to make here. Authors are the only one of those that have previously been brought up in the thread. And the justifications for that have been made pretty clear, namely that it's required by certain jurisdictions (although why is not clear) and that cargo has a field for getting the information (although one whose use is no longer encouraged). I'm not sure what point you're trying to make by bringing up the others.
I want to make the point that SBOMs ought to be minimal and only contain the necessary information. It is not clear to me why the author should be included.

For myself, I consider the format proposed for cargo to be a source format for SBOMs
- So it doesn't need to be configurable based on SBOM requirements
- Information that can be derived from another source should be
So from this
- We don't need bit-for-bit reproducibility, only the format generated from it, so timestamps should be fine
- If the timestamp is for when the binary or cargo's SBOM was created, the filesystem could be asked that, so we might not need timestamps
Are we still considering use cases like cargo-auditable where the info is embedded into the binary? If so, wouldn't this run counter to reproducible builds, something that Linux distros for example care about? Though if the SOURCE_DATE_EPOCH environment variable is properly supported, I guess that is fine.
For reference: SOURCE_DATE_EPOCH — reproducible-builds.org
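To illustrate what "properly supported" could mean in practice, here is a minimal sketch: use the value of SOURCE_DATE_EPOCH (seconds since the Unix epoch) when it is set and parses, and fall back to the current time otherwise. The function names `timestamp_from` and `sbom_timestamp` are hypothetical, and the fallback-on-invalid-value behaviour is my assumption, not something the RFC specifies.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical helper: pick the timestamp to record.
/// Uses `source_date_epoch` if it parses as whole seconds since the
/// Unix epoch, otherwise falls back to `now`.
pub fn timestamp_from(source_date_epoch: Option<&str>, now: SystemTime) -> SystemTime {
    source_date_epoch
        .and_then(|raw| raw.trim().parse::<u64>().ok())
        .map(|secs| UNIX_EPOCH + Duration::from_secs(secs))
        .unwrap_or(now)
}

/// Reads SOURCE_DATE_EPOCH from the environment and applies the rule above.
pub fn sbom_timestamp() -> SystemTime {
    let var = std::env::var("SOURCE_DATE_EPOCH").ok();
    timestamp_from(var.as_deref(), SystemTime::now())
}
```

With the variable set, two builds of the same source would record the same timestamp, which is what the reproducible-builds use case needs.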

Are we still considering use cases like cargo-auditable where the info is embedded into the binary? If so, wouldn't this run counter to reproducible builds, something that Linux distros for example care about? Though if the SOURCE_DATE_EPOCH environment variable is properly supported, I guess that is fine.
What is proposed here would not literally be embedded in the binary for cargo auditable cases but might be able to be used as a source for that information. This is covered under Future Possibilities.

I want to make the point that SBOMs ought to be minimal and only contain the necessary information. It is not clear to me why the author should be included.
I agree with you but we - in the Rust ecosystem - won't change the rules/regulations/guidelines/laws. We can make it easier for people to comply with them, for that we don't have to like it.
If you're based in Germany, the correct place to complain would be the BSI. There is an email address listed on page two of the guidelines.
In the US you can reach out to the relevant mailing lists of the CISA.
I'm sure other countries have similar authorities.
I did both, by the way, to ask about hashes, and pointed both at this thread, but more voices can be helpful.
I agree with you. I live (check my GitHub account) too close to the BSI. My SBOM must contain the author. There is not much that I can do.