This is an initial draft for getting precise dependency information from Cargo for each compiled output. There are already a number of Cargo extensions (auditable, cyclonedx, bom) that could consume this information rather than reconstructing it themselves.
I'm particularly interested in what information people need for an SBOM and whether this would be sufficient. This SBOM file generated by Cargo is intended as an intermediate format that could be processed by other tools into standardized formats such as CycloneDX or SPDX. Feedback from authors & contributors of existing tools like cargo-cyclonedx would also be very helpful.
- Feature Name:
- Start Date: 2023-11-01
- RFC PR: rust-lang/rfcs#0000
- Rust Issue: rust-lang/rust#0000
This RFC adds an option to Cargo to emit a Software Bill of Materials (SBOM) alongside compiled artifacts. Similar to how Cargo emits split debug info or "dep-info" (.d) files, this change emits an SBOM in a Cargo-specific format alongside outputs in the target directory. External tooling can consume this Cargo SBOM file and transform it into other SBOM formats such as SPDX or CycloneDX.
An SBOM (software bill of materials) is a list of all components and dependencies used to build a piece of software. The two leading SBOM formats being adopted by industry are SPDX and CycloneDX. Both are still evolving and have multiple specification versions & data formats (JSON, XML).
New government initiatives aimed at improving the security of the software supply chain such as the US "Executive Order on Improving the Nation's Cybersecurity" or the EU "Cyber Resilience Act" require a Software Bill of Materials. Generating accurate SBOMs with Cargo is currently difficult because, depending on target selection or activated features, the dependencies may be different.
For workspaces that generate multiple compiled artifacts, each artifact may reference different dependencies. Existing tools (see prior art section) attempt to approximate the correct dependency set; however, precise dependency information for each compiled artifact is difficult to obtain without built-in Cargo support. Generating the SBOM at the same time as the compiled artifact allows precise dependency information to be emitted for each compiled artifact.
The generation of SBOM information is controlled by Cargo's configuration. To enable SBOM generation, set the following:
    [build]
    sbom = true
If enabled, an SBOM file will be placed next to each compiled artifact for cdylib crate types in the target directory with the name <crate_name>.cargo-sbom.json. The SBOM will contain information about dependencies used to build the compiled artifact. If the performance impact is deemed low enough, this could be enabled by default.
The format will use JSON, but the exact format is not specified in this RFC.
The SBOM will include the following information (if available) for each crate:
- ID (opaque identifier)
- Source (registry / git / etc.)
- Dependencies (list of IDs)
- Type (normal, build)
- Activated features
Information about the current build environment:
- Rust toolchain version
- Current build profile name
- Selected profile values
If a crate is used as both a normal dependency and a build dependency, and resolver v2 compiles the two uses separately, then separate entries will exist in the dependency tree with the correct activated features listed for each instance.
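Since this RFC does not fix the JSON format, the following is only a sketch of how the fields listed above could be modelled in memory. Every type and field name here (`CrateEntry`, `BuildEnvironment`, `Sbom`, and so on) is hypothetical, as are the placeholder IDs. The sample data shows one crate appearing twice, once as a normal dependency and once as a build dependency, each with its own feature set.

```rust
use std::collections::BTreeSet;

/// How a crate instance is used in the build (the "Type" field above).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DependencyKind {
    Normal,
    Build,
}

/// One crate instance in the SBOM. All field names are illustrative only.
#[derive(Debug, Clone)]
struct CrateEntry {
    id: String,                // opaque identifier
    source: String,            // registry / git / etc.
    dependencies: Vec<String>, // IDs of other entries
    kind: DependencyKind,
    features: BTreeSet<String>,
}

/// Build environment recorded once per SBOM file.
#[derive(Debug, Clone)]
struct BuildEnvironment {
    toolchain_version: String,
    profile_name: String,
    profile_values: Vec<(String, String)>,
}

#[derive(Debug, Clone)]
struct Sbom {
    environment: BuildEnvironment,
    crates: Vec<CrateEntry>,
}

/// Small helper to keep the sample data readable.
fn entry(id: &str, deps: &[&str], kind: DependencyKind, features: &[&str]) -> CrateEntry {
    CrateEntry {
        id: id.to_string(),
        source: "registry".to_string(),
        dependencies: deps.iter().map(|d| d.to_string()).collect(),
        kind,
        features: features.iter().map(|f| f.to_string()).collect(),
    }
}

/// Sample SBOM: "id-1" and "id-2" stand for the same crate compiled twice,
/// once as a normal dependency and once as a build dependency.
fn example_sbom() -> Sbom {
    Sbom {
        environment: BuildEnvironment {
            toolchain_version: "<toolchain version>".to_string(),
            profile_name: "release".to_string(),
            profile_values: vec![("opt-level".to_string(), "3".to_string())],
        },
        crates: vec![
            entry("id-0", &["id-1", "id-2"], DependencyKind::Normal, &[]),
            entry("id-1", &[], DependencyKind::Normal, &["std"]),
            entry("id-2", &[], DependencyKind::Build, &["std", "derive"]),
        ],
    }
}
```

Keeping the two instances as separate entries, rather than merging their features, is what lets a consumer report the features that were actually compiled into the final artifact.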
This RFC introduces yet another SBOM format. However, the format is specifically designed to be used as an intermediate, to be converted to an industry-standard format by external tooling.
Since there is no consensus on a single SBOM format within the software industry, and existing formats are still evolving, Cargo should not pick an existing SBOM format. If Cargo were to use existing SBOM formats, multiple formats (and multiple versions of each format) would need to be supported. The task of generating a specific SBOM format is best left to applications outside Cargo or to Cargo extensions.
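As a rough illustration of what such external tooling might start from, the sketch below flattens the dependency graph of one artifact into the set of crate IDs it transitively depends on. Standardized formats typically want a component list along these lines. It assumes the SBOM has already been parsed into a map from crate ID to dependency IDs. The function names and sample IDs are hypothetical and not part of this RFC.

```rust
use std::collections::{BTreeSet, HashMap};

/// Collect every crate ID reachable from `root` by following dependency
/// edges, including `root` itself. Unknown IDs are treated as leaves.
fn reachable(graph: &HashMap<String, Vec<String>>, root: &str) -> BTreeSet<String> {
    let mut seen = BTreeSet::new();
    let mut stack = vec![root.to_string()];
    while let Some(id) = stack.pop() {
        // `insert` returns false if the ID was already visited.
        if !seen.insert(id.clone()) {
            continue;
        }
        if let Some(deps) = graph.get(&id) {
            for dep in deps {
                if !seen.contains(dep) {
                    stack.push(dep.clone());
                }
            }
        }
    }
    seen
}

/// Sample graph with a shared (diamond) dependency; IDs are placeholders.
fn sample_graph() -> HashMap<String, Vec<String>> {
    let mut graph = HashMap::new();
    graph.insert("id-0".to_string(), vec!["id-1".to_string(), "id-2".to_string()]);
    graph.insert("id-1".to_string(), vec!["id-3".to_string()]);
    graph.insert("id-2".to_string(), vec!["id-3".to_string()]);
    graph.insert("id-3".to_string(), vec![]);
    graph
}
```

A converter would then map each ID in the resulting set to a component in its target format, using whatever per-crate fields the final Cargo SBOM format provides.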
Unfortunately, it's difficult to extract accurate SBOM information with existing options. Using the Cargo.lock file or cargo metadata overincludes dependencies. Additionally, since Cargo has many different commands that produce compiled artifacts (build, test, bench, etc.), and each of these commands takes arguments that can affect the dependency list, it's difficult to ensure that the correct dependency list is used.
Adding an option to cargo metadata to support resolver v2 would help with overinclusion of dependencies, but still makes it difficult to ensure the exact set of features, command-line arguments, and other options are taken into account.
Another alternative is to extract information by setting the RUSTC_WRAPPER environment variable, then capturing feature flags and dependencies via a wrapper tool. This would require the wrapper tool to parse the rustc command-line arguments to capture the set of feature flags and referenced dependencies. This approach would prevent other uses of RUSTC_WRAPPER and is potentially fragile.
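To make the fragility concrete, here is a minimal sketch of the parsing such a wrapper would need. It assumes Cargo passes each enabled feature as `--cfg` followed by `feature="name"`, and each dependency as `--extern` followed by `name=path`, as separate arguments. The `CapturedInvocation` type and `capture` function are hypothetical. A real wrapper would also have to run the actual compiler and cope with any other argument spellings.

```rust
/// Information scraped from a single rustc invocation (hypothetical type).
#[derive(Debug, Default, PartialEq)]
struct CapturedInvocation {
    features: Vec<String>,
    externs: Vec<(String, String)>, // (crate name, artifact path)
}

/// Scan rustc arguments for `--cfg feature="..."` and `--extern name=path`.
/// Anything not matching these exact shapes is silently ignored, which is
/// part of why this approach is fragile.
fn capture(args: &[String]) -> CapturedInvocation {
    let mut out = CapturedInvocation::default();
    let mut iter = args.iter();
    while let Some(arg) = iter.next() {
        match arg.as_str() {
            "--cfg" => {
                if let Some(value) = iter.next() {
                    // Keep only cfgs of the form feature="name".
                    if let Some(name) = value
                        .strip_prefix("feature=\"")
                        .and_then(|rest| rest.strip_suffix('"'))
                    {
                        out.features.push(name.to_string());
                    }
                }
            }
            "--extern" => {
                if let Some(value) = iter.next() {
                    // Entries without a path (no '=') are skipped.
                    if let Some((name, path)) = value.split_once('=') {
                        out.externs.push((name.to_string(), path.to_string()));
                    }
                }
            }
            _ => {}
        }
    }
    out
}
```

Even this small sketch has to guess at argument shapes. Any change in how Cargo invokes rustc could silently drop features or dependencies from the captured result.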
- RFC2801: proposes embedding dependency information directly into the binary. Implemented as the
- cargo-auditable: Cargo extension that embeds a subset of the information described in this RFC directly into the binary. The JSON format used by this RFC could be based on the cargo-auditable format.
- cargo-cyclonedx: Cargo extension to generate a CycloneDX SBOM.
- cargo-bom: Cargo extension to generate a BOM in an ASCII format including license information.
- cargo build-plan (#5579): provides an option to emit a JSON representation of the commands to execute, without actually running them. This option has poor integration with build.rs and was planned for deletion in 2018.
The exact specifics about what will be included in the SBOM and the specific JSON format are subject to change during the implementation of the RFC.
If the software industry converges on a single, stable SBOM format, Cargo could directly emit it. The existing SBOM formats are currently changing too much to standardize on a specific format.
Additional fields can be added to the SBOM without a breaking change.
Build scripts could communicate back to Cargo to inject additional dependencies into the SBOM. For example, if a crate builds C code and then links with it, it could emit a message that causes Cargo to read in a file describing the
Cargo would then include the additional dependency information in the SBOM graph.
The implementation of RFC2801 could be based on the information provided by this RFC. A subset of this information could be embedded directly into the binaries.