Discussion: Enhanced License Compliance for crates.io

Summary

I'd like to start a discussion about implementing enhanced license validation and compliance checking for crates published to crates.io to address critical gaps in license metadata accuracy that impact enterprise adoption, particularly in safety-critical and embedded systems.

Background & Problem Statement

The Current Issue

Rust's current package metadata system relies on developers manually declaring licenses in Cargo.toml, with no validation against actual source code licensing. This creates significant compliance risks when the declared license doesn't match the actual licensing terms found in source files.

Concrete Example

I recently analyzed the av1-grain crate (v0.2.3) and discovered a license compliance discrepancy:

Cargo.toml declares:

license = "BSD-2-Clause"

But source files (src/create.rs, src/lib.rs, src/parse.rs) contain:

// This source code is subject to the terms of the BSD 2 Clause License and
// the Alliance for Open Media Patent License 1.0.

This means the actual licensing is BSD-2-Clause AND AOMPL-1.0, not just BSD-2-Clause. This discrepancy is confirmed by Debian's package analysis, which correctly identifies both licenses.

(Note: I apologize for using av1-grain as a specific example - this appears to be an honest oversight by the maintainers, and this issue is likely widespread across the ecosystem. The goal is to prevent such issues systematically, not to criticize any particular crate.)

Additional Complication: Non-SPDX Licensed Code

There's an additional layer of complexity in this specific example: AOMPL-1.0 (Alliance for Open Media Patent License 1.0) is not currently a recognized SPDX license identifier. This creates a deadlock situation:

  1. Current Cargo requirement: License fields must use valid SPDX expressions
  2. Reality: The source code is actually licensed under AOMPL-1.0
  3. Impossible compliance: There's no way to accurately represent this licensing in current Cargo.toml format

This means even if the maintainer wanted to fix the license declaration, they couldn't do so using standard SPDX expressions. They would need to either:

  • Use license-file instead of license field
  • Wait for SPDX to add AOMPL-1.0 to their license list

This highlights that license validation tooling needs to handle non-SPDX licenses gracefully, which is common in:

  • Emerging standards (like AOM specifications)
  • Corporate proprietary licenses
  • Modified or custom open source licenses
  • Legacy licenses not yet in SPDX database

Why This Matters for Enterprise Rust Adoption

Safety-Critical Systems: Industries like automotive, aerospace, and medical devices require exhaustive license compliance for regulatory approval (ISO 26262, DO-178C, IEC 62304). Inaccurate license metadata can:

  • Invalidate safety certifications
  • Create legal liability
  • Require expensive re-certification processes

Enterprise Compliance: Organizations must generate accurate Software Bills of Materials (SBOMs) for:

  • Supply chain security requirements
  • Legal risk management
  • Customer contractual obligations
  • Regulatory compliance (EU Cyber Resilience Act, etc.)

Rust's Current Strengths

It's important to note that Rust already does significantly more for compliance than many other languages. The existing SPDX license validation in Cargo.toml, comprehensive metadata in Cargo.lock, and the centralized registry model already put Rust ahead of many ecosystems.

Looking at other language ecosystems:

  • npm/Node.js: Similar metadata-only approach with license field validation, but no source code verification
  • PyPI/Python: License information is often inconsistent or missing entirely
  • Maven/Java: Has license metadata but relies heavily on manual declaration
  • Go modules: Minimal license metadata in go.mod files
  • C/C++: Generally no standardized license metadata at package level

I'm not aware of any major language ecosystem that currently provides automated source-code-to-metadata license validation out of the box. This could be an opportunity for Rust to lead in this space, which would be particularly valuable given Rust's growing adoption in compliance-critical environments.

However, the gap between declared and actual licenses still creates real problems for enterprise adoption, especially where Rust's memory safety and performance advantages would be most beneficial.

Potential Approaches for Discussion

I'd like to hear the community's thoughts on these potential approaches:

Option 1: Mandatory Validation

Publishing Requirements:

  • All crates must pass automated license validation before publication
  • Source code headers must match Cargo.toml declarations
  • Required license files must be present for certain license types

Implementation:

# Proposed behavior (not implemented today)
cargo publish  # Would include mandatory license verification

Pros: Guarantees accuracy, eliminates the problem entirely

Cons: May break existing publishing workflows, could reduce contribution velocity

Option 2: Opt-in Strict Mode

New Cargo.toml field:

[package]
license = "BSD-2-Clause AND AOMPL-1.0"
license-validation = "strict"  # Opts into enhanced validation

Benefits:

  • Safety-critical projects can opt into strict validation
  • Gradual ecosystem adoption
  • Maintains backward compatibility
  • Creates "compliance tier" crates for enterprise use

Option 3: Warning System

Publishing flow:

$ cargo publish
Warning: License validation detected potential issues:
  - src/lib.rs contains AOMPL-1.0 license header
  - Cargo.toml only declares BSD-2-Clause
  - Consider updating license field to: "BSD-2-Clause AND AOMPL-1.0"
Continue publishing? [y/N]

Features:

  • Non-blocking warnings that educate maintainers
  • Gradual ecosystem improvement
  • Low friction adoption
  • Helps catch honest mistakes before publication

Questions for Discussion

  1. Is this a problem worth solving? Are others experiencing similar license compliance challenges in enterprise environments?
  2. Which approach feels most appropriate for the Rust ecosystem's culture and values?
  3. Technical feasibility: How complex would it be to implement reliable source code license detection? Should this be built into Cargo directly or as separate tooling that could eventually replace external compliance tools like FOSSology?
  4. Ecosystem impact: How do we balance compliance needs with maintaining Rust's excellent contributor experience?
  5. Non-SPDX licenses: How should we handle licenses that aren't yet recognized by SPDX? Should validation tooling:
     • Require license-file for non-SPDX licenses?
     • Support LicenseRef- prefixed custom identifiers?
     • Integrate with license recognition beyond just SPDX?
     • Provide workflows for submitting new licenses to SPDX?
  6. Scope: Should this cover all license types or focus on specific categories (e.g., copyleft licenses, patent-related licenses)?
  7. Transition path: How should existing crates be handled if we implement stricter validation?
  8. Alternative solutions: Are there other approaches I haven't considered that might address these compliance needs?

I'm particularly interested in hearing from:

  • Enterprise Rust users dealing with compliance requirements
  • Crate maintainers about the publishing experience impact
  • Anyone with experience in license compliance tooling
  • The Cargo team about technical feasibility

This could potentially position Rust as the leader in supply chain compliance tooling, which seems aligned with the language's focus on safety and reliability. But I want to make sure we're solving a real problem in a way that fits the ecosystem.

What are your thoughts?

7 Likes

How would cargo know what the license of individual files is?

Well I would say it depends on which route we want to take here, but most likely it will involve some form of license header parsing. It does not need to be very complex, but just check if there are license headers, or mentions of licenses, that are not inside the license = "..." key-value pair of the Cargo.toml and take some actions based off of that. Doing so would already solve some problems. If some package maintainer is using the license-file field instead, the licensing is already more complex than a base case and should most likely involve manual clearing.
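To make the idea concrete, here is a minimal sketch of that kind of header check, using only the standard library. The phrase table and the function name `detect_header_licenses` are made up for illustration and are not part of Cargo or any existing tool. Plain substring matching like this will produce false positives and false negatives, so treat it as a starting point for discussion only.

```rust
use std::collections::BTreeSet;

/// Hypothetical phrase table: (lowercase text to look for, identifier to report).
/// A real tool would need a much larger table or a proper license matcher.
const HEADER_PHRASES: &[(&str, &str)] = &[
    ("bsd 2 clause license", "BSD-2-Clause"),
    ("alliance for open media patent license 1.0", "AOMPL-1.0"),
    ("mit license", "MIT"),
    ("apache license, version 2.0", "Apache-2.0"),
];

/// Collects identifiers whose phrase appears in the leading `//` comment
/// block of a source file. Block comments and mid-file notices are ignored.
fn detect_header_licenses(source: &str) -> BTreeSet<&'static str> {
    // Join the leading run of comment lines (and blank lines) into one
    // lowercase string so that phrases wrapped across lines still match.
    let header = source
        .lines()
        .take_while(|line| {
            let t = line.trim_start();
            t.starts_with("//") || t.is_empty()
        })
        .map(|line| line.trim_start().trim_start_matches('/').trim())
        .collect::<Vec<_>>()
        .join(" ")
        .to_lowercase();

    HEADER_PHRASES
        .iter()
        .filter(|(phrase, _)| header.contains(*phrase))
        .map(|(_, id)| *id)
        .collect()
}
```

Calling `detect_header_licenses` on a file that starts with the av1-grain header quoted in the first post yields both `AOMPL-1.0` and `BSD-2-Clause`, which a tool could then compare with the license field.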

I think the way to get there is to first introduce this in external tooling, to see if the code to do this is robust enough and to see if there is enough uptake. It seems pretty niche to me -- most code I see doesn't even carry per-file license headers (which I see as a kind of anti-pattern anyway).

14 Likes

You may be right that this is primarily an enterprise/compliance-heavy use case. My perspective might be skewed by working in safety-critical embedded systems where license compliance is legally required. It would be valuable to gauge actual demand before investing significant effort.

I think I might start a tool like cargo-license-audit which will essentially scan all repos and search for mismatches between license headers and the licenses declared in Cargo.toml.

Would you see value in a tool that at least warns about discrepancies between Cargo.toml and detected licenses, even if per-file headers aren't the norm? Or do you think the current cargo/SPDX approach covers most real-world needs adequately?

Thanks for your feedback!

2 Likes

There does already exist tooling in this space, look into cargo-about and askalono.

I don't think this solves the issue. I did not interact with these tools beyond simply trying them and reading some documentation.

cargo-about generates attribution reports based on declared licenses in Cargo.toml. It would happily report av1-grain as "BSD-2-Clause" without catching that the source files also reference a second license.

askalono does excellent license file identification, but doesn't scan source code headers or validate consistency between different license sources.

There is also a tool that I like even more called cargo-deny. However, it has the exact same problems, which is why I started this thread.

The gap is validation/auditing rather than attribution/identification. We need tooling that tells the developer, "Hey, your Cargo.toml says BSD-2-Clause, but your source files indicate BSD-2-Clause AND AOMPL-1.0; there's a compliance problem here," or tells the dependency user, "Hey, the dependency xxxx says it's BSD-2-Clause licensed, but there are AOMPL-1.0 headers present in its code. This might be a compliance issue."
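A rough sketch of the comparison step, assuming the detected identifiers already come from some scanner. The function names `declared_ids` and `undeclared` are hypothetical. The expression handling is deliberately naive: it flattens the declared expression into a set of identifiers, matches them case-sensitively, and ignores the difference between AND and OR, so it is not a real SPDX parser.

```rust
use std::collections::BTreeSet;

/// Extracts identifiers from a simple SPDX-style expression.
/// Assumption: only AND / OR / WITH and parentheses are recognized, and an
/// exception name following WITH is treated like any other identifier.
fn declared_ids(expression: &str) -> BTreeSet<String> {
    expression
        .split(|c: char| c.is_whitespace() || c == '(' || c == ')')
        .filter(|tok| !tok.is_empty())
        .filter(|tok| !matches!(*tok, "AND" | "OR" | "WITH"))
        .map(str::to_string)
        .collect()
}

/// Returns the detected identifiers that the manifest's license expression
/// does not mention at all.
fn undeclared(declared_expr: &str, detected: &[&str]) -> Vec<String> {
    let declared = declared_ids(declared_expr);
    detected
        .iter()
        .filter(|id| !declared.contains(**id))
        .map(|id| id.to_string())
        .collect()
}
```

With the declared expression `BSD-2-Clause` and the detected identifiers `BSD-2-Clause` and `AOMPL-1.0`, `undeclared` returns only `AOMPL-1.0`. That is the kind of finding a warning could surface, leaving the actual legal assessment to a human.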

None of these tools would have caught the av1-grain issue I discovered, where FOSSology found licensing discrepancies that cargo-deny missed entirely.

So while those tools are valuable for their intended purposes, there's still a need for license consistency auditing across the dependency tree - especially for enterprise compliance where accurate SBOMs are legally required.

Does that distinction make sense?

3 Likes

What evidence do you have for this? One datapoint is not statistically significant.

There is very little reason to repeat the license from the cargo file in the source files. I consider it pure noise and don't do so in my own code at all. I'm strongly opposed to requiring that. Stating the license once for the entire repo should be enough (and you can state any deviation in the files that differ where those files are less restrictive in license).

No, all your files can be equally wrong, and you can't know. If you need to care you need to do it manually anyway.

2 Likes

SPDX does have a way to refer to unknown licenses.
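For reference, that mechanism is the LicenseRef- prefix mentioned in the original post. Below is a minimal sketch of a shape check for such identifiers, assuming the SPDX rule that the part after the prefix consists of ASCII letters, digits, `-` and `.`. The function name `is_license_ref` is made up for illustration, and the sketch says nothing about whether Cargo or crates.io accept such identifiers.

```rust
/// Returns true if `id` has the shape of an SPDX user-defined license
/// reference: the literal prefix "LicenseRef-" followed by one or more
/// ASCII letters, digits, '-' or '.'.
fn is_license_ref(id: &str) -> bool {
    match id.strip_prefix("LicenseRef-") {
        Some(rest) => {
            !rest.is_empty()
                && rest
                    .chars()
                    .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '.')
        }
        None => false,
    }
}
```

Under this check, a name like `LicenseRef-AOMPL-1.0` passes while the bare `AOMPL-1.0` does not.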

Likely a first step in any of this is

As for license headers when people want them, we should likely direct people to dedicated tools and processes, like https://reuse.software/, rather than invent it ourselves.

There are still the points of

  • NOTICE file generation for final artifacts. This is likely best developed outside of Cargo
  • License compliance checks. This would be Cargo giving legal advice which seems risky. As an alternative, cargo-deny has a license allowlist/denylist which helps users with the license choices they've already made

1 Like

You're absolutely right that one example isn't statistically significant. I made an assumption based on finding this on my first random check, but that's not rigorous evidence.

I also think there is no good reason to have license headers in code. However, if there are any license headers and they differ from the license in Cargo.toml, that is a problem for anyone using the dependency. Thus, if headers are present, they must match.

If there are differences between license headers and licenses in the Cargo.toml I think there is no way around a manual review anyway as a dependency user. However, cargo may warn dependency developers when there are discrepancies.

I just think that supporting this process from the Cargo side would strengthen Rust's position in enterprise and supply chain contexts.

No matter what the details are, license auditing has the potential to put significant amounts of extra work on crate maintainers. Since license auditing is mostly of value to large for-profit organizations (as you point out yourself), I think that they ought to be prepared to pay for this work to be done, including all the exploratory work of figuring out what should be done.

I don't think anyone should put even a minute's more effort into any aspect of this proposal other than finding out if anyone is actually willing to pay for the work. As long as nobody is willing to pay for the work, the Rust community should actively refuse to do anything about it.

14 Likes

Detection of licensing issues from code comments might be a tricky problem, but there are other things that Cargo could do to improve licensing overall:

  • Licenses often require the license text or a notice to be distributed alongside software. It would be helpful if Cargo detected when there's an SPDX identifier for such a license, but no corresponding notice. The notices require specific text to be used, so it's easier to detect a notice than a license name from arbitrary comments.

  • These requirements can apply to binaries, requiring all licenses from all deps to be collected for redistribution. cargo-about does it, but gathering the list of crates and their dependencies is an involved process. Cargo could help by emitting a list of the deps linked into the binary and their license/license-files as part of the build (so that instead of using a tool that parses manifests and fetches stuff from the registry, a minimal way to be compliant could be something like cat target/release/mybinary-licenses/* > dist/COPYING). Or maybe cargo tree -e license?

  • I'd prefer license-file to be allowed alongside license identifiers, and behave more like readme and build properties, bundling LICENSE or COPYING files by default.

1 Like

Even this has surprising complexity:

  • Do you normalise the license file for newlines (LF vs CRLF)?
  • What about the BSD and MIT license where you are supposed to replace the copyright and year?
  • GPL for example is also officially available as a markdown file, LaTeX, reStructuredText, or even RTF or ODF. What happens if someone runs a markdown formatter on it with a different line length? It is going to render the same, but that doesn't seem like an easy level to compare at. The RTF and ODF would be even worse to compare.

There are probably more difficulties that I'm not thinking about. (And I myself have used the markdown version of GPLv3 as I prefer having nicely formatted files in my repo. I have not run a markdown formatter on that specific file though.)
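To illustrate how far simple normalization gets, here is a sketch that addresses only the first two points and plain re-wrapping. The function names are hypothetical. It assumes a copyright notice fits on a single line beginning with "Copyright", and it makes no attempt to strip markdown markup or to read RTF or ODF.

```rust
/// Normalizes license text so that purely cosmetic differences
/// (line endings, wrapping, letter case, the copyright line) are ignored.
fn normalize_license_text(text: &str) -> String {
    text.lines()
        // Drop single-line copyright notices, which legitimately differ
        // between projects (holder name and year).
        .filter(|line| !line.trim_start().to_lowercase().starts_with("copyright"))
        // Splitting into words absorbs CRLF vs LF and re-wrapped paragraphs.
        .flat_map(|line| line.split_whitespace())
        .map(|word| word.to_lowercase())
        .collect::<Vec<_>>()
        .join(" ")
}

/// Compares two license texts after normalization.
fn same_license_text(a: &str, b: &str) -> bool {
    normalize_license_text(a) == normalize_license_text(b)
}
```

Two copies of the same license with different line endings, wrapping, and copyright holders compare equal under `same_license_text`. A markdown-formatted copy with headings or emphasis markers would still fail to match, which is the harder part of the problem described above.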

This proposal could have stood to use a bit less LLM.

3 Likes

I think there may be a misunderstanding about my intent. I'm not asking the volunteer community to build this. I'm exploring whether there's enough enterprise interest to justify company-funded development that could benefit the broader Rust ecosystem.

My actual position: Companies with compliance requirements should fund the development of tooling they need. However, if multiple enterprises face the same challenges, it makes sense for them to collaborate on open source solutions rather than each building proprietary tools in isolation.

When companies contribute compliance tooling back to the ecosystem, it may create a cycle:

  • Enterprises get the compliance capabilities they need
  • Rust becomes more attractive to other regulated industries
  • The broader ecosystem benefits from better tooling infrastructure
  • Future enterprises have lower barriers to Rust adoption

Even if no companies step forward to fund this work, the discussion helps clarify what compliance challenges actually exist, whether there's real enterprise demand, and what approaches might be feasible if someone does want to fund development.

I'm not expecting volunteer maintainers to take on enterprise compliance work for free, but I am interested in whether enterprises and the Rust community see enough value to collaborate on shared solutions.

2 Likes

AOMPL isn't really a good example, since it's a patent grant, not a copyright license. The right way to declare patents and patent licenses isn't obvious, and that's probably why it wasn't part of the license field.

2 Likes

https://github.com/rust-lang/rfcs/pull/3553 is relevant. We are limiting it to unique content, meant to be paired with other tools like cargo metadata. IIRC it doesn't yet allow build.rs to include arbitrary content; that was left to the next phase.

1 Like

Oh, and this exists, cargo tree --edges no-dev --format "{p} ({l})"

1 Like

I don't understand why safety critical software should be highlighted here in any way. Everyone using stuff made by others needs to comply with any agreements that may allow them to do so.

2 Likes

It's for social reasons, not technical. When you aren't trying to meet strict legal certification requirements, a policy of "assume good faith" generally works, and errors can themselves be fixed in good faith without undue strain on any individual actor.

When you are under certification scrutiny, however, you don't get to "assume good faith" anymore. You have to actually go and show that you and all of the upstream you are consuming comply with the often outdated and well-intentioned but ill-implemented guidelines, and if even one little thing is found to be wrong, you're down a lot of money and need to start the whole certification process completely over.

(Or at least, that's how people who aren't under such scrutiny perceive that world as working. I've heard some stories that might suggest it's not actually that different, even if sometimes it should be.)

So these areas get highlighted because they're the only cases likely to even care strongly enough for this to be a problem. Most other companies will take the much easier route of blaming the provider for distributing the library under a license that they didn't have the authority to grant, if any actual incompatibilities ever come up.

5 Likes