Discussion: Enhanced License Compliance for crates.io
Summary
I'd like to start a discussion about adding enhanced license validation and compliance checking for crates published to crates.io. The goal is to close gaps in license metadata accuracy that affect enterprise adoption, particularly in safety-critical and embedded systems.
Background & Problem Statement
The Current Issue
Rust's current package metadata system relies on developers manually declaring licenses in Cargo.toml, with no validation against actual source code licensing. This creates significant compliance risks when the declared license doesn't match the actual licensing terms found in source files.
Concrete Example
I recently analyzed the av1-grain crate (v0.2.3) and discovered a license compliance discrepancy:
Cargo.toml declares:
license = "BSD-2-Clause"
But source files (src/create.rs, src/lib.rs, src/parse.rs) contain:
// This source code is subject to the terms of the BSD 2 Clause License and
// the Alliance for Open Media Patent License 1.0.
This means the actual licensing is BSD-2-Clause AND AOMPL-1.0, not just BSD-2-Clause. This discrepancy is confirmed by Debian's package analysis, which correctly identifies both licenses.
(Note: I apologize for using av1-grain as a specific example - this appears to be an honest oversight by the maintainers, and this issue is likely widespread across the ecosystem. The goal is to prevent such issues systematically, not to criticize any particular crate.)
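To make the mismatch concrete, here is a minimal sketch of the kind of check I have in mind. It is only an illustration: the phrase table, the function names, and the naive expression splitting are all my own assumptions, not anything Cargo or crates.io provides. A real tool would need a maintained license-text database and a proper SPDX expression parser.

```rust
use std::collections::BTreeSet;

/// Hypothetical phrase table mapping header wording to license identifiers.
/// Only the two phrases from the example header are covered.
const HEADER_PHRASES: &[(&str, &str)] = &[
    ("bsd 2 clause license", "BSD-2-Clause"),
    ("alliance for open media patent license 1.0", "AOMPL-1.0"),
];

/// Collect identifiers whose phrase appears in the leading `//` comment block.
fn licenses_in_header(source: &str) -> BTreeSet<&'static str> {
    // Join the leading comment lines so phrases split across lines still match.
    let header = source
        .lines()
        .take_while(|line| line.trim_start().starts_with("//"))
        .map(|line| line.trim_start().trim_start_matches('/').trim())
        .collect::<Vec<_>>()
        .join(" ")
        .to_lowercase();

    HEADER_PHRASES
        .iter()
        .filter(|(phrase, _)| header.contains(*phrase))
        .map(|(_, id)| *id)
        .collect()
}

/// Extract identifiers from a simple expression such as "MIT OR Apache-2.0".
/// Operators and parentheses are dropped; this is not a full SPDX parser.
fn declared_licenses(expression: &str) -> BTreeSet<String> {
    expression
        .split(|c: char| c.is_whitespace() || c == '(' || c == ')')
        .filter(|tok| !tok.is_empty() && !matches!(*tok, "AND" | "OR" | "WITH"))
        .map(String::from)
        .collect()
}

/// Licenses mentioned in the header but absent from the declared expression.
fn undeclared_licenses(declared: &str, source: &str) -> Vec<&'static str> {
    let declared = declared_licenses(declared);
    licenses_in_header(source)
        .into_iter()
        .filter(|id| !declared.contains(*id))
        .collect()
}
```

Given the header quoted above and a declared license of BSD-2-Clause, undeclared_licenses reports AOMPL-1.0. With both identifiers declared it returns an empty list.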
Additional Complication: Non-SPDX Licensed Code
There's an additional layer of complexity in this specific example: AOMPL-1.0 (Alliance for Open Media Patent License 1.0) is not currently a recognized SPDX license identifier. This creates a deadlock situation:
- Current requirement: crates.io expects the license field to be a valid SPDX expression
- Reality: The source code is actually licensed under AOMPL-1.0
- Impossible compliance: There's no way to accurately represent this licensing in the license field of Cargo.toml
This means even if the maintainer wanted to fix the license declaration, they couldn't do so using standard SPDX expressions. They would need to either:
- Use license-file instead of the license field
- Wait for SPDX to add AOMPL-1.0 to their license list
This highlights that license validation tooling needs to handle non-SPDX licenses gracefully, which is common in:
- Emerging standards (like AOM specifications)
- Corporate proprietary licenses
- Modified or custom open source licenses
- Legacy licenses not yet in SPDX database
Why This Matters for Enterprise Rust Adoption
Safety-Critical Systems: Industries like automotive, aerospace, and medical devices require exhaustive license compliance for regulatory approval (ISO 26262, DO-178C, IEC 62304). Inaccurate license metadata can:
- Invalidate safety certifications
- Create legal liability
- Require expensive re-certification processes
Enterprise Compliance: Organizations must generate accurate Software Bills of Materials (SBOMs) for:
- Supply chain security requirements
- Legal risk management
- Customer contractual obligations
- Regulatory compliance (EU Cyber Resilience Act, etc.)
Rust's Current Strengths
It's important to note that Rust already does significantly more for compliance than many other languages. The existing SPDX license validation in Cargo.toml, comprehensive metadata in Cargo.lock, and the centralized registry model already put Rust ahead of many ecosystems.
Looking at other language ecosystems:
- npm/Node.js: Similar metadata-only approach with license field validation, but no source code verification
- PyPI/Python: License information is often inconsistent or missing entirely
- Maven/Java: Has license metadata but relies heavily on manual declaration
- Go modules: Minimal license metadata in go.mod files
- C/C++: Generally no standardized license metadata at package level
I'm not aware of any major language ecosystem that currently provides automated source-code-to-metadata license validation out of the box. This could be an opportunity for Rust to lead in this space, which would be particularly valuable given Rust's growing adoption in compliance-critical environments.
However, the gap between declared and actual licenses still creates real problems for enterprise adoption, especially where Rust's memory safety and performance advantages would be most beneficial.
Potential Approaches for Discussion
I'd like to hear the community's thoughts on these potential approaches:
Option 1: Mandatory Validation
Publishing Requirements:
- All crates must pass automated license validation before publication
- Source code headers must match Cargo.toml declarations
- Required license files must be present for certain license types
Implementation:
# Enhanced cargo publish validation
cargo publish  # would now include mandatory license verification
Pros: Guarantees accuracy, eliminates the problem entirely
Cons: May break existing publishing workflows, could reduce contribution velocity
Option 2: Opt-in Strict Mode
New Cargo.toml field:
[package]
license = "BSD-2-Clause AND AOMPL-1.0"
license-validation = "strict" # Opts into enhanced validation
Benefits:
- Safety-critical projects can opt into strict validation
- Gradual ecosystem adoption
- Maintains backward compatibility
- Creates "compliance tier" crates for enterprise use
Option 3: Warning System
Publishing flow:
$ cargo publish
Warning: License validation detected potential issues:
- src/lib.rs contains AOMPL-1.0 license header
- Cargo.toml only declares BSD-2-Clause
- Consider updating license field to: "BSD-2-Clause AND AOMPL-1.0"
Continue publishing? [y/N]
Features:
- Non-blocking warnings that educate maintainers
- Gradual ecosystem improvement
- Low friction adoption
- Helps catch honest mistakes before publication
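The three options differ mainly in what happens once a mismatch has been detected. A rough sketch of that policy decision follows. Every name in it (ValidationMode, PublishDecision, decide) is hypothetical, and nothing like this exists in Cargo today.

```rust
/// Hypothetical validation levels; none of these exist in Cargo today.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ValidationMode {
    /// Option 3: report mismatches but let the publish continue.
    Warn,
    /// Options 1 and 2: any mismatch blocks the publish.
    Strict,
}

/// Outcome of the license check for a single publish attempt.
#[derive(Debug, PartialEq, Eq)]
enum PublishDecision {
    Proceed,
    ProceedWithWarnings(Vec<String>),
    Reject(Vec<String>),
}

/// Turn a list of undeclared license identifiers into a publish decision.
/// `undeclared` is assumed to come from some header scanner; `declared` is
/// the license expression taken from Cargo.toml.
fn decide(mode: ValidationMode, undeclared: &[&str], declared: &str) -> PublishDecision {
    if undeclared.is_empty() {
        return PublishDecision::Proceed;
    }
    let messages: Vec<String> = undeclared
        .iter()
        .map(|id| format!("source headers mention {id}, but Cargo.toml only declares {declared}"))
        .collect();
    match mode {
        ValidationMode::Warn => PublishDecision::ProceedWithWarnings(messages),
        ValidationMode::Strict => PublishDecision::Reject(messages),
    }
}
```

Under this framing, Option 1 would mean Strict for every crate, Option 2 would mean Strict only where the maintainer opts in, and Option 3 would mean Warn everywhere.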
Questions for Discussion
- Is this a problem worth solving? Are others experiencing similar license compliance challenges in enterprise environments?
- Which approach feels most appropriate for the Rust ecosystem's culture and values?
- Technical feasibility: How complex would it be to implement reliable source code license detection? Should this be built into Cargo directly or as separate tooling that could eventually replace external compliance tools like FOSSology?
- Ecosystem impact: How do we balance compliance needs with maintaining Rust's excellent contributor experience?
- Non-SPDX licenses: How should we handle licenses that aren't yet recognized by SPDX? Should validation tooling:
- Require license-file for non-SPDX licenses?
- Support LicenseRef-prefixed custom identifiers?
- Integrate with license recognition beyond just SPDX?
- Provide workflows for submitting new licenses to SPDX?
- Scope: Should this cover all license types or focus on specific categories (e.g., copyleft licenses, patent-related licenses)?
- Transition path: How should existing crates be handled if we implement stricter validation?
- Alternative solutions: Are there other approaches I haven't considered that might address these compliance needs?
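On the non-SPDX question, one possible shape for the LicenseRef- idea is sketched below. This is an assumption-laden illustration: the short KNOWN_SPDX_SAMPLE list stands in for the full SPDX license list, the type and function names are made up, and whether crates.io would accept LicenseRef- identifiers at all is exactly the open question.

```rust
/// How a validator might treat a single license identifier (hypothetical).
#[derive(Debug, PartialEq, Eq)]
enum LicenseId<'a> {
    /// Found in the (sample) SPDX list.
    Spdx(&'a str),
    /// A user-defined identifier of the form "LicenseRef-<idstring>".
    Custom(&'a str),
    /// Neither; a validator could ask for license-file here.
    Unrecognized(&'a str),
}

/// Tiny stand-in for the real SPDX license list.
const KNOWN_SPDX_SAMPLE: &[&str] = &["MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause"];

fn classify(id: &str) -> LicenseId<'_> {
    if KNOWN_SPDX_SAMPLE.iter().any(|known| *known == id) {
        return LicenseId::Spdx(id);
    }
    // SPDX allows letters, digits, '.' and '-' after the "LicenseRef-" prefix.
    match id.strip_prefix("LicenseRef-") {
        Some(rest)
            if !rest.is_empty()
                && rest
                    .chars()
                    .all(|c| c.is_ascii_alphanumeric() || c == '.' || c == '-') =>
        {
            LicenseId::Custom(id)
        }
        _ => LicenseId::Unrecognized(id),
    }
}
```

With this sketch, BSD-2-Clause classifies as Spdx, LicenseRef-AOMPL-1.0 as Custom, and a bare AOMPL-1.0 as Unrecognized, which is where a license-file requirement could kick in.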
I'm particularly interested in hearing from:
- Enterprise Rust users dealing with compliance requirements
- Crate maintainers about the publishing experience impact
- Anyone with experience in license compliance tooling
- The Cargo team about technical feasibility
This could position Rust as a leader in supply chain compliance tooling, which seems aligned with the language's focus on safety and reliability. But I want to make sure we're solving a real problem in a way that fits the ecosystem.
What are your thoughts?