[Pre-RFC] Lang-Level License Management

Summary

  • Add a new compiler built-in macro:
    • licenses!(): a list of all in-project and upstream licenses.
  • Add to the rustc CLI:
    • rustc --emit licenses: emit the human-readable license list as part of the compiler output
    • rustc --emit licenses-json: emit the JSON-formatted license list as part of the compiler output
    • rustc --licenses PATH: path to a JSON-formatted list of the licenses used by user-compiled code
  • Add to cargo
    • [licenses] section in Cargo.toml: configure how Cargo generates the license list passed to rustc

Motivation

Every single piece of code in the Rust ecosystem, from a simple "Hello, World" example to a complex application with hundreds of dependencies through Cargo, depends on licensed open-source software. At the lowest level, projects depend on libcore (dual-licensed under the MIT and Apache licenses) and libstd (libcore's licenses, and assorted licenses from libstd's Cargo dependencies), and most real-world binaries directly and transitively depend on tens or hundreds of libraries from Crates.io. The vast majority of that code is made available under licenses that require attribution to the authors and/or reproduction of the appropriate license notice. Manually compiling and maintaining that information is out of the question for pretty much all users.

To my knowledge, there is no tooling that automatically generates the appropriate upstream license file. As a result, I have been unable to find a single Rust project that appropriately complies with all of its upstream license agreements. This includes:

  • Servo
  • Ripgrep
  • Rustup
  • Rustc (includes licenses for C dependencies, but not Rust dependencies)
  • Cargo (has LICENSE-THIRD-PARTY file that has not been updated with new dependencies since 2014).

Getting this right is really important, but barely anyone bothers since it's such a massive undertaking. If Rust's flagship projects can't get this right, what hope do average users have to confidently manage it correctly without assistance? As such, the standard Rust tooling should include tools that Just Do The Right Thing, so that Rust projects can be fearlessly legally compliant.

Guide-level explanation

The standard Rust distribution provides tooling to help automatically manage upstream licenses. At a high level, the tooling can be split into two categories: license specification and license retrieval.

License Specification

In Rustc

rustc --licenses PATH is the lowest-level brick in the Rust licensing toolchain. It takes a path to a JSON-formatted map between licenses, license text, and libraries.

As an example, the following block is a licenses file that contains serde and rand, used under the Apache license, as well as syn, libc, and cfg-if, used under the MIT license.

{
    "Apache-2.0": {
        "libraries": ["serde", "rand"],
        "text": "{apache license text}"
    },
    "MIT: The Rust Project Developers": {
        "libraries": ["libc"],
        "text": "Copyright 2014 The Rust Project Developers\n\nPermission is hereby granted {...the rest of the MIT license}"
    },
    "MIT: Alex Crichton": {
        "libraries": ["cfg-if"],
        "text": "Copyright 2014 Alex Crichton\n\nPermission is hereby granted {...the rest of the MIT license}"
    },
    "MIT": {
        "libraries": ["syn"],
        "text": "Permission is hereby granted {...the rest of the MIT license}"
    }
}

Note how the Apache 2.0 section contains two libraries from two different authors, as the Apache license text does not contain the copyright holder's name, while there are three different MIT sections:

  • One for libc, under The Rust Project Developers
  • One for cfg-if, under Alex Crichton
  • One for syn, where the license has no listed copyright holder

Rustc combines this file with a built-in licenses file that includes licenses from Rust's implicit dependencies. For example, if the user were building their crate in a no_std context, rustc would combine the user's file with the following file:

{
    "MIT: The Rust Project Developers": {
        "libraries": ["libcore"],
        "text": "Copyright 2014 The Rust Project Developers\n\nPermission is hereby granted {...the rest of the MIT license}"
    }
}

Resulting in the following license map (assuming we use the example user-provided license file above):

{
    "Apache-2.0": {
        "libraries": ["serde", "rand"],
        "text": "{apache license text}"
    },
    "MIT: The Rust Project Developers": {
        "libraries": ["libc", "libcore"],
        "text": "Copyright 2014 The Rust Project Developers\n\nPermission is hereby granted {...the rest of the MIT license}"
    },
    "MIT: Alex Crichton": {
        "libraries": ["cfg-if"],
        "text": "Copyright 2014 Alex Crichton\n\nPermission is hereby granted {...the rest of the MIT license}"
    },
    "MIT": {
        "libraries": ["syn"],
        "text": "Permission is hereby granted {...the rest of the MIT license}"
    }
}

Rustc then renders the combined file into a human-readable version that can be included in downstream applications. This RFC does not specify how the human-readable version should be formatted. Rustc's built-in file would contain more dependencies when users compile with libstd.
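
The merge step described above can be sketched in a few lines of Rust. This is only an illustration of the intended semantics, not how rustc would implement it: LicenseEntry, LicenseMap, and merge_license_maps are hypothetical names, and the sketch assumes that two entries sharing a key also share the same license text.

```rust
use std::collections::BTreeMap;

/// One entry in a license map: the libraries covered and the license text.
/// (Hypothetical type; the RFC only specifies the JSON shape.)
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LicenseEntry {
    pub libraries: Vec<String>,
    pub text: String,
}

/// License map keyed by "SPDX identifier[: copyright holder]".
pub type LicenseMap = BTreeMap<String, LicenseEntry>;

/// Merge `builtin` into `user`, taking the union of the library lists of
/// entries that share a key.
/// Assumption: entries with the same key carry the same text, so the text
/// already present in `user` is kept.
pub fn merge_license_maps(mut user: LicenseMap, builtin: LicenseMap) -> LicenseMap {
    for (key, entry) in builtin {
        let LicenseEntry { libraries, text } = entry;
        // Create the entry if the user-provided map did not have this key.
        let merged = user.entry(key).or_insert_with(|| LicenseEntry {
            libraries: Vec::new(),
            text,
        });
        // Append libraries that are not listed yet, preserving order.
        for library in libraries {
            if !merged.libraries.contains(&library) {
                merged.libraries.push(library);
            }
        }
    }
    user
}
```

Merging the two example files above with this function would put libc and libcore under the same "MIT: The Rust Project Developers" key and leave the other entries untouched.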

In Cargo

The user generally shouldn't have to hand-write the license file. Instead, it gets automatically generated by Cargo based on the crate's dependency tree and passed into rustc through the standard build process.

If a crate is multi-licensed, Cargo will default to using the first license in the multi-license list. However, users can specify a preferred license via the [licenses] section in Cargo.toml:

[licenses]
prefer = ["Apache-2.0", "MIT", "Zlib"]

If present, Cargo will prefer the earliest matching license in the prefer list.
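
As a rough sketch of that selection rule (choose_license is a hypothetical helper, not an existing Cargo function), assuming the crate's licenses have already been split into a list:

```rust
/// Pick the license to use for a crate.
/// `available` is the crate's multi-license list in declaration order;
/// `prefer` is the `[licenses] prefer` list from Cargo.toml (may be empty).
pub fn choose_license<'a>(available: &[&'a str], prefer: &[&str]) -> Option<&'a str> {
    // The earliest entry in `prefer` that the crate actually offers wins.
    for wanted in prefer {
        if let Some(found) = available.iter().copied().find(|candidate| candidate == wanted) {
            return Some(found);
        }
    }
    // Otherwise fall back to the first license the crate lists.
    available.first().copied()
}
```

With the prefer list above, a crate licensed as MIT plus Apache-2.0 would resolve to Apache-2.0, while a crate offering only ISC would still resolve to ISC through the fallback.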

If a crate has dependencies with licenses Cargo cannot detect (e.g. in FFI crates), it can specify external licenses via the licenses.external field, which contains a path to a JSON licenses file relative to the crate's root:

[licenses]
external = "./THIRD-PARTY-LICENSES.json"

Cargo merges the provided license file with its auto-generated license file before passing the file into rustc.

License Retrieval

Users can access the license data either directly in the source code or through the filesystem alongside the crate's build artifacts. Both methods are provided so that crate authors can distribute license information as is best suited for their particular application: CLI applications that are distributed via a single binary may want to expose the license information through a command-line argument, while dynamic libraries or applications with more complex distribution methods may want to include the license information as a file that gets distributed alongside the binary.

A compiler built-in macro is provided to retrieve the license information in the source code:

macro_rules! licenses { () => { /* compiler built-in */ } }

licenses!() returns a list of all licenses used by the library as an &'static LicenseList. LicenseList implements Display so that you can use println! to display all the necessary licenses in human-readable form:

println!("{}", licenses!());

LicenseList dereferences to [License] to allow you to do more complex structural manipulations on the license list (say, if you'd like to use custom formatting that better suits your application):

for license in licenses!().iter() {
    println!("# {}", license.name);
    print!("Used by ");
    for library in license.libraries {
        print!("{} ", library);
    }
    println!();
    println!();
    println!("{}", license.text);
}

Users can specify licenses or licenses-json in rustc's --emit argument, which outputs the human-readable and JSON-formatted license files to the filesystem alongside the standard build outputs. Cargo passes --emit licenses to rustc by default, but this can be customized with cargo rustc --emit.

Reference-level explanation

License resolution

The license map is formatted as follows:

{
    "SPDX License Identifier: Optional Copyright Holder": {
        "libraries": ["library0", "library1", "library2"],
        "text": "License Text"
    }
}

Keys in the license map should be formatted as an SPDX license identifier (when available), optionally followed by a colon and the copyright holder's name if the license text is customized to each particular copyright holder. This isn't strictly enforced, but is followed by all official tooling. If the license identifier contains a colon, it can be escaped with two consecutive colons (ANNOYING:IDENTIFIER -> ANNOYING::IDENTIFIER).
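
To make the key convention concrete, here is a small sketch of building and parsing such keys. format_key and parse_key are hypothetical helpers; since the format isn't strictly enforced, real tooling would also need to tolerate keys that don't follow it.

```rust
/// Build a license-map key from an identifier and an optional copyright holder.
/// Colons inside the identifier are escaped as "::".
pub fn format_key(identifier: &str, holder: Option<&str>) -> String {
    let escaped = identifier.replace(':', "::");
    match holder {
        Some(name) => format!("{}: {}", escaped, name),
        None => escaped,
    }
}

/// Split a key back into (identifier, holder), undoing the "::" escape.
pub fn parse_key(key: &str) -> (String, Option<String>) {
    let mut identifier = String::new();
    let mut chars = key.char_indices().peekable();
    while let Some((index, ch)) = chars.next() {
        if ch != ':' {
            identifier.push(ch);
        } else if matches!(chars.peek(), Some((_, ':'))) {
            // "::" is an escaped literal colon.
            chars.next();
            identifier.push(':');
        } else {
            // A single colon separates the identifier from the holder.
            let holder = key[index + 1..].trim_start().to_string();
            return (identifier, Some(holder));
        }
    }
    (identifier, None)
}
```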

The --licenses argument is only valid for crate-types that produce a binary file intended for redistribution (bin, dylib, staticlib, and cdylib). Other crate types will use the --licenses file specified by the nearest parent crate that accepts the --licenses flag. Attempting to pass the --licenses flag on an invalid crate type will result in an error. This is done to facilitate build artifact sharing. To illustrate, let's say we have a workspace that depends on a licenses_markdown crate, which calls licenses!() and renders it into a Markdown file. Multiple crates in the workspace depend on this library:

workspace_crate_a: cdylib
    licenses_markdown: rlib 
    serde: rlib
    serde_json: rlib
workspace_crate_b: cdylib
    licenses_markdown: rlib
    winit: rlib
    glutin: rlib

workspace_crate_a and workspace_crate_b have different dependency trees, and as such have different license requirements. If the license data were baked into the licenses_markdown build artifacts, Cargo would have to recompile the crate whenever it got linked to a different top-level crate. Deferring licenses resolution to the higher-level crates allows Cargo to re-use the build artifacts from the lower-level crates when building with different parents. Now, let's say we add an additional crate that linked to workspace_crate_a, workspace_crate_b, and licenses_markdown:

workspace_executable: bin
    licenses_markdown: rlib 
    workspace_crate_a: cdylib
        licenses_markdown: rlib 
        serde: rlib
        serde_json: rlib
    workspace_crate_b: cdylib
        licenses_markdown: rlib
        winit: rlib
        glutin: rlib

licenses!(), as invoked for workspace_crate_a and workspace_crate_b, would continue to only contain the licenses they directly depend on (so, workspace_crate_a would not include a license for winit). However, workspace_executable would include licenses from the entire dependency tree, regardless of whether they were behind a cdylib or not. Bear in mind, if the workspace_crates were rlibs, all invocations of licenses!() would return the same value.
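
One way to picture this resolution is as a reachability walk over the dependency graph, starting at whichever crate accepts the --licenses flag. The sketch below is only a model of that idea: reachable is a hypothetical helper, the graph is a plain map from crate name to direct dependencies, and the root crate's own license is deliberately left out.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Collect every crate reachable from `root`, i.e. the libraries whose
/// licenses would be reported for that top-level crate in this model.
pub fn reachable<'a>(
    graph: &BTreeMap<&'a str, Vec<&'a str>>,
    root: &'a str,
) -> BTreeSet<&'a str> {
    let mut seen = BTreeSet::new();
    let mut stack = vec![root];
    while let Some(name) = stack.pop() {
        // Crates without an entry in the graph have no dependencies.
        if let Some(deps) = graph.get(name) {
            for dep in deps {
                // Only descend into crates we have not visited yet.
                if seen.insert(*dep) {
                    stack.push(*dep);
                }
            }
        }
    }
    seen
}
```

Run against the trees above, the set for workspace_crate_a would contain serde but not winit, while the set for workspace_executable would contain both.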


Cargo splits the license string into multiple licenses on any of the following patterns:

  • /
  • ,
  • \bOR\b

Additional patterns may be added as discovered in the ecosystem.
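
A minimal sketch of that splitting, using only the standard library. split_licenses is a hypothetical name, and the \bOR\b pattern is approximated here by a whitespace-delimited OR token, so something like "(MIT)OR(Zlib)" would not be split by this version.

```rust
/// Split a Cargo `license` string into individual licenses.
/// Assumption: `\bOR\b` is approximated by a standalone, whitespace-delimited
/// `OR` token rather than a true regex word boundary.
pub fn split_licenses(license: &str) -> Vec<String> {
    let mut result = Vec::new();
    // First split on the single-character separators.
    for piece in license.split(|c: char| c == '/' || c == ',') {
        let mut current: Vec<&str> = Vec::new();
        // Then split each piece on standalone `OR` tokens.
        for word in piece.split_whitespace() {
            if word == "OR" {
                if !current.is_empty() {
                    result.push(current.join(" "));
                    current.clear();
                }
            } else {
                current.push(word);
            }
        }
        if !current.is_empty() {
            result.push(current.join(" "));
        }
    }
    result
}
```

Note that expressions such as "Apache-2.0 WITH LLVM-exception" stay intact, because only the listed separators are treated as split points.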

Cargo will attempt to discover license files in the crate's root folder. For single-licensed crates, Cargo will pull the LICENSE or COPYRIGHT file. For multi-licensed crates, it will make a best-effort attempt to match each license to the file in the crate's root that best matches that license's SPDX identifier. This draft currently doesn't define the exact file matching rules, but the full RFC should probably be more specific here.

The licenses!() macro

The licenses!() macro is added to libcore, and returns a &'static LicenseList. A new licenses module is added to libcore, containing the following types:

#[derive(Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct LicenseList {
    licenses: [License]
}

impl Deref for LicenseList {
    type Target = [License];
    // implementation
}

impl Display for LicenseList {
    // renders the complete human-readable license list
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct License {
    pub name: &'static str,
    pub libraries: &'static [&'static str],
    pub text: &'static str,
}

impl Display for License {
    // renders the license in human-readable form
}

We return &'static LicenseList instead of &'static [License] so that the Display trait can be implemented on licenses!()'s return type. The data returned by the macro is adapted from the license data passed into rustc via the --licenses flag.
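
For anyone who wants to play with the shape of this API today, here is a compilable approximation. It is a sketch under a few assumptions: LicenseList wraps a &'static [License] so it stays sized (the draft above uses an unsized [License] field), the Display format is made up since the RFC leaves it unspecified, and EXAMPLE is a hand-written stand-in for the data licenses!() would produce.

```rust
use std::fmt;
use std::ops::Deref;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct License {
    pub name: &'static str,
    pub libraries: &'static [&'static str],
    pub text: &'static str,
}

/// Simplification: holds a `&'static [License]` so this sketch stays sized.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LicenseList {
    licenses: &'static [License],
}

impl Deref for LicenseList {
    type Target = [License];
    fn deref(&self) -> &[License] {
        self.licenses
    }
}

impl fmt::Display for License {
    // Hypothetical format; the RFC does not specify one.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        writeln!(f, "# {}", self.name)?;
        writeln!(f, "Used by {}", self.libraries.join(", "))?;
        writeln!(f)?;
        writeln!(f, "{}", self.text)
    }
}

impl fmt::Display for LicenseList {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Slice methods such as `iter` are reachable through `Deref`.
        for license in self.iter() {
            writeln!(f, "{}", license)?;
        }
        Ok(())
    }
}

/// Hand-written stand-in for the value `licenses!()` might return.
pub static EXAMPLE: LicenseList = LicenseList {
    licenses: &[License {
        name: "MIT",
        libraries: &["syn"],
        text: "Permission is hereby granted ...",
    }],
};
```

Because Deref only kicks in for method calls and indexing, iterating needs an explicit EXAMPLE.iter() unless IntoIterator is also implemented for &LicenseList.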

Drawbacks

More complexity in the language infrastructure. Most languages don't seem to provide a license management mechanism built into the compiler, but most languages don't make it as easy to add new dependencies as Rust does.

Rationale and alternatives

Why is this built into the language?

It's entirely valid to ask, "why are we making this a compiler feature instead of a toolchain feature?". It's not immediately clear that that's the best option; license information is admittedly metadata and you can reasonably argue that this process should be entirely handled by Cargo and related tools. This RFC involves the compiler for two reasons:

  • #![no_std] is a language-level attribute, not a Cargo-level attribute. There needs to be some process for determining whether or not libstd's licenses should be included, and Cargo is currently unable to derive this information.
  • The licenses macro cannot exist without rustc's involvement. If this were a Cargo-level process, it'd still be possible to embed license information into the crate with build scripts, but that's significantly more finicky and less easy-to-use than a language-level macro.

Given those points, this RFC decides that it would be easiest to include license processing in the compiler as well as in the Cargo infrastructure. It's worth noting that Cargo may eventually be able to manage std - the cargo-std-aware WG is making progress on that front - but that work will likely take years to complete and this RFC is written for the language as it exists today.

Alternatives

  • Let the community manage this, and provide a stable way to access the licenses implicitly used by the standard libraries.
  • Standardize Cargo.toml fields for specifying all necessary license information, and let third-party crates handle compiling that information into a usable form.
  • Instead of having licenses!() return a structure, we could have the macro return a rendered license string.
  • Instead of specifying dependency licenses in the top level of the build tree, we could have each crate pass the license information the crate is responsible for into rustc and include it in the generated rlib. This might actually be the better solution, but I'm not including it as the primary solution in this draft because it's unclear to me how this would interact with dylibs and cdylibs. Worth discussing more thoroughly.
  • Instead of passing license information via a command-line argument, we could pass it via an environment variable.

Prior art

Mozilla Firefox has shell scripts that automatically generate a human-readable license file: https://github.com/mozilla/application-services/blob/master/DEPENDENCIES.md. This isn't entirely complete, as it doesn't contain licenses for Rust's libstd or libcore, but it's the best I've seen.


Various crates exist in the ecosystem for listing and managing licenses

Both crates provide utilities for listing upstream licenses, and cargo-deny provides utilities for rejecting incompatible licenses (say, the GPL). However, neither crate peeks into libstd or libcore, and neither crate automatically assembles a human-readable license file.

Unresolved questions

  • We need to figure out the exact process Cargo uses to resolve license files.
  • What should we do when a crate specifies a license in Cargo.toml, but doesn't include the license file in the Cargo package?
  • What should we do when crates don't specify a license?

Future possibilities

  • This RFC has cargo choose the first license in the multi-license list so that it doesn't have to be aware of license semantics. In theory, it could automatically select the most permissive license, for whatever definition of permissive Cargo chooses to accept.
  • [licenses] could have a disallowed licenses list that warns if a forbidden license is detected.
  • Cargo could automatically warn upon license incompatibilities.

What about licenses that do not cover distribution of binary modules but only the source code form, or that cover them differently? How would a crate itself need to consider dependencies published under such licenses? The data published on both crates.io and source code on github does not include the source code of dependencies. (Any project with the goal of automatically enabling pre-compiled binary distribution on crates.io carries a huge risk of breaking licensing terms though, now that we speak of it). In any case, this difference should likely be reflected somewhere in the design.

With the usual preface about not being a lawyer etc., I had always thought the binary form created from code under MIT (such as part of a compiled, binary package) was not subject to the license, in contrast to BSD which explicitly covers that as well. (Well, TIL. Time to relicense code). Oh, and a Rust crate seldom redistributes libcore and libstd as well; it uses whatever Cargo determines to be your system installation.

Other prior art: cargo-lichking includes prototypical support for bundling dependencies' license information. It is very limited at the moment, and basically only properly supports MIT + Apache-2.0 because I never got around to adding more license texts (but now there's the license crate that includes all SPDX licenses, which could be used to very quickly expand the support). As an example you can see the bundle it includes for its own dependencies (it also supports generating a few other data layouts than a source file).

First of all, thank you for looking at this problem.

I do think that portions of this are things we should absolutely do. For instance, having a cargo field for "external licenses", to cover FFI.

I think we should be using SPDX wherever possible.

I really think this should be entirely within Cargo, not rustc. I don't think the use case of embedding license text into the binary justifies adding complexity to the language. I think it would suffice to have a reliable way to compute all the licenses that apply to the full Cargo dependency tree (including external libraries).

(That said, for builds that incorporate system libraries, it might be necessary for some crates to determine external license based on which system library they end up building against. So static metadata might not suffice, and this might need a way to emit additional license information for a crate from the build.rs script.)

There's potential future work we could do to exclude licenses that don't need to be included in the binary distribution. However, you're not going to go wrong by blanket-including everything, and it's better to have more licenses than technically required than it is to not have all the licenses you need.

Also, the MIT license says:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

I interpret "Software" as the binary distribution that's compiled from the source code.

Those are absolutely valid points. However, libraries still need to include licenses from libcore and libstd, and the license bundling code should ideally be aware of #![no_std]. How would we go about resolving those issues without the involvement of the compiler?

Yes, but the normal way people prefer to comply is providing a license file in text form, or an about box, not a string embedded in the binary itself.

No argument there. However, there's one problematic case: suppose your crate dependencies can build against different system libraries, and those system libraries have different licenses with different compatibility. Say you have a crate that can build against one library that's under a non-GPL-compatible license, and another library that's under a GPL license. Sometimes you'll want one and sometimes you'll want the other depending on the licenses of your own crate and your other dependencies. If your metadata just gives both licenses then your metadata will imply a license compatibility that doesn't actually exist. (This is a real problem with some license compliance software used in enterprises, which doesn't always understand such cases.)

Ideally, via std-aware cargo. Short-term, until we have that, by having special cases in the tools that just "know" which additional licenses to include for std/core/etc.

That's a good point. I'll add build script license specification to the RFC once I get around to revising it.

If we're doing that, it just occurred to me that we could have the macros without any direct compiler support! Cargo already sets environment variables for builds that get used by external crates to include code generated by the build script ($OUT_DIR). We could include the license types in libcore, have Cargo generate a licenses.rs file with the file's path set to $RUST_LICENSES_FILE, and have the licenses!() macro include the file at that environment variable. This wouldn't need any additional language support; libcore already has the macros needed to build that system, so we could define licenses as:

macro_rules! licenses {
    () => {{
        include!(env!("RUST_LICENSES_FILE"))
    }};
}

I don't know when this interpretation began, but I don't think this is how the MIT license was seen traditionally.

The license defines "the Software" like this:

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software")

The license text is often included in the source code itself so when the license says "copy of this software", it naturally refers to "this file", meaning the source code itself. Further, the license talks about documentation files (not just documentation) which makes it clear to me that software means the source files, not a compiled binary.

What can I do with such a MIT licensed file? Pretty much anything since the license says I can

deal in the Software without restriction

Then some examples are given, which include things like "sublicense". That's what makes it possible to use MIT licensed code in a proprietary program: I'm putting a new license on the code when using it in my proprietary program.

The restriction imposed is that

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

Again, I see this as referring to the source code that was MIT licensed. It doesn't make sense to talk about "half a binary" — where do you cut it in half and will either half be able to run? On the other hand, it makes a lot of sense to talk about a portion of source code (the Software).

My conclusion is that the MIT license doesn't put constraints on how I can use the software and so I can for example compile the code to a binary and distribute that under a proprietary license. Put differently: the attribution requirement only applies to the source form.

There's this thread, specifically this reply written by a lawyer, which basically says "yeah, you really do need the MIT license included with all binary distributions".

Am I right in thinking that both this RFC, and all of the Alternatives require a way to learn what stdlib libraries are being used, and what their licenses are? If so, does it make sense to peel that off into a separate RFC?

I don't want to discourage progress on this RFC, since I think it does a nice job capturing what a fully-featured solution might look like. But I'm wondering if iterating on just the stdlib part might be a good "tactical" way to make progress on a smaller chunk of work.

Ugh... It's pretty clear that at least libstd/libcore ought to be relicensed under a license that doesn't require attribution. That thread is from five years ago. Why didn't we start then?

To link to a comment thread stating as a language evolution concern from five years ago that this should be addressed, to effectively say it has not been addressed, is clarifying but still somewhat ironic, isn't it?

I'm sure I have absolutely no idea what you mean :3

This RFC may be relevant / of interest to this thread:

Given that the task of such a tool is "understand licensing", I really don't think it should be the compiler's responsibility.

Given the fact that a #![no_std] crate can trivially use std anyway (it's just an extern crate std away, or a dependency that isn't no_std), you need cargo enforcement anyway to make sure you actually don't link std, or just a plain lack of std for the target. Either way, cargo knows whether std is available or not.

My gut says this should be an external tool to cargo, but distributed with the toolchain, like rustfmt is today. I personally think the best way forward is to teach lichking about more licenses as well as the std licenses, and implement/bless a way of listing external ffi licenses in -sys crates (beyond just licensing the -sys crate the same as the system library).

Another WIP tool from the cargo-deny developers: https://github.com/EmbarkStudios/cargo-about. After a quick skim I feel like this might be a better base to work from than cargo-lichking if there is interest in getting a fully capable third-party tool. For one thing, it's using a much better method of detecting the actual license files distributed in dependencies' packages.

+10,000

I work for a large organization that is concerned with dealing with licensing issues all the time. Our lawyers spend an inordinate amount of time trying to make sure that we're legal on all fronts with our code, and this tool would help them immensely.

That said, I'd prefer it if all licenses were collected (rather than just first preferred); there are licenses that are mutually incompatible, but if the code is dual licensed (or multi-licensed) then there may be a choice of license that will permit use of the code. Our lawyers decide which licenses are mutually (in)compatible, and once they do, I'd like to feed the table of compatible licenses into a tool that will determine if a set of licenses exist that permit use of all of the dependencies. If the set of compatible licenses is empty, then I need to know what the problem areas are so that I can contact the copyright holders and ask if they'll dual license under a different license that we can use. The last part is out of the scope of this RFC, but once we have the ability to gather every single license, we can develop tools on top of the output of this RFC.

We (Embark) just released the first proper version of cargo-about, which is our tool to generate human-readable, complete license listings for Rust applications. Output formatting uses Handlebars, so the user has full control over it (can be text, HTML, JSON, whatever formatting). We use HTML for our own use cases and that looks something like this:

It also supports selecting preferred licenses and we have a mechanism to add licenses that Cargo doesn't capture such as dependent C libraries, libstd and such.

So this, together with our other tool, cargo-deny, gives us the main components for complying with the license attribution requirements. Note, though, that the tools are early, IANAL, and we haven't yet released a product with them with a full review by our legal counsel. But we think it is a good starting point and something we would love more contributions to.

This is what cargo-about does today. We have a list of the accepted licenses, and they are in priority order of which one to pick out of all the licenses a crate supports:

In the about.toml that one has for a project to configure cargo-about:

accepted = [
    "Apache-2.0",
    "MIT",
    "BSD-2-Clause",
    "MPL-2.0",
    "LicenseRef-Embark-Proprietary",
    "ISC",
    "CC0-1.0",
    "Apache-2.0 WITH LLVM-exception",
    "BSD-3-Clause",
    "Zlib",
    "OpenSSL"
]

And for some crates that have C dependencies with different licenses than the crate itself, we also manually specify those licenses. I hope to be able to add these as additional sub-package licenses in the project's own Cargo.toml files.

[[regex-syntax.additional]]
root = "src/unicode_tables"
license = "Unicode-DFS-2016"
license-file = "LICENSE-UNICODE"

[[physx-sys.additional]]
root = "PhysX"
license = "BSD-3-Clause"
license-file = "PhysX/README.md"