[Idea] Declaring obligations associated with licensed use or linkage


#1

An idea was triggered for me in the discussion of cargo audit and verifying your dependencies (also, in terms of MongoDB’s recent license kerfuffle): increasing concern about linking software that utilizes viral licenses.

I was wondering if there was some way to increase the safety of utilizing Rust and crates, which may lead to increasing adoption and contribution by some larger entities that may have become gun-shy about relying on open source. The vast majority of crates I’ve seen are released under things that won’t give corporate lawyers pause (i.e. MIT, Apache, and related such licenses).

While obviously, for convenience, I’d prefer to see the community just embrace non-license-virality as the expected default, and expect/require people to declare in crates when they are breaking that pact, for the sake of peace my thought is that you could declare your crate’s non-virality, and peer-pressure (i.e. making it a very simple change to Cargo.toml that anybody could send the author a pull request for) might take care of the rest.

For those things that are derived, or knowingly link to virally licensed crates or libraries, my thought process would be something like tying a feature to its virality, so the same crate could be used without virality, or with it for the many people who don’t care.

I think you can only provide a suitable guarantee about your own crate (i.e. my crate does not statically link gnu libc). I suppose one could provide assertions about particular versions of your dependencies, but I don’t think those are necessary: if the cargo version checker had the notion of allowed virality, then it could do it at runtime.

That all said, I’m not entirely sure how to go about representing this in Cargo.toml. I have some thoughts:

Identifying a viral feature and declaring the rest okay would be nice:

[features]
default=["notviral1", "notviral2", "viral3", "viral4"]
viral=[{name="viral3", virality="staticlink"}, {name="viral4", virality="gpl"}]
nonviral=["notviral1", "notviral2"]

My self-declared safe package could maybe just have

[features]
default=["notviral1", "notviral2"]
nonviral="*"

That last line is the one-line change needed to declare that my package doesn’t raise any known virality issues.

The user of that crate may like to say:

[dependencies.othercrate]
version="1.1.1"
allow-viral-features = false

or

[dependencies.othercrate]
virality_allowed=[{virality="gpl"}, {virality="lgpl"}]

So, basically, don’t allow declared static links of it or of any crate it depends on, but I’m okay with lgpl and gpl usage.
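To make the intended check concrete, here is a minimal sketch of how a tool might evaluate those declarations. It assumes the manifest keys proposed above have already been read into plain objects; the `Crate` class, the `violations` function and the tag names are all hypothetical, and none of this is real Cargo behaviour. It also assumes the dependency graph has no cycles.

```python
# Hypothetical model of the manifest keys proposed above; not real Cargo syntax.
from dataclasses import dataclass, field


@dataclass
class Crate:
    name: str
    # feature name -> declared tag such as "gpl" or "staticlink";
    # features missing from this map are treated as declared safe.
    tagged_features: dict = field(default_factory=dict)
    # features enabled for this build
    enabled: list = field(default_factory=list)
    # crates this one depends on (assumed acyclic)
    deps: list = field(default_factory=list)


def violations(crate, allowed):
    """Return (crate, feature, tag) for each enabled feature whose tag is not allowed.

    Checks the crate itself, then recursively everything it depends on.
    """
    found = []
    for feature in crate.enabled:
        tag = crate.tagged_features.get(feature)
        if tag is not None and tag not in allowed:
            found.append((crate.name, feature, tag))
    for dep in crate.deps:
        found.extend(violations(dep, allowed))
    return found


inner = Crate("inner", {"viral3": "staticlink"}, enabled=["viral3"])
other = Crate(
    "othercrate",
    {"viral4": "gpl"},
    enabled=["notviral1", "viral4"],
    deps=[inner],
)

# Mirrors the second example: gpl and lgpl are accepted, static linking is not.
print(violations(other, allowed={"gpl", "lgpl"}))
```

With these inputs only the static link declared by the transitive dependency is reported, which is the behaviour the second dependency stanza is asking for.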

There could be a couple of different levels of the type of virality people might concern themselves about: static linkage to a library is a critical one I am aware of that is unaddressed. GPL/GPLv3 type virality is the second concern. I don’t know if dynamic linkage is ever a problem, and there may be others I am unaware of - wasn’t there something with APIs, maybe? Was there something special about deriving source from a GPL’d header - so, if you read a C++ .h file and generate and link a Rust file from that, that would be considered ‘derived’ and have a virality problem? Etc.


#2

Note that even “permissive” licenses may still have requirements you must comply with, such as including the notices in application’s documentation. In a sense they’re also “viral”, because your whole project inherits that requirement.

You need something like cargo-license to review all your licenses, as otherwise you’re probably violating the “permissive” licenses too.


#3

I don’t think this needs to be in Cargo specifically. There are already tools out there for listing the licenses of your dependencies. For example:

$ cargo lichking list
warning: IANAL: This is not legal advice and is not guaranteed to be correct.
MIT: strsim, termion, owning_ref, textwrap, clap, void, redox_termios, atty, redox_syscall
BSD-2-Clause: cloudabi
BSD-3-Clause: fuchsia-zircon-sys, fuchsia-zircon
ISC: rdrand
Unlicense / MIT: utf8-ranges, grep-printer, ignore, ripgrep, pcre2-sys, grep-pcre2, pcre2, walkdir, grep-searcher, termcolor, winapi-util, aho-corasick, grep, grep-regex, grep-cli, grep-matcher, globset, byteorder, same-file, wincolor, memchr
MIT / Apache-2.0: rand_isaac, winapi-x86_64-pc-windows-gnu, rand_os, num_cpus, stable_deref_trait, rand_chacha, ucd-util, bitflags, libc, scopeguard, cfg-if, unreachable, memmap, rand_xorshift, log, fnv, serde_derive, smallvec, encoding_rs_io, unicode-width, base64, rand_hc, quote, unicode-xid, encoding_rs, serde, rand, simd, itoa, regex-syntax, lock_api, winapi-i686-pc-windows-gnu, bytecount, proc-macro2, winapi, crossbeam-channel, crossbeam-utils, thread_local, lazy_static, serde_json, rand_core, rand_pcg, regex, parking_lot, syn, parking_lot_core
Apache-2.0 / BSL-1.0: ryu

I personally just check this output every so often (particularly after a cargo update) to make sure that there are no viral licenses, but you could just as easily add a CI check or something and grep for common viral license names.
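As a rough sketch of what such a CI check might look like, the snippet below parses text in the same "LICENSE: crate, crate" shape as the listing above and reports crates whose license string contains a flagged substring. The function names, the `FLAGGED` list and the `examplecrate` entry are made up for illustration, and a plain substring match is only as reliable as grep would be.

```python
# Substrings to flag; an illustrative list, not a legal classification.
FLAGGED = ("GPL",)  # as a substring this also matches LGPL and AGPL


def parse_listing(text):
    """Map each license string to its crates, from 'LICENSE: a, b, c' lines."""
    by_license = {}
    for line in text.splitlines():
        # Skip the IANAL warning and anything that is not a listing line.
        if line.startswith("warning:") or ": " not in line:
            continue
        license_expr, crates = line.split(": ", 1)
        by_license[license_expr.strip()] = [c.strip() for c in crates.split(",")]
    return by_license


def flagged_crates(text, flagged=FLAGGED):
    """Return sorted (crate, license) pairs whose license contains a flagged substring."""
    hits = []
    for license_expr, crates in parse_listing(text).items():
        if any(word in license_expr for word in flagged):
            hits.extend((crate, license_expr) for crate in crates)
    return sorted(hits)


# Sample input in the same shape as the listing; "examplecrate" is invented.
SAMPLE = """\
warning: IANAL: This is not legal advice and is not guaranteed to be correct.
MIT: strsim, termion
ISC: rdrand
GPL-3.0: examplecrate
"""

print(flagged_crates(SAMPLE))
```

In a CI job one would feed the captured tool output to `flagged_crates` and fail the build when the result is non-empty.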


#4

That’s an interesting tool, but it leaves me a little dry for what my actual concerns are: not the source for the crates themselves, which announce the license under which their source-code is licensed, but the ‘drive-by’ use of things that might not.

Would it be better to have a crates.io policy that says you should include a reference of the license of things you link to or derive from at build time?


#5

Note that even “permissive” licenses may still have requirements you must comply with, such as including the notices in application’s documentation.

This is a point: in my paradigm of self-identification, I’d also give that some classification of ‘virality’ which people can choose to accept, or reject the burden of, eyes-wide-open.

For this, it might be nice if a crate could identify, via the manifest, what thing(s) needs to be shipped or included, and to have some level of automation around doing so.

What I’m really trying to puzzle out is how people who work for corporations can avoid getting burned when trying to use rust at its best (with ease and safety of integrating external contributions).

I do think taking things like this into account would help drive adoption, acceptance and, ultimately, contribution among more major players.


#6

cargo-lichking has nascent support for this (properly supports only a limited number of licenses, doesn’t handle extra things like NOTICES for Apache-2.0)

> cargo lichking bundle | head
warning: IANAL: This is not legal advice and is not guaranteed to be correct.
The futures-core-preview package uses some third party libraries under their own license terms:

 * either 1.5.0 under the terms of MIT / Apache-2.0:

    Copyright (c) 2015

    Permission is hereby granted, free of charge, to any
    person obtaining a copy of this software and associated
    documentation files (the "Software"), to deal in the
    Software without restriction, including without

Which it uses to power its own third-party usage list:

> cargo lichking thirdparty | head
cargo-lichking uses some third party libraries under their own license terms:

 * adler32 v1.0.3 under the terms of BSD-3-Clause AND Zlib
 * aho-corasick v0.6.9 under the terms of Unlicense / MIT
 * ansi_term v0.11.0 under the terms of MIT
 * arrayvec v0.4.9 under the terms of MIT / Apache-2.0
 * atty v0.2.11 under the terms of MIT
 * backtrace v0.3.13 under the terms of MIT / Apache-2.0
 * backtrace-sys v0.1.26 under the terms of MIT / Apache-2.0
 * bitflags v1.0.4 under the terms of MIT / Apache-2.0
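
For anyone wanting to produce a list in that shape from their own dependency data, a minimal sketch could look like the following. This is not how cargo-lichking is implemented; `third_party_notice` is a hypothetical helper that only reproduces the line format shown above from (name, version, license) tuples.

```python
def third_party_notice(package, deps):
    """Render a third-party usage list from (name, version, license) tuples."""
    lines = [
        f"{package} uses some third party libraries under their own license terms:",
        "",
    ]
    # Sort so the output is stable regardless of input order.
    for name, version, license_expr in sorted(deps):
        lines.append(f" * {name} v{version} under the terms of {license_expr}")
    return "\n".join(lines)


deps = [
    ("atty", "0.2.11", "MIT"),
    ("adler32", "1.0.3", "BSD-3-Clause AND Zlib"),
]
print(third_party_notice("cargo-lichking", deps))
```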

In terms of non-Rust dependencies, I’ve long wondered whether -sys crates should be including the license details of the libraries they link against to bring that into the Rust tooling; either by just putting it as part of their license data (i.e. curl-sys is licensed under MIT AND curl rather than its current MIT) or as additional metadata specifically for this case.


#7

In the same vein as what @kornel says, perhaps it’s worth considering that there’s any number of reasons for which one might not want to bring in code under a certain license. Say, if one is building free software, they would not want to depend on software which is e.g. source available or has fields of endeavour restrictions.

Abstracting out the requirements of various kinds of licenses seems risky (if not intractable). Individual provisions matter a lot; any organization or individual will have to use their own judgement on what’s legal or acceptable.

For that to be possible, of course, we need crates to be appropriately tagged with the corresponding license information. @Nemo157, in the interest of accuracy, it seems to me the second of your suggested solutions (additional metadata) is the clearest option? The resulting build artifact might only be distributable under license X because it links to a third party library, but there could still be a significant amount of code in the crate that has been made available under license Y. It seems awkward to have the crate claiming License: X AND Y if the crate as a whole cannot be distributed under X (i.e. it might not be immediately obvious to everyone what that AND means).

On the checking front, the output of cargo-lichking seems like a good basis to work from. Personally, I’d be much more comfortable being able to specify a whitelist of licenses I deem acceptable (and compatible to each other) than relying on some mechanized inference to tell me what license I can distribute under.
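A minimal sketch of such a whitelist check, assuming the per-crate license strings have already been collected into a dictionary. It handles only flat expressions using `AND`, `OR` and the `/` shorthand seen in the listing earlier in the thread (no parentheses, no `WITH` exceptions), and the function names and the `ALLOW` set are purely illustrative.

```python
import re


def alternatives(expr):
    """Split a flat license expression into OR-alternatives, each a set of AND-ed terms."""
    # OR (or the '/' shorthand) separates choices; AND binds tighter within a choice.
    options = re.split(r"\s+OR\s+|\s*/\s*", expr.strip())
    return [set(re.split(r"\s+AND\s+", option)) for option in options]


def acceptable(expr, allowlist):
    """True if at least one alternative consists only of allowlisted licenses."""
    return any(terms <= allowlist for terms in alternatives(expr))


def rejected(crate_licenses, allowlist):
    """Return the crates whose license expression no allowlisted choice satisfies."""
    return sorted(
        name for name, expr in crate_licenses.items() if not acceptable(expr, allowlist)
    )


# An example allowlist; which licenses belong on it is each user's own call.
ALLOW = {"MIT", "Apache-2.0", "BSD-3-Clause"}
crates = {
    "ryu": "Apache-2.0 / BSL-1.0",
    "adler32": "BSD-3-Clause AND Zlib",
    "rdrand": "ISC",
}
print(rejected(crates, ALLOW))
```

Here ryu passes because one of its alternatives is on the list, while adler32 is rejected because both halves of an `AND` have to be acceptable. Nothing is inferred about what the combined work may be distributed under; the check only compares against what the user listed.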

Finally, “viral” is a propaganda term. Copyleft-licenced software does not have agency, hence does not infect you. It is the programmer that pulls in external crates. A more accurate term might be “hereditary”; when you create derivatives of copyleft works, then of course the derivative needs to be distributed under the same terms.

As hereditary implies, this is something that is normally done intentionally, i.e. someone actually went out and fetched the copyleft work and wrote code to call out to it. Or, at the very least, they are well aware that bringing in an arbitrary amount of third-party code, without checking the respective licenses, might result in derivatives.

On a more personal note, as someone who has published code under the GPL, I find “viral” to be a gross misrepresentation of intentions (and, hence, somewhat insulting). When I distribute code under a copyleft license, my intent is not to trick anyone into opening up their code[1]. My intent is for the code I wrote to only be distributed as free software and, if it’s a library, to incentivize others to work under similar terms. You do not force people to be part of a community.

Hardcoding a propaganda term like ‘viral’ in the tooling is a sure-fire way to alienate a number of potential contributors (and existing users…). Again, I fully support augmenting the tools to enable whitelisting of licenses for code one depends on.

[1] That is not how copyright works in any case. You are never obligated to open up your code. You’re simply obligated to stop distributing /the copyleft code/ under a proprietary license. In the worst case, you would be liable for damages.


#8

If we were trying to deal with an infinite regime of licensing possibilities with some kind of fixed logic, then it would be intractable, but that isn’t the situation: we can rely on basic categorization, and we can rely on the author’s expression of intent, commonly accepted standards, community standards, and the fact that an 80% solution is going to be fine for more than 80% of the people.

Basically, the point is that relying on what it says on the tin ought to be sufficient. While one can reasonably double-check, we all, already, place a reasonable amount of trust in our upstream crate authors. This is just another instance of this.

For instance, in that kind of regime, I could create a crate and curate a collection (say that 3 times fast) of licenses that are acceptable for some classes of usage. A different crate author could depend on my definitions, and chances are, 99% of the time you could pull from mine or someone else’s curated crates and be adequately confident, and covered, with what you’re getting. A small pull request would allow slow dribbling of more compatible licenses. How much liability would I be letting myself in for if I did so? I’m willing to take it on as an experiment.

Viruses don’t have agency either, (and, to continue the simile, their creator does), but I accept the criticism. Can we find neutral terminology?

[edit] I’ve changed the title of the thread. (@Nemo157 I suppose a similar objection could be made to ‘lichking’ name, since the implication is one is trying to control the undead. I don’t care, but I wonder if there’s a place for a less tongue-in-cheek name for the humorless and easily confused? Or are you worried about liability?)

Can we call them “usage obligations”?

Can we classify those common, potentially objectionable obligations?

Forced actions contingent on use of software:

  1. Required to publish works not authored by the package author (i.e. derivative works, or linked works)
  2. Required to publish works by the package author 2.a) with your source distribution 2.b) with your binary distribution
  3. Required to publish links, archive, &c
  4. Required to do X (rub your head and pat your stomach, or buy a license)
  5. Required to grant license to derived work.
  6. Required to grant license to patents that are used in derived work
  7. Required to obtain/purchase a license, register, etc.
  8. Special requirements: free-form

There’s probably a couple of other common ones. I do think we could categorize them and automate many of them.

Forced non-actions:(these are probably harder to classify other than in broad cases)

  1. Required to NOT enable someone to do X (e.g. DRM) (this one might be less tractable than the obligations to positive action)
  2. Explicit limitations on upstream liability for downstream use (don’t sue the author for bugs)
  3. Do not use patent litigation against upstream author or other covered entities, etc.

Requirement to publish using certain means:

  1. This one would be amenable to automation: the bundle thing is good, more of the same.

Finally, finishing the track I started in a different place, I wonder if another thing that may be done is that there has to be a separate policy - probably something with crates.io - that affirms that by uploading, the representations in the metadata are accurate and sufficient. This would also cover things like misleading descriptions, or occult functionality, in addition to these cases.


#9

The license field in crate metadata is supposed to be an SPDX License Expression (with a minor extension to allow using / to mean OR, but this should be being phased out), and this is assumed to be valid by tools like cargo-lichking. Under that interpretation AND has a well-defined meaning.

After a little more consideration I definitely think that having this as additional metadata would be the better way to go, so if you are able to replace the linked dependency with an ABI compatible implementation under a different license you know how that will affect the overall licensing of your project.
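A small sketch of why separate metadata helps, assuming a -sys crate records the license of its own source and the licenses of the native libraries it links as two distinct pieces of data. The `effective_expression` helper is hypothetical, and joining the parts with `AND` is just one reading of how they combine, not legal advice.

```python
def effective_expression(crate_license, linked_licenses):
    """Combine the crate's own license with those of the libraries it links.

    The parts are joined with AND; compound parts are parenthesized so the
    grouping stays unambiguous.
    """
    parts = [crate_license] + sorted(set(linked_licenses))
    wrapped = [f"({p})" if " " in p or "/" in p else p for p in parts]
    return " AND ".join(wrapped)


# Hypothetical metadata for a -sys crate: its own source is MIT, and the
# build links one native library whose license is recorded separately.
print(effective_expression("MIT", ["curl"]))

# Swapping in an ABI-compatible implementation under another license changes
# only the linked part. "Zlib" is an arbitrary stand-in here.
print(effective_expression("MIT", ["Zlib"]))

# A dual-licensed crate keeps its own choice intact.
print(effective_expression("MIT OR Apache-2.0", ["curl"]))
```

Because the crate's own license is never rewritten, replacing the linked library only requires recomputing the combined expression.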

I’m not worried about liability as the only claim towards accuracy I make is that you can’t trust the accuracy. I am just strongly pro-unique/clever names in software and see no reason to require the name to directly reflect the capabilities; that’s what the description is for (the name actually comes from “LIcense CHecKING”, so really, it says exactly what it does). If anyone is really worried they can use cargo-license which offers some of the same capabilities. IMO the difference between this and something like calling the GPL viral is that the king of liches has nothing to do with licensing at all, whereas the term “viral” is referencing an actual component of the GPL license.

Having a classification of licenses like this makes sense to me, I believe I saw something similar related to how npm handles licensing while researching existing tools similar to cargo-lichking. If there were something providing a database of SPDX identifiers to descriptions of the obligations it places on dependents, then I think surfacing this information in cargo-lichking could make sense.
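To give a feel for what such a database lookup might involve, here is a very small sketch. The `Obligation` categories are hypothetical names loosely following the list earlier in the thread, and the three entries are illustrative only; they are not a vetted or complete reading of those licenses.

```python
from enum import Enum


class Obligation(Enum):
    # Hypothetical category names; the wording is a summary, not license text.
    KEEP_NOTICES = "ship the copyright and license notices"
    STATE_CHANGES = "mark files you modified"
    SHARE_SOURCE = "offer source when distributing binaries"
    SAME_LICENSE = "license derivative works under the same terms"


# Illustrative entries only; a real database would need legal review.
OBLIGATIONS = {
    "MIT": {Obligation.KEEP_NOTICES},
    "Apache-2.0": {Obligation.KEEP_NOTICES, Obligation.STATE_CHANGES},
    "GPL-3.0-only": {
        Obligation.KEEP_NOTICES,
        Obligation.STATE_CHANGES,
        Obligation.SHARE_SOURCE,
        Obligation.SAME_LICENSE,
    },
}


def obligations_for(spdx_ids):
    """Union of known obligations; unknown identifiers are returned separately."""
    known, unknown = set(), []
    for spdx_id in spdx_ids:
        if spdx_id in OBLIGATIONS:
            known |= OBLIGATIONS[spdx_id]
        else:
            unknown.append(spdx_id)
    return known, unknown


known, unknown = obligations_for(["MIT", "Apache-2.0", "ISC"])
for item in sorted(known, key=lambda o: o.name):
    print(item.value)
print("needs manual review:", unknown)
```

Returning the unknown identifiers separately, rather than treating them as free of obligations, keeps the tool from silently vouching for licenses it has no data on.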


#10

I’m rather concerned that it is a goal to apply peer-pressure. Presumably, if I make a GPL licensed crate, I don’t want PRs filed against my repo asking me, or my collaborators, to re-license under a permissive license. In my view, cargo should not be used as a tool for political ends, so license checking should stay in lichking.


#11

I’m rather concerned that you deleted all the relevant context.

The term “peer-pressure” was perhaps inelegant, but it was directed towards having people declare their own crates safe to use for those (many) people who care about this kind of thing, not directed to changing their license.


#12

I think I can sort of sympathize with the motivation behind that. However,

  1. creating a new ontology for licensing freedoms and obligations is a lot of work (see below)
  2. educating casual users of cargo about it seems like a poor use of everyone’s time

Re: (2), as a user, I wouldn’t want to have to rederive whatever “usage obligations” (again, you also need to specify what the license allows you to do) for license X. Doubly so if I’m pulling in a crate under license X and I’m already familiar with license X; then I would need to carefully parse your more verbose representation of it, instead of simply going “oh, it’s version two of the Apache license, cool”. As you say, in most cases people will already be familiar with the licenses they come in contact with.

Conversely, if someone is not familiar with the license of a certain crate, it is much safer for them to take the time to actually read the actual license before they build anything depending on that crate. If you would like to provide some ontological representation somewhere, perhaps people might find it useful. But if that were the first representation of the license people ran up against, I worry that this would encourage them not to look any further, which they might regret down the line. Most mainstream licenses come with FAQs and rationale documents that explain the concepts that the license relies on, how to apply it to your work, common gotchas etc. Those seem way more appropriate as an introduction to the license for someone who doesn’t know much about it.

Viruses don’t have agency and that’s why this is a bad analogy. It’s also judgemental, implying that a piece of software giving its users the freedom to run, study, distribute modified versions of it and denying them the power to deny others these freedoms with regard to this code, is somehow BAD.

As explained in my previous comment, I think this idea of special-casing copyleft software is a complete non-starter. No matter which word you use, the intent is clearly to treat copyleft software as somehow tainted. Not only am I totally against that, I would very much like to have a generic mechanism that is useful to all developers using published crates.

This is not a requirement of any license I know of. All copyleft licenses that come to mind trigger on distribution. Do you have a specific license in mind?

If X is totally arbitrary… I don’t see how that’s an improvement over pointing people to the license text.

Same as for (4).

Also, I don’t think you can derive all those from a license. E.g. a number of large companies have committed to using v3 termination provisions for their GPLv2 software. The degree to which that is relevant for any piece of software depends (among other things) on the amount of code in a project that those companies hold the copyright for. How do you encode that in your suggested obligations, even in free form? Would you consider that an improvement over “you’ll need to understand these licenses (let alone the copyright landscape for this project)”?

Cool. I had even forgotten SPDX has defined license expressions :slight_smile: I think with a link to some reasonably readable canonical document (perhaps emitted after the license list by default?), that should be pretty easy to understand then.


#13

I’m not a fan of making this a project concern.

a) It would make the Rust project an arbiter about describing properties of licenses and their interactions, which is not our area of expertise. Even if we would build the expertise, we would need to keep it around. We would also have to go bikeshed about naming; “viral”, for example, is a bad term for what the GPL does (it doesn’t “infect” anything, it expects that all other licenses don’t dilute its terms).

b) It’s a task that is often handled out of band, most companies just want a list of used licenses for legal review and legal will have their own tooling. We already have that. Automated tooling might ease things on a small scope, but Rust is often deployed in mixed context - we’d need to integrate with the processes in other languages.

c) As we can give no guarantee on the correctness of our implementations, a legal review can’t be skipped anyways if it’s necessary.

d) I don’t think this will drive or hamper adoption in any way. Companies relying on and shipping FOSS have tooling and processes in place. Projects that care about this do as well. Yes, it would be convenient if the topic space could be easily and exhaustively covered, but it cannot.

To speak from my own experience in dealing with this issue in a corporate environment, we’d need a docx generator first ;).

It’s a perfectly valid idea to try implementing as tooling outside of the Rust project.


#14

  1. Leaving aside the biased term “viral” that has already been discussed…

  2. Cargo and crates.io should be providing mechanism, not policy. It’s fine to help users comply with licenses, but no part of the Cargo or crate ecosystem should be pushing people towards permissive licensing (or, conversely, pushing people towards copyleft licensing).

  3. All licenses have license obligations you have to meet, and if you’re redistributing you have to care about those licenses. You’re singling out copyleft licenses specifically.