[pre-RFC] Security advisories as part of crates.io metadata

This is a continuation of Idea: Security advisories as part of crates.io metadata. The following text is a copy-paste of https://github.com/untitaker/rfcs/blob/security-advisories/text/0000-security-advisories.md, the text on GitHub and this post will be updated as the discussion progresses.

Since we’ve already had relatively much of discussion about this, please tell me if this is already ready for a pull request.


Summary

Crates.io should offer an API to release security advisories for crates. Cargo should offer a new command for the same purpose and warn about vulnerable crate versions during compilation.

Motivation

When compared to other ecosystems such as Python’s, Rust’s broader community prefers many single-purpose crates over larger monoliths. This situation, together with the strongly encouraged practice of pinning MINOR versions of dependencies, slows down the propagation of critical security fixes.

Assume a crate W, which depends on X, which depends on Y, which depends on Z. If Z releases a new MINOR version including a security fix, it requires the attention of Y's and X's maintainers to propagate that security fix to W. What makes this situation worse is that the author of W is never notified that they were running a vulnerable version of Z all the time.

Tooling that builds on top of the API provided by Crates.io (such as Cargo) could alert crate users of their vulnerabilities, which in turn spurs them to update their dependencies accordingly. Even if that does not happen, the additional metadata at least makes it clear which crates are potentially dangerous to use and which ones not. This not only helps Rust programmers, but potentially also distributors (such as packagers of Linux distros) and end-users.

Detailed design

Crates.io

Similar to yanking, Crates.io should provide an API that allows a user of Cargo to attach an arbitrary amount of so-called security advisories to crates they own.

Each advisory gets assigned an ID that is unique within the set of advisories for the affected crate. Every advisory should have a unique URL, for example https://crates.io/crates/<crate>/advisory/<id>, where <id> is the advisory’s ID. On that URL a human-readable representation of the advisory should be stored.

Other pages on Crates.io should link to those advisories prominently where appropriate.

Cargo

cargo advisory

A command called advisory will be added to Cargo. Here is an excerpt of its help page:

$ cargo advisory --help
Generate and upload security advisories for the given or the current crate.

Usage:
    cargo advisory [options] -- [<crate>]
    --filename PATH      The filename to use. Defaults to `./Advisory.toml`.
                         If `-` is given, generated advisories are printed to
                         stdout and advisories to upload are read from stdin.

    --vers VERSION       Versions to release this advisory for. Can be
                         specified multiple times. Only valid in conjunction
                         with --generate.

    --upload/--generate  Whether to upload or generate a advisory. The default
                         is to generate. These options are mutually exclusive.
    [...]

Like yank it takes a --vers option, with two differences:

  • if a version is not specified, advisory will default to all existing versions.

  • Version ranges such as <1.2.6, >1.0.0 can be specified. This is comparable to the syntax used for specifying dependencies in the Cargo.toml, with the exception that x.y.z is not equivalent to ^x.y.z, but means the exact version.

Here’s the workflow:

  1. The user invokes cargo advisory without the --upload option. Cargo will generate a file under filename. Cargo should abort if the file already exists. The content looks like this:

    [vulnerability]
    package = "mypackage"
    versions = ["1.2.0", "1.2.3", "1.2.4", "1.2.5"]
    
    # It is strongly recommended to request a CVE, or alternatively a DWF, and
    # reference the assigned number here.
    # - CVE: https://iwantacve.org/
    # - DWF: https://distributedweaknessfiling.org/
    dwf = false
    # dwf = "CVE-YYYY-XXXX"
    # dwf = ["CVE-YYYY-XXXX", "CVE-ZZZZ-WWWW"]
    
    # URL to a long-form description of this issue, e.g. a blogpost announcing
    # the release or a changelog entry (optional)
    url = false
    
    # Enter a short-form description of the vulnerability here. Preferrably a
    # single paragraph (required)
    description = """
    
    """
    
  2. The user invokes cargo advisory --upload. Cargo verifies the passed file against the following rules:

    • the file exists and is valid TOML
    • Optional keys may be either false or absent.
    • the description contains not only whitespace. More text than a paragraph should be allowed, but not necessarily recommended.
    • package exists on Crates.io and the versions specified in versions exist
    • dwf is not an empty array. It should be false if there are none.

    If not, Cargo should print one or more error messages and exit.

  3. When the advisory is found to be valid, Cargo should print a summary, ask the user for confirmation and upload it to the package index. The vulnerability ID assigned by Crates.io and optionally the corresponding URL should be printed to stdout.

The recommended workflow is to first file the advisory with cargo advisory, and then release the versions that contain the security fix.

Using vulnerable packages

  • cargo build and cargo install will emit a warning for each vulnerable package used, regardless of whether this package is already compiled, downloaded or otherwise cached, or whether it is a direct dependency or not:

    Downloading foo vx.y.z
    Downloading bar vx.y.z
    Warning: bar vx.y.z (dependency of foo vx.y.z) is vulnerable. See https://crates.io/... for details.
    
  • cargo publish will refuse to upload a crate if any version of a direct dependency satisfying the constraints in Cargo.toml is vulnerable. Indirect dependencies should not trigger this behavior.

    For example, if I have a dependency such as bar = "^1.2.3", this means publish should refuse to upload my crate even if bar=1.2.3 is not vulnerable, as another version satisfying that constraint may be.

The author of a crate that directly depends on a vulnerable crate may disable these warnings with a switch in their Cargo.toml. If iron==0.4.x has an advisory with the ID deadbeef, the dependent author may use the allow_vulnerable parameter to disable all the above-described warnings and errors for this vulnerability:

[dependencies]
iron = { version = "0.4", allow_vulnerable = ["deadbeef"] }

This only affects the warnings for deadbeef for the current crate. Cargo will still print warnings:

  • for other vulnerabilities. Each warning has to be explicitly disabled by appending its ID to that array.
  • if another package in the dependency graph uses a version of iron that has the deadbeef vulnerability, but does not have allow_vulnerable = ["deadbeef"] set.

Cargo must reject nonexistent vulnerability IDs with a fatal error.

Drawbacks

There is a risk that users will abuse this system to mark their versions as deprecated or to call out other kinds of critical bugs such as data loss. This would make the entire advisory system as semantically worthless.

Alternatives

Extending yanking for security advisories

It has been proposed to extend the semantics of yanking such that it could be used for security advisories. While this alternative meets the popular aesthetic preference of having generic commands with a large variety of usecases (over single-purpose commands), using yanking this way has a few drawbacks:

  • Cargo dosen’t allow yanked packages to be used in new packages. In the author’s opinion, people who know what they’re doing should be allowed to depend on vulnerable packages, as they might use the crate in a way that poses no security threat.

    Some vulnerabilities can be mitigated in ways other than upgrading a crate, like making local configuration changes. Some vulnerabilities may affect optional functionality not everyone is using, or functionality that can be compiled out by e.g. disabling certain cargo feature settings for that crate. Some may be relatively innocuous and/or hard-to-exploit and therefore not warrant an immediate upgrade. Sometimes no action (other than setting allow_vulnerable = true) is required at all because the dependent crate never used the vulnerable functionality to begin with.

    At the same time it doesn’t make sense to depend on packages that don’t compile, and currently yanking is primarily used to mark such packages.

  • Cargo doesn’t give any advice about further procedure when yanking a package. I think in the context of security vulnerabilities this is very much needed, as few OSS maintainers are exposed to this problem regularly enough to know what they’re doing.

  • Lastly, the data exposed via the Crates.io would be a lot less structured. Automatic security notifications via third-party tooling would be impossible because there is no way to determine whether a package was yanked because of a security vulnerability or not.

Most of these problems can be fixed by asking the user to attach a “reason” to their yanked packages, such as security, deprecation, broken (and then make Cargo’s behavior dependent on that). However, at that that point yank is no longer generic (as in a function having type parameters), but simply a lot of single-purpose commands stuffed into one (as in function overloading in Java). And the name “yank” wouldn’t make sense for crate versions that may still be available (depending on the “reason”).

Unresolved questions

DWF vs CVE

  • It may be counterintuitive that one can specify CVEs in the DWF parameter. Should it be called cve instead even though it can also be used for DWFs?

Comparison:

  • CVEs are more popular
  • Applying for a CVE number is a manual process and requires review by a human. DWFs can be automatically managed assigned by Crates.io

What to do if dwf = false

  • Crates.io could apply for blocks of DWF IDs and automatically assign them if the user didn’t specify one in the advisory (dwf = false).

CVSS

  • Scoring vulnerabilities. Should a new field for the usage of CVSS be created?
2 Likes

As the one who suggested cargo vuln originally, I’ve come to prefer cargo advisory. I’ve pitched RubyGems on a similar feature, and I think they’ll be going with gem advisory. But yeah, bikeshedding… anyway.

I think it’d also be prudent to include a URL to an official advisory page along with the metadata. This could point to an official announcement about the vulnerability or the relevant sections of a changelog, for example.

1 Like

Done in Rename command to `advisory`. · untitaker/rfcs@cb75ca3 · GitHub

I've jsut assumed that Crates.io itself is that advisory page, but I can see why people would want to avoid being dependent on that service. Reformat vulnerability report, add url field · untitaker/rfcs@e6c7ee7 · GitHub

Nit: Rust's packaging tooling doesn't encourage one or the other.

1 Like

Would it be more appropriate to say that Rust’s community prefers one style over the other (relative to Python)?

EDIT: Applied in https://github.com/untitaker/rfcs/commit/45a7c2a62652b52c8843b14114ed092bde0766c5

I think a binary allow_vulnerable = true is very dangerous. Cargo.toml cannot be modified afterwards and if a new vulnerability is discovered, the dependent crate isn’t necessarily immune against it.

Therefore, I propose to extend this scheme a little bit:

  1. cargo advisory should not be boolean like cargo yank. I should be possible to have multiple advisories per crate.
  2. Every advisory should have a unique identifier (can be internal to cargo or an external one like CVE or DWF).
  3. Dependencies can now be specified as:
[dependencies]
iron = { version = "0.4", allow_vulnerabilities = ["CVE-YYYY-XXXX"] }

EDIT: An automatically generated internal identifier is probably better to prevent collision.

4 Likes

Makes sense, though I’d much rather just use DWF instead of an arbitrarily generated identifier. This would require assigning DWF IDs automatically if none is specified (see open questions), and it would also require that one “vulnerability report” would point to exactly one DWF ID (i.e. dwf would be a string instead of an array of strings)

It probably makes sense to cc @alexcrichton since a solution with automatic DWF assignments requires the Rust project to continuously apply for new DWF blocks, and whether we choose this solution would depend on whether the tooling team would be even willing to take on that burden.

I also remembered that in https://github.com/rust-lang/cargo/issues/2608 @alexcrichton described in great detail how Cargo should asynchronously hit the network to check for vulnerable crates while compiling. I’m torn on this idea as it makes it less intuitive which steps of building a crate require internet connection and which do not. At the same time I didn’t really think about when those reports should be fetched. Does anybody have an idea?

Every advisory should have a unique identifier

I think every advisory should have at least one unique identifier. However identifier systems come and go: for a long time the ruby-advisory-db relied on the now-defunct OSVDB.

I think it would make sense to collect a dict/hashmap of identifiers, as vulnerabilities may be reported to multiple advisory systems and get unique identifiers from each one.

Regarding a single canonical identifier for each advisory, I'd prefer DWF if possible, especially if Mozilla is willing to obtain DWF identifier blocks for Rust vulns as @untitaker is requesting.

1 Like

Shouldn't this be any version of the direct dependency? If the crate headache's latest version is 1.2.3, versions >= 1.0.0, < 1.2.3 are vulnerable and the dependency specifies headache ^1.2.1 then even while 1.2.3 is not vulnerable there is a possibility that the vulnerable version of headache (>= 1.2.1, < 1.2.3) mixes into the final executable.

I still think the alternative of extending yank with some metadata is the best solution. Especially if

is basically the only reason. Nobody should be (newly!) depending on a crate with known sec vulnerabilities, if there is a version without them. At some point someone will forget about the crate's vulnerability and end up using that feature.

At the same time it doesn't make sense to depend on packages that don't compile, and currently yanking is primarily used to mark such packages.

Yanking is also used for removing versions that accidentally introduced breaking changes in minor version changes.

Indeed, that is the one requirement needed for yank to be helpful.

Lastly, the data exposed via the Crates.io would be a lot less structured. Automatic security notifications via third-party tooling would be impossible because there is no way to determine whether a package was yanked because of a security vulnerability or not.

As you said, that's solved by

But there's no reason to

no matter the reason (security, deprecated, broken), noone should be newly depending on that crate.

I believe having a second "yank" that changes behaviour does not yield any advantage over simply forbidding usage of the crate, but increases complexity for users, crate storage (crates.io) and tooling (cargo).

1 Like

In the "Alternatives" section I explictly disagreed with what you just stated. If I am currently using the vulnerable version 0.9 of a library whose latest version 1.0 is not vulnerable and contains a lot of API breakage, a upgrade to the patched version would be rather costly, and as said, it may be the case that I use the crate's API in a way that poses no security threat. This is something the developer has to decide, and Cargo should not be presumptious about this.

Security advisories may affect a much larger range of versions than it is usually the case when yanking a package because of accidental breakage. My original thread explicitly assumes that most OSS maintainers don't want to invest the time to backport security patches to older version ranges, and under this assumption there is no backwards compatible 0.9.x version I could upgrade to in my example. Using warnings instead of hard failures (for build), and making this behavior overridable in Cargo.toml takes this into account. It is a tradeoff.

How can you "forget" about vulnerabilities if Cargo, as proposed, refuses to pass tests and warns during compilation? If you mean "ignore", no security advisory system in the world can protect from completely irresponsible crate authors. I would call a person irresponsible if they even ignore the warnings that are brought to them, as opposed to them having to subscribe to separate channels to keep on top of those.

Funny, a few lines up you described another "only reason" why yank shouldn't be used.

If those assignment systems really come and go as you seem to imply, it would probably make more sense to use our own identifiers (!= DWF) and allow DWFs, CVEs, etc, but not rely on them.

Good point. I will change this. EDIT: Apply suggestion from @lifthrasiir · untitaker/rfcs@68bb0b1 · GitHub

That's the only fundamental reason. Extending yank is a good idea irrelevant of this issue. There's even a cargo issue talking about adding reasons to yank.

Then you are not a new user to the crate, and thus can keep depending on it. Yanking only forbids new dependencies.

I meant, if you add allow_vulnerable = true to your dependency, because you checked every single use of that crate, and it doesn't affect you. Then some months later, you do some changes to the source, and suddenly it affects you.

The entire "new user" behavior makes no sense for security vulnerabilities either. If I want to use a crate that has no security patch yet (of course only the parts that are not vulnerable), what am I going to do? Note that I assume you don't want to honor allow_vulnerable = true for new dependencies.

I can do the same thing with existing dependencies. I see this more of an argument for removing allow_vulnerable and always deny usage of crates that are vulnerable in any way, but that is obviously not feasible.

Thanks for the writeup @untitaker! I'm sorry I haven't been able to keep up with much of the conversation up to this point, but I figure it's about time I write down some thoughts :slight_smile:

In general I agree and like the motivation section. I think there's a real use case for publishing and notifying about security vulnerabilities on crates.io, and doing this through Cargo seems quite natural and what one might expect. Additionally I think it's reasonable to avoid yank as the alternatives section outlines as well.

It may be worth putting some thought into the motivation section, though, in terms of ergonomics of a separate cargo advisory command. For example is the motivation to weed out vulnerable crates in the ecosystem ASAP, or is it to primarily just notify developers? To me there's a spectrum of how "hard" Cargo can be about using a vulnerable crate, and we can certainly tweak the slider to see where we come up on that spectrum.

I personally would advocate for something which retains the current ergonomics of Cargo. That is, a cargo advisory feature would:

  • Not overly pollute build logs
  • Never break an existing build
  • Be lightweight and easy to use without lots of process

(etc). Some of my comments below I believe will touch on this as well!

I'd recommend just requiring --vers. I think it's easier to understand and I'd prefer to not have subtle differences between cargo yank and cargo advisory. I think it'd be fine to just run the command once per version as well, in theory this is something that's called very rarely.

Cargo currently doesn't do this, and it's a tricky thing to do across platforms, so perhaps an alternative strategy could be taken? I'd recommend Cargo having the ability to generate a template (like you pasted) and then that template is passed as an argument to cargo advisory. Cargo can still do validation and print out summary information, but it avoids all the headaches of interactive CLI tools by sidestepping the problem!

My recommendation would be that if these are optional keys they don't need to be included at all, e.g. omission is allowed.

Note that I very much like that the only required field here is a description, it feels like the right balance for "I can get something out" vs "I did all the rigamarole and I want to message everything"

Another thought of mine, can you add two advisory reports to a version of a crate? I'd argue that this should be allowed. For example a low-risk advisory could be issued but then later a more serious advisory could be added which should be messaged out again.

To me this I think will be the most contentious part of the RFC. I personally would find this behavior a non-starter, for example, as it would drastically change the ergonomics of Cargo in my opinion. In the few times I've used npm I'm constantly getting deprecation warnings about packages I've never heard of. Not only do I not know how to fix the issue but it's also not always my problem (some upstream maintainer may need to update to a new major version).

So put another way, I feel like this is making Cargo far too harsh in terms of messaging this out. I think that messaging out is crucial, but we need to find a right balance here. If Cargo is constantly telling you that you're using vulnerable packages, then we run the risk of developers conditioning themselves to just ignore all the reports, defeating the purpose of the feature in the first place!

I unfortunately don't have a great concrete recommendation to give here, though. We could explore possibilities like "warn once a day" or "once a week" or "only on a new dependency being formed". It may help to explicitly outline use cases of where an advisory would be read and fixed, and then given those scenarios we could design a messaging system that is effective but also unobtrusive for other scenarios.

I would personally also find this as a bit too harsh and go against my personal goal of "Cargo never breaks your build". I feel that with the right messaging strategy we won't need a restriction like this.

I feel like this is on the right track, and I like the idea that cargo publish has stricter requirements than other commands (as it's so much more rarely called). I feel like it may be a bit too harsh, though, to reject a publish if any vulnerable version matches. As you mentioned there are legitimate use cases for relying on vulnerable crates, so I'd perhaps recommend the logic of if a fresh resolution has any direct vulnerable dependencies the publish is aborted. This abort-by-default behavior, though, could be papered over with a flag.

I also feel that allow-vulnerablein Cargo.toml is perhaps a little heavyweight for this use case, but maybe a flag doesn't always suffice?


Sorry if some of that is dragging up discussions that have already happened, but feel free to just point me to existing threads if that's the case!

I have a few thoughts here as well.

At a high level, the concerns in this RFC draft are pretty similar to other concerns I've had with yanked packages for other use-cases. Regardless of the disposition of this particular proposal, I'd like to beef up yanking to support more scenarios ergonomically.

Some examples:

  • "Yanked because the code won't work in a future version of Rust"
  • "Yanked because there is a different package that in the author's opinion supersedes this package"
  • "Yanked because this version doesn't work on a particular platform"

These examples share some characteristics with the issues that the security vulnerability scenario is trying to address:

This is also true about packages yanked because they will no longer work in the future (which hasn't arrived yet) or doesn't work on a particular platform (maybe you're not using that platform) or because it's superseded (maybe you have a good reason to use the old package anyway).

I think that's a too-limited view of yank, but it does reflect the current set of functionality. I personally would like to expand yank to have more capabilities (I've talked with @alexcrichton about "yank with reason" for ages, and that was one of the motivations).

I think this is a more general problem with yank, and the reason for wanting to beef it up. Security is a particular kind of reason, that might justify special handling, but yanking because the package won't work in the future also argues for better advice from Cargo.

Which gets to my concrete proposal re: "yank with reason".

I'd like to beef up the reason to include an enum of reasons that Cargo could use to supply useful guidance based on what has happened. I would still want to allow unstructured reasons, but I would also want to try to have the structured enum include the most common reasons, as well as reasons that justify special guidance or handling.

Given the number of reasons for wanting to discourage people from using a package, I'd like to avoid an explosion of different top-level commands that represent each scenario. Instead, I'd like to extend the conceptual meaning of "yank" to include many more scenarios.

I personally think it's fine to have a "yanked" version that is still available (in fact, all yanked crates work that way to some extent), but I would also be open to another general name like cargo discourage :wink:

After thinking about your arguments for security, I also think the yanking semantics might want to be a little bit tweaked, even in terms of today's yank.

  • It's already true that if I ask for 1.6 and 1.7 is yanked but 1.8 has been released, you get 1.8.
  • It's also already true that if your lockfile asks for 1.7, you get 1.7.
  • On the other hand, if you ask for 1.6 and 1.7 was yanked, you get 1.6.
  • I believe the only problematic case is if 1.6 was released, 1.7 was yanked, and that is the latest version and you ask for 1.7. I think it makes sense, in that scenario, to allow you to use 1.7, perhaps with some additional annotation (similar to allow_vulnerable).

For what it's worth I don't necessarily oppose having a suite of security commands that are effectively sugar for "yank with structured reason", but since there are other kinds of yanks that share characteristics with security vulnerabilities, I'd like to try to beef up the primitive first (or at least, at the same time).

I wanted to add one last thing, which is that I think we already need a way to summarize messages about packages at the end of a Cargo run, rather than inline together with the installation.

This would allow us to do things like:

Downloading foo vx.y.z
Downloading bar vx.y.z
Downloading baz vx.y.z
   Finished release [optimized] target(s) in 243.7 secs

Crates using deprecated features: foo, bar
Crates with security vulnerabilities: bar vx.y.z (see ... for details)

This would allow us to print out some useful information about non-fatal problems with the build without turning it into a massive scroll of things you have no control over. Of course, --verbose would provide the complete details.

Notification is the means, and weeding out vulnerable crates is the goal/motivation. I've updated the section here: Update motivation · untitaker/rfcs@5a06156 · GitHub

The idea behind that behavior is: Better mark too many crates as vulnerable than too few. I imagine that enumerating every single concrete version that has been vulnerable to be either tedious or error-prone, or both (i.e. I could imaging to forget a specific bugfix release in my enumeration because I forgot that it existed).

The workflow I imagined is:

  • Run cargo advisory without any args.
  • Edit out the versions that are not affected in the advisory file and submit
  • Release new versions that are not vulnerable.

What about a hybrid? Try to use $EDITOR if it exists (don't fiddle around much with fallbacks such as trying a set of common editor names like it is sometimes done), dump a file if it doesn't.

I don't know how the situation looks under Windows, but under POSIX it appears fairly easy once you've found an appropriate editor command. Terminal settings e.g. are supposed to be restored by the editor.

Here we go: Allow optional keys to be absent · untitaker/rfcs@fa78731 · GitHub

This has been proposed at [pre-RFC] Security advisories as part of crates.io metadata - #6 by troplin

I feel like those situations are not comparable as deprecations are much more common (by orders of magnitude!) than security advisories. I simply don't expect advisories to be so common such that developers would be conditioned to ignore them.

I would agree that the "regardless of whether this package is already compiled, downloaded, ..." part is too disruptive though.

@bascule suggested to create an option that makes warnings errors instead. Is that more acceptable? Note my follow-up questions in that thread for which I haven't found a good answer:


I mean "any vulnerable version that matches the version requirements", see also [pre-RFC] Security advisories as part of crates.io metadata - #10 by lifthrasiir

I don't know what you mean by this. Are you saying that allow-vulnerable should affect behavior of publish? This is already my intention.