[pre-RFC] Security advisories as part of crates.io metadata

Would it be more appropriate to say that Rust’s community prefers one style over the other (relative to Python)?

EDIT: Applied in https://github.com/untitaker/rfcs/commit/45a7c2a62652b52c8843b14114ed092bde0766c5

I think a binary allow_vulnerable = true is very dangerous. Cargo.toml cannot be modified afterwards and if a new vulnerability is discovered, the dependent crate isn’t necessarily immune against it.

Therefore, I propose to extend this scheme a little bit:

  1. cargo advisory should not be boolean like cargo yank. I should be possible to have multiple advisories per crate.
  2. Every advisory should have a unique identifier (can be internal to cargo or an external one like CVE or DWF).
  3. Dependencies can now be specified as:
[dependencies]
iron = { version = "0.4", allow_vulnerabilities = ["CVE-YYYY-XXXX"] }

EDIT: An automatically generated internal identifier is probably better to prevent collision.

4 Likes

Makes sense, though I’d much rather just use DWF instead of an arbitrarily generated identifier. This would require assigning DWF IDs automatically if none is specified (see open questions), and it would also require that one “vulnerability report” would point to exactly one DWF ID (i.e. dwf would be a string instead of an array of strings)

It probably makes sense to cc @alexcrichton since a solution with automatic DWF assignments requires the Rust project to continuously apply for new DWF blocks, and whether we choose this solution would depend on whether the tooling team would be even willing to take on that burden.

I also remembered that in https://github.com/rust-lang/cargo/issues/2608 @alexcrichton described in great detail how Cargo should asynchronously hit the network to check for vulnerable crates while compiling. I’m torn on this idea as it makes it less intuitive which steps of building a crate require internet connection and which do not. At the same time I didn’t really think about when those reports should be fetched. Does anybody have an idea?

Every advisory should have a unique identifier

I think every advisory should have at least one unique identifier. However identifier systems come and go: for a long time the ruby-advisory-db relied on the now-defunct OSVDB.

I think it would make sense to collect a dict/hashmap of identifiers, as vulnerabilities may be reported to multiple advisory systems and get unique identifiers from each one.

Regarding a single canonical identifier for each advisory, I'd prefer DWF if possible, especially if Mozilla is willing to obtain DWF identifier blocks for Rust vulns as @untitaker is requesting.

1 Like

Shouldn't this be any version of the direct dependency? If the crate headache's latest version is 1.2.3, versions >= 1.0.0, < 1.2.3 are vulnerable and the dependency specifies headache ^1.2.1 then even while 1.2.3 is not vulnerable there is a possibility that the vulnerable version of headache (>= 1.2.1, < 1.2.3) mixes into the final executable.

I still think the alternative of extending yank with some metadata is the best solution. Especially if

is basically the only reason. Nobody should be (newly!) depending on a crate with known sec vulnerabilities, if there is a version without them. At some point someone will forget about the crate's vulnerability and end up using that feature.

At the same time it doesn't make sense to depend on packages that don't compile, and currently yanking is primarily used to mark such packages.

Yanking is also used for removing versions that accidentally introduced breaking changes in minor version changes.

Indeed, that is the one requirement needed for yank to be helpful.

Lastly, the data exposed via the Crates.io would be a lot less structured. Automatic security notifications via third-party tooling would be impossible because there is no way to determine whether a package was yanked because of a security vulnerability or not.

As you said, that's solved by

But there's no reason to

no matter the reason (security, deprecated, broken), noone should be newly depending on that crate.

I believe having a second "yank" that changes behaviour does not yield any advantage over simply forbidding usage of the crate, but increases complexity for users, crate storage (crates.io) and tooling (cargo).

1 Like

In the "Alternatives" section I explictly disagreed with what you just stated. If I am currently using the vulnerable version 0.9 of a library whose latest version 1.0 is not vulnerable and contains a lot of API breakage, a upgrade to the patched version would be rather costly, and as said, it may be the case that I use the crate's API in a way that poses no security threat. This is something the developer has to decide, and Cargo should not be presumptious about this.

Security advisories may affect a much larger range of versions than it is usually the case when yanking a package because of accidental breakage. My original thread explicitly assumes that most OSS maintainers don't want to invest the time to backport security patches to older version ranges, and under this assumption there is no backwards compatible 0.9.x version I could upgrade to in my example. Using warnings instead of hard failures (for build), and making this behavior overridable in Cargo.toml takes this into account. It is a tradeoff.

How can you "forget" about vulnerabilities if Cargo, as proposed, refuses to pass tests and warns during compilation? If you mean "ignore", no security advisory system in the world can protect from completely irresponsible crate authors. I would call a person irresponsible if they even ignore the warnings that are brought to them, as opposed to them having to subscribe to separate channels to keep on top of those.

Funny, a few lines up you described another "only reason" why yank shouldn't be used.

If those assignment systems really come and go as you seem to imply, it would probably make more sense to use our own identifiers (!= DWF) and allow DWFs, CVEs, etc, but not rely on them.

Good point. I will change this. EDIT: Apply suggestion from @lifthrasiir · untitaker/rfcs@68bb0b1 · GitHub

That's the only fundamental reason. Extending yank is a good idea irrelevant of this issue. There's even a cargo issue talking about adding reasons to yank.

Then you are not a new user to the crate, and thus can keep depending on it. Yanking only forbids new dependencies.

I meant, if you add allow_vulnerable = true to your dependency, because you checked every single use of that crate, and it doesn't affect you. Then some months later, you do some changes to the source, and suddenly it affects you.

The entire "new user" behavior makes no sense for security vulnerabilities either. If I want to use a crate that has no security patch yet (of course only the parts that are not vulnerable), what am I going to do? Note that I assume you don't want to honor allow_vulnerable = true for new dependencies.

I can do the same thing with existing dependencies. I see this more of an argument for removing allow_vulnerable and always deny usage of crates that are vulnerable in any way, but that is obviously not feasible.

Thanks for the writeup @untitaker! I'm sorry I haven't been able to keep up with much of the conversation up to this point, but I figure it's about time I write down some thoughts :slight_smile:

In general I agree and like the motivation section. I think there's a real use case for publishing and notifying about security vulnerabilities on crates.io, and doing this through Cargo seems quite natural and what one might expect. Additionally I think it's reasonable to avoid yank as the alternatives section outlines as well.

It may be worth putting some thought into the motivation section, though, in terms of ergonomics of a separate cargo advisory command. For example is the motivation to weed out vulnerable crates in the ecosystem ASAP, or is it to primarily just notify developers? To me there's a spectrum of how "hard" Cargo can be about using a vulnerable crate, and we can certainly tweak the slider to see where we come up on that spectrum.

I personally would advocate for something which retains the current ergonomics of Cargo. That is, a cargo advisory feature would:

  • Not overly pollute build logs
  • Never break an existing build
  • Be lightweight and easy to use without lots of process

(etc). Some of my comments below I believe will touch on this as well!

I'd recommend just requiring --vers. I think it's easier to understand and I'd prefer to not have subtle differences between cargo yank and cargo advisory. I think it'd be fine to just run the command once per version as well, in theory this is something that's called very rarely.

Cargo currently doesn't do this, and it's a tricky thing to do across platforms, so perhaps an alternative strategy could be taken? I'd recommend Cargo having the ability to generate a template (like you pasted) and then that template is passed as an argument to cargo advisory. Cargo can still do validation and print out summary information, but it avoids all the headaches of interactive CLI tools by sidestepping the problem!

My recommendation would be that if these are optional keys they don't need to be included at all, e.g. omission is allowed.

Note that I very much like that the only required field here is a description, it feels like the right balance for "I can get something out" vs "I did all the rigamarole and I want to message everything"

Another thought of mine, can you add two advisory reports to a version of a crate? I'd argue that this should be allowed. For example a low-risk advisory could be issued but then later a more serious advisory could be added which should be messaged out again.

To me this I think will be the most contentious part of the RFC. I personally would find this behavior a non-starter, for example, as it would drastically change the ergonomics of Cargo in my opinion. In the few times I've used npm I'm constantly getting deprecation warnings about packages I've never heard of. Not only do I not know how to fix the issue but it's also not always my problem (some upstream maintainer may need to update to a new major version).

So put another way, I feel like this is making Cargo far too harsh in terms of messaging this out. I think that messaging out is crucial, but we need to find a right balance here. If Cargo is constantly telling you that you're using vulnerable packages, then we run the risk of developers conditioning themselves to just ignore all the reports, defeating the purpose of the feature in the first place!

I unfortunately don't have a great concrete recommendation to give here, though. We could explore possibilities like "warn once a day" or "once a week" or "only on a new dependency being formed". It may help to explicitly outline use cases of where an advisory would be read and fixed, and then given those scenarios we could design a messaging system that is effective but also unobtrusive for other scenarios.

I would personally also find this as a bit too harsh and go against my personal goal of "Cargo never breaks your build". I feel that with the right messaging strategy we won't need a restriction like this.

I feel like this is on the right track, and I like the idea that cargo publish has stricter requirements than other commands (as it's so much more rarely called). I feel like it may be a bit too harsh, though, to reject a publish if any vulnerable version matches. As you mentioned there are legitimate use cases for relying on vulnerable crates, so I'd perhaps recommend the logic of if a fresh resolution has any direct vulnerable dependencies the publish is aborted. This abort-by-default behavior, though, could be papered over with a flag.

I also feel that allow-vulnerablein Cargo.toml is perhaps a little heavyweight for this use case, but maybe a flag doesn't always suffice?


Sorry if some of that is dragging up discussions that have already happened, but feel free to just point me to existing threads if that's the case!

I have a few thoughts here as well.

At a high level, the concerns in this RFC draft are pretty similar to other concerns I've had with yanked packages for other use-cases. Regardless of the disposition of this particular proposal, I'd like to beef up yanking to support more scenarios ergonomically.

Some examples:

  • "Yanked because the code won't work in a future version of Rust"
  • "Yanked because there is a different package that in the author's opinion supersedes this package"
  • "Yanked because this version doesn't work on a particular platform"

These examples share some characteristics with the issues that the security vulnerability scenario is trying to address:

This is also true about packages yanked because they will no longer work in the future (which hasn't arrived yet) or doesn't work on a particular platform (maybe you're not using that platform) or because it's superseded (maybe you have a good reason to use the old package anyway).

I think that's a too-limited view of yank, but it does reflect the current set of functionality. I personally would like to expand yank to have more capabilities (I've talked with @alexcrichton about "yank with reason" for ages, and that was one of the motivations).

I think this is a more general problem with yank, and the reason for wanting to beef it up. Security is a particular kind of reason, that might justify special handling, but yanking because the package won't work in the future also argues for better advice from Cargo.

Which gets to my concrete proposal re: "yank with reason".

I'd like to beef up the reason to include an enum of reasons that Cargo could use to supply useful guidance based on what has happened. I would still want to allow unstructured reasons, but I would also want to try to have the structured enum include the most common reasons, as well as reasons that justify special guidance or handling.

Given the number of reasons for wanting to discourage people from using a package, I'd like to avoid an explosion of different top-level commands that represent each scenario. Instead, I'd like to extend the conceptual meaning of "yank" to include many more scenarios.

I personally think it's fine to have a "yanked" version that is still available (in fact, all yanked crates work that way to some extent), but I would also be open to another general name like cargo discourage :wink:

After thinking about your arguments for security, I also think the yanking semantics might want to be a little bit tweaked, even in terms of today's yank.

  • It's already true that if I ask for 1.6 and 1.7 is yanked but 1.8 has been released, you get 1.8.
  • It's also already true that if your lockfile asks for 1.7, you get 1.7.
  • On the other hand, if you ask for 1.6 and 1.7 was yanked, you get 1.6.
  • I believe the only problematic case is if 1.6 was released, 1.7 was yanked, and that is the latest version and you ask for 1.7. I think it makes sense, in that scenario, to allow you to use 1.7, perhaps with some additional annotation (similar to allow_vulnerable).

For what it's worth I don't necessarily oppose having a suite of security commands that are effectively sugar for "yank with structured reason", but since there are other kinds of yanks that share characteristics with security vulnerabilities, I'd like to try to beef up the primitive first (or at least, at the same time).

I wanted to add one last thing, which is that I think we already need a way to summarize messages about packages at the end of a Cargo run, rather than inline together with the installation.

This would allow us to do things like:

Downloading foo vx.y.z
Downloading bar vx.y.z
Downloading baz vx.y.z
   Finished release [optimized] target(s) in 243.7 secs

Crates using deprecated features: foo, bar
Crates with security vulnerabilities: bar vx.y.z (see ... for details)

This would allow us to print out some useful information about non-fatal problems with the build without turning it into a massive scroll of things you have no control over. Of course, --verbose would provide the complete details.

Notification is the means, and weeding out vulnerable crates is the goal/motivation. I've updated the section here: Update motivation · untitaker/rfcs@5a06156 · GitHub

The idea behind that behavior is: Better mark too many crates as vulnerable than too few. I imagine that enumerating every single concrete version that has been vulnerable to be either tedious or error-prone, or both (i.e. I could imaging to forget a specific bugfix release in my enumeration because I forgot that it existed).

The workflow I imagined is:

  • Run cargo advisory without any args.
  • Edit out the versions that are not affected in the advisory file and submit
  • Release new versions that are not vulnerable.

What about a hybrid? Try to use $EDITOR if it exists (don't fiddle around much with fallbacks such as trying a set of common editor names like it is sometimes done), dump a file if it doesn't.

I don't know how the situation looks under Windows, but under POSIX it appears fairly easy once you've found an appropriate editor command. Terminal settings e.g. are supposed to be restored by the editor.

Here we go: Allow optional keys to be absent · untitaker/rfcs@fa78731 · GitHub

This has been proposed at [pre-RFC] Security advisories as part of crates.io metadata - #6 by troplin

I feel like those situations are not comparable as deprecations are much more common (by orders of magnitude!) than security advisories. I simply don't expect advisories to be so common such that developers would be conditioned to ignore them.

I would agree that the "regardless of whether this package is already compiled, downloaded, ..." part is too disruptive though.

@bascule suggested to create an option that makes warnings errors instead. Is that more acceptable? Note my follow-up questions in that thread for which I haven't found a good answer:


I mean "any vulnerable version that matches the version requirements", see also [pre-RFC] Security advisories as part of crates.io metadata - #10 by lifthrasiir

I don't know what you mean by this. Are you saying that allow-vulnerable should affect behavior of publish? This is already my intention.

If yank would want to satisfy all those usecases, it IMO would have to change such that, by default, it only ever prints warnings about yanked crates, and never refuses to compile. In particular “refuse to compile added dependencies if those are yanked” is problematic for security vulnerabilities, but an absolute no-starter for some of the other yank reasons you listed.

I’d say such semantics are toothless. Even with the new output format you propose I feel like all the user sees at a glance is a giant wall of yellow at the end of the output, because if yank is extended for deprecating packages for all kinds of reasons, there sure is going to be a lot of that.

You could then add options to make certain classes of yanks hard errors and suppress others, but I feel like those options are more likely to be abused than allow_vulnerable.

True! This is executed pretty rarely, though, and I imagine we could solve the issue by making it easy to execute the command multiple times?

The lack of support on Windows (this will basically never work) makes me think that we shouldn't try to optimize this use case. It means many users will not have an editor opened automatically, so we shouldn't optimize for that.

True yeah, but I think the risk is still there. If a user once or twice starts seeing vulnerabilities they have no idea how to fix, then one might quickly enter the mode of "just always ignore" the messages.

I would prefer the reverse, an option to turn warnings to errors. I personally feel that breaking existing builds with a Cargo.lock, for whatever reason, is a non-starter.

Oh I just mean cargo publish --allow-vulnerable or something like that rather than modifying Cargo.toml.

There is another drawback: Some crate developers may not be aware of this mechanism and/or may be aware of it and choose not to use it for various reasons. Further, the cost is too expensive to keep track of vulnerabilities in every version of a crate ever published, to minimize costs it makes sense to mark everything but the newest version vulnerable, or abstain from using the mechanism at all.

Thus, while the existence of the mechanism would likely induce people to depend on it, but it isn't something people can depend on and it would create a false sense of security.

What would that solve? You'd still have the risk of forgetting a bugfix release in your enumeration.

Fair enough, I'll change that. EDIT: Don't invoke EDITOR, let the user do it. · untitaker/rfcs@b0691a1 · GitHub

Which is why I wanted to use hard errors for security vulnerabilities during testing, such that they're harder to ignore and people rather ask questions than ignore.

That's what I said.

see here for why having a bool switch for turning off all vulnerability warnings is highly problematic. My version to disable them on a per-dependency basis is already argued to be too broad, yours is broader.

For "lazier" developers this is exactly the behavior I want to nudge them into rather than not using the feature at all, by defaulting to selecting all versions by default. The idea is that you'd release a new patched version after marking all existing versions as vulnerable.

Regarding developers that simply won't use this due to other reasons, I can't think of an alternative design that solves this problem, but it's surely worth mentioning as drawback. FIXME