Suggestion: cargo yank is a misfeature and should be deprecated and eventually removed

A big difference is that Cloudflare doesn't ship software to customers or partners; they write the software and run it themselves, so deploying a new release is much easier -- you are in control of the only deployment you care about.

For many kinds of software, it's the opposite. You may be selling software to customers. You may be delivering software to partners to run, who expect support when there are bugs, and expect to update on their own schedule. You may simply not have the power in the relationship to tell them they have to update immediately -- they have a large interconnected system as well and they have their own priorities, and you need them more than they need you.

For financial software, you often end up with long-term stable release branches which receive bug fixes and so on. If you have a working system, how can you justify doing a lot of work to update the compiler in a release branch when a patch fix for an issue is all that is needed? You risk invalidating a lot of testing that was done to stabilize the release. The same is true in safety-critical software, like automotive or aviation.

I'm not saying one approach is right and one is wrong -- far from it, it is great that Cloudflare can update the compiler the day it was released across a large codebase. It's more like, business and project realities force many developers to do something quite different from that. All I'm asking is, it would be nice if we can all play nicely together in the same sandbox of crates.

And I'm trying to explain that this:

This is why I find it more important to yank crates incompatible with the only supported Rust version, rather than keep old unsupported code that only works with old unsupported Rust versions.

is not playing nicely in the same sandbox as people who can't update the compiler as quickly as Cloudflare apparently does.

Actually I would say that this is the largest source of tension:

In both your post and Brian Smith's post (https://github.com/briansmith/ring/issues/774#issuecomment-457859163), maintainers want to use yanking to indicate "a particular release is no longer supported".

There is no way other than yanking them to indicate that they're not supported.

However, I believe that this is emphatically NOT the purpose of yanking. Yanking indicates that a crate has bugs that are so severe that cargo should actually change its version selection algorithm to avoid giving you those crates, possibly breaking your build.
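
To make that distinction concrete, here is a deliberately simplified model of version selection in Rust. It is not Cargo's actual resolver, which also handles pre-releases, features, and the full dependency graph; the `Version`, `Release`, `caret_matches`, and `select` names are invented for this sketch. It shows only the core rule: when a requirement is resolved from scratch, yanked releases are skipped even if they are the newest match.

```rust
// Deliberately simplified model of how yanked releases affect a fresh
// resolution. Not Cargo's resolver: no pre-releases, features, or graph.

#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

const fn v(major: u64, minor: u64, patch: u64) -> Version {
    Version { major, minor, patch }
}

struct Release {
    version: Version,
    yanked: bool,
}

/// Caret compatibility, i.e. what a plain `"1.2.3"` requirement means:
/// the leftmost non-zero component must stay the same.
fn caret_matches(req: Version, candidate: Version) -> bool {
    if candidate < req {
        return false;
    }
    if req.major > 0 {
        candidate.major == req.major
    } else if req.minor > 0 {
        candidate.major == 0 && candidate.minor == req.minor
    } else {
        candidate == req
    }
}

/// Newest compatible release that has not been yanked.
fn select(req: Version, releases: &[Release]) -> Option<Version> {
    releases
        .iter()
        .filter(|r| !r.yanked && caret_matches(req, r.version))
        .map(|r| r.version)
        .max()
}

fn main() {
    let releases = [
        Release { version: v(1, 0, 0), yanked: false },
        Release { version: v(1, 0, 1), yanked: false },
        Release { version: v(1, 0, 2), yanked: true }, // e.g. a severe bug
        Release { version: v(2, 0, 0), yanked: false },
    ];
    // 1.0.2 is skipped and 2.0.0 is incompatible, so this selects 1.0.1.
    println!("{:?}", select(v(1, 0, 0), &releases));
}
```

The model has no lock file at all; what happens to versions that are already locked is exactly what the rest of this thread argues about.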

Cf. "Why were all versions prior to 0.14 of this crate yanked?" (briansmith/ring issue #774 on GitHub)

There is plenty of working software that is not receiving further updates. Saying "I am no longer making patch releases to this major version" or whatever is fine. It simply doesn't justify yanking it -- that's very disruptive to people downstream of you.

I wish that there were some way to encourage crate maintainers not to yank crates frivolously, or promote a wider understanding of the costs of this to the ecosystem at large. There are lots of high-value industries that would love to benefit from the safety properties of Rust, but if the community isn't aligned on supporting such use-cases then that will push such users away to some extent.

And if you're doing this, then you'd better be checking in the lock file, or else you're getting potentially-new versions of all the dependencies, which is probably much higher risk than updating the compiler.

And if the lock file is checked in, you can keep using the yanked versions just fine.

And if the lock file is checked in, you can keep using the yanked versions just fine.

See, I wish that were the case, but it's actually not. Cargo will still let you build using the versions in the lock file if no changes to the lock file are needed. But if you actually need to make a patch to fix something, you may have to change a Cargo.toml somewhere. Then your lock file needs to be adjusted, and once that happens, your protection from yanking goes out the window.

I think you'd have better luck getting that adjusted than getting yanking banned.

What specifically about cargo --locked isn't doing what you wanted in this scenario? What change are you making?

That is not accurate. Existing locked-yanked crates are kept if you change something else. You can use cargo add, cargo update --precise and other commands without losing locked dependencies.

That is not accurate. Existing locked-yanked crates are kept if you change something else.

Well, that's good news; I don't remember this being the case. So this may have been added in the last few years since I last tried it.

But let's do a thought experiment. Suppose that there's v1.0.0, v1.0.1, v1.0.2, and v2.0.0. Suppose my repo is on v1.0.0, then it turns out there's a security problem that's fixed in v1.0.2. So what I need to do is upgrade from v1.0.0 to the fixed version v1.0.2. But if the maintainer has yanked the entire v1.0.x series, just because v2 was released and they want to indicate that v1 is unsupported, then now I'm upgrading from one yanked version to another yanked version, which is trickier.
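
The thought experiment can be sketched as a tiny model, assuming the behavior described earlier in the thread: an existing lock entry is kept, even if yanked, as long as it still satisfies the requirement, while any fresh resolution skips yanked releases. The `resolve` function and its types are hypothetical, and they ignore the rest of the dependency graph, `0.x` versions, and manual workarounds such as editing Cargo.lock.

```rust
// Simplified model: a locked version is kept while it still satisfies the
// requirement, even if yanked; fresh resolution skips yanked releases.
// Only versions >= 1.0.0 are handled (same major, not older than req).

#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Version(u64, u64, u64); // (major, minor, patch)

fn compatible(req: Version, v: Version) -> bool {
    v.0 == req.0 && v >= req
}

fn resolve(
    req: Version,
    published: &[(Version, bool)], // (version, yanked)
    locked: Option<Version>,
) -> Result<Version, String> {
    // 1. Keep the locked version if it still matches, yanked or not.
    if let Some(l) = locked {
        if compatible(req, l) {
            return Ok(l);
        }
    }
    // 2. Otherwise pick the newest matching release that is not yanked.
    published
        .iter()
        .filter(|(v, yanked)| !*yanked && compatible(req, *v))
        .map(|(v, _)| *v)
        .max()
        .ok_or_else(|| format!("no non-yanked version matches ^{}.{}.{}", req.0, req.1, req.2))
}

fn main() {
    // The whole 1.0.x series was yanked once 2.0.0 came out.
    let published = [
        (Version(1, 0, 0), true),
        (Version(1, 0, 1), true),
        (Version(1, 0, 2), true), // contains the security fix
        (Version(2, 0, 0), false),
    ];
    // Staying on the locked 1.0.0 keeps working...
    println!("{:?}", resolve(Version(1, 0, 0), &published, Some(Version(1, 0, 0))));
    // ...but requiring the fixed 1.0.2 fails: the lock no longer satisfies
    // the requirement and every remaining candidate is yanked.
    println!("{:?}", resolve(Version(1, 0, 2), &published, Some(Version(1, 0, 0))));
}
```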

It's also very complicated when you have dependencies that depend on yanked crates. Working out the cargo update commands that are needed takes a long time, and it can be a lot of trial and error.

My experience has been that, usually, if the maintainer seems to yank crates often, we just delete them as a dependency because it's too difficult to work around all this. A lot of people similarly deleted the dependency on ring and rustls a few years ago when the maintainer was more aggressive about yanking.

It would be much simpler if there were something like cargo build --ignore-yank, that just completely ignores whether crates have been yanked during version decision making. I would even put that in .cargo/config if I could. It's simpler to learn about security issues via cargo audit and about yanked crates via cargo deny; then I can try to figure out why something was yanked and whether it actually matters for me.

You need to consider time frames here.

If the 1.x series has been yanked entirely, it's most likely that it's been really really old, and so obsolete the maintainer doesn't even want to hear about it again. In that case your software ran with unpatched 1.0.0 for months or years, and could have ended up like Equifax. Attackers read changelogs and will use patched-in-the-future bugs against your software that holds on to old unmaintained versions.

Or if it was yanked quickly, then you've picked an experimental, unstable crate for Very Serious Long-Term Financial Software without signing an LTS maintenance contract with the author. Ring used to be disruptive like that, but has settled down since: you can use releases going as far back as 2019.


There is not a lot of work to update the compiler. That's the whole point of Rust's release model. Compiler upgrades are meant to be quick and painless. There are processes (like crater) to ensure that is the case. There are some rare situations when a little work is needed, but these are minor things like adding type annotations or running cargo fix.

If the 1.x series has been yanked entirely, it's most likely that it's been really really old, and so obsolete the maintainer doesn't even want to hear about it again

See, you might think so, but there are also overzealous maintainers who yank things for no good reason, because they think there's no cost to doing this, or they think everyone is on the latest rust release anyways.

It doesn't really matter if the maintainer doesn't want to hear about v1 ever again. If it's working software, and, say, v1.0.2 fixes a security problem with v1, and adapting to API changes in v2 is out of scope, then there are definitely people who will choose to upgrade to v1.0.2 rather than v2.

In the end, downstream decides when the update happens, not the maintainer.

In that case your software ran with unpatched 1.0.0 for months or years and could have ended up like Equifax. Attackers read changelogs and will use patched-in-the-future bugs against your software that holds on to old unmaintained versions.

And a responsible downstream is well aware of this! If there's a rustsec advisory against the vulnerable versions, then downstream can pick up on that and fix it. If the only thing you do is yank it, some people will pick up on that if they are using cargo deny, but others won't because it's kind of a crapshoot how the yank will actually affect you.

The problem here is that the maintainer is not in a position to decide, for all possible users of their crate, who actually needs to update. If the maintainer thinks there might be an issue, but many users aren't actually affected, then those users have to swim upstream, fighting against the defaults in cargo build etc.

Yanking things just because they are old and you don't want to hear about them anymore, whether or not they actually have a vulnerability, creates fatigue among people who are actually trying to follow all the security advisories and yanks and who need to make sure they spend their time fixing the things that matter.

There is not a lot of work to update the compiler. That's the whole point of Rust's release model. Compiler upgrades are meant to be quick and painless. There are processes (like crater) to ensure that is the case. There are some rare situations when a little work is needed, but these are minor things like adding type annotations or running cargo fix.

It doesn't matter how much work it actually is in terms of code edits. The point is, suppose you work on self-driving car software, and whenever a release is stabilized, 100 people are going to drive the release candidate every day for a few weeks. Now suppose a developer comes along and says they want to update the compiler in the release branch, which may come with associated updates to LLVM etc.

The fear is that, after the compiler update, a new bug occurs which causes some piece of code to be miscompiled. Maybe, there was already undefined behavior in some low-level crate somewhere, but LLVM happened to compile it in a benign way, and the next version of LLVM optimizes more aggressively and something bad happens. In the worst case, maybe there's memory corruption, and an unsafe situation could be created, or the software panics while it's driving or something.

You can't repeat all that testing in the stable release branch, it's too expensive and it takes too long. If you only need to patch a bug in a small, self-contained component, there's no reason to change how all the code in the project is compiling -- it's a needless risk. You're going to make the smallest change that you possibly can that will address the issue. It's safer to stay on older crate versions that we know are working, because we spent a lot of time and energy testing them. Even if the maintainer really thinks it's important, and really thinks that it's best, that we get off of v1 and go to v2. Downstream is going to make up their own mind about how important that actually is for them.

This is getting pretty far from the original problem of a random person on the internet yanking their crate. If you need to support some hardware for a decade or more, you'd better cargo vendor all your dependencies and be prepared to maintain them all yourself. Because not only can you not guarantee that the crates won't be yanked during the lifetime of a car, you can't even guarantee that crates.io will still exist (hopefully it will).

Also testing for miscompilations in self-driving software manually, on the road, is extremely irresponsible! For all the UB-risking parts you should have automated tests and fuzzing coverage. That's why Rust has unsafe, #[test] and ability to split projects into crates that are easier to test.

Timeframes cut both ways, though.

If I've come into a project because the company has realised that it really matters, but it's been unmaintained for a while, I may be able to cut a release that updates from 1.0.0 to 1.0.2 (and updates other dependencies, too) in a day or so. It could take me a week or more to move to 2.0.0, just fixing up the API breakage that the major version change indicates.

Am I really better off running 1.0.0 for another week, or moving to 1.0.2 (with fewer issues than 1.0.0) today and getting onto 2.0.0 in the same timeframe I would have done if I'd moved to 2.0.0 directly?

Yes, I've got to get onto 2.0.0 as fast as I can in both cases - but assuming semver rules apply, moving from 1.0.0 to 1.0.2 is basically a recompile + test cycle (which should be mostly automated, and which is in any case using the time of people who don't normally do development, but do testing instead), while moving from 1.0.0 or 1.0.2 to 2.0.0 involves developer time.
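
The "recompile + test cycle" argument rests on Cargo's semver convention, under which an upgrade is compatible when the leftmost non-zero version component is unchanged (so 1.0.0 to 1.0.2 is compatible, while 1.0.2 to 2.0.0 and 0.3.x to 0.4.0 are not). A minimal sketch of that classification follows; the `classify` and `Upgrade` names are invented, and pre-release tags are ignored.

```rust
// Sketch of Cargo's semver convention for deciding whether an upgrade is a
// drop-in replacement. Pre-release and build metadata are ignored.

#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

const fn v(major: u64, minor: u64, patch: u64) -> Version {
    Version { major, minor, patch }
}

#[derive(Debug, PartialEq, Eq)]
enum Upgrade {
    Compatible, // same series: recompile and re-test
    Breaking,   // API changes expected: needs developer time
    Downgrade,
}

/// Compatible when the leftmost non-zero component is unchanged.
fn classify(from: Version, to: Version) -> Upgrade {
    if to < from {
        return Upgrade::Downgrade;
    }
    let same_series = if from.major > 0 {
        to.major == from.major
    } else if from.minor > 0 {
        to.major == 0 && to.minor == from.minor
    } else {
        to == from
    };
    if same_series {
        Upgrade::Compatible
    } else {
        Upgrade::Breaking
    }
}

fn main() {
    println!("{:?}", classify(v(1, 0, 0), v(1, 0, 2))); // Compatible
    println!("{:?}", classify(v(1, 0, 2), v(2, 0, 0))); // Breaking
}
```

Of course, "compatible" here only means the maintainer promised no breaking API changes; the test cycle is still what confirms it.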

Yanking takes that option away from me - I'm effectively stuck on 1.0.0 while my development team moves to 2.0.0, even though 1.0.2 had some fixes that would make me less at risk while my development team moved the codebase to 2.0.0.

If the maintainer released a fix for a security problem, they wouldn't yank the fixed crate within a short timeframe, right? They might yank v1.0.0 and v1.0.1, but if there is no v1.0.3 I don't expect a maintainer to yank it for a while. Even if they yank it a couple of months down the road, not updating before then is a security risk, and if you are willing to accept running with a vulnerable version for months, why wouldn't you accept running with a vulnerable version a month longer? And if you really need to, you can always manually edit Cargo.lock to circumvent the yank.

A good maintainer wouldn't yank the fixed crate within a short timeframe, but there's nothing that guarantees that. And yes, you'd want to move off the crate completely if you discover that the maintainer is bad, but you may not be able to do that quickly.

Similarly, if I've been running with a vulnerable version, I'd prefer to get off it as fast as possible; moving from 1.0.0 to 2.0.0 may well be significant work (rewriting my code to remove uses of the APIs that were removed going from 1.0.2 to 2.0.0), while moving from 1.0.0 to 1.0.2 should be simple.

Further, in a corporate situation, I can distribute the work; a testing team can test the update from 1.0.0 to 1.0.2 while a development team is doing the migration from 1.0.x to 2.0.0. Why would I want to stay vulnerable for an extra three months while the development team gets up to speed and fixes the formerly unmaintained code, when I can move to 1.0.2 in a couple of days of the test team's time?

This is getting pretty far from the original problem of a random person on the internet yanking their crate.

Right. So let's circle back to that.

Subjectively, when a maintainer yanks their crate, maybe half the time, there's actually some security problem or defect that might be of interest. But the other half of the time, there's not actually any known defect or problem with the crate. It's just the crate maintainer "trying to indicate to users that they are no longer interested in developing this version of the code" or something like this. Generously, that's an example of something that's "important but not urgent" to downstream -- if the code is working in production, then there may be little reason to do anything even though the maintainer yanked it.

Less generously, this is essentially noise. And it's noise that's being reported over the same channel as actual security problems that are actually important and urgent. (Part of the reason that it's noise is that there simply isn't consensus among the rust community around how yanking is supposed to be used.)

By itself, that would be fine. There are lots of noisy channels in life, and this is just one more of them. But what makes yanking really pernicious is that this noise is being piped directly into my build system, so that when cargo build happens, all of this yanking information is going into the build system in ways that I don't have meaningful visibility or control over.

To me, this just seems bizarre. If I'm working on high-assurance stuff, I want to have visibility and control over everything that I'm incorporating. I want to be in the loop if there's a security vuln against X that's detected when I build against master, because it might also affect an earlier release branch. I don't really want things to be upreved without me opting in -- as responsible downstream, I want to be in the driver seat.

The idea that like, hundreds of random crate maintainers are able to yank things at any time, which might somehow affect my build in a way that I can't easily squelch, and that a build that worked one way yesterday could work a different way tomorrow, just feels profoundly uncivilized and not how I would want this to work. And yes, I get that I can edit lock files manually, and use cargo update --precise, and that it's possible in principle to work around all this, but like, that is all very tedious and feels like it just shouldn't be necessary. The contract I need to have here is that downstream decides when things get updated, and the maintainer's opinion is just one data point.

I don't work at a self-driving car company anymore. All their code was C++, and indeed, they vendored everything into a giant internal repository, because they couldn't tolerate the possibility that upstream would delete the release they were using or something. And they were a large company, so they could deal with all the manual labor that that entails.

Someone approached me once about joining a self-driving plane startup. Suppose I joined a company like that tomorrow. I would like to advocate for Rust, because it catches a lot of bugs that end up in large C++ projects. But I would not want to start on day one by doing cargo vendor to like, mirror a whole bunch of crates.io stuff, because I would want to avoid all the manual labor associated to such flows. We might get to vendoring eventually, but the way I would want to start is something like, put ignore_yank = true in .cargo/config, and set up CI so that cargo audit and cargo deny are run, and see how far we get that way until the company is bigger.

I still think it would be better for the ecosystem if we steer people towards filing RUSTSEC advisories rather than just yanking things with no explanation, and discourage people from yanking unless there is a severe security defect. Alternatively, if there's a snafu before cargo publish and the crate doesn't build, maybe yanking could be allowed, but only within a time limit after which you can't yank; that would reduce a lot of noise. But giving projects a way to ignore yanking information during cargo build, for the users that want that, is also a reasonable way to help downstream simplify and have more control over the build. I would even consider forking cargo to add an ignore_yank = true option if I thought it would save me time overall; it might be a pretty easy patch to develop and carry forward as cargo is updated.

Also testing for miscompilations in self-driving software manually, on the road, is extremely irresponsible! For all the UB-risking parts you should have automated tests and fuzzing coverage.

Automated tests and fuzzing are great, but they are not a substitute for end-to-end testing on real hardware. There are a lot of reasons for this. Exercising the component with random data may not actually match the real data it sees in production, so you aren't guaranteed to find the bug. If you just fuzz the code on your x86 workstation, you may not actually discover the bug, especially if it's a code-gen issue. But it may be prohibitively difficult or complicated to instrument the production hardware in a way that lets you run a fuzzing test on the actual hardware.

If I recall correctly, MISRA standards actually require you to perform a certain number of hours of real-world use, depending on the impact of the change. So you may be required to have people drive the car for a certain number of hours if you are seeking MISRA compliance.

I think you have an opportunity to focus your argument here better.

If the versions aren't locked, then the build can work differently tomorrow without anyone yanking something. If the versions are locked, then yanking doesn't matter -- this is one of the things that crates.io intentionally did differently from NPM to fix exactly this complaint.

So things aren't "profoundly uncivilized" this way.

If you need guaranteed long-term reproducible builds, you should run your own local mirror to vendor all the versions of everything you use locally. Because there's no guarantee that crates.io won't, say, be subject to some new law that forces them to take down some package you depend on. And if you're doing that you're free to ignore yanking if you really want.

I would encourage you to instead elaborate on "that I don't have meaningful visibility or control over". What's happening that you don't control? Why are the existing cargo flags insufficient for the control you want?

Many people say that libraries shouldn't be committing their Cargo.lock files. There is certainly an argument for that: committed Cargo.lock means that you likely have no tests for the latest versions of dependencies, and you get fewer versions naturally tested on the devs' machines. But if we follow that supposed best practice, we hit the cargo yank issue.

If we choose to commit Cargo.lock, then we're left with manually updating dependencies' versions, rarely. So the practice of yanking, which was supposed to increase security and encourage users to migrate off supposedly broken versions, has led us to the exact opposite result. In that case, what was the purpose of yanking in the first place?

It's also a pain to update Cargo.lock when some of the dependencies must be pinned. There is no good way to automate it when you have lots of dependencies. Also, some crates for whatever reason choose to restrict the versions of their dependencies (sometimes pinning to specific versions), which means that you are much more likely to get unresolvable conflicts if you try to keep yanked dependencies.

There is also no way to upgrade between yanked versions (e.g. if the maintainer has yanked all releases, or at least the entire semver branch). I don't consider "manually edit Cargo.lock" to be an acceptable answer, we're used to better tools in Rust. The most infuriating thing is that the problem is entirely self-imposed. Cargo can resolve yanked versions. It just won't.

That's not the threat model for most users. If you go that way, you can't rely on crates.io being available at all. It could get taken down or blocked in its entirety (e.g. the Russian government doesn't give a shit about collateral damage when it wants to block a resource with undesirable information). Hell, in that model we can't assume any part of the Rust project is accessible, or even that GitHub is accessible (e.g. there was a lot of talk about entirely blocking GitHub access in Russia when the war started). There is, realistically, nothing one can do against threats of that magnitude. Certainly nothing if you're not a megacorp.

Yanked crates, on the other hand, are a very real and common problem, with multiple high-profile cases in the recent years.

And the worst part is that yanking serves no purpose. It doesn't cause people to upgrade; in fact it strongly encourages the opposite. It doesn't provide any information about the supposed reasons for yanking, and doesn't tell you whether it's a critical vulnerability or a hissy fit by the maintainer. It's relatively trivial to circumvent, but just hard enough to cause big problems for automated workflows (like git bisect) where it has no reason to apply in the first place.

It just gives the maintainers that warm fuzzy feeling "I can remove it all if I change my mind".

What about the following proposition?

  • Maintainer can yank any release as they please, but only for a limited amount of time
  • Past that timeframe, a release can be yanked if, and only if, a RustSec advisory is filed

Would this help, and am I missing a valid reason to yank a crate past the initial time window?
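
One way to read the proposal is as a simple decision rule on the registry side. The sketch below is purely hypothetical: crates.io has no such rule, and the 72-hour window and the `may_yank` and `YankDecision` names are assumptions chosen for illustration.

```rust
use std::time::Duration;

// Hypothetical policy from the proposal above: free yanking inside a grace
// window after publishing, and afterwards only with a RustSec advisory.
// The window length is an assumption; the proposal does not specify one.

#[derive(Debug, PartialEq, Eq)]
enum YankDecision {
    Allowed,             // still inside the grace window
    AllowedWithAdvisory, // past the window, but an advisory was filed
    Refused,             // past the window and no advisory
}

fn may_yank(age: Duration, window: Duration, has_advisory: bool) -> YankDecision {
    if age <= window {
        YankDecision::Allowed
    } else if has_advisory {
        YankDecision::AllowedWithAdvisory
    } else {
        YankDecision::Refused
    }
}

const HOUR: Duration = Duration::from_secs(60 * 60);

fn main() {
    let window = HOUR * 72; // assumed: three days
    // A broken publish noticed an hour later can be yanked freely.
    println!("{:?}", may_yank(HOUR, window, false));
    // A year-old release needs an advisory to justify the yank.
    println!("{:?}", may_yank(HOUR * 24 * 365, window, false));
    println!("{:?}", may_yank(HOUR * 24 * 365, window, true));
}
```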

What I gather is that yanking is not the problem - the problem is that the Rust tooling is unsuitable for a subset of Rust users.

Cargo should have a flag or config option to completely ignore yanking for a given crate, something like this in Cargo.toml:

```toml
[dependencies]
foo = { version = "1.2.3", ignore_yank = true }
```

If Cargo devs don't feel like this is a use case worth considering, it's probably better to write an external tool that builds the right Cargo.lock to work around yanking, something like cargo-ignore-yank.
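
The effect of the proposed `ignore_yank` key can be modeled in a few lines. This is a hypothetical sketch of the proposal, not an existing Cargo feature: `select` and `Release` are invented names, and compatibility is simplified to "same major version, not older than the requirement", which only holds for versions at or above 1.0.0.

```rust
use std::collections::HashSet;

// Hypothetical: models the proposed per-dependency `ignore_yank = true`
// setting. Cargo has no such option today; this only illustrates the idea.
// Versions are (major, minor, patch); only versions >= 1.0.0 are handled.

type Version = (u64, u64, u64);

struct Release {
    version: Version,
    yanked: bool,
}

/// Newest release with the same major version that is not older than `req`.
/// Yanked releases are skipped unless `krate` is listed in `ignore_yank`.
fn select(krate: &str, req: Version, releases: &[Release], ignore_yank: &HashSet<&str>) -> Option<Version> {
    let honor_yank = !ignore_yank.contains(krate);
    releases
        .iter()
        .filter(|r| r.version.0 == req.0 && r.version >= req)
        .filter(|r| !(honor_yank && r.yanked))
        .map(|r| r.version)
        .max()
}

fn main() {
    let releases = [
        Release { version: (1, 0, 0), yanked: true },
        Release { version: (1, 0, 2), yanked: true },
        Release { version: (2, 0, 0), yanked: false },
    ];
    let mut ignore_yank = HashSet::new();
    // Default behavior: every 1.x candidate is yanked, so nothing matches.
    println!("{:?}", select("foo", (1, 0, 2), &releases, &ignore_yank));
    // With `foo` opted out of yank handling, 1.0.2 is selected.
    ignore_yank.insert("foo");
    println!("{:?}", select("foo", (1, 0, 2), &releases, &ignore_yank));
}
```

An external tool like the one suggested would do roughly the second step and then write the chosen versions into Cargo.lock.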

The question then becomes "why is the maintainer yanking this release at all?"

A RustSec advisory causes cargo deny check advisories and cargo audit to complain about the use of a release.

Yanking a release stops Cargo choosing it if it's not in Cargo.lock, but it will continue to happily use it if it's in Cargo.lock.

Personally, I see only two good reasons for yanking, ever:

  1. It is not possible to use this release and comply with applicable laws. E.g. copyright infringement where the verified rightsholder is not willing to offer a licence under any terms - in this case, you want to do everything in your power to prevent people being caught up in lawsuit mess.
  2. It is not possible to use this release soundly at all. E.g. a crate where all the public API depends on something that's buggy under all circumstances. This should be rare, since a maintainer would normally catch this in their own testing - but if UB is involved, I could see there being a situation where incremental builds do what the maintainer intended, but full builds do not.

Beyond those two cases, I see a RustSec advisory as more helpful for two reasons:

  1. It tells existing users to move onto a new release or a different crate ASAP.
  2. It gives details about why the maintainer wants you to move on - is there a known bug in an API you don't use (no need to hurry) or is there a known problem with the release that affects you? Or is the maintainer simply saying that they're pretty confident they made a mistake with that series of releases, and would like you to move to the version in which they've fixed the mistakes?

Now, there's a limitation today, in that you need to use an external tool to look at RustSec advisories, and it'd be nice to have that integrated into Cargo so that (e.g.) Cargo ignores versions with RustSec advisories by default (the same way it ignores yanked crates), unless I've told it that I'm OK with certain advisories. This is tooling, though, and if it were fixed, it would be possible to use RustSec advisories instead of yanking for all but the "cannot legally use this code" case.
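
A rough sketch of what "ignore versions with RustSec advisories unless I've told it that I'm OK with certain advisories" could look like inside a resolver. Everything here is hypothetical: Cargo does not read RustSec data during resolution, `EXAMPLE-0001` is a placeholder rather than a real advisory ID, the `acceptable` and `select` names are invented, and version compatibility is simplified to versions at or above 1.0.0.

```rust
use std::collections::HashSet;

// Hypothetical sketch: skip releases that carry a security advisory unless
// the project has explicitly accepted that advisory. Advisory IDs here are
// placeholders, not real RustSec entries. Only versions >= 1.0.0 handled.

type Version = (u64, u64, u64);

struct Release {
    version: Version,
    yanked: bool,
    advisories: Vec<&'static str>,
}

fn acceptable(r: &Release, accepted: &HashSet<&str>) -> bool {
    // Yanked releases are still skipped; yanking would be kept for the
    // "cannot legally use this code" case.
    !r.yanked && r.advisories.iter().all(|id| accepted.contains(*id))
}

fn select(req: Version, releases: &[Release], accepted: &HashSet<&str>) -> Option<Version> {
    releases
        .iter()
        .filter(|r| r.version.0 == req.0 && r.version >= req)
        .filter(|r| acceptable(r, accepted))
        .map(|r| r.version)
        .max()
}

fn main() {
    let releases = [
        Release { version: (1, 0, 1), yanked: false, advisories: vec![] },
        Release { version: (1, 0, 2), yanked: false, advisories: vec!["EXAMPLE-0001"] },
    ];
    let mut accepted = HashSet::new();
    // By default 1.0.2 is skipped because of its advisory, so 1.0.1 wins.
    println!("{:?}", select((1, 0, 0), &releases, &accepted));
    // After reviewing EXAMPLE-0001 and deciding it does not apply, opt in.
    accepted.insert("EXAMPLE-0001");
    println!("{:?}", select((1, 0, 0), &releases, &accepted));
}
```

Unlike a bare yank, the accepted set records which specific advisory was reviewed, which is the visibility several posters above are asking for.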
