Smart crate deletion

Has anyone considered letting a crate's authors explicitly delete yanked versions that haven't been downloaded in, say, a year?

I've published many versions of a crate, and I wonder whether that is a burden on crates.io storage...

It would break reproducible builds that rely on Cargo.lock files in the wild explicitly pinning the yanked version.

There might be some good reasons for perma-deleting yanked crates, but a crates.io storage concern is probably quite low on the priority list.

I would guess that many versions of the same crate are similar and would therefore compress well together. I doubt crates.io does this, but it could, e.g., store all versions of a crate (or of all crates) in a Git repository to take advantage of Git's compression, which is, of course, designed to compress many versions of code over time.
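Git stores objects zlib-compressed and uses delta compression in its packfiles. As a simpler illustration of why storing many similar versions together is cheap, here is a minimal sketch of content-addressed chunk deduplication, which is a different technique from Git's deltas. It assumes fixed 4-byte chunks for readability and uses std's `DefaultHasher` as a stand-in for a real content hash such as SHA-256 (it ignores hash collisions); `ChunkStore` is a hypothetical name, not anything crates.io actually runs.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical chunk size; real systems often use content-defined chunking instead.
const CHUNK_SIZE: usize = 4;

/// A toy content-addressed store: identical chunks across versions are stored once.
#[derive(Default)]
struct ChunkStore {
    chunks: HashMap<u64, Vec<u8>>,
}

impl ChunkStore {
    /// Stores `data` and returns the list of chunk keys (a "recipe") needed to rebuild it.
    fn put(&mut self, data: &[u8]) -> Vec<u64> {
        data.chunks(CHUNK_SIZE)
            .map(|chunk| {
                let mut h = DefaultHasher::new();
                chunk.hash(&mut h);
                let key = h.finish();
                // Only the first copy of a chunk is actually kept.
                self.chunks.entry(key).or_insert_with(|| chunk.to_vec());
                key
            })
            .collect()
    }

    /// Reassembles the original bytes from a recipe.
    fn get(&self, recipe: &[u64]) -> Vec<u8> {
        recipe.iter().flat_map(|k| self.chunks[k].iter().copied()).collect()
    }

    /// Total bytes actually held in the store.
    fn stored_bytes(&self) -> usize {
        self.chunks.values().map(Vec::len).sum()
    }
}

fn main() {
    let mut store = ChunkStore::default();
    // Two "versions" that share a long common prefix.
    let v1 = b"fn a() {} fn b() {}".to_vec();
    let v2 = b"fn a() {} fn b() {} fn c() {}".to_vec();
    let r1 = store.put(&v1);
    let r2 = store.put(&v2);
    assert_eq!(store.get(&r1), v1);
    assert_eq!(store.get(&r2), v2);
    println!("raw: {} bytes, stored: {} bytes", v1.len() + v2.len(), store.stored_bytes());
}
```

Each version is still recoverable byte for byte, while the shared prefix is stored only once.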

Given that the whole point of yanking is to say "don't use this unsound garbage", breaking builds seems like a good thing?

That's the intent, but unfortunately developers sometimes yank for bad reasons.

The way Cargo and crates.io handle this is a practical compromise: when going from Cargo.toml to Cargo.lock via dependency resolution (including via cargo update), yanked versions are not considered. However, a version that is already named in Cargo.lock is still used, even if it has since been yanked.

This avoids a "left-pad problem" when a version is yanked for a bad reason. It also means that if I commit my Cargo.lock and rust-toolchain.toml files to version control and then find my software misbehaving, I can bisect[1] across versions that depend on "unsound garbage" to find the source of the bug.

  1. The point of bisection is to narrow down where the bug was introduced: if I see that an update from rustls 0.22.3 to rustls 0.22.4 introduces the bug, I have a focal point for my debugging session.
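The compromise described above can be modeled in a few lines. This is a simplified sketch, not Cargo's resolver: `Published`, `select`, and the `satisfies` predicate are hypothetical, versions are plain tuples instead of semver, and only a single dependency is resolved. A locked version is kept even if it was yanked, as long as it still exists and still satisfies the requirement; otherwise the newest non-yanked match wins.

```rust
/// A published version of some crate, as a registry index might describe it.
#[derive(Clone, Debug, PartialEq)]
struct Published {
    version: (u32, u32, u32),
    yanked: bool,
}

/// Picks a version for one dependency.
/// - If the lockfile already pins a matching version, keep it even if it was yanked later.
/// - Otherwise (fresh resolution or `cargo update`), choose the newest non-yanked
///   version that satisfies the requirement.
fn select(
    available: &[Published],
    locked: Option<(u32, u32, u32)>,
    satisfies: impl Fn((u32, u32, u32)) -> bool,
) -> Option<(u32, u32, u32)> {
    if let Some(v) = locked {
        if satisfies(v) && available.iter().any(|p| p.version == v) {
            return Some(v);
        }
    }
    available
        .iter()
        .filter(|p| !p.yanked && satisfies(p.version))
        .map(|p| p.version)
        .max()
}

fn main() {
    let versions = vec![
        Published { version: (0, 22, 3), yanked: false },
        Published { version: (0, 22, 4), yanked: true },
    ];
    // Hypothetical "compatible with 0.22" requirement.
    let compatible = |v: (u32, u32, u32)| v.0 == 0 && v.1 == 22;
    // Fresh resolution skips the yanked 0.22.4.
    println!("{:?}", select(&versions, None, compatible));
    // A lockfile that pins 0.22.4 keeps building with it.
    println!("{:?}", select(&versions, Some((0, 22, 4)), compatible));
}
```

If 0.22.4 were deleted rather than yanked, the pin could no longer be honored; this sketch would quietly fall back to 0.22.3, so the build recorded in the lockfile would no longer be reproduced exactly.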

Yanked crates use very little storage, so it's completely okay; don't worry about it :wink:

Additionally, a crate containing unsoundness does not mean that a dependent actually uses the unsound code paths. The potential segfault in the time crate is a concrete example: it doesn't affect Windows or wasm targets.
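To illustrate how target-specific unsoundness can be unreachable, here is a hypothetical crate function (not the time crate's actual code) whose only risky branch is compiled solely for Unix targets via `#[cfg(unix)]`. On Windows, the other branch is the only one that exists in the binary, so dependents there cannot hit the problem.

```rust
/// Hypothetical API whose only risky implementation is Unix-specific.
#[cfg(unix)]
fn local_offset_source() -> &'static str {
    // Imagine this branch calls into libc in a way that is unsound under
    // some conditions; it is only compiled for Unix targets.
    "unix libc path (potentially unsound)"
}

/// On every other target, this is the only implementation compiled in.
#[cfg(not(unix))]
fn local_offset_source() -> &'static str {
    "non-Unix path (unaffected)"
}

fn main() {
    println!("{}", local_offset_source());
}
```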

Cold[1] storage can be surprisingly cheap. If a yanked (or even just superseded) version is requested infrequently enough, the registry can skip the more expensive things like CDN caching for it and let those rare requests go through cold(er) storage paths.

A naively built registry might prefer not to serve cold crate versions at all, but a cleverer registry implementation can keep them without paying much for the service.

  1. "Cold" as opposed to "hot": the storage is accessed infrequently. Cold crate versions probably shouldn't be stored in backup-tier storage, since access to that tier is often priced high to offset its cheap capacity, but they can at least live somewhere colder than hot crates' storage.
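A tiering policy like the one described can be tiny. This sketch assumes a hypothetical `choose_tier` function, a 30-day request count, and a `threshold` knob; none of these are real crates.io settings. Nothing is ever deleted: rarely requested yanked or superseded versions simply stop being cached at the edge.

```rust
/// Storage tiers a hypothetical registry might use.
#[derive(Debug, PartialEq)]
enum Tier {
    /// Worth caching on the CDN.
    Hot,
    /// Served from cheaper, slower storage without CDN caching.
    Cold,
}

/// Chooses a tier from a version's status and how often it was requested recently.
/// `threshold` is a hypothetical knob, not a real crates.io setting.
fn choose_tier(yanked: bool, superseded: bool, requests_last_30_days: u64, threshold: u64) -> Tier {
    let rarely_used = requests_last_30_days < threshold;
    if (yanked || superseded) && rarely_used {
        Tier::Cold
    } else {
        Tier::Hot
    }
}

fn main() {
    // A yanked version fetched twice last month goes cold.
    println!("{:?}", choose_tier(true, false, 2, 100));
    // The current release stays hot regardless of traffic.
    println!("{:?}", choose_tier(false, false, 0, 100));
}
```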

I don't think it should be possible to delete yanked crates except in exceptional circumstances: legal requirements, actively malicious code, simple name squatters, and probably some other cases I haven't thought of. All of these can be handled case by case by contacting crates.io staff; they should be rare enough.
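Such a policy boils down to one check. A minimal sketch, assuming hypothetical `Actor`, `DeletionReason`, and `may_delete` names (the thread does not describe crates.io's real permission model): owners can only yank, and outright deletion requires staff plus an explicit exceptional reason.

```rust
/// Exceptional reasons for removing a version outright.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
enum DeletionReason {
    LegalRequirement,
    Malicious,
    NameSquatting,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Actor {
    Owner,
    Staff,
}

/// Hypothetical policy: deletion needs staff *and* a stated reason.
/// Owners are never allowed to delete; they can only yank.
fn may_delete(actor: Actor, reason: Option<DeletionReason>) -> bool {
    matches!((actor, reason), (Actor::Staff, Some(_)))
}

fn main() {
    println!("{}", may_delete(Actor::Owner, Some(DeletionReason::Malicious)));
    println!("{}", may_delete(Actor::Staff, Some(DeletionReason::Malicious)));
}
```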

For me, the biggest reason not to allow deleting yanked versions of legitimate crates is git bisection. I need to be able to go back to an old version (via the checked-in lock file) to test where a bug came from. For that, every revision of my repo must stay buildable for all eternity (or at least for the lifetime of the project; I doubt anyone will care hundreds of years from now). That means all previous versions of dependencies must also remain available and buildable.
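The bisection workflow is a binary search over commits, which is what `git bisect` automates. A minimal sketch, assuming a hypothetical `is_bad` callback that stands in for checking out a commit, building it with its committed Cargo.lock, and running the failing test. The search only works if every probed commit still builds, which is exactly what deleted dependency versions would break.

```rust
/// Returns the index of the first "bad" commit, assuming commit 0 is good,
/// commit n - 1 is bad, and the property flips exactly once (good..., bad...).
/// `is_bad` is a hypothetical stand-in for "check out, build with the committed
/// Cargo.lock, run the test".
fn first_bad(n: usize, mut is_bad: impl FnMut(usize) -> bool) -> usize {
    assert!(n >= 2, "need at least one good and one bad commit");
    // Invariant: `good` is known good, `bad` is known bad.
    let (mut good, mut bad) = (0, n - 1);
    while bad - good > 1 {
        let mid = good + (bad - good) / 2;
        if is_bad(mid) {
            bad = mid;
        } else {
            good = mid;
        }
    }
    bad
}

fn main() {
    let mut builds = 0;
    // Pretend the bug was introduced at commit 37 out of 100.
    let culprit = first_bad(100, |i| {
        builds += 1;
        i >= 37
    });
    println!("first bad commit: {culprit}, builds needed: {builds}");
}
```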

Storage costs for crates.io in general are negligible. I wouldn't worry about that.

This is made difficult by the fact that the crates.io index publishes a checksum of each compressed tarball. Even if crates.io improves on-disk storage in some clever way, it still has to produce a bit-for-bit identical, individually compressed tarball when one is requested.

These checksums are supposed to be permanent, and would cause an alarm if they changed.
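The index's `cksum` field is the SHA-256 of the published .crate file, so any clever storage scheme must regenerate exactly those bytes. The toy sketch below shows why the compressor itself has to be frozen. It assumes a hypothetical run-length encoder in place of gzip and uses std's non-cryptographic `DefaultHasher` in place of SHA-256; `serve` is likewise a hypothetical name. Two encoder settings decode to the same data but produce different bytes, so only the original setting passes the checksum.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for the SHA-256 `cksum` recorded in the index (NOT cryptographic).
fn checksum(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

/// Toy run-length "compressor": emits (count, byte) pairs, capping runs at `max_run`.
/// Different `max_run` values decode to the same data but yield different bytes,
/// much as different gzip implementations or settings can.
fn rle_encode(data: &[u8], max_run: u8) -> Vec<u8> {
    let max_run = max_run.max(1) as usize;
    let mut out = Vec::new();
    let mut i = 0;
    while i < data.len() {
        let b = data[i];
        let mut run = 1;
        while i + run < data.len() && data[i + run] == b && run < max_run {
            run += 1;
        }
        out.push(run as u8);
        out.push(b);
        i += run;
    }
    out
}

fn rle_decode(encoded: &[u8]) -> Vec<u8> {
    encoded
        .chunks(2)
        .flat_map(|pair| std::iter::repeat(pair[1]).take(pair[0] as usize))
        .collect()
}

/// Regenerates the artifact from raw contents and refuses to serve it unless it
/// is bit-for-bit identical to what was originally checksummed.
fn serve(raw: &[u8], max_run: u8, recorded: u64) -> Result<Vec<u8>, String> {
    let artifact = rle_encode(raw, max_run);
    if checksum(&artifact) == recorded {
        Ok(artifact)
    } else {
        Err("regenerated artifact does not match the recorded checksum".to_string())
    }
}

fn main() {
    let raw = b"aaaaaaaabbbc";
    let recorded = checksum(&rle_encode(raw, 255));
    // Same encoder settings: identical bytes, checksum matches.
    assert!(serve(raw, 255, recorded).is_ok());
    // Different settings: same data after decoding, but different bytes.
    assert_eq!(rle_decode(&rle_encode(raw, 4)), raw.to_vec());
    assert!(serve(raw, 4, recorded).is_err());
    println!("only the original encoder settings reproduce the checksum");
}
```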

This is a solvable problem, assuming cargo always uses the same gzip implementation, which crates.io would then have to immortalize. Android did that to rebuild zip files from patches while preserving the signatures over whole zip files:

Dropbox used to do something similar for JPEGs uploaded to their service.

A thought I had was that crates uploaded to crates.io could be labelled "experimental" or "provisional", meaning that they are ephemeral and may later be deleted. This would allow developing a crate (and using it from other crates) until it reaches a final form, at which point it can be labelled "permanent". Perhaps this could apply to individual versions as well. Of course, this would mean that depending on a "provisional" crate (or version?) is not reliable in general, so there would be warnings or similar, but when it is your own crate and you are developing it, this is perfectly fine. It would mean that you could publish a "beta" version of a crate and later withdraw it once the final version is available.
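A sketch of how the proposed label might surface to users, assuming a hypothetical per-version `Stability` field that crates.io does not have today, plus hypothetical `Dep` and `provisional_warnings` names: dependents get a warning for someone else's provisional crate, while depending on your own provisional crate stays quiet.

```rust
/// Hypothetical per-version stability marker; crates.io has no such field today.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Stability {
    Provisional,
    Permanent,
}

/// A resolved dependency with its (hypothetical) owner and stability.
struct Dep<'a> {
    name: &'a str,
    owner: &'a str,
    stability: Stability,
}

/// Warns about provisional dependencies not owned by `me`;
/// depending on your own provisional crate is allowed silently.
fn provisional_warnings(me: &str, deps: &[Dep<'_>]) -> Vec<String> {
    deps.iter()
        .filter(|d| d.stability == Stability::Provisional && d.owner != me)
        .map(|d| format!("warning: `{}` is provisional and may be deleted", d.name))
        .collect()
}

fn main() {
    let deps = [
        Dep { name: "x-trial", owner: "alice", stability: Stability::Provisional },
        Dep { name: "z-stable", owner: "carol", stability: Stability::Permanent },
        Dep { name: "y-beta", owner: "bob", stability: Stability::Provisional },
    ];
    for w in provisional_warnings("alice", &deps) {
        println!("{w}");
    }
}
```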

What problem does this solve, though? As in, why do you need to delete those old crate versions at all?

Well, it might prevent crates.io from becoming cluttered with test/development crates (or versions of crates) that were never intended to be permanent in the first place.

Maybe what is needed is a self-hosted registry for these kinds of things? Allowing deletion of one type of crate on crates.io sounds like a slippery slope to me, as the code paths for it (for non-admins) don't exist at all right now. One hole in a conditional and we're one typo away from leftpad.rs.

PyPI (for Python) has a test instance available if you need to test publishing. I believe it is overwritten with a copy of the normal instance every so often. Searching for an equivalent for crates.io, it doesn't seem to exist. Perhaps that would be a good solution to this problem.

Other than testing related to actual publishing, I don't believe you should publish on a package registry unless you want it to be permanent.

It isn't really testing publishing, it is the normal process of integration development.

Suppose I want to develop an alternative crate X to some crate Y that I am using in crates A, B and C. I start development, then I want to do some integration and performance testing, so I need to publish X and run tests on crates A, B and C (probably behind a feature gate so X is not enabled by default). After that testing, I expect to make changes to crate X, repeat the testing, and so on, through many cycles. I don't really want to keep editing the Cargo.toml files for crates A, B and C to fetch crate X from some alternate place (in the meantime, this would probably stop me from publishing crates A, B and C without re-editing their Cargo.toml files).

Through this entire process, crate X is still a "trial" crate, that should not be regarded as "permanent" or "stable", and if it turns out eventually that X is a complete failure, then X can be deleted entirely. Alternatively, a permanent, stable version of X can be published if X turns out to be a success.

I have done this locally (editing toml files frequently), but it is tedious, and for an organisation with multiple developers I imagine it would be even more painful.

Is there a reason that git dependencies are not sufficient here? A [patch] section (or the older, deprecated [replace] block) in Cargo.toml should do fine. Yes, you need to edit the Cargo.toml files of A, B, and C, but isn't preventing accidental publishing of a version that uses the under-test crate (the one you're asking to be able to delete) a benefit of making those edits?

That doesn't need to go on crates.io. I'm assuming these are not in the same workspace, since then the issue wouldn't exist. But even when the issue does exist, you can temporarily patch Cargo.toml to specify override paths. See Overriding Dependencies - The Cargo Book for more info.

You don't need to "continually edit" (whatever that means). Just point at a branch.
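For reference, the override being suggested is a [patch] table in the top-level Cargo.toml (see the Cargo Book's "Overriding Dependencies" chapter); the TOML shape is shown in the doc comment below, with a placeholder example.com URL. The Rust code only models its effect using hypothetical types (`Source`, `effective_source`): a registry dependency named in the table is redirected to a git branch or local path, and everything else is untouched. It ignores details such as Cargo's requirement that the patched crate's version still match the dependents' version requirements.

```rust
use std::collections::HashMap;

/// Where a dependency is fetched from (simplified, hypothetical).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Source {
    Registry { version: String },
    Git { url: String, branch: String },
    Path(String),
}

/// Models the effect of a top-level patch table such as:
///
///   [patch.crates-io]
///   x = { git = "https://example.com/x.git", branch = "trial" }
///
/// Every crates.io dependency named in the table is redirected wherever it
/// appears in the graph; everything else resolves as declared.
fn effective_source(name: &str, declared: &Source, patches: &HashMap<String, Source>) -> Source {
    match declared {
        Source::Registry { .. } => patches.get(name).cloned().unwrap_or_else(|| declared.clone()),
        other => other.clone(),
    }
}

fn main() {
    let mut patches = HashMap::new();
    patches.insert(
        "x".to_string(),
        Source::Git { url: "https://example.com/x.git".into(), branch: "trial".into() },
    );
    let declared = Source::Registry { version: "0.1".into() };
    // `x` is redirected to the branch; `y` still comes from the registry.
    println!("{:?}", effective_source("x", &declared, &patches));
    println!("{:?}", effective_source("y", &declared, &patches));
    // Non-registry dependencies are left alone.
    println!("{:?}", effective_source("z", &Source::Path("../z".into()), &patches));
}
```

Because the redirect lives in one table, switching crates A, B and C between the trial branch and the published crate means changing or removing a single entry per workspace rather than each dependency line.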