Pre-RFC: Package Staging

Preface

New to the RFC process. This may have been floated in a few places before, but I wanted to give some shape to it. Partially motivated by workspace publish (cargo publish for multiple packages at once). Still a work in progress, looking for feedback. I don't know enough about cargo / crates-io internals to fully define this, or to determine its feasibility or difficulty.

Summary

Create a package staging area. Packages can be published to staging. Staged packages can be published to the index, replaced, or deleted.

Motivation

Package names are often reserved by uploading an empty crate. In many cases the author abandons the project. A fledgling rustacean might want some additional feedback on their project, before committing to it. Staging allows reserving a package name, and verifying a package, without permanently adding it to the index.

Packaging and publishing multiple crates currently requires packaging and publishing packages in order. While there are some community maintained tools that help automate publishing workspaces, they can't ensure that errors won't occur, and when they do, it leaves the workspace in an indeterminate state, with some packages published and others unpublished. Typically it is useful to tag released commits, but if some packages can't be published without fixes, this is no longer possible. Package staging would allow cargo and crates-io to validate all packages, before adding them to the index.

It may be useful for crates-io to quarantine crates, to prevent them from entering the index. This could be particularly disruptive when publishing many packages that depend on each other. Staging affected packages would allow the author to publish after the quarantined packages are validated.

Organizations may want to require multiple sign offs before publishing packages. Packages could also be staged by automation, particularly helpful with many large uploads. Staging could enable or otherwise improve these options.

Guide-level explanation

Add a package staging area to crates-io. The API will support publishing packages to staging, releasing packages to the index, deleting staged packages, and searching for staged packages.

  • cargo package --stage allows packaging multiple packages that depend on each other.
  • cargo publish --stage uploads packages to staging, without releasing them.
  • cargo publish --release releases staged packages. Packages specified with -p or --package do not have to be local to the workspace.
  • cargo publish --stage --release will release packages on successfully uploading them.
  • cargo unpublish deletes packages from staging.
  • cargo search --staged lists staged packages the user owns.
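
One way to read the flag combinations above is as a small decision table. The following is a minimal sketch, not cargo's implementation: `Step` and `publish_steps` are hypothetical names, and the no-flag case is assumed to keep today's publish behavior.

```rust
/// Hypothetical steps that a `cargo publish` invocation could perform.
#[derive(Debug, PartialEq)]
enum Step {
    /// Today's behavior: upload and add to the index in one go.
    UploadAndPublish,
    /// Proposed `--stage`: upload without adding to the index.
    UploadToStaging,
    /// Proposed `--release`: move an already staged package to the index.
    ReleaseFromStaging,
}

/// Maps the proposed `--stage` / `--release` flags to the steps to run, in order.
fn publish_steps(stage: bool, release: bool) -> Vec<Step> {
    match (stage, release) {
        (false, false) => vec![Step::UploadAndPublish],
        (true, false) => vec![Step::UploadToStaging],
        (false, true) => vec![Step::ReleaseFromStaging],
        (true, true) => vec![Step::UploadToStaging, Step::ReleaseFromStaging],
    }
}

fn main() {
    // `cargo publish --stage --release`: stage everything, then release.
    println!("{:?}", publish_steps(true, true));
}
```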

Reference-level explanation

The additional functionality is added to nightly cargo, gated with -Z unstable-options.

This RFC describes motivating features "packaging and publishing multiple crates at once" and "crate quarantine", but may be minimally implemented without them. Accepting this RFC does not require accepting these features.

Currently, packaging multiple packages that depend on each other will fail. However, multiple packages can be packaged if they do not depend on each other. Packaging of multiple crates may require significant changes, including constructing a DAG of crate dependencies, and determining a new order to package them in, as well as a [patch.crates-io] table. It is therefore easier to make this functionality opt in. Potentially the --stage flag could be removed on stabilization, or no longer required with an edition change.
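
To illustrate the ordering step, here is a rough sketch of deriving a packaging order from workspace dependencies with a topological sort (Kahn's algorithm). The function name `packaging_order` and the input shape are assumptions made for the example; cargo's actual resolver data structures are different, and the [patch.crates-io] handling is not shown.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Returns an order in which every package comes after all of its
/// workspace-local dependencies, or `None` if the dependencies form a cycle.
/// `deps` maps each workspace package to the packages it depends on;
/// dependencies that are not workspace members are ignored.
fn packaging_order(deps: &BTreeMap<&str, Vec<&str>>) -> Option<Vec<String>> {
    // Number of not-yet-packaged workspace dependencies per package.
    let mut unmet: BTreeMap<&str, usize> = BTreeMap::new();
    // Reverse edges: dependency -> packages that depend on it.
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (&pkg, pkg_deps) in deps {
        unmet.entry(pkg).or_insert(0);
        for &dep in pkg_deps {
            if !deps.contains_key(dep) {
                continue; // external dependency, already on the index
            }
            *unmet.entry(pkg).or_insert(0) += 1;
            dependents.entry(dep).or_default().push(pkg);
        }
    }

    // Start with packages that have no workspace dependencies.
    let mut ready: VecDeque<&str> = unmet
        .iter()
        .filter(|(_, n)| **n == 0)
        .map(|(&pkg, _)| pkg)
        .collect();
    let mut order = Vec::new();
    while let Some(pkg) = ready.pop_front() {
        order.push(pkg.to_string());
        if let Some(list) = dependents.get(pkg) {
            for &dependent in list {
                let n = unmet.get_mut(dependent).unwrap();
                *n -= 1;
                if *n == 0 {
                    ready.push_back(dependent);
                }
            }
        }
    }
    // If some packages were never ready, there is a cycle.
    (order.len() == deps.len()).then_some(order)
}

fn main() {
    let deps = BTreeMap::from([("wgpu", vec!["naga"]), ("naga", vec![])]);
    println!("{:?}", packaging_order(&deps));
}
```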

Publishing multiple packages can be done via cargo publish --stage --release. The packages will be packaged together, then uploaded one at a time in order. When all packages are staged successfully, cargo will release all of them.

Staged packages are not added to the index, and cannot be downloaded.

In the case of quarantine, packages that cannot be published will be staged. When / if the quarantine is lifted, the owner could be notified to release the packages.

Only one version of a package can be staged at a time. Staging a package overwrites any previously staged version.

When a new package is staged, the author will gain ownership of the package. If a package is unpublished, and does not have a released version, the author will no longer own the package, and it can be used by another author.
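
The staging and ownership rules above can be modeled in a few lines. This is only an in-memory sketch under stated assumptions: `Registry` and its methods are hypothetical, a single owner per package is assumed for brevity, and nothing here reflects how crates-io stores data.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory model of the proposed rules.
#[derive(Default)]
struct Registry {
    owners: HashMap<String, String>, // package -> owner (single owner for brevity)
    staged: HashMap<String, String>, // package -> the one staged version
    released: HashSet<String>,       // packages with at least one released version
}

impl Registry {
    /// Stages `version`, claiming the name if it is unowned.
    /// Returns the previously staged version if one was overwritten.
    fn stage(&mut self, user: &str, package: &str, version: &str) -> Result<Option<String>, String> {
        let owner = self
            .owners
            .entry(package.to_string())
            .or_insert_with(|| user.to_string());
        if owner.as_str() != user {
            return Err(format!("{package} is owned by {owner}"));
        }
        Ok(self.staged.insert(package.to_string(), version.to_string()))
    }

    /// Releases the staged version to the index.
    fn release(&mut self, user: &str, package: &str) -> Result<String, String> {
        self.require_owner(user, package)?;
        let version = self
            .staged
            .remove(package)
            .ok_or_else(|| format!("{package} has no staged version"))?;
        self.released.insert(package.to_string());
        Ok(version)
    }

    /// Deletes the staged version; frees the name if nothing was ever released.
    fn unpublish(&mut self, user: &str, package: &str) -> Result<(), String> {
        self.require_owner(user, package)?;
        self.staged
            .remove(package)
            .ok_or_else(|| format!("{package} has no staged version"))?;
        if !self.released.contains(package) {
            self.owners.remove(package);
        }
        Ok(())
    }

    fn require_owner(&self, user: &str, package: &str) -> Result<(), String> {
        match self.owners.get(package) {
            Some(owner) if owner.as_str() == user => Ok(()),
            _ => Err(format!("{user} does not own {package}")),
        }
    }
}

fn main() {
    let mut registry = Registry::default();
    println!("{:?}", registry.stage("alice", "foo", "0.1.0"));
    println!("{:?}", registry.stage("alice", "foo", "0.1.1")); // overwrites 0.1.0
    println!("{:?}", registry.unpublish("alice", "foo")); // name is free again
    println!("{:?}", registry.stage("bob", "foo", "0.1.0"));
    println!("{:?}", registry.release("bob", "foo"));
}
```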

crates-io usage policy will apply to staged packages. crates-io may accept, reject, or quarantine a package when it is uploaded.

Add a stage endpoint with authorization, to support cargo publish --stage.

Add a release endpoint with authorization. When a package is released from staging, its dependencies must have a released version. The endpoint ensures that all packages can be released in the provided order.

When cargo releases multiple packages, cargo should ensure that they are staged by the owner. Cargo will determine and verify the order that packages can be released in, and release them, either one at a time, or all together.
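
The release check could look roughly like the sketch below: validate the whole batch first, and only then commit, so a failure leaves nothing released. All names (`Staged`, `ReleaseError`, `release_batch`) are hypothetical, and treating a quarantined package as a release failure is an assumption drawn from the quarantine discussion, not a defined API.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, PartialEq)]
enum ReleaseError {
    NotStaged(String),
    Quarantined(String),
    UnreleasedDependency { package: String, dependency: String },
}

/// Hypothetical record for a staged package.
struct Staged {
    deps: Vec<String>,
    quarantined: bool,
}

/// Releases `order` as a unit: either every package is released, or none is.
fn release_batch(
    staged: &mut BTreeMap<String, Staged>,
    released: &mut BTreeSet<String>,
    order: &[&str],
) -> Result<(), ReleaseError> {
    {
        // Validation pass: nothing is mutated here.
        // `available` holds released packages plus earlier entries in the batch.
        let mut available: BTreeSet<&str> = released.iter().map(String::as_str).collect();
        for &name in order {
            let pkg = staged
                .get(name)
                .ok_or_else(|| ReleaseError::NotStaged(name.to_string()))?;
            if pkg.quarantined {
                return Err(ReleaseError::Quarantined(name.to_string()));
            }
            for dep in &pkg.deps {
                if !available.contains(dep.as_str()) {
                    return Err(ReleaseError::UnreleasedDependency {
                        package: name.to_string(),
                        dependency: dep.clone(),
                    });
                }
            }
            available.insert(name);
        }
    }
    // Commit pass: only reached if every package passed validation.
    for &name in order {
        staged.remove(name);
        released.insert(name.to_string());
    }
    Ok(())
}

fn main() {
    let mut staged = BTreeMap::new();
    staged.insert("naga".to_string(), Staged { deps: vec![], quarantined: false });
    staged.insert(
        "wgpu".to_string(),
        Staged { deps: vec!["naga".to_string()], quarantined: false },
    );
    let mut released = BTreeSet::new();

    // Wrong order: wgpu needs naga first, so nothing is released.
    let wrong = release_batch(&mut staged, &mut released, &["wgpu", "naga"]);
    println!("{wrong:?}, released: {released:?}");

    let right = release_batch(&mut staged, &mut released, &["naga", "wgpu"]);
    println!("{right:?}, released: {released:?}");
}
```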

Add a delete endpoint with authorization. Staged packages can be deleted. If they are quarantined, the package will be retained for investigation.

Add a staged: bool argument to the search endpoint. When set, the search covers staged packages owned by the user, and requires authorization.

When stabilized, crates-io website should support similar functionality. Staged packages are listed, and can be published or deleted. May also show warnings / errors, like missing manifest keys, or when a package is quarantined.

Drawbacks

This RFC proposes several new cargo command line options and web API changes, and potentially requires some coordination between teams.

The new functionality may require significant refactoring to mitigate logic duplication, in the case of staged versus not staged code paths, both in cargo and crates-io.

Package staging makes squatting of package names somewhat easier, and prevents the community from policing such crates, as they wouldn't be able to see what was uploaded, or any information other than who owns the package.

This RFC is motivated by several features, including "packaging / publishing multiple crates at once" and "crate quarantine". It is less useful without at least one of those features.

Rationale and alternatives

Cargo and crates-io could have a more explicit way to publish workspaces and / or multiple packages at once. This might be simpler and provide a better user experience. However, because crate uploads are limited to 10 MB to mitigate Denial-of-Service attacks, it might be difficult to make publishing "atomic". With package staging, packages can be staged and released independently, and it's not ambiguous what should happen on publish failure.

The exact API could be altered, options / endpoints renamed. For example, could adopt npm's publish --access public|restricted. Staging was chosen because it best describes the intent, which is to break publishing into stage and release steps, rather than allow users to use crates-io as a private registry.

This RFC does not support downloading staged packages. This could be useful for validation or for others to review. Potentially users could test installing or using their packages. But this complicates versioning, as package versions are not unique, nor are staged packages permanent. Better to not have cargo deal with lockfiles containing packages from staging.

Publishing packages yanked was also considered. This could reuse existing means to unyank packages, which is similar to publish --release. However, it's less clear that a package that is published yanked can be deleted / altered, unlike a package that was published normally and then yanked. Yanking has a specific purpose, which is different from staging.

Prior art

  • RFC: Add staging workflow for CI and human interoperability #92 One inspiration for this RFC. With npm, the promote command would be equivalent to publish --release. A separate command might have better clarity, but requires duplicating many additional options, and doesn't allow chaining --stage and --release, to facilitate publishing multiple packages. publish is a permanent (now only potentially) action that users are already aware of, so it could be confusing to introduce another one that has similar effect.

  • RFC: Nested Cargo packages #3452 Nested packages are a neat idea, but don't cover many workspace configurations. The RFC assumes that the parent package will have many nested / private dependencies. For example, wgpu is a graphics API that includes naga, a shader translation API, which could be used independently. So nested packages would not cover the broader ecosystem, and are not a true replacement for publishing multiple packages in a workspace. Due to the 10 MB size limit mentioned previously, nested packages could potentially be better implemented via staging.

  • Crate quarantine A motivation for this RFC. See this comment "Under this proposal, quarantining a crate could very seriously complicate publication of a multi-crate workspace (or other kind of "deep stack"). Rust wants us to make smaller crates for separate compilation reasons, but cargo then insists that these crates which can be conceptually part of a single program must be published separately. For release management to be practical, there must be a way to publish a workspace even if some of the involved crates end up quarantined." (snipped). Staging allows packages to be quarantined on upload, and packages can be released together, so that the operation fails if any of them is quarantined.

Unresolved questions

The exact API, names of command line options, endpoints, or other details could be changed. The RFC could be implemented, and the names changed prior to stabilization.

Future possibilities

This is more or less covered in the motivation section.

Considering we don't even have official workspace publishing support yet, I think we should start there. We can then look into batch-upload APIs. In my mind, those are pre-requisites to then exploring and adopting a staging solution. The design and implementation gap would then be smaller.

It would be good to explore Prior Art from other ecosystems. What are other ecosystems doing for staging? If they aren't using it, is there a reason?

Some other thoughts

  • Who all has access to a staged area?
  • This seems like a new way to squat on packages (and seems invisible). Requiring an "unpublish" (which sounds more like a harsher yank) to free the name is not great
  • Silently overwriting in a staged area sounds concerning
  • Organizations would likely need to download from their staging and do operations (which this proposal says can't be done). I wonder if intermediate servers for them to run would be a better approach.
  • I don't see this solving some of the quarantine problems

If we have staging areas, I wonder if stages should be ephemeral. You "allocate" a stage, publish to it (only once allowed per package), and then officially publish it. A stage would expire after a fixed amount of time with the option to request so many renewals.
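
As a rough illustration of the ephemeral idea, a stage could track an expiry time, a renewal budget, and the packages already published to it. Everything here is hypothetical: the `Stage` type, its methods, and the use of plain seconds for time are assumptions made for the sketch.

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq)]
enum StageError {
    Expired,
    AlreadyPublished,
    NoRenewalsLeft,
}

/// Hypothetical ephemeral stage; times are plain seconds for simplicity.
struct Stage {
    expires_at: u64,
    renewals_left: u32,
    packages: BTreeSet<String>,
}

impl Stage {
    /// Allocates a stage that expires `ttl` seconds after `now`.
    fn allocate(now: u64, ttl: u64, max_renewals: u32) -> Self {
        Stage { expires_at: now + ttl, renewals_left: max_renewals, packages: BTreeSet::new() }
    }

    /// Publishes a package to the stage; each package is accepted only once.
    fn publish(&mut self, now: u64, package: &str) -> Result<(), StageError> {
        if now >= self.expires_at {
            return Err(StageError::Expired);
        }
        if !self.packages.insert(package.to_string()) {
            return Err(StageError::AlreadyPublished);
        }
        Ok(())
    }

    /// Extends the stage by `ttl` seconds from `now`, using up one renewal.
    fn renew(&mut self, now: u64, ttl: u64) -> Result<(), StageError> {
        if now >= self.expires_at {
            return Err(StageError::Expired);
        }
        if self.renewals_left == 0 {
            return Err(StageError::NoRenewalsLeft);
        }
        self.renewals_left -= 1;
        self.expires_at = now + ttl;
        Ok(())
    }
}

fn main() {
    let mut stage = Stage::allocate(0, 60, 1);
    println!("{:?}", stage.publish(10, "foo"));
    println!("{:?}", stage.publish(20, "foo")); // second upload is rejected
    println!("{:?}", stage.renew(50, 60));
    println!("{:?}", stage.renew(100, 60)); // renewal budget is used up
}
```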

Considering we don't even have official workspace publishing support yet, I think we should start there. We can then look into batch-upload APIs.

As mentioned, the RFC is one way of implementing that.

With batch uploads, what happens if some packages can't be published? There is a 10 mb limit, to prevent locking up resources, so at least internally it seems like it would end up looking very similar to package staging, even if not exposed via cargo itself, and merely temporary as you describe.

The simplest reason for failure is that not all package names could be acquired. A package could be quarantined. crates-io might have its own requirements / policies that cargo can't fully replicate. In particular, with large numbers of packages, the delay between when cargo performs validation and when all packages are uploaded becomes larger, creating more possibility for failure.

My understanding was that workspace publish is intended to be implemented within cargo itself, which would make it difficult to ensure that failure doesn't occur, leaving some packages published and others not. Staging would allow a single command to make packages live, where nearly all validation can be done beforehand.

Cargo would have to change its behavior to support multiple packages with package and publish, so it was reasoned that it would be easier to explicitly opt in, at least when experimenting with the new functionality.

Who all has access to a staged area?

A package can have one staged version. Anyone who can publish that package (ie the owners) can publish to staging.

This seems like a new way to squat on packages (and seems invisible).

Yes, though not invisible: you can still run cargo owner --list foo. Squatting is already possible; at least now it would be possible to undo.

The privacy aspect isn't truly necessary, but it wouldn't make sense to make a package available on the index, as it might create confusion, and breaks the expectation of immutability. I thought about merely allowing packages to be published yanked, but felt this would be too confusing and not correctly model the intent.

Requiring an "unpublish" (which sounds more like a harsher yank) to free the name is not great

Currently there is no way to free the name. In relation to squatting, it would be easier to give up the name if the author decides to abandon the project.

Silently overwriting in a staged area sounds concerning

I suppose this could be explicit.

Organizations would likely need to download from their staging and do operations (which this proposal says can't be done).

You mention that workspace publishing isn't implemented, but an idea that I float as potential motivation becomes a blocker? :slight_smile: I would agree that it would be useful to allow downloading for this case, but that is out of scope. Downloading may not be necessary in order to validate that a local copy is identical; a timestamp / id could be enough.

As mentioned in the RFC, downloading of staged packages allows crates-io to be used as a private registry, which might not be intended, though it might not be a problem. For example, it could be useful to test drive packages before publishing. Again, not strictly necessary, and perhaps out of scope.

I don't see this solving some of the quarantine problems

Packages can be released from staging together. That means there is no way that quarantine breaks the publish.

If we have staging areas, I wonder if stages should be ephemeral. You "allocate" a stage, publish to it (only once allowed per package), and then officially publish it. A stage would expire after a fixed amount of time with the option to request so many renewals.

If it makes sense to have staged packages expire, I suppose. Why explicitly allocate, when it could be done implicitly on upload? While I agree the intent is mostly temporary, mechanically it doesn't really matter so much, and the alternative is publishing multiple permanent versions, possibly yanking them, which only takes up more resources.