Child Thread: Survey of alternative identifier designs for Cargo and Crates.io

For background context and links to other threads, see Survey of organizational ownership and registry namespace designs for Cargo and Crates.io. This thread assumes that one has been read.

This thread is for summarizing approaches for avoiding squatting by changing how we identify packages. Other general approaches include:

Debating which proposal Cargo and Crates.io should go with is off-topic for this thread. Please create your own thread.

Prior art

Those relevant for this thread

Julia Pkg UUID

Each package has a name, version, and UUID.

The dependency name is the Julia namespace used for the package. Project.toml maps the dependency name to the UUID which is then used to identify the package.
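As a rough illustration of that mapping, a Project.toml might contain a [deps] table like the sketch below. The package name and UUID are illustrative values, not taken from this thread.

```toml
# Project.toml (Julia), illustrative values only
[deps]
# dependency name (the Julia namespace) = UUID that actually identifies the package
Example = "7876af07-990d-54b4-ab0e-23690620f79a"
```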

As a parallel to cargo add, you can add dependencies with the REPL. Users can pass the package name to the add command.

TODO:

  • If names are used on the command-line, how are they resolved?
  • Presumably, packages can be renamed, so how do they deal with pkg> add foo changing meaning over time?

Motivations:

  • Changing names without breaking people
  • A shared, flat namespace between local packages and registries when resolving dependencies

References:

Potential solutions

Unreserved prefixes (no-op)

Maintainers could name their package <namespace>-<name>.

For brevity in code, maintainers can drop the namespace within Rust by setting lib.name = "name" or users can rename the package in their dependencies table.
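A minimal sketch of both options, assuming a hypothetical package published as acme-parser (the names are made up for illustration). The first part is the maintainer's manifest; the [dependencies] part shows what a dependent could write in its own manifest instead.

```toml
# Maintainer's Cargo.toml: publish as `acme-parser`, expose the library as `parser`
[package]
name = "acme-parser"   # hypothetical <namespace>-<name> package
version = "0.1.0"
edition = "2021"

[lib]
name = "parser"        # Rust code refers to the crate as `parser`

# Alternative, in a dependent's Cargo.toml: rename the dependency locally.
# The key is the name used in Rust code; `package` is the registry name.
[dependencies]
parser = { package = "acme-parser", version = "0.1" }
```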

Compared to use cases:

  • No trust in the namespace as anyone can publish in it
  • Not friendly to renames or namespace transfers

Compared to requirements:

  • If using the lib.name trick, how to reference the package in Rust code is unclear though that is an existing issue for renames and kebab case (cargo#15887)
  • Without the lib.name trick, access within Rust is verbose

Future possibilities:

  • Encode this in the ecosystem by adding to .cargo/config.toml a cargo-new.prefix = <string|bool> with a default of $USER for packages not being added to workspaces
    • advice will be given for changing this
    • Users can always disable with cargo-new.prefix = false
    • Repos that want to default to their Rust namespace can set cargo-new.prefix = "namespace::"
  • Resolve cargo#15887, making a package's Rust namespace more discoverable
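The configuration idea above could look roughly like this. Note that prefix is only the proposal from this post; it is not an existing Cargo config key, and the value shown is the one suggested in the list.

```toml
# .cargo/config.toml
[cargo-new]
# Hypothetical key proposed in this thread; not implemented in Cargo.
prefix = "namespace::"

# Opting out would presumably look like:
# prefix = false
```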

Package-level extern name

In effect, this is promoting lib.name to a package level property that is used in the UX for Cargo and crates.io. We are renaming this from 'Package "display name"s' as that implies this is for UX purposes only and does not have an impact on compatibility.

Users manually manage naming conflicts between packages but can also set a package-level extern name for the name they want shown. As an extern name, this is the name it is referenced by in Rust code. The main way the user interacts with the package name is in dependency declarations.

Compared to use cases:

  • No trust in who published the package
  • Communication of packages will be difficult
  • Users will be prone to depend on the display name, not the package name, pulling in the wrong dependency

Compared to requirements:

  • No correlation between the dependency and the namespace in Rust
  • Import collisions are possible which impacts usability. Like with multiple semver-major versions, users can rename the package.
  • Confusing having to interact with two different names

References:

UUIDs as package identifiers

Like "Package-level extern name" but deprecating package.name in favor of a unique ID.

Compared to use cases:

  • No trust in who published the package
  • Communication of packages will be difficult
  • Users will be prone to depend on the display name, not the package name, pulling in the wrong dependency

Compared to requirements:

  • UUIDs in dependencies would be meaningless to the user
  • No correlation between the dependency and the namespace in Rust
  • Import collisions are possible which impacts usability. Like with multiple semver-major versions, users can rename the package.

References:

Unique suffixes

Like Discord’s DiscordTag for usernames, package names are assigned a server-side suffix that qualifies which package of that name is meant.

e.g.

[dependencies]
"libc#9274" = "0.3"

It is presumed that the suffix is dropped when used in Rust code.

Users will need to pre-register their packages and update their packages to use the suffix. The RFC said that this could be done at publish time but there needs to be a way to know which suffix is intended. It can't be (package-name, publish-owner) as there can be multiple publish owners and one could have a prior version of the package or want to fork it short term.

Compared to use cases:

  • No trust in who published the package
  • Typo squatting becomes easier as harder-to-remember numbers are used, and typo squatting monitoring doesn't help as easily-typoed names are an intended use case
  • Communication of packages will be difficult
  • Users will be prone to depend on the display name, not the package name, pulling in the wrong dependency

Compared to requirements:

  • A package cannot be published to multiple registries as its suffix may be different
  • Numeric suffixes will make it difficult for users to distinguish packages
  • Technically, how to reference the package in Rust code is not 1:1 with the package name but dropping the suffix is likely an easy thing to pick up along the same lines as switching kebab-case to snake_case (cargo#15887)
  • Import collisions are possible which impacts usability. Like with multiple semver-major versions, users can rename the package.

References:

IMHO, a human-readable name is always better than a nonsense UUID, and different package owners using the same name might make it hard for beginners to choose which crate to use.

Maybe there is a better way: combine crate name with editions.

Firstly, if you register a crate name that has never been used before, you own that crate name until a challenge is submitted and no response is given. Each crate could have 2 different crate names (for example, reqwest could also be registered as request#seanmonstar, which would have the right to submit a challenge to own the request crate).

A challenge could be submitted by another crate with the same name and a different owner, which has been actively maintained in the latest 2 editions, when the challenged crate meets at least one of the following conditions:

  1. No updates in the latest 2 editions. In this case, the challenge cannot be responded to, and the crate name is transferred ASAP.
  2. No updates in the newest edition. In this case, the challenge could be responded to by publishing an upgraded version within (for example) 90 days.

An official short name might guide users to use the correct crate, that's a very good feature and thus should not be discarded easily.

It took me a minute to realize this was a user name.

What about something very similar to the libc#9274 in the OP, but with a series of words that map to the server-generated identifier: libc#persuasive-octopus?

The identity of the crate is distinct from who / what org owns it.

It seems strange to require publishing new versions if it's not for a code change.

Any automated design to transfer control of crate names is going to be a pretty huge target for bad actors trying to take control of widely used crates

In Haskell they use this approach: Taking over a package - HaskellWiki

I like this step

State your intention to take over the package in a public forum (we recommend Discourse, the haskell-cafe and/or libraries list). CC the maintainer.

And that there is a human in the loop - by default nothing happens.

Malware takeovers rely on the fact that they may go unnoticed for some time. This is likely to happen from time to time if there is any kind of automated takeover. It's less likely for this to happen if the takeover is manual and widely publicized.

In the Haskell case, the Hackage admins took for themselves the job of doing this mediation

Send an email to the hackage administrators (hackage-admin@haskell.org), with a link to the public email thread.(Include "package takeover request" in the subject).

The admins will grant you maintenance rights or upload a patched version for you.

However, the current crates.io team prefers to not have this policy, as per this RFC. In that RFC I suggested that maybe the job could be done by someone else, since it looks like an important (albeit thankless) thing to do to cultivate a better ecosystem.

Not having a mechanism to facilitate crate ownership changes means that there will be some instances of ecosystem breakage and/or churn that would be avoidable. Think about some very important crate with tons of dependents, where the owner goes MIA. Realistically, a lot of projects will not even notice that they should migrate to the specific fork the community coalesced around (such forks tend to be announced on users.rust-lang.org, /r/rust, and other places, but after some time those threads get buried).

But now that I think about it, a third-party tool (or even a first-party cargo subcommand) that lists actively maintained forks of abandoned dependencies of your project might be a good replacement, at least for people that actually bother to use such a tool (there would still be churn though).

Anyway, about this

I think that it's odd to suppose all crates need to be regularly updated. Some crates are just done, and it's possible that for some crates further updates would be better handled in a separate, forked crate

It would be nice if Rust had some way to specify whether a crate is done, actively maintained, passively maintained (accepts PRs but no first-party development), abandoned, etc. A lifecycle setting in Cargo.toml is not a great fit for this job, because Cargo.toml can only be updated with a new release, and often a crate is declared done after some years of no activity. But crates.io already has some issues related to this config mismatch, and they might be better solved together. For example, the README on crates.io can only be updated with a new release, and this is not always the right thing to do.

What is interesting here is that the crate owner may decide a crate is done, but the community around it doesn't agree. I think in this case a fork is probably better, unless we are talking about a foundational crate (something at serde level). In that case, it would be better to take it over under the rust-lang umbrella, like libc, rand, etc.

This already exists: cargo-deny/cargo-audit.

Note that in the supporting post Survey of organizational ownership and registry namespace designs for Cargo and Crates.io - #2 by epage, one of the requirements we are operating under is maintaining the existing ownership transfer policies. Changing such policies is out of scope wrt this discussion.

There used to be a maintenance badge in Cargo.toml but it ran into the problem you mentioned. The Inside Rust Blog post "This Development-cycle in Cargo: 1.78" covers a little bit on the topic of this type of mutable metadata.

Here is a variant of "unique suffixes"

RFC 4151, which defines tag: URIs, has to deal with domains changing owners. Its solution is to require the inclusion of a date, or at least a year.

Discord used the #1234 suffix because it did not require the name to be unique at a given time. If you allow renames but forbid duplicates then using a disambiguator derived from the time may offer a better DX.

For example, a package may be uniquely identified by its name and date of first publication, such as serde#2014. The recommendation would be to include this year in the manifest (eventually require it?) as in serde#2014 = "1.0.228". You'd import it without the #, so just use serde; same as today.

If they decide to rename to deser then consumers would be able to specify it in their toml as either serde#2014 or deser#2026. Using the older name would trigger a warning to update it. If no year is specified, cargo defaults to resolving the oldest crate with this name.

After the rename to deser#2026, the name serde becomes available again and someone else can claim it again. The fully-qualified identity of this new package is then serde#2026. Consumers must use the fully qualified name to use this new package.

You may also support aliases for all the years when a release was published. So serde#2020 is an alias for serde#2014.
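A sketch of how this might look in a manifest, assuming the name#year syntax proposed in this post (Cargo does not accept it today). One practical wrinkle: # starts a comment in TOML and is not allowed in bare keys, so the key would need to be quoted.

```toml
[dependencies]
# Hypothetical syntax from this post; Cargo does not accept `#` in dependency names today.
# The quotes are required: an unquoted `serde#2014` would be cut off at the `#` comment marker.
"serde#2014" = "1.0.228"

# After the hypothetical rename, the same package could also be written as:
# "deser#2026" = "1.0.228"
```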


Compared to "unique suffixes" with random ids from the original post, this helps with communication of packages or publication to multiple registries.

Sounds like your idea is focused on reclamation of names after a rename. While a possible idea to explore separately, it isn't one of the use cases we are targeting in Survey of organizational ownership and registry namespace designs for Cargo and Crates.io - #4 by epage. If you think it should be included, that is likely a conversation for that thread. We do talk about the need to recognize that names outside of the registry are mutable and need a scheme to support that for some potential solutions like in Survey of organizational ownership and registry namespace designs for Cargo and Crates.io - #4 by epage but that is focused on the transition path and not reclamation and could have conflicting syntax.

The ability to rename packages in the dependencies by simply adding a name key & value to the table of a dependency like

[dependencies]
itertools = {version = "*", name = "foo"}

to make it so that it is referenced in code as foo shouldn't conflict with any other solution and may make many solutions work better.

You can already do this with package = "foo", unless there's some new behavior I'm overlooking?
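For reference, a sketch of how the itertools example above could be written with Cargo's existing dependency renaming. The dependency key becomes the name used in Rust code, and package names the crate on the registry.

```toml
[dependencies]
# `foo` is the name used in Rust code; `package` is the crate's name on the registry.
foo = { package = "itertools", version = "*" }
```

In Rust code the crate would then be referenced as foo, e.g. use foo::Itertools;.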

I didn't know that. nice