Pre-RFC: unique crate names


#1

Summary

Add a random, unique suffix to crate names, and let users freely reserve any name (but not choose the suffix).

Motivation

Both ordinary name-squatting and malicious typo-squatting are becoming a potential threat to the ecosystem and to its operations. With a unique identifier generated for each crate, these two problems can be solved without manual moderation.

Guide-level explanation

When you create a new crate, you will get a new name with a random suffix like “libc#9274”. You can reserve the name “libc” multiple times, as long as the number of reservations stays below the point where it would be considered brute-forcing the suffix.

To publish in the new scheme, you will need to supply either --new or --tag <TAG> to cargo publish, where TAG is the numeric part (the suffix without #) described above.

If you are a user of the crate, you will need to add this to (or change the existing entry in) Cargo.toml:

"libc#9274" = "0.3"

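As a rough illustration of how a tool might read such an entry, here is a minimal sketch that splits a key like "libc#9274" into the crate name and the tag. The function name `split_tagged_name` is hypothetical and not part of Cargo; the rule that the tag must be all digits is taken from the description above, and the tag is kept as a string so that zero padding survives.

```rust
/// Splits a dependency key such as "libc#9274" into (name, tag).
/// Hypothetical helper, not an existing Cargo API.
/// Returns None when there is no '#', or when the tag is not all digits.
fn split_tagged_name(key: &str) -> Option<(&str, &str)> {
    // Split at the first '#': everything before is the name, the rest is the tag.
    let (name, tag) = key.split_once('#')?;
    if name.is_empty() || tag.is_empty() || !tag.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    Some((name, tag))
}

fn main() {
    assert_eq!(split_tagged_name("libc#9274"), Some(("libc", "9274")));
    // An old-style name has no tag at all.
    assert_eq!(split_tagged_name("libc"), None);
    println!("{:?}", split_tagged_name("libc#9274"));
}
```

The name part ("libc") is what would still be used for extern crate and use statements.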
Reference-level explanation

Identifier generation

Initially, the identifier is a randomly generated 4-digit number (zero-padded if necessary). A unique constraint will be enforced on the database server. If the load factor goes up (many people register a certain name), then the number of digits will increase to keep the load factor under a target level. For example, if the target load factor is 10% and there are 5000 attempts to register libc, the first 1000 attempts will get a 4-digit ID and the rest will get a 5-digit one. Additionally, various abuse rate limits may be implemented server-side.
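The digit-growth rule can be sketched roughly as follows. This is only one possible reading of the paragraph above: it assumes the load factor is the number of tags already registered for a name divided by the size of the ID space (10^digits), and that an ID space counts as full once it reaches the target. The names `digits_for_next_id` and `format_tag` are hypothetical, and the random draw and the database uniqueness check are left out.

```rust
/// Number of digits the next tag for a name should have.
/// `registered` is how many tags already exist for that name;
/// `target_percent` is the target load factor in percent (assumed 1..=100).
fn digits_for_next_id(registered: u64, target_percent: u64) -> u32 {
    assert!((1..=100).contains(&target_percent));
    let mut digits: u32 = 4; // tags start at 4 digits
    // Grow the ID space while registered / 10^digits has reached the target.
    // Sketch only: no overflow handling for very large counts.
    while registered * 100 >= target_percent * 10u64.pow(digits) {
        digits += 1;
    }
    digits
}

/// Zero-pads a drawn value to the chosen width, e.g. 42 -> "0042".
fn format_tag(value: u64, digits: u32) -> String {
    format!("{:0width$}", value, width = digits as usize)
}

fn main() {
    // With a 10% target, the first 1000 registrations fit in 4 digits.
    assert_eq!(digits_for_next_id(999, 10), 4);
    // The 1001st registration and the rest of the 5000 get 5 digits.
    assert_eq!(digits_for_next_id(1000, 10), 5);
    assert_eq!(digits_for_next_id(4999, 10), 5);
    assert_eq!(format_tag(42, 4), "0042");
}
```

Under these assumptions the numbers match the libc example: registrations 1 through 1000 get 4 digits and registrations 1001 through 5000 get 5.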

Changes to cargo publish

Two new flags are implemented:

  • --new: when this flag is supplied, a new crate tag will be reserved on publish.
  • --tag <TAG>: when this flag is supplied, the crate (update) will be published under an existing tag.

If neither of these flags is supplied:

  • If old-style crate names are still supported, publish using the old scheme.
  • If old-style crate names are no longer accepted, error out and prompt the user to supply one of the flags above.
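The flag handling above could be sketched as a small decision function. This is an illustration, not Cargo's actual code: `PublishMode` and `resolve_publish_mode` are hypothetical names, and rejecting --new together with --tag is an assumption, since the text only says to supply one or the other.

```rust
#[derive(Debug, PartialEq)]
enum PublishMode {
    /// --new: reserve a fresh tag on publish.
    NewTag,
    /// --tag <TAG>: publish under an existing tag.
    ExistingTag(String),
    /// No flag given while old-style names are still supported.
    OldScheme,
}

/// Decides how to publish from the two proposed flags.
/// `old_style_supported` models the current migration stage.
fn resolve_publish_mode(
    new: bool,
    tag: Option<&str>,
    old_style_supported: bool,
) -> Result<PublishMode, String> {
    match (new, tag) {
        // Assumption: the two flags are mutually exclusive.
        (true, Some(_)) => Err("--new and --tag cannot be combined".to_string()),
        (true, None) => Ok(PublishMode::NewTag),
        (false, Some(t)) => Ok(PublishMode::ExistingTag(t.to_string())),
        (false, None) if old_style_supported => Ok(PublishMode::OldScheme),
        (false, None) => Err("supply either --new or --tag <TAG>".to_string()),
    }
}

fn main() {
    assert_eq!(resolve_publish_mode(true, None, true), Ok(PublishMode::NewTag));
    assert_eq!(resolve_publish_mode(false, None, true), Ok(PublishMode::OldScheme));
    assert!(resolve_publish_mode(false, None, false).is_err());
}
```

Keeping the migration stage as a single boolean input means the same function would cover every step of the migration plan without changing the flag semantics.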

Migration plan example

  1. Initial implementation of cargo publish and crates.io support for new-style tags
  2. Roll out a recommendation in Cargo to migrate to new-style tags
  3. Stop accepting registrations of new old-style names
  4. Stop accepting updates to crates with old-style names

Drawbacks

  • These unique identifiers have no semantic meaning, which makes one crate hard to distinguish from another.
    • The search on crates.io may become less useful because of this; one way to improve it is to incorporate more metrics such as stars/downloads or maintenance, as well as peer reviews (crev).
  • Crate identifiers are no longer memorable. This could be annoying when you have some go-to crates for common tasks.
  • The # character requires quotes in Cargo.toml. This is a minor disadvantage, but it makes a clear distinction between the crate name (which is for extern crate and use statements) and the tag identifier.

Rationale and alternatives

  • Namespacing. While namespacing does allow the names to be meaningful, its main drawback is that the issue of squatting will not go away; it just shifts responsibility. Details on Rust Internals.
  • Other methods instead of the unique tag
    • Fully random and longer crate names. The benefit of this approach is that crate tags can be unique across the entire registry, and the original names are no longer coupled with the crates. The main issue with this approach is that Cargo.toml will become hard to understand without an IDE or plugins.
    • Random alphanumeric suffix. Partly inspired by Tor Hidden Service URLs. However, crates.io is a centralized service, so no cryptography needs to be involved. When used without a special separator, these names are confusing, as the user doesn’t know which name to use for imports and extern crate.
  • Use an inline table to represent tags in Cargo.toml: libc = { version = "0.3", tag = "9274" }
    • While this does not require double quotes around crate names and is clearer about which part is the tag and which is the name, the syntax is much more verbose.

Prior art

The tag format described is based on Discord’s DiscordTag, or what inspired it, Battle.net’s BattleTag.

Unresolved questions

  • Social engineering. With a random numeric tag you can’t tell a legitimate crate from a malicious one. It’s not very clear how much this differs from the previous registry, where someone could register a seemingly legitimate name. Peer review, such as crev, is likely the most effective way to deal with this kind of risk.
  • Crate discoverability. The names are now less intuitive, which basically boils down to the drawbacks regarding search. While people often use GitHub or just Google to discover a crate for a specific purpose, it would take some extra work if we want to allow people to discover what they want directly on crates.io.

#2

Is there any evidence for this? Despite the name-squatting that has occurred I have yet to see anyone complaining about it directly affecting them registering a crate name. I also haven’t heard of any typo-squatting occurring.


#3

You’re correct, that was inaccurate. I will update the wording to be more conservative.


#4

I believe this makes the cure much worse than the original illness. I often prototype and throw a lot of crates into Cargo.toml from memory, even with versions. While I can see name squatting being potentially annoying, I’ve not hit an actual problem there, and even if I did, it would not be often. Having to look up or memorise the suffixes would be a constant papercut every time a new project is created. This would be annoying on a daily basis, for all Rust users, not just crate authors who can’t publish a crate with the name they’ve chosen.

Besides, as a crate author, I’d try to avoid the name collision despite the suffix, to make sure people don’t mistake my crate for some other. But someone could create a „colliding“ crate later on.


#5

If there’s any vote, I’d vote strongly against this ugliness. Somehow npm survives without this.


#6

I agree. This seems more likely to encourage typo squatting than to prevent it, since now two crates can have the same name, but someone might trick a victim into using the one with the wrong number sequence at the end, and the number sequence is random and therefore not likely to be memorable.


#7

In the current ecosystem, squatters often hold short and simple names, since these are the most desirable. The same thing happens with domain names, too.

If someone has squatted on a short and simple crate name like “foo”, and attempts to contact the squatter to have the crate released for use have failed, the only option for the developer who wants the name “foo” is to pick a different name. As I understand it, there are people who are not satisfied with this approach because they are forced to pick a crate name that they feel is less desirable.

In your proposal, no one can have the name “foo” (not the squatter, and not anyone else), so the developer is still forced to pick a less desirable crate name. (I’m assuming that “foo#1952” is less desirable than “foo”.) And now everyone else who is not affected by squatters is forced to move to a less desirable crate name.

Can you say some words about why you think people affected by crate name squatting are going to find your proposal more appealing than the current status quo?