Summary
Add a random, unique suffix to crate names, and allow users to freely choose the name part (the suffix is assigned by the registry).
Motivation
Name-squatting and malicious typo-squatting are becoming a threat both to the ecosystem and to registry operations. With a unique identifier generated for each crate, both problems can be addressed without manual moderation.
Guide-level explanation
When you create a new crate, it gets a name with a random suffix, such as “libc#9274”. You can reserve the base name (“libc” in this case) multiple times, as long as you do not reserve it so often that it amounts to brute-forcing the suffix.
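Tools on both sides need to split such an identifier back into the crate name and the tag. The sketch below uses a hypothetical helper, split_tagged_name. It assumes tags are all digits and at least four long (following the identifier generation rules in the reference-level explanation), and its name check is a simplification, not crates.io's actual rule set.

```rust
/// Hypothetical helper: splits a new-style identifier such as "libc#9274"
/// into the crate name (used in `use` / `extern crate`) and its tag.
/// Returns None for old-style names or malformed identifiers.
fn split_tagged_name(id: &str) -> Option<(&str, &str)> {
    let (name, tag) = id.split_once('#')?;
    // Simplified name check (assumption): ASCII letters, digits, '_' and '-'.
    let valid_name = !name.is_empty()
        && name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '-');
    // Tags are zero-padded digits that start at 4 digits and only grow.
    let valid_tag = tag.len() >= 4 && tag.chars().all(|c| c.is_ascii_digit());
    if valid_name && valid_tag {
        Some((name, tag))
    } else {
        None
    }
}

fn main() {
    for id in ["libc#9274", "my-crate#0042", "libc", "libc#92a4"] {
        match split_tagged_name(id) {
            // Cargo maps '-' in package names to '_' in the Rust crate name.
            Some((name, tag)) => println!("{}: use {}; (tag {})", id, name.replace('-', "_"), tag),
            None => println!("{}: not a new-style identifier", id),
        }
    }
}
```

Only the part before # ever appears in source code, so existing use statements keep working when a dependency gains a tag.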
To publish under the new scheme, you will need to pass either --new or --tag <TAG> to cargo publish, where TAG is the numeric part of the suffix described above (without the #).
If you are a user of the crate, you will need to add this to (or change the existing entry in) the [dependencies] section of Cargo.toml:
"libc#9274" = "0.3"
Reference-level explanation
Identifier generation
Initially, the identifier is a randomly generated 4-digit number, zero-padded if necessary (for example 0042). A unique constraint on the full name (base name plus tag) will be enforced by the database server.
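If the load factor for a name goes up (that is, many people register the same base name), the number of digits will increase to keep the load factor under a target level. Here the load factor is the fraction of all possible tags of the current length that are already taken. For example, if the target load factor is 10% and there are 5000 attempts to register libc, the first 1000 attempts (10% of the 10,000 possible 4-digit tags) will get a 4-digit ID and the rest will get a 5-digit one (4000 of 100,000 possible tags, well under 10%).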
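The growth rule can be checked with a small simulation. This is an illustration under several assumptions not in the text: an in-memory HashMap stands in for the database, a HashSet insert that returns false plays the role of the unique constraint (triggering a retry), the target load factor is given as an integer percentage, and a toy xorshift generator replaces the cryptographically secure randomness a real registry should use. TagAllocator and its methods are hypothetical names.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical tag allocator. An in-memory map stands in for the database:
/// (base name, digit count) -> numbers already reserved at that length.
struct TagAllocator {
    taken: HashMap<(String, u32), HashSet<u64>>,
    min_digits: u32,          // 4 in this proposal
    target_load_percent: u64, // e.g. 10 for a 10% target load factor
    rng_state: u64,
}

impl TagAllocator {
    fn new(min_digits: u32, target_load_percent: u64, seed: u64) -> Self {
        TagAllocator {
            taken: HashMap::new(),
            min_digits,
            target_load_percent,
            // xorshift gets stuck at zero, so never start there.
            rng_state: if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed },
        }
    }

    /// Toy xorshift64 PRNG so the sketch has no dependencies.
    /// A real registry should use a cryptographically secure source.
    fn next_u64(&mut self) -> u64 {
        let mut x = self.rng_state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.rng_state = x;
        x
    }

    /// Smallest digit count whose load factor stays at or below the target
    /// after one more reservation.
    fn digits_for(&self, name: &str) -> u32 {
        let mut digits = self.min_digits;
        loop {
            let capacity = 10u64.pow(digits);
            let used = self
                .taken
                .get(&(name.to_string(), digits))
                .map_or(0, |set| set.len() as u64);
            // Integer form of (used + 1) / capacity <= target_load_percent / 100.
            if (used + 1) * 100 <= capacity * self.target_load_percent {
                return digits;
            }
            digits += 1;
        }
    }

    /// Reserves a fresh tag for `name` and returns the full "name#tag" identifier.
    fn reserve(&mut self, name: &str) -> String {
        let digits = self.digits_for(name);
        let capacity = 10u64.pow(digits);
        loop {
            // `%` slightly favors small numbers; acceptable for a sketch.
            let n = self.next_u64() % capacity;
            let slot = self.taken.entry((name.to_string(), digits)).or_default();
            // insert() returns false on a duplicate, like a unique-constraint
            // violation in the database; draw again in that case.
            if slot.insert(n) {
                return format!("{}#{:0width$}", name, n, width = digits as usize);
            }
        }
    }
}

fn main() {
    let mut alloc = TagAllocator::new(4, 10, 42);
    let ids: Vec<String> = (0..5000).map(|_| alloc.reserve("libc")).collect();
    let four = ids.iter().filter(|id| id.len() == "libc#".len() + 4).count();
    println!("first: {}, four-digit: {}, five-digit: {}", ids[0], four, ids.len() - four);
}
```

With a 10% target, reserving libc 5000 times yields 1000 four-digit and 4000 five-digit identifiers, matching the example above. Because the load never exceeds the target, a random draw collides with an existing tag at most about 10% of the time, so retries stay cheap.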
Additionally, various abuse rate limits may be implemented server-side.
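One possible shape for such a limit, which also covers the brute-forcing concern from the guide-level explanation, is a fixed-window counter per user and base name. The limiter, its names, and the numbers used below are hypothetical; the text does not specify any concrete limits.

```rust
use std::collections::HashMap;

/// Hypothetical fixed-window limiter: each (user, base name) pair may
/// reserve at most `max_per_window` new tags per `window_secs` seconds.
struct ReservationLimiter {
    max_per_window: u32,
    window_secs: u64,
    // (user, base name) -> (start of current window, reservations in it)
    windows: HashMap<(String, String), (u64, u32)>,
}

impl ReservationLimiter {
    fn new(max_per_window: u32, window_secs: u64) -> Self {
        ReservationLimiter { max_per_window, window_secs, windows: HashMap::new() }
    }

    /// Returns true and records the attempt if `user` may reserve another
    /// tag for `name` at time `now` (seconds); returns false otherwise.
    fn try_reserve(&mut self, user: &str, name: &str, now: u64) -> bool {
        let (max, window) = (self.max_per_window, self.window_secs);
        let entry = self
            .windows
            .entry((user.to_string(), name.to_string()))
            .or_insert((now, 0));
        if now >= entry.0 + window {
            *entry = (now, 0); // the old window has expired; start a new one
        }
        if entry.1 < max {
            entry.1 += 1;
            true
        } else {
            false
        }
    }
}

fn main() {
    // Illustrative numbers only: 3 reservations of a name per hour.
    let mut limiter = ReservationLimiter::new(3, 3600);
    let results: Vec<bool> = (0..4).map(|t| limiter.try_reserve("alice", "libc", t)).collect();
    println!("{:?}", results); // the fourth attempt within the hour is refused
    println!("{}", limiter.try_reserve("alice", "serde", 5)); // other names are unaffected
    println!("{}", limiter.try_reserve("alice", "libc", 3600)); // a new window has started
}
```

A fixed window is the simplest choice; a sliding window or token bucket would smooth out bursts at window boundaries at the cost of a little more state.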
Changes to cargo publish
Two new flags are implemented:
- --new: when this flag is supplied, a new tag is reserved for the crate on publish.
- --tag <TAG>: when this flag is supplied, the crate (or an update to it) is published under the existing tag TAG.
If neither of these flags is supplied:
- If old-style crate names are still supported, the crate is published using the old scheme.
- If old-style crate names are no longer accepted, cargo publish errors out and prompts the user to pass one of the flags above.
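The flag handling above can be modeled as a small decision function. This is a sketch, not Cargo's implementation: PublishMode and resolve_publish_mode are hypothetical, old_style_supported stands in for whichever migration phase is in effect, and rejecting --new together with --tag is an assumption, since the text does not say what happens when both are given.

```rust
/// Hypothetical model of how the proposed flags select a publish mode.
#[derive(Debug, PartialEq)]
enum PublishMode {
    NewTag,              // --new: reserve a fresh tag
    ExistingTag(String), // --tag <TAG>: publish under an existing tag
    OldStyle,            // no flag, while old-style names are still accepted
}

/// `old_style_supported` stands in for the current migration phase.
fn resolve_publish_mode(
    new: bool,
    tag: Option<&str>,
    old_style_supported: bool,
) -> Result<PublishMode, String> {
    match (new, tag) {
        // Assumption: the proposal does not cover passing both flags.
        (true, Some(_)) => Err("--new and --tag cannot be used together".to_string()),
        (true, None) => Ok(PublishMode::NewTag),
        (false, Some(t)) if !t.is_empty() && t.chars().all(|c| c.is_ascii_digit()) => {
            Ok(PublishMode::ExistingTag(t.to_string()))
        }
        (false, Some(t)) => Err(format!("invalid tag {:?}: expected the numeric suffix without '#'", t)),
        (false, None) if old_style_supported => Ok(PublishMode::OldStyle),
        (false, None) => Err(
            "old-style crate names are no longer accepted; pass --new or --tag <TAG>".to_string(),
        ),
    }
}

fn main() {
    println!("{:?}", resolve_publish_mode(true, None, true));
    println!("{:?}", resolve_publish_mode(false, Some("9274"), false));
    println!("{:?}", resolve_publish_mode(false, None, false));
}
```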
Migration plan example
- Initial implementation of new-style tags in cargo publish and crates.io
- Roll out a recommendation in Cargo to migrate to new-style tags
- Stop accepting registrations of new old-style names
- Stop accepting updates to crates with old-style names
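Read as a sequence, these phases decide which old-style operations the registry still accepts, which is the "old-style crate names are still supported" condition that cargo publish checks. The enum below is a hypothetical model of that ordering; the phase names are illustrative, not taken from Cargo or crates.io.

```rust
/// Hypothetical model of the migration phases, in rollout order.
/// Deriving PartialOrd on a fieldless enum orders variants by declaration.
#[derive(Clone, Copy, Debug, PartialEq, PartialOrd)]
enum Phase {
    TagsAvailable,     // cargo publish and crates.io support new-style tags
    TagsRecommended,   // Cargo recommends migrating
    NoNewOldStyle,     // registrations of new old-style names are rejected
    NoOldStyleUpdates, // updates to old-style crates are rejected
}

impl Phase {
    fn accepts_new_old_style_name(self) -> bool {
        self < Phase::NoNewOldStyle
    }

    fn accepts_old_style_update(self) -> bool {
        self < Phase::NoOldStyleUpdates
    }
}

fn main() {
    use Phase::*;
    for phase in [TagsAvailable, TagsRecommended, NoNewOldStyle, NoOldStyleUpdates] {
        println!(
            "{:?}: new old-style names {}, old-style updates {}",
            phase,
            phase.accepts_new_old_style_name(),
            phase.accepts_old_style_update()
        );
    }
}
```

Keeping the phases ordered means each later phase only removes capabilities, so a check like "at or past phase X" is a single comparison.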
Drawbacks
- These unique identifiers carry no semantic meaning, which makes it hard to tell one crate apart from another.
- Search on crates.io may become less useful as a result; one way to mitigate this is to incorporate more signals, such as stars/downloads or maintenance status, as well as peer reviews (crev).
- Crate identifiers are no longer memorable. This could be annoying when you have go-to crates for common tasks.
- The # character requires quotes in Cargo.toml. This is a minor disadvantage, but it draws a clear distinction between the crate name (which is used in extern crate and use statements) and the tag identifier.
Rationale and alternatives
- Namespacing. While namespacing keeps names meaningful, its main drawback is that it does not make squatting go away; it only shifts the responsibility elsewhere. Details are in the discussion on Rust Internals.
- Other identifier schemes instead of the numeric tag:
  - Fully random, longer crate names. The benefit of this approach is that identifiers can be unique across the entire registry, decoupling crates from their original names. The main issue is that Cargo.toml becomes hard to understand without IDE support or plugins.
  - Random alphanumeric suffix. This is partly inspired by Tor Hidden Service URLs, although crates.io is a centralized service, so no cryptography needs to be involved. Without a special separator, these names are confusing, because users cannot tell which part of the name to use in imports and extern crate.
  - Use an inline table to represent tags in Cargo.toml:
    libc = { version = "0.3", tag = "9274" }
    While this avoids quoting crate names and separates the tag from the name more clearly, the syntax is much more verbose.
Prior art
The tag format described here is based on Discord’s DiscordTag, which was in turn inspired by Battle.net’s BattleTag.
Unresolved questions
- Social engineering. A random numeric tag does not help users tell a legitimate crate from a malicious one. It is not clear how much this differs from the current registry, where anyone can register a legitimate-looking name. Peer review, such as crev, is likely the most effective way to deal with this kind of risk.
- Crate discoverability. The names are now less intuitive, which essentially comes down to the search drawbacks described above. People often use GitHub or Google to discover a crate for a specific purpose, but extra work would be needed to let people find what they want directly on crates.io.