Pre-RFC: User namespaces on crates.io

There does seem to be a real disconnect between the crates.io team and a sizeable section of the user base. I mean in the sense of a communication breakdown. Can anything be done to bridge this gap?

2 Likes

The preceding posts, and the probably hundreds of similar posts on the underlying topic of user namespaces on crates.io, indicate to me that the meta-issue is a process problem rather than a technical problem. As @withoutboats states, this is only one example of a recurrent problem within the Rust community. I'm not an expert in solving such process problems, but perhaps we can shift the discussion from the technical level to the process level, where it might be possible to focus on the root disconnect.

It seems to me that any proposed solutions for issues in this class are going to require significant effort on the part of the proposer to explore, understand, and document the impediments to adopting and implementing their proposal, including negative consequences of such adoption, rather than simply "throwing it on the table of public discourse" and transferring that burden to the responsible Rust team.

I wish that I personally could help more in moving the process issues forward. Unfortunately, as stated above, that's outside my areas of expertise.

8 Likes

You seem to be completely missing the point. The problem with the various solutions proposed so far and rejected is that they are overcomplicated and require cross-cutting changes, whereas so far nobody has attempted to make an RFC for the "rather simple fix" which has come up quite a few times in discussion.

Your metaphor is 180° off, and your proposed solution of using DNS names not only violates one of the core requirements of a namespacing system, which has been pointed out repeatedly (see below), but also brings with it quite a bit of incidental complexity.

Here are some requirements that come up in all of these discussions and that have rarely been addressed by any of the proposals:

  • If namespaces are added, their identifiers MUST be immutable. This means the only tractable approach for namespacing is a create-once, immutable identifier managed by crates.io, first come, first served, just like crate/package names themselves (incidentally, I thought the crates-as-namespaces idea was interesting here). This requirement alone rules out a lot of proposals, including yours, which tie the namespace name to some external resource that can be lost or renamed, such as DNS names, GitHub org names, etc.
  • Adding new syntax or other things that require changes to the module system significantly complicates the problem, the number of touch points in the overall project, and should probably be avoided

These points have been repeated to you over and over, but you don't seem to acknowledge them. Instead you're casting aspersions.

A simple namespacing solution would be one which adds an immutable namespace identifier to the crates.io database (or leverages an existing one like crate names as in the crates-as-namespaces proposal), and ideally doesn't touch the module system or require changes to anything other than crates.io.
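To make the "create-once, immutable identifier" requirement concrete, here is a minimal sketch of what such a table could look like. It is not how crates.io is implemented: `NamespaceTable`, `ClaimError`, and the numeric owner ids are hypothetical names invented for this illustration, and an in-memory map stands in for the crates.io database.

```rust
use std::collections::HashMap;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum ClaimError {
    AlreadyTaken,
}

/// Hypothetical in-memory stand-in for a namespace table in the registry database.
#[derive(Default)]
pub struct NamespaceTable {
    // namespace identifier -> numeric id of the account that claimed it
    owners: HashMap<String, u64>,
}

impl NamespaceTable {
    /// First come, first served: a claim succeeds only if the name is unused.
    pub fn claim(&mut self, namespace: &str, owner_id: u64) -> Result<(), ClaimError> {
        if self.owners.contains_key(namespace) {
            return Err(ClaimError::AlreadyTaken);
        }
        self.owners.insert(namespace.to_string(), owner_id);
        Ok(())
    }

    /// Look up the account that claimed a namespace, if any.
    pub fn owner_of(&self, namespace: &str) -> Option<u64> {
        self.owners.get(namespace).copied()
    }

    // Deliberately no `rename` or `release` method: once claimed, the
    // identifier never changes and is never handed to someone else.
}
```

The point of the sketch is the missing operations: nothing outside crates.io (a DNS registration, a GitHub rename) can change which identifier a published package lives under.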

14 Likes

I don't want to dive too deep into what this thread has become, but I would like to point out that I (the OP) submitted an RFC that is precisely what you ask for. It requires no cross-cutting changes -- the only changes would be in the crates.io package registry -- and I believe it is much closer to a viable MVP than to being "over-complicated".

The primary issue from my perspective is that the purpose of namespaces is to allow multiple crates.io packages to exist for a given crate name, and the RFC discussion has revealed that a member of the crates.io team is directly opposed to users having this capability.

Discussion of namespaces as access control for large projects (e.g. crate name prefix reservation) is interesting but out of scope for my proposal, and therefore IMO off-topic for this thread.

3 Likes

Your RFC begins:

A crates.io user Jane Doe signed in with the GitHub account "jdoe" would be able to publish crates with names prefixed by "~jdoe/"

The first bullet point in my post (and the feedback you received) spells out why this is problematic. To quote the relevant parts of my bullet point: "If namespaces are added, their identifiers MUST be immutable. [...] This requirement alone rules out a lot of proposals, including yours, which tie the namespace name to some external resource that can be lost or renamed, such as DNS names, GitHub org names, etc"

Your proposal ties namespaces to GitHub account names. As mentioned in the feedback you received, GitHub usernames can change, and when they do, crates.io usernames change accordingly.

One of the foundational architectural principles of crates.io, which makes it immune to "left-pad" style incidents, is that the package registry is immutable. Once published, a package cannot be renamed or removed.

A namespacing mechanism which retains this property must have similarly immutable names. GitHub usernames do not have this property.

2 Likes

Your assertions are too strong. Packages can be removed by the crates.io maintainers. This isn't done lightly (you need to have a really good justification, like a legal demand (e.g., DMCA takedown)), but it is done. crates.io is resistant to left-pad issues, but it is not immune, and it is not truly immutable.

Though I generally agree that something as innocuous as a GitHub username change shouldn't break the universe.

4 Likes

[Edit: removed a misunderstanding]

It is true that there are certain situations where crates have been removed, due to e.g. threat of legal action, but this is rare and can only be performed by a crates.io administrator, not an end user of the repository as in the "left-pad" incident.

1 Like

No, I am not describing yanking. I am discussing crate removal. You basically repeated my points in your second paragraph. I said very clearly that package removal requires a crates.io maintainer, and isn't done lightly (see my mention of legal demands and DMCA takedowns). I'm not sure why you thought I was talking about yanking.

2 Likes

This is very in the weeds, but left pad was not caused by a legal action like a DMCA takedown request. It was caused by an end user action and I don’t see how the equivalent would be true for crates.io? One could imagine a package publisher using legal means to “super-yank” a crate but that’s not a “left pad vulnerability” at that point.

EDIT: it’s worth noting that this is about as immutable as any centrally managed service can hope to be, but I can see how one would disagree with the statement that “crates.io is immutable”

3 Likes

Getting back on topic: the point is any new namespace feature must provide immutability guarantees equivalent to what's provided for non-namespaced crates today.

3 Likes

The exact format of user namespaces isn't important, and one of the conclusions of the RFC discussion is to use immutable GitHub user IDs instead. So a namespaced package would look like ~github:1234/somepkg. This eliminates any concerns around GitHub username mutability.
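As a rough illustration of the `~github:1234/somepkg` shape, the following sketch splits such a name into its parts. The struct and function names are hypothetical, and the grammar (a leading `~`, a provider, a numeric id, then `/` and the package) is only my reading of the single example above, not something the RFC specifies in this form.

```rust
/// Hypothetical parsed form of a name like `~github:1234/somepkg`.
#[derive(Debug, PartialEq)]
pub struct NamespacedName {
    pub provider: String,
    pub user_id: u64,
    pub package: String,
}

/// Parse `~<provider>:<numeric id>/<package>`; returns `None` for any other shape.
pub fn parse_namespaced(name: &str) -> Option<NamespacedName> {
    let rest = name.strip_prefix('~')?;
    let (namespace, package) = rest.split_once('/')?;
    let (provider, id) = namespace.split_once(':')?;
    if provider.is_empty() || package.is_empty() {
        return None;
    }
    // The id must be numeric, so a mutable username like `jdoe` is rejected.
    let user_id = id.parse::<u64>().ok()?;
    Some(NamespacedName {
        provider: provider.to_string(),
        user_id,
        package: package.to_string(),
    })
}
```

Under this reading, `~github:1234/somepkg` parses, while the earlier `~jdoe/somepkg` form does not, because it carries a username instead of a numeric id.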

I recommend reading the RFC discussion since it goes into additional detail that wasn't captured in the initial revision of the RFC.

The core problem remains that the idea of namespaces was not acceptable to a member of the crates.io team. I believe that discussion of syntax minutiae doesn't matter if namespaces themselves are seen as problematic.

2 Likes

Putting aside the UX and aesthetics of the syntax, let me go back to my second bullet point and conclusion:

"Adding new syntax or other things that require changes to the module system significantly complicates the problem, the number of touch points in the overall project, and should probably be avoided. [...] A simple namespacing solution would be one which adds an immutable namespace identifier to the crates.io database [...] and ideally doesn't touch the module system or require changes to anything other than crates.io."

You claim your RFC is "precisely what [I] ask(ed) for", but this looks like new syntax?

I think it would certainly require changes to Cargo and anything which consumes Cargo metadata:

error: failed to parse manifest at `Cargo.toml`

Caused by:
  invalid character `~` in dependency name: `~github:1234/somepkg`, the first character must be a Unicode XID start character (most letters or `_`)

One note about UX/aesthetics, or more specifically this:

I think it's very important that namespaces have human-meaningful names, not (potentially large) numbers. This is why we have DNS instead of just using IP addresses. For new users, a GitHub user ID looks like 60937195. Having crates with numbers like that in them rather than a human-meaningful name means that humans can no longer remember package names, which is a huge step backwards in UX.

3 Likes

Moderation note: Folks, this conversation is not headed in a good direction. Before commenting, please consider whether your contribution meaningfully advances the discussion.

I think the big takeaway from this discussion and RFC has been heard: communication is hard, doubly so for contentious issues such as this one. In future proposals, it would probably be a good idea to prioritize reaching consensus on a high-level design with the crates.io team before plodding through the details.

8 Likes

My proposal changes crates.io package names, not crate names. The Cargo.toml file would look like this, which parses fine:

[dependencies]
something = { package = "~github:1234/something" }

There are no changes required to Cargo or rustc to accept this new package name syntax.

I don't think package names need to be memorable, or user-meaningful. Given that users can still upload packages without namespaces, I think this feature would be primarily used by people who want to upload packages where their name is already "taken" by an existing registration.

My plan for my own packages, should I need to put them on crates.io, is to use UUIDs. The UX isn't as good because crates.io and docs.rs treat the package name as the crate name, but at least the library would still be accessible to projects that are required to register with crates.io.

The fact that Cargo doesn't error during parsing doesn't mean this requires no changes to it. As far as I can tell, it is not possible to have the character / in any package name with the current Registry Index Format Specification. As a test I have published a registry that uses the standard / == directory separator, and installing from it fails with a weird error (I expect any fix for this would be to error earlier, since the protocol does not support it):

[dependencies]
bs58.version = "0.3.1"
bs58.package = "~github:1234/something"
bs58.registry-index = "https://ipfs.io/ipfs/QmVbNCcr9XMxZjf744qYsDjfgtKCcpv6BL4RpuKTvEnXCP"
> cargo build --config net.git-fetch-with-cli=true -Z unstable-options
    Updating `https://ipfs.io/ipfs/QmVbNCcr9XMxZjf744qYsDjfgtKCcpv6BL4RpuKTvEnXCP` index
  Downloaded ~github:1234/something v0.3.1 (registry `https://ipfs.io/ipfs/QmVbNCcr9XMxZjf744qYsDjfgtKCcpv6BL4RpuKTvEnXCP`)
error: No such file or directory (os error 2)

Trying to force in a file with a / in the name isn't allowed by git (and would be a really bad idea anyway):

> git mktree
040000 tree ae1a8b238b69ecd9da638f6426099c9ca8a43b31^I~github:1234/something
fatal: path ~github:1234/something contains slash
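To show why a `/` in the name collides with the index layout, here is a sketch of how a package name maps to a file path in a registry index. The layout (`1/`, `2/`, `3/<first char>/`, otherwise `<first two>/<next two>/`, with the name lowercased) is my recollection of the Cargo registry index documentation rather than something stated in this thread, and `index_path` is a hypothetical helper, not Cargo's own function.

```rust
/// Compute the index file path for a package name, following the directory
/// layout described in the Cargo registry index docs (recalled, not quoted).
pub fn index_path(name: &str) -> String {
    let lower = name.to_lowercase();
    let chars: Vec<char> = lower.chars().collect();
    match chars.len() {
        0 => String::new(),
        1 => format!("1/{lower}"),
        2 => format!("2/{lower}"),
        3 => format!("3/{}/{lower}", chars[0]),
        _ => {
            // First two characters, then the next two, then the full name.
            let a: String = chars[0..2].iter().collect();
            let b: String = chars[2..4].iter().collect();
            format!("{a}/{b}/{lower}")
        }
    }
}
```

For `bs58` this gives `bs/58/bs58`, a file two directories deep. For `~github:1234/something` it gives `~g/it/~github:1234/something`, where the slash inside the name becomes an extra directory level, so the final component is no longer the package name.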

What might be possible is using some other character as the separator, but testing that also fails, this time with cargo internal errors. I believe these are bugs and that this should work, since the protocol places no limits on the characters allowed in package names (other than those implied by using git as the transport):

[dependencies]
bs58.version = "0.3.1"
bs58.package = "~github:1234|something"
bs58.registry-index = "https://ipfs.io/ipfs/Qmeuz3m8QWdRahB8Vvgk9R5kvASEvrnNZGaFyy3WNJqSz7"
> cargo build --config net.git-fetch-with-cli=true -Z unstable-options
    Updating `https://ipfs.io/ipfs/Qmeuz3m8QWdRahB8Vvgk9R5kvASEvrnNZGaFyy3WNJqSz7` index
  Downloaded ~github:1234|something v0.3.1 (registry `https://ipfs.io/ipfs/Qmeuz3m8QWdRahB8Vvgk9R5kvASEvrnNZGaFyy3WNJqSz7`)
error: failed to find ~github:1234|something v0.3.1 (registry `https://ipfs.io/ipfs/Qmeuz3m8QWdRahB8Vvgk9R5kvASEvrnNZGaFyy3WNJqSz7`) in path source
note: this is an unexpected cargo internal error
note: we would appreciate a bug report: https://github.com/rust-lang/cargo/issues/
note: cargo 1.44.0-nightly (390e8f245 2020-04-07)

(Though I'm not sure how cross-platform the character : in filenames is; I could see this being impossible to use on Windows.)
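On the Windows question: as far as I know, Windows reserves the characters `< > : " / \ | ? *` in file names, which would cover both `:` and the `|` separator tried above. A small sketch of that check follows; `windows_reserved_chars` is a hypothetical helper, and the character list comes from my memory of Microsoft's file naming documentation, not from this thread.

```rust
/// Characters that Windows does not allow in file names (recalled from
/// Microsoft's file naming documentation; treat the list as an assumption).
const WINDOWS_RESERVED: [char; 9] = ['<', '>', ':', '"', '/', '\\', '|', '?', '*'];

/// Return the reserved characters that occur in `name`, in order of appearance.
pub fn windows_reserved_chars(name: &str) -> Vec<char> {
    name.chars()
        .filter(|c| WINDOWS_RESERVED.contains(c))
        .collect()
}
```

For `~github:1234|something` this reports `:` and `|`, so a directory named after the package (as in the registry source path) could not be created on Windows under that assumption.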

Ah, I forgot to change the package name in one more place. Cargo does restrict the package name more than git does; updating the registry again to

bs58.registry-index = "https://ipfs.io/ipfs/QmPyWhUY98UFQyKRHTBgGJXXtdNPXCBMHXrrFD9QdU4iGT"

gives an error:

error: failed to parse manifest at `/Users/nemo157/.cargo/registry/src/ipfs.io-ee09e7da863f2c3c/~github:1234|something-0.3.1/Cargo.toml`

Caused by:
  invalid character `~` in package name: `~github:1234|something`, the first character must be a Unicode XID start character (most letters or `_`)
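The rule behind that error can be approximated in a few lines. This is a sketch, not Cargo's implementation: Cargo checks Unicode XID properties, whereas this version uses the standard library's `is_alphabetic` and `is_alphanumeric` as a rough stand-in, and the rule for the remaining characters (letters, digits, `_`, `-`) is my assumption, since the error above only describes the first character. `validate_name` is a hypothetical name.

```rust
/// Rough approximation of the package name check that produced the error above.
/// Uses `is_alphabetic`/`is_alphanumeric` instead of real XID_Start/XID_Continue.
pub fn validate_name(name: &str) -> Result<(), String> {
    let mut chars = name.chars();
    let first = match chars.next() {
        Some(c) => c,
        None => return Err("package name cannot be empty".to_string()),
    };
    // First character: a letter or `_` (stand-in for "XID start or `_`").
    if !(first.is_alphabetic() || first == '_') {
        return Err(format!("invalid first character `{first}` in `{name}`"));
    }
    // Remaining characters: assumed to be letters, digits, `_`, or `-`.
    for c in chars {
        if !(c.is_alphanumeric() || c == '_' || c == '-') {
            return Err(format!("invalid character `{c}` in `{name}`"));
        }
    }
    Ok(())
}
```

Under this approximation `bs58` passes, while `~github:1234|something` is rejected at its first character, matching the error message; even without the `~`, the `:` and `|` would be rejected later in the name.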

Sonatype has a full-time staff which manually curates groupIds and promises a maximum turnaround time of two days when submitting new ones.

7 Likes

I signed up for a groupId in OSSRH in 2017, based on a domain I control. It was processed by a human (JIRA ticket ID OSSRH-30792) and the process took about 2 days.

Regardless, OSSRH's groupIds are at least in part manually curated by a full-time staff at Sonatype.

Please calm down.

The article that @bascule links is still part of the official, current, documentation (linked from https://central.sonatype.org/pages/ossrh-guide.html as "Why the wait?"). So its publishing date is not that important.

The ticket you link is for claiming a subpath of com.github., where the author only has to prove that they are indeed in control of the account.

Your anecdotal evidence may differ, but Sonatype still documents what @bascule describes, so dismissing their evidence as "anecdotal" is unfair: they have experienced the documented process.

For what it's worth, I've been chatting with people about optional crate-based namespaces for years, and gotten mostly positive responses from people on the teams as well as community members. A lot of projects have wanted it for basically being able to signal ownership in an unspoofable way (e.g. async/foo, hyper/foo, icu4x/foo, tokio/foo).

The goal is explicitly not to solve general squatting (just squatting/overlap of prefix-foo crates). There are some open questions about how stuff gets imported but it's all solvable.
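To illustrate what "signal ownership in an unspoofable way" could mean in practice, here is a purely hypothetical sketch of a publish check for crate-based namespaces. Nothing here reflects a drafted RFC: the function name, the owner map, and the rule that only owners of crate `tokio` may publish `tokio/foo` are my assumptions about one possible design.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical publish check. `owners` maps an existing crate name to the
/// set of account ids that own it.
pub fn may_publish(owners: &HashMap<String, HashSet<u64>>, name: &str, user_id: u64) -> bool {
    match name.split_once('/') {
        // Namespaced name: the namespace must be an existing crate owned by the user.
        Some((namespace, _)) => owners
            .get(namespace)
            .map_or(false, |set| set.contains(&user_id)),
        // Plain name: first come, first served, as today.
        None => match owners.get(name) {
            Some(set) => set.contains(&user_id),
            None => true,
        },
    }
}
```

With a rule like this, seeing `tokio/foo` would tell you an owner of `tokio` published it, while unnamespaced names like `tokio-foo` stay open to anyone, which is consistent with the goal of not solving general squatting.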

I've been intending to pre-RFC this after incorporating all the feedback from team members and the community (i.e. something that is actually likely to merge!), but every time the topic of namespacing comes up it devolves into an unconstructive mess, where everyone wants different things and people have strong opinions.

I have more time on my hands now and was intending to start drafting this, but honestly, after seeing this discussion, I feel pretty reluctant again. I might do it anyway, but please, please be more constructive in such discussions.

15 Likes