I'd like to note that this is meant to be a summary and not an attack. I'm hoping it makes the RFC process a little easier for any future namespace proposals. The title comes off a little strong, but it's what naturally came as I tried to analyze, organize and summarize all the objections that blocked previous namespace proposals.
It also has a bevy of links at the end for those interested in finding original sources for my summary.
Thank you so much for consolidating these points in one place!
We don't have any structured support for squatting either, which makes it hard to separate bad actors from good actors. I think separating them out would require manual intervention, and the crates.io team is small and doesn't have a lot of time to put towards that.
Speaking only for myself (and not as a team member) I like the approach where we offer a UI to reserve crate names, and unless re-upped it would expire after say 6 months. We would probably want to show this in search results so that users don't think it is available, but it would be great to avoid polluting the index with reserved names. We could limit active reservations to say 10 per account (and possibly allow admins to bump this upon request). Then we could update the policy such that squatting is still allowed, but only via the approved method.
(I recall some previous discussion on this but don't have time to track down the link at the moment.)
I think there are valid use cases for namespacing, but I don't think namespacing does much to address name squatting (we would need to have similar rules to the above to protect the namespaces themselves). I think the above would be a good first step in addressing name squatting while helping to decouple it from the other namespacing use cases.
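The reservation idea above (expiry unless re-upped, a per-account cap) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Reservations` type, its methods, and the use of plain day counters are hypothetical, and the constants simply mirror the "say 6 months" and "say 10 per account" figures from the post. Nothing here reflects how crates.io is implemented.

```rust
use std::collections::HashMap;

/// Hypothetical policy constants, taken from the numbers floated in the post.
const RESERVATION_DAYS: u64 = 180; // "say 6 months"
const MAX_ACTIVE_PER_ACCOUNT: usize = 10; // "say 10 per account"

#[derive(Debug, PartialEq)]
enum ReserveError {
    /// Someone else holds a live reservation on this name.
    NameTaken,
    /// The account already holds the maximum number of live reservations.
    LimitReached,
}

/// In-memory stand-in for a reservation table (an assumption for this sketch).
#[derive(Default)]
struct Reservations {
    /// crate name -> (account, day on which the reservation expires)
    by_name: HashMap<String, (String, u64)>,
}

impl Reservations {
    /// Count reservations held by `account` that are still live on `today`.
    fn active_for(&self, account: &str, today: u64) -> usize {
        self.by_name
            .values()
            .filter(|(owner, expires)| owner == account && *expires > today)
            .count()
    }

    /// Reserve `name`, or re-up it if `account` already holds it.
    /// Returns the new expiry day on success.
    fn reserve(&mut self, account: &str, name: &str, today: u64) -> Result<u64, ReserveError> {
        let held_by_other = matches!(
            self.by_name.get(name),
            Some((owner, expires)) if owner != account && *expires > today
        );
        if held_by_other {
            return Err(ReserveError::NameTaken);
        }
        let is_renewal = matches!(
            self.by_name.get(name),
            Some((owner, expires)) if owner == account && *expires > today
        );
        // Renewals do not count against the cap; new reservations do.
        if !is_renewal && self.active_for(account, today) >= MAX_ACTIVE_PER_ACCOUNT {
            return Err(ReserveError::LimitReached);
        }
        let expires = today + RESERVATION_DAYS;
        self.by_name
            .insert(name.to_string(), (account.to_string(), expires));
        Ok(expires)
    }

    /// Search results would show a name as reserved only while it is live.
    fn is_reserved(&self, name: &str, today: u64) -> bool {
        self.by_name
            .get(name)
            .map_or(false, |(_, expires)| *expires > today)
    }
}
```

Because an expired entry is simply overwritten by the next reservation, names lapse back into the pool without any manual intervention, which is the property the post is after.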
A less obvious definition is that adding new dependencies shouldn't stop you from compiling already working code. This is a major motivation for the orphan rule (though the orphan rule is more nuanced than that). This is a strike against schemes that encourage multiple distinct crates to have the same default name in code.
Could you expand on this? To me, unless I am missing something (which I may very well be), this "preference" is subjective.
Speaking for myself, for instance, I wouldn't mind encountering the following situation:
Doing:
cargo add http@namespace_name
(Invented namespace syntax, definitely not the point I wanna talk about)
To then immediately get a cargo error complaining that http is an already present direct dependency, thus asking me to rename it to something else.
That is, I don't see that workflow as contradictory, surprising, or undesirable.
In other words / if other people feel the same, "namespaces" could be implemented as an auto-renaming sugar, especially provided something like cargo-edit's cargo {add,rm} were to be blessed into bundled-by-default-with-cargo, like cargo vendor or other subcommands did:
Allow publishing crates to crates.io with some new optional syntax, such as a delimiting @, /, or :: sigil. For the sake of the example, I'll pick <name>@<namespace_name> syntax.
If we need retrocompat, then let's pick something like <namespace_name>--<name>
Then, cargo add <name>@<namespace_name> would be equivalent to doing cargo add <name>@<namespace_name> --rename <name>, that is, in Cargo.toml syntax:
<name> = { package = "<name>@<namespace_name>", version = "<version>" }
This would solve the "get a conflict when adding a dependency" issue, since it would be no different from someone already having some dependency foo renamed to bar, and then trying to add the bar crate.
It would also provide the ergonomics (at least, as long as people use cargo add and the like), to avoid the current status quo whereby people manually prefix their crates and users then need to use long crate names or manually rename as they see fit.
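A minimal sketch of that auto-renaming sugar, assuming the invented `<name>@<namespace_name>` spelling from this post. The `add` function, the `Dependencies` map, and `AddError` are hypothetical names for illustration and are not part of cargo or cargo-edit.

```rust
use std::collections::BTreeMap;

/// Cargo.toml key (the name used in code) -> registry package it points at.
type Dependencies = BTreeMap<String, String>;

#[derive(Debug, PartialEq)]
enum AddError {
    /// The in-code name is already taken by a different direct dependency.
    NameInUse {
        key: String,
        existing_package: String,
    },
}

/// Strip the namespace from the invented `<name>@<namespace_name>` spelling.
/// Plain, un-namespaced names pass through unchanged.
fn default_rename(package: &str) -> &str {
    package.split('@').next().unwrap_or(package)
}

/// Hypothetical `cargo add`: the dependency is keyed by the bare name unless
/// the user supplies an explicit rename.
fn add(deps: &mut Dependencies, package: &str, rename: Option<&str>) -> Result<(), AddError> {
    let key = match rename {
        Some(explicit) => explicit,
        None => default_rename(package),
    };
    if let Some(existing) = deps.get(key) {
        if existing != package {
            // Same situation as adding `bar` when `foo` is already renamed to `bar`.
            return Err(AddError::NameInUse {
                key: key.to_string(),
                existing_package: existing.clone(),
            });
        }
    }
    deps.insert(key.to_string(), package.to_string());
    Ok(())
}
```

With `http` already present, `add(&mut deps, "http@namespace_name", None)` fails with `NameInUse`, and the user resolves it by passing an explicit rename such as `Some("ns_http")`. The conflict is reported at add time rather than surfacing later as ambiguous code.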
Multiple members of the crates.io team have expressed that they aren't in favor of having multiple crates resolve to the same name in code. As I reflect, I think my post was a little too confident in assigning motivation to those opinions. I'll see if I can find sources.
Sorry if I'm being obtuse but doesn't disallowing different crates from having the same name in code effectively rule out namespacing? I mean, that's essentially the definition of namespacing no?
I see there being three major effects of namespacing:
You can easily upload a package into a namespace without worrying about the names of packages outside that namespace.
It encourages forking / overlapping identity.
It's a way to group together related packages, a sort of shared identity.
I'm not sure that overlapping identity is completely out of the picture, but I think you'd need to show some pretty compelling benefits that require overlapping identity that outweigh the negative aspects.
So names with -- are allowable right now, but the number of crates that use -- or __ in them is incredibly small. I think it'd be completely reasonable to ban new crates from using __ or -- in the off chance that we want to go that route. I say off-chance because even if it is a small chance that we go that route, I think it'd still be worth it.
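The ban on doubled separators could look something like the check below. Both function names are hypothetical, and the `<namespace_name>--<name>` split is the retrocompatible spelling suggested earlier in the thread, not an existing crates.io rule.

```rust
/// Hypothetical publish-time check: does this name contain a doubled
/// separator that would be held back for possible namespace syntax?
fn has_reserved_separator(name: &str) -> bool {
    name.contains("--") || name.contains("__")
}

/// If `--` were later adopted as the namespace delimiter, a name would split
/// into (namespace, crate) at its first occurrence.
fn split_namespaced(name: &str) -> Option<(&str, &str)> {
    name.split_once("--")
}
```

Rejecting such names for new crates now keeps the delimiter unambiguous later, since no newly published flat name could be mistaken for a namespaced one.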
Thanks for the pointers w.r.t. my remark, @samsieber, and of course for the post as a whole: these threads have become so big that having a summary like yours is definitely welcome.
In computer science-y terms, the primary benefit that namespaces provide is that they reduce the complexity of dealing with the name squatting problem from O(N) to O(1). Given N crates that an individual wants to publish, doing so without namespaces requires solving that problem O(N) times -- once for each crate -- whereas with namespaces the problem is reduced to constant time O(1), because once you find just one suitable namespace, the naming conflicts caused by other users holding names vanish in the context of that namespace.
The problem of choosing a name is enormous; it includes squatting, but also includes typical practical problems like discoverability, popularity, memorability, trust, etc., all things we mortals will not "solve" any time soon. There is an endless array of variables that contribute to the complexity of choosing crate names. Namespaces at least enable us to pull a factor of N out of it.
To be specific, I think the community should strive to realize the factor of N complexity savings that something like namespaces provides, but it's not clear to me that "namespaces" as commonly conceived is the best formulation.
Addendum
This O(N) -> O(1) reduction argument is applicable to both perspectives mentioned in the blog post:
Validating common authorship among crates - namespaces are valuable here because they reduce the complexity of validating the authorship of each individual crate to just doing it once
Avoiding naming conflicts - namespaces are valuable here because they reduce the complexity of choosing a name out of a crowded space to just doing it once
A flat global namespace is an oversimplification that ignores reality. Of course it is a bug.
Identity is nuanced and has many facets:
In reality, names are not unique. It is perfectly valid for both Google and Facebook for example to have their own competing "log" crates.
Ownership can change in various ways. A project might be transitioned from one maintainer to another, with or without related policy changes such as licensing. While it makes sense to keep continuity in some cases, it certainly should not always be the case. For example, if an ownership change also entails a change to a more restrictive license, keeping such continuity could have legal repercussions if the user updated the crate version without verifying that the license hadn't changed.
Stability is overstated - a reasonable retention policy solves the problem of stability and prevents a "left-pad" style fiasco without overreaching for an immutable repository that keeps your crates forever. Such a "forever" policy entails other risks as well, such as security and legal risks. Think GDPR style laws that allow a person to be forgotten. There is a tradeoff here between the crate owners' rights and the users' rights, and any simplistic extreme is flawed by definition since it doesn't take into account the nuances involved. Ownership change can be solved by setting a redirect link when appropriate.
Having globally unique crate names is also overstated - it is sufficient to have locally unique names on my machine or within a cargo workspace. It is perfectly valid to use Facebook's hypothetical "log" crate within one project and Google's within another, as long as I don't try to mix conflicting names within the same project.
The current design is due to some opinionated members on the crates.io team that wanted to cater to JS developers and their fads and thus insist on an "improved npm" design. That is, they have applied a design that relies on an ambient global owner of all crates (which would be familiar to GC based JavaScript users), ignoring the most basic concept of Rust - explicit ownership semantics and lifetimes. The fact that this topic is one of the most debated in the community and has raised so many questions and pre-RFCs over the years is a clear indication that the crates.io team is refusing to accept feedback.
This is exactly analogous to Subversion's claim of being "a better CVS" - true yet irrelevant in the grand scheme of things, as both have the same flawed, oversimplified model, which was replaced by a more nuanced one. The only difference is that Subversion was simply not visionary enough (Git came later), whereas in our case there are ample examples of better models from 30 years ago, before JS even existed. Debian's apt supported multiple repositories in the 1990s, for one example of a system with a more nuanced and better designed approach.
Nobody is talking about 'names' in this sense. You can informally refer to a crate however you like. Names in this discussion means 'crate identifiers'. They must be unique if they are to be resolved unambiguously. Whatever facts are needed to resolve a dependency is effectively the crate identifier. 'log' certainly does not suffice, especially if you have to disambiguate whose log it is.
as long as I don't try to mix conflicting names within the same project.
How would you even enforce that if you don't give each crate a globally unique identity? What technical use is there for a non-unique crate identity? Branding?
crates.io already allows multiple packages to use the same crate name. Only package names must be globally unique.
For example, it's already possible to publish one package called facebook-log containing a crate named log, and a second package called google-log that also contains a crate named log.
With my moderator hat on: This type of rhetoric is not acceptable here. Do not do this. (Contact the mod team out-of-band if you have any questions or comments on this matter.)
Why would I need to enforce anything?
It is perfectly fine for say Facebook to self host a crates repository that contains a package named "log" while Google does the same with their rival "log" package.
As a user I prefer, say, Google's logging facilities, so I'll depend on "Google/log". As long as cargo can resolve my transitive dependencies without name conflicts, then there shouldn't be an issue.
If one of my dependencies itself depends on the competing "Facebook/log", then I'll have a conflict when trying to add my dependency on "log", which I can then resolve locally on my machine by renaming the package to maintain unique names.
This is already admitting the need for some sort of naming hierarchy, albeit with a hacky approach. People are simply asking for a more principled first class solution.
Are you actually talking about "namespaces" in the same way that the OP is talking about them? Because APT doesn't have those kinds of namespaces.
A single APT repository contains "namespaces" for architecture and OS version, but the original post was talking about author namespaces, and Debian doesn't work that way. A single Debian repository has a flat list of package names, just like Cargo repositories have.
If you're asking for something that works like APT, then I'm not sure you're even asking for anything to change within crates.io. Being able to overlay and pick from multiple upstream repositories could be done entirely with changes to the Cargo client.
Any attempt to disambiguate two crates requires giving them a unique identity. That's what uniqueness means. It disambiguates. If you're not unique, you're not disambiguating, which means you're saying they're the same thing. If you're not giving Google/log and Facebook/log unique identities, then you're assuming that they are exactly the same thing and need no disambiguation. You can't disambiguate them locally, if they are literally the same thing globally.
This is off in the weeds anyhow. My point was that "In reality, names are not unique." doesn't apply here because this is not 'reality' as such: it is an artificial system which is designed for simplicity, and uniqueness is a very important property of that system. You can't just discard it because 'real life' has some complexity to it. Crates.io has no more obligation to allow non-unique names than it has to make an omelette breakfast.
I'm talking at a more conceptual level. Crates.io currently enforces one global repository(*) with a flat namespace. What people are really asking for is official support for a hierarchy. The usual suggestion to solve this problem is to introduce namespaces to the global repository, whereas I'm suggesting to simply better support multiple repositories. This provides the same kinds of benefits in a decentralized manner. Another comparison I could make is of course git itself (git allows multiple remotes).
Actually, the opposite is true: cargo already supports an "advanced" use case where you can configure an alternative registry. The problem is that cargo provides a UX designed around a single registry (even though it does in fact allow multiple registries to be configured at the same time). For instance, cargo search does not display the registry name, just the package name.
I'd like to see a UX similar to git, where remotes are first-class citizens. (Branch names, for example, are displayed as <remote>/<branch>, so it is clear what "origin/master" means.) So tooling has most of the basics.
The problem is that crates.io does not put other registries on equal footing. It only lists its own registered crates, so external crates are less discoverable. The entire interface reinforces the single global registry view unless one goes digging in the docs. Furthermore, crates.io will not accept any crates that have external dependencies, thus further penalizing users of external crates. Thus we have a community largely built around a single global registry instead of a decentralized network of registries, which would have solved all these issues.
Please re-read my earlier posts, as I have explained this concept multiple times. Global uniqueness is not a necessary requirement. Package names need only be unique within a single container. If Facebook and Google both self-host their own repositories, then there is no reason to require them to have globally unique names. This is as absurd a claim as requiring that only a single "John Smith" be allowed to exist on the entire planet at once.
When the two John Smiths meet they can qualify where they're from to disambiguate without the need to assign them globally unique names.
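One way to picture the point both sides of this exchange keep circling: a bare name may be ambiguous across registries, while the (registry, name) pair is what actually identifies a package. The sketch below is illustrative only. `PackageId`, `id`, and `candidates` are hypothetical names, and the "google" and "facebook" registries are the thread's own hypothetical examples.

```rust
use std::collections::BTreeSet;

/// A package qualified by the registry that hosts it (an assumption of this
/// sketch, mirroring the "Google/log" and "Facebook/log" spelling in the thread).
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct PackageId {
    registry: String,
    name: String,
}

/// Convenience constructor for the examples.
fn id(registry: &str, name: &str) -> PackageId {
    PackageId {
        registry: registry.to_string(),
        name: name.to_string(),
    }
}

/// Every package, across all known registries, whose bare name matches.
/// More than one result means the bare name alone cannot resolve a dependency.
fn candidates<'a>(index: &'a BTreeSet<PackageId>, name: &str) -> Vec<&'a PackageId> {
    index.iter().filter(|p| p.name == name).collect()
}
```

If both `id("google", "log")` and `id("facebook", "log")` are in the index, `candidates(&index, "log")` returns two entries, while each qualified pair is present exactly once. Names are unique only within a registry, and the qualified pair serves as the identifier used for resolution.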