Namespacing on Crates.io

I think we may be misunderstanding one another. I'm not sure what you mean by "multiple people want to reserve different pieces of a namespace". Let me try to make this "prefix" proposal clearer. But first, let me say that the reason I'm arguing for it at all is that I'd rather have something with respect to name-spacing than nothing, and it seems like a solution that has the following properties:

  • Aligns reasonably well with what many people are already doing to group related and/or commonly owned packages together
  • Does not require major changes to the compiler or cargo, only minimal changes to how crates.io works. That makes it a way forward with little churn.
  • We get the benefits going forward of name-spacing.

On the downside:

  • It's a little odd
  • It creates "legacy" packages/crates that seemingly violate the "rules" for name-spacing prefixes

Now, to clarify how I could see this working:

  • Existing Crates (at the point where the new rules go into effect):
    • All existing crates keep their existing names, nothing has to get renamed
    • Crates with no "_" or "-" in their name are considered non-namespaced crates. The namespace for these crates can be considered "legacy". No new crates will be permitted to be created in the "legacy" namespace
    • All other crates containing "-" or "_" are considered to be multi-part name-spaced crates. Namespaces are granted to existing crate owners using the following rules at the cut-over:
      • Take the first segment of all crates. For each crate if it is the only crate with that first segment (or all other crates with that first segment have the same owner) that owner is granted that prefix and no-one else will be permitted to create crates with that first segment going forward (except for sub-namespaces, see below)
      • If the first segment is shared by crates of more than 1 owner the owner with the most crates that has that prefix is granted ownership of that namespace going forward
      • the crate(s) that didn't capture that prefix become eligible for "sub-name-space" prefix reservation (below...)
      • for every crate not already assigned to an owned prefix, repeat the above, considering the first and second segment as the prefix. Repeat with subsequent segments until all existing crates capture a "name-space prefix" (which might end up being the entire multi-segment crate name in some cases).
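The cut-over rules above can be sketched in code. This is only a rough sketch under stated assumptions: the function name `assign_prefixes`, the `(crate name, owner)` input shape, and the choice to grant nothing on a tie are illustrative, not anything crates.io implements. A prefix goes to the owner with strictly the most crates under it, and crates that did not capture a prefix are retried with one more segment.

```rust
use std::collections::{BTreeMap, HashMap};

/// Split a crate name into segments on '-' or '_'.
fn segments(name: &str) -> Vec<&str> {
    name.split(|c: char| c == '-' || c == '_').collect()
}

/// The prefix made of the first `depth` segments (or the whole name if shorter).
fn prefix_at(segs: &[&str], depth: usize) -> String {
    segs[..depth.min(segs.len())].join("-")
}

/// Hypothetical cut-over pass: returns a map from granted prefix to owner.
pub fn assign_prefixes(crates: &[(&str, &str)]) -> BTreeMap<String, String> {
    let mut granted: BTreeMap<String, String> = BTreeMap::new();
    // Single-segment crates stay in the "legacy" namespace and are skipped.
    let mut pending: Vec<(Vec<&str>, &str)> = crates
        .iter()
        .map(|&(name, owner)| (segments(name), owner))
        .filter(|(segs, _)| segs.len() > 1)
        .collect();
    let max_depth = pending.iter().map(|(s, _)| s.len()).max().unwrap_or(0);

    for depth in 1..=max_depth {
        // Count the pending crates per owner under each prefix of this depth.
        let mut counts: HashMap<String, HashMap<&str, usize>> = HashMap::new();
        for (segs, owner) in &pending {
            let by_owner = counts.entry(prefix_at(segs, depth)).or_default();
            *by_owner.entry(*owner).or_default() += 1;
        }
        // Grant each prefix to the owner with strictly the most crates.
        // Assumption: on a tie nobody gets the prefix.
        for (prefix, by_owner) in &counts {
            let best = by_owner.values().copied().max().unwrap_or(0);
            let leaders: Vec<&str> = by_owner
                .iter()
                .filter(|(_, n)| **n == best)
                .map(|(o, _)| *o)
                .collect();
            if leaders.len() == 1 {
                // Never overwrite a grant made at a shallower depth.
                granted
                    .entry(prefix.clone())
                    .or_insert_with(|| leaders[0].to_string());
            }
        }
        // Crates that did not capture a prefix are retried one segment deeper.
        pending.retain(|(segs, owner)| {
            granted.get(&prefix_at(segs, depth)).map(String::as_str) != Some(*owner)
        });
    }
    granted
}
```

With "foo-baz" and "foo-qux" under different owners, nobody gets "foo" and each owner gets their full two-segment prefix instead. An owner with two of three "serde-" crates gets "serde", and the third crate falls through to its own sub-prefix.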

....OK....I'm just going to stop there....the more I try to explain this proposal's details, the less I'm liking it....

I liked the idea of it because it could be a way forward that didn't require compiler changes and cargo changes, but, it's just not sounding that palatable when you get to the details. It could work though. It wouldn't require any cargo or compiler changes, but, the ambiguity of namespace vs terminal crate name definitely becomes somewhat tiresome (mainly for existing crates).

Perhaps someone else might be able to envision it better, but, I'm starting to agree with @anon2808951 that it just has too much cognitive load.

I don't get your point. Your "golden" example of Maven already requires that both authors and consumers specify both the "package" name (new.maintained) and "lib" name (:package), which are unrelated. Shouldn't doing the same for Cargo be what you want?

No, that would not be the case. "foo-" would have been assigned to someone during the cut-over. If "foo-baz" and "foo-qux" were the only 2 crates and they were owned by the same owner, then "foo-" would have gone to that user. If they had different owners, neither would get ownership of the "foo-" namespace, but they would each take ownership of the "foo-baz-" and "foo-qux-" namespaces respectively, and no-one could ever own the "foo-" name-space. If there were 3 or more "foo-" crates, then as long as some owner has the most crates with that prefix, that owner would've been granted ownership of that name-space during the cut-over and the others would get an appropriate "sub-name-space" assigned as ownership.

Sub-name-spaces would only be auto-created like this during the cut-over. After the cut-over, the only way to carve out a sub-name-space from an existing name-space and give ownership to someone else is for the owner of the name-space to do so.

So the complaint is merely that the ad hoc implementation uses a - to separate ad hoc namespace from "specific" package name, and that - is allowed in said names? That's significantly less of an objection than I was getting from your argument.

You're forgetting the other requirement of crates: immutability. Domains change hands.

In any case you're back to introducing full namespaces, which the ad hoc solution was avoiding due to the team's existing decision on namespacing.

Maven has a high barrier to publishing. (I’ve no idea how to despite using it for multiple years!)

Cargo doesn’t. That’s a benefit.

Assuming you require domain ownership to publish (a large, new barrier), what happens for example, if I own tehcodez.fake, publish some package that gets used widely, then accidentally let it lapse and someone else grabs it?

If you don’t require ownership, how is it any better?

Hmm…this might be mostly backwards compatible if we allowed the following:

  • if there is only one crate with that name on crates.io, then, the organization is not needed in the cargo.toml to reference it (but a warning is issued, warning can be turned off)
  • if there is more than one crate on crates.io with the name under different organizations, then, the organization is required and it would be a build error if it were missing
  • after the cut-over to the new rules, and once the updated cargo/compiler support is stabilized, then the clock would begin with say a 90 or 180 day grace period during which no-one can create packages under their org with the same name as a crate under another org. This gives everyone time to update the cargo.toml’s to eliminate the warnings.
  • after the grace period, people can begin registering crate names under their org with the same name as a crate under another org.
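To make the first two bullets concrete, here is a rough sketch of how a resolver might treat a dependency with an optional organization. The `Resolution` enum, the `resolve` function, and the `(org, crate name)` index shape are all made up for illustration. Nothing here reflects how cargo actually resolves dependencies.

```rust
/// Possible outcomes when looking up a dependency (hypothetical).
#[derive(Debug, PartialEq)]
pub enum Resolution {
    /// The org was given and publishes a crate with this name.
    Exact { org: String },
    /// The org was omitted but only one org publishes this name:
    /// the build proceeds and a (suppressible) warning is issued.
    UniqueWithWarning { org: String },
    /// The org was omitted and several orgs publish this name: build error.
    Ambiguous { candidates: Vec<String> },
    /// No matching crate.
    NotFound,
}

/// `index` is a list of (org, crate name) pairs standing in for the registry.
pub fn resolve(index: &[(&str, &str)], name: &str, org: Option<&str>) -> Resolution {
    // Collect every org that publishes a crate with this name.
    let mut orgs: Vec<String> = index
        .iter()
        .filter(|(_, n)| *n == name)
        .map(|(o, _)| o.to_string())
        .collect();
    orgs.sort();
    orgs.dedup();

    match org {
        Some(o) if orgs.iter().any(|c| c.as_str() == o) => Resolution::Exact { org: o.to_string() },
        Some(_) => Resolution::NotFound,
        None => match orgs.len() {
            0 => Resolution::NotFound,
            1 => Resolution::UniqueWithWarning { org: orgs.remove(0) },
            _ => Resolution::Ambiguous { candidates: orgs },
        },
    }
}
```

During the grace period every name would still be unique, so existing manifests would only ever hit the warning case. The error case can only appear once duplicate names are allowed.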

Questions:

  • Would everyone get a starting “org” that corresponds to their VCS (github only for now) and user-name from there?
  • Would people be able to request/register other “Org” names? What would be required? Proving ownership of a DNS domain? How?
  • Would we rate-limit “orgs” if we allowed requesting additional orgs?
  • How does this solve the problem of wanting to switch to a new, maintained version of an existing, unmaintained crate?
  • Could adding the correct “org” to cargo.toml be automated through rustfix (or something similar) where there is no ambiguity?
  • What if you want to use crates of the same name (but different purposes) from different orgs as dependencies? Does crate-rename in cargo.toml suffice?

Downsides:

  • requires changes to both crates.io and cargo (and possibly the compiler) plus ideally rustfix (or something similar)
  • allows duplicate crate names going forward that would more likely require crate-rename whereas the pre-fix option never allows duplicate crate names (the crate is still always a unique name which includes the namespace prefix)
  • requires tying to outside “registries” to validate ownership of an “org”

Upsides:

  • pretty much what “maven” and similar does so we know it works
  • would still work well if there is eventual “federation” proposal
  • others?

I’m liking this idea more and more, but, I’m still not sold on the idea of tying it to DNS domains. Perhaps a hybrid approach?

Here is what I have in mind:

  • As of a certain date, no new top-level/legacy crates may be uploaded. ALL existing crates are considered to live in the “Legacy” namespace.
  • At the same time, anyone may request a new “Org/Namespace” that has the same naming rules as crates do (as far as allowed characters) with the following restrictions:
  1. You may not claim an “Org/Namespace” name that is identical to the name of an existing top-level crate, unless you own that crate. If you own that crate, and request the corresponding namespace, the top-level crate is moved into the name-space with a “forward” rule from the existing top-level name (as you’ve defined). So, if you own the top-level crate “serde”, then you may lay claim to that namespace prefix and “serde” would become “serde/serde”. NOTE: You do not have to do this. You just can. Existing crates owned by you that started with “serde-” would likewise move to the “serde” namespace automatically (with forwarding). Existing crates starting with “serde-” not owned by you would remain as legacy, top-level crates. Existing “serde-*” crates not owned by you would be separate namespaces that you could not claim or make crates within. So, you would own the “serde-” namespace, but all legacy top-level crates that started with “serde” would cut holes out of your namespace that would potentially be owned by others at some point (or would remain unused). So, if someone else owns “serde-pink-pony”, then you would not own “serde-pink-pony” as a namespace and could not create anything beneath that (whether or not the owner of “serde-pink-pony” ever claims the “serde-pink-pony” namespace). You would, though, own “serde-pink-*” excluding the “serde-pink-pony” and “serde-pink-pony-*” subsets (unless someone else owned a “serde-pink” crate).
  2. Similarly, you may not get an “Org/Namespace” name consisting only of a left-prefix of any existing crate unless you are the sole owner of crates with that prefix, or the crates you own with that prefix have more dependent crates not owned by you than the crates of all other owners with that prefix combined (where a prefix is defined as 1 or more full segments, a segment is any run of allowable characters other than “-” or “_”, and “-”, “_” and “/” are the segment separators), AND claiming that prefix would not conflict with rule #1. So, if the crates you own that start with “foo-bar-” have more dependent crates not owned by you than those of all other owners combined, then you can lay claim to the “foo-bar-” namespace. If you do, all your crates with “foo-bar-” are automatically moved into that namespace with forwarding. The same rules as #1 regarding sub-namespaces and existing top-level crates apply. So, if under these rules you could claim the “foo-bar-” namespace, but there was an existing crate called “foo-bar-baz”, then the “foo-bar-baz-” namespace would not be available to you (again, whether or not the owner of the “foo-bar-baz” crate ever claims that namespace).
  3. If no-one has a majority of crates for a particular prefix and more than 1 owner has crates with that prefix, that prefix may not be claimed as a namespace (see exception later).
  4. Once a “namespace” is claimed, no one other than the owner(s) of that namespace may publish crates under that namespace. Crate names do not need to include the name of their namespace in their name, but, they can. Legacy crates moved into namespaces would maintain their full-name which would be redundant, but, new crates created under the namespace need not. There would be an ability to rename crates under your namespace with automatic forwarding (you could never use the existing name for something else though).
  5. Any namespace not covered by the rules 1 through 3 could be claimed by anyone at any time; however, there would be rate limiting on getting new namespaces, something like: No more than 1 new namespace per day and 5 per week and 15 per month and 30 per year and 100 overall - without some intervention). No namespace covered by rules 1 through 3 could ever be claimed by anyone except through those rules FOREVER.
  6. If you claim a name-space under rules 1 or 2, you may move any top-level crate that you own that wasn’t automagically moved into a name-space you claimed under rules 1 or 2 into that namespace (with automatic forwarding).
  7. You may move crates you own from one namespace you own to another namespace you own (with forwarding) at any time.
  8. Forwarded crates remain forwarded forever. Forwarding may be transitively forwarded (this can be logical or actual and would be an implementation detail on crates.io). Forwarded names may never be used to publish new crates. Namespaces may not be forwarded, but, all the crates in the namespace may be forwarded to another namespace.
  9. The owner of a name-space may permit a non-owner to publish a crate under the namespace upon request and approval.
  10. The owner of a namespace may delegate a sub-name-space to another owner. Once they do so, they lose control of that sub-name-space.
  11. If you are the owner of 1 or more crates and would like to claim a prefix that you are not eligible for under rules 1 through 3 because you are not the “Majority Owner” of that prefix, you may ask other crate owners with that prefix to cede rights to that prefix, if they ALL agree, you get the namespace, but the sub-name-space carve-outs as described in rule #2 still apply. This can be automated.
  12. If you cannot get all owners to agree under 11, you may, after some period of time, make a request at large to all the owners of all the crates on crates.io where everyone votes and if you get a super-majority (66%) of a quorum (at least 60% votes cast or a defined voting time-period elapsed, say 180 days), you get the namespace.
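The numeric thresholds in rules 5 and 12 are easy to get subtly wrong, so here is a small sketch of both checks. The function names, the day-number representation of time, and the window lengths of 7, 30 and 365 days for "week", "month" and "year" are my assumptions. The limits and percentages are the ones from the rules above.

```rust
/// (window length in days, max claims inside that window), from rule 5.
/// Assumption: week = 7 days, month = 30 days, year = 365 days.
const LIMITS: [(u64, usize); 4] = [(1, 1), (7, 5), (30, 15), (365, 30)];
/// Overall cap from rule 5 (without "some intervention").
const LIFETIME_LIMIT: usize = 100;

/// Rule 5: may a user claim another namespace today?
/// `past_claims` holds the day number of each earlier claim by that user.
pub fn may_claim(today: u64, past_claims: &[u64]) -> bool {
    if past_claims.len() >= LIFETIME_LIMIT {
        return false;
    }
    LIMITS.iter().all(|&(window, max)| {
        let recent = past_claims
            .iter()
            .filter(|&&day| day <= today && today - day < window)
            .count();
        recent < max
    })
}

/// Rule 12: does an at-large vote grant the namespace?
/// A quorum is reached when at least 60% of eligible owners have voted,
/// or when the voting period (180 days here) has elapsed.
/// The claim passes with at least 66% "yes" among the votes cast.
pub fn vote_passes(yes: u64, no: u64, eligible: u64, days_open: u64) -> bool {
    let cast = yes + no;
    let quorum = cast * 100 >= eligible * 60 || days_open >= 180;
    quorum && cast > 0 && yes * 100 >= cast * 66
}
```

One consequence worth noticing: once the voting period has elapsed, a handful of votes is enough to decide, which may or may not be what we want.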

I believe this proposal gets all the good properties you wanted out of using DNS as ORG Name, but doesn’t tie to external things and allows for more flexibility in namespace names AND allows existing Crate collections the option of claiming a namespace associated with that already established “brand” (think “tokio-”, “serde-”, “diesel-”, etc.). Also, it is 100% backwards compatible if we allow the crate to be referenced as crate=<crate>, ns=<namespace> (for new cargo) or crate=<namespace>-<crate> for existing cargo (with crate renaming). So even old cargo could use new crates published in namespaces (besides the legacy ones that are forwarded). NOTE: Forwarding is a “server-side” thing and old cargo need not know about or understand it, but the new cargo could know about it and update cargo.toml.
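Here is a minimal sketch of the two server-side pieces this relies on: the flat name an old cargo would use, and transitive forwarding (rule 8). The names `legacy_name` and `follow_forwards`, and the use of "namespace/crate" strings as canonical names, are assumptions for illustration only.

```rust
use std::collections::HashMap;

/// The flat name an existing cargo would use for a namespaced crate:
/// <namespace>-<crate>.
pub fn legacy_name(namespace: &str, krate: &str) -> String {
    format!("{namespace}-{krate}")
}

/// Follow forwarding entries until a name that is not forwarded is reached.
/// Forwards may chain (rule 8), so this loops; the hop count is bounded by
/// the table size so that a malformed, cyclic table cannot loop forever.
pub fn follow_forwards<'a>(forwards: &'a HashMap<String, String>, name: &'a str) -> &'a str {
    let mut current = name;
    for _ in 0..=forwards.len() {
        match forwards.get(current) {
            Some(next) => current = next.as_str(),
            None => break,
        }
    }
    current
}
```

So if “serde” forwards to “serde/serde” and that crate is later renamed inside the namespace, a lookup of “serde” still lands on the current name, and an old cargo never needs to know that forwarding happened.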

Some things to note about this:

  • New namespaces are not automatically created. Someone must request one and is granted it subject to the described rules.
  • Once namespaces are permitted, no new crates are allowed to be uploaded to the root/legacy namespace.
  • Namespaces do not have to follow the names of any existing crates, but, non-owners of crates cannot create namespaces that conflict with existing top-level crates (ever), but, existing top-level crates can continue to be published and updated by the author forever (unless they choose to move them to a namespace as described under the rules)
  • Crate names themselves can remain short (and even become shorter) because you will no longer have to do things like “serde-*” to create a bunch of related crates. You can simply create a namespace and then create related crates in that namespace (including sub-name-spaces).

So, how about it? Tear it apart? Like? Don’t Like? Problems? Down-sides?

This could be simpler:

  • - is not allowed in the prefix, so you can have foo-*, but not foo-bar-* as a prefix. This eliminates complications from overlap.
  • Unprefixed crates can be registered indefinitely. This eliminates complications from switchover and legacy status.
  • Prefixes are given on first-come-first-served basis, with no connection to existing crates. This eliminates need to design algorithms for giving out prefixes based on legacy crates.
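The three simplifications above fit in a few lines. This is only a sketch: the `Prefixes` type and its method names are made up, and treating a bare crate name such as "foo" as unprefixed (so not covered by the "foo-*" reservation) is my assumption.

```rust
use std::collections::HashMap;

/// Registry of reserved prefixes: prefix -> owning user (hypothetical).
#[derive(Default)]
pub struct Prefixes {
    owners: HashMap<String, String>,
}

impl Prefixes {
    /// First come, first served. A prefix may not contain '-', so "foo"
    /// (meaning foo-*) is fine but "foo-bar" is rejected.
    pub fn register(&mut self, prefix: &str, user: &str) -> bool {
        if prefix.is_empty() || prefix.contains('-') || self.owners.contains_key(prefix) {
            return false;
        }
        self.owners.insert(prefix.to_string(), user.to_string());
        true
    }

    /// A crate name is covered by a prefix only if it has the form
    /// <prefix>-<rest>. Unprefixed names and unreserved prefixes stay open.
    pub fn may_publish(&self, crate_name: &str, user: &str) -> bool {
        match crate_name.split_once('-') {
            Some((prefix, _)) => match self.owners.get(prefix) {
                Some(owner) => owner.as_str() == user,
                None => true,
            },
            None => true,
        }
    }
}
```

Because a prefix is always exactly one segment, a single lookup on the part before the first "-" settles ownership, and no overlap resolution is needed.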

I'll once again throw out the idea of using GitHub organizations:

  • outsources name registry to another system
  • outsources antispam to another system
  • outsources organization membership to the system which is already the IdP. In for a penny, in for a pound

Domain names buy you the first two, but given the existing GitHub IdP integration, the third issue is the big one for me. Using anything but GitHub for this purpose will involve building a complicated organization membership management system into crates.io. GitHub nicely avoids that.

Do you mean you can add new top-level crates with new top-level names indefinitely? I personally think it would be better to end that once namespaces are allowed; otherwise, there is still a huge incentive to squat top-level names.

How so? Once you start having name-spaced crates, you would have to have some way for existing cargo to use them or you'd have a backwards compatibility issue. So, whether or not you continue to allow top-level crate registration, you'd still have issues with crates created in namespaces being backwards compatible with existing cargo (unless I'm misunderstanding what you are proposing).

Personally, I would prefer that existing popular crate collections don't get a namespace granted to a non-owner that uses that "name". It seems like "brand/reputation" here can be an important thing to preserve.

If by name, you mean user name, we'd still have that, no?

Does it really? If we rate-limit new namespaces, then the only "spamming" possible is uploading crates that are "spam" to a namespace you own. Which would be the same as under what you propose.

I guess I just don't see that as a positive thing. Why make crates.io dependent on other commercial services to that degree? I'm having trouble understanding why that is a good thing long-term?

Note that GitHub allows renaming of orgs/usernames. To make things worse, they make the old name free for the taking.

To me it doesn't matter if the top-level name refers to a crate or a namespace — it's still a thing to squat. If foo is precious, so is foo-* (or foo/).

The value of namespaces for projects (i.e. ability to publish project-foo without worrying someone may grab it first) remains the same regardless whether non-namespaced names are allowed.

So I'm proposing to add namespaces as an optional feature, not as a mandatory thing.

Indeed. There are only a handful of such collections (rust, serde, tokio?), so these could be granted manually.

Yes, but, it is easier to rate-limit new top-level name-spaces without inconveniencing legitimate users than it is to rate-limit top-level crate names. That’s where I think the win comes from (partially).

Registration of prefixes is a different operation than publishing of a crate, so they could still have different rate limits.

Organization name. Re GitHub antispam:

Yes, beyond mere rate limiting, GitHub actively monitors for all sorts of suspicious behavior including spam, and will flag/lock accounts which appear to be behaving in spammy ways. While crates.io could probably benefit from such a system of its own (or failing that, exponential backoff rate limiting), GitHub gives it to you for free today, along with many other features.

GitHub already provides all of the functionality for organization membership management, which is ultimately an access control function. From a security perspective, outsourcing this functionality to a "tried and tested" system is much less risky than trying to greenfield it all.

The set of members of any organization is sourced as OAuth users from GitHub, so any system that manages organization membership for crates.io is ultimately building on top of GitHub's user model anyway.

I understand and sympathize with concerns about centralized systems, but crates.io has already gone down that road, and that's unlikely to change any time soon.

Use of GitHub login and ability to give crate ownership to a GitHub Org already gives crates.io this spam protection and user management.

But GitHub is kept one level of abstraction away from crates and their dependencies, so crates-io can add support for other account types, and even migrate off GitHub if that turned out to be necessary.

I think it's possible to retain this property for organizations sourced from GitHub as well.

I'd consider that a deal-breaker with respect to immutability of crates/name-spaces.

Renaming a GitHub organization does not necessarily require a corresponding change on the crates.io side.
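One way this could work, as a rough sketch: crates.io keys each organization on a stable numeric account ID and fixes the namespace name at link time, so a later rename on the GitHub side only changes a display field. The `OrgTable` type and its methods are hypothetical, and I'm assuming the identity provider exposes an ID that survives renames.

```rust
use std::collections::HashMap;

/// What crates.io would remember about a linked organization (hypothetical).
pub struct OrgRecord {
    /// Namespace name on crates.io, fixed when the org is first linked.
    pub namespace: String,
    /// Most recent login seen on the GitHub side. Informational only.
    pub last_seen_login: String,
}

/// Organizations keyed by the provider's stable numeric ID, not by login.
#[derive(Default)]
pub struct OrgTable {
    by_id: HashMap<u64, OrgRecord>,
}

impl OrgTable {
    /// Link an org. Fails if the ID is already linked or the name is taken.
    pub fn link(&mut self, github_id: u64, login: &str) -> bool {
        let taken = self.by_id.values().any(|r| r.namespace.as_str() == login);
        if taken || self.by_id.contains_key(&github_id) {
            return false;
        }
        let record = OrgRecord {
            namespace: login.to_string(),
            last_seen_login: login.to_string(),
        };
        self.by_id.insert(github_id, record);
        true
    }

    /// Record a rename on the GitHub side. The namespace does not change.
    pub fn observe_login(&mut self, github_id: u64, login: &str) {
        if let Some(record) = self.by_id.get_mut(&github_id) {
            record.last_seen_login = login.to_string();
        }
    }

    /// The crates.io namespace for an account ID, if it is linked.
    pub fn namespace_of(&self, github_id: u64) -> Option<&str> {
        self.by_id.get(&github_id).map(|r| r.namespace.as_str())
    }
}
```

Under this sketch, someone who grabs the freed-up old login arrives with a different ID, so they cannot inherit the namespace, and the renamed org keeps publishing under its original crates.io name.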