Namespacing on Crates.io


#61

You still haven’t explained how you would address conflicts in that scheme, where multiple people want to reserve different pieces of a namespace. I think this would be a fundamental requirement to go any further with this.

rename-dependency is fine for backward compatibility, where people want to provide a crate as a drop-in replacement. But going forward, I don’t want to live in an ecosystem where rename-dependency changes from being an exceptional measure to solve a specific issue to something that everyone starts using to get rid of meaningless parts of crate names. I don’t want to imagine the inconsistencies when everyone renames stuff, but picks different names.

This would not work under the proposal. You would need to go through every source file and replace old-unmaintained-package with new-maintained-package. (I already explained above why rename-dependency is only a band-aid.)

It might be better than what we had before, but it fares very poorly when compared against better approaches that have been used successfully over the last decade.

Eh, I’m a bit baffled by this dismissive sarcasm. I provided the necessary facts. If you disagree with them, show your facts against them.

I provided an itemized list above.

Eh what? I provided the example of an actually existing ecosystem which has employed a better approach for the last decade. It is clearly better, because the hyphen proposal lacks any answer to the deficiencies that have been pointed out, compared to established and proven designs.

Now I’m a bit confused. Did you perhaps mix up your responses and this was meant as a reply to someone else?

Just to get in sync again:

  • I have argued for namespaces for a long time.
  • I have suggested an established approach that has a track record of working, and until now nobody has come up with any argument against it, or pointed out any deficiency that prevents its use in Rust.
  • I have pointed out problems with other proposals.

I agree with this, and this is exactly what I have done.


#62

My apologies. It seems I misunderstood your position.


#63

It is related. If you don’t provide a lib.name, it is automatically derived from package.name.

The hyphen proposal would mean that this attribute would change from something optional, that is rarely used, to an attribute that is almost mandatory (as in “most users would expect it to be there, because they don’t want to deal with a crate name in their source code where half of it is meaningless”).
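A minimal sketch of the defaulting rule mentioned above, for reference. The function name `effective_lib_name` is hypothetical; the behavior it models (an explicit lib.name wins, otherwise the package name is used with `-` replaced by `_`) is how Cargo derives the library name today.

```rust
/// Models how the library name is chosen: an explicit `lib.name` wins,
/// otherwise it is derived from `package.name` with `-` replaced by `_`.
/// (Hypothetical helper for illustration, not Cargo's actual code.)
fn effective_lib_name(package_name: &str, lib_name: Option<&str>) -> String {
    match lib_name {
        Some(explicit) => explicit.to_string(),
        None => package_name.replace('-', "_"),
    }
}
```

Under the hyphen scheme, a crate published as `foo-bar` would be referred to as `foo_bar` in source code unless its author sets lib.name to something shorter.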

I think this is actually a good argument against going further down this path. I would rather not care about inconsistencies between lib names and crate names.

So now we have three different approaches to choose from?

  • No renames, user has to change all source files.
  • Users of a crate use rename-dependency.
  • Developers of a crate override lib.name

I think it would be better to have one approach that just works in 99% of the cases, instead of forcing authors/users to add additional configuration to avoid the “bad defaults” the hyphen scheme introduces.

So if the foo-* is still unregistered, you can basically boot the authors of foo-baz and foo-qux off their crates? What happens if the author of foo-baz has already registered foo-baz-*, can someone still register foo-*?

Also, how would users even know where the namespaces ends, and the crate name starts?


#64

I think we may be misunderstanding one another. I’m not sure what you mean by “multiple people want to reserve different pieces of a namespace”. Let me try to make this “prefix” proposal clearer, but first let me say that the reason I’m arguing for it at all is that I’d rather have something with respect to name-spacing than nothing, and it seems like a solution that has the following properties:

  • Aligns reasonably well with what many people are doing with respect to grouping related packages and/or owned packages together
  • Does not require major changes to the compiler or cargo, only minimal changes to how crates.io works. That makes it a way forward with little churn.
  • We get the benefits going forward of name-spacing.

On the downside:

  • It’s a little odd
  • It leaves “legacy” packages/crates that seemingly violate the “rules” WRT “name-spacing prefixes”

Now, to clarify how I could see this working:

  • Existing Crates (at the point where the new rules go into effect):
    • All existing crates keep their existing names, nothing has to get renamed
    • Crates with no “_” or “-” in their name are considered non-namespaced crates. The namespace for these crates can be considered “legacy”. No new crates will be permitted to be created in the “legacy” namespace
    • All other crates containing “-” or “_” are considered to be multi-part name-spaced crates. Namespaces are granted to existing crate owners using the following rules at the cut-over:
      • Take the first segment of all crates. For each crate if it is the only crate with that first segment (or all other crates with that first segment have the same owner) that owner is granted that prefix and no-one else will be permitted to create crates with that first segment going forward (except for sub-namespaces, see below)
      • If the first segment is shared by crates of more than 1 owner the owner with the most crates that has that prefix is granted ownership of that namespace going forward
      • the crate(s) that didn’t capture that prefix become eligible for “sub-name-space” prefix reservation (below…)
      • for every crate not already assigned to an owned prefix, repeat the above, considering the first and second segment as the prefix. Repeat with subsequent segments until all existing crates capture a “name-space prefix” (which might end up being the entire multi-segment crate name in some cases).

…OK…I’m just going to stop there…the more I try to explain this proposal’s details, the less I’m liking it…

I liked the idea of it because it could be a way forward that didn’t require compiler changes and cargo changes, but, it’s just not sounding that palatable when you get to the details. It could work though. It wouldn’t require any cargo or compiler changes, but, the ambiguity of namespace vs terminal crate name definitely becomes somewhat tiresome (mainly for existing crates).

Perhaps someone else might be able to envision it better, but, I’m starting to agree with @soc that it just has too much cognitive load.


#65

I don’t get your point. Your “golden” example of Maven already requires that both authors and consumers specify both the “package” name (new.maintained) and “lib” name (:package), which are unrelated. Shouldn’t doing the same for Cargo be what you want?


#66

No, that would not be the case. “foo-” would have been assigned to someone during the cut-over. If “foo-baz” and “foo-qux” were the only 2 crates and they were owned by the same owner, then “foo-” would have gone to that user. If they were different owners, neither would get ownership of the “foo-” namespace, but they would each take ownership of the “foo-baz-” and “foo-qux-” namespaces respectively, and no-one could ever own the “foo-” name-space. If there were 3 or more “foo-” crates, as long as some owner has the most of that prefix, that owner would’ve been granted ownership of that name-space during the cut-over and the others would get an appropriate “sub-name-space” assigned as ownership.

Sub-name-spaces would only be auto-created like this during the cut-over. After the cut-over, the only way to carve out a sub-name-space from an existing name-space and give ownership to someone else is for the owner of the name-space to do so.
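A rough sketch of the first-segment cut-over rule as described in this post, assuming the registry can be reduced to a list of (crate name, owner) pairs. The function `prefix_owner` is hypothetical, and it only covers the "who gets this prefix" step, not the recursive sub-name-space assignment.

```rust
use std::collections::HashMap;

/// Returns the owner who would be granted `prefix` (e.g. "foo-") at the
/// cut-over, or `None` if no single owner has the most crates under it.
/// `crates` is a list of (crate name, owner) pairs. Hypothetical helper.
fn prefix_owner<'a>(crates: &[(&str, &'a str)], prefix: &str) -> Option<&'a str> {
    // Count how many crates each owner has under this prefix.
    let mut counts: HashMap<&'a str, usize> = HashMap::new();
    for (name, owner) in crates {
        if name.starts_with(prefix) {
            *counts.entry(*owner).or_insert(0) += 1;
        }
    }
    let best = counts.values().copied().max()?;
    let mut leaders = counts
        .iter()
        .filter(|entry| *entry.1 == best)
        .map(|entry| *entry.0);
    let leader = leaders.next()?;
    // A tie for "most crates" means nobody is granted the prefix.
    if leaders.next().is_some() {
        None
    } else {
        Some(leader)
    }
}
```

With “foo-baz” and “foo-qux” owned by different people, this returns `None` for “foo-”, which matches the case above where nobody could ever own that name-space.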


#67

In Maven the publishing organization is clearly separated from the package name, each of them is defined only once, and they are used for different, clearly defined use cases.

With the hyphen scheme you would have the crate name in both the lib.name and the package.name, which is confusing and inconsistent (because some authors wouldn’t bother with defining a separate lib.name).

So whenever someone wants to provide a drop-in replacement, that person would have to define a lib.name which would include the whole package.name of the old crate.

Bearable for backward compat, but honestly if I was a new user, I wouldn’t want to use a crate that defaulted its name to some obscure unmaintained package I don’t even know.

All I’m saying is:

  • let’s add an organization key to [package]
  • require that the organization key is a domain name, which means
    • they are guaranteed to be unique
    • the Crates.io maintainers don’t have to deal with conflicts

Migration:

  • start requiring the organization key
  • the first time a crate is published with such a key (and the user owns the name in the legacy global namespace)
    • cross-publish the crate to both locations, with the legacy location forwarding to the new one
    • tools like cargo will upgrade the Cargo.lock file to the new location on the next update
  • maybe introduce a “scratch” namespace so that people can still publish their experiments without requiring a domain, allowing the current no-setup use of cargo publish

#68

So the complaint is merely that the ad hoc implementation uses a - to separate ad hoc namespace from “specific” package name, and that - is allowed in said names? That’s significantly less of an objection than I was getting from your argument.

You’re forgetting the other requirement of crates: immutability. Domains change hands.

In any case you’re back to introducing full namespaces, which the ad hoc solution was avoiding due to the team’s existing decision on namespacing.


#69

Domains changing hands doesn’t change anything regarding immutability. This has been the case since forever in Maven, and it has worked exceptionally well.

Which makes sense because the ad-hoc approach has tons of issues, as pointed out earlier. gbutler worked through some of the questions regarding what is considered a namespace, how they could be reserved etc., and figured out tons of gnarly issues which have no good answer.


#70

Maven has a high barrier to publishing. (I’ve no idea how to despite using it for multiple years!)

Cargo doesn’t. That’s a benefit.

Assuming you require domain ownership to publish (a large, new barrier), what happens for example, if I own tehcodez.fake, publish some package that gets used widely, then accidentally let it lapse and someone else grabs it?

If you don’t require ownership, how is it any better?


#71

Hmm…this might be mostly backwards compatible if we allowed the following:

  • if there is only one crate with that name on crates.io, then, the organization is not needed in the cargo.toml to reference it (but a warning is issued, warning can be turned off)
  • if there is more than one crate on crates.io with the name under different organizations, then, the organization is required and it would be a build error if it were missing
  • after the cut-over to the new rules, and once the updated cargo/compiler support is stabilized, then the clock would begin with say a 90 or 180 day grace period during which no-one can create packages under their org with the same name as a crate under another org. This gives everyone time to update the cargo.toml’s to eliminate the warnings.
  • after the grace period, people can begin registering crate names under their org with the same name as a crate under another org.
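The first two rules above could be sketched roughly as follows, assuming the registry is a list of (organization, crate name) pairs. The type `Resolution` and the function `resolve` are hypothetical names for illustration only.

```rust
#[derive(Debug, PartialEq)]
enum Resolution {
    /// Exactly one crate matched. `warn_missing_org` is true when the
    /// organization was omitted and a (suppressible) warning would be issued.
    Found { org: String, warn_missing_org: bool },
    /// Several organizations publish this name and none was given: build error.
    OrgRequired,
    /// No crate with that name (under that organization, if one was given).
    NotFound,
}

/// `registry` holds (organization, crate name) pairs. Hypothetical sketch.
fn resolve(registry: &[(&str, &str)], name: &str, org: Option<&str>) -> Resolution {
    let orgs: Vec<&str> = registry
        .iter()
        .filter(|(_, crate_name)| *crate_name == name)
        .map(|(o, _)| *o)
        .collect();
    match org {
        Some(wanted) => {
            if orgs.iter().any(|candidate| *candidate == wanted) {
                Resolution::Found { org: wanted.to_string(), warn_missing_org: false }
            } else {
                Resolution::NotFound
            }
        }
        None => match orgs.as_slice() {
            [] => Resolution::NotFound,
            [only] => Resolution::Found { org: only.to_string(), warn_missing_org: true },
            _ => Resolution::OrgRequired,
        },
    }
}
```

Existing manifests that omit the organization keep resolving for as long as the name is unique, which is what makes the grace period workable.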

Questions:

  • Would everyone get a starting “org” that corresponds to their VCS (github only for now) and user-name from there?
  • Would people be able to request/register other “Org” names? What would be required? Proving ownership of a DNS domain? How?
  • Would we rate-limit “orgs” if we allowed requesting additional orgs?
  • How does this solve the problem of wanting to switch to a new, maintained version of an existing, unmaintained crate?
  • Could adding the correct “org” to cargo.toml be automated through rustfix (or something similar) where there is no ambiguity?
  • What if you want to use crates of the same name (but different purposes) from different orgs as dependencies? Does crate-rename in cargo.toml suffice?

Downsides:

  • requires changes to both crates.io and cargo (and possibly the compiler) plus ideally rustfix (or something similar)
  • allows duplicate crate names going forward that would more likely require crate-rename whereas the prefix option never allows duplicate crate names (the crate is still always a unique name which includes the namespace prefix)
  • requires tying to outside “registries” to validate ownership of an “org”

Upsides:

  • pretty much what “maven” and similar does so we know it works
  • would still work well if there is an eventual “federation” proposal
  • others?

#72

You bring up a lot of good points!

I’d go for an approach which is more driven by crate authors and avoids putting any new requirements on users:

  • the first time a namespaced crate is published
    • crates.io checks whether the author also owns the name in the legacy global namespace
    • if true, crates.io cross-publishes the crate to both locations, with the legacy location forwarding to the new one
  • afterwards, when a user of a global crate runs cargo update, cargo will upgrade the Cargo.lock file to the new namespaced crate, using the forwarders automatically generated by the crate author

This means that other authors can publish the same crate name under their organization without any restriction, because no forwarding would happen in that case.
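A minimal sketch of that forwarding rule, assuming an in-memory registry. The `Registry` type, its fields, and the `.example` domains used with it are all hypothetical; none of this reflects how crates.io is actually implemented.

```rust
use std::collections::HashMap;

#[derive(Default)]
struct Registry {
    /// Legacy global crate name -> owner.
    legacy_owners: HashMap<String, String>,
    /// Legacy global crate name -> (organization, crate name) it forwards to.
    forwards: HashMap<String, (String, String)>,
}

impl Registry {
    /// Publishes `name` under `org`. Returns true if a forwarder from the
    /// legacy global name was created, which only happens when the author
    /// also owns that legacy name and no forwarder exists yet.
    fn publish_namespaced(&mut self, author: &str, org: &str, name: &str) -> bool {
        let owns_legacy = self.legacy_owners.get(name).map(String::as_str) == Some(author);
        if owns_legacy && !self.forwards.contains_key(name) {
            self.forwards
                .insert(name.to_string(), (org.to_string(), name.to_string()));
            true
        } else {
            false
        }
    }

    /// Where a `cargo update` would move a legacy dependency, if anywhere.
    fn forwarded(&self, legacy_name: &str) -> Option<&(String, String)> {
        self.forwards.get(legacy_name)
    }
}
```

If someone who does not own the legacy `json` crate publishes `json` under their own organization, no forwarder is created and existing users are unaffected.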

I would require that a file with the random string is placed at a well-known location on the requested domain, similar to how cargo login works already:

Instead of appending the crates.io-generated string to cargo login, you would just place it on the domain you want to use, and crates.io would verify it. As long as the string stays there, you own the domain.
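The check itself could be as small as the following sketch. The well-known path is an assumption (the thread does not name one), and fetching the file over HTTPS is left out; only the URL construction and the comparison are shown.

```rust
/// Hypothetical well-known path; the proposal does not specify one.
const CHALLENGE_PATH: &str = "/.well-known/crates-io-challenge.txt";

/// URL the registry would fetch to verify control of `domain`.
fn challenge_url(domain: &str) -> String {
    format!("https://{}{}", domain, CHALLENGE_PATH)
}

/// True if the body served at the challenge URL contains exactly the token
/// the registry generated (ignoring surrounding whitespace).
fn token_matches(served_body: &str, expected_token: &str) -> bool {
    !expected_token.is_empty() && served_body.trim() == expected_token
}
```

Re-running the check periodically would be one way to notice when the string disappears, though how often that should happen is not discussed here.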

Maven uses some notification on their website like this: https://mvnrepository.com/artifact/commons-lang/commons-lang

I imagine that more automated solutions could be found too, but I believe this would often be a manual process, especially if the original author cannot be contacted anymore.

As outlined above, this would absolutely be possible and could be automated.

With the forwarding approach I described, there would be no ambiguities at all, because it would always be clear which namespaced crate is the replacement for the global crate.

Yes, I think this would be a valid usecase for crate-rename.

I think this approach would be better than the hyphen approach. Some people even suggested that crate authors should always redefine their lib.name to strip out the namespace with the hyphen approach.

So with hyphens, you would have renames everywhere and by default; with domain namespaces you would only need renames in the unlikely event of an actual clash.


#73

I’m liking this idea more and more, but, I’m still not sold on the idea of tying it to DNS domains. Perhaps a hybrid approach?

Here is what I have in mind:

  • As of a certain date, no new top-level/legacy crates may be uploaded. ALL existing crates are considered to live in the “Legacy” namespace.
  • At the same time, anyone may request a new “Org/Namespace” that has the same naming rules as crates do (as far as allowed characters) with the following restrictions:
  1. You may not claim an “Org/Namespace” name that is identical to the name of an existing top-level crate, unless you own that crate. If you own that crate, and request the corresponding namespace, the top-level crate is moved into the name-space with a “forward” rule from the existing top-level name (as you’ve defined). So, if you own the top-level crate “serde”, then you may lay claim to that namespace prefix and “serde” would become “serde/serde”. NOTE: You do not have to do this. You just can. Existing crates owned by you that started with “serde-” would likewise move to the “serde” namespace automatically (with forwarding). Existing crates starting with “serde-” not owned by you would remain as legacy, top-level crates. Existing “serde--**-” crates not owned by you would be separate namespaces that you could not claim or make crates within. So, you would own the “serde-” namespace, but, all legacy top-level crates that started with “serde” would cut holes out of your namespace that would be potentially owned by others at some point (or would remain unused). So, if someone else owns “serde-pink-pony”, then, you would not own “serde-pink-pony” as a namespace and could not create anything beneath that (whether or not the owner of “serde-pink-pony” claims the “serde-pink-pony” namespace ever). You would though, own the “serde-pink-*” excluding “serde-pink-pony and serde-pink-pony” subsets (unless another owned “serde-pink” crate).
  2. Similarly, you may not get an “Org/Namespace” name consisting only of a left-prefix of any existing crate unless you are the sole owner of crates with that prefix, or you hold a majority (not plurality) of the dependent crates, not owned by you, that depend on crates you own with that prefix (where a prefix is defined as 1 or more full segments, a segment is any run of allowable characters other than “-” or “_”, and “-” and “_” are the segment separators) AND claiming that prefix would not conflict with rule #1. So, if you own more crates that have more dependent crates not owned by you than all others combined that start with “foo-bar-”, then you can lay claim to the “foo-bar-” namespace. If you do, all your crates with “foo-bar-” are automatically moved into that namespace with forwarding. The same rules as #1 regarding sub-namespaces and existing top-level crates apply. So, if under these rules you could claim the “foo-bar-” namespace, but there was an existing crate called “foo-bar-baz”, then the “foo-bar-baz-” namespace would not be available to you (again whether or not the owner of the “foo-bar-baz” crate ever claims that namespace).
  3. If no-one has a majority of crates for a particular prefix and more than 1 owner has crates with that prefix, that prefix may not be claimed as a namespace (see exception later).
  4. Once a “namespace” is claimed, no one other than the owner(s) of that namespace may publish crates under that namespace. Crate names do not need to include the name of their namespace in their name, but, they can. Legacy crates moved into namespaces would maintain their full-name which would be redundant, but, new crates created under the namespace need not. There would be an ability to rename crates under your namespace with automatic forwarding (you could never use the existing name for something else though).
  5. Any namespace not covered by the rules 1 through 3 could be claimed by anyone at any time; however, there would be rate limiting on getting new namespaces (something like: no more than 1 new namespace per day, 5 per week, 15 per month, 30 per year and 100 overall, without some intervention). No namespace covered by rules 1 through 3 could ever be claimed by anyone except through those rules FOREVER.
  6. If you claim a name-space under rules 1 or 2, you may move any top-level crate that you own that wasn’t automagically moved into a name-space you claimed under rules 1 or 2 into that namespace (with automatic forwarding).
  7. You may move crates you own from one namespace you own to another namespace you own (with forwarding) at any time.
  8. Forwarded crates remain forwarded forever. Forwarding may be transitively forwarded (this can be logical or actual and would be an implementation detail on crates.io). Forwarded names may never be used to publish new crates. Namespaces may not be forwarded, but, all the crates in the namespace may be forwarded to another namespace.
  9. The owner of a name-space may permit a non-owner to publish a crate under the namespace upon request and approval.
  10. The owner of a namespace may delegate a sub-name-space to another owner. Once they do so, they lose control of that sub-name-space.
  11. If you are the owner of 1 or more crates and would like to claim a prefix that you are not eligible for under rules 1 through 3 because you are not the “Majority Owner” of that prefix, you may ask other crate owners with that prefix to cede rights to that prefix, if they ALL agree, you get the namespace, but the sub-name-space carve-outs as described in rule #2 still apply. This can be automated.
  12. If you cannot get all owners to agree under 11, you may, after some period of time, make a request at large to all the owners of all the crates on crates.io where everyone votes and if you get a super-majority (66%) of a quorum (at least 60% votes cast or a defined voting time-period elapsed, say 180 days), you get the namespace.
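The rate limit suggested in rule 5 could be sketched like this, assuming claim times are tracked in seconds and that a “month” means 30 days and a “year” 365 days (the rule does not say). The function `may_claim_namespace` and the constants are hypothetical.

```rust
const DAY: u64 = 24 * 60 * 60;

/// (window length in seconds, maximum claims allowed inside that window).
const LIMITS: [(u64, usize); 4] = [
    (DAY, 1),        // 1 per day
    (7 * DAY, 5),    // 5 per week
    (30 * DAY, 15),  // 15 per month (assumed to be 30 days)
    (365 * DAY, 30), // 30 per year (assumed to be 365 days)
];

/// Overall cap without manual intervention.
const LIFETIME_LIMIT: usize = 100;

/// `past_claims` holds the times (in seconds) of the user's earlier namespace
/// claims; `now` is the current time on the same clock.
fn may_claim_namespace(past_claims: &[u64], now: u64) -> bool {
    if past_claims.len() >= LIFETIME_LIMIT {
        return false;
    }
    LIMITS.iter().all(|&(window, max)| {
        let recent = past_claims
            .iter()
            .filter(|&&t| t <= now && now - t < window)
            .count();
        recent < max
    })
}
```

A user who claimed a namespace an hour ago is refused, while one whose last claim was two days ago is allowed as long as the weekly, monthly, yearly and overall caps are not yet reached.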

I believe this proposal gets all the good properties you wanted out of using DNS as ORG Name, but doesn’t tie to external things and allows for more flexibility in namespace names AND allows existing Crate collections the option of claiming a namespace associated with that already established “brand” (think “tokio-”, “serde-”, “diesel-”, etc.). Also, it is 100% backwards compatible if we allow the crate to be referenced as crate=<crate>, ns=<namespace> (for new cargo) or crate=<namespace>-<crate> for existing cargo (with crate renaming). So even old cargo could use new crates published in namespaces (besides the legacy ones that are forwarded). NOTE: Forwarding is a “server-side” thing and old cargo need not know about or understand it, but the new cargo could know about it and update cargo.toml.

Some things to note about this:

  • New namespaces are not automatically created. Someone must request them, and they are granted subject to the described rules.
  • Once namespaces are permitted, no new crates are allowed to be uploaded to the root/legacy namespace.
  • Namespaces do not have to follow the names of any existing crates, but, non-owners of crates cannot create namespaces that conflict with existing top-level crates (ever), but, existing top-level crates can continue to be published and updated by the author forever (unless they choose to move them to a namespace as described under the rules)
  • Crate names themselves can remain short (and even become shorter) because you will no longer have to do things like “serde-*” to create a bunch of related crates. You can simply create a namespace and then create related crates in that namespace (including sub-name-spaces).

So, how about it? Tear it apart? Like? Don’t Like? Problems? Down-sides?


#74

This could be simpler:

  • - is not allowed in the prefix, so you can have foo-*, but not foo-bar-* as a prefix. This eliminates complications from overlap.
  • Unprefixed crates can be registered indefinitely. This eliminates complications from switchover and legacy status.
  • Prefixes are given on first-come-first-served basis, with no connection to existing crates. This eliminates need to design algorithms for giving out prefixes based on legacy crates.
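A minimal sketch of this simpler scheme, assuming prefix ownership is the only thing checked (ownership of individual crate names, which exists today, is left out). The `Prefixes` type and the function names are hypothetical.

```rust
use std::collections::HashMap;

/// Splits "foo-bar-baz" into the single-segment prefix "foo" and the rest
/// "bar-baz". Returns `None` for unprefixed names such as "serde".
fn split_prefix(crate_name: &str) -> Option<(&str, &str)> {
    crate_name.split_once('-')
}

#[derive(Default)]
struct Prefixes {
    /// Reserved prefix -> owner.
    owners: HashMap<String, String>,
}

impl Prefixes {
    /// First come, first served. A prefix may not contain `-`.
    fn reserve(&mut self, prefix: &str, user: &str) -> bool {
        if prefix.is_empty() || prefix.contains('-') || self.owners.contains_key(prefix) {
            return false;
        }
        self.owners.insert(prefix.to_string(), user.to_string());
        true
    }

    /// Whether the prefix rules allow `user` to publish `crate_name`.
    fn may_publish(&self, crate_name: &str, user: &str) -> bool {
        match split_prefix(crate_name) {
            Some((prefix, _)) => match self.owners.get(prefix) {
                Some(owner) => owner.as_str() == user,
                None => true, // unreserved prefix: behaves as it does today
            },
            None => true, // unprefixed crates stay open to everyone
        }
    }
}
```

Because a prefix is always exactly the first segment, there is never a question of where the namespace ends and the crate name starts.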

#75

I’ll once again throw out the idea of using GitHub organizations:

  • outsources name registry to another system
  • outsources antispam to another system
  • outsources organization membership to the system which is already the IdP. In for a penny, in for a pound

Domain names buy you the first two, but given the existing GitHub IdP integration, the third issue is the big one for me. Using anything but GitHub for this purpose will involve building a complicated organization membership management system into crates.io. GitHub nicely avoids that.


#76

Do you mean you can add new top-level crates with new top-level names indefinitely? I personally think it would be better to end that once namespaces are allowed; otherwise, there is still a huge incentive to squat top-level names.

How so? Once you start having name-spaced crates, you would have to have some way for existing cargo to use them or you’d have a backwards compatibility issue. So, whether or not you continue to allow top-level crate registration, you’d still have issues with crates created in namespaces being backwards compatible with existing cargo (unless I’m misunderstanding what you are proposing).

Personally, I would prefer that existing popular crate collections don’t get a namespace granted to a non-owner that uses that “name”. It seems like “brand/reputation” here can be an important thing to preserve.


#77

If by name, you mean user name, we’d still have that, no?

Does it really? If we rate-limit new namespaces, then the only “spamming” possible is uploading crates that are “spam” to a namespace you own. Which would be the same as under what you propose.

I guess I just don’t see that as a positive thing. Why make crates.io dependent on other commercial services to that degree? I’m having trouble understanding why that is a good thing long-term?


#78

Note that GitHub allows renaming of orgs/usernames. To make things worse, they make the old name free for the taking.


#79

To me it doesn’t matter if the top-level name refers to a crate or a namespace — it’s still a thing to squat. If foo is precious, so is foo-* (or foo/).

The value of namespaces for projects (i.e. ability to publish project-foo without worrying someone may grab it first) remains the same regardless whether non-namespaced names are allowed.

So I’m proposing to add namespaces as an optional feature, not as a mandatory thing.

Indeed. There are only a handful of such collections (rust, serde, tokio?), so these could be granted manually.


#80

Yes, but, it is easier to rate-limit new top-level name-spaces without inconveniencing legitimate users than it is to rate-limit top-level crate names. That’s where I think the win comes from (partially).