Namespacing on Crates.io

I still believe that it’s important to discuss all this. It has certainly been enlightening for me and I have found it productive, even if it doesn’t manifest in a real change.

The fork in the road here is that, on an individual level, people will stop using crates.io and just hardcode the Git repository URL into Cargo.toml. Inevitably, forks will happen and crates will diverge, the result will no longer work on crates.io, and people will go elsewhere.

There are a few avenues for the future:

  • change nothing: we could try to automate detection of crate squatting, but we would still need manual intervention to address violations
  • take the hard step of no longer accepting “root-level” crates and sandboxing new ones under some scheme: if namespacing is done right, this would stop squatting from continuing. We could resolve the existing squatted crates and not have this be a perpetual problem.

The discussion thus largely groups into one of these categories. Within the latter category, the discussion is about how to do namespacing: based on VCS, based on some prefix, based on an org/user name on GitHub, based on an org/user name that crates.io keeps track of, etc.

I have found the discussion useful. I’m grateful to everyone who has participated so far.

Definitely. Nobody is trying to shut the discussion down. I’m just asking that folks try to keep the discussion above “I disagree on the conclusion that has been drawn in the past”, and try to bring new information to the table. There’s been a lot of discussion around this lately, and the team does try to keep on top of it.

Probably worth re-iterating that any actual change would need to be the result of an RFC. Also keep in mind that such an RFC would not only need to describe why this is a change we want, but also why it is important to prioritize this right now.

This is worse in multiple ways than using the domain:

  • It still doesn’t address squatting, as you correctly point out. It just kicks the can a few meters down the road.
  • It more or less doubles the length of crate names people have to write in their source code, and the addition does not provide any benefit whatsoever. And no, rename-dependency is not a solution here.
  • There will be tons of annoying issues; like some person having published a crate foo-bar and some other person having published foo … which of them gets to claim the namespace foo and therefore gets to lock the other person out of their crate?
  • It makes it impossible to easily upgrade a crate from some organization (that doesn’t maintain their crate anymore) to some fork that is being maintained, without having to touch every source file. One of the strengths of Maven’s approach has exactly been the ability to change old.unmaintained:package to new.maintained:package in the dependency file and move on without interruption.
  • It still burdens crates.io maintainers with having to make decisions in conflicts, an issue that just doesn’t exist with using domains.

On a more personal note, I consider crates with hyphens extremely ugly: the tiny amount of mental overhead having to translate from the crate name foo-bar to the name being used in source code (foo_bar) is a reason to avoid the crate. My line of thought is “If crate authors are fine with that tiny inconvenience, what other “tiny inconvenience” do they have in store for me further down the line?”.

I’m all for reinventing things if they are better than what we had before, but please let’s not reinvent things that are clearly worse than approaches that have worked perfectly well for more than a decade.

FWIW, I thought this was a pretty clever minimalistic approach which solves a lot of problems without making a lot of changes. Also call me crazy but I like hyphens.

It actually does address squatting. It reduces the incentive. It makes it harder to glom onto the good-will and “Brand” of other crates and makes reserving top-level “words” less desirable. It’s easier to limit the number of “Prefixes” someone can have without requiring justification, because they can create any number of crates once they have even one prefix. But all their crates will be associated with their prefix(es) and cannot inappropriately associate with another’s “brand”. Reducing the incentive to “Squat” is what this does, and so it does in fact address squatting in a most direct and efficient fashion.

Not necessarily, but, even so, “So what”? And why isn’t “rename-dependency” (or a similar mechanism) useful here? Many, many, other languages do something similar.

Update your pom.xml. Update your Cargo.toml. What’s the difference? I’m not seeing the issue that you’re seeing.

No, it really doesn’t. You ask for a prefix; if it’s available, you get it. Once you hit the “limit” of how many prefixes you can reserve, you can’t get any more without paying money or providing justification. You can create as many crates as you want without worrying about “conflicts” within your prefix(es). Almost zero burden on maintainers. Probably less than today.

Subjective criteria like this aren’t very useful. Whether or not it is “ugly” is largely irrelevant.

Good. It seems a number of people believe this would be better than what we have now with little “churn” required and almost 100% backwards compatibility. So, you agree, we should do it then?

Oh. You’ve declared it “clearly” worse so, never mind. I think if you had arguments that were more than just opinions about not liking it, such as specific issues it does not address, your argument would be more illuminating. Saying something is “Clearly” anything, without providing solid arguments, is not useful in determining the merits of a proposal.

All of that being said, I don’t think this proposal is “Clearly” the best we can do, or that there aren’t possibly better ideas, but it does seem like a reasonable idea that maintains backwards compatibility and requires no compiler or cargo changes and very few changes to crates.io.

I think approaching this thread with ideas on how name-spacing might be made to work is the most useful thing for the discussion. Once ideas have been vetted and there is agreement about the “best” ones, the next steps are preparing an RFC and then pushing it through the RFC process.

If you are opposed to the idea of name-spacing, no matter the form, then repeating that in this discussion isn’t really useful. If you want to oppose the RFC, if there is ever one, by all means, I encourage you to do so; that’s what the process is for. But shooting down every possible solution in this thread because you are opposed to the idea seems premature.

That’s just my 2 cents I guess (or maybe that was for like a buck-fifty :slight_smile:).

I would hope that we could use this thread to ferret out possible “Namespacing Solutions” and leave the discussion of whether or not name-spacing should be had to the RFC approval process or another thread.

This is already unrelated. lib.name is already unrelated to package.name. The package error-chain provides the crate error_chain, the package lazy_static provides the crate lazy_static, the package pistoncore-glutin_window provided the crate glutin_window.

That last one is actually the perfect illustration of how this “ad hoc” namespacing would work (and does, currently, for the ad hoc users!).

This is not a problem, because you specify the package name in Cargo.toml, which is decoupled from the crate name. If the package old.unmaintained provides the crate package, and you want to replace it with package new.maintained's version of the crate package, you just s/old.unmaintained/new.maintained in your Cargo.toml and it just works, the same as the substitution in pom.xml.

This is not anything to do with namespacing. This is the decoupling of package and library names.

EDIT:

This has nothing to do with how you specify the package. This has everything to do with the library name. These two things are not the same. Please avoid conflating the two; it will help the discussion. Changing the package provider of a library crate is exactly the same, no matter how you specify the package.

One thing that I think may not be discussed enough is how people will actually work around squatting. Honestly, if I have a Crate name that I think is appropriate that’s already taken, I’ll just use it.

When I wrote this CLI utility in Python, I found that it collided with something already on PyPI. For a number of reasons, I stuck with that name anyway and instructed users to just use the full Git URL. When that inevitably happens for a crate name, I’ll likely do the same thing.

As an aside, I’m not comparing PyPI to Crates.io, I’ve never had a problem downloading something from Crates.io, and I can’t say the same about PyPI. They are, however, both similar in not namespacing.

I think I might be in the minority here, but I really like full namespaces with a VCS URL. There isn’t a case in which a collision or squatting would be Crates’ fault or responsibility to deal with. If someone squats microsoft or google on GitLab or another VCS provider, that’s not Crates’ problem.

It also makes it really easy to substitute a fork of a library. The crate name remains the last segment of the URL past the final slash. If I want to substitute github.com/rust-lang/rand with github.com/naftulikay/rand, there are no changes I need to make in my source code.
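As a rough sketch of that naming rule in Rust: the helper `crate_name_from_vcs_path` is hypothetical (nothing in Cargo works this way today) and simply takes whatever follows the final slash of the path, so two forks of the same repository yield the same crate name.

```rust
/// Hypothetical helper, not part of Cargo: derive the crate name from a
/// VCS-style path by taking the segment after the final slash.
fn crate_name_from_vcs_path(path: &str) -> &str {
    // Ignore a trailing slash so "host/user/rand/" still yields "rand".
    let trimmed = path.trim_end_matches('/');
    // `rsplit` always yields at least one item, so the fallback is only a formality.
    trimmed.rsplit('/').next().unwrap_or(trimmed)
}
```

Under this assumption, `github.com/rust-lang/rand` and `github.com/naftulikay/rand` both map to the crate name `rand`, which is why the source code would not need to change when swapping one for the other.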

Anyway, I’m kind of done shaving this yak :smile:

I think you’re conflating or mixing up “crate name” and “package name” a little. I think that not keeping these two things distinct in the discussion clouds the issues somewhat.

Actually, I’d be happy with either proposal, but, the proposal with “prefixes” has the nice feature that it doesn’t require new cargo handling, only updates to crates.io. The other main proposal that you advocate, using VCS/User/ as prefix, is entirely good, but, I think it requires more churn in the ecosystem to implement.

Perhaps the prefixes can merely be distinguished for crates that haven’t been given the blessing of the prefix owner. Surely the tic-tac-toe crate owner doesn’t mind that crates.io points out that their crate is not part of the tic-* family of crates?

You still haven’t explained how you would address conflicts in that scheme, where multiple people want to reserve different pieces of a namespace. I think this would be a fundamental requirement to go any further with this.

rename-dependency is fine for backward compatibility, where people want to provide a crate as a drop-in replacement. But going forward, I don’t want to live in an ecosystem where rename-dependency changes from being an exceptional measure to solve a specific issue to something that everyone starts using to get rid of meaningless parts of crate names. I don’t want to imagine the inconsistencies when everyone renames stuff, but picks different names.

This would not work under the proposal. You would need to go through every source file and replace old-unmaintained-package with new-maintained-package. (I already explained above why rename-dependency is only a band-aid.)

It might be better than what we had before, but it fares very poorly when compared against better approaches that have been used successfully over the last decade.

Eh, I’m a bit baffled by this dismissive sarcasm. I provided the necessary facts. If you disagree with them, show your facts against them.

I provided an itemized list above.

Eh what? I provided the example of an actually existing ecosystem which has employed a better approach for the last decade. It is clearly better, because the hyphen proposal lacks any answer to the deficiencies that have been pointed out, compared to established and proven designs.

Now I’m a bit confused. Did you perhaps mix up your responses and this was meant as a reply to someone else?

Just to get in sync again:

  • I have argued for namespaces for a long time.
  • I have suggested an established approach that has a track record of working, and until now nobody has come up with any argument against it, or pointed out any deficiency that prevents its use in Rust.
  • I have pointed out problems with other proposals.

I agree with this, and this is exactly what I have done.

My apologies. It seems I misunderstood your position.

It is related. If you don’t provide a lib.name, it is automatically derived from package.name.
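A minimal sketch of that derivation, assuming the usual Cargo default of replacing hyphens with underscores. The function names are made up for illustration and are not Cargo's internal API.

```rust
/// Default library crate name when `lib.name` is not set:
/// the package name with every hyphen replaced by an underscore.
fn default_lib_name(package_name: &str) -> String {
    package_name.replace('-', "_")
}

/// An explicit `lib.name` wins; otherwise fall back to the derived default.
fn effective_lib_name(package_name: &str, explicit_lib_name: Option<&str>) -> String {
    match explicit_lib_name {
        Some(name) => name.to_string(),
        None => default_lib_name(package_name),
    }
}
```

With the examples quoted earlier in the thread, `error-chain` would default to `error_chain`, while `pistoncore-glutin_window` would default to `pistoncore_glutin_window`, so providing the crate `glutin_window` implies an explicit override.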

The hyphen proposal would mean that this attribute would change from something optional, that is rarely used, to an attribute that is almost mandatory (as in “most users would expect it to be there, because they don’t want to deal with a crate name in their source code where half of it is meaningless”).

I think this is actually a good argument against going further down this path. I would rather not care about inconsistencies between lib names and crate names.

So now we have three different approaches to choose from?

  • No renames, user has to change all source files.
  • Users of a crate use rename-dependency.
  • Developers of a crate override lib.name.

I think it would be better to have one approach that just works in 99% of the cases, instead of forcing authors/users to add additional configuration to avoid the “bad defaults” the hyphen scheme introduces.

So if foo-* is still unregistered, you can basically boot the authors of foo-baz and foo-qux off their crates? What happens if the author of foo-baz has already registered foo-baz-*? Can someone still register foo-*?

Also, how would users even know where the namespace ends and the crate name starts?

I think we may be misunderstanding one another. I’m not sure what you mean by “multiple people want to reserve different pieces of a namespace”. Let me try to make this “prefix” proposal clearer, but first let me say that the reason I’m arguing for it at all is that I’d rather have something with respect to name-spacing than nothing, and it seems like a solution that has the following properties:

  • Aligns reasonably well with what many people are already doing with respect to grouping related and/or owned packages together
  • Does not require major changes to the compiler or cargo, only minimal changes to how crates.io works. That makes it a way forward with little churn.
  • We get the benefits going forward of name-spacing.

On the downside:

  • It’s a little odd
  • It leaves behind “legacy” packages/crates that seemingly violate the “rules” WRT “name-spacing prefixes”

Now, to clarify how I could see this working:

  • Existing Crates (at the point where the new rules go into effect):
    • All existing crates keep their existing names, nothing has to get renamed
    • Crates with no “_” or “-” in their name are considered non-namespaced crates. The namespace for these crates can be considered “legacy”. No new crates will be permitted to be created in the “legacy” namespace
    • All other crates containing “-” or “_” are considered to be multi-part name-spaced crates. Namespaces are granted to existing crate owners using the following rules at the cut-over:
      • Take the first segment of all crates. For each crate if it is the only crate with that first segment (or all other crates with that first segment have the same owner) that owner is granted that prefix and no-one else will be permitted to create crates with that first segment going forward (except for sub-namespaces, see below)
      • If the first segment is shared by crates of more than 1 owner the owner with the most crates that has that prefix is granted ownership of that namespace going forward
      • the crate(s) that didn’t capture that prefix become eligible for “sub-name-space” prefix reservation (below…)
      • for every crate not already assigned to an owned prefix, repeat the above, considering the first and second segment as the prefix. Repeat with subsequent segments until all existing crates capture a “name-space prefix” (which might end up being the entire multi-segment crate name in some cases).

…OK…I’m just going to stop there…the more I try to explain this proposal’s details, the less I’m liking it…

I liked the idea of it because it could be a way forward that didn’t require compiler changes and cargo changes, but, it’s just not sounding that palatable when you get to the details. It could work though. It wouldn’t require any cargo or compiler changes, but, the ambiguity of namespace vs terminal crate name definitely becomes somewhat tiresome (mainly for existing crates).

Perhaps someone else might be able to envision it better, but, I’m starting to agree with @soc that it just has too much cognitive load.

I don’t get your point. Your “golden” example of Maven already requires that both authors and consumers specify both the “package” name (new.maintained) and “lib” name (:package), which are unrelated. Shouldn’t doing the same for Cargo be what you want?

No, that would not be the case. “foo-” would have been assigned to someone during the cut-over. If “foo-baz” and “foo-qux” were the only 2 crates and they were owned by the same owner, then “foo-” would have gone to that user. If they were different owners, neither would get ownership of the “foo-” namespace, but they would each take ownership of the “foo-baz-” and “foo-qux-” namespaces respectively, and no-one could ever own the “foo-” name-space. If there were 3 or more “foo-” crates, as long as some owner has the most crates with that prefix, that owner would’ve been granted ownership of that name-space during the cut-over and the others would get an appropriate “sub-name-space” assigned as ownership.
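The cut-over rule described above can be sketched in Rust roughly as follows. This is only one reading of the proposal, not an agreed design: the function names are invented, segments are normalized to `-`, a tie leaves the prefix unowned, and the input is a plain list of `(crate name, owner)` pairs rather than anything crates.io actually exposes.

```rust
use std::collections::BTreeMap;

/// Split a crate name into segments on '-' or '_'.
fn segments(name: &str) -> Vec<&str> {
    name.split(|c: char| c == '-' || c == '_').collect()
}

/// Prefix made of the first `depth` segments (the whole name if it is shorter).
fn prefix_at(name: &str, depth: usize) -> String {
    let segs = segments(name);
    segs[..depth.min(segs.len())].join("-")
}

/// Hypothetical cut-over: returns a map from granted prefix to owner.
fn assign_prefixes(crates: &[(&str, &str)]) -> BTreeMap<String, String> {
    let mut granted: BTreeMap<String, String> = BTreeMap::new();
    // Single-segment names are "legacy" and take no part in the assignment.
    let mut pending: Vec<(&str, &str)> = crates
        .iter()
        .copied()
        .filter(|(name, _)| segments(name).len() > 1)
        .collect();
    let max_depth = pending
        .iter()
        .map(|(name, _)| segments(name).len())
        .max()
        .unwrap_or(0);

    for depth in 1..=max_depth {
        // Count the still-unassigned crates per owner within each candidate prefix.
        let mut counts: BTreeMap<String, BTreeMap<&str, usize>> = BTreeMap::new();
        for (name, owner) in &pending {
            *counts
                .entry(prefix_at(name, depth))
                .or_default()
                .entry(*owner)
                .or_insert(0) += 1;
        }
        // Grant a prefix only to a strict winner; a tie leaves it unowned.
        for (prefix, owners) in &counts {
            let best = owners.values().copied().max().unwrap_or(0);
            let mut winners = owners.iter().filter(|(_, n)| **n == best);
            if let (Some((owner, _)), None) = (winners.next(), winners.next()) {
                // Never overwrite a prefix granted at an earlier depth.
                granted
                    .entry(prefix.clone())
                    .or_insert_with(|| owner.to_string());
            }
        }
        // Crates whose prefix went to their own owner are settled.
        pending.retain(|(name, owner)| {
            granted.get(&prefix_at(name, depth)).map(String::as_str) != Some(*owner)
        });
    }
    // A crate whose whole name is already someone else's prefix stays
    // unassigned here; the proposal does not say what should happen then.
    granted
}
```

For the cases above: two `foo-*` crates with one owner yield a single grant of `foo`; with two different owners nobody gets `foo` and each owner gets `foo-baz` or `foo-qux`; with three crates where one owner holds two, that owner gets `foo` and the remaining crate gets its own sub-prefix.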

Sub-name-spaces would only be auto-created like this during the cut-over. After the cut-over, the only way to carve out a sub-name-space from an existing name-space and give ownership to someone else is for the owner of the name-space to do so.

In Maven the publishing organization is clearly separated from the package name, each of them is only defined once, and they are used for different, clearly defined use cases.

With the hyphen scheme you would have the crate name in both the lib.name and the package.name, which is confusing and inconsistent (because some authors wouldn’t bother with defining a separate lib.name).

So whenever someone wants to provide a drop-in replacement, that person would have to define a lib.name which would include the whole package.name of the old crate.

Bearable for backward compat, but honestly if I was a new user, I wouldn’t want to use a crate that defaulted its name to some obscure unmaintained package I don’t even know.

All I’m saying is:

  • let’s add an organization key to [package]
  • require that the organization key is a domain name, which means
    • they are guaranteed to be unique
    • the Crates.io maintainers don’t have to deal with conflicts

Migration:

  • start requiring the organization key
  • the first time a crate is published with such a key (and the user owns the name in the legacy global namespace)
    • cross-publish the crate to both locations, with the legacy location forwarding to the new one
    • tools like cargo will upgrade the Cargo.lock file to the new location on the next update
  • maybe introduce a “scratch” namespace so that people can still publish their experiments without requiring a domain, allowing the current no-setup use of cargo publish

So the complaint is merely that the ad hoc implementation uses a - to separate ad hoc namespace from “specific” package name, and that - is allowed in said names? That’s significantly less of an objection than I was getting from your argument.

You’re forgetting the other requirement of crates: immutability. Domains change hands.

In any case you’re back to introducing full namespaces, which the ad hoc solution was avoiding due to the team’s existing decision on namespacing.

Domains changing hands doesn’t change anything regarding immutability. This has been the case since forever in Maven, and it has worked exceptionally well.

Which makes sense because the ad-hoc approach has tons of issues, as pointed out earlier. gbutler worked through some of the questions regarding what is considered a namespace, how they could be reserved etc., and figured out tons of gnarly issues which have no good answer.

Maven has a high barrier to publishing. (I’ve no idea how to publish, despite using it for multiple years!)

Cargo doesn’t. That’s a benefit.

Assuming you require domain ownership to publish (a large, new barrier), what happens for example, if I own tehcodez.fake, publish some package that gets used widely, then accidentally let it lapse and someone else grabs it?

If you don’t require ownership, how is it any better?

Hmm…this might be mostly backwards compatible if we allowed the following:

  • if there is only one crate with that name on crates.io, then the organization is not needed in Cargo.toml to reference it (but a warning is issued, which can be turned off)
  • if there is more than one crate on crates.io with the name under different organizations, then the organization is required and it would be a build error if it were missing
  • after the cut-over to the new rules, and once the updated cargo/compiler support is stabilized, the clock would begin on, say, a 90 or 180 day grace period during which no-one can create packages under their org with the same name as a crate under another org. This gives everyone time to update their Cargo.toml files to eliminate the warnings.
  • after the grace period, people can begin registering crate names under their org with the same name as a crate under another org.
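The lookup rule in the first two bullets could be sketched like this in Rust. Everything here is hypothetical: the `Resolution` type, the `resolve` function, and the registry modeled as a list of `(org, crate name)` pairs are only for illustration, and the grace period is not modeled.

```rust
/// Hypothetical outcome of looking up a dependency under the proposed rules.
#[derive(Debug, PartialEq)]
enum Resolution {
    /// Unambiguous match; `warn` is set when the org was left out of Cargo.toml.
    Found { org: String, warn: bool },
    /// Several orgs publish this name, so leaving the org out is a build error.
    AmbiguousWithoutOrg,
    /// No crate with this name (under the requested org, if one was given).
    NotFound,
}

/// `registry` is a list of (org, crate name) pairs standing in for crates.io.
fn resolve(registry: &[(&str, &str)], name: &str, org: Option<&str>) -> Resolution {
    // All orgs that publish a crate with this name.
    let orgs: Vec<&str> = registry
        .iter()
        .filter(|(_, n)| *n == name)
        .map(|(o, _)| *o)
        .collect();

    match org {
        Some(o) if orgs.contains(&o) => Resolution::Found { org: o.to_string(), warn: false },
        Some(_) => Resolution::NotFound,
        None => match orgs.as_slice() {
            [] => Resolution::NotFound,
            // Exactly one candidate: accept it, but warn that the org is missing.
            [only] => Resolution::Found { org: only.to_string(), warn: true },
            _ => Resolution::AmbiguousWithoutOrg,
        },
    }
}
```

So a name published by a single org keeps working without changes (with a warning), while a name published by two orgs fails until the org is spelled out.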

Questions:

  • Would everyone get a starting “org” that corresponds to their VCS (github only for now) and user-name from there?
  • Would people be able to request/register other “Org” names? What would be required? Proving ownership of a DNS domain? How?
  • Would we rate-limit “orgs” if we allowed requesting additional orgs?
  • How does this solve the problem of wanting to switch to a new, maintained version of an existing, unmaintained crate?
  • Could adding the correct “org” to Cargo.toml be automated through rustfix (or something similar) where there is no ambiguity?
  • What if you want to use crates of the same name (but different purposes) from different orgs as dependencies? Does crate-rename in Cargo.toml suffice?

Downsides:

  • requires changes to both crates.io and cargo (and possibly the compiler) plus ideally rustfix (or something similar)
  • allows duplicate crate names going forward that would more likely require crate-rename, whereas the prefix option never allows duplicate crate names (the crate is still always a unique name which includes the namespace prefix)
  • requires tying to outside “registries” to validate ownership of an “org”

Upsides:

  • pretty much what “maven” and similar do, so we know it works
  • would still work well if there is an eventual “federation” proposal
  • others?