Pre-RFC: Packages as Optional Namespaces

Preface

There have been a lot of discussions about namespacing in Cargo, most recently Pre-RFC: (hyper)minimalist namespaces on crates.io and Pre-RFC: User namespaces on crates.io.

There are some strong opinions about this floating around. From my perspective, most of the tension comes from people having different sets of problems they want solved: some people want to solve squatting, some people want to solve ownership. The many years of talking past each other have left a lot of people feeling unheard and hurt.

I've been quietly talking to people, one-on-one, about this for years, to try and get an idea of how best to move forward here. I've had conversations with crates.io and cargo team members, as well as with community members, to better understand the space. I have presented various forms of this proposal to them and received positive responses. I do think this is a viable and good path forward, but for this to work out we need to be our best selves here.

I would like to ask everyone to keep this discussion constructive and respectful.

This proposal is similar to [Pre-RFC]: Packages as Namespaces, but a bit more fleshed out.

This is not an official proposal from any relevant team. While I have discussed this with teams at various points, I have largely been an interested outsider to this whole set of discussions. After seeing discussions get rehashed again and again, I felt it was worth putting this proposal out there to try and make progress on this long-standing issue.

Scope

As mentioned before, there are many different problems people want namespacing to solve. Here are a couple I've seen over the course of my discussions:

  • Namespacing to indicate organization ownership -- e.g. knowing that regex/foo is a crate you can trust if you trust regex.
  • Namespacing as a way to prevent squatting -- everyone publishes under username/foo or org/foo and thus names cannot overlap
  • Namespacing as a way to talk about multi-crate "packages" -- e.g. having a serde "package" that contains simul-versioned serde and serde/derive crates.

The focus of this proposal is the first problem only.

I do not find namespacing to be a solution to the problem of general squatting, for reasons that have been articulated many times (I don't want to go into it here, but feel free to DM me if you want to know more).

As for the multi-crate "package" concept, I'm interested in that kind of thing being possible but I feel like that does not need namespaces to work. There are some community members working on proposals in that space too.

Anyway, I'd like to keep the discussion here focused on the first problem only. The other problems are worth discussing, but let's please stay on topic!


Summary

Grant exclusive access to publishing crates parent/foo for owners of crate parent.

Namespaced crates can be named in Rust code using underscores (e.g. parent_foo)

Motivation

While Rust crates are practically unlimited in size, it is a common pattern for organizations to split their projects into many crates, especially if they expect users to only need a fraction of their crates.

For example, unic, tokio, async-std, rusoto all do something like this, with lots of projectname-foo crates. At the moment, it is not necessarily true that a crate named projectname-foo is maintained by projectname, and in some cases that is even desired! E.g. serde has many third party "plugin" crates like serde-xml-rs. Similarly, async-tls is a general crate not specific to the async-std ecosystem.

Regardless, it is nice to have a way to signify "these are all crates belonging to a single organization, and you may trust them the same". Recently, when starting up ICU4X, we came up against this problem: We wanted to be able to publish ICU4X as an extremely modular system of icu-foo or icu4x-foo crates, but it would be confusing to users if third-party crates could also exist there (or take names we wanted to use).

This is distinct from the general problem of squatting -- with general squatting, someone else might come up with a cool crate name before you do. However, with projectname-foo crates, it's more of a case of third parties "muscling in" on a name you have already chosen and are using.

Guide-level explanation

If you own a crate foo, you may create a crate namespaced under it as foo/bar. Only people who are owners of foo may create a crate foo/bar (and all owners of foo are implicitly owners of foo/bar). After such a crate is created, additional per-crate publishers may be added who will be able to publish subsequent versions as usual.

The crate can be imported in Cargo.toml using its name as normal:

[dependencies]
"foo/bar" = "1.0"

In Rust code, the slash gets converted to an underscore, the same way we do this for dashes.

use foo_bar::Baz;

Reference-level explanation

/ is now considered a valid character inside a crate name on crates.io. For now, we will restrict crate names to having a single / in them, not at the beginning or end of the name, but this can be changed in the future.
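The single-slash rule could be sketched as a small validator. This is only an illustration under my own assumptions: `split_namespaced` is a hypothetical helper, not something crates.io or Cargo provides, and it ignores every other crate-name rule (allowed characters, length limits, reserved names).

```rust
/// Splits a crate name into (namespace root, leaf) under the rule above:
/// at most one `/`, and never at the start or end of the name.
/// Hypothetical helper; other crates.io name rules are not checked here.
fn split_namespaced(name: &str) -> Result<(Option<&str>, &str), String> {
    match name.split_once('/') {
        None if name.is_empty() => Err("crate name is empty".to_string()),
        // No slash: an ordinary, un-namespaced crate.
        None => Ok((None, name)),
        Some((root, leaf)) if root.is_empty() || leaf.is_empty() => {
            Err(format!("`{name}`: `/` may not start or end a crate name"))
        }
        // Anything after the first `/` must not contain another one.
        Some((_, leaf)) if leaf.contains('/') => {
            Err(format!("`{name}`: only a single `/` is allowed for now"))
        }
        Some((root, leaf)) => Ok((Some(root), leaf)),
    }
}
```

With this sketch, `split_namespaced("foo/bar")` yields `Ok((Some("foo"), "bar"))`, a plain `foo-bar` yields `Ok((None, "foo-bar"))`, and `/bar`, `foo/` and `foo/bar/baz` are all rejected.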

When publishing a crate foo/bar, if the crate does not exist, the following must be true:

  • foo must exist
  • The user publishing the crate must be an owner of foo

For the crate foo/bar, all owners of foo are always considered owners of foo/bar, however additional owners may be added. People removed from ownership of foo will also lose access to foo/bar unless they were explicitly added as owners to foo/bar.
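To make the ownership rules concrete, here is a rough in-memory model. Everything in it is hypothetical (the `Registry` type, its methods, and the use of plain strings for users); it is a sketch of the rules as written above, not of how crates.io stores owners. A crate counts as existing once it has an entry in the map.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical registry model: crate name -> explicitly added owners.
#[derive(Default)]
struct Registry {
    owners: HashMap<String, HashSet<String>>,
}

impl Registry {
    /// Adds `user` as an explicit owner of `name`, creating the crate entry if needed.
    fn add_owner(&mut self, name: &str, user: &str) {
        self.owners.entry(name.to_string()).or_default().insert(user.to_string());
    }

    /// Removes `user` from the explicit owners of `name`.
    fn remove_owner(&mut self, name: &str, user: &str) {
        if let Some(set) = self.owners.get_mut(name) {
            set.remove(user);
        }
    }

    /// Explicit owners of `name` plus, for `foo/bar`, every current owner of `foo`.
    fn effective_owners(&self, name: &str) -> HashSet<String> {
        let mut all = self.owners.get(name).cloned().unwrap_or_default();
        if let Some((root, _)) = name.split_once('/') {
            if let Some(root_owners) = self.owners.get(root) {
                all.extend(root_owners.iter().cloned());
            }
        }
        all
    }

    /// New `foo/bar`: `foo` must exist and `user` must own it.
    /// Existing crate: any effective owner may publish.
    fn can_publish(&self, user: &str, name: &str) -> bool {
        if self.owners.contains_key(name) {
            return self.effective_owners(name).contains(user);
        }
        match name.split_once('/') {
            Some((root, _)) => self.owners.get(root).map_or(false, |o| o.contains(user)),
            // New un-namespaced crates stay first come, first served.
            None => true,
        }
    }
}
```

Because root ownership is looked up at check time rather than copied, removing someone from `foo` immediately removes their implicit access to `foo/bar`, while anyone added to `foo/bar` directly keeps it.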

Crates.io displays foo/bar crates with the name foo/bar, though it may stylistically make the foo part link to the foo crate.

The registry index trie may represent subpackages by placing foo/bar in foo@/bar, placed next to where foo is in the trie (i.e. the full path will be fo/foo@/bar).
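As a rough sketch of what that could look like, the function below computes an index path for a namespaced crate. The `@/` suffix is the hypothetical part from this section; the directory sharding (`1/`, `2/`, `3/{first char}/`, `{first two}/{next two}/`) is my understanding of how the existing index shards names, so a three-letter root like foo would land under `3/f/` rather than the abbreviated `fo/` used in the prose. `shard_dir` and `index_path` are made-up names, and the code assumes a non-empty ASCII name that already passed validation.

```rust
/// Directory the existing index uses for a lowercased, non-empty ASCII crate name.
fn shard_dir(root: &str) -> String {
    match root.len() {
        1 => "1".to_string(),
        2 => "2".to_string(),
        3 => format!("3/{}", &root[..1]),
        _ => format!("{}/{}", &root[..2], &root[2..4]),
    }
}

/// Path of a crate's index entry, relative to the index root.
fn index_path(name: &str) -> String {
    let lower = name.to_ascii_lowercase();
    match lower.split_once('/') {
        // Hypothetical layout from this section: children live in `<root>@/`,
        // right next to the root crate's own index file.
        Some((root, leaf)) => format!("{}/{root}@/{leaf}", shard_dir(root)),
        None => format!("{}/{lower}", shard_dir(&lower)),
    }
}
```

Here `index_path("serde")` gives `se/rd/serde` and `index_path("serde/derive")` gives `se/rd/serde@/derive`, so the namespace directory sits beside the root crate's file.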

No changes are made to rustc. When compiling a crate foo/bar, Cargo will automatically pass in --crate-name foo_bar, and when referring to it as a dependency Cargo will use --extern foo_bar=..... This is the same thing we currently do for foo-bar.

If you end up in a situation where you have both foo/bar and foo-bar as active dependencies of your crate, your code will not compile and you must rename one of them.
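The name mapping and the resulting clash can be sketched in a few lines. `rust_crate_name` and `find_collisions` are hypothetical names for illustration; this is not Cargo's actual code, just the rule described above applied to a list of dependency names.

```rust
use std::collections::BTreeMap;

/// Name that would be passed to `--crate-name` / `--extern` under this proposal:
/// both `-` and `/` become `_`.
fn rust_crate_name(package: &str) -> String {
    package.replace(|c: char| c == '-' || c == '/', "_")
}

/// Groups of dependencies that map to the same Rust crate name,
/// sorted by that name. Only groups with more than one member are returned.
fn find_collisions<'a>(deps: &[&'a str]) -> Vec<(String, Vec<&'a str>)> {
    let mut by_name: BTreeMap<String, Vec<&'a str>> = BTreeMap::new();
    for &dep in deps {
        by_name.entry(rust_crate_name(dep)).or_default().push(dep);
    }
    by_name.into_iter().filter(|(_, pkgs)| pkgs.len() > 1).collect()
}
```

For the dependency list `["foo/bar", "foo-bar", "serde"]` this reports a single clash on `foo_bar`. Cargo's existing dependency renaming (the `package = "..."` key in a dependency table) is one way a user could then rename one of the two.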

Drawbacks

Slashes

So far slashes as a "separator" have not existed in Rust. There may be dissonance with having another identifier character allowed on crates.io but not in Rust code. Dashes are already confusing for new users. Some of this can be remediated with appropriate diagnostics when / is encountered at the head of a path.

Furthermore, slashes are ambiguous in feature specifiers:

[dependencies]
"foo" = "1"
"foo/std" = { version = "1", optional = true }

[features]
# Does this enable crate "foo/std", or feature "std" of crate "foo"?
default = ["foo/std"]
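To spell the ambiguity out, the sketch below lists every way a feature entry containing a slash could be read, given the names in `[dependencies]`. `Reading` and `readings` are hypothetical; how Cargo would actually resolve this is exactly the open question.

```rust
/// One possible interpretation of a `[features]` entry containing `/`.
#[derive(Debug, PartialEq)]
enum Reading {
    /// Feature `feature` of dependency `dep` (today's meaning of `dep/feature`).
    FeatureOfDep { dep: String, feature: String },
    /// The optional namespaced dependency with this exact name.
    NamespacedCrate(String),
}

/// Returns all readings of `entry` that are consistent with the declared
/// dependency names. More than one result means the entry is ambiguous.
fn readings(entry: &str, deps: &[&str]) -> Vec<Reading> {
    let mut out = Vec::new();
    if let Some((dep, feature)) = entry.split_once('/') {
        if deps.contains(&dep) {
            out.push(Reading::FeatureOfDep {
                dep: dep.to_string(),
                feature: feature.to_string(),
            });
        }
        if deps.contains(&entry) {
            out.push(Reading::NamespacedCrate(entry.to_string()));
        }
    }
    out
}
```

With dependencies `foo` and `foo/std` declared, `readings("foo/std", ...)` returns both variants, which is the situation in the manifest above. With only one of the two declared, there is a single reading.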

Namespace root taken

Not all existing projects can transition to using namespaces here. For example, the unicode crate is reserved, so unicode-rs cannot use it as a namespace despite owning most of the unicode-foo crates. In other cases, the "namespace root" foo may be owned by a different set of people than the foo-bar crates, and folks may need to negotiate (async-std has this problem, it manages async-foo crates but the root async crate is taken by someone else). Nobody is forced to switch to namespaces, of course, so the damage here is limited, but it would be nice for everyone to be able to transition.

Dash typosquatting

This proposal does not prevent anyone from taking foo-bar after you publish foo/bar. Given that the Rust crate import syntax for foo/bar is foo_bar, same as foo-bar, it's totally possible for a user to accidentally type foo-bar in Cargo.toml instead of foo/bar, and pull in the wrong, squatted, crate.

We currently prevent foo-bar and foo_bar from existing at the same time. We could do this here as well, but it would only go in one direction: if foo/bar exists, foo-bar and foo_bar cannot be published, but not vice versa. This limits the "damage" to cases where someone pre-squats foo-bar before you publish foo/bar, and the damage can be mitigated by checking to see if such a clashing crate exists when publishing, if you actually care about this attack vector. There are some tradeoffs there that we would have to explore.
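A minimal sketch of that one-directional check, assuming names are compared after folding `-` and `/` to `_`. `fold` and `check_new_name` are hypothetical, and treating two different namespaced names that fold to the same string as a clash is my own assumption, not something the proposal specifies.

```rust
/// Canonical form used for the clash check: `-` and `/` both fold to `_`.
fn fold(name: &str) -> String {
    name.replace(|c: char| c == '-' || c == '/', "_")
}

/// Publish-time check for a brand-new crate name against existing names.
fn check_new_name(new: &str, existing: &[&str]) -> Result<(), String> {
    for &old in existing {
        if fold(old) != fold(new) {
            continue;
        }
        // The only tolerated overlap: a namespaced crate arriving after an
        // un-namespaced one (someone pre-squatted `foo-bar`). The reverse
        // direction, and the existing `foo-bar` vs `foo_bar` rule, are errors.
        let allowed = new.contains('/') && !old.contains('/');
        if !allowed {
            return Err(format!("`{new}` clashes with existing crate `{old}`"));
        }
    }
    Ok(())
}
```

So once `foo/bar` exists, both `foo-bar` and `foo_bar` are refused, but `foo/bar` can still be published when `foo-bar` is already taken. A real implementation would probably want to at least warn the publisher in that second case.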

One thing that could mitigate foo/bar mapping to the potentially ambiguous foo_bar is using something like foo::crate::bar or ~foo::bar or foo::/bar in the import syntax.

Slow migration

Existing projects wishing to use this may need to manually migrate. For example, unic-langid may become unic/langid, with the unic project maintaining unic-langid as a reexport crate with the same version number. Getting people to migrate might be a bit of work, and furthermore maintaining a reexport crate during the (potentially long) transition period will also be some work. Of course, there is no obligation to maintain a transition crate, but users will stop getting updates if you don't.

A possible path forward is to enable people to register aliases, i.e. unic/langid is an alias for unic-langid.

Rationale and alternatives

This change solves the ownership problem in a way that can be slowly transitioned to for most projects.

foo::bar on crates.io and in Rust

While I cover a bunch of different separator choices below, I want to call out foo::bar in particular. If we went with foo::bar, we could have the same crate name in the Rust source and Cargo manifest. This would be amazing.

Except, of course, crate foo::bar is ambiguous with module bar in crate foo (which might actually be a reexport of foo::bar in some cases).

This can still be made to work, e.g. we could use foo::crate::bar to disambiguate, and encourage namespace-using crates to ensure that mod bar in crate foo either doesn't exist or is a reexport of crate foo::bar. I definitely want to see this discussed a bit more.

Separator choice

A different separator might make more sense.

We could perhaps have foo-* get autoreserved if you publish foo, as outlined in Pre-RFC: (hyper)minimalist namespaces on crates.io. I find that this can lead to unfortunate situations where a namespace traditionally used by one project (e.g. async-*) is suddenly given over to a different project (the async crate). Furthermore, users cannot trust foo-bar to be owned by foo because of the vast number of grandfathered crates we will have.

Another separator idea would be to use ::, e.g. foo::bar. This looks great in Rust code, provided that the parent crate is empty and does not also have a bar module. See the section above for more info.

Triple colons could work. People might find it confusing, but foo:::bar evokes Rust paths without being ambiguous.

We could use ~ which enables Rust code to directly name namespaced packages (as ~ is no longer used in any valid Rust syntax). It looks extremely weird, however.

We could use dots (foo.bar). This does evoke some similarity with Rust syntax, however there are ambiguities: foo.bar in Rust code could either mean "the field bar of local/static foo" or it may mean "the crate foo.bar".

Note that unquoted dots have semantic meaning in TOML, and allowing for unquoted dots would freeze the list of dependency subfields allowed (to version, git, branch, features, etc).

Separator mapping

The proposal suggests mapping foo/bar to foo_bar, but as mentioned in the typosquatting section, this has problems. There may be other mappings that work out better:

  • foo::bar (see section above)
  • foo::crate::bar
  • foo::/bar
  • ~foo::bar

and the like.

User / org namespaces

Another way to handle namespacing is to rely on usernames and GitHub orgs as namespace roots. This ties crates.io strongly to GitHub: while GitHub is currently the only login method, there is nothing preventing others from being added.

Furthermore, usernames are not immutable, and that can lead to a whole host of issues.

Registry trie format

Instead of placing foo/bar in foo@/bar, it can be placed in foo@bar or something else.

Prior art

This proposal is basically the same as [Pre-RFC]: Packages as Namespaces and [Pre-RFC] [idea] Cratespaces (crates as namespace, take 2... or 3?).

Namespacing has been discussed in Namespacing on Crates.io, [Pre-RFC] Domains as namespaces, Pre-RFC: User namespaces on crates.io, Pre-RFC: (hyper)minimalist namespaces on crates.io, Blog Post: No Namespaces in Rust is a Feature, Crates.io package policies, Crates.io squatting, and many others.

Unresolved questions

  • Is / really the separator we wish to use?
  • How do we avoid ambiguity in feature syntax?
  • Is there a way to avoid foo/bar turning into the potentially ambiguous foo_bar?
  • Can we mitigate some of the typosquatting?
  • How can we represent namespaced crates in the registry trie?

Future possibilities

We can allow multiple layers of nesting if people want it.

You probably meant to leave out quotation marks in the code block.

Backwards actually, I meant to leave out the bit in parentheses, @ehuss notified me that my assessment of TOML syntax was incorrect. Fixed.

I feel like I’m missing the explanation of why not foo::bar. One could even also use "foo::bar" in the Cargo.toml. Finally one would need to come up with a way to prevent (or discourage) foo from also providing a module or other top level item called bar and this would be a super clean solution, wouldn’t it?

I actually really like foo::bar because then both the package name and the import can be foo::bar.

The problem is precisely that you will have ambiguity with providing a toplevel bar in the foo crate. Which might even just be pub use foo_bar::bar!

This is the first time such ambiguity will be introduced in use syntax, and I would tread carefully. I'm not against this, but it is a tradeoff, and we must consider this. I can explicitly mention it in that section of the RFC.

These are the problems I do want to encourage people to weigh against each other: while it sounds like a bikeshed, figuring out the exact syntax in imports and in the name itself is actually pretty tricky business.

So even if we don't have proper aliases, it'd be pretty easy to write a tool that maintains backwards compatible reexports. I don't think it's a large issue - the biggest cost here would be that there are two ways to get to common crates.

Another option here is to make double underscores map to a slash, i.e. new versions of cargo will display double underscores as a forward slash. So take all your points about crate reservations and such, but instead of slash it's double underscore. Then old cargo users don't have to worry about migration, and new versions of cargo could just display the double underscore as a forward slash. It might also work out to a nice namespace convention in code. E.g. serde__json is the json crate from serde.

I think I would be interested in combining this with workspaces: to start, Cargo might refuse to permit a package to be named "foo/bar" unless it was in a workspace with another package "foo", and "foo" did not have a publicly visible item named bar in its root scope. The package "foo/bar" would then always have its root namespace at the location foo::bar in any crates that depend on it, even if they do not depend on the "foo" package themselves.

This sidesteps the problem of converting registry word-separator characters to Rust paths, retains the path-like nature of / in package names and :: in the item tree, and avoids the conflict between a "foo-bar" or "foo_bar" package and the new "foo/bar".

I think that reserving "package-*" or "package_*" strings for all "package"s uploaded to the registry is not something that can be pursued at the moment, so restricting the first draft of this to only use Cargo workspaces, and not yet permit crates where Cargo cannot see a hierarchical relationship, is likely the least scope of work to start.

This is a conflict that the crate owner can resolve during the course of their transition, right? In this case the resolution would be telling users how to depend on the "child" crate alongside the "parent" crate. Maybe there's even a cargo feature that could make that automatic?

Rather than different variants of the separating characters how about using a "namespace-introductory" character? E.g., using @ as the namespace character @unic-langid. The introductory character is omitted in the mapped name, thus becoming unic::langid or unic_langid, as per the pre-RFC.

Names that don't start with @ (or the respective chosen character) - i.e., all current crates - are in the "global" namespace.

The advantages would be that it'd side-step most of the separator discussion, the dash could be retained as separating character, existing squatting would be minimal, and only the new concept of "namespaces" with their starting character would need to be taught.

Yep! To be abundantly clear, I would be extremely happy if we picked foo::bar as both the rust and cargo syntax. If we can figure out such a plan that everyone is happy with, I would very much prefer to switch to that being the main proposal.

Added a separate section for foo::bar so it's not buried

I don't see it as sidestepping much. The existing squatting problem still exists because you're still mapping it to an underscore, and that can be typosquatted.

One minor issue with / as the namespace delimiter is the following ambiguity:

[dependencies]
"foo" = "1"
"foo/std" = { version = "1", optional = true }

[features]
# Does this enable crate "foo/std", or feature "std" of crate "foo"?
default = ["foo/std"]

Good catch! I'll add it to the text.

I'd say it should enable whatever is present, even both. The ambiguity falls on the foo authors, who control both dependencies in this design.

Hmm, this seems pretty brittle, though, because you can always edit it later. Also, I'd argue that sometimes you want that: if foo is a façade crate, you'd want it to have reexports.

Though anyway, if we are trying to make the import be foo::bar, I think the crate name should also just be foo::bar.

Here's a crazy idea to remove ambiguity from the use statements, though maybe it's more of an implementation detail:

We could have Cargo generate a facade crate (a synthetic crate since it's on the fly) for a namespace that reexports all public items of the parent crate and then reports any children crates as modules.

Pros: From a compiler and language standpoint you don't have to distinguish between module or child crate.

Cons: You'd have to tool Cargo to generate a synthetic crate on the fly. And it might get messy if you have to do this for other dependencies too, potentially adding a lot of synthetic crates to your compilation tree.

Would a requirement that the top-level crate be an "advertisement crate" with no exports, as seen e.g. in twilight, be sensible?

I think the issue is that it's not necessary for the namespace root to just be a façade crate, what if it wants to export some convenience functions? Maybe this kind of restriction is fine, I don't know.