Pre-RFC: Packages as Optional Namespaces
Preface
There have been a lot of discussions about namespacing in Cargo (most recently: Pre-RFC: (hyper)minimalist namespaces on crates.io and Pre-RFC: User namespaces on crates.io).
There are some strong opinions about this floating around. From my perspective, most of the tension comes from people wanting different sets of problems solved: some want to solve squatting, some want to solve ownership. The many years of talking past each other have left a lot of people feeling unheard and hurt.
I've been quietly talking to people, one-on-one, about this for years, to try and get an idea of how best to move forward here. I've had conversations with crates.io and cargo team members, as well as with community members, to better understand the space. I have presented various forms of this proposal to them and received positive responses. I do think this is a viable and good path forward, but for this to work out we need to be our best selves here.
I would like to ask everyone to keep this discussion constructive and respectful.
This proposal is similar to [Pre-RFC]: Packages as Namespaces, but a bit more fleshed out.
This is not an official proposal from any relevant team. While I have discussed this with teams at various points, I have largely been an interested outsider to this whole set of discussions. After seeing discussions get rehashed again and again, I felt it was worth putting this proposal out there to try and make progress on this long-standing issue.
Scope
As mentioned before, there are many different problems people want namespacing to solve. Here are a couple I've seen over the course of my discussions:
- Namespacing to indicate organization ownership -- e.g. knowing that regex/foo is a crate you can trust if you trust regex.
- Namespacing as a way to prevent squatting -- everyone publishes under username/foo or org/foo and thus names cannot overlap.
- Namespacing as a way to talk about multi-crate "packages" -- e.g. having a serde "package" that contains simul-versioned serde and serde/derive crates.
The focus of this proposal is the first problem only.
I do not find namespacing to be a solution to the problem of general squatting, for reasons that have been articulated many times (I don't want to go into it here, but feel free to DM me if you want to know more).
As for the multi-crate "package" concept, I'm interested in that kind of thing being possible but I feel like that does not need namespaces to work. There are some community members working on proposals in that space too.
Anyway, I'd like to keep the discussion here focused on the first problem only. The other problems are worth discussing, but let's please stay on topic!
Summary
Grant exclusive access to publishing crates parent/foo for owners of crate parent.
Namespaced crates can be named in Rust code using underscores (e.g. parent_foo).
Motivation
While Rust crates are practically unlimited in size, it is a common pattern for organizations to split their projects into many crates, especially if they expect users to only need a fraction of their crates.
For example, unic, tokio, async-std, rusoto all do something like this, with lots of projectname-foo crates. At the moment, it is not necessarily true that a crate named projectname-foo is maintained by projectname, and in some cases that is even desired! E.g. serde has many third party "plugin" crates like serde-xml-rs. Similarly, async-tls is a general crate not specific to the async-std ecosystem.
Regardless, it is nice to have a way to signify "these are all crates belonging to a single organization, and you may trust them the same". Recently, when starting up ICU4X, we came up against this problem: we wanted to be able to publish ICU4X as an extremely modular system of icu-foo or icu4x-foo crates, but it would be confusing to users if third-party crates could also exist there (or take names we wanted to use).
This is distinct from the general problem of squatting -- with general squatting, someone else might come up with a cool crate name before you do. However, with projectname-foo crates, it's more of a case of third parties "muscling in" on a name you have already chosen and are using.
Guide-level explanation
If you own a crate foo, you may create a crate namespaced under it as foo/bar. Only people who are owners of foo may create a crate foo/bar (and all owners of foo are implicitly owners of foo/bar). After such a crate is created, additional per-crate publishers may be added who will be able to publish subsequent versions as usual.
The crate can be imported in Cargo.toml using its name as normal:
[dependencies]
"foo/bar" = "1.0"
In Rust code, the slash gets converted to an underscore, the same way we do this for dashes.
use foo_bar::Baz;
Reference-level explanation
/ is now considered a valid identifier character inside a crate name on crates.io. For now, we will restrict crate names to having a single / in them, not at the beginning or end of the name, but this can be changed in the future.
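The naming restriction above can be sketched as a small validation function. This is only an illustration of the rule as stated (at most one /, never leading or trailing); the function name split_namespaced and the NameError type are made up for this sketch and are not part of crates.io or Cargo, and the other crates.io naming rules are deliberately left out.

```rust
/// Hypothetical error type for this sketch; not part of crates.io.
#[derive(Debug, PartialEq)]
pub enum NameError {
    /// More than one `/` in the name.
    TooManySlashes,
    /// The name is empty, or starts or ends with `/`.
    EmptySegment,
}

/// Splits a crate name into (namespace, leaf) under the proposed rule:
/// at most one `/`, and never at the beginning or end of the name.
/// Existing crates.io rules (allowed characters, length) are not checked here.
pub fn split_namespaced(name: &str) -> Result<(Option<&str>, &str), NameError> {
    let mut parts = name.split('/');
    // `split` always yields at least one item, so the fallback is never used.
    let first = parts.next().unwrap_or("");
    let second = parts.next();
    if parts.next().is_some() {
        return Err(NameError::TooManySlashes);
    }
    match second {
        None if first.is_empty() => Err(NameError::EmptySegment),
        None => Ok((None, first)),
        Some(leaf) if first.is_empty() || leaf.is_empty() => Err(NameError::EmptySegment),
        Some(leaf) => Ok((Some(first), leaf)),
    }
}
```

Under this sketch, foo/bar splits into the namespace foo and the leaf bar, a plain foo has no namespace, and /bar, foo/ and a/b/c are all rejected.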
When publishing a crate foo/bar, if the crate does not exist, the following must be true:
- foo must exist
- The user publishing the crate must be an owner of foo
For the crate foo/bar, all owners of foo are always considered owners of foo/bar; however, additional owners may be added. People removed from ownership of foo will also lose access to foo/bar unless they were explicitly added as owners of foo/bar.
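To make the ownership rules concrete, here is a rough in-memory model. It is a sketch only: the Registry type and its methods are hypothetical names invented for illustration and say nothing about how crates.io actually stores owners. It assumes a crate "exists" exactly when it has an entry in the owner table.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory stand-in for the registry's ownership records.
#[derive(Default)]
pub struct Registry {
    /// Explicitly listed owners per crate name (e.g. "foo" or "foo/bar").
    explicit_owners: HashMap<String, HashSet<String>>,
}

impl Registry {
    pub fn add_owner(&mut self, krate: &str, user: &str) {
        self.explicit_owners
            .entry(krate.to_string())
            .or_default()
            .insert(user.to_string());
    }

    pub fn remove_owner(&mut self, krate: &str, user: &str) {
        if let Some(owners) = self.explicit_owners.get_mut(krate) {
            owners.remove(user);
        }
    }

    fn is_explicit_owner(&self, krate: &str, user: &str) -> bool {
        self.explicit_owners
            .get(krate)
            .map_or(false, |owners| owners.contains(user))
    }

    /// A user owns `foo/bar` if they are listed on `foo/bar` itself,
    /// or if they are an owner of the parent crate `foo`.
    pub fn is_owner(&self, krate: &str, user: &str) -> bool {
        if self.is_explicit_owner(krate, user) {
            return true;
        }
        match krate.split_once('/') {
            Some((parent, _)) => self.is_explicit_owner(parent, user),
            None => false,
        }
    }

    /// First publish of `foo/bar`: `foo` must exist and the user must own it.
    pub fn may_create(&self, krate: &str, user: &str) -> bool {
        match krate.split_once('/') {
            Some((parent, _)) => self.is_explicit_owner(parent, user),
            // Un-namespaced names follow today's rules, which are not modeled here.
            None => true,
        }
    }
}
```

Because parent ownership is looked up at query time rather than copied, removing someone from foo automatically drops their access to foo/bar unless they were also added to foo/bar directly.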
Crates.io displays foo/bar crates with the name foo/bar, though it may stylistically make the foo part link to the foo crate.
The registry index trie may represent subpackages by placing foo/bar in foo@/bar, placed next to where foo is in the trie (i.e. the full path will be fo/foo@/bar).
No changes are made to rustc. When compiling a crate foo/bar, Cargo will automatically pass in --crate-name foo_bar, and it will use --extern foo_bar=... when referring to it as a dependency. This is the same thing we currently do for foo-bar.
If you end up in a situation where you have both foo/bar and foo-bar as active dependencies of your crate, your code will not compile and you must rename one of them.
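The name mapping and the resulting clash can be sketched in a few lines. This is not Cargo's implementation; rust_crate_name and find_collisions are hypothetical helper names used only to illustrate why foo/bar and foo-bar cannot both be active dependencies under their default names.

```rust
use std::collections::HashMap;

/// Maps a package name to the name used in Rust code:
/// both `/` and `-` become `_`.
pub fn rust_crate_name(package: &str) -> String {
    package
        .chars()
        .map(|c| if c == '/' || c == '-' { '_' } else { c })
        .collect()
}

/// Returns every Rust crate name that more than one dependency maps to,
/// together with the dependency names that clash, in input order.
pub fn find_collisions(deps: &[&str]) -> Vec<(String, Vec<String>)> {
    let mut by_name: HashMap<String, Vec<String>> = HashMap::new();
    for dep in deps {
        by_name
            .entry(rust_crate_name(dep))
            .or_default()
            .push(dep.to_string());
    }
    let mut clashes: Vec<_> = by_name
        .into_iter()
        .filter(|(_, packages)| packages.len() > 1)
        .collect();
    // Sort so the result does not depend on hash map iteration order.
    clashes.sort();
    clashes
}
```

Given the dependencies foo/bar, foo-bar and serde, the only reported clash is on foo_bar, which is the case where one of the two would need to be renamed.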
Drawbacks
Slashes
So far slashes as a "separator" have not existed in Rust. There may be dissonance with having another identifier character allowed on crates.io but not in Rust code. Dashes are already confusing for new users. Some of this can be remediated with appropriate diagnostics when / is encountered at the head of a path.
Furthermore, slashes are ambiguous in feature specifiers:
[dependencies]
"foo" = "1"
"foo/std" = { version = "1", optional = true }
[features]
# Does this enable crate "foo/std", or feature "std" of crate "foo"?
default = ["foo/std"]
Namespace root taken
Not all existing projects can transition to using namespaces here. For example, the unicode crate is reserved, so unicode-rs cannot use it as a namespace despite owning most of the unicode-foo crates. In other cases, the "namespace root" foo may be owned by a different set of people than the foo-bar crates, and folks may need to negotiate (async-std has this problem: it manages async-foo crates but the root async crate is taken by someone else). Nobody is forced to switch to namespaces, of course, so the damage here is limited, but it would be nice for everyone to be able to transition.
Dash typosquatting
This proposal does not prevent anyone from taking foo-bar after you publish foo/bar. Given that the Rust crate import syntax for foo/bar is foo_bar, same as foo-bar, it's totally possible for a user to accidentally type foo-bar in Cargo.toml instead of foo/bar, and pull in the wrong, squatted, crate.
We currently prevent foo-bar and foo_bar from existing at the same time. We could do this here as well, but it would only go in one direction: if foo/bar exists, foo-bar and foo_bar cannot be published, but not vice versa. This limits the "damage" to cases where someone pre-squats foo-bar before you publish foo/bar, and the damage can be mitigated by checking to see if such a clashing crate exists when publishing, if you actually care about this attack vector. There are some tradeoffs there that we would have to explore.
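A minimal sketch of that one-directional check, assuming the registry can enumerate existing crate names. The function check_publish and the flat set of names are hypothetical, chosen only to show the asymmetry; the existing foo-bar versus foo_bar rule and duplicate-name checks are not modeled.

```rust
use std::collections::HashSet;

/// Maps `/` and `-` to `_`, mirroring how the name appears in Rust code.
fn normalize(name: &str) -> String {
    name.chars()
        .map(|c| if c == '/' || c == '-' { '_' } else { c })
        .collect()
}

/// Returns an error if publishing `new_name` should be rejected because a
/// namespaced crate already maps to the same Rust crate name.
/// The check is deliberately one-directional: a namespaced name is never
/// blocked by an existing un-namespaced one.
pub fn check_publish(existing: &HashSet<String>, new_name: &str) -> Result<(), String> {
    if new_name.contains('/') {
        // Publishing foo/bar is allowed even if foo-bar was pre-squatted.
        return Ok(());
    }
    let key = normalize(new_name);
    for other in existing {
        if other.contains('/') && normalize(other) == key {
            return Err(format!(
                "`{}` clashes with namespaced crate `{}`",
                new_name, other
            ));
        }
    }
    Ok(())
}
```

With foo/bar already published, both foo-bar and foo_bar are rejected, while a registry that only contains a pre-squatted foo-bar still accepts foo/bar.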
One thing that could mitigate foo/bar mapping to the potentially ambiguous foo_bar is using something like foo::crate::bar or ~foo::bar or foo::/bar in the import syntax.
Slow migration
Existing projects wishing to use this may need to manually migrate. For example, unic-langid may become unic/langid, with the unic project maintaining unic-langid as a reexport crate with the same version number. Getting people to migrate might be a bit of work, and furthermore maintaining a reexport crate during the (potentially long) transition period will also be some work. Of course, there is no obligation to maintain a transition crate, but users will stop getting updates if you don't.
A possible path forward is to enable people to register aliases, i.e. unic/langid is an alias for unic-langid.
Rationale and alternatives
This change solves the ownership problem in a way that can be slowly transitioned to for most projects.
foo::bar on crates.io and in Rust
While I cover a bunch of different separator choices below, I want to call out foo::bar in particular. If we went with foo::bar, we could have the same crate name in the Rust source and Cargo manifest. This would be amazing.
Except, of course, crate foo::bar is ambiguous with module bar in crate foo (which might actually be a reexport of foo::bar in some cases).
This can still be made to work, e.g. we could use foo::crate::bar to disambiguate, and encourage namespace-using crates to ensure that mod bar in crate foo either doesn't exist or is a reexport of crate foo::bar. I definitely want to see this discussed a bit more.
Separator choice
A different separator might make more sense.
We could perhaps have foo-* get autoreserved if you publish foo, as outlined in Pre-RFC: (hyper)minimalist namespaces on crates.io. I find that this can lead to unfortunate situations where a namespace traditionally used by one project (e.g. async-*) is suddenly given over to a different project (the async crate). Furthermore, users cannot trust foo-bar to be owned by foo because of the vast number of grandfathered crates we will have.
Another separator idea would be to use ::, e.g. foo::bar. This looks great in Rust code, provided that the parent crate is empty and does not also have a bar module. See the section above for more info.
Triple colons could work. People might find it confusing, but foo:::bar evokes Rust paths without being ambiguous.
We could use ~, which enables Rust code to directly name namespaced packages (as ~ is no longer used in any valid Rust syntax). It looks extremely weird, however.
We could use dots (foo.bar). This does evoke some similarity with Rust syntax; however, there are ambiguities: foo.bar in Rust code could mean either "the field bar of local/static foo" or "the crate foo.bar".
Note that unquoted dots have semantic meaning in TOML, and allowing for unquoted dots would freeze the list of dependency subfields allowed (to version, git, branch, features, etc).
Separator mapping
The proposal suggests mapping foo/bar to foo_bar, but as mentioned in the typosquatting section, this has problems. There may be other mappings that work out better:
- foo::bar (see section above)
- foo::crate::bar
- foo::/bar
- ~foo::bar
and the like.
User / org namespaces
Another way to handle namespacing is to rely on usernames and GitHub orgs as namespace roots. This ties crates.io strongly to GitHub -- while GitHub is currently the only login method, there is nothing preventing others from being added.
Furthermore, usernames are not immutable, and that can lead to a whole host of issues.
Registry trie format
Instead of placing foo/bar in foo@/bar, it can be placed in foo@bar or something else.
Prior art
This proposal is basically the same as [Pre-RFC]: Packages as Namespaces and [Pre-RFC] [idea] Cratespaces (crates as namespace, take 2... or 3?).
Namespacing has been discussed in Namespacing on Crates.io, [Pre-RFC] Domains as namespaces, Pre-RFC: User namespaces on crates.io, Pre-RFC: (hyper)minimalist namespaces on crates.io, Blog Post: No Namespaces in Rust is a Feature, Crates.io package policies, Crates.io squatting, and many others.
Unresolved questions
- Is / really the separator we wish to use?
- How do we avoid ambiguity in feature syntax?
- Is there a way to avoid foo/bar turning into the potentially ambiguous foo_bar?
- Can we mitigate some of the typosquatting?
- How can we represent namespaced crates in the registry trie?
Future possibilities
We can allow multiple layers of nesting if people want it.