Pre-RFC: Packages as Optional Namespaces

Moderator note: Thank you to most folks for contributing to this discussion productively. Comments that do nothing other than say "this is wrong and it has been pointed out to you" are vacuous and unhelpful. A goal of this discussion should be to approach a shared understanding, and simply pointing and saying "you're wrong" over and over again does not help us reach that goal. Please stop doing that.

I think a disconnect that is cropping up here is that reverse domain notation is better suited to solve the following problem:

(especially since @ehiggs suggests that for backwards compatibility existing crates become io.crates.foo)

This solves a different set of problems. It makes indicating organization ownership harder for everyone (which is the main goal of this pre-RFC), and it makes everyone have to start thinking about namespaces (an explicit non-goal of this pre-RFC).

I very deliberately defined the scope of this discussion at the top to prevent having a situation where people talk about solutions to different problems and talk past each other:

So unless someone can make reverse DNS match the stated motivations of this pre-RFC, I'd request that discussion of that proposal go in a separate pre-RFC.

Reading the proposal again more closely, I think the limited scope makes this orthogonal to, e.g., reverse DNS and authentication of users. The only thing being proposed here is that instead of two levels (project, version), there would now be three levels (group, project, version), and the group is currently a string with no meaning anywhere else.

How about '.' as the separator? 🙂 It would avoid ambiguity in the feature syntax, could prevent confusion with foo_bar, and would hence mitigate typosquatting.

Drawback: it would also require changes to rustc, and new packages wouldn't be usable with older versions of Rust.

Already covered in the original post:

This is not a dealbreaker, but it's something we should recognize.

You previously said the default group for everyone would be "io.crates." What is the motivation for introducing this concept of a "group" to the RFC? Why should crates.io introduce the ability to acquire "groups" by some means other than the one by which it divvies up its current flat namespace (first-come, first-served to anyone with a crates.io account)? You aren't being very clear about the problem you're trying to solve by introducing this complexity to the feature.

(FWIW I've discussed this with @ehiggs out of band and explained some of these differences. I've pointed out areas of their proposal that need to be expanded on, and encouraged them to create a separate thread for their proposal, which I perceive to be significantly different, if they still wish to pursue it. We should probably not discuss this proposal further here unless it can be better tied to the current motivations.)

Slight update: I've started moving forward with this as a proper RFC, but haven't posted it yet. In the interest of having things go well, I'm going to let the crates.io team discuss this during their meetings first, to work out any remaining issues. Initial signals from them are promising though!

Thank you so much for working on this.

There's now a prototype of what I've decided to call "cratespaces" (name to be bikeshedded, of course), using / as the separator, available to try out!

Many thanks to @stephanbuys, Justin Wernick, and @tshepang for building this!

I think the Cargo.toml syntax part of this proposal is problematic: translating "foo/bar" = "1.0.0" into a foo_bar namespace for rustc only works if the org/individual lucked out and secured a relevant prefix for their project.

Imagine if @burntsushi did not register regex early on and had to name it "burntsushi/regex" instead. Much as I admire his work, I wouldn't want to use burntsushi_regex:: in my code. No, I'd still want the namespace to be just regex.

So how about this: we keep the left side as the local crate name, and instead add the namespace on the right side:

regex = { package="burntsushi/regex", version="1.4.2" }

If that's too many characters to type, we could allow combining crate name and version into a single string, for example:

regex = "burntsushi/regex=1.4.2"
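A rough sketch of how a tool might split that shorthand into its parts, assuming the value is "<package>=<version requirement>" and the TOML key stays the local crate name. This syntax is only the suggestion above (Cargo does not accept it), and `DepSpec` and `parse_shorthand` are hypothetical names:

```rust
/// Hypothetical: a dependency written as `regex = "burntsushi/regex=1.4.2"`.
/// The TOML key is the local crate name; the value names package and version.
#[derive(Debug, PartialEq)]
struct DepSpec<'a> {
    local_name: &'a str,
    package: &'a str,
    version: &'a str,
}

/// Hypothetical parser for the proposed shorthand (NOT real Cargo syntax).
fn parse_shorthand<'a>(local_name: &'a str, value: &'a str) -> Option<DepSpec<'a>> {
    // Split at the first '=' only, so an exact requirement written as
    // "pkg==1.4.2" still yields the version requirement "=1.4.2".
    let (package, version) = value.split_once('=')?;
    if package.is_empty() || version.is_empty() {
        return None;
    }
    Some(DepSpec { local_name, package, version })
}

fn main() {
    if let Some(dep) = parse_shorthand("regex", "burntsushi/regex=1.4.2") {
        println!("use {} as `{}`, version {}", dep.package, dep.local_name, dep.version);
    }
}
```

Either way, the code would refer to the crate by the key (regex), not by the namespaced package name.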

What you describe is already accepted by cargo.

Also, (say it with me,) package names are not lib names. Nothing prevents the package iqhbfiyfwne/ghoti from setting lib.name = "fish", so you declare your dep on the iqhbfiyfwne/ghoti package and have the fish crate available. (Well, except for clarity of what names are available, but then again, that's what the extended renaming syntax is for!)

I thought that two-part package names are not currently accepted, and would be added as a result of implementing this RFC? But this is tangential to what I am saying.

When I read this thread, it seemed that a lot of discussion revolved around how to map "foo/bar" = "1.0" onto the namespace that will be used in code (and how to avoid the - / _ typosquatting resulting from that).

And I don't understand why this mapping is even desirable. As I mentioned above, stuff like async/std -> async_std seems cherry-picked and would only be usable when the first level namespace is "nice". I'd expect the normal case to be uglyname/nicename, in which case most everyone would put nicename = { package = "uglyname/nicename" } in their Cargo.toml.

Also, I think that forced crate renaming would remove most of the incentive to name-squat "nice" namespaces on crates.io. How many name-squatters have you seen on GitHub? I'm sure there are some, but because user/org names don't appear anywhere except in the remote's URL, people care much less about them.

This is based on an incorrect understanding of what this pre-RFC is for, see the "Scope" section in the text. This RFC is not particularly for burntsushi/regex. We are not expecting people to use user-scoped namespaces, in fact we do not even autocreate user-scoped namespaces so it is not guaranteed that burntsushi will have access to burntsushi/* if someone else gets there first.

The idea is for projects to make namespaces, so serde can have serde/derive and serde/json as "official" subcrates. The expected use of this feature is for projects to be able to publish a set of crates without having to worry about their informal "namespace" being taken over.
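A toy model of the publishing rule described above, under assumptions that are mine rather than the proposal's exact wording: namespaces are claimed explicitly, first come, first served (nothing is auto-created per user); a name containing '/' may only be published by that namespace's owner; and names without '/' stay in the existing flat space. `Registry`, `claim_namespace`, and `publish` are hypothetical, not crates.io code:

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum PublishError {
    NameTaken,
    NamespaceNotClaimed,
    NamespaceOwnedByOther,
}

/// Hypothetical toy registry; not how crates.io is implemented.
#[derive(Default)]
struct Registry {
    namespace_owners: HashMap<String, String>, // namespace -> owning project
    crates: HashMap<String, String>,           // full crate name -> publisher
}

impl Registry {
    /// Assumed rule: namespaces are claimed explicitly, first come, first served.
    /// Nothing is created automatically for individual users.
    fn claim_namespace(&mut self, namespace: &str, owner: &str) -> bool {
        if self.namespace_owners.contains_key(namespace) {
            return false;
        }
        self.namespace_owners.insert(namespace.to_string(), owner.to_string());
        true
    }

    /// Assumed rule: `ns/name` may only be published by the owner of `ns`;
    /// names without '/' stay in today's flat, first-come namespace.
    fn publish(&mut self, name: &str, publisher: &str) -> Result<(), PublishError> {
        if self.crates.contains_key(name) {
            return Err(PublishError::NameTaken);
        }
        if let Some((namespace, _)) = name.split_once('/') {
            match self.namespace_owners.get(namespace) {
                None => return Err(PublishError::NamespaceNotClaimed),
                Some(owner) if owner != publisher => {
                    return Err(PublishError::NamespaceOwnedByOther)
                }
                Some(_) => {}
            }
        }
        self.crates.insert(name.to_string(), publisher.to_string());
        Ok(())
    }
}

fn main() {
    let mut registry = Registry::default();
    registry.claim_namespace("serde", "serde team");
    println!("{:?}", registry.publish("serde/json", "serde team")); // Ok(())
    println!("{:?}", registry.publish("serde/yaml", "mallory")); // Err(NamespaceOwnedByOther)
}
```

In this model, anyone can still publish a flat serde_yaml, but only the namespace owner can publish serde/yaml, which is the guarantee the serde/derive and serde/json example relies on.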

Though it says that "The focus of this proposal is the first problem only.", it seems that you are actually trying to solve the third one as well, otherwise I cannot explain why you'd want this mapping.

Drop this part and you won't have to deal with questions like typosquatting and "Does default = ["foo/std"] enable crate "foo/std", or feature "std" of crate "foo"?"
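To illustrate the ambiguity: today Cargo reads "foo/std" in a [features] table as feature std of dependency foo, but if package names may contain '/', the same string could also name a dependency called foo/std. A hypothetical sketch that lists every reading consistent with a given set of dependency names (`FeatureRef` and `interpretations` are made up for illustration):

```rust
/// The readings of a `[features]` entry like "foo/std".
#[derive(Debug, PartialEq)]
enum FeatureRef<'a> {
    /// Today's Cargo meaning: feature `feature` of dependency `dep`.
    DepFeature { dep: &'a str, feature: &'a str },
    /// Possible cratespace meaning: the dependency whose name is the whole string.
    Dependency(&'a str),
}

/// Hypothetical helper: every reading of `entry` that matches a declared dependency.
fn interpretations<'a>(entry: &'a str, deps: &[&str]) -> Vec<FeatureRef<'a>> {
    let mut readings = Vec::new();
    // Split at the first '/': "foo/std" -> dependency "foo", feature "std".
    if let Some((dep, feature)) = entry.split_once('/') {
        if deps.iter().any(|d| *d == dep) {
            readings.push(FeatureRef::DepFeature { dep, feature });
        }
    }
    // With '/' allowed in package names, the whole string may itself be a dependency.
    if deps.iter().any(|d| *d == entry) {
        readings.push(FeatureRef::Dependency(entry));
    }
    readings
}

fn main() {
    // Both "foo" and "foo/std" are declared, so the entry is ambiguous.
    for reading in interpretations("foo/std", &["foo", "foo/std"]) {
        println!("{:?}", reading);
    }
}
```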

We deliberately did not implement support for Cargo features rather than try to resolve the ambiguous syntax issue; features won't work with this prototype 🤷‍♀️

This experimental impl is deliberately ignoring features for now. Features will need a resolution eventually, but this impl is just about exploring cratespaces without having to find the perfect solution first.

I fail to see your point here. This is just a scheme for the default lib.name when lib.name is unspecified, the same way a package.name containing - is mapped to _ for lib.name. A package rayon/core needn't be versioned in lockstep with rayon; in fact, rayon_core must never have a breaking change and end up duplicated in the tree, whereas rayon can update independently (and in fact did, pre-1.0).
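A minimal sketch of that default-name scheme, assuming '/' is mapped to '_' just as Cargo already maps '-' to '_', and that an explicit lib.name always wins. `default_lib_name` and `lib_name` are hypothetical helpers, not Cargo APIs:

```rust
/// Hypothetical helper: the crate name rustc sees when `lib.name` is unset.
/// Cargo already maps '-' to '_'; mapping '/' to '_' is the cratespaces
/// scheme discussed in this thread (assumed here).
fn default_lib_name(package: &str) -> String {
    package.replace(|c: char| c == '-' || c == '/', "_")
}

/// An explicit `lib.name` always wins over the derived default.
fn lib_name(package: &str, explicit: Option<&str>) -> String {
    match explicit {
        Some(name) => name.to_string(),
        None => default_lib_name(package),
    }
}

fn main() {
    println!("{}", lib_name("rayon/core", None)); // rayon_core
    println!("{}", lib_name("iqhbfiyfwne/ghoti", Some("fish"))); // fish
}
```

Note that "foo/bar", "foo-bar", and "foo_bar" all default to the same crate name, foo_bar, which is the - / _ typosquatting concern raised earlier in the thread.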

Other examples where "yes, this is part of the same project, you can trust it" applies, but "give it all as a lib package" does not:

  • serde and serde_json, perhaps others in the future, erased_serde, serde_yaml, etc
  • rand and its officially provided distributions, rand_chacha, rand_pcg, rand_xoshiro, etc
  • (maybe) num and its (individually versioned!) subcrates, num_bigint, num_complex, etc

Currently, I don't know whether those crates are owned, maintained, and endorsed by the parent package that they are support for, unless I go look at who has publishing permission and try to correlate it. (Just the repo link being to the org is not enough! It could be a spoof/typosquat.)

With official cratespace support, these crates can be official external support for the main crate, without any doubt that they can be (about) as trusted as the main crate.

No? The proposal doesn't handle simultaneous versioning at all. Yes, I reused serde/derive in my example, but for a better example, the icu project will have a lot of subcrates, like icu-locale, etc. These are to be individually versioned and are useful on their own (you do not need to pull in all of icu for some functionality). However, the team would like people to be able to trust the icu/* namespace as a way to get icu-* crates published by the team without having to check the author fields (which nobody ever does).

I've repeatedly seen this kind of ask from various projects.

The problem still exists? Now the crate foo/bar can be used to squat bar.

In case there is any doubt about the importance or urgency of this RFC, I think this research underlines the need:

The proposed RFC could help alleviate the issue as private organisations would have myorg/project and would not be hijacked by someone else trying to pollute the global crate space.

While projects using Cargo are not immune to supply-chain attacks, I do not believe the Rust ecosystem is affected by that exact issue.

The issue is caused by other package managers having packages from different registries in the same namespace. Cargo already forces crates to be namespaced by registry, and it's not possible to depend on a crate hosted outside of crates.io without specifying which registry it comes from:

[dependencies]
# Only crates.io is searched:
serde = "1.0"
# Only the "example-com" registry is searched:
secret-sauce = { version = "1.0", registry = "example-com" }
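A toy model of that lookup rule, assuming every index entry is keyed by (registry, crate name) and a dependency without a registry key consults only crates.io. `Dependency`, `resolve`, and the "crates-io" label are hypothetical, not Cargo internals:

```rust
use std::collections::HashMap;

// Hypothetical label for the default registry in this toy model.
const CRATES_IO: &str = "crates-io";

/// A dependency as written in Cargo.toml: a name plus an optional `registry` key.
struct Dependency<'a> {
    name: &'a str,
    registry: Option<&'a str>,
}

/// Looks the dependency up only in its own registry (crates.io if none is given).
/// The index maps (registry, crate name) to a description of who published it.
fn resolve<'a>(
    index: &'a HashMap<(String, String), String>,
    dep: &Dependency<'_>,
) -> Option<&'a String> {
    let registry = dep.registry.unwrap_or(CRATES_IO);
    index.get(&(registry.to_string(), dep.name.to_string()))
}

fn main() {
    let mut index = HashMap::new();
    index.insert(
        ("example-com".to_string(), "secret-sauce".to_string()),
        "internal team".to_string(),
    );
    // An attacker publishes the leaked name to crates.io.
    index.insert(
        (CRATES_IO.to_string(), "secret-sauce".to_string()),
        "attacker".to_string(),
    );

    let dep = Dependency { name: "secret-sauce", registry: Some("example-com") };
    println!("{:?}", resolve(&index, &dep)); // Some("internal team")
}
```

In this model, a same-named crate on crates.io never satisfies a dependency that names another registry, and a dependency that forgets the registry key simply fails to resolve unless crates.io happens to have that name.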

What this Pre-RFC is proposing is adding namespaces inside a single registry. There are a lot of good reasons to want that, but (thankfully!) preventing this attack is not one of them. 🙂

Private organizations would not be publishing private libraries on crates.io as myorg/project. There is no way to post private libraries on crates.io. Private registries can already implement namespacing (such as forcing all crate names to begin with companyname-) if they so choose.

But what Pietro points out is more important. The way this attack worked, the author searched for leaked names of private packages that were already in use internally, and then published libraries with those names to the public registry.

If a developer on a private project tries to depend on a crate from a private registry but forgets to specify the registry attribute, and the crate doesn't exist on crates.io, their project won't compile and they'll immediately fix the problem.

If their Cargo.toml containing dependencies with registry attributes is somehow leaked and someone publishes a crate with the same name as one of those private dependencies to crates.io, as Pietro points out, the internal project will never install the crates.io version of secret-sauce.

The problem I could see happening is this: a company forks an existing public crates.io project into its internal registry, and everyone inside the company is supposed to always use the fork. If someone adds that library as a new dependency to some project but forgets the registry attribute, they'll get the existing public version. That is a different scenario from the one the blog post outlines. I'm not sure how you could find or exploit this scenario; perhaps someone smarter than me can figure out a way. Presumably the company forked the crate for a reason, so the public crate wouldn't have whatever was needed and the project wouldn't work.

By the way, there is another thread about that attack in relation to Cargo on the Rust users forum: "Dependency confusion attack — may be applicable to alternative registries" (post #4 by 2e71828).