Pre-RFC: Packages as Optional Namespaces

There's now a prototype available to try out of what I've decided to call "cratespaces" (name to be bikeshed, of course), which uses / as the separator!

Many thanks to @stephanbuys, Justin Wernick, and @tshepang for building this!

I think the Cargo.toml syntax part of this proposal is problematic: translating "foo/bar" = "1.0.0" to a foo_bar namespace for rustc only works if the org/individual lucked out and secured a relevant prefix for their project.

Imagine if @burntsushi did not register regex early on and had to name it "burntsushi/regex" instead. Much as I admire his work, I wouldn't want to use burntsushi_regex:: in my code. No, I'd still want the namespace to be just regex.

So how about this: we keep the left side as the local crate name, and instead add namespace on the right side:

regex = { package="burntsushi/regex", version="1.4.2" }

If that's too many characters to type, we could allow combining crate name and version into a single string, for example:

regex = "burntsushi/regex=1.4.2"

What you describe is already accepted by cargo.

Also, (say it with me,) package names are not lib names. Nothing prevents the package iqhbfiyfwne/ghoti from setting lib.name = "fish", so you declare your dep on the iqhbfiyfwne/ghoti package and have the fish crate available. (Well, except for clarity of what names are available, but then again, that's what the extended renaming syntax is for!)

I thought that two-part package names are not currently accepted, and would be added as a result of implementing this RFC? But this is tangential to what I am saying.

When I read this thread, it seemed that a lot of the discussion revolved around how to map "foo/bar" = "1.0" onto the namespace that will be used in code (and how to avoid the - / _ typosquatting resulting from that).

And I don't understand why this mapping is even desirable. As I mentioned above, stuff like async/std -> async_std seems cherry-picked and would only be usable when the first level namespace is "nice". I'd expect the normal case to be uglyname/nicename, in which case most everyone would put nicename = { package = "uglyname/nicename" } in their Cargo.toml.

Also, I think that forced crate renaming would remove most of the incentive to name-squat "nice" namespaces on crates.io. How many name-squatters have you seen on GitHub? I'm sure there are some, but because user/org names don't appear anywhere except in the remote's URL, people care much less about them.

This is based on an incorrect understanding of what this pre-RFC is for; see the "Scope" section in the text. This RFC is not particularly for burntsushi/regex. We are not expecting people to use user-scoped namespaces; in fact, we do not even autocreate user-scoped namespaces, so it is not guaranteed that burntsushi will have access to burntsushi/* if someone else gets there first.

The idea is for projects to make namespaces, so serde can have serde/derive and serde/json as "official" subcrates. The expected use of this feature is for projects to be able to publish a set of crates without having to worry about their informal "namespace" being taken over.

Though it says that "The focus of this proposal is the first problem only.", it seems that you are actually trying to solve the third one as well, otherwise I cannot explain why you'd want this mapping.

Drop this part and you won't have to deal with questions like typosquatting and "Does default = ["foo/std"] enable crate "foo/std", or feature "std" of crate "foo"?"
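To make the ambiguity in that question concrete, here is a minimal Rust sketch. The `Reading` enum and `possible_readings` function are hypothetical names invented for illustration; they are not part of Cargo or of the prototype. Today Cargo reads "foo/std" in a feature list as feature "std" of dependency "foo"; the second reading only becomes possible if package names may contain `/`.

```rust
/// The two ways a feature-list entry such as "foo/std" could be read if
/// package names were allowed to contain `/`. Illustrative only.
#[derive(Debug, PartialEq)]
enum Reading {
    /// Today's Cargo meaning: feature `feature` of dependency `dep`.
    DependencyFeature { dep: String, feature: String },
    /// Hypothetical meaning under namespaces: the crate `namespace/name`.
    NamespacedCrate { namespace: String, name: String },
}

/// Returns every reading of `entry` that is syntactically possible.
/// Entries without exactly one `/` are outside this sketch and yield nothing.
fn possible_readings(entry: &str) -> Vec<Reading> {
    match entry.split_once('/') {
        Some((left, right))
            if !left.is_empty() && !right.is_empty() && !right.contains('/') =>
        {
            vec![
                Reading::DependencyFeature {
                    dep: left.to_string(),
                    feature: right.to_string(),
                },
                Reading::NamespacedCrate {
                    namespace: left.to_string(),
                    name: right.to_string(),
                },
            ]
        }
        _ => Vec::new(),
    }
}
```

Calling `possible_readings("foo/std")` yields both readings, which is exactly why a real implementation would need a disambiguation rule or a different syntax.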

We deliberately did not implement support for Cargo features to try and resolve the ambiguous syntax issue; they won't work with this prototype :woman_shrugging:

This experimental impl deliberately is ignoring features for now. It will have to have a resolution, but this experimental impl is just about exploring the cratespaces impl without having to find the perfect solution first.

I fail to see your point here. This is just a scheme for the default lib.name if lib.name is unspecified, the same way a package.name with - is mapped to _ for lib.name. A package rayon/core needn't be simulversioned with rayon; in fact, rayon_core needs to never have a breaking change and be duplicated in-tree, whereas rayon can update independently (and in fact did pre-1.0).
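As a rough illustration of that default-name scheme, here is a small Rust sketch. The function name `default_lib_name` is made up for this example, and mapping `/` to `_` reflects the prototype's behavior as described in this thread, not anything shipped in Cargo.

```rust
/// Hypothetical helper: derive the default library name for a package when
/// `lib.name` is not set. Cargo already maps `-` to `_`; mapping `/` to `_`
/// is the scheme discussed in this thread, assumed here for illustration.
fn default_lib_name(package_name: &str) -> String {
    package_name
        .chars()
        .map(|c| if c == '-' || c == '/' { '_' } else { c })
        .collect()
}
```

Under this sketch, `rayon/core` defaults to `rayon_core`. Note that `foo/bar`, `foo-bar`, and `foo_bar` all produce the same default, which is the - / _ collision concern raised earlier in the thread; it only affects the default, since a package can still set `lib.name` explicitly.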

Other examples where the "yes this is part of the same project, you can trust it" applies, but the "give it all as a lib package" does not:

  • serde and serde_json, perhaps others in the future, erased_serde, serde_yaml, etc
  • rand and its officially provided distributions, rand_chacha, rand_pcg, rand_xoshiro, etc
  • (maybe) num and its (individually versioned!) subcrates, num_bigint, num_complex, etc

Currently, I don't know whether those crates are owned, maintained, and endorsed by the parent package that they support, unless I go look at who has publishing permission and try to correlate it. (Just the repo link being to the org is not enough! It could be a spoof/typosquat.)

With official cratespace support, these crates can be official external support for the main crate without any doubt that they can be (about) as trusted as the main crate.

No? The proposal doesn't handle simultaneous versioning at all. Yes, I reused serde/derive in my example, but for a better example, the icu project will have a lot of subcrates, like icu-locale, etc. These are to be individually versioned and are useful on their own (you do not need to pull in all of icu for some functionality). However, the team would like people to be able to trust the icu/* namespace as a way to get icu-* crates published by the team without having to check the author fields (which nobody ever does).

I've repeatedly seen this kind of ask from various projects.

The problem still exists? Now the crate foo/bar can be used to squat bar.

In case there is any doubt on the importance or urgency of this RFC, I think this research underlines the need:

The proposed RFC could help alleviate the issue as private organisations would have myorg/project and would not be hijacked by someone else trying to pollute the global crate space.

While projects using Cargo are not immune to supply-chain attacks, I do not believe the Rust ecosystem is affected by that exact issue.

The issue is caused by other package managers having packages from different registries in the same namespace. Cargo already forces crates to be namespaced by registry, and it's not possible to depend on a crate hosted outside of crates.io without specifying which registry it comes from:

[dependencies]
# Only crates.io is searched:
serde = "1.0"
# Only the "example-com" registry is searched:
secret-sauce = { version = "1.0", registry = "example-com" }

What this Pre-RFC is proposing is adding namespaces inside a single registry. There are a lot of good reasons to want that, but (thankfully!) preventing this attack is not one of them. :slightly_smiling_face:

Private organizations would not be publishing private libraries on crates.io as myorg/project. There is no way to post private libraries on crates.io. Private registries can already implement namespacing (such as forcing all crate names to begin with companyname-) if they so choose.

But what Pietro points out is more important. The way this attack worked, the author searched for leaked names of private packages that were already in use internally, and then published libraries with those names to the public registry.

If a developer on a private project tries to depend on a crate from a private registry but forgets to specify the registry attribute, and the crate doesn't exist on crates.io, their project won't compile and they'll immediately fix the problem.

If their Cargo.toml containing dependencies with registry attributes is somehow leaked and someone publishes a crate with the same name as one of those private dependencies to crates.io, as Pietro points out, the internal project will never install the crates.io version of secret-sauce.

The problem I could see happening is if a company forks an existing public crates.io project to their internal registry, and everyone inside that company is always supposed to use the fork, and someone goes to add that library as a new dependency to some project but forgets the registry attribute, then they'll get the existing public version, which is a different scenario than the blog post outlines. I'm not sure how you could find or exploit this scenario; perhaps someone smarter than me can figure out a way. Presumably, the company would have forked the crate for a reason, and the public crate wouldn't have whatever was needed so the project wouldn't work.
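The lookup rule discussed above can be modeled in a few lines of Rust. This is a toy model, not Cargo's resolver: the `Dependency` struct, the `Registries` alias, the `resolve` function, and the `DEFAULT_REGISTRY` constant are all hypothetical names, and each registry is reduced to a set of crate names.

```rust
use std::collections::{HashMap, HashSet};

/// Name used in this sketch for the default registry (illustrative constant).
const DEFAULT_REGISTRY: &str = "crates-io";

/// A dependency as declared in a manifest: a crate name plus an optional
/// `registry` attribute.
struct Dependency<'a> {
    name: &'a str,
    registry: Option<&'a str>,
}

/// Toy model: registry name -> set of crate names published there.
type Registries<'a> = HashMap<&'a str, HashSet<&'a str>>;

/// Returns the one registry the dependency resolves from, or `None` if that
/// registry does not have the crate. No other registry is ever consulted.
fn resolve<'a>(dep: &Dependency<'a>, registries: &Registries<'a>) -> Option<&'a str> {
    let registry = dep.registry.unwrap_or(DEFAULT_REGISTRY);
    registries
        .get(registry)
        .filter(|crates| crates.contains(dep.name))
        .map(|_| registry)
}
```

In this model, a dependency on secret-sauce with `registry: Some("example-com")` keeps resolving to example-com even after someone publishes a secret-sauce crate to the default registry. Only the declaration that forgot the attribute is affected: it fails to resolve while the public name is free, and picks up the public crate once that name exists.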

By the way, there is another thread about that attack in relation to Cargo, on the Rust users forum: Dependency confusion attack — may be applicable to alternative registries - #4 by 2e71828 - The Rust Programming Language Forum

While the specific attack isn't applicable to Cargo, I think it is a good illustration of the problem with vulnerabilities based on supply-chain attacks, typosquatting in particular: they accumulate quietly for a while, because attacks aren't really feasible in a small ecosystem; and when attacks do happen, the ecosystem has become too big to change course quickly.

The researcher mentioned made over $130,000 in less than 6 months, from bug bounties alone. Given the high-profile nature of the targets, one shudders to imagine how much he could have made by selling these vulnerabilities to malicious actors; or how many similar attacks exist undetected in the wild, camouflaged as bugs.

Granted, the supply-chain attacks Cargo is vulnerable to are subtler and harder to exploit. But they very much exist, and the only thing keeping them from being exploited is that Rust isn't used on the same scale Node or Ruby is. This will change sooner rather than later.

Rust seriously needs to have a stronger supply-chain-security story. Part of that story would be better tools to control the capabilities given to dependencies (e.g. forbid unsafe code in dependencies, forbid arbitrary system calls or filesystem access in dependencies, WASI-style); and a large part of it would be strong defenses against typo-squatting.

Unfortunately, I don't think this pre-RFC does enough in that regard.

Fortunately, typosquatting can be detected in the registry. There were some plans for it:

And I have high hopes that if we integrate cargo-edit with Cargo proper, then it could have advanced "did you mean?" functionality that checks for typos and bad crates.

For serious protection against supply chain attacks, I would like more people to contribute code reviews to cargo-crev:

Once there's enough reviews, it will be possible to make a source replacement registry that contains only trusted, reviewed crates, so that it would be impossible for Cargo to fetch an untrusted crate (I've experimented with one that contains only crates compatible with old Rust compilers).

Preventing typo-squatting is not in the scope of this Pre-RFC, and I don't think that crate namespaces are the right tool to prevent typo-squatting anyway.

One possible solution is to enforce a minimum Hamming distance of 2 between any two crate names or namespaces.

That would be a serious restriction on naming crates. Even something like net2 couldn't exist, as the (unused by the looks of it) net crate also exists. Needless to say it couldn't be done retroactively, either.
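To see how restrictive such a rule would be, here is a minimal Rust sketch. Hamming distance is only defined for strings of equal length, so for pairs like net and net2 this sketch substitutes Levenshtein edit distance; that substitution is an assumption of the example, not something the proposal specifies. The `name_allowed` rule is likewise hypothetical.

```rust
/// Levenshtein edit distance between two names (insertions, deletions and
/// substitutions each cost 1), computed with a single rolling table row.
fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // row[j] holds the distance between the processed prefix of `a` and b[..j].
    let mut row: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        let mut diagonal = row[0]; // previous row, previous column
        row[0] = i;
        for j in 1..=b.len() {
            let above = row[j]; // previous row, same column
            let substitution = diagonal + usize::from(a[i - 1] != b[j - 1]);
            row[j] = substitution.min(above + 1).min(row[j - 1] + 1);
            diagonal = above;
        }
    }
    row[b.len()]
}

/// Hypothetical registry rule: a new name is accepted only if it is at
/// least `min_distance` edits away from every existing name.
fn name_allowed(candidate: &str, existing: &[&str], min_distance: usize) -> bool {
    existing
        .iter()
        .all(|name| edit_distance(candidate, name) >= min_distance)
}
```

With a minimum distance of 2, `name_allowed("net2", &["net"], 2)` is false because the two names are one insertion apart, which matches the objection above.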

If we are talking about supply-chain attacks, then a more fundamental solution would be to implement a proper TUF-like framework. We need Cargo to cryptographically ensure that compiled packages indeed come from trusted authors or were reviewed by someone trusted. In the example of a company registry, all private packages would be signed by a company key, so even if someone managed to sneak in a malicious package, Cargo would refuse to compile it, since it would not be signed by a key the developer trusts.

On the supply chain security front, please also check out cargo-supply-chain, which is almost ready for an initial release:
