[Pre-RFC] DNS domains as package namespaces

Package namespacing is one of those perennial topics that seems to attract designs but not much implementation. I took my own preferred approach (DNS domains) and got some robots to implement a proof-of-concept.

The code isn't great (and the robots hallucinate various minor details) but overall it's enough to get a feel for how DNS-based namespacing would work.

Cargo branch (raw LLM output): jmillikin/upstream__cargo at domain-namespaces-raw-llm-output (GitHub)

crates.io branch (raw LLM output): jmillikin/upstream__crates.io at domain-namespaces-raw-llm-output (GitHub)

I want to get a feeling for how the Cargo / crates.io team feel about the overall concept before I put any effort into cleaning up the proof-of-concept code.

  • My impression from past discussions is that those teams have historically objected to namespaces for aesthetic reasons ("why name your crate boring.com/base64 when you could just name it base64inator9000?").
  • But also opinions may currently be different than they were a year or two ago.

Rough outline of the design:

  • Package names are allowed (not required) to start with a {namespace}/ prefix, where the namespace is a DNS-style dotted name like example.com or jdoe.github.io.
  • This is designed to address name squatting ("I want to publish my wasm crate but there's already an empty crate named wasm").
    • Control over subsets of the global namespace ("My project is named cool so I want to prevent someone else from publishing a crate named cool-wasm") is out of scope.
    • You can use your own domain as a namespace if you care about this, but it doesn't prevent someone else from registering packages with a given prefix either in the global namespace or their own domain.
  • Cargo strips off the namespace prefix before passing the crate name to rustc, so example.com/base64 is base64 to the compiler.
    • If you want to depend on base64 and example.com/base64 in the same crate then you'll need to use existing Cargo mechanisms to rename one in the local compilation context.
  • When a crate is first created on crates.io, if it has a namespace then that namespace will be interpreted as a DNS name and verified by fetching a list of authorized user keys.
    • The URL of the file is https://{namespace}/.well-known/rust-lang.org/package-namespace.json.
    • This file contains a list of keys that identify who is allowed to create packages in that namespace.
    • The key format is determined by the registry. In the case of crates.io it's computed as approximately sha256((namespace, user_id)) -- see gen-auth-key.py for details. Note that user_id is part of existing API responses, so no part of this key is secret.
  • The use of https://{namespace}/.well-known/ as opposed to DNS TXT records (etc) is to enable users to publish namespaced crates without having to first pay for a domain.
    • Specifically they can put something on {username}.github.io.
  • Domain verification only happens when a crate is first created. Subsequent uploads use the existing permissions (either a crate owner, or a TrustedPub key).
    • This means that if control of a domain is lost then the new owner can't use that access to publish new versions of existing crates.
  • Whenever the crate name needs to be encoded somewhere that a / is semantically meaningful (URLs, file paths) it gets encoded to %2F.
    • This includes the paths in source archives.
    • Yes this means the crates.io URLs for namespaced crates are kinda ugly. I think this doesn't matter.
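
To make the naming rules above concrete, here is a minimal Python sketch. It is not the proof-of-concept's code: the function names are hypothetical, and treating everything before the first / as the namespace is my assumption based on the outline.

```python
from urllib.parse import quote


def split_package_name(name):
    """Split 'example.com/base64' into ('example.com', 'base64').

    Non-namespaced names such as 'base64' return (None, 'base64').
    ASSUMPTION: the namespace is everything before the first '/'.
    """
    namespace, sep, crate = name.partition("/")
    if not sep:
        return None, name
    return namespace, crate


def well_known_url(namespace):
    # URL pattern taken from the outline above.
    return f"https://{namespace}/.well-known/rust-lang.org/package-namespace.json"


def encode_for_path(name):
    # Encode '/' as %2F wherever a slash is semantically meaningful
    # (URLs, file paths, paths inside source archives).
    return quote(name, safe="")


namespace, crate = split_package_name("example.com/base64")
print(namespace, crate)                      # example.com base64
print(encode_for_path("example.com/hello"))  # example.com%2Fhello
```

The second element of the tuple is what Cargo would hand to rustc, so both base64 and example.com/base64 show up as base64 unless one of them is renamed locally.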

Example package-namespace.json:

{
	"authorized_keys": [
		"sha256-kdRCstHTc6I8OxwyuW+QC8gFkn4tT8/g4a0M//cvCXk="
	]
}
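
A rough sketch of how a registry might check a user against this file. The exact input encoding of the key is defined by gen-auth-key.py and is not reproduced here; joining the two fields with a newline is a placeholder assumption, and both function names are hypothetical. Only the overall shape (SHA-256, base64, a "sha256-" prefix, a lookup in authorized_keys) follows the description above.

```python
import base64
import hashlib
import json


def auth_key(namespace, user_id):
    # ASSUMPTION: the real byte encoding of (namespace, user_id) lives in
    # gen-auth-key.py; the newline-joined form here is only a placeholder.
    payload = f"{namespace}\n{user_id}".encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    return "sha256-" + base64.b64encode(digest).decode("ascii")


def is_authorized(namespace_json, namespace, user_id):
    """Check whether the user's key is listed in package-namespace.json."""
    doc = json.loads(namespace_json)
    return auth_key(namespace, user_id) in doc.get("authorized_keys", [])
```

Because the key only depends on public values, the domain owner can compute it for any account they want to authorize without that account sharing a secret.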

Example Cargo.toml:

cargo-features = ["domain-namespaces"]

[package]
name = "example.com/hello"
version = "0.1.0"
edition = "2024"

description = "Hello, world!"
license = "0BSD"
repository = "https://github.com/example/hello"
homepage = "https://example.com/hello"
documentation = "https://example.com/hello-docs"

Please review Survey of organizational ownership and registry namespace designs for Cargo and Crates.io and the sub-threads

I've read those thoroughly, both when they were originally posted and before posting this thread. As you may remember, the thread I posted in 2020 asking for the same feature (Pre-RFC: User namespaces on crates.io) was one of the "past discussions" linked in your survey.

Is there anything in particular you'd like to draw my attention to?

If so, it would be good to acknowledge the prior discussions and highlight what is different. Besides how the namespaces are verified, this looked like someone just retreading existing ground, including vague references to requirements when they are clearly written out.

I think it would be premature to write up the "alternatives considered" section of the RFC before knowing whether the Cargo and crates.io teams are OK with the concept of namespaces.

After all, the entire thread would just end if someone from one of those teams posts "no we still don't want namespaces, ask again in five years".

The bullet points in the first post are faster to read than lengthy prose, and certainly faster to write.

If the relevant teams are interested in proceeding to a full RFC then don't worry, you'll have plenty of reading material. My Obsidian tag on this topic (collecting known namespacing designs + considering their applicability to Rust) runs to several tens of thousands of lines.

I think there is a misunderstanding. We are not blocked on an implementation. We are not short on ideas. What is needed is someone to have an idea and explain how it solves the problems that have previously been uncovered, or make a convincing case for why we should flex on some of them.

I'm not looking for an RFC or full alternatives section. Long prose documents can easily lose people. What I would find helpful, for this to be worth bringing to others' attention, is an analysis of this against the requirements and a contrast with the DNS case and its analysis.

From my own work on writing all of that up, I have strong doubts about DNS namespacing. I'm personally split between a simple identifier namespace or organizational tagging.

Augmenting my analysis with this or doing your own would also be a great help for this effort.

OK, I'll write this in the same style as the survey. Maybe that will help with the communication.

(edit: I've been writing this for several hours now and the prose is drifting from polite formal to four-beers-guy-at-the-pub, sorry)

Use cases

Library author

An individual has written a Rust library that has the same purpose (e.g. implements the same protocol or encodes/decodes the same format) as a library that already exists. Reasons include a different design focus (e.g. support for no_std vs aiming for an ergonomic std-integrated API), different API design ideas, or a belief that they could write a better implementation.

They would like to publish their library to crates.io, but the current naming policy (= lack of namespaces) prevents them from doing so. So they push their code to GitHub, don't tag it since there's no point, and set a timer to check back in a year to see if the situation has improved.

Examples:

  • I've written an implementation of FUSE, but I can't publish it on crates.io because there is already a FUSE library published (fuse).
  • I've written an implementation of the SANE ABI and network protocol, but I can't publish it on crates.io because there is already a library for something called SANE (sane) which appears to be an unrelated project.
  • I am currently working on a pure Rust implementation of SLEIGH, which I won't be able to publish on crates.io because there is already a binding to the Ghidra implementation of SLEIGH published (sleigh).

Umbrella project

I think this is what your survey calls "organizational ownership". Basically some group of people are working on a project and they've got a bunch of related crates, but sometimes when they create a new crate they find the name is already in use by someone else. They'd like a way to guarantee a lack of name collision within the scope of their project, and maybe some sort of reputation hint.

Example: Cranelift

The Cranelift project consists of dozens of libraries written by The Bytecode Alliance. Some of them are published together on a lock-step version scheme (cranelift directly depends on cranelift-codegen etc), others are published on their own schedule (cranelift-preopt, cranelift-simplejit).

If they ever want to publish a subcomponent named "PVM" then they'll be in trouble, because cranelift-pvm is a v0.0.0 published by some unrelated person. In this particular case it's a v0.0.0 so maybe the anti-squatting rule could be applied, but in principle there's no reason that the same package name couldn't be in actual use.

With domain namespaces the Cranelift project could have cranelift as the top-level crate and publish their various subcomponents with names like cranelift.dev/cranelift-simplejit (or just cranelift.dev/simplejit).

Example: Serde

The Serde project has serde (+ serde-core and serde-derive), serde-json, and some deprecated crates that aren't showing up in cargo search so I'm gonna ignore them but I know that set contains at least serde-yaml.

There's also a bunch of other packages on crates.io with names like serde-rson and serde-reflection that are published by people unrelated to the Serde project and provide some sort of Serde-related functionality.

This is a pretty common pattern in the Rust ecosystem, and I don't personally see anything wrong with it, but it's been a common theme in earlier forum posts about namespaces. Basically when there's a project with a lot of ecosystem growth around it there's a desire to separate the {project}-* glob into "official" and "unofficial" libraries. Under the assumption that any project with that level of adoption probably has at least a basic .org / .net / .rs / etc homepage, the use of domains as a package namespace would solve the reputation issue (if the project maintainers care).

Assumptions

Most Rust programmers are not willing to depend on packages that are not hosted on crates.io because (1) Cargo doesn't support fetching source archives from non-registry sources (see rust-lang/cargo#16005) and (2) crates.io doesn't allow uploading packages that rely on dependencies hosted off of crates.io.

There are non-Cargo build systems that support building Rust code, but most Rust programmers are not willing to use non-Cargo build systems. Libraries that cannot be depended on from a Cargo project are inaccessible to most Rust programmers.

Most Rust programmers do not care very much about the exact format of a package name, because the only time they see it is when adding that dependency to their project. The name that is more relevant to a typical developer is the library name, which is what rustc exposes the library's API under.

To the extent a developer cares about a package name, it is whether that name clearly describes the package's functionality and is easy to remember. For this reason it is common for library packages to be named after whatever standard or protocol they implement, e.g. a library for parsing and serializing XML would be called xml, and a library that implements the HTTP/1.1 protocol would be called http or http1.

Most codebases only use one library per major unit of functionality, for example only one XML library and one HTTP library. Library names such as xml and http are generally unique within the context of a single project.

Prior art

I will use C/C++ and Go as the comparison, because those are the non-Rust languages I'm most used to working in.

C/C++

In C/C++ (the two languages are equivalent for package management) dependencies are typically identified as URLs to a source archive plus a checksum. There is no central registry, and anyone with access to HTTP hosting can publish libraries.

The configuration syntax to add a dependency varies between build systems, but typically looks something like this:

C/C++ dependency declaration for libpng (libpng.org)
http_archive(
  # The `name` is local to this build configuration, and so doesn't have to be
  # globally unique. Different users might call this `libpng`, `png`, `libpng_cc`,
  # or whatever else makes sense for them.
  name = "libpng",

  # The location to obtain the source archive from is provided directly as
  # HTTP(S) URL(s). Depending on the build system there may be some sort
  # of macro mechanism to reduce duplicate typing when there's many mirrors
  # of the same file.
  urls = [
    "https://download.sourceforge.net/libpng/libpng-1.6.58.tar.xz",
  ],

  # The identity of a dependency is essentially its source archive's checksum,
  # which is therefore embedded into the dependency list (not kept in a separate
  # `.lock` file as in Cargo).
  #
  # Traditionally the syntax was something like:
  #
  #    sha256 = "28eb403f51f0f7405249132cecfe82ea5c0ef97f1b32c5a65828814ae0d34775"
  #
  # but SRI (or ad-hoc extensions thereof) have become more popular as the
  # variety of checksum algorithms has increased.
  integrity = "sha256-KOtAP1Hw90BSSRMs7P6C6lwO+X8bMsWmWCiBSuDTR3U=",
)
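
The two checksum spellings in that comment are the same bytes in different encodings: the traditional form is hex, the SRI form is the algorithm name plus standard base64. A small Python sketch (the function name is mine) shows the conversion using the libpng values from the example:

```python
import base64


def hex_to_sri(algorithm, hex_digest):
    """Convert a hex checksum into an SRI-style 'algorithm-base64' string."""
    raw = bytes.fromhex(hex_digest)
    return f"{algorithm}-{base64.b64encode(raw).decode('ascii')}"


print(hex_to_sri(
    "sha256",
    "28eb403f51f0f7405249132cecfe82ea5c0ef97f1b32c5a65828814ae0d34775",
))
# sha256-KOtAP1Hw90BSSRMs7P6C6lwO+X8bMsWmWCiBSuDTR3U=
```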

And the dependencies are associated with build targets (each package can have many build targets) like this:

C/C++ build target for a simple library that depends on libpng
# This is roughly equivalent to a Cargo `[lib]` section, except there can
# be any number of libraries per package. Some C/C++ packages have
# only one library (libpng), some have a library and a binary (gzip), some
# have lots of both libraries and binaries (glib).
cc_library(
  name = "img_decoder",
  srcs = ["img_decoder.cc"],
  hdrs = ["img_decoder.hh"],
  deps = [
    # The syntax varies, but the essential point here is that the identifier
    # for a dependency is split into (local_name, build_target_path)
    # In this case: ("libpng", "path/to:libpng_target")
    "@libpng//path/to:libpng_target",

    # For ergonomics there's usually some sort of short syntax,
    # for example if a package has only a single build target in the
    # package root and that target's name is the same as the package
    # then its path can be elided.
    "@zlib",
  ],
)

In this model the identity of the package is decoupled from where its source archives are hosted, and it's common for source archives to be obtained from mirrors (either official or third-party).

On a theoretical level this could be viewed as a sort of distributed content-addressed storage, but in practical terms the identity is derived from its home page, which means it's derived from a DNS domain.

Thus anyone who can find a place to upload some tarballs can publish a C/C++ package and participate in the ecosystem on equal footing.

Go

The Go language has its own idiosyncratic dependency management system that offers excellent developer ergonomics but ignores every standard technology and common practice it possibly can, as per tradition.

The Go equivalent to a Cargo package is called a "module", and the Go equivalent to a Rust crate is called a "package". Each Go module has a go.mod file that declares its dependencies as (module_name, version) tuples, where the module name is a pseudo-URL. There is a companion file go.sum that is the equivalent of Cargo.lock. It stores checksums in an unknown format that scientists are still working hard to decipher, but is supposedly SHA256 computed over the actual source files (not the source archive).

Go dependency declaration for a simple binary

go.mod

module example.com/example/sometool

go 1.25.9

require (
	github.com/pmezard/go-difflib v1.0.0
	github.com/tetratelabs/wazero v1.11.0
	golang.org/x/sys v0.38.0
)

go.sum

github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/tetratelabs/wazero v1.11.0 h1:+gKemEuKCTevU4d7ZTzlsvgd1uaToIDtlQlmNbwqYhA=
github.com/tetratelabs/wazero v1.11.0/go.mod h1:eV28rsN8Q+xwjogd7f4/Pp4xFxO7uOGbLcD/LzB1wiU=
golang.org/x/sys v0.38.0 h1:3yZWxaJjBmCWXqhN1qh02AkOnCQ1poK6oF+a7xWL6Gc=
golang.org/x/sys v0.38.0/go.mod h1:OgkHotnGiDImocRcuBABYBEXf8A9a87e/uXjp9XT3ks=
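
For what it's worth, my understanding is that the h1: values correspond to the Hash1 scheme in golang.org/x/mod/sumdb/dirhash: hash each file, build a sorted summary of "hex digest, two spaces, file name" lines, then hash the summary and base64 the result. The Python below is a sketch of that understanding rather than a reference implementation, and the file names in the docstring are made up.

```python
import base64
import hashlib


def hash1(files):
    """Compute a go.sum-style 'h1:' hash.

    `files` maps archive-relative names (for example
    'example.com/mod@v1.0.0/go.mod') to their contents as bytes.
    """
    summary = hashlib.sha256()
    for name in sorted(files):
        file_hash = hashlib.sha256(files[name]).hexdigest()
        # One line per file: hex digest, two spaces, file name, newline.
        summary.update(f"{file_hash}  {name}\n".encode("utf-8"))
    return "h1:" + base64.b64encode(summary.digest()).decode("ascii")
```

The relevant point for this discussion is that the checksum covers file contents rather than the archive, so a ZIP from the proxy and a checkout from Git can hash to the same value.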

Go module names are pseudo-URLs in that they consist of a (domain, path) tuple but do not have a scheme. Example Go dependency names:

  • golang.org/x/sys is part of the Go extended standard library, conceptually similar to the libc crate maintained by the Rust team. golang.org is (obviously) a DNS domain name under the control of the Go project.
  • github.com/tetratelabs/wazero is a third-party library that uses GitHub for hosting, and relies on that hosting service's default go-import tag (see below) for an identity.
  • rsc.io/pdf is a third-party library written by Russ Cox, one of the lead developers of Go. rsc.io is a DNS domain name under his control.

The Go team hosts a service that mirrors upstream source archives ("module proxy"), records observed checksums ("sum db"), and publishes a feed of newly-recorded module versions ("index"). Source archives are ZIP files, which the proxy can either serve from an upstream server or assemble itself by fetching from the upstream version control system (i.e. Git).

Mapping module names to upstream hosting

Given a Go package name like "github.com/tetratelabs/wazero/api", the Go tooling needs to do two things to actually get the code:

  • Figure out how to split it into a (module_name, package_path) tuple -- in this case ("github.com/tetratelabs/wazero", "api").
  • Given a module name, figure out where the source is located and download it (or ask the proxy to do so).

Both of these tasks are achieved through metadata that can be looked up by fetching https://{module_name}?go-get=1 and examining the obtained HTML for a special <meta> tag. Yes really. Welcome to Go.

Examples of go-import meta tags

Most major source hosting providers offer the tag by default, for example GitHub at https://github.com/wazero/wazero:

<meta name="go-import" content="github.com/wazero/wazero git https://github.com/wazero/wazero.git">

People who want to publish modules under their own identity (not coupled to their hosting) write a meta tag similar to the one at https://rsc.io/pdf?go-get=1:

<meta name="go-import" content="rsc.io/pdf git https://github.com/rsc/pdf">

Both of those examples have the source being fetched from a Git repository (before being packaged into a ZIP by the proxy). If an upstream wants to serve source archives, the syntax looks like this:

<meta name="go-import" content="example.com/gopher mod https://modproxy.example.com">
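
A sketch of the lookup in Python, to show how little machinery is involved. It only parses an HTML string (no network access), the class and function names are hypothetical, and it simplifies by treating the go-import prefix as the module name, which ignores modules nested in subdirectories of a repository.

```python
from html.parser import HTMLParser


class GoImportParser(HTMLParser):
    """Collect (prefix, vcs, repo_url) triples from go-import meta tags."""

    def __init__(self):
        super().__init__()
        self.imports = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "go-import":
            fields = (attrs.get("content") or "").split()
            if len(fields) == 3:
                self.imports.append(tuple(fields))


def resolve(import_path, html):
    """Return (module_name, package_path, vcs, repo_url), or None."""
    parser = GoImportParser()
    parser.feed(html)
    for prefix, vcs, repo_url in parser.imports:
        if import_path == prefix or import_path.startswith(prefix + "/"):
            package_path = import_path[len(prefix):].lstrip("/")
            return prefix, package_path, vcs, repo_url
    return None


page = '<meta name="go-import" content="rsc.io/pdf git https://github.com/rsc/pdf">'
print(resolve("rsc.io/pdf", page))
# ('rsc.io/pdf', '', 'git', 'https://github.com/rsc/pdf')
```

The same tag answers both questions from the list above: the prefix gives the (module_name, package_path) split, and the remaining two fields say where the source lives.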

The go.mod file doesn't offer a way to map the full module name to a short local name. Each Go file's import section needs to fully name each dependency.

A Go file's import section
package main

import (
	// By convention the final name of the path is used as the symbol for this
	// dependency, so `wazero.Runtime` and `api.Function`.
	//
	// This is equivalent to the pseudo-Rust:
	//   extern crate "github.com/tetratelabs/wazero" as wazero;
	//   extern crate "github.com/tetratelabs/wazero/api" as api;
	"github.com/tetratelabs/wazero"
	"github.com/tetratelabs/wazero/api"

	// However the symbol is actually derived from the dependency's `package`
	// statement, similar to Cargo's `lib.name` overriding `package.name`. This
	// allows Go packages to have unpredictable symbols at import time, but
	// is considered rude.
	//
	// Pseudo-Rust:
	//   extern crate "example.com/naughty-package" as unrelated; // ???
	"example.com/naughty-package"

	// For the benefit of readers it's considered good practice to explicitly
	// specify a local name when importing a package with a final path component
	// that is obviously not a valid identifier.
	//
	// Pseudo-Rust:
	//   extern crate "github.com/goccy/go-yaml" as yaml;
	yaml "github.com/goccy/go-yaml"
)

Overview of namespace design options

I'm not going to paste my entire notes here but just the highlights.

Requirements established by the Cargo / crates.io team

  1. The namespace design must be backwards-compatible. It must be possible to create and use non-namespaced packages with the same rules as today.
  2. Packages may not be renamed.
  3. Changing ownership of a crate must require affirmative action by the current crate owner, as determined by the crates.io database.
  4. Package names should be human-meaningful, without numeric IDs or UUIDs.
  5. Introducing namespaces should not cause a perception of unfairness in the crates.io ecosystem.
    a. Publishing a namespaced crate should not require resources that cost money beyond that of a non-namespaced crate (e.g. a top-level domain name).
    b. There should not be a perception that namespaced crates are inferior or lesser than a non-namespaced crate of the same name.
  6. Package identity should not be coupled to a third-party platform such as GitHub.
  7. There should not be a substantial increase in customer-support load imposed on the crates.io administrators.
  8. It should be possible to transfer ownership of a crate without breaking dependent crates.
  9. If a package is transferred to a new owner then some process exists by which the owners of dependent crates can be notified of and review the ownership transfer before new updates are accepted.
  10. The name of a crate should not mislead regarding which individual or organization is responsible for it.
  11. The possibility of two packages having the same Rust identifier is considered undesirable and should be designed against.
  12. Using sigils that are reserved characters in URLs is undesirable.
  13. Sigils must not interfere with existing syntaxes derived from package names (PackageId, Cargo feature names).

Notes on usernames

Most packages are published by individuals so there's a temptation to use a human-meaningful individual identity, whether a crates.io account username, a GitHub username, a verified email, etc. This is how the most popular package hosting service in the world works (GitHub), so it's not completely unreasonable as a starting point.

The main objection is that people often use their real-life name for a username, and sometimes they want to change their names.

Somewhat less obviously, GitHub usernames are mutable. Today's trustworthy github.com/jdoe might be github.com/jsmith tomorrow, and some new person is github.com/jdoe with unknown intentions. A user's GitHub account does have a stable identifier, but it's an opaque integer.

Emails are used by crates.io today and must be verified before publishing, but they have similar drawbacks to usernames in that individuals may want to change their email address without losing access to their crates.

However, if you think about it emails are basically just a (username, domain_name) tuple. In fact so are GitHub usernames (ignoring the mutability). All forms of namespacing based on usernames have an implicit DNS domain name in them.

Notes on where namespaces attach

A site like GitHub attaches the namespace to the username. All of a user's personal repositories are published under https://github.com/{username}/. This does not necessarily have to be true in crates.io, and in fact if a user is allowed to be an owner of crates with different prefixes then it simplifies some of the questions around transfer of ownership.

The options are:

  • One prefix per account (GitHub model)
    • One less obvious drawback to this is that a user would only be able to have a single package with any given crate name. It's not obvious that someone would want to be the owner of multiple yaml crates, but maybe they've got some personal circumstances that have led them to such a situation.
  • One prefix per crate, which is common in some ecosystems that use UUIDs for package identity, but UUIDs have already been ruled out by the requirements.
    • Random per-crate prefixes would also interfere with mirroring/replication of crates between registries.
  • Some external source of truth for prefixes, which get associated with a crate in an N:M graph.
    • Of course the obvious answer when asked for a "globally unique human-readable identifier" is a DNS domain name.

What's the story for organizations?

Organizations (corporate entities, non-profits, groups of blokes in sheds) also publish packages to crates.io. They would probably want to publish crates under their corporate identity, such as google/protobuf or facebook/buck2. Allocation of these namespaces is reputationally meaningful and needs some sort of approval process, whereas per-user namespaces should generally be issued freely. Not to mention the increased rate of renames, and potential for name conflicts -- there can be dozens of companies with the same name but in different jurisdictions.

This conflicts with the desire of the crates.io administrators to not spend time manually dealing with namespace-related issues.

Also, there are examples of the same legal entity having multiple organizations. On NPM Google has @google/ (e.g. @google/clasp) and @google-cloud/ (e.g. @google-cloud/storage). Allocation of organization namespaces, and of prefixes within organization namespaces (!!), seems likely to result in increased support load.

The obvious solution is to defer to some higher authority, which leads us back (once again) to DNS domain names.

Ok so DNS domains

If a package can be given an optional namespace which is a DNS domain name then that solves several of the known requirements:

  • Existing non-namespaced packages work exactly as before, including creating new packages.
  • DNS domain names don't have a concept of renaming, the name is the name, so packages namespaced by a domain name have a stable identifier.
  • If the DNS ownership is checked when the crate is first created, and then crates.io ownership is used for subsequent publishing, then the ownership transfer story would be the same as for non-namespaced crates.
  • DNS domain names are human-meaningful, generally.
    • Like, nothing stops someone from registering aaaaaaaaaaaaaaaaaaaaaa.xyz, but nothing stops someone from uploading crates with UUID names today.
  • The requirement of not coupling identity to a third-party platform ... ehh, I get not wanting to couple it to a single vendor such as GitHub, but surely relying on ICANN is fine?
  • I wouldn't expect an increase in customer-support load on the crates.io team from this, since DNS domain names are self-service from crates.io's perspective.

Regarding economic unfairness

Every time DNS domain names as namespaces comes up, there's an objection based on economic fairness. Leasing a top-level domain name costs money (recurring), being a tenant of someone else's domain name tethers you to their judgement. I get it.

But.

A cheap .com is like $15 a year, and these namespaces are optional. If someone doesn't want to pay a DNS registrar then they can use a free host like github.io, if they don't want to do that then they can publish under the existing non-namespaced system, and if they don't want to do that then they can file an RFC asking the crates.io team to set aside the .user.crates.io namespace for free namespaces. There's lots of options here, and more importantly they're easy to adopt at any point in the future if it becomes at all important.

Regarding namespaced crates being second-class

I sometimes see replies along the lines of:

If there's a crate named xml and a different crate named jdoe.bavaria-middle-school.de/xml then obviously nobody would choose the second one! Adding namespaces to crates.io will introduce a class divide between the cool early crates with snappy short names and the uncool latecomer crates. This is unfair, and namespaces should not be added until that unfairness can be addressed.

Posts like this do not seem to realize that class divide already exists, in that there are fortunate crates allowed to be on crates.io and unfortunate crates banished to the hinterlands of GitHub. Namespaces do not make them equal, but they're like 95% equal, which seems good enough.

Also, man, it is annoying to ask for a thing that would make your life better and have someone else deny you the thing because they think your life would not be as good as they would want their own life to be. If you don't want to publish under a namespace then don't do it. Ask for a filter on the crates.io UI to hide them if the sight of code from people who started using Rust after 2015 offends you.

Regarding ownership transfer, or: the Cargo / crates.io team's requirements are unreasonable

These requirements (summarized), if combined together, are in conflict with the concept of namespaces:

  • Packages may not be renamed.
  • Namespaces should be human-meaningful, not an opaque identifier.
  • It should be possible to transfer ownership of a package.
  • The name of a package should not misrepresent the owner of a package.

Using starlark as an example, its ownership changed from Google to Facebook. If it had been namespaced then that would have looked like google.com/starlark becoming owned by Facebook.

There are multiple ways to cut this knot, but all of them violate at least one of the requirements:

  1. If packages can be renamed then transferring the ownership would rename it from google.com/starlark to facebook.com/starlark.
  2. If namespaces can be opaque then it would have been named something like {4cc84f47-26ff-4bac-a99c-1b05f90e223d}starlark both before and after the transfer.
  3. If ownership of namespaced packages cannot be transferred then facebook.com/starlark would have been created and google.com/starlark would be replaced with a facade, then archived.
  4. If the name is allowed to be misleading then Facebook owning google.com/starlark is fine.

I'm personally in favor of (3), or at least a lightweight version of it. If creating a crate requires auth_check(new_crate.namespace, user.id) then you could require the same sort of check on transfer.

  • If Google transfers a crate to Facebook then I don't think Facebook would add Google to their domain's authorized key list, so the crate would have to be archived and re-created.
  • If the crate is transferring along with the domain, for example when a company is acquired, then the domain can put the new owner's keys in its authorized keys list and the ownership on the crates.io side is automatically approved.
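
The two cases above fall out of reusing one check. Here is a minimal sketch, with the fetched key lists reduced to sets of user IDs and with hypothetical function names and user IDs throughout:

```python
def auth_check(namespace, user_id, authorized):
    # `authorized` stands in for the key lists fetched from each domain's
    # package-namespace.json, reduced here to sets of user IDs.
    return user_id in authorized.get(namespace, set())


def namespace_of(crate_name):
    namespace, sep, _ = crate_name.partition("/")
    return namespace if sep else None


def can_create(crate_name, user_id, authorized):
    namespace = namespace_of(crate_name)
    # Non-namespaced crates keep today's rules and need no domain check.
    return namespace is None or auth_check(namespace, user_id, authorized)


def can_transfer(crate_name, new_owner_id, authorized):
    # Same check as creation: the domain must list the new owner.
    return can_create(crate_name, new_owner_id, authorized)


GOOGLE_USER, FACEBOOK_USER = 1, 2  # made-up account IDs
authorized = {"google.com": {GOOGLE_USER}, "facebook.com": {FACEBOOK_USER}}

print(can_transfer("google.com/starlark", FACEBOOK_USER, authorized))  # False
authorized["google.com"].add(FACEBOOK_USER)  # domain changes hands with the crate
print(can_transfer("google.com/starlark", FACEBOOK_USER, authorized))  # True
```

This sketch leaves out the existing requirement that a current crate owner initiates the transfer; that part would stay exactly as it is today.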

Regarding duplicate crate names

This requirement is also in conflict with the concept of namespaces:

  • The possibility of two packages having the same Rust identifier is considered undesirable and should be designed against.

The whole point of adding namespaces is to allow multiple packages with the same crate name to co-exist on the crates.io registry.

It's this requirement that makes me believe the Cargo and/or crates.io teams are fundamentally against the concept of package namespaces. To the best of my understanding the members of these teams think a crates.io where xml, rust.opensource.google/xml, and janedoe.xyz/xml co-exist is a negative outcome.

Either this requirement goes away, or there's no point in even considering any sort of package namespace designs.

Regarding the selection of sigils

There's all sorts of sigils available in ASCII (we're not doing the APL thing) but they've been around for long enough that every single one has acquired one or more special meanings in various computer contexts.

  • example.com/foo
  • example.com$foo
  • example.com@foo
  • example.com#foo
  • example.com%foo
  • example.com~foo
  • example.com:foo
  • {example.com}foo
  • (example.com)foo
  • foo@example.com

It doesn't really matter which one is used, so I somewhat arbitrarily picked example.com/foo on the theory that people are familiar with that syntax from normal URLs.

/, %, and # would need to be escaped in URLs, % and $ have special meaning to the shell, ~ has special meaning on Windows, @ and # are used for Cargo package IDs, / and : are used for Cargo features.

  • For Cargo stuff the presence of a . can disambiguate, parallel = ["foo/rayon"] and parallel = ["example.com/foo/rayon"] is kinda unfortunate but not ambiguous.
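
A rough sketch of how that `.` disambiguation could work when parsing a `dep/feature` entry. The function name `split_dep_feature` is hypothetical (it is not taken from Cargo or the proof-of-concept branch), and it assumes that plain crate names never contain a `.` while namespaces always do. The weak-dependency form `dep?/feature` is not handled.

```rust
/// Split a `[features]` entry such as "foo/rayon" or "example.com/foo/rayon"
/// into (package, feature). Hypothetical helper, not Cargo's actual parser.
fn split_dep_feature(entry: &str) -> Option<(&str, &str)> {
    let (first, rest) = entry.split_once('/')?;
    if first.contains('.') {
        // A dot in the first segment marks it as a namespace, so the
        // remainder has to be "name/feature".
        let (name, feature) = rest.split_once('/')?;
        if name.is_empty() || feature.is_empty() || feature.contains('/') {
            return None;
        }
        // The package is "namespace/name", borrowed from the input string.
        let package_len = first.len() + 1 + name.len();
        Some((&entry[..package_len], feature))
    } else {
        // No namespace: the classic "name/feature" form.
        if first.is_empty() || rest.is_empty() || rest.contains('/') {
            return None;
        }
        Some((first, rest))
    }
}

fn main() {
    for entry in ["foo/rayon", "example.com/foo/rayon", "example.com/foo"] {
        println!("{entry:?} -> {:?}", split_dep_feature(entry));
    }
}
```

Note that `example.com/foo` on its own is rejected here: with a namespace present, two segments can only be a package name, not a package plus feature.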

Braced syntaxes like {example.com}foo and (example.com)foo don't have an obvious syntactic downside other than looking really weird. I don't know how adventurous people's feelings are with the syntax.

foo@example.com has the advantage of looking like an email address, but again @ has meaning to Cargo, and I'm not sure about the ergonomics of breaking lexicographic ordering.

Regarding security and timing of ownership verification

When publishing example.com/foo for the first time it's obviously important to verify that the person publishing the crate is authorized by example.com to publish crates under its aegis.

Is the same true for publishing subsequent versions to the same crate?

  • If domain ownership is checked once, at crate creation time, then the semantics are closer to today's crates.io. You can claim the name, then you're the owner in the crates.io database and you can do what you want with that authority regardless of what happens to the domain later.
  • If domain ownership and crates.io ownership are both checked on each publish then it's more secure, but losing control of a domain would "brick" the crate without the intervention of crates.io admins. And I'm not sure what they could do, since package renaming isn't allowed. Maybe it's fine? I'd personally be fine with "you lose your domain you lose your namespaces", but then I've never lost control of a domain that I cared about.
  • If domain ownership but not crates.io ownership is checked, then losing control of a domain would let the new owner publish new versions of the existing crates. I think this is obviously bad and should be forbidden.
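
To make the three options easier to compare, here is a minimal sketch of them as a policy check. All names (`Policy`, `PublishRequest`, `may_publish`) are made up for illustration and don't correspond to anything in the crates.io codebase. It assumes the first publish of a namespaced crate always requires domain authorization, per the paragraph above.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Policy {
    /// Domain checked once, at crate creation; later publishes rely on
    /// crates.io ownership alone.
    DomainAtCreationOnly,
    /// Domain authorization and crates.io ownership both checked on every publish.
    DomainAndOwnerEveryPublish,
    /// Only domain authorization checked on every publish.
    DomainOnlyEveryPublish,
}

/// Facts about one publish attempt (hypothetical inputs).
struct PublishRequest {
    crate_exists: bool,
    domain_authorizes_user: bool,
    user_is_crate_owner: bool,
}

fn may_publish(policy: Policy, req: &PublishRequest) -> bool {
    if !req.crate_exists {
        // First publish: the domain must authorize the publisher.
        return req.domain_authorizes_user;
    }
    match policy {
        Policy::DomainAtCreationOnly => req.user_is_crate_owner,
        Policy::DomainAndOwnerEveryPublish => {
            req.domain_authorizes_user && req.user_is_crate_owner
        }
        Policy::DomainOnlyEveryPublish => req.domain_authorizes_user,
    }
}

fn main() {
    // The original owner has lost the domain but still owns the crate.
    let old_owner = PublishRequest {
        crate_exists: true,
        domain_authorizes_user: false,
        user_is_crate_owner: true,
    };
    // The domain's new owner is not a crates.io owner of the crate.
    let new_registrant = PublishRequest {
        crate_exists: true,
        domain_authorizes_user: true,
        user_is_crate_owner: false,
    };
    for policy in [
        Policy::DomainAtCreationOnly,
        Policy::DomainAndOwnerEveryPublish,
        Policy::DomainOnlyEveryPublish,
    ] {
        println!(
            "{policy:?}: old owner = {}, new registrant = {}",
            may_publish(policy, &old_owner),
            may_publish(policy, &new_registrant)
        );
    }
}
```

Under the second policy neither party can publish (the "bricked" case), and only the third lets the domain's new owner publish to the existing crate.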

Misc notes on /.well-known vs TXT records

From what I can tell the most popular way to verify a DNS domain name is to ask the user to put a crafted TXT record into DNS. This has a couple downsides:

  • DNS propagation is potentially slow. It wouldn't be fun to be told you have to wait 48 hours before you can publish your new library.
  • DNSSEC exists but I don't have a good feeling for how widespread it is, so crates.io would need to do ... something? ... to mitigate the risk of allowing a crate to go out based on spoofed DNS data.
  • The crates.io codebase doesn't currently do explicit DNS record lookups so that's additional implementation risk.
  • TXT records are limited in size. I don't know how large the list of authorized users might get in a project's namespace with lots of contributors, and it wouldn't be fun if there was a hard limit imposed by DNS response size.
  • I want to allow free hosting providers with subdomains (specifically github.io) to work with this, and they don't allow custom TXT records.

Verification could be done with the ACME protocol from Let's Encrypt, but that seems pretty heavyweight for what is fundamentally just getting a list of user IDs.

GET https://{namespace}/.well-known/something.json is easy to explain, easy to understand the security properties (it's TLS), crates.io already does HTTPS fetches, and it works with any shared hosting that lets the tenant upload files (which is ... all of them, lmao).
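
As a sketch of the registry side, this is roughly how the fetch URL could be derived from a namespace. The function names are hypothetical, `something.json` is just the placeholder used above, and the validation rules (at least two labels, letters/digits/hyphens only, lowercased) are my assumption rather than part of the proof-of-concept.

```rust
/// Check that `ns` looks like a DNS-style dotted name: at least two labels,
/// each 1 to 63 characters of ASCII letters, digits, or '-', and no label
/// starting or ending with '-'. (Assumed rules, for illustration only.)
fn is_valid_namespace(ns: &str) -> bool {
    if ns.len() > 253 {
        return false;
    }
    let labels: Vec<&str> = ns.split('.').collect();
    labels.len() >= 2
        && labels.iter().all(|label| {
            !label.is_empty()
                && label.len() <= 63
                && label.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'-')
                && !label.starts_with('-')
                && !label.ends_with('-')
        })
}

/// Build the URL the registry would fetch to verify a namespace, or `None`
/// if the namespace is not a plain host name.
fn verification_url(ns: &str) -> Option<String> {
    if !is_valid_namespace(ns) {
        return None;
    }
    Some(format!(
        "https://{}/.well-known/something.json",
        ns.to_ascii_lowercase()
    ))
}

fn main() {
    for ns in ["jdoe.github.io", "Example.COM", "example.com/evil", "example.com:8080"] {
        println!("{ns:?} -> {:?}", verification_url(ns));
    }
}
```

Validating before formatting matters because the namespace comes from user-supplied package metadata: rejecting `/`, `:`, and `@` keeps the request pinned to the claimed host on the default HTTPS port.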

Use cases that are mostly (but not entirely) out of scope

Crates that should be one package with multiple sub-libraries

Going back to Cranelift and Serde, it's common in Rust for a single project to have multiple crates.io packages that are really just one big crate. The serde and serde-core and serde-derive crates aren't independent, they're more like serde and serde::core and serde::derive where the compilation units are just split up for build hygiene.

While this is something that domain namespaces could slightly help with, IMO a better approach is to evolve Cargo's existing "open namespaces" support into something like the C/C++ ecosystem support for multiple libraries per package -- see "Prior art > C/C++" section below.

You could imagine something like this being uploaded as a single tarball to crates.io:

serde/
  Cargo.toml # [package] name = "serde"
  core/
    Cargo.toml # [package] name = "serde::core"
  derive/
    Cargo.toml # [package] name = "serde::derive"

And then when someone wants to depend on Serde, their Cargo.toml would look like this (kinda pseudo but you get what I mean):

[dependencies]
# `serde` is the actual package on crates.io
serde = { version = "1.0" }

# package = "with::colons" indicates that this is a sub-library of another
# library that's already depended on. The version will be pinned to the
# primary library, and there's no separate entry in `Cargo.lock`.
#
# This also solves the issue of what the `rustc` name of the sub-library
# should be, the user specifies it right here.
serde_derive = { package = "serde::derive" }

So while I recognize the technical feasibility of domain namespaces to solve that particular problem, I think it would be better to approach it more directly.

My understanding is that the Cargo / crates.io team have largely not considered multiple libraries per package as a design, so I don't know whether it's viewed with the same level of animosity as namespaces.

Use cases that are completely out of scope

C++ style scoping via synthetic top-level modules

My understanding of the Cargo feature called "open namespaces" is that it's a way to group multiple packages into a sort of synthetic top-level crate, so that the rustc syntax for modules within an external crate can transparently cross package boundaries.

When this feature is enabled there would no longer be a guarantee that x::y::z identifies a symbol within the crate x, it might come from a completely separate crate in a separate package and Cargo (or rustc?) synthesizes a new x at build time to re-export symbols from all packages that are getting fused together.

The motivation is (as I understand it) similar to the sub-library discussion above, where serde and serde_derive are separate top-level symbols in today's Rust but there is a desire to expose the latter to the user as serde::derive to better reflect the actual relationship between their codebases.

The use of the term "namespace" to describe this behavior is derived from C/C++, where namespace is used to add prefixes to symbols as a workaround for the language's lack of import scoping.

I don't really have a good understanding of the motivations or reasoning behind this feature, so as a very small favor, I ask that the topic of package namespaces not be conflated with synthetic module scoping even if the terms used to describe those features have some overlap for historical reasons. I want to keep the discussion focused on package namespaces to the extent possible.

It's worth noting that requiring an HTTPS fetch to .well-known/something.json does increase the cost of this compared to the DNS-based approach. That is, if I have a domain, it's easy for me to set DNS records without extra cost most of the time, but if I want to serve files that's extra cost, often extra monthly cost. And then if I want to serve HTTPS, I either need to pay more money, or set up automation with Let's Encrypt. DNS avoids these costs (and it also avoids the need to keep my server's software up to date).

Regarding DNSSEC, DoH (DNS-over-HTTPS) is more widely deployed and has superior security properties. So I don't think spoofed DNS data is a real risk.

I think they're just in conflict with your approach for namespaces. I don't find any of your proposed solutions to this very compelling. It's not clear what you mean by facade. Do you mean an actual crate that re-exports the other?


I don't really think this is a good approach. DNS is fundamentally mutable, which is a property that we don't really want for the registry. People lose access to their domains all the time, and other people come into control of those domains (and it's possible for this to be done intentionally by bad actors).

What happens when a domain expires and another entity gets control over it? Should the existing crates get kicked out? Should they even be able to publish new crates in that namespace? If not, how would that be verified?

NPM already had a wave of supply chain attacks via custom domains with lapsed registration. An attacker would register the domain immediately after it expires, then use the account recovery via email to take control of the account. I don't think opening crates.io up to attacks in this style is wise.

It also seems odd to require paying recurring fees to third parties such as domain registrars.

It's worth noting that requiring an HTTPS fetch to .well-known/something.json does increase the cost of this compared to the DNS-based approach. [...]

Supporting alternate methods of domain verification (TXT records, ACME, postmaster@ email, etc) can be added later if there's enough demand for them. For the initial version of an already-contentious feature I think it's important to use a verification mechanism that is easy to understand and deploy.

Every current user of crates.io has a GitHub account and therefore free access to a {username}.github.io domain, most domain registrars offer a small amount of free static file hosting, and many free hosting providers (including github.io) can be used with custom domain names.

I think they're just in conflict with your approach for namespaces. I don't find any of your proposed solutions to this very compelling.

[...]

I don't really think this is a good approach. DNS is fundamentally mutable, which is a property that we don't really want for the registry. People lose access to their domains all the time, and other people come into control of those domains (and it's possible for this to be done intentionally by bad actors).

Which approach to namespaces do you think would be better? Based on your response I'm guessing you'd prefer an opaque per-crate identifier such as a UUID, which per the thread is not a design that the crates.io team is willing to consider at this time.

What happens when a domain expires and another entity gets control over it? Should the existing crates get kicked out? Should they even be able to publish new crates in that namespace? If not, how would that be verified?

This is described in both the first post and the more detailed notes posted subsequently.

In the current proposal, updates to existing crates are authorized according to crates.io ownership rules, which means that (1) domain transfer doesn't grant crate ownership and (2) domain expiration doesn't revoke publishing permissions on crates.io.

Alternative choices are described, such as requiring verification on each upload. This would improve security at the potential cost of ecosystem churn.

The verification process is also described, at great length. Are there any points you found unclear?

NPM already had a wave of supply chain attacks via custom domains with lapsed registration. An attacker would register the domain immediately after it expires, then use the account recovery via email to take control of the account.

NPM doesn't support domains as package namespaces, account takeover affects packages regardless of whether they're namespaced or not, and package namespaces are unrelated to security measures put in place to prevent account takeover.

Scenario, in order:

  1. User A registers domain X.
  2. User A publishes a crate under the X namespace.
  3. Domain X lapses.
  4. User B registers domain X.
  5. User B publishes a different crate under the X namespace.

My understanding of your proposal is that this would be permitted? imo this is a serious issue, as two things would be shown as connected despite having no common ownership.

I'm not sure I understand why that would be bad, since the two crates would have different names, but if you wanted to mitigate it then some notion of namespace ownership in crates.io would be fine. The policy of which users are allowed to publish which crates is fully under the control of crates.io, and it could be made arbitrarily strict.

People would almost certainly look at namespaces as a sense of common ownership, providing a certain level of trust/reputation behind it.

To some extent that's true -- if I see rust-lang.org/libc then I know it's published by the Rust libs team -- but the Go ecosystem shows that for packages published by individuals the level of trust/reputation derived from identity is clamped to zero.

Updates to packages posted on someone's personal homepage get a higher level of scrutiny compared to packages from golang.org or go.googlesource.com, because you don't know anything about the person who wrote it. This is true regardless of whether the update is being published by the same person who published the initial version.

Note that this is also broadly true in today's crates.io -- I have a certain level of trust that crates published by github:rust-lang:libs are trustworthy, but a lot of crates are just some individual, so there's no telling what's in them and their updates need to be reviewed much more carefully.

I don't think introducing namespaces changes that dynamic.

You're assuming that people will publish under personal (sub)domains, not professional. I'm envisioning something like tokio.rs, which is unquestionably used as a source of "officialness". There are many widely used crates that have their own domain (bevy, tokio, and serde all immediately come to mind). While it's not likely that the domains will lapse, it is definitely something that needs to be considered as an inevitability for something.

DoH protects the communication with the recursive resolver using TLS. DNSSEC protects the communication between the recursive resolver and the nameserver using offline signatures. They are complementary to each other.
