Child Thread: Survey of registry namespace designs for Cargo and Crates.io

For background context and links to other threads, see Survey of organizational ownership and registry namespace designs for Cargo and Crates.io. This thread assumes that one has been read.

This thread is for summarizing approaches for organizational namespaces in the registry. Other general approaches include:

Debating which proposal Cargo and Crates.io should go with is off-topic for this thread. Please create your own thread.

Prior art

Those relevant for this thread

Note that other ecosystems continue to exist with flat namespaces, including

  • RubyGems
  • PyPI
  • Hex
  • Hackage
  • CRAN
  • LuaRocks

References

CPAN

Perl has partially open namespaces, meaning you can add new modules, or namespaces, under an existing one but you can't add arbitrary content to another namespace (TODO: verify these details).

These namespaces are reflected in CPAN, meaning there is a 1:1 relationship between the package's name in the registry and how you access that package in your Perl code (TODO: is it 1:1 like packages-as-namespaces or like Python where it is by convention). Perl and CPAN allow arbitrary levels of nesting. These namespaces are public, meaning anyone can publish to them.

Culturally, the Perl community focused on coherent APIs, organizing around function, and did not abuse this for organizational namespacing.

References:

npm scopes

Packages can be named @myorg/mypackage. The @ is to avoid referring to githuborg/repo and to disambiguate import paths.

myorg is a concept on the registry server.

Packages are private to myorg by default.

Scopes are optional. It seems that existing projects have mostly kept their flat names while new projects have used scopes. Scopes are mostly used for organizations.

Scopes are first-come, first-served. Scope-level account management exists to deal with ownership, transfer, dispute resolution, etc.

Compared to use cases:

  • Must trust the namespace; no additional verification
  • Squatting and typo squatting are reduced to the namespace portion
  • Not friendly to renames or namespace transfers

Compared to requirements:

  • Using / complicates the URL story for crates.io and docs.rs
  • Using @namespace/ avoids complications in parsing feature activations (@ disambiguates /) and PackageId Specs (@ at start is unambiguous in short form, @ disambiguates / in URL form)
  • Access of namespace in JavaScript doesn't translate to Rust code and a design there would be needed
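
To illustrate the point about @ disambiguating / in feature activations, here is a minimal Rust sketch. It assumes feature activations keep today's dep/feature shape and that a namespaced dependency would be written @namespace/name/feature. FeatureActivation and parse_feature_activation are hypothetical names for illustration, not Cargo internals.

```rust
/// A feature activation split into an optional dependency and a feature name.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
struct FeatureActivation<'a> {
    /// Dependency the feature belongs to, if any (may be `@namespace/name`).
    dependency: Option<&'a str>,
    feature: &'a str,
}

/// Parses `feature`, `dep/feature`, or `@namespace/dep/feature`.
/// Returns `None` for a scoped name with no feature part.
fn parse_feature_activation(input: &str) -> Option<FeatureActivation<'_>> {
    if input.starts_with('@') {
        // Scoped dependency: the first `/` is part of the package name,
        // so the feature separator is the second `/`.
        let first = input.find('/')?;
        let second = first + 1 + input[first + 1..].find('/')?;
        return Some(FeatureActivation {
            dependency: Some(&input[..second]),
            feature: &input[second + 1..],
        });
    }
    // Unscoped: at most one `/`, separating dependency from feature.
    match input.split_once('/') {
        Some((dependency, feature)) => Some(FeatureActivation {
            dependency: Some(dependency),
            feature,
        }),
        None => Some(FeatureActivation {
            dependency: None,
            feature: input,
        }),
    }
}
```

Because the leading @ announces that the first / belongs to the package name, the parser never has to guess whether mypackage is a package or a feature. The sketch does not validate names or reject empty feature parts.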

References:

Packagist (PHP)

Packages can be named myorg/mypackage

TODO: any more details?

Maven groupId

The groupId uses a reverse domain naming. These namespaces are verbose.

The groupId corresponds to a controlled domain using a DNS TXT record. Governance of namespaces is delegated to ICANN. GitHub users without a dedicated domain can use io.github.username.
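
As a small illustration of the reverse domain naming, here is a Rust sketch. Both function names are hypothetical, and the sketch only shows the string shape; it does not perform the DNS TXT verification described above.

```rust
/// Reverses the labels of a domain name, e.g. for use as a groupId.
/// Hypothetical helper; no validation of the domain is attempted.
fn group_id_from_domain(domain: &str) -> String {
    domain.split('.').rev().collect::<Vec<_>>().join(".")
}

/// Builds the fallback groupId for a GitHub user without a dedicated domain.
/// Hypothetical helper; the username is used as given.
fn group_id_for_github_user(username: &str) -> String {
    format!("io.github.{username}")
}
```

The verbosity mentioned above is visible here: every package name carries the full reversed domain in front of it.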

To avoid breaking users, some organizations choose to continue to publish under a previous name.

From Package Management Namespaces | Andrew Nesbitt

In January 2024, security firm Oversecured published MavenGate, an analysis of 33,938 domains associated with Maven group IDs. They found that 6,170 of them, roughly 18%, had expired or were available for purchase.

Compared to use cases:

  • Users can't trust the ownership of namespaces due to domain expirations
  • Squatting and typo squatting are reduced to the namespace portion

Compared to requirements:

  • Not friendly to renames or namespace transfers
  • Squatting is delegated to ICANN
  • Barrier for participation is higher on a technical and potentially financial level
  • Access of namespace in Java doesn't translate to Rust code and a design there would be needed

References:

Nuget prefix reservation

Organizations can register package name prefixes, which gives them exclusive rights to publish under those prefixes. These packages get a badge. Badges are permanent.

Existing packages at time of prefix registration are still allowed to exist and don't prevent registration of the prefix but they do not get the reserved prefix badge.

Only one owner of the prefix needs to be an owner of a package to get the badge. Package owners who don't own the prefix cannot remove owners that do. An owner of the prefix must always be the owner of the package to ensure the badge is permanent.

Sub-prefixes can be carved out and transferred.

An owner can mark a prefix as public, allowing anyone to publish to it.
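
The badge and publishing rules described above can be sketched roughly as follows. This is a simplified model in Rust, not how nuget.org implements it: ReservedPrefix, has_reserved_badge, and may_publish_new_package are hypothetical names, and plain case-sensitive prefix matching is an assumption.

```rust
/// A reserved prefix with its owners. Hypothetical model for illustration.
struct ReservedPrefix {
    prefix: String,
    owners: Vec<String>,
    /// When true, anyone may publish under the prefix.
    public: bool,
}

/// A package gets the badge when its name is under the prefix and at least
/// one of its owners also owns the prefix.
fn has_reserved_badge(
    package_name: &str,
    package_owners: &[&str],
    reservation: &ReservedPrefix,
) -> bool {
    package_name.starts_with(reservation.prefix.as_str())
        && package_owners
            .iter()
            .any(|owner| reservation.owners.iter().any(|o| o.as_str() == *owner))
}

/// A new package under the prefix may only be published by a prefix owner,
/// unless the prefix has been marked public.
fn may_publish_new_package(
    package_name: &str,
    publisher: &str,
    reservation: &ReservedPrefix,
) -> bool {
    if !package_name.starts_with(reservation.prefix.as_str()) {
        return true; // Outside the prefix, the reservation does not apply.
    }
    reservation.public || reservation.owners.iter().any(|o| o.as_str() == publisher)
}
```

Note how a package that existed before the reservation, owned only by outsiders, stays published but fails has_reserved_badge, matching the grandfathering rule above.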

Requesting a prefix is done by emailing nuget.org maintainers who also handle disputes. Unclear what verification is done.

Compared to use cases:

  • Squatting and typo squatting are reduced to the prefix portion
  • Not friendly to renames or prefix transfers

Compared to requirements:

  • Could break up existing culture of publishing packages under others' prefixes
  • Potential for abuse with people reserving prefixes used by other projects
  • Verification process is pushed onto the crates.io team

References:

Potential solutions

Partially-open Rust namespaces (in-work)

See Survey of organizational ownership and registry namespace designs for Cargo and Crates.io for the description.

While people may choose to misuse this for organizational registry namespaces, they will have to deal with misalignments and could set a negative example for others, putting them down the wrong path.

As this is focused on APIs and not Organizations, the following are examples of features that are out of scope:

  • helping users with renames
  • verification of a namespace's ownership
  • arbitrary depth to allow both organization and API namespacing
  • preventing participation in a namespace across dependency sources

While the following are examples of potential features that would be in scope:

  • Aligning the package and registry name with the Rust namespace
  • Suggesting other packages from the namespace (e.g. in cargo add)
  • Navigating within the namespace (e.g. on crates.io and docs.rs)
  • Relaxing the orphan rules within a namespace

As the registry namespace is the Rust namespace, nesting under an organization's name would also lead to more verbose code.

Compared to use cases:

  • Must trust the namespace; no additional verification
  • Squatting and typo squatting are reduced to the namespace portion
  • Not friendly to renames but renames are only a problem if the root package is being renamed
  • Not friendly to transfers but transfers are only a problem if something is being promoted / demoted out of the official API and not based on transferring ownership

Compared to requirements:

Unreserved prefixes (no-op)

Maintainers could name their package <namespace>-<name>.

For brevity in code, maintainers can drop the namespace within Rust by setting lib.name = "name" or users can rename the package in their dependencies table.
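
To make the naming difference concrete, here is a minimal Rust sketch of the two crate names involved. default_crate_name mirrors Cargo's default of replacing - with _ in the package name; crate_name_without_prefix is a hypothetical helper showing what a maintainer would put in lib.name to drop the prefix.

```rust
/// The library crate name Cargo infers by default from a package name:
/// hyphens become underscores.
fn default_crate_name(package_name: &str) -> String {
    package_name.replace('-', "_")
}

/// Hypothetical helper: the crate name left after dropping `<namespace>-`
/// from the package name, or `None` if the package is not under that prefix.
fn crate_name_without_prefix(package_name: &str, namespace: &str) -> Option<String> {
    let rest = package_name.strip_prefix(namespace)?.strip_prefix('-')?;
    if rest.is_empty() {
        None
    } else {
        Some(default_crate_name(rest))
    }
}
```

For a package named myorg-http-client, the default is myorg_http_client in Rust code, while the lib.name approach would give http_client. The gap between the two is what makes the Rust-side name harder to discover.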

Compared to use cases:

  • No trust in the namespace as anyone can publish in it
  • Not friendly to renames or namespace transfers

Compared to requirements:

  • If using the lib.name trick, how to reference the package in Rust code is unclear though that is an existing issue for renames and kebab case (cargo#15887)
  • Without the lib.name trick, access within Rust is verbose

Future possibilities:

  • Encode this in the ecosystem by adding to .cargo/config.toml a cargo-new.prefix = <string|bool> with a default of $USER for packages not being added to workspaces
    • advice will be given for changing this
    • Users can always disable with cargo-new.prefix = false
    • Repos that want to default to their Rust namespace can set cargo-new.prefix = "namespace::"
  • Resolve cargo#15887, making the Rust namespace more discoverable

Organizational registry namespaces

There are many variants of this that have come and gone where most of the parts can be mixed and matched.

[lib] and [[bin]] crates that infer their name from package.name would not include the namespace. Alternatively, the user could be forced to rename the package but that is more of an advanced feature that we shouldn't force on all users. Import collisions are possible which impacts usability. Like with multiple semver-major versions, users can rename the package to avoid collisions.

Specifying a namespace:

  • A separate package field (e.g. package.organization)
    • Still needs a separator syntax for dependencies and cargo add unless you also add the field/flag to them which makes things more verbose and makes it easy to accidentally leave off in the dependency and get something from the flat namespace
    • Having separate fields makes it less likely people will be confused about how to reference the package in Rust since it can just be dropped
  • @namespace/name
    • Looks like namespace should be used in Rust but generally it is dropped instead
    • Works within Cargo but complicates docs.rs and crates.io URL schemes
    • Common prefixes add friction to using CLI completions, at least
  • name@namespace
    • Like with numeric suffixes, seems easier to train users to drop the suffix in Rust code than a prefix though resolving cargo#15887 can help
    • Starts with a more unique name, making CLI completions easier
    • If namespaces are relevant to sort order, then this runs counter to that but in that case, would partially open Rust namespaces be more appropriate?
    • However, causes ambiguous short-form PackageID Specs
  • name#namespace
    • Similar to name@namespace but avoids the PackageID Spec issue
    • For PackageID Spec URL form, would be percent encoded
    • Framing this as an organizational tag can help in training users to drop the suffix when using name in Rust code though resolving cargo#15887 can help
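
The three separator syntaxes above can be compared with a small Rust sketch. Syntax, Namespaced, split_namespaced, and could_be_version are hypothetical names. The sketch assumes namespace names are allowed to start with digits, which is what would make name@namespace collide with the existing name@version short form.

```rust
/// The candidate syntaxes listed above. Hypothetical enum for illustration.
enum Syntax {
    /// `@namespace/name`
    ScopedPrefix,
    /// `name@namespace`
    AtSuffix,
    /// `name#namespace`
    HashSuffix,
}

#[derive(Debug, PartialEq)]
struct Namespaced<'a> {
    namespace: &'a str,
    name: &'a str,
}

/// Splits `input` into namespace and name according to `syntax`.
fn split_namespaced(input: &str, syntax: Syntax) -> Option<Namespaced<'_>> {
    let (namespace, name) = match syntax {
        Syntax::ScopedPrefix => input.strip_prefix('@')?.split_once('/')?,
        Syntax::AtSuffix => {
            let (name, namespace) = input.split_once('@')?;
            (namespace, name)
        }
        Syntax::HashSuffix => {
            let (name, namespace) = input.split_once('#')?;
            (namespace, name)
        }
    };
    if namespace.is_empty() || name.is_empty() {
        return None;
    }
    Some(Namespaced { namespace, name })
}

/// Whether a suffix would also read as a (partial) version such as `4` or
/// `1.2.3`, making `name@suffix` ambiguous as a short-form spec.
fn could_be_version(suffix: &str) -> bool {
    let parts: Vec<&str> = suffix.split('.').collect();
    parts.len() <= 3
        && parts
            .iter()
            .all(|p| !p.is_empty() && p.chars().all(|c| c.is_ascii_digit()))
}
```

With name@namespace, a tool seeing clap@4 cannot tell a version from a namespace without an extra rule, whereas clap#4 and @4/clap stay unambiguous.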

Namespace reservation:

  • None, reserved on first-publish
  • None, explicit, high friction reservation workflow
    • No official API, against the terms of service to automate
    • Extreme defaults for rate limiting (e.g. 1 a day) and maximum namespaces per account (e.g. 3)
  • GitHub Org
    • Org names are mutable which means you can lose access to your package and other people could publish to it
    • We want to decouple crates.io from GitHub
    • Git hosting is an implementation detail and users should feel free to migrate hosts without breaking users
  • Reverse FQDN
    • Barrier for participation is higher on a technical and potentially financial level
    • Ownership is mutable which means you can lose access to your package and other people could publish to it

Helping with renames and transfers:

  • Add index redirects, which get resolved as the entries they point to
    • Moving everything under the old name to the new name and redirecting the old name to the new name would likely be simplest
      • Assumes no version conflicts between the two names
      • TODO: can this work with the resolver?
    • Or you can register redirects with the server so that when you publish to a package, redirect entries get added to the old name
      • Care might be needed if we want to allow old versions under the old name to work with Cargo <1.19 as older versions of Cargo don't ignore unknown entries (#4026)
      • TODO: can this work with the resolver?
    • The redirect may have a minimum MSRV for access
      • Likely need safeguards that entries that used to exist have a higher MSRV (and maybe publish date) than when redirect support was added
      • Or you redirect the new name to the old name but then we can't treat one name as canonical
    • Could potentially help with normalizing _/- in package names on the registry side (only the most obvious: all one or all the other and not every combination of mixed)
    • Registry should enforce redirect cycle checks
    • Should registries resolve multiple redirects into one redirect?
  • Add mutable metadata to the registry and include a way to direct users to the new name of a package
  • Have the resolver unify package names across namespaces
    • If maintainers decide to generically name their packages (e.g. foo/derive) instead of using partially-open Rust Namespaces, then this will lead to bad results
    • Likely wouldn't work with the resolver as the resolver needs to be able to enumerate all versions for a package at once and not as we discover each workspace used
    • Could also have problems with ambiguous versions
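
For the index redirect option, the cycle check and the collapsing of multiple redirects could look roughly like the Rust sketch below. It models redirects as a plain old-name to new-name map, which is an assumption about the index format; resolve_redirect, would_create_cycle, and ResolveError are hypothetical names.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, PartialEq)]
enum ResolveError {
    /// A redirect chain looped back to the contained name.
    Cycle(String),
}

/// Follows redirects from `name` until reaching an entry with no redirect,
/// collapsing a chain of redirects into its final target.
fn resolve_redirect<'a>(
    redirects: &'a HashMap<String, String>,
    name: &'a str,
) -> Result<&'a str, ResolveError> {
    let mut seen = HashSet::new();
    let mut current = name;
    while let Some(next) = redirects.get(current) {
        // Visiting a name twice means the chain never terminates.
        if !seen.insert(current) {
            return Err(ResolveError::Cycle(current.to_string()));
        }
        current = next.as_str();
    }
    Ok(current)
}

/// Registry-side check before accepting a new redirect `from -> to`:
/// rejects it if following `to` would lead back to `from`.
fn would_create_cycle(redirects: &HashMap<String, String>, from: &str, to: &str) -> bool {
    let mut seen = HashSet::new();
    let mut current = to;
    loop {
        if current == from || !seen.insert(current) {
            return true; // Leads back to `from`, or hits a pre-existing loop.
        }
        match redirects.get(current) {
            Some(next) => current = next.as_str(),
            None => return false,
        }
    }
}
```

If the registry runs would_create_cycle on every new redirect, clients should never see ResolveError::Cycle. Storing the output of resolve_redirect instead of the raw target is one way a registry could flatten multiple redirects into one.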

Publish permissions would only be managed at the namespace level. Any publish permission changes at the package level will return errors. Motivation:

  • It supports the emphasis that this is organizational related. All of the packages should be coming from the same entity anyways.
  • This operates as an MVP, giving us an opportunity to gauge the need for finer grained controls and collect use cases to see if we can come up with ways that keep the emphasis on organizations while giving them the flexibility they need
  • From the feedback we've heard so far, organizations only want a single publisher for their packages.

References:

Regarding renames and redirects — GitHub allows renaming accounts and repos, serves redirects for a limited time, and allows reuse of old names. Non-permanent redirects effectively make the identifiers non-permanent and non-unique over long term.

It's a footgun, because it's easy to assume that a user can be identified by their login, or a repo by its URL, but that's not reliable long-term. There are already cases on crates.io where accounts behind a login used to publish a crate have changed. GitHub API has unambiguous numeric IDs for accounts, but there are lots of external places where GitHub account names and repo URLs appear without a trustworthy ID. Disambiguating them requires keeping history of login => id mappings and knowing when each name has been written. It's hard to do, and very easy not to do. Similar problems would affect packages if they could be renamed without permanent redirect.

Renaming is super useful from user perspective. If you ever add renaming, then please don't allow reclaiming old names as easily as GitHub. It has the downside of keeping old names occupied/wasted, but otherwise all names lose their uniqueness. Alternatively, give crates some ID/GUID behind the scenes that can be looked up and used as the real permanent identifier that survives renames.
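
A rough Rust sketch of that idea, combining a permanent ID with names that are never released for reuse. Registry and its methods are hypothetical names, and an in-memory map stands in for whatever crates.io would actually store.

```rust
use std::collections::HashMap;

/// Hypothetical registry where every name ever used maps to a permanent ID.
#[derive(Default)]
struct Registry {
    /// Current and retired names, all pointing at a stable ID.
    names: HashMap<String, u64>,
    /// The name each ID is currently published under.
    current: HashMap<u64, String>,
    next_id: u64,
}

impl Registry {
    /// Registers a new crate; fails if the name is in use or was ever used.
    fn register(&mut self, name: &str) -> Result<u64, String> {
        if self.names.contains_key(name) {
            return Err(format!("name `{name}` is taken or retired"));
        }
        let id = self.next_id;
        self.next_id += 1;
        self.names.insert(name.to_string(), id);
        self.current.insert(id, name.to_string());
        Ok(id)
    }

    /// Renames a crate; the old name stays reserved as a permanent alias.
    fn rename(&mut self, id: u64, new_name: &str) -> Result<(), String> {
        if !self.current.contains_key(&id) {
            return Err(format!("unknown crate id {id}"));
        }
        match self.names.get(new_name) {
            Some(&owner) if owner != id => {
                return Err(format!("name `{new_name}` belongs to another crate"));
            }
            _ => {}
        }
        self.names.insert(new_name.to_string(), id);
        self.current.insert(id, new_name.to_string());
        Ok(())
    }

    /// Looks up the permanent ID behind any current or former name.
    fn lookup(&self, name: &str) -> Option<u64> {
        self.names.get(name).copied()
    }

    /// Returns the name the crate is currently published under.
    fn current_name(&self, id: u64) -> Option<&str> {
        self.current.get(&id).map(String::as_str)
    }
}
```

The cost is the one mentioned above: retired names stay occupied forever, but any name written down at any point in time still resolves to the same crate.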


In my experience that depends how the organization is structured, and whether the namespace is per project or actually per org. If it's per project, then it could work - large orgs would create project namespaces when needed (but then the org would want to have a reserved namespace and visible ownership for its projects, making it a 3-level namespace...).

But if the namespace is the org, then it can be a problem for large complex organizations. They can have many different teams/departments/divisions/subsidiaries/regions and half-merged acquisitions, making them behave like multiple different orgs that operate separately (publishing things of varying importance and visibility, by different teams, at different schedules), but externally share one namespace.

This problem comes up in Web specs, where the same-origin policy is a fundamental unit of separation. Features get designed with per-origin assumptions (quotas, permissions, isolation), but then a task like "just add a metatag at the root of the domain" turns out to be very hard when hundreds of teams share a major namespace like microsoft.com. If crates.io literally got a namespace like microsoft, having it owned by a single account wouldn't suffice. They'd build their own account management and publishing pipeline in front of it, and crates-io would lose detailed information about who actually published what. Complex orgs want complex account management. That's why "SSO tax" exists :slight_smile:


You have hit on an important point (both here and in the other thread at Child Thread: Survey of organizational ownership designs for Cargo and Crates.io - #2 by kornel): the real world is messy, and simple hierarchical designs rarely work.

Perhaps it would be better to not go down the route of hierarchical namespaces or ownership at all. Tag systems tend to be more flexible and reflect reality better. I don't know what such a design would look like in detail for this case (tag based namespacing sounds... difficult), but perhaps it is something worth thinking over.

Indeed, that's another axis in the design space. The indicators of ownership/authenticity/grouping don't have to be exactly the same thing as unique identifiers of packages at the protocol level. There could be additional metadata available in the registry or crates-io API, to be used as part of the user interface of crates.io, Cargo, and by 3rd party supply chain tooling, without package identifiers themselves having to carry all that information.

For example, cargo add @epage/clap could be interpreted as "add the clap crate, but only if it's owned by epage", and maybe save that requirement in Cargo.{toml,lock} like clap = { version = "4", but-update-it-only-if-still-published-by="epage#id6743" }. So you could have ownership-representing "namespace" at the UI level, explicitly locked provenance for cargo update, but without anything having to change in registry URLs or Rust identifiers (there are still other reasons to have more controlled naming, but the point is that the crate naming scheme alone doesn't have to be the solution to all the use cases).
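
A minimal Rust sketch of that check, assuming the pinned value has the login#id<number> shape from the example and that the registry can report the numeric IDs of a crate's current owners. OwnerPin, parse_owner_pin, and update_allowed are hypothetical names; nothing like this exists in Cargo today.

```rust
/// A pinned owner, e.g. parsed from `epage#id6743`. Hypothetical type.
#[derive(Debug, PartialEq)]
struct OwnerPin {
    /// Human-readable login, kept only for display since logins can change.
    login: String,
    /// Stable numeric account ID, used for the actual comparison.
    id: u64,
}

/// Parses the `login#id<number>` form used in the example above.
fn parse_owner_pin(value: &str) -> Option<OwnerPin> {
    let (login, id) = value.split_once("#id")?;
    if login.is_empty() {
        return None;
    }
    Some(OwnerPin {
        login: login.to_string(),
        id: id.parse().ok()?,
    })
}

/// An update is allowed only while the pinned account still owns the crate.
/// The comparison uses the numeric ID, never the mutable login.
fn update_allowed(pin: &OwnerPin, current_owner_ids: &[u64]) -> bool {
    current_owner_ids.contains(&pin.id)
}
```

Comparing by ID rather than login is what keeps the pin meaningful if the login is later renamed or reused by someone else.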


Hence the thread Child Thread: Survey of organizational ownership designs for Cargo and Crates.io

Homebrew Taps is another take for registries and namespacing.

It's technically quite similar to private registries in Cargo, but Homebrew's user interface makes all the difference. It's trivial to add a tap without editing any config files, so installing packages from taps doesn't feel more complicated or second-class compared to installing core Homebrew packages. It feels more like just installing namespaced Homebrew packages.

I can run:

brew install filosottile/musl-cross/musl-cross

which is a shorthand for:

brew tap filosottile/musl-cross
brew install musl-cross

(and brew tap user/repo is a shorthand for brew tap namespace-prefix URL with a GitHub URL implied, but you can specify any other URL and then any prefix you want).

These commands subscribe to the "alternative registry" of filosottile/musl-cross and make its packages seamlessly available for installation and updates. brew info musl-cross finds the package from the tap (it requires a prefixed name only when there's a naming conflict between taps).

So even though brew install $package mostly behaves as one global shared namespace for packages, it only searches the taps I've added, not the whole world. So it's not really global, but more like a per-user curated namespace, populated only from user's preferred sources.
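
The lookup behavior described here can be modeled with a short Rust sketch. It is a simplification, not Homebrew's implementation: Tap, Resolution, and resolve are hypothetical names, and the automatic tapping done by the three-part shorthand is left out.

```rust
/// A tap the user has added, with the packages it provides.
struct Tap {
    name: String,
    packages: Vec<String>,
}

#[derive(Debug, PartialEq)]
enum Resolution<'a> {
    /// Found in exactly one tap (the tap's name).
    Found(&'a str),
    /// No added tap provides the package.
    NotFound,
    /// Several added taps provide it; a prefixed name is required.
    Ambiguous(Vec<&'a str>),
}

/// Resolves `request` against the user's added taps only.
/// Accepts a bare `package` or a prefixed `user/repo/package`.
fn resolve<'a>(taps: &'a [Tap], request: &str) -> Resolution<'a> {
    // Prefixed form: everything before the last `/` names the tap.
    if let Some((tap_name, package)) = request.rsplit_once('/') {
        return match taps.iter().find(|t| {
            t.name == tap_name && t.packages.iter().any(|p| p.as_str() == package)
        }) {
            Some(tap) => Resolution::Found(&tap.name),
            None => Resolution::NotFound,
        };
    }
    // Bare form: search every added tap and require a unique match.
    let matches: Vec<&str> = taps
        .iter()
        .filter(|t| t.packages.iter().any(|p| p.as_str() == request))
        .map(|t| t.name.as_str())
        .collect();
    match matches.len() {
        0 => Resolution::NotFound,
        1 => Resolution::Found(matches[0]),
        _ => Resolution::Ambiguous(matches),
    }
}
```

The namespace being searched is whatever is in taps, so two users with different taps see different "global" namespaces.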

This is an interesting compromise for curation vs openness. The Homebrew core tap is curated, which is reassuring it won't suddenly spring drive-by typosquatted packages. At the same time it feels pretty open, because adding another tap is so easy, and once the tap is added, its packages blend with the core packages instead of being second-class. It also makes it very easy to publish company-internal tools without polluting a public registry.

Homebrew doesn't have dependencies as sophisticated as Rust/Cargo, so the taps design doesn't have an answer for dealing with dependencies-of-dependencies and mixing of registries.


I feel this is a big part of what makes this design of implicit merging work. Along with the fact that taps generally are for adding packages that don't exist, so that in general brew install $package is almost always going to install it from where you expect, or give a "not found" error if you forgot to add the tap.

Disallowing reclaiming is under the requirements at Survey of organizational ownership and registry namespace designs for Cargo and Crates.io. Solutions like UUIDs are being discussed at Child Thread: Survey of alternative identifier designs for Cargo and Crates.io - #6 by dlight.

From my understanding, the feedback received that you are responding to included very large, complex organizations.

Thank you for sharing; it helps to get more insight into how others deal with things.

There had been past discussion along these lines but I felt it was so far from the existing design that I marked it out of scope. Discussing this further is likely better for Survey of organizational ownership and registry namespace designs for Cargo and Crates.io.