[WIP, Pre-RFC] Federation

Introduction

Federation is a mechanism by which different registries can be consistent with each other.

Motivation

We want separate, independent registries. We want projects to have their own, official registries. We want them to go down without causing everyone else issues. We want to be able to audit them. We want to be able to restrict them. We want to be able to adjust them as we see fit.

Goals

  • Less dependence on crates.io mod team to do what we want.
  • Consistency/deduplication across registries.
  • “Local” (same city/country/continent/ocean/etc) registries for improved caching (especially behind e.g. a satellite connection). Since (currently) crates.io serves everything over HTTPS and never plain HTTP, it’s impossible to use a transparent proxy here.
  • Official registries, e.g. “the Diesel project registry”/“the Tokio project registry”.
  • Consensual redundancy. (user must manually select the registries that are still alive in order to use them - no MITM potential.)
  • Allowing crates that depend on example.org/foo to be published on crates.io.

Non-goals

  • Allowing crates to be published on any registry and later transferred to the proper registry based on namespace and signature. (this could be left to another RFC, as it’s not incompatible with this design.)
  • Mastodon compatibility.
  • ??? [WIP]

Outline

  • This RFC defines a namespacing and registry management technique/system called “federation”. The namespaces are the domain names of the instances/registries hosting crates.
  • Additionally, we bring forth some important changes to Cargo and crates.io:
    • Dependencies of the form depname = "version" will generate a deprecation warning, unless the crate explicitly sets a default registry. If no default registry is set, crates.io is used.

      E.g., the following produces no warning, because default_registry was specified:

      [package]
      name = "example"
      version = "0.0.0"
      authors = ["foo@bar.example.org"]
      default_registry = "crates.io"
      
      [dependencies]
      foo = "0.3"
      

      Had default_registry not been specified, it would have produced a deprecation warning. crates.io should thus accept no new packages that don’t specify a default_registry. It’s also possible to not use default_registry at all, and instead specify foo = {registry = "crates.io", version = "0.3"}. [TODO: either figure out how to specify the registry as part of the name, e.g. foo@crates.io = "0.3", or mark this as an unresolved question]

    • Cargo gets a new config option: the user’s instance. This is the instance to be used for fetching all crates, as well as publishing crates. Additionally, the publishing instance may be overridden by Cargo.toml or a command-line option. This will default to crates.io.

    • As mentioned previously, all crates are fetched through the user’s instance, which defaults to crates.io. As such, crates.io will (permanently) cache remote crates, and there’ll be an API for remote registries to push their updates to crates.io. (this makes security updates propagate faster.)

    • We also expand the current policy of DMCA takedowns - take down the crate and all crates that depend on it - to also include cached crates. Additionally, to keep such crates from resurfacing, they also get added to a blacklist, which is checked whenever new remote crates are pulled in.

    • [??? I may be forgetting something here but I can’t remember what it was]

  • This RFC defines a limited API between registries. It also defines some things that need to be registered with various other things - MIME types, .well-known’s, etc.
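
The default-registry rule from the outline can be sketched roughly as follows. This is only an illustration, not how Cargo works today: the `Dependency` struct and `resolve_registry` function are hypothetical names, and the sketch assumes an explicit per-dependency registry takes precedence over default_registry.

```rust
/// Registry used when the manifest names none, per the proposal.
const FALLBACK_REGISTRY: &str = "crates.io";

/// Hypothetical, simplified view of one `[dependencies]` entry.
pub struct Dependency<'a> {
    pub name: &'a str,
    /// `Some` for the `foo = {registry = "...", version = "..."}` form.
    pub registry: Option<&'a str>,
}

/// Returns the registry to fetch from, plus whether the deprecation
/// warning described in the outline would fire for this dependency.
pub fn resolve_registry<'a>(
    dep: &Dependency<'a>,
    package_default: Option<&'a str>,
) -> (&'a str, bool) {
    match (dep.registry, package_default) {
        // Assumption: an explicit per-dependency registry wins.
        (Some(registry), _) => (registry, false),
        // `default_registry` is set in [package]: no warning.
        (None, Some(default_registry)) => (default_registry, false),
        // Neither is set: fall back to crates.io and warn.
        (None, None) => (FALLBACK_REGISTRY, true),
    }
}
```

With this shape, a bare foo = "0.3" in a manifest without default_registry resolves to crates.io with the warning flag set, while the same line in a manifest that sets default_registry resolves without a warning.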

Flowchart

[TODO: explain this in English]

cargo update
    |
    v
GET https://$USERREGISTRY/.well-known/cargo.txt
    |
    v
GET https://$USERREGISTRY/$APIPATH/index.whatever
    |
    v
GET https://$USERREGISTRY/$APIPATH/crate/$NAMESPACE/$NAME --> [server] GET https://$NAMESPACE/.well-known/cargo.txt --> etc, same process as above --> add it to local index
    |
    v
build local index
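
As a rough sketch of the client side of the cargo update flow above, the requests could be built like this. The function names are hypothetical, and the sketch assumes $APIPATH is something the client learns from cargo.txt (the flowchart implies this but does not say so). The index.whatever file name is the placeholder from the flowchart, not a real path.

```rust
/// URL of the discovery document fetched first in both flows.
pub fn well_known_url(registry: &str) -> String {
    format!("https://{}/.well-known/cargo.txt", registry)
}

/// URL the client requests from its own registry for a namespaced crate.
/// The user's registry, not the client, contacts `namespace` if needed.
pub fn crate_url(user_registry: &str, api_path: &str, namespace: &str, name: &str) -> String {
    format!("https://{}/{}/crate/{}/{}", user_registry, api_path, namespace, name)
}

/// The GET requests a client would issue for `cargo update`, in order.
/// `crates` holds (namespace, name) pairs taken from the manifest.
pub fn update_requests(
    user_registry: &str,
    api_path: &str,
    crates: &[(&str, &str)],
) -> Vec<String> {
    let mut urls = vec![
        well_known_url(user_registry),
        format!("https://{}/{}/index.whatever", user_registry, api_path),
    ];
    for (namespace, name) in crates {
        urls.push(crate_url(user_registry, api_path, namespace, name));
    }
    urls
}
```

Note that every URL points at the user's registry; the remote namespace only ever appears as a path segment, which is what keeps the client from talking to third-party servers directly.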

cargo publish
    |
    v
GET https://$USERREGISTRY/.well-known/cargo.txt
    |
    v
POST https://$USERREGISTRY/$APIPATH/publish
    |
    v
[server] for each $REGISTRY in $REMOTE_REGISTRIES do < GET https://$REGISTRY/.well-known/cargo.txt --> POST https://$REGISTRY/$APIPATH/publish >

(The latter is so the remote registry doesn’t need to keep spamming GET requests; it increases overall fediverse stability and improves security, as security updates are pushed rather than pulled.)
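
The server-side fan-out in the last step could look something like the sketch below. It is a minimal illustration under assumptions: `fan_out` is a hypothetical name, the `discover_api_path` closure stands in for the GET of each registry's cargo.txt, and skipping registries whose discovery fails is a choice made here, not something the proposal specifies.

```rust
/// Builds the publish URLs the user's registry would POST to after
/// accepting a publish. `discover_api_path` stands in for fetching
/// `https://$REGISTRY/.well-known/cargo.txt` and reading the API path.
pub fn fan_out<F>(remote_registries: &[&str], discover_api_path: F) -> Vec<String>
where
    F: Fn(&str) -> Option<String>,
{
    remote_registries
        .iter()
        .copied()
        .filter_map(|registry| {
            // Assumption: a registry that cannot be discovered is skipped
            // rather than failing the whole publish.
            discover_api_path(registry)
                .map(|api_path| format!("https://{}/{}/publish", registry, api_path))
        })
        .collect()
}
```

A real implementation would also need retries and the signature/certificate exchange mentioned below, neither of which is modelled here.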

Notes: Both servers can decide which servers they’re going to talk or not talk to. They can also decide what kind of operations they’re gonna accept or disallow. For example, an audited registry would not accept remote publishes.
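
To make the per-server policy a bit more concrete, here is one possible shape for it. The `Policy` and `Operation` types are hypothetical, and the two operations shown are only the ones the flowcharts mention; an actual registry would likely have more.

```rust
use std::collections::HashSet;

/// Operations a peer registry might request (illustrative subset).
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub enum Operation {
    Fetch,
    RemotePublish,
}

/// What one registry is willing to do, and with whom.
pub struct Policy {
    blocked_peers: HashSet<String>,
    accepted: HashSet<Operation>,
}

impl Policy {
    /// An audited registry: serves fetches but accepts no remote publishes.
    pub fn audited() -> Self {
        Policy {
            blocked_peers: HashSet::new(),
            accepted: HashSet::from([Operation::Fetch]),
        }
    }

    /// Stop talking to `peer` entirely.
    pub fn block_peer(&mut self, peer: &str) {
        self.blocked_peers.insert(peer.to_string());
    }

    /// True if `peer` may perform `op` against this registry.
    pub fn allows(&self, peer: &str, op: Operation) -> bool {
        !self.blocked_peers.contains(peer) && self.accepted.contains(&op)
    }
}
```

Each side evaluates its own policy independently, so a request only goes through when both registries allow it.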

When publishing, the server needs to specify some things about it, like its domain, signature/certificate, etc. The signature/certificate is added to prevent impersonation. The actual implementation details of this are [WIP/Pre-RFC]. The same mechanism could be used to authenticate clients. (No OAuth tokens - use public keys.)

Unresolved Questions

???

Prior Work?

The most widespread federated protocol is plain old email. It’s not the most suitable source of inspiration for a federated package manager, however.

The second most widespread federated protocol is probably ActivityPub. It definitely has had more success than XMPP in the “federated” (server-to-server communications) part. This is what this RFC is based on.

???

???

Could you explain how this solves the Silo Effect?

Given I depend on (Soni-registry/serde) and another of my dependencies depends on (crates-io/serde), how is the serde version selected? (Note that if two different versions are used, everything breaks.)

What requirements do registries have to maintain consistency with each other? What happens when package lolcatz is published to two different registries with different, incompatible contents concurrently?

given Soni-registry/serde as a fork of crates-io/serde, you’d need to Cargo.toml it to map crates-io/serde to Soni-registry/serde.
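
As a sketch of what that mapping amounts to (the Cargo.toml syntax for it isn't specified anywhere in this thread, so this only models the lookup): every request for a crate goes through an override table first, so both my crate and my dependencies end up on the same serde. `CrateId`, `apply_overrides` and the soni.example domain are made up for illustration.

```rust
use std::collections::HashMap;

/// A crate identified by its registry of origin plus its name.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
pub struct CrateId {
    pub registry: String,
    pub name: String,
}

/// Returns the crate that should actually be used for `requested`.
/// Crates without an override resolve to themselves.
pub fn apply_overrides(
    requested: &CrateId,
    overrides: &HashMap<CrateId, CrateId>,
) -> CrateId {
    overrides
        .get(requested)
        .cloned()
        .unwrap_or_else(|| requested.clone())
}
```

With one entry mapping crates.io/serde to soni.example/serde, a dependency asking for either one gets the fork, so only a single serde is selected.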

early updates/registry-level overrides can just directly override the remote crate.

federation and domain-based namespacing go hand in hand.

I guess this Pre-RFC tries to describe something similar to what I've proposed here:

In other words, soni-registry/serde will be a mirror of crates-io/serde. Of course, you will not be able to patch serde this way; to do that you will have to use the [patch] section and redirect serde to its fork.

crates-io/serde (on soni-registry) will be a mirror of crates-io/serde (on crates-io).

Here, take a look at this: https://activitypub.rocks/

This is what mastodon is based on. If you’re on instance example.org, your local handle is e.g. @Soni (with an @Soni@example.org alias), where a cybre.space user would be @SoniEx2@cybre.space on your instance.

You can fetch @SoniEx2@cybre.space's contents either through your instance, or directly from cybre.space. The former is usually preferred, for privacy reasons.

So, every crate (dependency and the registry metadata) is tagged with its registry of origin?

This is an important detail that you gave no indication of in the original post.


With all due respect, and apologies for drawing patterns where there may be none, this feels like a common problem with your proposals.

It doesn’t look good for you when your first appearance on the forum is basically “how do I fork Rust” and then further interaction boils down to an underspecified “I want this”. Paraphrasing you, your proposals benefit you, and any other effect is a side effect.

If you want to be taken seriously, I’d heavily suggest picking one of the dozen or so beginnings of ideas that you’ve shared on these forums, and creating a full, thought out RFC for it. This is a Request For Comment, and thus you should be open to discussion about the proposal, and ready to change your position if better arguments are provided. Most importantly, though: even though it isn’t an implementation plan, it needs to be specific about how it affects all surfaces it touches.

As is, your proposals read more like a wishlist than an actual actionable item that can be properly discussed. We don’t have your background knowledge. Convince us of what you think will be beneficial to the ecosystem as a whole.

I actually did very clearly specify that cargo fetches a domain/name pair from the user’s chosen registry, and the user’s chosen registry forwards it to the remote registry.

GET https://$USERREGISTRY/$APIPATH/crate/$NAMESPACE/$NAME --> [server] GET https://$NAMESPACE/.well-known/cargo.txt --> etc, same process as above --> add it to local index

(altho I should’ve said something like “add to server’s index” or something)
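
Spelled out as a sketch, the server side of that forwarding step might look like this. Everything here is hypothetical naming: `CachingRegistry` models the user's registry, and the `fetch_remote` closure stands in for the whole "GET https://$NAMESPACE/.well-known/cargo.txt, etc." sequence. The cache is never evicted, matching the "(permanently) cache" wording in the outline.

```rust
use std::collections::HashMap;

/// The user's registry: serves crates from its cache and forwards
/// cache misses to the registry named by the crate's namespace.
pub struct CachingRegistry {
    cache: HashMap<(String, String), Vec<u8>>,
    /// Number of times the remote registry was contacted.
    pub upstream_fetches: usize,
}

impl CachingRegistry {
    pub fn new() -> Self {
        CachingRegistry { cache: HashMap::new(), upstream_fetches: 0 }
    }

    /// Handles `GET .../crate/$NAMESPACE/$NAME`. `fetch_remote` is only
    /// called on a cache miss; a successful result is stored for good.
    pub fn get<F>(&mut self, namespace: &str, name: &str, fetch_remote: F) -> Option<Vec<u8>>
    where
        F: FnOnce(&str, &str) -> Option<Vec<u8>>,
    {
        let key = (namespace.to_string(), name.to_string());
        if let Some(bytes) = self.cache.get(&key) {
            return Some(bytes.clone());
        }
        self.upstream_fetches += 1;
        let bytes = fetch_remote(namespace, name)?;
        self.cache.insert(key, bytes.clone());
        Some(bytes)
    }
}
```

Once a crate is cached, the remote registry can go down without affecting clients of this registry, which is the availability property discussed further down the thread.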

🤷 Guess we now know better than to skip horizontal scrolling of text to the end of a 172-character line in a flowchart that actually contains detailed information rather than just summarizing something which is given in detail in prose.

You see a problem there?

I’m not good at converting my brain’s flowcharts into text (or even into flowcharts, I guess…). The best I can do is bullet points, which isn’t exactly “text”… but okay, I see your point.

Why would projects host "official registries" though? What advantage does this offer to them (compared to the current crates.io situation)? Especially if the crates are cached by crates.io anyway.

To associate the crates with the official domain.

So they don’t have to deal with crates.io moderators.

Caching makes it more advantageous, not less.

Do you have any example of big projects wanting something like that? If I were to manage a big project I wouldn’t want to deal with the infra and maintenance work to setup a custom registry.

Given that federation and namespacing go hand in hand, and ppl have mentioned namespacing in relation to big projects, there is some desire for it.

Personally, I’d run my own registry for my own projects. That makes them clearly associated with me, and I don’t have to deal with name conflicts, unavailable names, the mods, etc.

While some kind of namespacing is required for federation to work, federation is not needed if the only thing we want is namespaces. Federation adds even more code to maintain for cargo/crates.io, and adds a lot of complexity and costs on the projects that want to host their own registry. Unless there is an explicit desire for it from big projects I don't think it's wise to spend time designing/implementing it.

Namespacing alone solves your problems with naming conflicts and unavailable names. What are your concerns with crates.io moderation?

The reason crates.io disallows publishing crates with dependencies not hosted on crates.io is that crates.io wants to guarantee that everything is available. If there is a crate that violates Rust's code of conduct that crates.io declines to publish, crates.io would also decline to cache/federate that crate as well. I'd like to see an eventual RFC address this issue.

Creating a server that caches crates.io and has more crates that crates.io doesn't have, and using that server in cargo, should be technically possible today (likely not as easy and "out-of-the-box" as we'd like it to be, but possible). I think this RFC would be stronger if there was an accompanying implementation that folks could look at and interact with. Do you have any plans for implementing that part of your proposal?

The community wants something different from the mods, and I agree with them, so I don’t wanna associate myself with the crates.io mods.

sure, the problem here is more of the “they never moderate anything” kind, rather than “they moderate too much”. however… I still wanna leave crates.io over this, but I don’t wanna deprive the general rust community (especially newcomers) from using my crates / force the general rust community to use my registry.

as such, it can also decline to publish crates that depend on such a crate. I don't see the issue?

What's the point of hosting your crates on your own server because you're against crates.io policies if your users are still downloading the cached copy crates.io fetched from your registry? I mean, yeah, technically you aren't using crates.io, but the practical result is the same. Is it worth doing all the work just for this?

Yes, it is. It would also bring me one step closer to finally closing my github. (rust still has me tied to github in other ways, tho, but w/e)

While crates.io would be the default for new cargo installs, in the future it could be changed so you could pick your default from a list.

The team already said they would welcome other OAuth providers implemented in crates.io! The only thing missing is someone with the time to implement the feature.
