Silo effect of alternative registries

Reviews are a front-end feature, so they don’t affect alternative registries: another website can add reviews without needing to control how dependency resolution works, and it doesn’t even have to be a registry.

Alternative registries keep being brought up because the crates-io team doesn’t have the capacity to build and maintain new features or enforce anti-squatting policies, so I imagine that building a review system, and fighting comment spam and voting brigades, is going to be dismissed as a job for another site too.

This thread is about a fundamental distributed-systems problem: network consistency. Federated systems do not need to solve it reliably (they can simply ignore unresponsive instances), but a dependency resolution system absolutely does (compilation results should not depend on the opaque state of remote caches).
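
A minimal Python sketch of that asymmetry, assuming remotes are modeled as plain dicts and `None` stands for an instance that did not answer (all names here are hypothetical). An aggregator can drop silent instances and still return something useful; a resolver that did the same would produce different results depending on which remotes happened to be reachable, so it has to fail instead.

```python
from typing import Optional

# Hypothetical remote state: an index mapping keys to lists of strings,
# or None when the instance did not respond in time.
Remote = Optional[dict[str, list[str]]]


def merge_timelines(instances: dict[str, Remote]) -> list[str]:
    """Federation-style aggregation: unresponsive instances are skipped."""
    posts: list[str] = []
    for data in instances.values():
        if data is None:
            continue  # a missing instance only means fewer posts
        posts.extend(data.get("posts", []))
    return posts


def available_versions(crate: str, registries: dict[str, Remote]) -> list[str]:
    """Resolver-style lookup: an unreachable registry is a hard error,
    because skipping it would silently change the resolution result."""
    versions: list[str] = []
    for name, index in registries.items():
        if index is None:
            raise RuntimeError(f"registry {name!r} unreachable; refusing to resolve")
        versions.extend(index.get(crate, []))
    return versions
```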

With all due respect, repeatedly pushing the latest hotness (in this case, federation) as some kind of panacea is unhelpful. I understand you're excited about federation, but it is ultimately the wrong shape of distributed gadget.

I’m not excited about federation.

I’m unhappy with the current system, and I’ve seen federation actually work in quite similar situations.

I’ve seen Mastodon instances rise and fall. Mastodon handles it quite well, and remnants of some of those instances still linger in other instances. While that’s a feature of the software rather than the protocol, it seems like a desirable feature for any federated dependency resolution system.

In any case, I’m not pushing for it because I’m “excited about [it]”; I’m pushing for it because it works, and it solves many of the problems you’re seeing with (non-federated) alternative registries, like (de)duplication issues, having to publish to each registry separately, and so on.

That is anecdata. I counter that I've seen federated systems have problems, like fragmentation. I don't think this is a meaningful line of discussion.

What feature? Mastodon does not solve topological sort problems; it aggregates the output of multiple remote producers. You do not want to live in a world where different producers might disagree on a universal fact (i.e., the contents of a crate). That would make your solution to this graph problem depend on whom you talk to at any given moment, since crate upgrades would no longer be atomic. Add malicious agents and a sprinkle of social engineering, and you've got yourself a proper mess: nobody can trust the registries, and everyone with a competent security team ends up running their own air-gapped registry.
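
To illustrate the "disagree on a universal fact" problem, here is a small sketch that assumes each registry exposes a hypothetical index mapping (crate, version) to the SHA-256 of the archive; the registry name mirror.example and the petal 1.1.4 contents are made up. Detecting a conflict is easy; deciding which side is right is not, because nothing in the data says which registry is authoritative.

```python
import hashlib


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def find_conflicts(
    registries: dict[str, dict[tuple[str, str], str]],
) -> dict[tuple[str, str], dict[str, str]]:
    """Return each (name, version) that registries serve with differing checksums.

    `registries` maps a registry name to a hypothetical index of
    (crate name, version) -> SHA-256 of the crate archive."""
    seen: dict[tuple[str, str], dict[str, str]] = {}
    for registry, index in registries.items():
        for key, digest in index.items():
            seen.setdefault(key, {})[registry] = digest
    return {
        key: by_registry
        for key, by_registry in seen.items()
        if len(set(by_registry.values())) > 1
    }


# Two registries disagree about what petal 1.1.4 contains.
original = checksum(b"petal 1.1.4 source")
tampered = checksum(b"petal 1.1.4 source + backdoor")
conflicts = find_conflicts({
    "crates.io": {("petal", "1.1.4"): original},
    "mirror.example": {("petal", "1.1.4"): tampered},
})
print(conflicts)
```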

Fragmentation is a feature. And it’s, IMHO, something we need.

I suspect a lot of us feel the exact opposite (i.e., that we need to actively avoid and reduce ecosystem fragmentation), so could you try to articulate why you think this is important? It’s not just because you wanted to “fork Rust”, right?

Please stop. You’re derailing the thread. In this thread I’m trying to discuss an actual issue with a real implementation of a Rust tool.

I disagree. The point is not to have an alternative repository with an identical set of crates, but one that can make its own decisions about which crates to allow and who owns which crate, and maybe even publish its own patches.

Say we had an alternative repository, greatcrates.net, which our hypothetical user wants to use. Say we have some dependencies:

  • bloom, only published by greatcrates.net, depending on petal 1.1.5
  • flower, published by crates.io and greatcrates.net, depending on petal 1.1
  • petal, published on both but with version 1.1.5 only available on greatcrates.net

Now, bloom has only a single source, and although flower has two, both serve an identical version, so a checksum is enough to deduplicate. But petal has two different versions available, so what does Cargo do? Including both is redundant and may cause issues if the library has internal state. The older version from crates.io is evidently incompatible, since bloom needs 1.1.5. The newer version from greatcrates.net may be fine, but since flower was published against the original petal on crates.io, we can't know this.
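
The petal dilemma can be reproduced with a toy model. It assumes made-up versions and checksums (only 1.1.5 comes from the post) and a simplified caret match that is only valid for major versions of 1 or higher. Checksum deduplication collapses the two flower uploads into one, but the petal requirements end up with disjoint candidate sets once you respect which registry each dependent was published against.

```python
from collections import defaultdict

# Hypothetical index entries: (registry, crate, version, checksum).
# flower is uploaded identically to both registries; petal 1.1.5 exists only
# on greatcrates.net. Every version except 1.1.5 is made up.
ENTRIES = [
    ("crates.io", "flower", "1.0.0", "aaa"),
    ("greatcrates.net", "flower", "1.0.0", "aaa"),
    ("crates.io", "petal", "1.1.4", "bbb"),
    ("greatcrates.net", "petal", "1.1.5", "ccc"),
    ("greatcrates.net", "bloom", "0.1.0", "ddd"),
]


def parse(version: str) -> tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def caret_ok(version: str, minimum: str) -> bool:
    """Simplified caret match for major >= 1: same major, not older than minimum."""
    v, m = parse(version), parse(minimum)
    return v[0] == m[0] and v >= m


def dedup(entries):
    """Merge uploads sharing name, version and checksum; remember who serves them."""
    merged = defaultdict(set)
    for registry, crate, version, digest in entries:
        merged[(crate, version, digest)].add(registry)
    return dict(merged)


def candidates(merged, crate, minimum, registries):
    """Versions of `crate` matching ^minimum that one of `registries` serves."""
    matches = [
        version
        for (name, version, _digest), sources in merged.items()
        if name == crate and caret_ok(version, minimum) and sources & registries
    ]
    return sorted(matches, key=parse)


merged = dedup(ENTRIES)
# flower's petal requirement was recorded against crates.io ...
for_flower = candidates(merged, "petal", "1.1", {"crates.io"})
# ... while bloom's can only be met by greatcrates.net.
for_bloom = candidates(merged, "petal", "1.1.5", {"greatcrates.net"})
```

Neither candidate list contains a version the other side accepts, so Cargo is left choosing between two copies and an unjustified substitution.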

It is questionable whether simply enabling an extra repository should automatically pull in newer versions of packages available from that source, especially since a user might enable the repository intending to use only a single package. So foo = "0.2" should not simply mean "look for foo, version 0.2, in all repositories".

One possibility is that dependencies are always namespaced by repository, but crates may declare a provides list of alternative names for the same library. Going back to our petal example, greatcrates.net/petal could have provides = ["petal"] (assuming no namespace is required for crates.io), which tells Cargo that greatcrates.net/petal is a drop-in replacement for crates.io's petal, so the dependencies can be safely deduplicated.
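
A sketch of how a resolver might use such a `provides` declaration, assuming crates are identified by a (registry, name) pair and that `provides` lists the ids a package may stand in for (the field and all types here are hypothetical). It trusts `provides` completely and neither follows chains of redirects nor re-checks version requirements, both of which a real resolver would still need to do.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CrateId:
    registry: str
    name: str


@dataclass
class Package:
    crate_id: CrateId
    version: str
    # Hypothetical `provides` metadata: ids this package may stand in for.
    provides: list[CrateId] = field(default_factory=list)


def unify(requested: list[CrateId], packages: dict[CrateId, Package]) -> dict[CrateId, CrateId]:
    """Map each requested crate id to the id whose package will actually be built.

    If one requested package lists another requested id in `provides`,
    that id is redirected to it; everything else maps to itself."""
    redirect: dict[CrateId, CrateId] = {}
    for rid in requested:
        for alias in packages[rid].provides:
            if alias in requested:
                redirect[alias] = rid
    return {rid: redirect.get(rid, rid) for rid in requested}


crates_io_petal = CrateId("crates.io", "petal")
great_petal = CrateId("greatcrates.net", "petal")
packages = {
    crates_io_petal: Package(crates_io_petal, "1.1.4"),
    great_petal: Package(great_petal, "1.1.5", provides=[crates_io_petal]),
}
plan = unify([crates_io_petal, great_petal], packages)
```

Both petal requests end up pointing at greatcrates.net/petal, so only one copy is built.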

(Note that as well as allowing each repo to have its own namespace, repos could also have caches, republishing crates directly from other repos. Ultimately, it must be up to the user which repos to trust.)

That sounds like a badly designed federation…

I think alternative registries should be able to mirror dependencies from other registries (possibly under a different name), and a published crate must depend only on crates from its own registry (which may be mirrored from another one). A mirrored crate would have at least the following fields: its name, the name and registry it is mirrored from, and the name and registry of the original source. The latter two will usually be the same, but it could be useful to allow chains of mirrors. As a consequence, re-publishing non-mirrored crates between registries should be heavily discouraged.
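
The mirror metadata could look roughly like this. It's a sketch that assumes each entry is keyed by (registry, name) and carries the fields the post lists; the type names and corp.example are hypothetical. Following `mirrored_from` recovers the original publication even through a chain of mirrors, and the recorded `source` can be checked against where the chain actually ends.

```python
from dataclasses import dataclass
from typing import Optional

Key = tuple[str, str]  # (registry, crate name)


@dataclass(frozen=True)
class Entry:
    """Hypothetical per-crate metadata with the fields listed in the post."""
    name: str
    mirrored_from: Optional[Key] = None  # where this copy was taken from
    source: Optional[Key] = None         # the original publication


def origin(key: Key, entries: dict[Key, Entry]) -> Key:
    """Follow mirror links back to the original publication, rejecting cycles."""
    seen: set[Key] = set()
    while entries[key].mirrored_from is not None:
        if key in seen:
            raise ValueError(f"mirror cycle through {key}")
        seen.add(key)
        key = entries[key].mirrored_from
    return key


def source_is_consistent(key: Key, entries: dict[Key, Entry]) -> bool:
    """A mirror's recorded source must match where its chain actually ends."""
    recorded = entries[key].source
    return recorded is None or recorded == origin(key, entries)


# A chain of mirrors: corp.example <- crates.io <- rand.crates.io.
entries = {
    ("rand.crates.io", "hc128"): Entry("hc128"),
    ("crates.io", "rand_hc128"): Entry(
        "rand_hc128", ("rand.crates.io", "hc128"), ("rand.crates.io", "hc128")),
    ("corp.example", "hc128"): Entry(
        "hc128", ("crates.io", "rand_hc128"), ("rand.crates.io", "hc128")),
}
```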

A company with a private registry would manually mirror crates from crates.io (and maybe other registries) after reviewing their source. One could imagine some companies creating private registries with paid access and reviewed crates, perhaps even providing some liability guarantees. Some registries will mirror updates automatically based on webhooks or periodic checks. Of course, it could work the other way as well: crates.io could approve some registries as sources of crates to mirror. Whether this approval should be automatic or manual is open to discussion. Personally, I think such source registries should ideally use crates.io subdomains.

Let’s say it is rand.crates.io. You publish hc128 to it, and it is automatically mirrored as rand_hc128 on crates.io. If some crate has both hc128 from rand.crates.io and rand_hc128 from crates.io in its dependency tree, Cargo will be able to solve the version constraints and select a single crate version, using the fact that rand_hc128 is a mirror of rand.crates.io’s hc128. The main restriction is that a mirror’s versions must match those of its source. It’s possible to lift this restriction as well (by making the registry return a list of hashes that fit a given constraint), but I don’t think it’s worth the additional complexity.
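
Once every dependency can be mapped to its origin, the rand example reduces to grouping. A minimal sketch, assuming a hypothetical flat mirror table and made-up requirements: hc128 and rand_hc128 fall into the same bucket, so the resolver has to find a single version for both, which only makes sense because mirror versions match source versions.

```python
# Hypothetical flat mirror table: (registry, name) of a mirror -> its source.
MIRRORS = {("crates.io", "rand_hc128"): ("rand.crates.io", "hc128")}


def canonical(registry: str, name: str) -> tuple[str, str]:
    return MIRRORS.get((registry, name), (registry, name))


def group_by_origin(
    deps: list[tuple[str, str, str]],
) -> dict[tuple[str, str], list[tuple[str, str]]]:
    """Bucket (registry, name, requirement) dependencies by canonical crate.

    Each bucket then needs one version satisfying all of its requirements,
    which is meaningful only because mirrors keep their source's versions."""
    groups: dict[tuple[str, str], list[tuple[str, str]]] = {}
    for registry, name, requirement in deps:
        groups.setdefault(canonical(registry, name), []).append((registry, requirement))
    return groups


groups = group_by_origin([
    ("rand.crates.io", "hc128", "^0.1"),
    ("crates.io", "rand_hc128", "^0.1.1"),
    ("crates.io", "rand_core", "^0.3"),
])
```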

So suppose some crate has foo twice in its dependency tree: foo = "0.2" from crates.io, and foo = "^0.2.1" from altregistry.org, which mirrors foo from crates.io. The latest version on crates.io is 0.2.5, but on altregistry.org it’s 0.2.3. Cargo will request the versions that fit the relevant constraints from both registries (0.2.0 - 0.2.5 from crates.io and 0.2.1 - 0.2.3 from altregistry.org) and will select 0.2.3 as the latest version common to both. But if there is a dependency on foo = "^0.2.4" (which altregistry.org cannot satisfy), Cargo will use two foo versions, 0.2.3 and 0.2.5. Not ideal, but I think it’s reasonable behavior here.
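
The version selection in this example can be checked with a short sketch. It assumes one requirement per registry, a simplified caret parser that ignores pre-release versions, and a fallback rule (newest match from each registry) that reproduces the outcome described in the post rather than anything Cargo implements today. In the second call, the ^0.2.4 requirement is attached to crates.io, since altregistry.org cannot satisfy it.

```python
Version = tuple[int, int, int]


def parse(text: str) -> Version:
    parts = [int(p) for p in text.split(".")] + [0, 0]
    return (parts[0], parts[1], parts[2])


def caret(req: str) -> tuple[Version, Version]:
    """Half-open [low, high) range of a caret requirement; pre-releases ignored."""
    text = req.lstrip("^")
    given = text.count(".") + 1  # how many components were written
    low = parse(text)
    major, minor, patch = low
    if major > 0 or given == 1:
        high = (major + 1, 0, 0)
    elif minor > 0 or given == 2:
        high = (0, minor + 1, 0)
    else:
        high = (0, 0, patch + 1)
    return low, high


def matching(available: list[str], req: str) -> set[Version]:
    low, high = caret(req)
    return {v for v in map(parse, available) if low <= v < high}


def select(requests: dict[str, tuple[str, list[str]]]) -> set[Version]:
    """Pick foo versions given, per registry, (requirement, versions it serves).

    Prefer one version acceptable to every dependent; otherwise fall back
    to the newest match from each registry separately."""
    per_registry = [matching(avail, req) for req, avail in requests.values()]
    if not all(per_registry):
        raise ValueError("some requirement cannot be satisfied")
    shared = set.intersection(*per_registry)
    if shared:
        return {max(shared)}
    return {max(found) for found in per_registry}


crates_io = ["0.2.0", "0.2.1", "0.2.2", "0.2.3", "0.2.4", "0.2.5"]
alt = ["0.2.1", "0.2.2", "0.2.3"]  # the mirror lags behind
one = select({"crates.io": ("0.2", crates_io), "altregistry.org": ("^0.2.1", alt)})
two = select({"crates.io": ("^0.2.4", crates_io), "altregistry.org": ("^0.2.1", alt)})
```

Here `one` comes out as {(0, 2, 3)} and `two` as {(0, 2, 3), (0, 2, 5)}, matching the two cases described in the post.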

There is also the question of which registry to download crates from (note that we already know the source name, source registry, version, and hash of the wanted crates). The most logical option is to use the source, but a private company would want to use crates from its own registry. I think the latter case is better solved with appropriate Cargo.toml options (e.g. blacklisting all registries except the company-owned one, or giving it the highest priority).
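
Download-source selection could then be a separate step after resolution, since the name, version and hash are already fixed. A sketch, assuming a hypothetical priority list and allowlist standing in for the Cargo.toml options the post mentions (corp.example is also hypothetical): the original source is the default, an explicit preference overrides it, and the hash check means the choice of registry cannot change which bytes get built.

```python
import hashlib
from typing import Iterable, Optional


def pick_download_registry(
    source: str,
    holders: list[str],
    priority: Iterable[str] = (),
    allowed: Optional[set[str]] = None,
) -> str:
    """Choose where to fetch an already-resolved crate from.

    holders:  registries known to serve this exact name, version and hash
    priority: hypothetical user setting, most preferred first
    allowed:  hypothetical allowlist; every other registry is blocked"""
    usable = [r for r in holders if allowed is None or r in allowed]
    if not usable:
        raise LookupError("no permitted registry serves this crate")
    for registry in priority:  # an explicit preference wins
        if registry in usable:
            return registry
    return source if source in usable else usable[0]  # default: the original source


def verify(archive: bytes, expected_sha256: str) -> bool:
    """Whatever registry was used, the bytes must match the resolved hash."""
    return hashlib.sha256(archive).hexdigest() == expected_sha256
```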

This is a good goal. I'm just worried about a world where surprises can arise from bad actors and malicious registries, which is something all decentralized systems are vulnerable to. Without consistency, you're going to need to make choices about defaults... and odds are that will be something officially sanctioned, like crates.io.

Right, this sounds like a good world to live in, because we no longer have confusing consistency problems. But I certainly don't want to live in a world where the dependency resolution problem might have different solutions depending on the order in which you consult registries.
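
The order-dependence concern fits in a few lines. This sketch assumes a naive lookup that takes each crate from the first registry that lists it; mirror.example and the checksums are hypothetical.

```python
def first_match(crate: str, registries: list[tuple[str, dict[str, str]]]) -> tuple[str, str]:
    """Naive federated lookup: take the crate from the first registry that has it.

    Each registry is (name, index), where index maps crate name -> checksum."""
    for name, index in registries:
        if crate in index:
            return name, index[crate]
    raise LookupError(crate)


crates_io = ("crates.io", {"foo": "sha256-aaa"})
mirror = ("mirror.example", {"foo": "sha256-bbb"})  # hypothetical diverging registry

# Same inputs, different order, different answer.
print(first_match("foo", [crates_io, mirror]))  # ('crates.io', 'sha256-aaa')
print(first_match("foo", [mirror, crates_io]))  # ('mirror.example', 'sha256-bbb')
```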

It's very important to have non-crates.io registries if, for example, you want total, auditable control of your code supply chain (which is an extremely reasonable thing to want if you handle sensitive data or don't want your SLA to depend on a third party).

In the original context of "if you don't like how crates-io manages (or doesn't manage) abandoned crates, make your own registry", the ability to modify crates and their ownership is the whole point.

There's an implicit assumption that the user has to trust the registry not to do anything malicious (just like you have to trust cratesio today). I don't see a way around it, because the ability to change ownership of crates (and thus to modify them) is one of the goals.

Sure, but by default Cargo will only know about crates.io, no? At least, that’s the only sane default beyond “all registries are opt-in”, which is nuts.

I certainly don’t want Cargo hooked up to random registries I don’t trust by default. I implicitly trust (for the purposes of discussion) crates.io, because it’s administered by the same organization that distributes the compiler binaries.

Alternative registries are opt-in. Currently, using one takes a tedious config change plus an extra property on every dependency, so there’s no way to use one by accident.

I don’t want cargo hooked up to random registries. I don’t think anyone would want that.

But it would be interesting if the only registry it’s hooked up to could itself hook up to other registries.

That is, registries would be hooked up to other registries, not the tools.

I don’t really understand the problem then (which is on me; I might have skimmed the OP incorrectly). Having Cargo “figure out” which registries to use based on some setting at the top of your buildfile seems like a footgun. If what you want is “I want to use this registry as the default instead of crates.io”, that is an extremely reasonable knob.

There’s no way to use anything instead of crates-io. You can say “in addition to crates-io, if I specifically say I want a package from another registry that I’m nicknaming “foo”, then check that URL”.

There is, through source replacement.

Thanks! I wasn’t aware of that (I thought only per-crate replacement is supported). That may be the solution to externally curating cratesio.
