Defining Dependency Versions Remotely

One of the biggest reasons to use a workspace is to keep your ten internal crates on identical versions of tokio and friends. However, to delegate versions out like this today, you need either a monorepo or an alternative registry.

Nix, which has solved this problem of remote version definitions forwards and backwards, has been instrumental for my non-Rust dependencies: they get all the benefits of one shared set of versions while keeping independent local overrides and upgrade cycles. Detsys, who uses Rust, understands the issues quite well (see "Introducing the Determinate Nix Installer").

The solution likely doesn't need to support nesting, since we don't telescope dependencies up through multiple layers the way Nix does. The remote aspect is what's super useful.

How it would work:

  1. A remote definition would contain a set of dependency versions in a Cargo.toml and a lock file that resolves them to concrete versions.
  2. A dependent crate or workspace would pull that definition and derive a local lock file.
  3. Dependent crates delegate their version definitions clap.workspace = true style (see the sketch just below).
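A minimal sketch of what that delegation might look like. The remote-workspace key, its syntax, the crate name, and the URL are all invented here for illustration; the versions are borrowed from later in the thread:

# Central definition (hypothetical repo): Cargo.toml, resolved by its own Cargo.lock
[remote-workspace.dependencies]
tokio = "=1.44.1"
serde = "=1.0.219"

# Dependent crate: Cargo.toml
[package]
name = "some-service"
version = "0.1.0"
edition = "2021"
remote-workspace = "https://example.com/org/central-versions"  # hypothetical URL

[dependencies]
tokio.remote-workspace = true   # delegated, clap.workspace = true style
serde.remote-workspace = true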

How upgrades would work:

  1. Whenever you want to change tokio versions at your company, bomb dot com, you update that central version definition and its lock file.
  2. One by one, as you work on your services, you pull the central definition and re-derive a new local lock file.
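Continuing the same invented syntax, the company-wide bump would be a one-line change in one place, followed by re-resolving the central lock file:

# Central definition: Cargo.toml
[remote-workspace.dependencies]
tokio = "=1.45.0"   # was "=1.44.1"; re-resolve the central Cargo.lock after this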

Boom. All your services can update independently, yet they all get a predictable set of versions, as if you were using a remote registry. It is the best of all worlds.

My early noise was filed as an issue here:

I really want this style of updates at Positron and will end up using a remote registry ASAP to obtain this capability. But we should not need remote registries when we are so early that we're still getting all our crates off of crates.io.

You won't be able to do this via a Cargo registry, at least not in the current design.

  • Released versions are immutable.
  • Lock files apply to all registries.
  • Lock files don't update automatically, by design.

You could implement global version updates via a tool that updates lockfiles in your projects, like dependabot or renovate do.

You can centrally influence the version selection done by cargo update by making your projects depend on a crate that has an exact version requirement like tokio = "=1.1.1". That's limited to compatible version ranges, though: Cargo won't apply this across major versions.
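A sketch of that technique as I understand it; the company-pins crate name and URL are invented, and the versions are only examples:

# File: company-pins/Cargo.toml (a crate that exists only to pin versions)
[package]
name = "company-pins"
version = "0.1.0"
edition = "2021"

[dependencies]
tokio = "=1.44.1"   # exact requirement; version unification pulls dependents to it

# File: your-service/Cargo.toml
[dependencies]
company-pins = { git = "https://example.com/org/company-pins" }  # hypothetical URL
tokio = "1"         # resolves to 1.44.1 because of company-pins' exact pin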

As I woke up, your reply seemed to be from another discussion entirely, so I'm going to clarify things and recommend completely re-framing and re-thinking the problem and expectations for anyone reading.

All of our versions are fully specified, i.e. no globs. In general, we want exact versions, centrally defined and resolved to a central lock file. That central lock file should be used by each repo to maintain a locally derived lock file that contains only what is necessary. A dependent repo can override dependencies in detail by replacing thiserror.remote-workspace = true with a non-delegated version specification such as thiserror = "0.2.1".
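With the hypothetical delegation syntax from earlier, that override in a dependent repo's Cargo.toml might look like:

[dependencies]
tokio.remote-workspace = true   # still delegated to the central definition
thiserror = "0.2.1"             # locally overridden, no longer delegated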

dependabot

Dependabot and similar tools only update the abstract versions listed in the Cargo.toml. I want the concrete versions in the Cargo.lock to only use versions that are specified in a remote Cargo.lock. The key issue we can solve is dispersion across multiple repos.

Released versions are immutable.

This sounds like it's about crates released to a registry. That's not really an affected use case.

When building web services with containers etc., an early-phase team working from the default registry is concerned with preventing dispersion in development and while building containers, not with published crates.

So the message is about dispersion, dispersion, dispersion. We don't want to build containers with a salad of versions, where each crate resolves concrete versions in its own local lock file. For all dependencies in use, we want them to come from fully specified sets that were concretely resolved in a lock file. A remote lock file solves this. A workspace solves this too, but it enforces a monorepo or a custom registry.

A custom registry is a fine solution, but it is generally provided by a SaaS that early-stage teams just don't want to bother with. Even if we had such a custom registry to provide versions, every dependency would just use version "*". That is worse than a workspace for locking: instead of delegating versions to a locked set of concrete versions, we would be delegating to a registry that might need to contain multiple versions, because some of our repos had added versions to the custom registry.

A centrally managed lock file and a centrally managed workspace that contains concrete dependencies is clearly the most precise and scalable solution, but currently these workspaces force a monorepo, or very specific file and git structures that might as well be a monorepo, onto everything.

To clarify: are your dependencies fully specified in Cargo.toml or just in the lockfile? You talk about versions being fully specified as if that's a special thing, but that's always the case for Cargo.lock.

So if a remote workspace is specified, you want Cargo to use both the version range from that workspace Cargo.toml and the exact version from that workspace Cargo.lock, correct?

The closest you can get to this today is something like this:

workspace and crate using it

remote-workspace

  • Cargo.lock
  • Cargo.toml
[workspace]
resolver = "2"
members = ["../.."]

[workspace.dependencies]
tokio = "1.40.0"

your-crate

  • central/workspace [git submodule -> remote-workspace]
  • src/*
  • Cargo.toml
[package]
name = "child-crate"
version = "0.1.0"
edition = "2021"
workspace = "central/workspace"

[dependencies]
tokio.workspace = true

But that doesn't quite do what you want:

  • no individual lockfile for the crate, everything is applied to the one in central/workspace
  • the target directory is also placed in central/workspace

I wonder if you can fake this by having a crate with every possible dependency pinned as your “central lockfile”, and then have all your leaf crates depend on it (via git dependency) with optional = true. It would be a pain to update though.

Instead of optional = true, you can put them in a [target.'cfg(any())'.dependencies] table.
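A sketch combining those two suggestions; the central-pins name, URL, and versions are invented. cfg(any()) never matches, so the pinned dependencies are resolved into the lock file but never compiled:

# File: central-pins/Cargo.toml (the "central lockfile" crate)
[package]
name = "central-pins"
version = "0.1.0"
edition = "2021"

# Never-true cfg: these dependencies are locked, not compiled.
[target.'cfg(any())'.dependencies]
tokio = "=1.44.1"
serde = "=1.0.219"
thiserror = "=2.0.12"

# File: leaf-crate/Cargo.toml
[dependencies]
central-pins = { git = "https://example.com/org/central-pins" }  # hypothetical URL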

Could you explain the motivation for why you want dependencies kept in sync across your many-repo environment?

For example, a primary reason to do it within a workspace is to reduce building multiple versions of the same dependency.

You talk about versions being fully specified

I just mean we don't use any globs in our Cargo.tomls. Every version is "0.1.0" or similar.

But that doesn't quite do what you want:

Agreed; your example and analysis both perfectly demonstrate that we can get really close, but that the result is kind of like a monorepo with extra steps.

Let's ask this backwards. Why can I not limit the dependency dispersion if my organization uses several repos?

I could understand monorepo for a workspace with lots of interdependence and code sharing. Our central common tools live in such a workspace. However, the code below is not a good reason to force workspaces and crates into the same repo:

[workspace.dependencies]
# ...
pin-project-lite = "0.2.7"
rand = "0.9.0"
serde = "1.0"
serde_json = "1.0"
sqlx = "0.8.3"
thiserror = "2.0.12"
tokio = "1.44.1"
tower = "0.5.2"
tower-http = "0.6.2"
tower-layer = "0.3.3"
tracing = "0.1.41"
tracing-subscriber = "0.3.19"
zeroize = "1.8.1"

If you look at how my Nix dependencies work, there is a central set of pins. Each repo picks just one version of those pins, usually the newest, but upgrades at its own pace. Each repo can override any dependency in detail as necessary. The result is automatic and low-dispersion when nobody is doing anything manually, with full control when someone needs to do something particular. IMO it's the best of both monorepo and multi-repo characteristics.

I wonder if you can fake this by having a crate with every possible dependency pinned as your “central lockfile”

I tried this. Unless I missed some detail, I needed to use a git submodule, and the submodule cannot be a parent of the crate, so it wouldn't work. The submodule update is only a mild pain, and it would solve the dispersion problem very cleanly. When you look at it, the only drawbacks of this solution (besides that I couldn't get it to work) are related to git, so if cargo did the fetch and used that remote lock to derive the local lock file, it would be a perfect solution.

I'm feeling much better understood.

  inputs = {
    pinning.url = "pinning";
    nixpkgs.follows = "pinning/nixpkgs";
    fenix.follows = "pinning/fenix";
  };

This is what one of our simple Nix flake input expressions looks like. It basically re-wires that repo to use an exact set of locked versions from its upstream "pinning" flake. With almost no work, all our repos use very sparse, tightly controlled sets of versions. We will have very few incidents of slightly different versions causing bugs. When we do, we can still override everything in detail, just for that repo. We can also (because Nix) propagate such an override to every single repo by including it in the pinning flake's overlay. It's really nice. We are really close to having this degree of niceness in Rust with just basic cargo.

Bare version numbers are globs: in Cargo.toml, "0.1.0" is a caret requirement, not an exact pin. If that was in Cargo.toml, you were not doing what you thought you were doing.
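Concretely, per Cargo's version-requirement semantics (crate names and versions taken from the workspace list above):

[dependencies]
rand = "0.9.0"   # shorthand for "^0.9.0": matches >=0.9.0, <0.10.0
serde = "1.0"    # shorthand for "^1.0": matches >=1.0.0, <2.0.0
# An exact pin requires the `=` operator:
# tokio = "=1.44.1"   # matches only 1.44.1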

Presumably OP's workspace contains multiple binary crates, between which dependencies aren't currently unified. The binaries may, however, access the same file/database/shared state. If the dependencies are not kept in lockstep, this can result in communication errors: semver guarantees are generally reliable for the direct callers of an API, but I/O behavior tends to change on minor/patch updates all the time.


I did call that a semver violation. You can't assume that multiple programs a user installed are updated in lockstep unless they are literally part of the same package.

You can talk to OP. But anyway, no, that's not how our deploys or system coupling work. That sounds like blue-green deploys with rigid gRPC coupling.

Rather, the motivation to avoid tons of versions being in use is simply to take less exposure to variability in dependencies. Fewer versions in use is always better. Less entropy is always going to behave more predictably than more entropy.

Rust's community is usually reluctant to implement a feature request if the motivation offered is rather vague. I was speculating about what concrete problems you were facing… I can understand if that was not the most plausible scenario.


what concrete problems you were facing

In short, monorepo is an extremely heavy requirement on web service architectures that naturally want to split repos. Cargo has an implicit preference for monorepo: its capabilities for de-duplicating version descriptions and coordinating version locking are only available within one.

As you mentioned, the Cargo.toml is a recommendation; only a lock file is reliable. The only way to share deterministic version locking between multiple crates is a workspace. The only way to even de-duplicate the version descriptions in multiple Cargo.tomls is a workspace. However, this enforces a monorepo, which brings with it all of the inflexibility of monorepo.

We build containers. We do CI. We do many things that check out code. We can either 1) complicate our git workflows to the point that we need extra hands just to manage all of the merging, or 2) have multiple repos and manage the dispersion. Managing optimistic concurrency is easier than managing locking, so we are going with 2: centralizing documentation, centralizing common tools, but building services out of independent repos.

Allowing different parts of the system to update independently is good. Allowing them to experience dispersion in their dependencies so that upgrade bugs we fix in one service show up differently on another service is very, very bad. Allowing in-detail overrides per repository is good because it is a relief valve against forcing us to task-switch because one container needs an update that breaks another.

In Nix, we can subordinate a lock file to a remote lock file. We can subordinate the version descriptions to the remote flake. We can override every single dependency in detail in every repo. The sum of these capabilities is extremely valuable. It de-duplicates version descriptions without coupling repos. It coordinates version locking in independent repos without excessively restricting the freedom of dependents.

I have not opined about how such a tool would affect a crate distributed via a registry, but given that it would already have been built from a lock file at that point, it likely would not affect anything. However, the repositories would become implicitly dependent upon something that is not a crate. That could be good in the following way: a crate may desire to stay in lockstep with a popular set of libraries. It may subordinate its versions, over some set of dependencies, to a shared remote definition, one handled by individuals perhaps more trusted than even the crate publisher. The effect is that Cargo.tomls would no longer scatter duplicate descriptions of versions, and crates might more frequently use the same, more compatible versions. Our public supply chains could gain durability in some cases, delegating rather than globbing dependency versions.


Thank you for covering your use case. To be clear, doing so is important so that

  • We make sure we are solving the right problem
  • We make sure the solution solves that problem
  • We understand what wiggle room we have for how solving this can tie into other solutions
  • We understand the benefits so we can weigh it against the costs
  • As Cargo is opinionated by design, we can weigh how "blessed" this workflow is, from one extreme of us encouraging this workflow, to the other extreme of intentionally not supporting features that would assist in it (this is extremely rare), and everything in between.

If I'm understanding correctly, you find this is important to do so that you can ensure bug fixes get propagated to all of your repos. However, you still want to give repos operational flexibility to override this.

Taking from that statement:

  • This involves transitive dependencies, and not just direct dependencies, so version requirements are not sufficient
  • But upgrades will involve breaking changes, so you'll also need to deal with version requirements
  • There needs to be a way to tell between being behind and overriding
  • Ideally there is a way to track overrides and work towards eliminating them or else the sought benefit could be lost

So the following would not be sufficient on their own

  • Having an "everything" package with target."cfg(false)".dependencies using = on everything as that won't help with version requirements
  • Delegating a dependency source to another dependency as that only affects direct dependencies
  • Providing an alternative source for workspace.dependencies as that only affects direct dependencies
  • Pointing to an "everything" lockfile as that won't help with version requirements

Some of those could possibly be combined, but the results would be uneven and a bit of a kludge.

A more extreme option: if we get the cargo plumbing commands, then you could implement your own logic for loading manifests or for resolving dependencies.

Honestly, I feel like the best solution to this problem is to use something like RenovateBot

  • You have a shared definition of versions by what is the latest within the registry / git repo
  • This applies to direct and indirect dependencies
  • This biases towards staying up-to-date, rather than stagnation
  • You can "override" the version by rejecting an update PR in an individual repo
  • You are unlikely to forget to remove an "override" because of the dependency dashboard and because an update PR will be emitted on the next update

In this Nix example, I understand that your local flake lock file will be locked to a specific git revision of pinning, deferring to that pinning flake's version definitions of nixpkgs and fenix for the other inputs.

Comparing with the one solution posted above,

It seems like here too, Cargo will create a local lock file inside the leaf crate, based on the contents of the pinning crate.

What is the difference between the cargo and nix use cases? (I'm probably missing something obvious, but couldn't find it after rereading the thread)

Just to further dig in to details:

  • In Nix, you update the pinning flake. My understanding is that you have to manually ask nix to update the local lock file in a leaf to see the updates. (otherwise, what good is a lock if it always moves underneath you)
  • In Cargo, it seems like the same manual action (cargo update) is needed, with the same result of updating the local Cargo.lock

UPDATE: Seems I posted 11 minutes too late,

If I'm understanding correctly, you find this is important to do so that you can ensure bug fixes get propagated to all of your repos. However, you still want to give repos operational flexibility to override this.

And the inverse, not wanting new bugs and variations of behavior to be propagated into our repos through unmanaged dispersion.

Incidental to your reasoning, which I agreed with, more variables are coming to mind:

  • Our leaf crates probably have new cargo error modes where cargo would decline to write a new lock file because an added dependency has an ambiguous or contradictory relationship to the central lock file
  • Cargo CLI would need a new flag or command to differentiate between updating the view of the central lock file versus updating any globbed or new dependencies in the local Cargo.toml.
  • If we add something to the local versions, which requires writing a lock file, is updating the view of the remote lock file required? My preference for only explicitly opting into potential workflow disruptions (an operational concern) may differ from the best default for unfamiliar users who see cargo creating a new lock file and believe they have the newest remote versions. That's maybe an RFC question.

I understand that your local flake lock file will be locked to a specific git revision of pinning

Yes, the local flake lock will have resolved "pinning" to a specific rev (via a flake registry that is highly specific to our streamlining).

This is correct. The dependent repos are locked. In order to receive the latest set of central versions, they must opt into the update. In practice, this means we can CI them against pushes to the central flake. If they fail to build, we can handle it on our own schedule and if they succeed, we can open a machine PR into the repo, allowing us to accept, roll out, and canary the changes on our own schedule.

it seems like the same manual action (cargo update) is needed, with the same result of updating the local Cargo.lock

This is the outcome I would recommend. It is making me realize Cargo would need to resolve the version of the central definitions to a fixed output, locking the view of the central definition, in the same way that Nix locks our upstream "pinning" to a rev.

This biases towards staying up-to-date, rather than stagnation

Up-to-date can mean taking more exposure to the supply chain (until we use a private registry) and more unforced variability. Handling the PRs for each repo is its own source of repetitive friction. We may use automation, but probably not any bot that treats each repo as if there is no central source of truth other than the registry.

  • You have a shared definition of versions by what is the latest within the registry / git repo

We cannot freeze registries without a private registry. While a private registry is a great solution for supply chain control, it is not ideal for every company in the early going.

Critically, delegating to a private registry could still let mistakes happen. In order to give the registry control, we would specify globs in our repo Cargo.tomls. Then, if one crate in our inventory adds a new version to the registry while all of our repos are delegating to it, every repo will begin updating to that new dependency, even though we only wanted to try it on one repo. We can have a private registry that contains newer versions than are locked in a central lockfile.

implement your own logic for loading manifests

Seems like a solution. As long as it worked with tools like cargo leptos, it would work. I think the question of which to prefer depends on:

  • Would the remote definition be so generally useful that it would rival existing features in popularity and value?
  • Would the operating principle conflict with other plugins if it were not built into cargo, and therefore did not have the inherent preference in support that being built in brings?

I hope that is all accurate. I will return to operating. Thanks for everyone's time.

What steps are you taking for managing this when updating the central set of versions with Nix? What steps do you expect to take with Cargo?

We run in pinning:

nix flake lock --update-input fenix

To get a new Rust compiler & toolchain and

nix flake lock --update-input nixpkgs

to get new non-Rust and pre-built Rust dependencies.

Following this, in each leaf repo, we can run

nix flake update

And since our leaf repos exclusively get inputs from pinning, it is equivalent to nix flake lock --update-input pinning. Afterward, the central lockfile is in effect in the leaf repo.

Fenix and Nixpkgs are not our only inputs, but they are the most important ones. In particular, every input in pinning is overridden to use pinning's nixpkgs rather than their own. Other than during upgrades and while a repo is broken and rolled back, we have one set of nixpkgs in use. That is true on dev machines (a home manager module uses pinning as an input), containers, and in CI.

For updates of Rust dependencies in the pinning repo, if we bump a version definition and run cargo generate-lockfile or cargo update <spec>, I would expect a new lock file to result. cargo update <spec> should usually make fewer changes.

Following that update, in the leaf repos, we would most likely run cargo generate-lockfile. We might instead use cargo update <spec>. If there are locally defined or overridden versions, the default would be only to update portions of the lock file covered by the new view of the central lockfile. The more blunt cargo generate-lockfile should regenerate everything, first updating our view of the central lockfile and then resolving anything not specified within it. The preference is not clear since in practice we will strive to make them equivalent.

The use of <spec> is an RFC question, but my instinct is that this is a new kind of pkgid-spec. However, there is likely only one, and a flag or negative flag could serve the purpose as well. We can't use workspace since the leaves may be workspace repos.

Unlike Nix, Cargo will, in my expectation, have no need or desire to delegate versions to a summation of multiple remote lockfiles. In any case, my use cases would never realize such a need, and that's an RFC question.

In practice, since pinning nixpkgs and pinning Rust dependencies are the same responsibility, and because they would occur simultaneously (so that we minimize the intersection of sets of updates), we would still have a repo named "pinning" that would now also control Rust dependencies too.

I would expect all options that allow overriding versions of indirect dependencies to be available in the remote workspace and for that to only affect the generated lockfile. The leaf repos would never care about the Cargo.toml, only the Cargo.lock.

It's been a while since I read about the resolver logic, so beware of false assumptions that slipped in.

I was not asking how you are managing your pinning, but about how you prevent new bugs and variations of behavior. Do you review dependencies? Do you have test upgrades you run on core packages before updating the global set, etc.?
