Integration with mono-repos via intermediate directories

I'm hoping that this sparks some conversation about alternative solutions to the RFC "Add named path bases to cargo (v2)" by dpaoliello (rust-lang/rfcs pull request #3529).

From that RFC:

As a project grows in size, it becomes necessary to split it into smaller sub-projects, architected into layers with well-defined boundaries.

One way to enforce these boundaries is to use different Git repos (aka "multi-repo"). Cargo has good support for multi-repo projects: developers can use either git dependencies or private registries, the latter if they want to explicitly publish code or need to preprocess their sub-projects (e.g., generating code) before they can be consumed.

If all of the code is kept in a single Git repo (aka "mono-repo"), then these boundaries must be enforced a different way: either leveraging tooling during the build to check layering, or requiring that sub-projects explicitly publish and consume from some intermediate directory. Cargo has poor support for mono-repos: the only viable mechanism is path dependencies, but these require relative paths (which makes refactoring and moving sub-projects very difficult) and don't work at all if the mono-repo requires publishing and consuming from an intermediate directory (as this may vary per host, or per target being built).

Possible Solutions

Base paths

The RFC proposes adding a table of named base paths to config.toml, which path dependencies in Cargo.toml can then reference. This allows an external build system to set up the intermediates directory and generate a config.toml that points to it.
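To make the proposal concrete, here is a minimal Rust sketch of how a base-relative path dependency might be resolved. The PathDep type, the resolve_path_dep function, and the base name "intermediates" are hypothetical; the HashMap stands in for whatever base-path table the external build system writes into config.toml, and none of this is Cargo's actual code.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// A path dependency that may name a base directory.
/// Hypothetical shape loosely modeled on the RFC; not Cargo's real types.
struct PathDep<'a> {
    base: Option<&'a str>,
    path: &'a str,
}

/// Resolves a path dependency to a directory.
///
/// `bases` stands in for the base-path table an external build system would
/// write into config.toml; `manifest_dir` is the directory of the Cargo.toml
/// that declares the dependency.
fn resolve_path_dep(
    dep: &PathDep,
    bases: &HashMap<String, PathBuf>,
    manifest_dir: &Path,
) -> Result<PathBuf, String> {
    match dep.base {
        // No base: behave like today's path dependencies (relative to the manifest).
        None => Ok(manifest_dir.join(dep.path)),
        // With a base: look the name up in the table and join the path onto it.
        Some(name) => bases
            .get(name)
            .map(|root| root.join(dep.path))
            .ok_or_else(|| format!("unknown path base `{}`", name)),
    }
}
```

In this model only the table differs per host or flavor; the Cargo.toml entry stays the same everywhere and no longer encodes where the consuming package sits relative to its dependency.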

This approach has the advantage of being simple to implement and understand. A path dependency is the correct model for how Cargo should handle incremental builds and lock files for these types of dependencies.

The downside is that it introduces yet another way to reference dependencies in Cargo (albeit a sub-variant of path dependencies) and adds another place where Cargo.toml depends on config.toml.

Expand "directory source"

Cargo supports directory sources, which are intended to be created by cargo vendor and have checksums for all crates (see "Source Replacement" in The Cargo Book).

One could imagine expanding support for these by creating an "externally managed" directory source, without checksums or an index, which can be referenced like a normal registry. Again, the external build system would need to generate a config.toml with the registries that it sets up.

This has the advantage of being easy to understand, but it would require changes within Cargo to treat these registries like path dependencies for the purposes of incremental builds and lock files.
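As a rough illustration of what an "externally managed" directory source could look like, the sketch below indexes a directory in which each immediate subdirectory containing a Cargo.toml is one package. Unlike a cargo vendor directory, it requires no .cargo-checksum.json. The function name is hypothetical, and taking the package name from the directory name (instead of parsing the manifest) is a simplifying assumption.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Builds a name -> directory index for a hypothetical "externally managed"
/// directory source. Every immediate subdirectory that contains a Cargo.toml
/// counts as one package; no checksum file is required.
/// Assumption: the package name equals the directory name.
fn index_directory_source(root: &Path) -> io::Result<HashMap<String, PathBuf>> {
    let mut index = HashMap::new();
    for entry in fs::read_dir(root)? {
        let path = entry?.path();
        // Skip anything that is not a package directory.
        if path.join("Cargo.toml").is_file() {
            if let Some(name) = path.file_name().and_then(|n| n.to_str()) {
                index.insert(name.to_string(), path.clone());
            }
        }
    }
    Ok(index)
}
```

Because nothing is checksummed, Cargo would still need to treat packages found this way like path dependencies for incremental builds and lock files, which is the extra work described above.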

Support environment vars in Cargo.toml

As suggested in https://github.com/rust-lang/cargo/issues/10789, if Cargo supported expanding environment variables within TOML files, then this could be done without any other changes to the format of Cargo's TOML files.
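For illustration only, here is a minimal sketch of the kind of expansion that issue asks for. The ${NAME} syntax, the function names, and the REPO_INTERMEDIATES variable used when exercising it are assumptions, not anything Cargo supports today. The lookup is passed in as a closure so the logic can be exercised without modifying the real environment.

```rust
use std::env;

/// Expands `${NAME}` references using the process environment.
/// The `${NAME}` syntax is an assumption; Cargo does not expand
/// environment variables in Cargo.toml today.
fn expand_env(input: &str) -> Result<String, String> {
    expand_with(input, |name| env::var(name).ok())
}

/// Same expansion with a caller-supplied lookup (handy for testing).
fn expand_with<F>(input: &str, lookup: F) -> Result<String, String>
where
    F: Fn(&str) -> Option<String>,
{
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find("${") {
        // Copy the literal text before the reference.
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        let end = after
            .find('}')
            .ok_or_else(|| format!("unterminated `${{` in {:?}", input))?;
        let name = &after[..end];
        let value = lookup(name)
            .ok_or_else(|| format!("environment variable `{}` is not set", name))?;
        out.push_str(&value);
        // Continue after the closing brace.
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    Ok(out)
}
```

A path such as "${REPO_INTERMEDIATES}/core" would then resolve differently per host or flavor, with the external build system responsible for setting the variable.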

Are workspace-root-relative paths an option here (to be clear: not a feature that currently exists), or is it a requirement that the packages not be in the same Cargo workspace?

Workspace-relative paths don't work for two reasons:

  1. We don't want all the packages to be in the same workspace (since we don't believe that workspaces will scale to the number of packages in the mono-repo, and because this breaks the layering in the mono-repo).
  2. The path to the intermediates directory is flavor- or host-specific (e.g., if the sources are in /repo/src/... then the intermediates may be in /repo/i/x64/release/...).
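The second point can be sketched as follows: the same source directory maps to a different intermediates directory for each architecture and flavor, so no single path relative to the workspace root can name it. This assumes, hypothetically, that the intermediates tree mirrors the layout under /repo/src; the function names are illustrative.

```rust
use std::path::{Path, PathBuf};

/// Intermediates root for one host/flavor, following the example layout
/// /repo/i/<arch>/<flavor>.
fn intermediates_dir(repo_root: &Path, arch: &str, flavor: &str) -> PathBuf {
    repo_root.join("i").join(arch).join(flavor)
}

/// Where a sub-project under <repo>/src would publish its output, assuming
/// (hypothetically) that the intermediates tree mirrors the src tree.
/// Returns None if `source_dir` is not under <repo>/src.
fn published_dir(repo_root: &Path, source_dir: &Path, arch: &str, flavor: &str) -> Option<PathBuf> {
    let relative = source_dir.strip_prefix(repo_root.join("src")).ok()?;
    Some(intermediates_dir(repo_root, arch, flavor).join(relative))
}
```

Only the external build system knows which arch and flavor are being built, which is why it, rather than a checked-in Cargo.toml, has to supply the final path.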

I think an important consideration should be added here: the use case isn't just any monorepo, but one so large that loading all of it as a workspace (or tree of workspaces) into memory would be prohibitive, and perhaps even checking out all of its files would be.

Personally, I feel that a Cargo.toml that relies on system configuration should be the exception; making it too easy could lead people to accidentally fall into bad patterns that will burn them later (e.g., I know of places where building software requires precisely configured mappings from VCS to disk). I feel a feature like this should be on the level of custom registries to configure and use, and it should encourage checked-in mappings.

I'm not sure how checked-in mappings would work in this case: one of my requirements is that the dependencies may be in a directory that is per-host or per-flavor being built, so only the external build system knows where it is located. Either the mapping is generated, or Cargo can read some external state (e.g., environment variables or some other file).
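Here is a minimal sketch of the "generated" option: the external build system, which knows the per-host and per-flavor locations, renders the mapping into a config.toml fragment. The [path-bases] table name is an assumption loosely modeled on the RFC, not a confirmed Cargo setting. The string escaping only handles backslashes and double quotes, which is enough for ordinary paths but not for control characters, and base names are assumed to be valid TOML bare keys.

```rust
use std::collections::BTreeMap;
use std::fmt::Write;

/// Quotes `s` as a TOML basic string, escaping only `\` and `"`.
/// Control characters are not handled in this sketch.
fn toml_basic_string(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '\\' => out.push_str("\\\\"),
            '"' => out.push_str("\\\""),
            _ => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Renders a config.toml fragment mapping base names to directories.
/// `[path-bases]` is an assumed table name; keys must be TOML bare keys.
fn render_bases_config(bases: &BTreeMap<&str, &str>) -> String {
    let mut out = String::from("[path-bases]\n");
    // BTreeMap iterates in sorted key order, so the output is deterministic.
    for (name, dir) in bases {
        writeln!(out, "{} = {}", name, toml_basic_string(dir)).unwrap();
    }
    out
}
```

Using a sorted map means regenerating the file for the same inputs yields byte-identical output, which keeps the generated config from churning between builds.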