Local Crate Registry (without vendoring)

Background

Hey all, I've been looking at integrating cargo with colcon, the ROS 2 build tool, as part of the ros2_rust project. Without getting too deep into it, colcon is a Python tool that invokes various build systems (cmake, setuptools, cargo). In the workspace below, my_crate is a cargo crate that depends on message_package (not through cargo, but through an adjacent, ROS-specific mechanism), and colcon makes sure that message_package is built first.

.
└── ros_workspace/
    └── src/
        ├── message_package/ (cmake)
        └── my_crate/        (cargo)

Building message_package generates a Rust crate in the install space:

.
└── ros_workspace/
    ├── src/
    │   ├── message_package/ (cmake)
    │   └── my_crate/        (cargo)
    └── install/
        └── message_package/
            └── share/
                └── message_package/
                    └── rust/
                        ├── src/
                        ├── build.rs
                        └── Cargo.toml

Current Approach

Currently, we consume this generated crate through a [patch] entry in a .cargo/config.toml that colcon creates. my_crate/Cargo.toml declares the dependency like this:

[dependencies]
message_package = "*"

and the generated .cargo/config.toml section looks like this:

[patch.crates-io.message_package]
path = "<$HOME>/ros_workspace/install/message_package/share/message_package/rust"
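For reference, the generated section has a simple, mechanical shape. The Rust sketch below renders it for a list of packages; colcon itself is written in Python, so this only mirrors the output format, and `patch_section` is a hypothetical name. It assumes the install layout shown above, `<install_base>/<pkg>/share/<pkg>/rust`.

```rust
use std::path::Path;

/// Hypothetical helper: render the `[patch.crates-io.<pkg>]` tables that a
/// tool such as colcon writes into `.cargo/config.toml`, assuming each
/// generated crate lives at `<install_base>/<pkg>/share/<pkg>/rust`.
fn patch_section(install_base: &Path, packages: &[&str]) -> String {
    let mut out = String::new();
    for pkg in packages {
        let crate_dir = install_base.join(pkg).join("share").join(pkg).join("rust");
        out.push_str(&format!("[patch.crates-io.{pkg}]\n"));
        // Backslashes are not escaped, so Windows paths would need extra
        // handling before being written as a TOML basic string.
        out.push_str(&format!("path = \"{}\"\n\n", crate_dir.display()));
    }
    out
}

fn main() {
    let install_base = Path::new("/home/user/ros_workspace/install");
    print!("{}", patch_section(install_base, &["message_package"]));
}
```

Every package that colcon lists gets a table whether or not the current crate graph uses it, which is where the unused-patch warnings come from.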

I would like us to get away from this for various reasons, but to keep it succinct: it's confusing for our users, and cargo warns about every patch that isn't actually used in the crate graph.

Alternatives

So, with that preamble out of the way, I'm interested in using a local registry of some kind for these generated crates:

[dependencies]
message_package = { version = "*", registry = "local" }

I've looked at cargo-local-registry and cargo vendor, but I don't believe either accomplishes what I'm looking for. Both replace the crates-io source wholesale, whereas I do not want to vendor every dependency from crates.io, only the specific few that a developer opts into:

[source.crates-io]
replace-with = "vendored-sources"

[source.vendored-sources]
directory = "vendor"

# or if using local-registry
# local-registry = "/path/to/vendor"

If I try to set up source replacement without defining a registry (using either directory or local-registry):

[source.local]
directory = "/path/to/generated/crates/"
# local-registry = "..."

I get this error:

Caused by:
  registry index was not found in any configuration: `local`

It seems that any sort of source replacement still requires a registry with an index. I can define such an index locally, though: a file URI pointing to a directory that contains a config.json, whose dl key in turn points to a local file URI for the requested crate.

[registries.local]
index = "file://<$HOME>/ros_workspace/install/index"

with the index laid out like this:

.
└── ros_workspace/
    └── install/
        ├── ...
        └── index/
            ├── .git/
            ├── me/
            │   └── ss/
            │       └── message_package
            ├── ...
            └── config.json

and config.json at the index root contains:

{  
  "dl": "file://localhost/<$HOME>/ros_workspace/install/{crate}/share/{crate}/rust/{crate}-{version}.crate"  
}

This requires me to create a local git repo for the index, populate it, and package the generated crates as .crate files. On top of that, this approach caches quite a bit in ~/.cargo/registry/. The generated crates will likely change contents without bumping their version numbers (e.g. during local development), and will then fail the checksum check.
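To make that bookkeeping concrete, here is a sketch of the pieces a tool would have to produce for such an index, following the layout rules in the Cargo Book's registry index documentation: the per-crate file path (which is where `me/ss/message_package` comes from), a minimal JSON line for that file, and expansion of the `dl` template. The function names are hypothetical, the entry omits the `deps` and `features` a real crate would need, and the `{sha256-checksum}` marker is not substituted.

```rust
/// Directory prefix Cargo derives from a crate name
/// (assumes a non-empty ASCII name, which crate names always are).
fn prefix(name: &str) -> String {
    match name.len() {
        1 => "1".to_string(),
        2 => "2".to_string(),
        3 => format!("3/{}", &name[..1]),
        _ => format!("{}/{}", &name[..2], &name[2..4]),
    }
}

/// Location of a crate's file relative to the index root (next to config.json).
/// Index file names are lowercased.
fn index_path(name: &str) -> String {
    let lower = name.to_lowercase();
    format!("{}/{}", prefix(&lower), lower)
}

/// One line of that file; each published version gets its own line.
/// Minimal sketch: no deps or features, and no JSON escaping.
fn index_entry(name: &str, version: &str, cksum_hex: &str) -> String {
    format!(
        r#"{{"name":"{name}","vers":"{version}","deps":[],"cksum":"{cksum_hex}","features":{{}},"yanked":false}}"#
    )
}

/// Expand the `dl` template from config.json.
fn expand_dl(template: &str, name: &str, version: &str) -> String {
    const MARKERS: [&str; 5] = ["{crate}", "{version}", "{prefix}", "{lowerprefix}", "{sha256-checksum}"];
    if !MARKERS.iter().any(|m| template.contains(*m)) {
        // With no markers at all, Cargo appends `/{crate}/{version}/download`.
        return format!("{template}/{name}/{version}/download");
    }
    template
        .replace("{crate}", name)
        .replace("{version}", version)
        .replace("{lowerprefix}", &prefix(&name.to_lowercase()))
        .replace("{prefix}", &prefix(name))
}

fn main() {
    let name = "message_package";
    println!("{}", index_path(name));
    println!("{}", index_entry(name, "0.1.0", "<sha256 of the .crate file>"));
    let dl = "file://localhost/<$HOME>/ros_workspace/install/{crate}/share/{crate}/rust/{crate}-{version}.crate";
    println!("{}", expand_dl(dl, name, "0.1.0"));
}
```

The `cksum` field is the crux of the checksum problem: it pins one exact .crate file (and Cargo.lock records the same value), so regenerating the crate without bumping its version produces an archive that no longer matches what was recorded earlier.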

What I Would Like

Ideally, I would like to define an index locally that points to the generated crates in the install/ folder. These crates would skip checksum verification, wouldn't need to be packaged as .crate files, and could be looked up the same way download paths are found through an index today (i.e. the dl key in config.json):

.
└── ros_workspace/
    └── install/
        ├── ...
        └── config.json

with config.json containing:

{
  "dl": "file://localhost/<$HOME>/ros_workspace/install/{crate}/share/{crate}/rust/{crate}-{version}"
}
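A hypothetical sketch of the lookup such a "local path registry" implies: expand a path template with the same `{crate}`/`{version}` markers as `dl`, confirm that a Cargo.toml exists there, and build from that directory in place. Nothing like this exists in cargo today; `resolve_local` is an illustrative name, and the template here follows the generated-crate tree shown earlier (no version in the directory name).

```rust
use std::env;
use std::path::PathBuf;

/// Hypothetical "local path registry" lookup: expand a path template using the
/// same `{crate}`/`{version}` markers as `dl` and read the crate in place.
/// No .crate packaging, no index file, no checksum.
fn resolve_local(template: &str, name: &str, version: &str) -> Option<PathBuf> {
    let dir = PathBuf::from(template.replace("{crate}", name).replace("{version}", version));
    // Only accept the directory if it looks like a crate root.
    if dir.join("Cargo.toml").is_file() {
        Some(dir)
    } else {
        None
    }
}

fn main() {
    let home = env::var("HOME").unwrap_or_default();
    let template = format!("{home}/ros_workspace/install/{{crate}}/share/{{crate}}/rust");
    match resolve_local(&template, "message_package", "0.1.0") {
        Some(dir) => println!("would build from {}", dir.display()),
        None => println!("message_package not found under {template}"),
    }
}
```

Because the directory is read in place, there is nothing to checksum or cache; the trade-off is that the version in the dependency declaration is no longer tied to any immutable artifact.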

It's not my birthday, but it would also be awesome if I could define this local index via an environment variable, like CARGO_REGISTRIES_<NAME>_INDEX.
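For the existing file:// index workaround, at least the index URL can already come from the environment: cargo maps config keys to environment variables by uppercasing them and turning `.` and `-` into `_`, so `registries.local.index` becomes `CARGO_REGISTRIES_LOCAL_INDEX`. A small sketch of that mapping (the helper name is hypothetical):

```rust
use std::env;

/// Environment variable cargo reads for `registries.<name>.index`:
/// the registry name is uppercased and `-` becomes `_`.
fn registry_index_env_var(registry: &str) -> String {
    format!("CARGO_REGISTRIES_{}_INDEX", registry.to_uppercase().replace('-', "_"))
}

fn main() {
    let var = registry_index_env_var("local");
    match env::var(&var) {
        Ok(url) => println!("{var} = {url}"),
        Err(_) => println!("{var} is not set"),
    }
}
```

For example, `CARGO_REGISTRIES_LOCAL_INDEX=file:///home/user/ros_workspace/install/index cargo build` stands in for the `[registries.local]` table. It does not change the checksum or caching behavior described above.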

Am I Missing Something?

If you made it this far, thanks for hearing me out! With that said, is there some way to get closer to my ideal without any new features in cargo? Did I miss anything?

I don't really know the answer to the question, but here are some things to consider:

With multi-library cargo projects we normally use Cargo's workspace support. But I guess this is not an option for your use case?

It might be useful for you to look at what for example bazel or buck2 are doing. Surely a lot of the same problems would come up there?

Regarding checksums, some relevant links are

Cargo makes a fundamental assumption that registry and git dependencies are immutable. I know we at least leverage this to reduce the overhead of rebuild calculations. We're also looking to leverage this in our caching work (rust-lang/cargo#5931, "Per-user compiled artifact cache").

I wonder if RFC 3529 (cargo path bases) may be of use for these kinds of scenarios.

Overall it would be good to make sure we are all on the same page about the driving requirements, e.g. why path dependencies can't be used, why these are being generated dynamically rather than statically, etc.

I'm not sure if this is exactly what you're hoping for, but perhaps it could be inspiration? margo is a minimal crate registry that uses static files.

With multi-library cargo projects we normally use Cargo's workspace support. But I guess this is not an option for your use case?

Colcon does support using Cargo workspaces, but that doesn't really help with the generated crates.

It might be useful for you to look at what for example bazel or buck2 are doing. Surely a lot of the same problems would come up there?

I'm not familiar with buck2, but AFAIK Bazel's rust_library completely avoids cargo and links .rlib files directly. This isn't a very viable option for us, as we still want everything to build with or without colcon.


I wonder if RFC 3529 (cargo path bases) may be of use for these kinds of scenarios.

I think this RFC would be useful. It feels like I am overloading the term "registry" here for two different use cases.

  1. Immutable (root-of-trust?) crates, which we pull from external sources (i.e. nominal usage)
  2. Local crates, on disk, which may or may not change, where version numbers are basically irrelevant (e.g. [patch] or path = "...")

Why should I need to think about things like checksums if cargo can fully resolve the path to my dependency locally? If we extend the second case far enough, it's basically a mutable registry of some sort, right?

The RFC sort of touches on this:

Developers still need to know the path within each path base. We could instead define path “aliases”, though at that point the whole thing looks more like a special kind of “local path registry”.


Overall it would be good to make sure we are all on the same page about the driving requirements, e.g. why path dependencies can't be used, why these are being generated dynamically rather than statically, etc.

These are good questions. I can only provide context for this specific use case with colcon.

why path dependencies can't be used

We could use path dependencies, but a Rust crate that is also a ROS package may be installed to a different path if a user configures colcon or their ROS workspace differently. A path dependency like this one bakes in the default layout:

[dependencies]
message_package = { path = "../../install/message_package/share/message_package/rust" }

--install-base INSTALL_BASE The base path for all install prefixes. The default value is ./install.

Also, users don't need to keep the package directly in ros_workspace/src/; they could clone it into nested folders such as ros_workspace/src/foo/bar/ (unfortunately, colcon does not have a top-level manifest like Cargo workspaces do).
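To make the fragility concrete, here is a small sketch (hypothetical helpers, not part of colcon or cargo) that computes the relative path a path dependency would need. It assumes absolute, already-normalized paths and the install layout shown earlier.

```rust
use std::path::{Component, Path, PathBuf};

/// Relative path from directory `from_dir` to `to`. Both must be absolute and
/// free of `.`/`..` components; symlinks are not resolved.
fn relative_path(from_dir: &Path, to: &Path) -> PathBuf {
    let from: Vec<Component> = from_dir.components().collect();
    let to: Vec<Component> = to.components().collect();
    // Number of leading components the two paths share.
    let common = from.iter().zip(&to).take_while(|(a, b)| a == b).count();
    let mut rel = PathBuf::new();
    for _ in common..from.len() {
        rel.push("..");
    }
    for c in &to[common..] {
        rel.push(c.as_os_str());
    }
    rel
}

/// Where the generated crate lands, assuming `<install_base>/<pkg>/share/<pkg>/rust`.
fn generated_crate_dir(install_base: &Path, pkg: &str) -> PathBuf {
    install_base.join(pkg).join("share").join(pkg).join("rust")
}

fn main() {
    let ws = Path::new("/home/user/ros_workspace");
    let target = generated_crate_dir(&ws.join("install"), "message_package");
    for crate_dir in ["src/my_crate", "src/foo/bar/my_crate"] {
        let rel = relative_path(&ws.join(crate_dir), &target);
        println!("{crate_dir}: path = \"{}\"", rel.display());
    }
}
```

For src/my_crate this yields ../../install/message_package/share/message_package/rust, while the nested clone needs ../../../../install/...; a non-default --install-base moves the target itself.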

why these are being generated dynamically rather than statically

In this case, it's to play nicely with the broader ROS ecosystem: we also generate Python code and C++ libraries from these IDL(-like) files.



Another idea I've been kicking around is to do something like a -sys crate. Instead of finding a C library, the crate would find the generated Rust library and then include!() the entirety of it. I could upload these to crates.io and sidestep much of this.
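A rough sketch of what the build script for such a -sys-like wrapper might do, assuming it locates the generated sources through AMENT_PREFIX_PATH (the list of install prefixes that sourcing a ROS 2 setup script populates). The search strategy, `find_generated_lib`, and the `MESSAGE_PACKAGE_LIB` variable name are all hypothetical.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Hypothetical lookup: search each install prefix in an
/// AMENT_PREFIX_PATH-style list for `share/<pkg>/rust/src/lib.rs`.
fn find_generated_lib(prefix_path: &OsStr, pkg: &str) -> Option<PathBuf> {
    env::split_paths(prefix_path)
        .filter(|prefix| !prefix.as_os_str().is_empty())
        .map(|prefix| prefix.join("share").join(pkg).join("rust").join("src").join("lib.rs"))
        .find(|candidate| candidate.is_file())
}

// In the wrapper crate, this would be the body of build.rs.
fn main() {
    println!("cargo:rerun-if-env-changed=AMENT_PREFIX_PATH");
    let prefixes = env::var_os("AMENT_PREFIX_PATH").unwrap_or_default();
    match find_generated_lib(&prefixes, "message_package") {
        Some(lib) => {
            // Rebuild when the generated entry point changes on disk.
            println!("cargo:rerun-if-changed={}", lib.display());
            println!("cargo:rustc-env=MESSAGE_PACKAGE_LIB={}", lib.display());
        }
        None => println!("cargo:warning=generated message_package sources not found"),
    }
}
```

The wrapper's lib.rs would then contain `include!(env!("MESSAGE_PACKAGE_LIB"));`. This sketch does not address generated crates that span several files, carry inner attributes, or declare `mod` items.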

This approach also starts to show some cracks, though: the dependencies of these -sys-like crates would need to exactly match those of the generated crates, and the version numbers are somewhat wonky (the contents of the library would change depending on which environment variables were present at build time). Additionally, if someone had a local message package in the same workspace, they'd need to invoke that CMake project from a build.rs so that we could both generate the code and depend on the crate.

I'm not sure if this is exactly what you're hoping for, but perhaps it could be inspiration? margo is a minimal crate registry that uses static files.

Thanks for the link; I'll have to explore how this differs from a traditional cargo registry. If it caches to ~/.cargo, though, I think I'll run into the same issues.