Initiative, a group of changes to make vendoring easier

I don't really understand the hashes. Are you really saying that the git hash of every single compatible version has to be added to Cargo.toml?

Because if so, how do you update a crate five levels deep that just released a critical security fix?

Only a single one.

I don't know if you understand Merkle trees. With one single hash, a package can "pin down" a dependency, including the entire contents of that specific commit.

If anything in the commit changes, the entire hash changes, which enables very decentralized, secure, and reproducible devops.

The point is, you can include arbitrary stuff in that specific commit, including a statement saying that this commit is 0.1.1 and should be compatible with all 0.1.x versions.
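To make the pinning argument concrete, here is a minimal sketch of a Merkle-style root hash over a package's files. It is not git's actual object format; the `leaf_hash` and `merkle_root` helpers, the `COMPAT` file, and the package contents are all hypothetical and only illustrate that one root hash covers every byte, including a compatibility statement.

```python
import hashlib


def leaf_hash(path: str, content: bytes) -> bytes:
    """Hash one file; the path is included so renames change the hash too."""
    return hashlib.sha256(b"leaf\0" + path.encode() + b"\0" + content).digest()


def merkle_root(files: dict[str, bytes]) -> str:
    """Fold the leaf hashes pairwise until a single root hash remains."""
    level = [leaf_hash(path, files[path]) for path in sorted(files)]
    if not level:
        return hashlib.sha256(b"empty").hexdigest()
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [
            hashlib.sha256(b"node\0" + level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


# Hypothetical package contents, including a compatibility statement.
package = {
    "Cargo.toml": b'[package]\nname = "demo"\nversion = "0.1.1"\n',
    "src/lib.rs": b"pub fn answer() -> u32 { 42 }\n",
    "COMPAT": b"compatible with 0.1.x\n",
}
pinned = merkle_root(package)

# Changing a single byte anywhere produces a different root.
tampered = dict(package)
tampered["src/lib.rs"] = b"pub fn answer() -> u32 { 43 }\n"
print(merkle_root(tampered) != pinned)
```

Whoever holds `pinned` can check any copy of the package, from any source, without trusting the source.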

I read what you wrote. My point is that hashes on top of semver mean changing the human interface for the worse, and that's a hard pass as far as I'm concerned.

Treating hashes as first-class citizens does not mean you have to use only that commit.

The package manager is free to make decisions. Say, normally you conform to semver.

In a security-critical situation, the package manager can ignore that and forcefully patch the dependency.
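One way to read this is as a selection policy. The sketch below is only an assumption about what such a policy could look like, not how Cargo resolves dependencies: `caret_compatible` follows the usual caret rule (0.1.1 is compatible with 0.1.x), while `select` and its `vulnerable` parameter are hypothetical names for "conform to semver normally, break it when every compatible version is known to be vulnerable".

```python
from typing import Iterable


def parse(version: str) -> tuple[int, int, int]:
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def caret_compatible(pinned: str, candidate: str) -> bool:
    """Caret rule: the leftmost non-zero component must stay the same."""
    p, c = parse(pinned), parse(candidate)
    if c < p:
        return False
    if p[0] > 0:
        return c[0] == p[0]
    if p[1] > 0:
        return c[0] == 0 and c[1] == p[1]
    return c == p  # 0.0.z is only compatible with itself


def select(pinned: str, available: list[str], vulnerable: Iterable[str] = ()) -> str:
    """Prefer the newest safe compatible version; otherwise force an upgrade."""
    bad = set(vulnerable)
    safe = [v for v in available if v not in bad]
    compatible = [v for v in safe if caret_compatible(pinned, v)]
    if compatible:
        return max(compatible, key=parse)
    # Hypothetical override: no safe compatible release exists, so take the
    # closest newer safe release even though it breaks semver.
    newer = [v for v in safe if parse(v) > parse(pinned)]
    if newer:
        return min(newer, key=parse)
    raise LookupError("no safe version available")


versions = ["0.1.1", "0.1.2", "0.2.0"]
print(select("0.1.1", versions))
print(select("0.1.1", versions, vulnerable={"0.1.1", "0.1.2"}))
```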

But if the package manager is allowed to use commits that are newer than the one specified, how is this different from just a link to the repo?

The Cargo.lock file is for reproducible builds; Cargo.toml enables updating dependencies without having to notify the author of the crate that depends on them.

There is no real 'free' here. You can't stick a random repo in as some dependency.

What I want to compare it to is a package manager that uses literal URLs in dependency declarations. Those break so easily. I always use VPNs and often get blocked.

The freedom that referencing by hash brings is that you can get the files from any number of servers, as long as the contents hash to the declared Merkle root.

But, yes, in the simplest implementation you are restricted to one specific version, verbatim.

The core idea here is to introduce enough metadata to let you securely download dependencies from arbitrary servers.
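As a rough sketch of that idea: if the dependency declaration carries a content hash, any mirror will do, because the client checks what it received. The `fetch_verified` helper and the three mirror functions below are hypothetical stand-ins for real network downloads, and a plain SHA-256 of the archive stands in for the Merkle root.

```python
import hashlib
from typing import Callable, Iterable


def fetch_verified(expected_sha256: str, mirrors: Iterable[Callable[[], bytes]]) -> bytes:
    """Return the first download whose SHA-256 matches the declared hash."""
    for fetch in mirrors:
        try:
            data = fetch()
        except OSError:
            continue  # mirror unreachable or blocked; try the next one
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return data
        # Hash mismatch: this mirror served corrupted or tampered content.
    raise LookupError("no mirror served content matching the declared hash")


# Hypothetical archive and the hash a dependency declaration would carry.
crate_bytes = b"contents of demo-0.1.1"
declared = hashlib.sha256(crate_bytes).hexdigest()


def blocked_mirror() -> bytes:
    raise ConnectionError("blocked")  # simulates a mirror behind a firewall


def malicious_mirror() -> bytes:
    return b"contents of demo-0.1.1 plus a backdoor"


def honest_mirror() -> bytes:
    return crate_bytes


result = fetch_verified(declared, [blocked_mirror, malicious_mirror, honest_mirror])
print(result == crate_bytes)
```

None of the mirrors has to be trusted; only the declared hash does.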

You can't just do this to new versions. You have to involve more cryptography, and I can write long essays on this which I don't want to for now.

I suspect that for the hashes, Cargo.toml does something close, but I don't have the time to dig into their implementation.

If you want me to say something actionable: I want the Cargo registry to regularly publish a signed index mapping crate names to hashes, if they are not doing this already. This allows trustless mirroring (which means you do not need to trust third parties in any way).

About the signed crates index: there is the crates.io index GitHub repo, which contains the hash of every published crate, though I think the commits are not signed.

This is checked automatically (I think) if you disable the sparse index protocol.

I wonder if crates.io has a keypair and regularly publishes signed indices over all data (I guess not?)

Not yet, but people are working on this with the goal of supporting mirrors. https://github.com/rust-lang/rfcs/pull/3724

While you addressed a bunch of my questions, I can't really work out which core thing you are dissatisfied with in the current standard solution. Is it temporary downtime of crates.io? Permanent downtime of crates.io? Compromise of crates.io? Something else entirely?

cargo vendor works well enough in cases where you actually need vendoring, such as reproducible builds, airgapped development, and robustness against infrastructure risks. I say this as someone who uses it in one of my projects. The 10k LoC problem is not a problem in such cases, because you want all your dependencies to be vendored, which is commonly understood as making copies of all your dependencies and tracking them together with the rest of your project.

Notably, you can make the vendored LoC number much smaller if you target a limited set of targets by using cargo-vendor-filterer. A big chunk of vendored LoC is from auto-generated platform-specific crates like windows-* which can be excluded if you target only Linux.

But as others have written, it's unclear which problem you are trying to solve in the first place. Judging by your comments, you just have some vague idea of how you want stuff to work, without explaining why you want it.

Would git subtree help you in any way as an alternative to submodules?