The vendored crates tend to contain a [workspace] section. They work okay when I use =version, but when I use them as submodules they error, saying there are conflicting workspaces.
I don't want to use { git = ... }, because I already have the repo checked out locally; a git dependency forces unnecessary round trips from local to remote.
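For what it's worth, here is a sketch of a local-checkout override that avoids both the network round trip and the workspace clash. The crate name foo and the path vendor/foo are made up; this assumes the submodule is checked out there:

```toml
# Root Cargo.toml. A [patch] override redirects the crates.io version
# of foo to the local submodule checkout, with no network round trip.
[patch.crates-io]
foo = { path = "vendor/foo" }

# If foo's own manifest declares a [workspace], excluding its directory
# here avoids the "conflicting workspaces" error.
[workspace]
exclude = ["vendor/foo"]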
I want a solution to all of these problems. I vendor a lot, so this is a recurring problem.
On a tangent, I have a preference for git submodules because they are inherently more decentralized. Centralized hosting tends to cause problems: sometimes their sites go down, or they block my IP.
I think it's a pain to add and remove git submodules UX-wise too. Maybe cargo should interoperate with git to automate this submodule vendoring workflow.
I saw there is a command, cargo vendor, but I haven't used it, sorry.
Personally I would rather see effort put into changes to make vendoring less necessary; for example, changes to the core language to make it easy for an application to adapt to whatever versions of its dependencies happen to be available, rather than insisting on specific versions.
I ran the command cargo vendor. It's absurd. It made a directory containing all dependencies, with no version control. Now git shows 10K+ changed files. Am I supposed to commit all of that bloat? And what if I want to merge changes back into upstream?
I'm not sure what the person who wrote this feature intended me to do. This is absurd.
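For context on the intended workflow: cargo vendor prints a source-replacement snippet that you are expected to paste into .cargo/config.toml, roughly:

```toml
# Snippet printed by `cargo vendor` (directory name may differ):
# all crates.io lookups are redirected to the local ./vendor tree.
[source.crates-io]
replace-with = "vendored-sources"

[source.vendored-sources]
directory = "vendor"
```

So the answer to "am I supposed to commit these" is apparently yes: the vendored tree becomes part of your repository, and upstreaming changes from it is left to you.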
I want users to be able to use my software by cloning and building it, full free-software philosophy. And sure, the sources should be easily decentralized, like every dependency being pinned with a Merkle root hash. I'm an expert at this.
Pinning dependencies without a very good reason, and a documented plan for when to stop pinning them, is exactly the sort of thing I want to see Rust, and the industry at large, move away from.
More generally, the question I feel I need to ask you is, why do you think you need to vendor at all? What are these library packages that add up to 10KLOC that you can't use as normal dependencies, what's wrong with them, and what's stopping you from working around the problems from within your own code?
Copying code into your code is the definition of vendoring. What did you expect?
Really vendoring like that only makes sense for extremely niche use cases, such as airgapped development. And these days a private registry that mirrors the crates of interest is probably a better option.
I'm not sure how this is related to vendoring at all. Cargo will download dependencies specified in the lock file; submodules will have git fetch them; vendoring copies them into your repo. But none of that changes whether it is free software or not.
and what's stopping you from working around the problems from within your own code?
Unfortunately some things are JUST better done by modifying the internals of a component than tinkering with its externals.
Modern programming, stateful or functional, is inherently haunted by encapsulation of state and logic. In many cases it's better to refactor the dependencies than to work around them.
Pinning dependencies without a very good reason,
Dependencies should be pinned down with a Merkle root hash, built from something like SHA-256. This enables fully decentralized package management and reproducible builds, because it's deterministic: you can obtain the sources from anywhere and verify that they are correct and safe.
I do not expect copying code from a git repository into my git repository to show up as 10K+ additions, because this totally defeats the point of git, besides looking disgusting.
Vendoring should work like git submodules. Actually, I think cargo should make every downloaded copy of a package a git repository, so that when I vendor them I can edit and commit right in there.
If you simply copy sources into your repo, you lose the ability to PR into upstream, to pull new updates, and all the other features of version control and community collaboration.
Tracking versions as 0.1.x should be abandoned. Replace them all with git commit hashes.
You can build a language stating that versions {x, y, z} are compatible on top of that protocol.
And finally, the package manager can solve the dependency graph with Z3.
It’s not a merkle root or any other kind of content-addressing hash. Somebody could build an index from crate hashes to the files, but it’s not implicit in the design.
I’ve experimented with generating registries into ipfs before, and it’s possible to do something that works, but it’s really not ergonomic. I think really integrating cargo with some kind of CAS would require changes to explicitly support it.
This cannot work as it would remove semantic versioning.
You would now need an extra copy of each crate for each minor/patch version every dependency uses, which would make communicating between crates using anything other than std impossible.
In effect, tokio can never update anything and glam can never add anything. This then extends to any crate that is used to communicate between different crates.
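The incompatibility can be sketched in a few lines (Python as a stand-in for Rust's crate instantiation; the names tokio and Instant are just placeholders): two hash-pinned copies of the "same" library produce distinct types that do not interoperate:

```python
import types

def load_crate_copy(name):
    """Simulate building a separate copy of a crate pinned at some hash."""
    mod = types.ModuleType(name)
    class Instant:  # each copy defines a brand-new, distinct class
        pass
    mod.Instant = Instant
    return mod

tokio_a = load_crate_copy("tokio@hash_a")
tokio_b = load_crate_copy("tokio@hash_b")

t = tokio_a.Instant()
# Same source code, but the types are incompatible across copies:
print(isinstance(t, tokio_b.Instant))  # False
```

Semver-compatible ranges exist precisely so the resolver can collapse such duplicates into one shared copy.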
Also, while this may only help a little, the crate cargo-patch allows changing crates without pinning the version, by using patch files.
I'm a bit confused with what your end goal is here. You mention submodules being more robust and decentralized, but all your submodules point at GitHub, which is very centralized and has significantly worse uptime than crates.io. Also, some of the submodule links 404 for me, maybe because the repos are private, so I don't think I'd be able to clone this repository properly.
To be more robust against crates.io downtime, you can self-host mirrors of crates.io, which isn't necessarily decentralized, but an alternative.
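A sketch of what such a mirror setup can look like in .cargo/config.toml (the mirror URL is hypothetical):

```toml
# Redirect crates.io lookups to a self-hosted sparse-index mirror.
[source.crates-io]
replace-with = "my-mirror"

[source.my-mirror]
registry = "sparse+https://crates-mirror.example.com/index/"
```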
I think it would be very useful to focus on how to solve the concrete problems you have with the current crates.io way of dependency management, instead of the specific submodule solution you're using to try to get around them.
You mention submodules being more robust and decentralized, but all your submodules point at GitHub, which is very centralized and has significantly worse uptime than crates.io
What do you think really counts as supporting decentralization? Using an obscure GitHub alternative? I appreciate the attempt to break their monopoly, but it's already a lost battle. They have a monopoly over attention, the entry point to repo hosting.
At this point it does not matter any more. If anything, I increase my publicity by using GitHub, which serves my goals.
Using git submodules, regardless of where you host them, is decentralizing on its own. Git is a decentralized version control system, as we all know. That's it.
We can make a decentralized git router to resolve git-commit to any available hosting platform.
To be more robust against crates.io downtime, you can self-host mirrors of crates.io,
That's not decentralized. Git submodules are, because they only point to hashes (preferably SHA-2 hashes, but the git project lags so much they are still using SHA-1). Crates.io is an authority, which causes more problems when it goes down. For an authority that wants people to trustlessly mirror its data, I wonder whether crates.io has a keypair and regularly publishes signed indices over all its data (I guess not?).
Note that I have no problem with crates.io being an authority; it just needs to be done right. I have more ideas for decentralizing an authority, because I dedicate a lot of time to researching decentralization tech.
You would now need an extra copy of each crate for each minor/patch version every dependency uses. Which would make communicating between crates using anything other than std impossible.
I already said: you can have a language over the hash-based registry.
A language stating that {x, y, z}, where x, y, z are versions, are compatible, etc. Semantic versioning already expresses such constraints, e.g. that minor versions are compatible. The package manager collects these constraints and feeds them into a constraint solver, such as an SMT solver. That's how they work.
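A toy version of that pipeline (pure Python, brute-force enumeration instead of a real SMT solver; the package names, versions, and the constraint are all invented):

```python
from itertools import product

# Candidate versions per package, e.g. resolved from a hash-based registry.
candidates = {
    "serde": ["1.0.100", "1.0.200"],
    "app":   ["0.3.1"],
}

def compatible(choice):
    """Constraint collected from manifests: app requires serde >= 1.0.150."""
    major, minor, patch = map(int, choice["serde"].split("."))
    return (major, minor, patch) >= (1, 0, 150)

def solve(candidates, check):
    """Enumerate version assignments and return the first satisfying one."""
    names = list(candidates)
    for combo in product(*(candidates[n] for n in names)):
        choice = dict(zip(names, combo))
        if check(choice):
            return choice
    return None

print(solve(candidates, compatible))  # picks serde 1.0.200
```

A real resolver replaces the brute-force loop with a proper solver and encodes each semver range as a constraint, but the shape (constraints in, assignment out) is the same.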
That's often the case when I look into the designs of non-cryptography-aware people.
I’ve experimented with generating registries into ipfs before
I never implied IPFS is the best choice. It's too bloated, and last time I checked the implementations suck. Whether you use it or not, I advocate adopting self-authenticating data structures such as those presented in IPFS. They are always compatible: you can make a centralized authority more decentralized by signing its data. The authority stays an authority and keeps functioning, but now you can get its data from other servers because it's signed.
Everybody's use case is different, but personally that's a hard pass for me.
Relative to using semver, using git hashes is just painful, because hashes are more human-hostile than semver is, and as the name implies, there is a semantic meaning attached to a semver.
On top of that, it would break massive parts of the ecosystem, specifically the parts that rely on semver at this time, none of which are going anywhere any time soon.
I've been using Rust successfully for about 10 years now without having to reach for vendoring even once, just using the facilities provided by Cargo and crates.io.
So to me this has the feel of an X/Y problem.
What is it you're really trying to achieve here with vendoring?
I just discovered that dependent packages do not inherit the [patch] section, which is absurd and annoying, because I already declared that I wanted to vendor a crate in that one package with [patch].
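This matches Cargo's documented behavior: [patch] is only honored in the root manifest of a workspace (or the top-level package being built), so the override has to move there. Roughly, with the crate name and path as placeholders:

```toml
# Root Cargo.toml of the workspace. A [patch] section declared in a
# member crate's manifest is ignored; the override must live here.
[patch.crates-io]
foo = { path = "vendor/foo" }
```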