Would it be possible to provide dump files containing all crates on crates.io?
This would be very helpful for:
- Scientists wanting to analyze the data
- Future Mars colonists. Their ping times will definitely suck!
- Digital preppers preparing for the next apocalypse where hacked thermostats DDoS Level 3 or something
- Generally anyone who doesn’t want to rely on the internet for everything, e.g. someone who wants to be able to use Rust while completely offline on planes, trains, etc.
- People in remote places, like behind the Great Firewall of China or on the Falkland Islands (the Falklands only have satellite internet, can you imagine that?). They obviously can’t each download a dump, but one person can download it once and then use it to set up a mirror.
Other projects like Wikipedia offer similar dumps already.
I’ve recently used crates-mirror to get 7997 of the 8089 crates currently on crates.io. But this method is not perfect: you have to download each crate separately, which adds non-trivial overhead from HTTP/TLS negotiation. It might also, obviously, get in trouble with anti-abuse systems.
My download of crates.io and the directory with the .crate files right now takes 6 GB of storage space, so it’s definitely of manageable size. In comparison, Wikipedia dumps take tens of gigabytes. And this is before very easy optimisations are applied, like only storing the diffs between .crate files.
Implementation-wise, it could be e.g. a weekly job that creates combined .tar.gz/zip files and uploads them to static.rlo or somewhere else. As crates.io is addition-only, it’s okay to delete old dumps.
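To make the idea concrete, here is a rough sketch of what the bundling step of such a job might look like. It is only an illustration, not how crates.io does or would do it: the function name `build_dump` and the assumption that all .crate files already sit in one local directory tree (as they do after a crates-mirror run) are mine. Uploading and scheduling are left out.

```python
import tarfile
from pathlib import Path


def build_dump(crate_dir: Path, dump_path: Path) -> int:
    """Bundle every .crate file under crate_dir into one .tar.gz.

    Hypothetical helper for illustration; returns the number of files added.
    """
    # Collect the file list up front so the archive itself is never picked up.
    crate_files = sorted(crate_dir.rglob("*.crate"))
    with tarfile.open(str(dump_path), "w:gz") as archive:
        for crate_file in crate_files:
            # Store paths relative to crate_dir so the dump unpacks anywhere.
            arcname = crate_file.relative_to(crate_dir).as_posix()
            archive.add(str(crate_file), arcname=arcname)
    return len(crate_files)
```

A weekly cron job could then call something like `build_dump(Path("crates"), Path("crates-dump.tar.gz"))` and upload the result. As far as I know, .crate files are already gzip-compressed tarballs, so the outer gzip layer probably saves little space; the main win is turning thousands of small downloads into one.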