Call for testing: Cargo sparse-registry

One way to improve that in the registry would be to provide, on publish, an optional expected_dep_forest_size derived from the number of transitive dependencies used at publish time. Cargo could then use this number to produce a reasonable progress estimate quickly.

Well, it works!

I used the skyspell project for a somewhat realistic example.

cargo +nightly -Z sparse-registry update took 0.245s total
cargo +nightly update took 4.924s total

And my crates.io git index was pretty fresh; otherwise, the speed-up might have been even greater.

Nice :slight_smile:

Thank you for this UX feedback! Would you open a ticket on the Cargo repo for us to track this?

Just tried this for the first time on a host where updating the traditional index was going really slow, e.g. (34137/77894) resolving deltas

Wow! What a difference! What was previously taking 10+ minutes was done in less than 10 seconds.

Great work!

Are there any existing programs that I can use with this option to host an offline copy of crates.io without using git?

Any webserver will do. python -m http.server can work for testing. In your clone of the index, you'll need to edit the config.json file to point to the URL of your local copy of .crate files.
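The config.json edit mentioned above could be scripted roughly as follows. This is only a sketch: the helper name `point_index_at_mirror` and the example mirror URL are made up for illustration, and it assumes the mirror serves files as crates/{crate}/{crate}-{version}.crate. Cargo substitutes the `{crate}` and `{version}` markers in the `dl` value itself.

```python
import json
from pathlib import Path


def point_index_at_mirror(index_dir, dl_template):
    """Rewrite the "dl" field of config.json in a local index clone.

    `dl_template` is a hypothetical mirror URL; Cargo fills in the
    {crate} and {version} markers when it downloads a .crate file.
    """
    config_path = Path(index_dir) / "config.json"
    config = json.loads(config_path.read_text())
    config["dl"] = dl_template  # other keys (e.g. "api") are left untouched
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example, `point_index_at_mirror("crates.io-index", "http://localhost:8000/crates/{crate}/{crate}-{version}.crate")` would point the clone at a server started with python -m http.server in the parent directory.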

If you're asking how to download all of the .crate files for mirroring, there are various tools for that. Some that I have in my notes, though they may be outdated or not appropriate for this use case: ChrisMacNaughton/cargo-cacher (a caching server for crates + cargo), C4K3/crates-ectype (easily create a mirror of crates.io; crate downloads only, not the website), weiznich/crates-mirror, and panamax-rs/panamax (mirror rustup and crates.io repositories, for offline Rust and cargo usage). If you want to write your own tool, it is fairly trivial to walk the index and fetch every https://static.crates.io/crates/{crate}/{crate}-{version}.crate. Just beware that the full set is currently over 41G.
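As a rough illustration of "walking the index", here is a minimal sketch that only builds the list of download URLs from a local clone of the index. It assumes each index file holds one JSON object per line with `name` and `vers` fields; the function name `crate_urls` is made up for this example, and the actual downloading (with rate limiting and checksum verification) is left out.

```python
import json
from pathlib import Path

DL_TEMPLATE = "https://static.crates.io/crates/{crate}/{crate}-{version}.crate"


def crate_urls(index_dir):
    """Yield one download URL per published version found in the index."""
    root = Path(index_dir)
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        # Skip config.json and anything under dot-directories such as .git.
        if rel.name == "config.json" or any(p.startswith(".") for p in rel.parts):
            continue
        for line in path.read_text().splitlines():
            if not line.strip():
                continue
            entry = json.loads(line)  # one JSON object per published version
            yield DL_TEMPLATE.format(crate=entry["name"], version=entry["vers"])
```

Calling `list(crate_urls("crates.io-index"))` gives the URLs to feed to whatever downloader you prefer.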

Hello everyone, Databend is a cloud data warehouse written in Rust.

We opened a pull request, ci: Enable cargo sparse-registry, in response to this call for testing.

Here is our feedback:

  • sparse-registry works (No build failure, no test failure, everything works)
  • crate index update & download is much faster
    • Before this PR: we need 75s (14:11:34 ~ 14:12:49) to update the index and download crates
    • Within this PR: we need 21s (13:15:30 ~ 13:15:51) to update the index and download crates
  • cargo-audit seems not to work well with sparse-registry

Detailed CI Logs

Is there an issue open on cargo-audit for this? If not, opening one with a reasonably small (number of packages) reproduction would be beneficial (but even a large one is still useful so others can minimize).

Submitted as `cargo audit` doesn't work well with `sparse-registry` · Issue #604 · rustsec/rustsec · GitHub

I wrote a small utility to download only crates needed by a project by parsing output from cargo vendor.

I was hoping that, alongside the crates folder I created, I could also have a crates.io-index folder, ending up with something like this:

$ ls offline-mirror/
crates/  crates.io-index/

I start up a simple Python server from within offline-mirror:

$ sudo python3 -m http.server 80

and then run cargo +nightly build from the same project with the following config:

$ cat .cargo/config
[unstable]
sparse-registry = true

[source.my-mirror]
registry = "http://192.168.42.64/crates.io-index"
[source.crates-io]
replace-with = "my-mirror"

However... it looks as if it is requesting files not included in the git clone?

192.168.42.64 - - [26/Jul/2022 02:22:25] code 404, message File not found
192.168.42.64 - - [26/Jul/2022 02:22:25] "GET /crates.io-index/info/refs?service=git-upload-pack HTTP/1.1" 404 -
192.168.42.64 - - [26/Jul/2022 02:22:25] code 404, message File not found
192.168.42.64 - - [26/Jul/2022 02:22:25] "GET /crates.io-index/info/refs?service=git-upload-pack HTTP/1.1" 404 -
192.168.42.64 - - [26/Jul/2022 02:22:25] code 404, message File not found
192.168.42.64 - - [26/Jul/2022 02:22:25] "GET /crates.io-index/info/refs?service=git-upload-pack HTTP/1.1" 404 -

Any tips, or an RFC describing the HTTP protocol, would be great. Thanks.

You're missing sparse+ in the registry url. That's what tells Cargo to use the new protocol. Otherwise it attempts to fetch it as git over HTTP.

registry = "sparse+http://192.168.42.64/crates.io-index"

The RFC is 2789 - but it's a bit light on details.
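To make the protocol a little more concrete: with a sparse+ URL, Cargo fetches config.json and then one plain file per crate over HTTP, using the same directory layout as the git index. The sketch below computes those paths; the helper names `index_path` and `index_url` are made up for illustration, and this is not Cargo's own code.

```python
def index_path(crate_name):
    """Return the index-relative path for a crate, per the index layout."""
    name = crate_name.lower()
    if not name:
        raise ValueError("crate name must not be empty")
    if len(name) == 1:
        return f"1/{name}"
    if len(name) == 2:
        return f"2/{name}"
    if len(name) == 3:
        return f"3/{name[0]}/{name}"
    # Four or more characters: first two / next two / full name.
    return f"{name[:2]}/{name[2:4]}/{name}"


def index_url(registry, crate_name):
    """Build the HTTP URL of a crate's index file from a registry URL."""
    prefix = "sparse+"
    base = registry[len(prefix):] if registry.startswith(prefix) else registry
    return f"{base.rstrip('/')}/{index_path(crate_name)}"
```

With the registry from the config above, `index_url("sparse+http://192.168.42.64/crates.io-index", "serde")` yields a URL ending in /crates.io-index/se/rd/serde, which a plain static file server can answer directly from a checkout of the index.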

Thanks, that worked.

If anyone is interested, the simple project is here: zerus.

Trying this with Miri, where I often do ./miri install --offline to avoid losing time to the database updates.

Results of hyperfine -w1 "./miri install" with the default git index:

Benchmark 1: ./miri install
  Time (mean ± σ):     718.3 ms ± 363.2 ms    [User: 208.8 ms, System: 98.1 ms]
  Range (min … max):   531.1 ms … 1657.5 ms    10 runs

With the sparse index:

Benchmark 1: ./miri install
  Time (mean ± σ):      1.340 s ±  0.666 s    [User: 0.258 s, System: 0.129 s]
  Range (min … max):    0.646 s …  2.660 s    10 runs

So on average the sparse index is a lot slower, probably because I am currently on a fairly poor internet connection and it makes a lot more queries.

Notably, those numbers all appear to be doubled.

Going from randomized (0.7s, 10s) to a certain (1.5s) feels like a win, personally :slight_smile:

No idea where you got those numbers from, but it's going from (0.5s - 1.6s) to (0.6s - 2.6s). It's definitely strictly worse than before, larger average and larger variance.

Thanks for these results. I'm not familiar with what miri install does. Does it require an index update? How many crates needed to be fetched? Did these tests have an up-to-date local cache?

It's just two calls to cargo install, after setting up RUSTFLAGS and some other env vars.

The hyperfine invocation includes a warmup run so all the caches should be up-to-date.

Note that since this has been stabilized, the flag to set to use it by default is CARGO_REGISTRIES_CRATES_IO_PROTOCOL=sparse. CARGO_UNSTABLE_SPARSE_REGISTRY no longer has any effect.

This topic has been closed in favor of: