crates.io-index contains all of the packages that are on crates.io, but the metrics ('all-time downloads', 'recent updates', etc.) don't appear to be stored there. Is there somewhere I can bulk download them?
I want to put together my own (very simple) search engine using that information so I can make queries like "Last updated within the past 6 months AND downloaded more than 10,000 times". Basically, there are now a LOT of crates on crates.io, and I want better ways to filter down to what may be useful to me.
I expect the database dumps to contain this info:
The database dumps (experimental). The dump contains all information exposed by the API in a single download. It is updated every 24 hours. The latest dump is available at the address https://static.crates.io/db-dump.tar.gz. Information on using the dump is contained in the tarball. You can find the changelog for the database dumps in GitHub Issue #3617.
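For illustration, the query from the question ("last updated within the past 6 months AND downloaded more than 10,000 times") could be sketched in Python roughly as follows. This is only a sketch under assumptions: the file path `data/crates.csv` and the column names `name`, `updated_at`, and `downloads` are guesses at the dump layout, and the timestamps are assumed to be in a form that `datetime.fromisoformat` accepts. Check the README and schema inside the tarball for the actual tables and columns before relying on any of this.

```python
import csv
from datetime import datetime, timedelta, timezone


def parse_timestamp(text):
    """Parse an ISO-like timestamp; values without an offset are treated as UTC."""
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def filter_crates(rows, now, min_downloads=10_000, max_age_days=183):
    """Return sorted names of crates updated within max_age_days of `now`
    that have more than min_downloads all-time downloads.

    `rows` is any iterable of dicts with the ASSUMED keys
    "name", "updated_at", and "downloads".
    """
    cutoff = now - timedelta(days=max_age_days)
    selected = []
    for row in rows:
        popular = int(row["downloads"]) > min_downloads
        fresh = parse_timestamp(row["updated_at"]) >= cutoff
        if popular and fresh:
            selected.append(row["name"])
    return sorted(selected)


def load_rows(path):
    """Yield one dict per CSV row (the path and header names are assumptions)."""
    with open(path, newline="", encoding="utf-8") as handle:
        yield from csv.DictReader(handle)


if __name__ == "__main__":
    # Hypothetical location of the crates table after extracting the tarball.
    rows = load_rows("data/crates.csv")
    for name in filter_crates(rows, datetime.now(timezone.utc)):
        print(name)
```

If the table has very large text fields, Python's `csv` module may need its limit raised with `csv.field_size_limit` before reading.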
@bjorn3 Thank you, that was what I was looking for!
Note that the daily data dump only includes the last 90 days of downloads because the full data is quite a lot and most people don't need all of it. dtolnay has data from 2014 through August 2022 available from the link in this GitHub comment.
So the download counters reset every 90 days? Or does the dump only include info on crates that have been downloaded in the last 90 days?
The download counts for individual crates and versions are not reset. They represent the all-time download counts. Carol was referring to the detailed daily counts in the version_downloads table. The daily data dump only includes the last 90 days of those per-day stats.
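To make the distinction concrete, a "recent downloads" figure can be derived by summing the per-day rows, while the all-time counters are stored separately and stay untouched. A minimal Python sketch, assuming each version_downloads row has the columns `version_id`, `downloads`, and `date` (ISO `YYYY-MM-DD`); those column names are an assumption and should be checked against the schema shipped in the tarball:

```python
from collections import defaultdict
from datetime import date, timedelta


def recent_downloads(rows, today, window_days=90):
    """Sum per-day download counts per version over the trailing window.

    `rows` is an iterable of dicts with the ASSUMED keys
    "version_id", "downloads", and "date".
    """
    cutoff = today - timedelta(days=window_days)
    totals = defaultdict(int)
    for row in rows:
        # Keep only days strictly after the cutoff date.
        if date.fromisoformat(row["date"]) > cutoff:
            totals[row["version_id"]] += int(row["downloads"])
    return dict(totals)
```

Because the dump only carries the last 90 days of these rows, a longer window would need the older historical data mentioned earlier in the thread.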