We are happy to announce our first release of RustPräzi, a PoC (Proof-of-Concept) project that downloads all crate versions from crates.io, builds LLVM call graphs and links them into a single large versioned call-based dependency network. Unlike a regular dependency network, a call-based dependency network represents function call chains on both the intra- and inter-package level, supporting graph analytics/queries such as:
Identifying central crate APIs that are important for the stability of crates.io
Impact analysis of deprecated API functions: how many crates are still depending on deprecated functions that should be removed?
Security vulnerabilities: which crates in crates.io are affected by a vulnerable function?
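To make the last kind of query concrete, here is a minimal sketch of how "which crates are affected by a vulnerable function?" could be answered on a call-based dependency network. It is not RustPräzi's actual data model: the `FnNode` and `CallGraph` types and the `affected_crates` method are hypothetical names chosen for illustration, and the graph is assumed to fit in memory as a reverse adjacency map.

```rust
use std::collections::{BTreeSet, HashMap, HashSet, VecDeque};

/// A function in the network, identified by crate name, crate version and
/// function path. (Hypothetical type, for illustration only.)
#[derive(Clone, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct FnNode {
    pub krate: String,
    pub version: String,
    pub path: String,
}

impl FnNode {
    pub fn new(krate: &str, version: &str, path: &str) -> Self {
        FnNode {
            krate: krate.to_string(),
            version: version.to_string(),
            path: path.to_string(),
        }
    }
}

/// Call graph stored as reverse edges (callee -> callers), so that impact
/// queries can walk "upwards" from a function to everything that reaches it.
#[derive(Default)]
pub struct CallGraph {
    callers: HashMap<FnNode, Vec<FnNode>>,
}

impl CallGraph {
    /// Records that `caller` contains a call to `callee`.
    pub fn add_call(&mut self, caller: FnNode, callee: FnNode) {
        self.callers.entry(callee).or_default().push(caller);
    }

    /// Breadth-first search over reverse edges. Returns every
    /// (crate, version) pair, other than the target's own, that contains a
    /// function transitively calling `target`.
    pub fn affected_crates(&self, target: &FnNode) -> BTreeSet<(String, String)> {
        let mut seen: HashSet<&FnNode> = HashSet::new();
        let mut queue: VecDeque<&FnNode> = VecDeque::new();
        let mut affected = BTreeSet::new();

        seen.insert(target);
        queue.push_back(target);

        while let Some(node) = queue.pop_front() {
            if let Some(callers) = self.callers.get(node) {
                for caller in callers {
                    // `insert` returns false if the node was already visited.
                    if seen.insert(caller) {
                        let same_crate = caller.krate == target.krate
                            && caller.version == target.version;
                        if !same_crate {
                            affected.insert((caller.krate.clone(), caller.version.clone()));
                        }
                        queue.push_back(caller);
                    }
                }
            }
        }
        affected
    }
}
```

Unlike a package-level dependency query, a crate that depends on the vulnerable crate but never reaches the vulnerable function is not reported here, which is the main point of working at the call level.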
Our current focus is to make it production-grade. For example, we plan to:
Add proper error handling and a retry mechanism for re-running failed compilations
Integrate it with cargo and add extensible analysis modes
Incrementally update the graph when a new release is published
Implement a robust query platform with a proper graph database
Vision
We are now looking at possibilities to turn our work into a production-grade tool that benefits the Cargo/crates.io community, both library maintainers and clients, with intelligent dependency analysis. In particular, we want to equip the Cargo community with a tool that can aid the stability of crates.io, prevent the publication of impactful bad releases through lightweight code vetting (like this fresh incident [1]), and help crate maintainers understand the changes they make.
FWIW, you don’t need LLVM to get a call graph: you could make a modified rustc that outputs the relevant information even in cargo check mode (--emit=metadata).
The relevant infrastructure is in rustc_mir::monomorphize - the “monomorphization collector” finds all the statically dispatched calls, effectively building a callgraph.
(rustc then splits the list of monomorphizations it finds into “codegen units” and uses that to know which monomorphizations to codegen and where, but you don’t need that part)
You also get access to rustc’s type information this way, which you might want to use one way or another.
Phenomenal work! I am quite interested in this and pointed your work out to the Secure Code WG:
Another member of the WG has started work on a tool to scan crates.io and extract information about crates with security vulnerabilities, based on information from the RustSec advisory database. You can find that tool here:
I think it would be very interesting if RustSec advisories could collect the relevant information needed to traverse this sort of call graph from the impacted functions in a vulnerable crate to all of its transitively vulnerable dependencies.
I'm also now noticing that, in a lot of cases, we probably already have the relevant info in most advisories. However, it's buried in a prose description, and we'd need to hoist it out into structured metadata for the advisory. Here is an example:
This advisory notes that the SmallVec::insert_many function in the smallvec crate is impacted. It would be neat to trace the callgraph using that as an anchor point.
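As a rough illustration of what "structured metadata" could look like, the sketch below models an advisory that lists its impacted functions and decides whether a given function in a given crate version is a valid anchor point for a call graph traversal. Everything here is an assumption for discussion: the `Advisory` struct and its field names are hypothetical and not the RustSec advisory format, and the version handling only understands plain `major.minor.patch` strings, not full semver ranges or pre-release tags.

```rust
/// Hypothetical structured form of an advisory; field names are illustrative
/// and do not reflect the actual RustSec schema.
#[derive(Debug, Clone)]
pub struct Advisory {
    pub id: String,
    pub krate: String,
    /// Paths of the impacted functions, e.g. "SmallVec::insert_many".
    pub affected_functions: Vec<String>,
    /// First version containing the fix; earlier versions count as affected.
    pub patched: (u64, u64, u64),
}

/// Parses "major.minor.patch". Returns None for anything else, including
/// versions with pre-release or build suffixes (a deliberate simplification).
pub fn parse_version(s: &str) -> Option<(u64, u64, u64)> {
    let mut parts = s.split('.');
    let major: u64 = parts.next()?.parse().ok()?;
    let minor: u64 = parts.next()?.parse().ok()?;
    let patch: u64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

impl Advisory {
    /// True when the function `function` in `krate` at `version` is named by
    /// this advisory and the version predates the fix.
    pub fn is_anchor(&self, krate: &str, version: &str, function: &str) -> bool {
        krate == self.krate
            && self.affected_functions.iter().any(|f| f == function)
            // Tuples compare lexicographically: major, then minor, then patch.
            && parse_version(version).map_or(false, |v| v < self.patched)
    }
}
```

With metadata in this shape, a scanner could filter call graph nodes through `is_anchor` and start a reverse traversal from each match, instead of relying on someone reading the prose description.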
@bascule many thanks for reaching out to the Secure Code WG! Including metadata about affected code entities in security advisories would make it possible to systematically scan for vulnerabilities with RustPräzi (this would be super nice!). Also, thanks for letting me know about crates-audit, I will have a look at it.