[prototype][dev-tool] RustPräzi: a tool to build an entire call graph of crates.io

jhejderup · November 28, 2018, 4:31pm

TL;DR: RustPräzi is like rust-lang-nursery/crater, but creates a single versioned call graph of crates.io

We are happy to announce our first release of RustPräzi, a PoC (Proof-of-Concept) project that downloads all crate versions from crates.io, builds LLVM call graphs and links them into a single large versioned call-based dependency network. Unlike a regular dependency network, a call-based dependency network represents function call chains on both the intra- and inter-package level, supporting graph analytics/queries such as:

Identifying central crate APIs that are important for the stability of crates.io
Impact analysis of deprecated API functions: how many crates are still depending on deprecated functions that should be removed?
Security vulnerabilities: which crates in crates.io are affected by a vulnerable function?
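
To make the last query concrete, here is a minimal sketch of how "which crates are affected by a vulnerable function?" could be answered on a versioned call graph. It is not RustPräzi's actual implementation: the `FnNode` and `CallGraph` types, the `affected_releases` method, and the in-memory representation are all assumptions made for illustration. The idea is to store reverse edges (callee to callers) and walk them breadth-first from the vulnerable function, collecting every crate release encountered.

```rust
use std::collections::{BTreeSet, HashMap, HashSet, VecDeque};

/// A function in one specific crate release (hypothetical node type).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct FnNode {
    pub krate: String,
    pub version: String,
    pub path: String,
}

impl FnNode {
    pub fn new(krate: &str, version: &str, path: &str) -> Self {
        FnNode {
            krate: krate.to_string(),
            version: version.to_string(),
            path: path.to_string(),
        }
    }
}

/// Versioned call graph stored as reverse edges: callee -> its callers.
#[derive(Default)]
pub struct CallGraph {
    callers: HashMap<FnNode, Vec<FnNode>>,
}

impl CallGraph {
    /// Record that `caller` calls `callee` (intra- or inter-crate).
    pub fn add_call(&mut self, caller: FnNode, callee: FnNode) {
        self.callers.entry(callee).or_default().push(caller);
    }

    /// All (crate, version) pairs containing a function that can
    /// transitively reach `target`, including the target's own release.
    pub fn affected_releases(&self, target: &FnNode) -> BTreeSet<(String, String)> {
        let mut seen: HashSet<&FnNode> = HashSet::new();
        let mut queue: VecDeque<&FnNode> = VecDeque::new();
        let mut releases = BTreeSet::new();

        seen.insert(target);
        queue.push_back(target);

        // Breadth-first walk over reverse edges.
        while let Some(node) = queue.pop_front() {
            releases.insert((node.krate.clone(), node.version.clone()));
            if let Some(callers) = self.callers.get(node) {
                for caller in callers {
                    // `insert` returns true only for nodes not visited yet.
                    if seen.insert(caller) {
                        queue.push_back(caller);
                    }
                }
            }
        }
        releases
    }
}
```

Building a graph with `add_call` and then calling `affected_releases` on a node for the vulnerable function returns only the releases with a call chain into that function. A release that merely depends on the same crate but never reaches the function is left out, which is the difference from a regular package-level dependency network.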

Link to the project: https://github.com/praezi/rust

Link to our preliminary research paper: https://pure.tudelft.nl/portal/files/46926997/main2.pdf.

What is WIP?

Our current focus is on making it production-grade, which includes:

Add proper error handling and a retry mechanism for failed compilations
Integrate it with cargo and add extensible analysis modes
Incrementally update the graph when a new release is published
Implement a robust query platform with a proper graph database

Vision

We are now looking at ways to turn our work into a production-grade tool that benefits the Cargo/crates.io community, both library maintainers and clients, through intelligent dependency analysis. In particular, we want to equip the Cargo community with a tool that can help keep crates.io stable, prevent the publication of impactful bad releases through lightweight code vetting (see this recent incident [1]), and help crate maintainers understand the changes they make.

[1] https://www.theregister.co.uk/2018/11/26/npm_repo_bitcoin_stealer/

Want to know more?

Chat with us on https://gitter.im/praezi/rust

Joseph(@jhejderup), Moritz, and Georgios

pietroalbini · November 28, 2018, 7:27pm

This is great! I hope Crater and RustPräzi will be able to improve each other with new ideas in the future!

eddyb · November 28, 2018, 7:56pm

Wow, this looks amazing!

FWIW you don’t need LLVM to get a call graph, you could make a modified rustc that outputs the relevant information even in cargo check mode (--emit=meta).

The relevant infrastructure is in rustc_mir::monomorphize - the “monomorphization collector” finds all the statically dispatched calls, effectively building a callgraph.

(rustc then splits the list of monomorphizations it finds into “codegen units” and uses that to know which monomorphizations to codegen and where, but you don’t need that part)

You also get access to rustc’s type information this way, which you might want to use one way or another.

cc @michaelwoerister

bascule · November 28, 2018, 8:23pm

Phenomenal work! I am quite interested in this and pointed your work out to the Secure Code WG:

Another member of the WG has started work on a tool to scan crates.io and extract information about crates with security vulnerabilities, based on information from the RustSec advisory database. You can find that tool here:

Source: Zach Reizner / crates-audit · GitLab
Site Prototype: https://crates-audit.zach297.com/

I think it would be very interesting if RustSec advisories could collect the relevant information needed to traverse this sort of call graph from the impacted functions in a vulnerable crate to all of its transitively vulnerable dependencies.

I'm also now noticing that, in a lot of cases, we probably already have the relevant info in most advisories. However, it's buried in a prose description, and we'd need to hoist it out into structured metadata for the advisory. Here is an example:

This advisory notes that the SmallVec::insert_many function in the smallvec crate is impacted. It would be neat to trace the callgraph using that as an anchor point.

Edit: I opened a RustSec issue to talk about this idea here: Collecting metadata for impacted functions in advisories · Issue #68 · rustsec/advisory-db · GitHub

jhejderup · November 28, 2018, 9:24pm

@bascule many thanks for reaching out to the Secure Code WG! Including metadata about affected code entities in security advisories would make it possible to systematically scan for vulnerabilities with RustPräzi (this would be super nice!). Also, thanks for letting me know about crates-audit, I will have a look at it.

jhejderup · November 28, 2018, 9:29pm

Oh, this is great! I will explore creating a call graph this way, many thanks @eddyb!

system · March 25, 2019, 8:31am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
