I recently discovered cargo-llvm-lines, which prints out how many lines of LLVM IR are generated for each function in a Rust program, and how many times each one is instantiated. In large programs some generic functions are instantiated hundreds or thousands of times, usually ones like Vec::push, Option::map, and Result::map_err.
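Those high instantiation counts come from monomorphization: rustc emits a separate copy of a generic function for every combination of type parameters it is used with. A minimal sketch (the function and its name are made up for illustration):

```rust
// Each call with a distinct type parameter produces its own copy of the
// function in LLVM IR, so `smallest` is compiled three times here.
fn smallest<T: PartialOrd + Copy>(items: &[T]) -> Option<T> {
    let mut it = items.iter().copied();
    let first = it.next()?;
    Some(it.fold(first, |a, b| if b < a { b } else { a }))
}

fn main() {
    println!("{:?}", smallest(&[3u32, 1, 2]));  // instantiates smallest::<u32>
    println!("{:?}", smallest(&[3.0f64, 1.0])); // instantiates smallest::<f64>
    println!("{:?}", smallest(&["b", "a"]));    // instantiates smallest::<&str>
}
```

Each instantiation shows up as a separate `define` in the emitted IR, which is what the "Copies" column above is counting.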
I have used it to get some nice speed-ups in rustc (#72013, #72166, and #72139) and I added support for it to rustc-perf.
But I'm not entirely sure what it's measuring. If you run this:
cargo llvm-lines --foo --bar
it then runs something like this:
cargo rustc --foo --bar -- --emit=llvm-ir -Cno-prepopulate-passes -Cpasses=name-anon-globals
which causes a bunch of *.ll files to be generated, and then it greps through them to count the lines of code per function.
Here is the comment from the code explaining the -C options:
// The `-Cno-prepopulate-passes` means we skip LLVM optimizations, which is
// good because (a) we count the LLVM IR lines that are sent to LLVM, not how
// many there are after optimizations run, and (b) it's faster.
//
// The `-Cpasses=name-anon-globals` follows on: it's required to avoid the
// following error on some programs: "The current compilation is going to
// use thin LTO buffers without running LLVM's NameAnonGlobals pass. This
// will likely cause errors in LLVM. Consider adding -C
// passes=name-anon-globals to the compiler command line."
I wrote that comment myself and it seems reasonable, but I'm not certain it's correct, and I'm also not certain that this is the best way to invoke rustc.
So: will this tool really count exactly the lines of LLVM IR passed to the LLVM back-end? I've been comparing the results for opt and debug builds. They're usually fairly similar, but not always. E.g. for the version of ripgrep in rustc-perf, here is the first part of the output for a debug build:
Lines Copies Function name
----- ------ -------------
137595 (100%) 3815 (100%) (TOTAL)
4530 (3.3%) 256 (6.7%) core::ptr::drop_in_place
3562 (2.6%) 8 (0.2%) rg::search_stream::Searcher<R,W>::run
2653 (1.9%) 41 (1.1%) core::option::Option<T>::map
2505 (1.8%) 1 (0.0%) clap::app::parser::Parser::get_matches_with
1712 (1.2%) 1 (0.0%) rg::args::ArgMatches::to_args
1696 (1.2%) 8 (0.2%) rg::search_stream::Searcher<R,W>::fill
1584 (1.2%) 4 (0.1%) rg::search_stream::InputBuffer::fill
1369 (1.0%) 18 (0.5%) core::result::Result<T,E>::map_err
1293 (0.9%) 2 (0.1%) ignore::walk::WalkParallel::run
1200 (0.9%) 16 (0.4%) core::option::Option<T>::map_or
1104 (0.8%) 24 (0.6%) rg::printer::Printer<W>::write_colored
1076 (0.8%) 8 (0.2%) rg::worker::Worker::search
1059 (0.8%) 12 (0.3%) alloc::vec::Vec<T>::extend_desugared
1052 (0.8%) 2 (0.1%) rg::printer::Printer<W>::write_match
1028 (0.7%) 8 (0.2%) rg::search_stream::Searcher<R,W>::new
and for an opt build:
Lines Copies Function name
----- ------ -------------
331480 (100%) 9602 (100%) (TOTAL)
16820 (5.1%) 780 (8.1%) core::ptr::drop_in_place
6944 (2.1%) 31 (0.3%) alloc::raw_vec::RawVec<T,A>::grow_amortized
5648 (1.7%) 79 (0.8%) core::result::Result<T,E>::unwrap_or_else
5323 (1.6%) 72 (0.7%) core::option::Option<T>::map
4162 (1.3%) 8 (0.1%) rg::search_stream::Searcher<R,W>::run
4032 (1.2%) 56 (0.6%) alloc::raw_vec::RawVec<T,A>::current_memory
3978 (1.2%) 39 (0.4%) core::alloc::layout::Layout::array
3379 (1.0%) 1 (0.0%) clap::app::parser::Parser::get_matches_with
2693 (0.8%) 24 (0.2%) alloc::vec::Vec<T>::extend_desugared
2616 (0.8%) 50 (0.5%) alloc::alloc::box_free
2568 (0.8%) 24 (0.2%) alloc::raw_vec::RawVec<T,A>::allocate_in
2529 (0.8%) 24 (0.2%) <alloc::vec::Vec<T> as alloc::vec::SpecExtend<T,I>>::spec_extend
2365 (0.7%) 55 (0.6%) <alloc::raw_vec::RawVec<T,A> as core::ops::drop::Drop>::drop
2320 (0.7%) 4 (0.0%) <alloc::collections::btree::map::BTreeMap<K,V> as core::clone::Clone>::clone::clone_subtree
2302 (0.7%) 90 (0.9%) core::ptr::read
There are some big differences there! What might account for these differences?
A debug build has debug assertions enabled, which should increase its size, yet the opt build is more than twice as big. The number of copies of generic functions also differs significantly: 3,815 in total for the debug build versus 9,602 for the opt build. Do opt builds do inlining in the front end? That could certainly account for some major differences.
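One thing worth keeping in mind when reading the copy counts for functions like Option::map: every closure has its own unique anonymous type, so each call site that passes a distinct closure instantiates a fresh copy of the generic function. A small illustration:

```rust
// Every closure has a unique anonymous type, so each of these three
// calls instantiates a distinct copy of Option::map in LLVM IR, even
// though the Option's own type parameter (i32) is the same each time.
fn main() {
    let x = Some(2);
    let a = x.map(|n| n + 1);         // one copy of map
    let b = x.map(|n| n * 10);        // a second copy
    let c = x.map(|n| n.to_string()); // a third copy, returning String
    println!("{:?} {:?} {:?}", a, b, c);
}
```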