What exactly is `cargo llvm-lines` measuring?

I recently discovered cargo-llvm-lines, which prints out how many lines of LLVM IR are generated for each function in a Rust program, and how many times each one is instantiated. In large programs some generic functions are instantiated hundreds or thousands of times, usually ones like Vec::push, Option::map, and Result::map_err.
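
As a minimal illustration of what "instantiated" means here, the sketch below uses a hypothetical generic helper, `describe`, which is not from any real crate. Each distinct type argument makes rustc generate a separate monomorphized copy, so a tool like this would be expected to report several copies for the one source-level function.

```rust
use std::any::type_name;

// Hypothetical generic helper, for illustration only. Each distinct `T`
// used at a call site makes rustc emit a separate monomorphized copy.
fn describe<T>(_value: &T) -> String {
    format!("describe::<{}>", type_name::<T>())
}

fn main() {
    // Three different type arguments, so three instantiations of `describe`.
    println!("{}", describe(&1u8));
    println!("{}", describe(&1u32));
    println!("{}", describe(&String::from("x")));
    // Same type argument as the first call, so no additional instantiation.
    println!("{}", describe(&2u8));
}
```

The exact strings returned by `type_name` are not guaranteed to be stable across compiler versions, so they are only printed here, not relied upon.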

I have used it to get some nice speed-ups in rustc (#72013, #72166, and #72139) and I added support for it to rustc-perf.

But I'm not entirely sure what it's measuring. If you run this:

cargo llvm-lines --foo --bar

it then runs something like this:

cargo rustc --foo --bar -- --emit=llvm-ir -Cno-prepopulate-passes -Cpasses=name-anon-globals

which causes a bunch of *.ll files to be generated, and then it greps through them to count the lines of code per function.
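
The counting step can be sketched roughly as follows. This is a simplified, std-only approximation and not the tool's actual code: the function name `count_ir_lines` is hypothetical, it assumes each function body runs from a line starting with `define ` to a line consisting of just `}`, it counts both of those lines, and it keys on the raw symbol name without the demangling and merging of instantiations that the real tool does.

```rust
use std::collections::BTreeMap;

/// Count lines of LLVM IR per defined function in the text of one `.ll` file.
/// Hypothetical sketch: keys are raw symbol names, with no demangling.
fn count_ir_lines(ir: &str) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    let mut current: Option<String> = None;

    for line in ir.lines() {
        if current.is_none() && line.starts_with("define ") {
            // The symbol sits between '@' and the '(' of the parameter list.
            if let Some(at) = line.find('@') {
                let rest = &line[at + 1..];
                let end = rest.find('(').unwrap_or(rest.len());
                current = Some(rest[..end].trim_matches('"').to_string());
            }
        }

        let mut finished = false;
        if let Some(name) = &current {
            *counts.entry(name.clone()).or_insert(0) += 1;
            // A lone closing brace ends the function body.
            finished = line == "}";
        }
        if finished {
            current = None;
        }
    }
    counts
}

fn main() {
    let ir = "define i32 @add(i32 %a, i32 %b) {\nstart:\n  %s = add i32 %a, %b\n  ret i32 %s\n}\n\ndefine void @noop() {\n  ret void\n}\n";
    for (name, lines) in count_ir_lines(ir) {
        println!("{:>6}  {}", lines, name);
    }
}
```

Lines outside any function body (declarations, metadata, blank lines) are ignored in this sketch, which may or may not match what the real tool does.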

Here is the comment from the code explaining the -C options:

// The `-Cno-prepopulate-passes` means we skip LLVM optimizations, which is
// good because (a) we count the LLVM IR lines that are sent to LLVM, not how
// many there are after optimizations run, and (b) it's faster.
//
// The `-Cpasses=name-anon-globals` follows on: it's required to avoid the
// following error on some programs: "The current compilation is going to
// use thin LTO buffers without running LLVM's NameAnonGlobals pass. This
// will likely cause errors in LLVM. Consider adding -C
// passes=name-anon-globals to the compiler command line."

I wrote that comment myself and it seems reasonable, but I'm not certain it's correct, and I'm also not certain that this is the best way to invoke rustc.

So: will this tool really count exactly the lines of LLVM IR passed to the LLVM back-end? I've been comparing the results for opt and debug builds. They're usually fairly similar, but not always. E.g. for the version of ripgrep in rustc-perf, here is the first part of the output for a debug build:

  Lines        Copies         Function name
  -----        ------         -------------
 137595 (100%)   3815 (100%)  (TOTAL)
   4530 (3.3%)    256 (6.7%)  core::ptr::drop_in_place
   3562 (2.6%)      8 (0.2%)  rg::search_stream::Searcher<R,W>::run
   2653 (1.9%)     41 (1.1%)  core::option::Option<T>::map
   2505 (1.8%)      1 (0.0%)  clap::app::parser::Parser::get_matches_with
   1712 (1.2%)      1 (0.0%)  rg::args::ArgMatches::to_args
   1696 (1.2%)      8 (0.2%)  rg::search_stream::Searcher<R,W>::fill
   1584 (1.2%)      4 (0.1%)  rg::search_stream::InputBuffer::fill
   1369 (1.0%)     18 (0.5%)  core::result::Result<T,E>::map_err
   1293 (0.9%)      2 (0.1%)  ignore::walk::WalkParallel::run
   1200 (0.9%)     16 (0.4%)  core::option::Option<T>::map_or
   1104 (0.8%)     24 (0.6%)  rg::printer::Printer<W>::write_colored
   1076 (0.8%)      8 (0.2%)  rg::worker::Worker::search
   1059 (0.8%)     12 (0.3%)  alloc::vec::Vec<T>::extend_desugared
   1052 (0.8%)      2 (0.1%)  rg::printer::Printer<W>::write_match
   1028 (0.7%)      8 (0.2%)  rg::search_stream::Searcher<R,W>::new

and for an opt build:

  Lines        Copies         Function name
  -----        ------         -------------
 331480 (100%)   9602 (100%)  (TOTAL)
  16820 (5.1%)    780 (8.1%)  core::ptr::drop_in_place
   6944 (2.1%)     31 (0.3%)  alloc::raw_vec::RawVec<T,A>::grow_amortized
   5648 (1.7%)     79 (0.8%)  core::result::Result<T,E>::unwrap_or_else
   5323 (1.6%)     72 (0.7%)  core::option::Option<T>::map
   4162 (1.3%)      8 (0.1%)  rg::search_stream::Searcher<R,W>::run
   4032 (1.2%)     56 (0.6%)  alloc::raw_vec::RawVec<T,A>::current_memory
   3978 (1.2%)     39 (0.4%)  core::alloc::layout::Layout::array
   3379 (1.0%)      1 (0.0%)  clap::app::parser::Parser::get_matches_with
   2693 (0.8%)     24 (0.2%)  alloc::vec::Vec<T>::extend_desugared
   2616 (0.8%)     50 (0.5%)  alloc::alloc::box_free
   2568 (0.8%)     24 (0.2%)  alloc::raw_vec::RawVec<T,A>::allocate_in
   2529 (0.8%)     24 (0.2%)  <alloc::vec::Vec<T> as alloc::vec::SpecExtend<T,I>>::spec_extend
   2365 (0.7%)     55 (0.6%)  <alloc::raw_vec::RawVec<T,A> as core::ops::drop::Drop>::drop
   2320 (0.7%)      4 (0.0%)  <alloc::collections::btree::map::BTreeMap<K,V> as core::clone::Clone>::clone::clone_subtree
   2302 (0.7%)     90 (0.9%)  core::ptr::read

There are some big differences there! What might account for these differences?

A debug build will have debug assertions, which would increase its size, but the opt build is bigger.

The number of copies of generic functions differs significantly. Do opt builds do inlining in the front end? That could certainly account for some major differences.


I also have a question that may be related. How does it handle functions that were duplicated by CGU partitioning? (From what I gathered from the recent CGU meeting and the code, functions that are marked #[inline] are duplicated into each CGU where they are used.)

From what I understand, CGU partitioning doesn't take debug/release into account, but some crates (e.g. clap, which is also a dependency of ripgrep) override the codegen-units setting for some profiles. If this matters, it could explain some of the differences between debug and release builds.

@nnethercote btw, I am really surprised you didn't know about this tool until recently, considering you've been working on compile times for a long time. Now I feel bad that I never mentioned it anywhere, because I found it pretty early once I got interested in compile times.

I was also surprised! But given the choice between responding to a situation with "this is good" and "this is good, but it should have happened sooner", I generally try to do the former. No point getting upset unnecessarily :slight_smile:
