Tracing in the Rust compiler


#1

The compiler has a number of tools for internal tracing and introspection, most of which are various amounts of broken at any given time. There used to be -Ztime-passes, but since the move to a query-driven approach that flag has become all but useless. There is also the -Zprofile-queries flag, but:

  • It appears to currently be somewhat broken
  • Why roll our own tracing framework when there are already good solutions out there?

I’ve been playing around with some tracing tools from the Linux Tracing Toolkit - Next Generation (LTTNG) and wanted to see if I could put it to use profiling the Rust compiler. I have a basic version working, which can produce pretty graphs if not yet useful insights. I think my approach could be extended to support ETW, but somebody more familiar with that platform would probably have to chime in on that =). First, said pretty pictures:

What you’re looking at is Trace Compass, a tool for analyzing traces in a number of different trace formats, including the Common Trace Format (CTF), LTTNG’s native format. The trace above was generated by compiling the inflate crate. In the center of the image is the raw trace data. On the left is the “project manager” view, which should look familiar to those who have had the (mis?)fortune of using Eclipse in the past. At the bottom is where the interesting stuff is going on: each row represents a rustc thread, and each colored bar represents the runtime of a query or provider. Right now this output is generated using Trace Compass’s “data-driven analysis”, which allows quick mockups of tracing tools by writing a couple dozen lines of XML. If you want to play with this trace and the really rough analyses I’ve built thus far, you can download Trace Compass and this trace package, or build my branch here. I’ll write up how you can get this running in more detail in a bit, but I’m running out of time to write this post =). This example trace also doesn’t showcase the fact that we’re exporting relatively rich data: all query arguments are being dumped out, for example.

What’s not shown here is that you can also capture system state in the trace. What that means is that we can add context like the number of instructions executed in the current thread since startup to every event, or what I/O requests the kernel is seeing while rustc is running. This would allow us to get an accurate picture of what kinds of queries are suffering from what performance issues (e.g. maybe incr. comp. is slow because it’s making lots of small I/O requests and causing excessive seeking on HDDs, or for some reason queries on a particular DefID are slow). Also, the intention isn’t to use TraceCompass as the be-all end-all front-end for this data. Using some Python (and the babeltrace library), it should be possible to convert these traces into flame graphs or other richer formats.
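As a sketch of what that conversion could look like: the event dicts below are hypothetical stand-ins for what the babeltrace Python bindings would yield while iterating a CTF trace, and the output is the “folded stacks” text format that flame graph tools consume. This is a minimal sketch, not the actual conversion script.

```python
# Sketch: collapse query begin/end events into "folded stacks" for a flame graph.
# The event dicts are hypothetical stand-ins for babeltrace-decoded CTF events.

def events_to_folded(events):
    """Turn begin/end event pairs into 'a;b;c <nanoseconds>' folded lines."""
    stacks = {}  # per-thread stack of (query_name, start_ts)
    folded = {}  # "a;b;c" -> accumulated nanoseconds
    for ev in events:
        tid, name, ts = ev["tid"], ev["query"], ev["ts"]
        stack = stacks.setdefault(tid, [])
        if ev["kind"] == "begin":
            stack.append((name, ts))
        else:  # "end": pop and attribute the elapsed time to the full stack
            _, start = stack.pop()
            key = ";".join(n for n, _ in stack + [(name, 0)])
            folded[key] = folded.get(key, 0) + (ts - start)
    return [f"{k} {v}" for k, v in sorted(folded.items())]

# Hypothetical trace: a typeck query that invokes mir_built on one thread.
events = [
    {"tid": 1, "kind": "begin", "query": "typeck",    "ts": 0},
    {"tid": 1, "kind": "begin", "query": "mir_built", "ts": 100},
    {"tid": 1, "kind": "end",   "query": "mir_built", "ts": 400},
    {"tid": 1, "kind": "end",   "query": "typeck",    "ts": 1000},
]
print(events_to_folded(events))
# -> ['typeck 1000', 'typeck;mir_built 300']
```

Feeding lines like these to flamegraph.pl (or any folded-stack renderer) would give a per-query flame graph without going through Trace Compass at all.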

So, what does this cost? About a 15% performance hit, it seems. Without the instrumentation, compiling just inflate (and not its dependencies) runs in about 0.36 seconds on my machine (averaged over 50 runs).

        366.383791      task-clock:u (msec)       #    1.002 CPUs utilized            ( +-  0.13% )
                 0      context-switches:u        #    0.000 K/sec                  
                 0      cpu-migrations:u          #    0.000 K/sec                  
            17,706      page-faults:u             #    0.048 M/sec                    ( +-  0.07% )
     1,050,326,560      cycles:u                  #    2.867 GHz                      ( +-  0.08% )
       726,573,268      stalled-cycles-frontend:u #   69.18% frontend cycles idle     ( +-  0.15% )
       541,510,352      stalled-cycles-backend:u  #   51.56% backend cycles idle      ( +-  0.19% )
     1,115,871,244      instructions:u            #    1.06  insn per cycle         
                                                  #    0.65  stalled cycles per insn  ( +-  0.02% )
       198,127,596      branches:u                #  540.765 M/sec                    ( +-  0.02% )
         6,732,660      branch-misses:u           #    3.40% of all branches          ( +-  0.04% )

       0.365815848 seconds time elapsed                                          ( +-  0.20% )

With instrumentation, it takes about 0.42 seconds, again averaged over 50 runs.

        424.312916      task-clock:u (msec)       #    1.010 CPUs utilized            ( +-  0.21% )
                 0      context-switches:u        #    0.000 K/sec                  
                 0      cpu-migrations:u          #    0.000 K/sec                  
            20,550      page-faults:u             #    0.048 M/sec                    ( +-  0.05% )
     1,190,600,723      cycles:u                  #    2.806 GHz                      ( +-  0.18% )
       836,658,847      stalled-cycles-frontend:u #   70.27% frontend cycles idle     ( +-  0.32% )
       661,027,967      stalled-cycles-backend:u  #   55.52% backend cycles idle      ( +-  0.38% )
     1,480,296,678      instructions:u            #    1.24  insn per cycle         
                                                  #    0.57  stalled cycles per insn  ( +-  0.03% )
       275,520,450      branches:u                #  649.333 M/sec                    ( +-  0.03% )
         5,147,622      branch-misses:u           #    1.87% of all branches          ( +-  0.08% )

       0.420062591 seconds time elapsed                                          ( +-  0.37% )
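For reference, the ~15% figure falls straight out of the two averaged task-clock values above:

```python
# Instrumentation overhead, computed from the averaged task-clock values.
baseline = 366.383791      # msec, uninstrumented
instrumented = 424.312916  # msec, with LTTNG instrumentation

overhead = (instrumented / baseline - 1) * 100
print(f"{overhead:.1f}%")  # -> 15.8%
```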

That’s about all I have for now. I’d like to keep working on this for my own use, but I’d also like to make it useful to as many people as possible so any and all feedback is appreciated.

CC @nikomatsakis: here’s the writeup of the tracing stuff I mentioned briefly during the NLL WG meeting today, I think you also mentioned that the Traits WG could find this useful but I’m not immediately seeing a good way to ping them =).


#2

Hmm, that’s higher than I anticipated.


#3

I have also done some work in this area, but used Chrome’s trace_event format instead of LTTNG. I have some trace examples here; sadly, I can only get the HTML to load in Chrome.

I used my crate rs_tracing, which just prints the JSON to stdout. This is my first crate, so it probably needs much more work to be useful.

I have replaced most calls to util::common::time with macros from rs_tracing in my rustc branch. I have some linking problems, so I haven’t replaced them everywhere, and there are some gaps in the logs.
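For anyone unfamiliar with the format: a Chrome trace_event log is essentially a JSON array of begin/end (“B”/“E”) records that chrome://tracing can load directly. A minimal hand-rolled sketch (the field names come from the trace_event format spec, not from rs_tracing itself):

```python
import json
import time

def trace_event(name, phase, pid=1, tid=1):
    """One Chrome trace_event record: phase 'B' = begin, 'E' = end."""
    return {"name": name, "ph": phase, "pid": pid, "tid": tid,
            "ts": int(time.monotonic() * 1_000_000)}  # timestamp in microseconds

# Wrap a timed region in a begin/end pair, roughly what a tracing macro
# would emit around a util::common::time-style call.
events = [trace_event("typeck", "B")]
# ... timed work would happen here ...
events.append(trace_event("typeck", "E"))

print(json.dumps(events))  # save to a file and open with chrome://tracing
```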