The compiler has a number of tools for internal tracing and introspection, most of which are various amounts of broken at any given time. There used to be
-Ztime-passes, but since the move to a query-driver approach that flag has become all but useless. There is also the
-Zprofile-queries flag but
- It appears to currently be somewhat broken
- Why roll our own tracing framework when there are already good solutions out there?
I’ve been playing around with some tracing tools from the Linux Tracing Toolkit - Next Generation (LTTNG) and wanted to see if I could put it to use profiling the Rust compiler. I have a basic version working, which can produce pretty graphs if not yet useful insights. I think my approach could be extended to support ETW, but somebody more familiar with that platform would probably have to chime in on that =). First, said pretty pictures:
What you’re looking at is Trace Compass, a tool for analyzing traces in a number of different trace formats; including the Common Trace Format (CTF), LTTNG’s native format. The trace above was generated by compiling the
inflate crate. In the center of the image is the raw trace data. On the left is the “project manager” view which should look familiar to those who have had the (mis?)fortune of using Eclipse in the past. At the bottom is where the interesting stuff is going on: each row represents a
rustc thread, and each colored bar represents the runtime of a query or provider. Right now this output is generated using TraceCompass’s “data driven analysis” which allows quick mockups of tracing tools by writing a couple dozen lines of XML. If you want to play with this trace and the really rough analysis’s I’ve built thus far, you can download TraceCompass and this trace package or build my branch here. I’ll write up how you can get this running in more detail in a bit, but I’m running out of time to write this post =). This example trace also doesn’t showcase the fact that we’re exporting relatively rich data: all query arguments are being dumped out for example.
What’s not shown here is that you can also capture system state in the trace. What that means is that we can add context like the number of instructions executed in the current thread since startup to every event, or what I/O requests the kernel is seeing while
rustc is running. This would allow us to get an accurate picture of what kinds of queries are suffering from what performance issues (e.g. maybe incr. comp. is slow because it’s making lots of small I/O requests and causing excessive seeking on HDDs, or for some reason queries on a particular DefID are slow). Also, the intention isn’t to use TraceCompass as the be-all end-all front-end for this data. Using some Python (and the
babeltrace library), it should be possible to convert these traces into flame graphs or other richer formats.
So, what does this cost? It appears about 15% performance. Without the instrumentation, compiling just
inflate (and not its dependencies) runs in about 0.36 seconds on my machine (averaged over 50 runs).
366.383791 task-clock:u (msec) # 1.002 CPUs utilized ( +- 0.13% ) 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 17,706 page-faults:u # 0.048 M/sec ( +- 0.07% ) 1,050,326,560 cycles:u # 2.867 GHz ( +- 0.08% ) 726,573,268 stalled-cycles-frontend:u # 69.18% frontend cycles idle ( +- 0.15% ) 541,510,352 stalled-cycles-backend:u # 51.56% backend cycles idle ( +- 0.19% ) 1,115,871,244 instructions:u # 1.06 insn per cycle # 0.65 stalled cycles per insn ( +- 0.02% ) 198,127,596 branches:u # 540.765 M/sec ( +- 0.02% ) 6,732,660 branch-misses:u # 3.40% of all branches ( +- 0.04% ) 0.365815848 seconds time elapsed ( +- 0.20% )
With instrumentation, it takes about 0.42 seconds, again averaged over 50 runs.
424.312916 task-clock:u (msec) # 1.010 CPUs utilized ( +- 0.21% ) 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 20,550 page-faults:u # 0.048 M/sec ( +- 0.05% ) 1,190,600,723 cycles:u # 2.806 GHz ( +- 0.18% ) 836,658,847 stalled-cycles-frontend:u # 70.27% frontend cycles idle ( +- 0.32% ) 661,027,967 stalled-cycles-backend:u # 55.52% backend cycles idle ( +- 0.38% ) 1,480,296,678 instructions:u # 1.24 insn per cycle # 0.57 stalled cycles per insn ( +- 0.03% ) 275,520,450 branches:u # 649.333 M/sec ( +- 0.03% ) 5,147,622 branch-misses:u # 1.87% of all branches ( +- 0.08% ) 0.420062591 seconds time elapsed ( +- 0.37% )
That’s about all I have for now. I’d like to keep working on this for my own use, but I’d also like to make it useful to as many people as possible so any and all feedback is appreciated.
CC @nikomatsakis: here’s the writeup of the tracing stuff I mentioned briefly during the NLL WG meeting today, I think you also mentioned that the Traits WG could find this useful but I’m not immediately seeing a good way to ping them =).