Improve the heap (and cpu) profiling story


#1

I’ve been attempting to track down allocations in a Rust binary, and it turns out it’s not so easy. This post documents my efforts and ends with some paragraphs describing the need for easy CPU and heap analysis in the Rust ecosystem.

Current guides for profiling Rust code all focus on CPU performance.

Additionally, it’s relatively easy to use perf itself and come up with flamegraphs.

Unfortunately, there’s little to no information (that I could find) about tracking down allocations.

What I’ve tried:

  • Brendan Gregg has a page describing how to use DTrace to track down memory consumption. Unfortunately, DTrace was built for Solaris and is forked for everything else. As the dtrace4linux author writes:

In general, you can take solaris tutorial scripts and use them to try and understand what doesnt work, but attempting to use them ‘as-is’ will frustrate you if you do not know what to look for.

  • Heaptrack, a new tool that requires quite a few Boost libraries. Unfortunately, it did not work for me after compiling from source.
  • Valgrind’s massif, which seemed easy enough to use. It did not work for me, and even if it had, I did not see an option to show where allocations are coming from.

I then found out that jemalloc, which I know Rust uses by default, has a leak checking guide. I couldn’t find libjemalloc.so.2 on my system, so I downloaded the jemalloc repo and compiled it with --enable-prof. Unfortunately, Rust compiles binaries with their own copies of jemalloc, and those copies are not built with --enable-prof, which made the libjemalloc.so.2 approach useless.

There does exist some piece of configuration for compiling rustc itself:

# Whether or not jemalloc is built with its debug option set
debug-jemalloc = false

but unfortunately the code path that uses that flag is commented out and, even if it were not, it would not add the necessary --enable-prof flag (I found this out after compiling Rust for an hour).

So, I applied this patch:

diff --git a/src/liballoc_jemalloc/build.rs b/src/liballoc_jemalloc/build.rs
index 7dd85ddcc7..1c66955452 100644
--- a/src/liballoc_jemalloc/build.rs
+++ b/src/liballoc_jemalloc/build.rs
@@ -116,6 +116,7 @@ fn main() {
     //if cfg!(feature = "debug") {
     //    cmd.arg("--enable-debug");
     //}
+    cmd.arg("--enable-prof");
 
     cmd.arg(format!("--host={}", build_helper::gnu_target(&target)));
     cmd.arg(format!("--build={}", build_helper::gnu_target(&host)));

and recompiled Rust itself. I was finally able to profile jemalloc!

Unfortunately, even though I was able to get a leak summary, jeprof seemed unable to analyze it. I think the profile may be missing some requisite symbol table (judging by a comparison with the .heap profile generated against the w command).

So, I’m at a loss. My best bet seems to be Heaptrack above but that does not work for me. I may look into that again later. There are a lot of working groups right now, and there are plenty of issues in the dev-tools team. However, I think it’s important to have good runtime analysis. Unless there are guides that I could not find, or tribal knowledge that I do not know, there is no good story for runtime analysis.

I’ve seen discussion about what can and can’t benefit the cpu, or how heavy or not some allocations are. I know the cpu analysis can be done with external tools, but I really don’t know how people are reasoning about allocation performance. It seems to me that any talk about allocations nowadays is either guesswork or crude observation of top.

Go shipped with go tool pprof out of the gate as well as its net/http/pprof package, and Go’s pprof has been invaluable to me. The simplicity it provided has certainly been hard to lose, and Rust runtime analysis currently feels like witchcraft (and I doubt people actually analyze their code all too often simply because the barrier is high).

I’d love it if analysis were as simple as cargo profile [rust binary to run].

Are there existing issues tracking adding good analysis into Rust, and if not, can those be started? Adding easy analysis would be a boon to the ecosystem as a whole, would put to rest theoretical discussions of runtime tradeoffs, and would help eliminate code reversions where performance regressions are only noticed after the fact.


#2

Currently our hope is that custom allocator support will let you customize jemalloc however you need for situations like this. For example, I believe the jemallocator crate should provide options for heap profiling with jemalloc.
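
To make the custom allocator idea a bit more concrete, here is a minimal sketch of the kind of hook it allows. It uses only the standard library, not jemallocator, so it says nothing about that crate’s actual profiling options. The names CountingAlloc, ALLOC_CALLS, ALLOC_BYTES and measure are made up for illustration, and it assumes a toolchain where std::alloc::GlobalAlloc and #[global_allocator] are available on stable.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper: forwards to the system allocator and counts requests.
struct CountingAlloc;

// Process-wide counters (illustrative names, not part of any library).
static ALLOC_CALLS: AtomicUsize = AtomicUsize::new(0);
static ALLOC_BYTES: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Only atomics here: allocating inside the allocator would recurse.
        ALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
        ALLOC_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Returns (allocation calls, bytes requested) observed while `f` runs.
/// Other threads allocating at the same time are counted too.
fn measure<F: FnOnce()>(f: F) -> (usize, usize) {
    let calls_before = ALLOC_CALLS.load(Ordering::Relaxed);
    let bytes_before = ALLOC_BYTES.load(Ordering::Relaxed);
    f();
    (
        ALLOC_CALLS.load(Ordering::Relaxed) - calls_before,
        ALLOC_BYTES.load(Ordering::Relaxed) - bytes_before,
    )
}

fn main() {
    let (calls, bytes) = measure(|| {
        let v: Vec<u64> = Vec::with_capacity(1024);
        // Keep the optimizer from removing the unused allocation.
        std::hint::black_box(&v);
    });
    println!("{} allocation(s), {} bytes requested", calls, bytes);
}
```

This only gives totals, not call sites, so it is no substitute for a real heap profiler, but it is enough to check whether a code path allocates at all.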


#3

I’ve had good results using Valgrind for memory profiling after switching to the system allocator (requires nightly Rust, for now). I used Valgrind’s DHAT as described by @nnethercote in a blog post about optimizing rustc. Other tools that work for C and C++ should also work for Rust as long as it’s built with alloc_system instead of alloc_jemalloc.


#4

Oh man I just found my way back here after suddenly (it’s been a while since I’ve needed it) failing to pick up heap allocations… Turns out the API for overriding the allocator is a bit different now, as per the link @mbrubeck posted.

@twmb you might want to give massif and heaptrack another go after switching to the system allocator:

#![feature(alloc_system, global_allocator, allocator_api)]

extern crate alloc_system;

use alloc_system::System;

#[global_allocator]
static A: System = System;

heaptrack was hanging for me after executing the program until I got that right.
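
For anyone reading this later: as far as I know, the same switch no longer needs nightly feature gates, since the System allocator and #[global_allocator] were stabilized in std::alloc (Rust 1.28). A minimal sketch under that assumption follows. The build_table function is just a made-up workload so the profiler has something to record.

```rust
use std::alloc::System;

// Route every heap allocation in this binary through the platform allocator,
// so tools that intercept malloc/free (massif, DHAT, heaptrack) can see them.
#[global_allocator]
static GLOBAL: System = System;

/// Hypothetical workload: builds `n` heap-allocated strings.
fn build_table(n: usize) -> Vec<String> {
    (0..n).map(|i| format!("entry-{}", i)).collect()
}

fn main() {
    let table = build_table(1_000);
    println!("{} entries", table.len());
}
```

The declaration only needs to appear once, in the crate that produces the final binary, and it applies to the allocations made by its dependencies as well.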


#5

Omg, thank you for this. I just experienced literally the exact same issue.

I’m a little disappointed by the new verbosity, since I have several sub-crates that I have to patch like this every time, but I suppose there’s a good reason.