I've been attempting to track down allocations in a Rust binary, and it turns out it's not so easy. This post documents my efforts and ends with a few paragraphs on why the Rust ecosystem needs easy CPU and heap analysis.
Current guides for profiling Rust code all focus on CPU performance. Additionally, it's relatively easy to use `perf` itself and come up with flamegraphs.
Unfortunately, there's little to no information (that I could find) about tracking down allocations.
What I've tried:
- Brendan Gregg has a page describing how to use DTrace to track down memory consumption. Unfortunately, DTrace was built for Solaris and is forked for everything else. As the `dtrace4linux` author writes, "In general, you can take Solaris tutorial scripts and use them to try and understand what doesn't work, but attempting to use them 'as-is' will frustrate you if you do not know what to look for."
- Heaptrack, a newer tool that requires quite a few Boost libraries. Unfortunately, even after compiling it from source, it did not work for me.
- Valgrind's massif, which seemed easy enough to use. This did not work for me either, and even if it had, I did not see an option to show where allocations actually come from (a possible workaround is sketched after this list).
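A caveat that may explain several of these failures: as far as I can tell, these tools hook the system `malloc`/`free`, while Rust binaries route most allocations through the bundled jemalloc, so the tools may simply never see them. One way to test that theory is to opt into the system allocator. A minimal sketch, assuming a nightly toolchain (the `alloc_system` crate is behind a feature gate):

```rust
// Opt out of the bundled jemalloc so that DTrace, Heaptrack, and massif
// have a chance to intercept allocations via the system malloc/free.
// Nightly-only: alloc_system is feature-gated.
#![feature(alloc_system)]
extern crate alloc_system;

fn main() {
    // This allocation now goes through the system allocator.
    let v: Vec<u64> = (0..1_000_000).collect();
    println!("allocated {} elements", v.len());
}
```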
I then found out that jemalloc, which I know Rust uses by default, has a leak-checking guide. I couldn't find `libjemalloc.so.2` on my system, so I downloaded the jemalloc repo and compiled it with `--enable-prof`. Unfortunately, Rust compiles binaries against its own copy of jemalloc, and that copy is not built with `--enable-prof`, which rendered my `libjemalloc.so.2` useless for the analysis.
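Another route I did not try: the `jemallocator` crate can supply its own jemalloc as the global allocator, side-stepping rustc's bundled copy. A minimal sketch, assuming a toolchain with the `#[global_allocator]` attribute; whether that crate's jemalloc build can actually be configured with `--enable-prof` is an assumption I have not verified:

```rust
// Hypothetical alternative: link jemalloc via the jemallocator crate instead
// of rustc's bundled copy (requires `jemallocator` in Cargo.toml). Whether
// its jemalloc can be built with --enable-prof is unverified.
extern crate jemallocator;

// On toolchains where #[global_allocator] is still feature-gated, this also
// needs #![feature(global_allocator)] at the crate root.
#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;

fn main() {
    // Allocations in the whole program now go through the crate's jemalloc.
    let v = vec![0u8; 1 << 20];
    println!("allocated {} bytes", v.len());
}
```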
There does exist some piece of configuration for compiling `rustc` itself:
```toml
# Whether or not jemalloc is built with its debug option set
debug-jemalloc = false
```
but unfortunately the code path that uses that flag is commented out, and even if it weren't, it does not add the necessary `--enable-prof` flag (I found this out after compiling Rust for an hour).
So, I applied this patch:
```diff
diff --git a/src/liballoc_jemalloc/build.rs b/src/liballoc_jemalloc/build.rs
index 7dd85ddcc7..1c66955452 100644
--- a/src/liballoc_jemalloc/build.rs
+++ b/src/liballoc_jemalloc/build.rs
@@ -116,6 +116,7 @@ fn main() {
     //if cfg!(feature = "debug") {
     //    cmd.arg("--enable-debug");
     //}
+    cmd.arg("--enable-prof");
     cmd.arg(format!("--host={}", build_helper::gnu_target(&target)));
     cmd.arg(format!("--build={}", build_helper::gnu_target(&host)));
```
and recompiled Rust itself. I was finally able to profile with jemalloc!
Unfortunately, even though I was able to get a leak summary, `jeprof` seemed unable to analyze it. I think the profile may be missing some requisite symbol table (judging by a comparison with the `.heap` profile generated against the `w` command).
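One thing I should verify is whether my release binary even carries the symbols `jeprof` needs; Cargo omits debug info from release builds unless you ask for it. The knob, in Cargo.toml (this is a guess at the cause, not a confirmed fix):

```toml
# Keep debug info in release builds so profilers can symbolize stack traces.
[profile.release]
debug = true
```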
So, I'm at a loss. My best bet seems to be Heaptrack, mentioned above, but that does not work for me; I may look into it again later. There are a lot of working groups right now, and there are plenty of issues filed with the dev-tools team. However, I think it's important to have good runtime analysis. Unless there are guides that I could not find, or tribal knowledge that I do not have, there is no good story for runtime analysis.
I've seen discussion about what can and can't benefit the CPU, and about how heavy certain allocations are. I know the CPU analysis can be done with external tools, but I really don't know how people are reasoning about allocation performance. It seems to me that any talk about allocations nowadays is either guesswork or crude observation of `top`.
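The closest thing to hard numbers I can imagine without external tooling is wrapping the allocator yourself. A minimal sketch, assuming a toolchain that has the `std::alloc::GlobalAlloc` API and the `#[global_allocator]` attribute:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// A crude counting allocator: wraps the system allocator and tallies every
// allocation and its size, so code can print totals instead of guessing.
struct Counting;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);
static BYTES: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        System.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static GLOBAL: Counting = Counting;

fn main() {
    let v: Vec<u8> = vec![0; 4096];
    drop(v);
    println!(
        "{} allocations, {} bytes total",
        ALLOCATIONS.load(Ordering::Relaxed),
        BYTES.load(Ordering::Relaxed)
    );
}
```

This only yields totals, not call stacks, which is exactly the gap the tools above are supposed to fill.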
Go shipped with `go tool pprof` out of the gate, as well as its `net/http/pprof` package, and Go's `pprof` has been invaluable to me. That simplicity has been hard to give up, and Rust runtime analysis currently feels like witchcraft (and I doubt people actually analyze their code very often, simply because the barrier is so high).
I'd love it if analysis were as simple as `cargo profile [rust binary to run]`.
Are there existing issues tracking the addition of good analysis tooling to Rust, and if not, can those be started? Easy analysis would be a boon to the ecosystem as a whole: it would put to rest theoretical discussions of runtime tradeoffs, and it would help eliminate reverts where performance regressions are only noticed after the fact.