Symbol mangling of Rust vs. C++


#1

I want to add support for Rust symbol demangling to Bloaty McBloatface, my generic ELF/Mach-O size profiling tool.

I was really happy to see Bloaty producing some useful results on a Rust binary yesterday. Torste Aikio wrote a proof-of-concept patch to Bloaty that adds native Rust symbol demangling, and opened an issue on Bloaty's GitHub to discuss it.

I was hoping to add Rust demangling to Bloaty without requiring my users to pass any Rust-specific flag on the command line. I wanted to do something like:

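// Try the Rust demangler first, then C++; fall back to the raw symbol.
if (RustDemangle(sym, &rust_demangled)) {
  demangled = rust_demangled;
} else if (CppDemangle(sym, &cpp_demangled)) {
  demangled = cpp_demangled;
} else {
  // Neither demangler recognized it: show the symbol as-is.
  demangled = sym;
}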

My theory was that no symbol would successfully demangle as both Rust and C++: that the two sets of demanglable symbols would be disjoint. Unfortunately, that doesn't seem to be the case. Rust appears to intentionally use a scheme similar to C++'s, so that tools without Rust support still produce reasonable output.
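To see the overlap concretely, here is a minimal sketch that feeds a Rust symbol to the C++ demangler. It assumes a toolchain that provides the Itanium C++ ABI's abi::__cxa_demangle (libstdc++ and libc++abi both do); CxxDemangle is a hypothetical helper name, not Bloaty's API.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <string>

// Run a symbol through the Itanium C++ ABI demangler that ships with
// libstdc++ and libc++abi. Returns "" if it is not a valid C++ mangled name.
std::string CxxDemangle(const char* mangled) {
  int status = 0;
  char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  std::string result =
      (status == 0 && out != nullptr) ? std::string(out) : std::string();
  std::free(out);  // the demangler allocates its result with malloc
  return result;
}
```

Called on a legacy Rust symbol such as _ZN12rustc_driver4main17hc2f56434c600330fE, this returns rustc_driver::main::hc2f56434c600330f, hash included, so a C++ demangler succeeding says nothing about whether the symbol came from Rust.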

I can see why that provides a smoother experience for tools that support C++ but not Rust. Unfortunately, it makes it harder to provide first-class Rust support in a tool like Bloaty, especially for mixed C++/Rust binaries. If some symbols are Rust and others are C++, ideally each one would be demangled properly on a symbol-by-symbol basis.

I haven't looked at Rust's mangling scheme in depth. Is there anything that could hint at whether a given symbol is actually Rust or C++?


#2

Rust symbols end with a hash, which is a pretty good indicator. For example, rustc refers to rustc_driver::main with a symbol like _ZN12rustc_driver4main17hc2f56434c600330fE. The trailing 17h... is the mangling of a 17-byte identifier: the letter h followed by the 16 hex digits of the hash.
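A minimal sketch of checking for that hash, assuming the legacy (Itanium-style) Rust mangling described above. It walks the length-prefixed identifiers between _ZN and the closing E, then tests whether the last one is h followed by 16 hex digits. SplitNestedName and IsRustHash are hypothetical names, and the parser only understands the plain N...E nested-name shape, not the full C++ mangling grammar.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Split a nested name "_ZN <len><ident> ... E" into its identifiers.
// Returns false if the symbol doesn't have exactly that shape.
bool SplitNestedName(const std::string& sym, std::vector<std::string>* parts) {
  parts->clear();
  std::size_t i = 0;
  // Mach-O symbol names carry one extra leading underscore ("__ZN...").
  if (sym.compare(0, 4, "__ZN") == 0) {
    i = 4;
  } else if (sym.compare(0, 3, "_ZN") == 0) {
    i = 3;
  } else {
    return false;
  }
  while (i < sym.size() && sym[i] != 'E') {
    if (!std::isdigit(static_cast<unsigned char>(sym[i]))) return false;
    std::size_t len = 0;
    while (i < sym.size() && std::isdigit(static_cast<unsigned char>(sym[i]))) {
      len = len * 10 + static_cast<std::size_t>(sym[i] - '0');
      if (len > sym.size()) return false;  // also guards against overflow
      ++i;
    }
    if (len == 0 || len > sym.size() - i) return false;
    parts->push_back(sym.substr(i, len));
    i += len;
  }
  // The name must end exactly at the closing 'E'.
  return i + 1 == sym.size() && !parts->empty();
}

// True if `ident` looks like rustc's hash component: 'h' + 16 hex digits.
bool IsRustHash(const std::string& ident) {
  if (ident.size() != 17 || ident[0] != 'h') return false;
  for (std::size_t k = 1; k < ident.size(); ++k) {
    if (!std::isxdigit(static_cast<unsigned char>(ident[k]))) return false;
  }
  return true;
}
```

A symbol would then count as Rust when SplitNestedName succeeds and IsRustHash(parts.back()) is true. Because the length prefixes are actually parsed, an identifier that merely contains "17h..." somewhere inside it won't be mistaken for the hash.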


#3

As suggested by erahm on IRC, if you have debug info mapping symbols to source files, you could also guess based on the filename extension.
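A minimal sketch of that guess, assuming the debug info has already given you the source path for a symbol. GuessLangFromPath, SourceLang, and the extension list are illustrative assumptions, not Bloaty code.

```cpp
#include <initializer_list>
#include <string>

enum class SourceLang { kRust, kCpp, kUnknown };

// Hypothetical helper: guess a symbol's language from the source path that
// its debug info points to. The extension list is an assumption, not exhaustive.
SourceLang GuessLangFromPath(const std::string& path) {
  auto ends_with = [&path](const std::string& suffix) {
    return path.size() >= suffix.size() &&
           path.compare(path.size() - suffix.size(), suffix.size(), suffix) == 0;
  };
  if (ends_with(".rs")) return SourceLang::kRust;
  // Plain ".h" is left out on purpose: it could be a C or a C++ header.
  for (const char* ext : {".cc", ".cpp", ".cxx", ".hh", ".hpp"}) {
    if (ends_with(ext)) return SourceLang::kCpp;
  }
  return SourceLang::kUnknown;
}
```

Returning kUnknown rather than defaulting to C++ lets the caller fall back to another hint, such as the hash check, when the path is inconclusive.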


#4

If it's DWARF, you can also look at the DW_AT_producer for the CU that contains the symbol, which will look something like "clang LLVM (rustc version 1.26.0)".

edit: How could I forget: there's also DW_AT_language on the CU, which will be DW_LANG_Rust (0x1c).
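A sketch of combining both DWARF hints, assuming the caller has already read DW_AT_language and DW_AT_producer from the symbol's compilation unit; this code does no DWARF parsing itself. The DW_LANG_* values are from the DWARF specification, while ClassifyCu and CuLang are hypothetical names.

```cpp
#include <cstdint>
#include <string>

enum class CuLang { kRust, kCpp, kOther };

// A subset of the DW_LANG_* codes defined by the DWARF specification.
constexpr std::uint64_t kDwLangCPlusPlus = 0x0004;
constexpr std::uint64_t kDwLangCPlusPlus03 = 0x0019;
constexpr std::uint64_t kDwLangCPlusPlus11 = 0x001a;
constexpr std::uint64_t kDwLangRust = 0x001c;
constexpr std::uint64_t kDwLangCPlusPlus14 = 0x0021;

// Classify a CU from the raw values of its DW_AT_language and DW_AT_producer
// attributes. Pass 0 for a missing DW_AT_language.
CuLang ClassifyCu(std::uint64_t dw_at_language,
                  const std::string& dw_at_producer) {
  switch (dw_at_language) {
    case kDwLangRust:
      return CuLang::kRust;
    case kDwLangCPlusPlus:
    case kDwLangCPlusPlus03:
    case kDwLangCPlusPlus11:
    case kDwLangCPlusPlus14:
      return CuLang::kCpp;
    default:
      break;
  }
  // Fallback: rustc's producer string contains "(rustc version X.Y.Z)".
  if (dw_at_producer.find("rustc version") != std::string::npos) {
    return CuLang::kRust;
  }
  return CuLang::kOther;
}
```

DW_AT_language is checked first because it is a fixed numeric code. The producer string is free-form text, so matching on it is only a fallback.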


#5

I think your best bet is to use the hash, as @cuviper said.

Rust symbols always have such a hash value at the end.

You could try to detect that: if the hash is present, treat the symbol as Rust; if not, treat it as C++.
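The same rule can be written as a one-line check. This sketch (ClassifySymbol and SymbolKind are hypothetical names) uses std::regex to look for 17h plus 16 lowercase hex digits right before the final E, assuming lowercase hex as in the example earlier in the thread. It is a cheaper approximation than parsing the length prefixes, so a contrived identifier that merely contains such a substring could fool it. Like any hash-based test, it would also misclassify a C++ symbol whose last component happens to be h plus 16 hex digits.

```cpp
#include <regex>
#include <string>

enum class SymbolKind { kRust, kCpp };

// Heuristic: a legacy-mangled Rust symbol ends in a 17-character
// "h<16 hex digits>" component just before the closing 'E'.
// The optional extra leading underscore covers Mach-O symbol names.
SymbolKind ClassifySymbol(const std::string& sym) {
  static const std::regex kRustLegacy(R"(^_?_ZN.*17h[0-9a-f]{16}E$)");
  return std::regex_match(sym, kRustLegacy) ? SymbolKind::kRust
                                            : SymbolKind::kCpp;
}
```

With a classifier like this, the snippet in the first post can branch on ClassifySymbol(sym) and pick one demangler, instead of relying on the Rust demangler to reject C++ symbols.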