Symbol mangling of Rust vs. C++

I want to add support for Rust symbol demangling to Bloaty McBloatface, my generic ELF/Mach-O size profiling tool.

I was really happy to see Bloaty providing some useful results on a Rust binary yesterday. Torste Aikio wrote a proof-of-concept patch to Bloaty to add native Rust symbol demangling, and he opened an issue on Bloaty’s GitHub to discuss it.

I was hoping that I could add Rust demangling to Bloaty without requiring my users to put any Rust-specific switch on their command-line. I wanted to do something like:

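std::string demangled, rust_demangled, cpp_demangled;
if (RustDemangle(sym, &rust_demangled)) {
  // Try Rust first, so Rust names get Rust-specific formatting.
  demangled = rust_demangled;
} else if (CppDemangle(sym, &cpp_demangled)) {
  // Otherwise fall back to the C++ demangler.
  demangled = cpp_demangled;
} else {
  // Neither scheme matched: show the raw symbol name.
  demangled = sym;
}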

My theory was that no symbol would successfully demangle as both Rust and C++, i.e. that the two sets of demanglable symbols would be disjoint. Unfortunately, that does not appear to be the case: Rust seems to intentionally use a scheme similar to C++ mangling, so that tools without Rust support still produce reasonable output.

I can see why that would provide a smoother experience for tools that support C++ but not Rust. Unfortunately it makes it harder to provide dedicated/first-class Rust support in a tool like Bloaty, especially for mixed C++/Rust binaries. If some symbols are Rust and others are C++, ideally both could be demangled properly on a symbol-by-symbol basis.

I haven’t looked in-depth at Rust’s mangling scheme. Is there anything that could give me a hint about whether a given symbol is actually Rust or C++?

Rust symbols end with a hash, which is a pretty good indicator. For example, rustc references rustc_driver::main as something like _ZN12rustc_driver4main17hc2f56434c600330fE. That 17h... is the mangling for a 17-byte identifier: an h followed by the 16 hex digits of the hash.
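
To make that concrete, here is a rough sketch of how a tool might test for the hash suffix. The function name LooksLikeLegacyRustSymbol is made up for illustration, and the check assumes the mangled name has exactly the shape of the example above: an _ZN prefix, then a final 17h component with 16 hex digits, then the closing E. It also tolerates the extra leading underscore that Mach-O symbol tables add. It needs C++17 for std::string_view.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Heuristic: does `sym` look like a Rust symbol of the form
//   _ZN <components...> 17h<16 hex digits> E
// This only inspects the text of the symbol; it does not demangle it.
bool LooksLikeLegacyRustSymbol(std::string_view sym) {
  constexpr std::string_view kStart = "_ZN";
  constexpr std::string_view kHashPrefix = "17h";
  constexpr std::size_t kHexDigits = 16;
  // "17h" + 16 hex digits + trailing 'E'
  constexpr std::size_t kTailLen = kHashPrefix.size() + kHexDigits + 1;

  // Mach-O prepends one more underscore to every symbol name.
  if (sym.substr(0, 4) == "__ZN") sym.remove_prefix(1);

  if (sym.size() < kStart.size() + kTailLen) return false;
  if (sym.substr(0, kStart.size()) != kStart) return false;

  std::string_view tail = sym.substr(sym.size() - kTailLen);
  if (tail.substr(0, kHashPrefix.size()) != kHashPrefix) return false;
  if (tail.back() != 'E') return false;

  // Everything between "17h" and the final 'E' must be a hex digit.
  for (char c : tail.substr(kHashPrefix.size(), kHexDigits)) {
    if (!std::isxdigit(static_cast<unsigned char>(c))) return false;
  }
  return true;
}
```

This is only a heuristic: a C++ symbol whose innermost name happens to be a 17-character identifier made of h plus 16 hex digits would also match, though that seems unlikely in practice.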

As suggested by erahm on IRC, if you have debug info mapping symbols to source files, you could also guess based on the filename extension.

If it’s DWARF, you can also look at the DW_AT_producer for the CU that contains the symbol, which will look something like clang LLVM (rustc version 1.26.0).

edit: How could I forget, there’s also DW_AT_language in the CU, which will be DW_LANG_Rust == 0x1c.
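
As a rough sketch of the DWARF-based check: assuming the tool has already read DW_AT_language and DW_AT_producer for the CU that contains the symbol, the decision could look like the following. CompileUnitInfo and IsRustCompileUnit are hypothetical names, not part of Bloaty or of any DWARF library, and matching on the substring "rustc version" is an assumption based on the producer string quoted above.

```cpp
#include <cstdint>
#include <string>

// Hypothetical holder for the two CU attributes discussed above.
// How they are read out of the DWARF data is left to the tool.
struct CompileUnitInfo {
  std::uint16_t language = 0;  // value of DW_AT_language
  std::string producer;        // value of DW_AT_producer
};

constexpr std::uint16_t kDwLangRust = 0x1c;  // DW_LANG_Rust

// Returns true if the CU appears to have been produced by rustc.
bool IsRustCompileUnit(const CompileUnitInfo& cu) {
  // The language attribute is the most direct signal.
  if (cu.language == kDwLangRust) return true;
  // Fallback (assumption): the producer string mentions rustc,
  // as in "clang LLVM (rustc version 1.26.0)".
  return cu.producer.find("rustc version") != std::string::npos;
}
```

A symbol would then be treated as Rust when its containing CU passes this check, which only helps for binaries that still carry debug info.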

I think your best bet is to use the hash, as @cuviper said.

Rust symbols always have such a hash value at the end.

You could try to detect that. If it exists, treat it as a Rust symbol. If it does not, treat it as C++.