Big +1 from me for a new mangling scheme.
I’ve found that not being able to back-map mangled symbols to a specific type instantiation is a huge problem in trying to understand backtraces, particularly in async code. I end up with dozens of frames of generic Future combinators but no concrete types to work out what’s actually going on.
A few notes:
The current scheme’s similarity to C++ mangling has been pretty awkward in practice. While feeding a Rust mangled symbol into a C++ demangler kinda sorta gives a readable result, it isn’t great; a Rust-specific demangler gives a much better result. The trouble is that if a tool runs the symbol through a C++ demangler first, the output is no longer usable as input to a Rust demangler. This means that any tool which might deal with a Rust symbol has to be changed to:
- distinguish the symbol as being Rust or C++ (or other)
- feed it through the right demangler accordingly
Right now, the distinguisher is just looking for a trailing hash, but it would be nicer to have something less ad-hoc. I think the ideal would be one of:
1. A stock C++ demangler does a perfect job on Rust symbols, so no separate Rust demangler is needed
2. A stock C++ demangler does nothing to Rust symbols, so the “demangled” symbols can also be fed through a Rust demangler
3. Symbols can be accurately distinguished as Rust symbols just from the symbol alone (ie, without relying on debug info which may not be available)
I think option 1 is along the lines of what @eddyb is suggesting. I like it for its generality, but I’m concerned that it relies too much on C++ demangler consistency. In practice at FB we’ve found that different C++ demanglers often fail to produce the same demangling for complex symbols (or simply fail); I’d be worried they wouldn’t handle Rust mangling particularly well. It also potentially constrains Rust if we start adding things to the language which aren’t well represented in C++. And conversely C++ is adding things; what if the mangling between Rust and C++ becomes ambiguous?
It sounds like @michaelwoerister is aiming for 2 & 3: the _R prefix will make a C++ demangler ignore Rust symbols, and make them clearly distinguishable. It’s not ideal when no Rust-aware tooling is available, but I think it’s better as soon as Rust awareness is added.
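To make the "distinguish, then dispatch" step concrete, here is a minimal sketch of what a tool could do with the symbol alone. It assumes the new scheme's _R prefix as proposed, and it assumes the current ad-hoc check amounts to "ends in 17h plus 16 hex digits plus E". The names SymbolKind, has_legacy_hash and classify are made up for illustration and are not taken from any existing tool.

```rust
/// Which demangler a tool should pick for a raw symbol name.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq, Eq)]
enum SymbolKind {
    /// Starts with `_R`, the prefix proposed for the new scheme.
    RustNew,
    /// `_ZN...E` symbol whose last path component looks like a Rust hash.
    RustLegacy,
    /// Any other `_Z` symbol; assumed here to be C++.
    Cpp,
    /// Not recognised as mangled at all.
    Other,
}

/// The ad-hoc check: does the symbol end in `17h<16 hex digits>E`?
fn has_legacy_hash(symbol: &str) -> bool {
    let body = match symbol.strip_suffix('E') {
        Some(body) => body,
        None => return false,
    };
    // "17h" plus 16 hex digits is 19 bytes; the ASCII check keeps the
    // byte-offset split below on a character boundary.
    if body.len() < 19 || !body.is_ascii() {
        return false;
    }
    let tail = &body[body.len() - 19..];
    tail.starts_with("17h") && tail[3..].chars().all(|c| c.is_ascii_hexdigit())
}

/// Classify a symbol using only its text, with no debug info.
fn classify(symbol: &str) -> SymbolKind {
    if symbol.starts_with("_R") {
        SymbolKind::RustNew
    } else if symbol.starts_with("_ZN") && has_legacy_hash(symbol) {
        SymbolKind::RustLegacy
    } else if symbol.starts_with("_Z") {
        SymbolKind::Cpp
    } else {
        SymbolKind::Other
    }
}
```

With the prefix, the Rust case is a single unambiguous test. The legacy branch is a heuristic: a C++ symbol could in principle end in the same pattern, which is the ad-hoc part I'd like to get away from.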
Throughout, the document uses the term “generic arguments”, which I find a bit confusing. I read it as meaning function arguments whose types are generic, but I think the actual meaning is more clearly expressed as “generic type parameters” or just “type parameters”. I’m not sure whether “generic arguments” is actually commonly accepted terminology in Rust and I’m the odd one out, but it confused me a couple of times, especially in the discussion of align_of(), which has no arguments.
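A small illustration of the distinction as I understand it: in the calls below the types between the angle brackets are what the document calls generic arguments, while the function's (value) argument list is empty. The wrapper function alignments is just a made-up name for this example.

```rust
/// Calls `align_of` with two different generic arguments and no
/// function arguments at all.
fn alignments() -> (usize, usize) {
    // `u8` and `u64` are the generic arguments; the parentheses are empty.
    // Each call is a separate instantiation, so each would need its own symbol.
    (std::mem::align_of::<u8>(), std::mem::align_of::<u64>())
}
```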
In “Closures and Closure Environments”, would the reference section go into more detail about how the closure numbering would actually work? I’m assuming it would be defined in terms of a traversal of the AST. The two cases that occurred to me are:
- Closures can be nested inside others, so presumably a depth-first preorder traversal would mean the outer closure would get N, and the inner N+1 (etc)
- If macros are present then they could arbitrarily generate or reorder closures with respect to the literal source
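The two cases can be sketched as follows. This only shows the shapes of code I have in mind; it does not assume any particular numbering, and make_adder and closure_cases are hypothetical names invented for the example.

```rust
// Hypothetical macro: expands to a closure that does not appear as a
// closure literal at the call site.
macro_rules! make_adder {
    ($n:expr) => {
        move |x: i32| x + $n
    };
}

fn closure_cases() -> (i32, i32) {
    // Case 1: nesting. `outer` encloses `inner`, so a preorder traversal
    // would reach `outer` before `inner`.
    let outer = |a: i32| {
        let inner = |b: i32| b * 2;
        inner(a) + 1
    };

    // Case 2: this closure only exists after macro expansion, so its
    // position relative to the closures above depends on whether the
    // numbering is defined before or after expansion.
    let add_ten = make_adder!(10);

    (outer(3), add_ten(5))
}
```

All three closures live inside the same function, so whatever rule is chosen has to give each of them a distinct, stable index.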
In Methods (both inherent and trait) it’s not clear to me what the value is of having the instantiating crate as part of the path. There’s already been discussion on this point, but I’m wondering specifically if this has any interactions with LTO (ie, could it inhibit LTO, or could a given symbol be considered instantiated by multiple crates?).