If it’s simple enough (should be), mrustc might be good enough to transpile it.
I think the reference implementation repo should also contain a native C version for demangling.
This duplicates the implementation increasing the maintenance cost. What value does this add?
AFAICT, the main thing this would allow is for C tools, like gdb, to be compiled with Rust support for platforms that Rust cannot target. Since Rust cannot compile to these platforms, there are no Rust programs to debug there, so gdb with Rust support would be useless, and gdb would just be better off completely disabling Rust support in this case.
Many projects, like GDB, will not want to have a Rust dependency – which is understandable, as supporting multiple languages in a build system is a significant cost. For these kinds of projects I’d like them to have something that they can simply copy.
GDB can also work non-natively, with core files and with remote targets. However, I expect that most of the time your GDB host is going to be a common platform that Rust does support, and the oddball target is the one that’s less likely to have Rust support.
The perf tool also has Rust symbol demangling support. We’ll want to update that, and something tells me it will be ~infinitely easier to write some C rather than trying to inject Rust into the Linux kernel build system.
Big +1 from me for a new mangling scheme.
I’ve found that not being able to back-map mangled symbols to a specific type instantiation is a huge problem in trying to understand backtraces, particularly in async code. I end up with dozens of frames of generic Future combinators but no concrete types to work out what’s actually going on.
A few notes:
The current scheme’s similarity to C++ mangling has been pretty awkward in practice. While feeding a Rust mangled symbol into a C++ demangler kinda sorta gives a readable result, it isn’t great – a Rust-specific demangler gives a much better one. The trouble is that if a tool runs the symbol through a C++ demangler first, the result becomes useless for Rust demangling. This means that any tool which might deal with a Rust symbol has to be changed to:
- distinguish the symbol as being Rust or C++ (or other)
- feed it through the right demangler accordingly
Right now, the distinguisher is just looking for a trailing hash, but it would be nicer to have something less ad-hoc. I think the ideal would be one of:
- A stock C++ demangler does a perfect job on Rust symbols, so no separate Rust demangler is needed
- A stock C++ demangler does nothing to Rust symbols, so the “demangled” symbols can also be fed through a Rust demangler
- Symbols can be accurately distinguished as Rust symbols just from the symbol alone (ie, without relying on debug info which may not be available)
I think 1. is along the lines of what @eddyb is suggesting. I like it for its generality, but I’m concerned that it relies too much on C++ demangler consistency. In practice at FB we’ve found that different C++ demanglers often fail to produce the same demangling for complex symbols (or simply fail outright); I’d be worried they wouldn’t handle Rust mangling particularly well. It also potentially constrains Rust if we start adding things to the language which aren’t well represented in C++. And conversely, C++ is adding things too – what if the manglings of Rust and C++ become ambiguous?
It sounds like @michaelwoerister is aiming for 2 & 3 – the `_R` prefix will make a C++ demangler ignore the Rust symbol, and makes Rust symbols clearly distinguishable. It’s not ideal for the case where no Rust-aware tooling exists, but I think it’s better as soon as Rust awareness is added.
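To make the prefix-based distinction concrete, here is a minimal sketch of the dispatch a tool could do under option 3, assuming the `_R` prefix from the RFC. The symbol strings and the `classify` helper are illustrative only, and treating every `_Z` symbol as Itanium C++ is a simplification (legacy Rust symbols also use `_ZN…` plus a trailing hash):

```rust
// Route a symbol to the right demangler based on the symbol alone.
// Assumes the RFC's `_R` prefix for the new Rust scheme; treating all
// other `_Z`/`__Z` symbols as Itanium C++ is a simplification, since
// legacy Rust symbols are also `_ZN...` with a trailing hash.
#[derive(Debug, PartialEq)]
enum Scheme {
    RustV0,     // new Rust mangling, `_R...`
    ItaniumCxx, // C++ (or legacy Rust) `_Z...` / `__Z...`
    Unknown,
}

fn classify(symbol: &str) -> Scheme {
    if symbol.starts_with("_R") {
        Scheme::RustV0
    } else if symbol.starts_with("_Z") || symbol.starts_with("__Z") {
        Scheme::ItaniumCxx
    } else {
        Scheme::Unknown
    }
}

fn main() {
    // Hypothetical symbols, for illustration only.
    assert_eq!(classify("_RNvNtC7mycrate3foo3bar"), Scheme::RustV0);
    assert_eq!(classify("_ZN3foo3barEv"), Scheme::ItaniumCxx);
    assert_eq!(classify("main"), Scheme::Unknown);
    println!("ok");
}
```

The point of option 3 is exactly that this decision needs nothing beyond the first couple of characters – no debug info required.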
Throughout this document, it uses the term “generic arguments”, which I find a bit confusing. I read it as meaning function arguments whose types are generic, but I think the actual meaning is more clearly expressed as “generic type parameters” or just “type parameters”. I’m not sure if the former is actually commonly accepted terminology in Rust and I’m the odd one out, but it confused me a couple of times, especially in the discussion of `align_of()`, which has no arguments.
In “Closures and Closure Environments”, would the reference section go into more detail about how the closure numbering would actually work? I’m assuming it would be defined in terms of a traversal of the AST. The two cases that occurred to me are:
- Closures can be nested inside others, so presumably a depth-first preorder traversal would mean the outer closure would get N, and the inner N+1 (etc)
- If macros are present then they could arbitrarily generate or reorder closures with respect to the literal source
In Methods (both inherent and trait), it’s not clear to me what the value is of having the instantiating crate as part of the path. There’s already been discussion on this point, but I’m wondering specifically whether this has any interactions with LTO (ie, could it inhibit LTO, or could a given symbol be considered instantiated by multiple crates?).
I think this was an attempt to include both type parameters and const generics parameters (and maybe even type constructor parameters if we get HKT someday) under a single term.
imo “generic parameters” would be marginally better than “generic arguments” for this purpose, but we don’t really have a nice, unambiguous term for “the things in angle brackets that we have to monomorphize over rather than pass values for at runtime”.
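For a concrete instance of the distinction being discussed, consider plain `align_of` (nothing here is from the RFC itself): the type in the angle brackets is the generic argument that a mangled symbol has to encode, while the call takes no runtime arguments at all.

```rust
use std::mem;

fn main() {
    // `u32` is supplied at the type level via the turbofish; it is the
    // "generic argument" that monomorphization (and thus the mangled
    // symbol) must record. The call has zero runtime (value) arguments.
    let align = mem::align_of::<u32>();
    assert_eq!(align, 4); // holds on all mainstream targets
    println!("align_of::<u32>() = {}", align);
}
```

With const generics the thing in the angle brackets can also be a constant rather than a type, which is why neither “type parameter” nor “function argument” quite covers it.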
Thanks a lot for your feedback, @jsgf!
- Your description of the approach is accurate: the RFC suggests a mangling that is not compatible with Itanium C++ mangling, and a Rust symbol can be identified as being Rust and not anything else by looking at its first two characters.
- Regarding generics: I tried to stick to the distinction between parameters and arguments I know from my introductory CS courses: There are formal parameters (as in function or type parameters) and what you provide as the actual value for a parameter is then called the argument. I might not have been entirely consistent in my application of the terms. As @Ixrec rightly notes, I didn’t use the term type parameters because with const generics the arguments could also be constants. I’ll try to make this clearer in the actual RFC.
- Regarding closures and other things where a disambiguation index will be needed: yes, the detailed description would try to describe this precisely in terms of the (macro-expanded) AST. The example with nested closures is interesting. The inner closure would probably be addressed as `foo::c$0::c$0`. And macros can indeed lead to indices that are hard to infer from looking at the unexpanded source code.
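The nesting structure can already be observed with today’s compiler via `std::any::type_name`, which uses `{{closure}}` path components instead of the proposed `c$N` form. The output format of `type_name` is explicitly unspecified, so this is only an illustration of how the inner closure’s path nests inside the outer one’s:

```rust
use std::any::type_name;

// Helper to name the type of a value; closure types are unnameable, so
// we go through a generic parameter.
fn name_of<T>(_: &T) -> &'static str {
    type_name::<T>()
}

fn foo() -> &'static str {
    let outer = || {
        let inner = || ();
        name_of(&inner)
    };
    outer()
}

fn main() {
    // Typically prints something like `crate::foo::{{closure}}::{{closure}}`:
    // the inner closure is a child of the outer closure's path, which is
    // exactly the structure the disambiguation indices would encode.
    println!("{}", foo());
    assert!(foo().contains("{{closure}}"));
}
```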
- Per-crate instantiations of generic functions interact with LTO in that LTO’ed code would still contain multiple copies of the function. The mangling scheme does not define whether things will be instantiated per crate or not, though. It just supports it because that’s the compiler’s current strategy. We might want to experiment with ODR linkage again, since the last time we did so was at the beginning of 2016, iirc, and LLVM might do better at optimizing under these circumstances now than it did back then. However, this is a discussion that’s not directly related to how symbols are mangled.
At some point, deep in template land, Itanium mangled names become so long that the existing compression method is utterly ineffective, and human readability is already lost.
Therefore, I 40%-jokingly propose that version code 0 (symbols starting with `_R0`) mean the following:
- A Rust mangled symbol, starting at the version code (dropping only the `_R`), compressed with gzip and encoded with base32.
The compiler will perform trial compression of all symbols greater than (TODO: 96?) characters and MUST use the compressed version if it is (TODO) smaller after b32 encoding.
It is a decoding error to place a v0 compressed symbol inside a v0 compressed symbol. Display the raw symbol. (This avoids quines causing infinite loops.)
“You want compression? Here’s your compression.”
Yes, it’s pretty unlikely IMO that gdb or binutils would accept a Rust dependency. At the same time, I think it’s not important to worry about this. Someone, probably me, will write a demangler for use here.
One other thing I’d like to point out is that c++filt already has hacks in place for Rust – it’s not the case that the current scheme is sufficiently C++-like to just work. In fact these hacks don’t go far enough (there’s also a patch thread somewhere that goes deeper into this).
If one were to take that seriously, then I’d propose using zstd with a standard dictionary computed over some corpus of existing Rust symbols. That would get much better compression than gzip/zlib. And it would be better to use some larger base; at least use the full available character set.
> This avoids quines causing infinite loops
Already a huge risk with compression - you can set up exponential explosions with small inputs.
I think it would be wise to make the standard Rust mangling scheme(s), whatever they end up being, intentionally incompatible with the standard C++ mangling scheme(s) used on the same target. This is because, if you have C++ and Rust components linked into the same program, the linker should not resolve a call to `extern void foo(int, int)` as targeting `pub fn foo(i32, i32) -> ()` unless the author of the latter specifically declared it `pub extern "C++"`.
I don’t have time to give any kind of detailed response, but I just wanted to say, to paraphrase Spock: I have been, and always shall be, a friend (to Rust defining and owning its own principled name mangling scheme).
Good luck! And I look forward eagerly to any resolutions on this front.