The long and short of it is roughly this: name mangling was introduced by C++ so that it could keep using the C linker tools while also having generic functions. The problem was that the C function "ABI" is flat and essentially global (with various caveats): a function has a one-to-one mapping with its name. For example, `printf` (again with some caveats) is the name that both the linker and the dynamic linker search for when resolving calls to that function from callers.
But a generic function has essentially one source-code name, yet may need multiple C function names: one for each instantiation, as though you had hand-specialized a separate version for `String`, `usize`, or whatever types you pass as the generic arguments. Hence the solution was to emit a "mangled" name that encoded the parameter types in some manner, and to use that as the C linkage name, which the linker tools could work with. The compiler knew to emit the mangled name for a particular call, and the linker is dumb and just searches for that name. I may take some flak, but it was basically a hack: they wanted to use the C linker tools rather than reinvent the wheel, so they did it and moved on.
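To make the "one source name, many linker names" idea concrete, here is a minimal sketch using a hypothetical toy scheme (it is NOT the Itanium C++ ABI and NOT rustc's scheme, just something in the same spirit): each component is written as `<length><bytes>`, and the concrete type arguments are appended so that every instantiation gets its own flat, unique string that a dumb linker can match on.

```rust
/// Hypothetical toy mangling scheme, for illustration only.
/// Layout: "_Z" <len><func name> "I" (<len><type arg>)* "E"
fn mangle(func: &str, type_args: &[&str]) -> String {
    let mut out = String::from("_Z");
    // The source-level name is shared by every instantiation...
    out.push_str(&format!("{}{}", func.len(), func));
    // ...so the concrete type arguments are appended to make the symbol unique.
    out.push('I');
    for ty in type_args {
        out.push_str(&format!("{}{}", ty.len(), ty));
    }
    out.push('E');
    out
}

fn main() {
    // One generic `identity<T>` in the source, two distinct linker symbols.
    println!("{}", mangle("identity", &["String"]));
    println!("{}", mangle("identity", &["usize"]));
}
```

The length prefixes are what keep the result unambiguous: the linker never has to understand the encoding, it only compares strings.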
They famously never specified in the standard how to mangle names. Again, I may take some flak, but I think this was a pretty serious blunder, and I can't see any particularly good reason for it. (Now we can't really link C++ programs compiled with different compilers, due in good part to the lack of standardized name mangling, among other things.)
So because Rust has generics, if the ABI we're talking about covers the entirety of Rust, then we can't skip name mangling, insofar as we have to answer some very interesting and important questions like:
1. how to resolve a "path" to a symbol, given generic arguments
2. where the genericity exists (i.e., whose responsibility it is)
3. how to represent genericity (this overlaps somewhat with 1)
@cuviper has made some good points concerning 2 elsewhere. So when I talk about a "standardized name mangling protocol", I mean engineering a solution to 1-3, but primarily 3, i.e., the nitty-gritty of transforming a symbolic name into a unique, reified reference (assuming we want to keep using dumb linker tools, which is probably never going to change).
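Whatever encoding is chosen for that transformation, it has to be injective (two different paths must never produce the same symbol), and ideally reversible so tools can demangle it. A minimal sketch under assumptions: a hypothetical length-prefixed encoding of path segments, with generic arguments left out for brevity. A naive scheme that joins segments with `_` collides because `_` can also appear inside an identifier; length prefixes can always be parsed back.

```rust
/// Naive scheme: join path segments with '_'. Not injective.
fn naive(path: &[&str]) -> String {
    path.join("_")
}

/// Hypothetical length-prefixed scheme: each segment becomes `<len><bytes>`.
fn encode(path: &[&str]) -> String {
    path.iter().map(|s| format!("{}{}", s.len(), s)).collect()
}

/// Inverse of `encode`; returns None on malformed input.
fn decode(mut s: &str) -> Option<Vec<String>> {
    let mut segments = Vec::new();
    while !s.is_empty() {
        // Read the decimal length prefix.
        let digits = s.bytes().take_while(|b| b.is_ascii_digit()).count();
        if digits == 0 {
            return None;
        }
        let len: usize = s[..digits].parse().ok()?;
        let rest = &s[digits..];
        // `get` fails cleanly if the length overruns or splits a char.
        let segment = rest.get(..len)?;
        segments.push(segment.to_string());
        s = &rest[len..];
    }
    Some(segments)
}

fn main() {
    // Two different paths, one naive symbol: a collision.
    println!("{} vs {}", naive(&["a_b", "c"]), naive(&["a", "b_c"]));
    // Length-prefixed symbols stay distinct and round-trip.
    println!("{} vs {}", encode(&["a_b", "c"]), encode(&["a", "b_c"]));
    println!("{:?}", decode(&encode(&["std", "vec", "Vec"])));
}
```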
And while we're on the topic, I'll just say I think we should use a variant of Gödel numbering, e.g. something like: http://www.cse.unt.edu/~tarau/research/2009/fgoedel.pdf
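For readers who haven't met Gödel numbering: the textbook version maps a sequence of positive symbol codes `c1, c2, ..., cn` to the single integer `2^c1 * 3^c2 * 5^c3 * ...`, and unique prime factorization guarantees the sequence can be recovered. The sketch below is only that classical prime-power scheme, not the variant developed in the linked paper; the function names are hypothetical. Codes must be at least 1, otherwise a zero exponent would vanish from the product, and checked arithmetic is used because the numbers grow very quickly.

```rust
/// Trial-division primality test (fine for the tiny primes used here).
fn is_prime(k: u128) -> bool {
    k >= 2 && (2u128..).take_while(|d| d * d <= k).all(|d| k % d != 0)
}

/// The first `n` primes: 2, 3, 5, 7, ...
fn first_primes(n: usize) -> Vec<u128> {
    (2u128..).filter(|&k| is_prime(k)).take(n).collect()
}

/// Classical Gödel numbering: product of p_i^c_i. Returns None if a code is
/// zero (it would be invisible in the product) or if the result overflows.
fn godel_encode(codes: &[u32]) -> Option<u128> {
    let mut n: u128 = 1;
    for (p, &c) in first_primes(codes.len()).into_iter().zip(codes) {
        if c == 0 {
            return None;
        }
        n = n.checked_mul(p.checked_pow(c)?)?;
    }
    Some(n)
}

/// Recover the exponents by dividing out 2, 3, 5, ... in order.
fn godel_decode(mut n: u128) -> Vec<u32> {
    let mut codes = Vec::new();
    let mut p: u128 = 2;
    while n > 1 {
        let mut e = 0u32;
        while n % p == 0 {
            n /= p;
            e += 1;
        }
        codes.push(e);
        // Advance to the next prime.
        p += 1;
        while !is_prime(p) {
            p += 1;
        }
    }
    codes
}

fn main() {
    let n = godel_encode(&[1, 2, 1]).unwrap(); // 2^1 * 3^2 * 5^1
    println!("{} -> {:?}", n, godel_decode(n));
}
```

The rapid growth of these numbers is one practical reason to prefer a more compact variant for anything like real symbol names.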
AFAIK C doesn't have a symbol name length limit per se, or if it does, it's probably platform dependent. I don't think any modern static or dynamic linker on Earth will have a problem with an extremely long symbol name (which definitely occurs with C++). Of the three major binary formats, ELF, Mach-O, and PE, none has any technical limitation on the actual character count.
I'd have to check, but the `a.out` format might, which is basically the original Unix binary format.
If you meant the kind of characters, e.g., UTF-8: Rust could generate UTF-8 symbols for functions, and the linker and dynamic linker handled them just fine, until I sort of broke it in a roundabout way: "export_name with unusual utf8 breaks new version script based linker" (rust-lang/rust issue #38238).
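One general way to avoid depending on every tool in the chain handling arbitrary UTF-8 is to escape non-ASCII characters into an ASCII-only form before emitting the symbol. The sketch below is a hypothetical escaping pass (the `$u<hex>$` form is an assumption chosen for illustration and is not claimed to match rustc's actual legacy or v0 schemes): ASCII alphanumerics and `_` pass through, and every other character is replaced by its code point in hex.

```rust
/// Hypothetical escaping pass: keep ASCII alphanumerics and '_', replace
/// every other char (including '$' itself) with `$u<hex code point>$`,
/// so the emitted symbol is always plain ASCII.
fn escape_symbol(name: &str) -> String {
    let mut out = String::with_capacity(name.len());
    for ch in name.chars() {
        if ch.is_ascii_alphanumeric() || ch == '_' {
            out.push(ch);
        } else {
            out.push_str(&format!("$u{:x}$", ch as u32));
        }
    }
    out
}

fn main() {
    println!("{}", escape_symbol("café"));
    println!("{}", escape_symbol("日本"));
}
```

Because `$` only ever appears in the output as part of an escape, the mapping stays unambiguous; the trade-off is longer, less readable symbols.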