I'd like to make a tool that generates the mangled name of a Rust symbol so that it can be linked against from another language (probably C or C++). I don't want to run rustc, because it can take a very long time, which shouldn't be necessary just to mangle symbols. The problem is that Rust's symbol mangling scheme can produce multiple valid outputs for the same symbol. Looking at the code (rust/compiler/rustc_symbol_mangling/src on the master branch of rust-lang/rust on GitHub), it seems like it might be very difficult to mangle a symbol without running all of rustc.
So my question is, is it possible to run just the symbol mangling?
I'm OK with having to clone rustc (or some part of it) and make minor changes (for example, to visibility) to allow this to be done. I'm also fine with the solution only being guaranteed to work with a specific version of rustc; that's to be expected.
As for the specific symbols to be mangled, ideally it would work with everything. At a minimum it needs to generate the mangled names of freestanding functions inside crates and modules. Generics are a lower priority, since getting them to work would require a compile anyway (to make sure the generic is instantiated for the type).
I know #[no_mangle] exists, but the goal is to not require modifications to existing code, and #[no_mangle] has other problems (it only works with freestanding functions and can cause name conflicts).
I don’t think you can really do this without the compiler, since the mangled name of a Rust function depends on its input and output types, which means you need to understand generics, type aliases, and imports. Maybe it could be a rust-analyzer query or similar?
At the very least you need to know the invocation that cargo will use to compile the crate, in particular the crate name and all -Cmetadata arguments. You also need to know the exact rustc version string, which crate each type used as a generic argument originated from (this needs name resolution and, by extension, macro expansion) and, for the legacy symbol mangling version, the exact function signature. On top of that, rustc can and will change how various things are hashed and encoded, as well as which information is needed to mangle symbol names.
I understand that I’ll need to run rustc code to make this work, I just don’t want to have to do a full compile. If one of the --emit options that doesn’t require running LLVM (maybe llvm-ir?) could be made to output mangled names instead of the unmangled ones, that would work.
I’ll also look into getting rustc to compile some code that just depends on outside symbols and see whether I can get it to output mangled names that way, since compiling a function that just uses a bunch of declared but not defined symbols should be relatively fast, even if a full compile is necessary.
Yeah, I’m aware that it’s a lot. Fortunately, a lot of that can be solved by not actually parsing Rust code (for example, by requiring the person using the tool to spell out which crate everything is in). Ideally this could be generated from Rust code directly, but that’s not the initial goal. I also don’t want to support the legacy mangling scheme, so that won’t be an issue.
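To make the "spell everything out" idea concrete, here is a rough sketch of what mangling a non-generic freestanding function could look like under the v0 scheme, based on my reading of the v0 symbol format (RFC 2603). The function names (`base62_number`, `ident`, `mangle_fn`) are made up for illustration. The crate disambiguator is taken as an input: the sketch does not compute it, since rustc derives it from things like the crate name and -Cmetadata arguments mentioned above. It also assumes ASCII identifiers and ignores generics, impls, and closures.

```rust
/// Encode `x` as a v0 `<base-62-number>`: 0 is "_", and n + 1 is the
/// base-62 digits of n (0-9, a-z, A-Z) followed by "_".
fn base62_number(x: u64) -> String {
    const DIGITS: &[u8; 62] =
        b"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    let mut out = String::new();
    if let Some(mut n) = x.checked_sub(1) {
        let mut buf = Vec::new();
        loop {
            buf.push(DIGITS[(n % 62) as usize]);
            n /= 62;
            if n == 0 {
                break;
            }
        }
        buf.reverse();
        out.push_str(&String::from_utf8(buf).unwrap());
    }
    out.push('_');
    out
}

/// Encode an ASCII identifier as `<decimal length>["_"]<bytes>`.
/// The "_" separator is only used when the name starts with a digit or "_".
fn ident(name: &str) -> String {
    // Assumption: non-ASCII identifiers (which need Punycode) are out of scope.
    assert!(name.is_ascii(), "this sketch only handles ASCII identifiers");
    let sep = if name.starts_with(|c: char| c == '_' || c.is_ascii_digit()) {
        "_"
    } else {
        ""
    };
    format!("{}{}{}", name.len(), sep, name)
}

/// Mangle `crate_name::modules...::func` for a non-generic free function.
/// `crate_disambiguator` is a hypothetical input supplied by the caller.
fn mangle_fn(crate_name: &str, crate_disambiguator: u64, modules: &[&str], func: &str) -> String {
    // Crate root: "C" ["s" <base-62-number>] <identifier>
    let mut path = String::from("C");
    if let Some(d) = crate_disambiguator.checked_sub(1) {
        path.push('s');
        path.push_str(&base62_number(d));
    }
    path.push_str(&ident(crate_name));
    // Each module nests the path so far in the type namespace: "N" "t" <path> <identifier>
    for m in modules {
        path = format!("Nt{}{}", path, ident(m));
    }
    // The function itself lives in the value namespace: "N" "v" <path> <identifier>
    format!("_RNv{}{}", path, ident(func))
}
```

With an arbitrary, made-up disambiguator value, `mangle_fn("mycrate", 246208, &["foo"], "bar")` produces `_RNvNtCs1234_7mycrate3foo3bar`. The hard part remains obtaining the real disambiguator for a given crate and rustc version.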
It is possible to call into rustc code by using #![feature(rustc_private)]. This is how tools like clippy and rustdoc work. The main downside is that you don't get any stability guarantees, so you might end up having to refactor your code to match an internal rustc refactor.
I understand how to call into rustc code and that it doesn't give stability guarantees; the question is what code I actually need to call into. However, this might not be needed, since compiling with --emit=llvm-ir seems to mangle symbols, so that output might be usable.
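As a rough sketch of how that output could be consumed: assuming the IR was produced with something like `rustc --emit=llvm-ir --crate-type=lib -C symbol-mangling-version=v0 lib.rs`, the mangled names appear after `@` on the `define` lines of the resulting .ll file. The helper below (`defined_symbols` is a made-up name) pulls them out of the IR text. It relies on the textual layout of `define` lines and is not a real LLVM IR parser.

```rust
/// Collect the symbol names from `define` lines in LLVM IR text.
/// Assumption: the first '@' on a `define` line starts the function name,
/// which is either quoted (`@"name"`) or runs up to the opening '('.
fn defined_symbols(llvm_ir: &str) -> Vec<String> {
    llvm_ir
        .lines()
        .filter(|line| line.starts_with("define "))
        .filter_map(|line| {
            // Skip past the '@' that introduces the global name.
            let rest = &line[line.find('@')? + 1..];
            let name = if let Some(quoted) = rest.strip_prefix('"') {
                // Quoted form: take everything up to the closing quote.
                &quoted[..quoted.find('"')?]
            } else {
                // Unquoted form: take everything up to the parameter list.
                &rest[..rest.find('(')?]
            };
            Some(name.to_string())
        })
        .collect()
}
```

Only functions that rustc actually emits into the IR show up this way, so the tool would still need a way to map each mangled name back to the Rust path the user asked for.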