pre-RFC: a new symbol mangling scheme

I’d argue that the spec should say that the mapping between name-mangled identifiers and the identifiers used by linkers is implementation-defined, since we shouldn’t really say anything about how linking is implemented. In fact, I’d rather we define the name mangling scheme outside of the specification, since it’s not really important to the language, but only to specific implementations. We can define a specific name mangling scheme used by rustc, which other implementations can be compatible with.

I am not convinced that repeatability is worth fighting for, or even desirable. Consider this: a C++ compiler needs to generate symbol names after seeing just the header file. If it doesn't generate the same symbol name as the compilation that created the corresponding object file, linking will fail.
This is not a problem for Rust, which can save symbol names in crate metadata. As long as two implementations are compatible on crate metadata format level, they can generate symbol names however they like.
Am I missing something?

Would this be any help towards allowing generic statics?

static<T> EVENT_BUS_TO_HANDLERS: EventBTH<T> = EventBTH { v: Vec::EMPTY };

(Here Vec::EMPTY is a generic const as opposed to a generic static, so it’s irrelevant to the question. In practice this would probably use a lazy_static.)

Big +1.

I'll give you a +2 if the mangling library of rustc would be implemented in the nursery and would come with a cross-validated demangling library that extracts all information from a symbol and comes with tools to build the dictionaries for compression/decompression.

What I learned writing cargo asm/cargo llvm-ir is that everybody who needs demangling either writes a Rust demangler by hand, or tapes their own solution on top of rustc-demangle because that library isn't enough for their purposes.

It would be really nice to have a solid library for this.


@Centril

To what extent is the goal here to stabilize a symbol mangling scheme and thus make it a part of “the spec”?

I would say "to no extent". I suggest wording this RFC as a better name mangling scheme for rustc only (whether this evolves into a de facto standard or not, time will tell). Putting name mangling in the Rust spec would require a lot of work (*) and deliver very little value without a full ABI.

(*) We would need a mangling scheme that would work on all past, present, and future platforms that Rust targets. If we put a name mangling scheme in the spec, and a future platform does not support it, then we won't be able to target this platform with Rust. Name mangling is something that screams "implementation defined", as in, the spec might mention name mangling, and might mention guarantees about name mangling (e.g. the type of mapping), but nothing specific about how it works on each platform.


@gbutler

I think it would be better to not derail this discussion with a discussion about a stable Rust ABI. If you are interested in this, it might be worth it to discuss that in parallel in a different issue. Name mangling would be just a small piece of that discussion.

Yes, I thought about that and I even planned to do it. It would be easy to add a version number just after the _R prefix, like in _R2N...E. _R without a number would mean the first version (as is done for numbering of substitutions). I'd want to make sure that the grammar allows for this.
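To make the versioning idea concrete, here is a minimal sketch of how a demangler might split off the version. It assumes exactly what the post describes and nothing more: `_R` followed by optional decimal digits, with a missing number meaning version 1. The function name `split_version` and the sample symbols are made up for illustration. It only works if the grammar guarantees that whatever follows the prefix never starts with a digit, which is the property that would need to be checked.

```rust
/// Hypothetical helper: splits a mangled symbol into (version, remainder).
/// Assumption: `_R` with no digits means version 1, `_R<digits>` names the
/// version explicitly, and the remainder never starts with a digit.
fn split_version(symbol: &str) -> Option<(u32, &str)> {
    // Anything without the `_R` prefix is not a symbol of this scheme.
    let rest = symbol.strip_prefix("_R")?;

    // Count the leading ASCII digits that form the version number.
    let digits_len = rest.bytes().take_while(|b| b.is_ascii_digit()).count();
    if digits_len == 0 {
        return Some((1, rest));
    }

    // A version number too large for u32 is rejected rather than wrapped.
    let version = rest[..digits_len].parse().ok()?;
    Some((version, &rest[digits_len..]))
}
```

With this, `split_version("_R2NxE")` yields version 2 and the remainder `NxE`, while a symbol with a different prefix such as `_ZN3fooE` is rejected.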

Definitely agree. Apologies for bringing up something so off-topic.

Yes, "one year" was just some number. What I meant was: maybe we decide in the future that this scheme is flexible enough for future needs and then we standardize. But this RFC does not try to do so. The mangling is still an implementation detail of the current compiler (but expected to be stable enough for tools to start supporting it).

And yes, as stated in the reply to @nicoburns, versioning can and should be supported in my opinion.

The compiler currently instantiates monomorphizations per crate (at least with opt-level >= 2). We've experimented with weak_odr linkage in the past and it consistently led to worse runtime performance (LLVM optimizes less well) and it had issues on some platforms (MingW for example did not support it very well). What I'm saying is: the mangling scheme should be flexible enough to support the compiler's current behavior.

Let's keep it respectful please.

Many platforms only support a very limited character set. If we go to the trouble of defining a mangling scheme, I'd prefer it not to have platform-specific rules, especially since there's a good alternative that doesn't make the common ASCII-only case more complicated.
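As a rough illustration of what "a very limited character set" means for a mangler, the check below accepts only ASCII letters, digits and underscores. That exact set is an assumption picked for the sketch, not something the post specifies, and `is_portable_symbol` is a made-up name.

```rust
/// Hypothetical check: true if `symbol` uses only characters from the
/// assumed lowest-common-denominator set [A-Za-z0-9_].
fn is_portable_symbol(symbol: &str) -> bool {
    !symbol.is_empty()
        && symbol
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b == b'_')
}
```

Under this assumption, a name containing `$` or any non-ASCII character fails the check, so a platform-independent scheme would have to encode such characters instead of emitting them verbatim.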

LLVM does not do any complicated mangling, as far as I know, and I wouldn't want to rely on it doing so either.

As far as I know, linkers are the only reason for having name mangling in the first place.

That's what this RFC is about. Just like Itanium mangling is not part of the C++ standard, this mangling scheme would not be part of the Rust language definition. I'll make sure to include this explicitly in the summary, should we decide to go forward with it.

As the text above says: It's not a strict requirement (we've gotten by without it so far after all) but I still think it's a really nice property for an encoding. It also kind of falls out of "decodability" anyway -- and decodability is pretty much a hard requirement for a new scheme because otherwise we could just stick to what we have now.

Yes, that was my plan. Rust codebases should be able to just pull the library in, and it should be written in a way that is easy to port to other languages. This also includes a set of "test vectors" that make it easy to verify that an implementation works as expected.

One thing I like about the current scheme is that any C++-aware tool can successfully demangle a Rust symbol about 80% of the time, or at least produce something I can recognise if I squint at it.

How compatible is this new scheme with unmodified C++-expecting tools?

Hm, I found a case that the scheme described above doesn’t cover. There’s no way to disambiguate the two bar functions in the following snippet:

fn foo() {
  {
    fn bar() {}
  }
  {
    fn bar() {}
  }
}

This would need to be handled in some way.

See pre-RFC: a new symbol mangling scheme - #12 by michaelwoerister above. It's close but not entirely compatible. If someone can come up with an elegant way of making it compatible, I'd be interested in hearing about it.

For any definitions inside expressions you’ll need the same mangling trick as what closures do.

To reduce the length in the regular case, we could leave the first item of a given name unannotated. Otherwise everything defined inside an expression would have a trailing $0.
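A minimal sketch of that idea, applied to the two `bar` functions in `foo` from the earlier snippet. The `$` separator follows the wording above, while the choice to number later duplicates starting from 1 is an assumption made for this sketch. The name `disambiguate` is hypothetical, and this is not how rustc does it internally.

```rust
use std::collections::HashMap;

/// Hypothetical helper: given the names of the items defined inside one
/// parent, in definition order, returns a unique name for each of them.
/// The first item of a given name stays unannotated; later ones get `$<index>`.
fn disambiguate(names: &[&str]) -> Vec<String> {
    // How many items of each name have been seen so far.
    let mut seen: HashMap<&str, usize> = HashMap::new();

    names
        .iter()
        .map(|&name| {
            let count = seen.entry(name).or_insert(0);
            let unique = if *count == 0 {
                name.to_string()
            } else {
                format!("{}${}", name, *count)
            };
            *count += 1;
            unique
        })
        .collect()
}
```

For the children of `foo`, `disambiguate(&["bar", "bar"])` gives `bar` and `bar$1`, so only the rare duplicate pays for the extra characters.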

Yes, I think it will be something along those lines (it’s what the compiler does internally). Itanium mangling does something similar too: http://refspecs.linuxbase.org/cxxabi-1.83.html#mangling-scope

Ideally we’d get by without $ characters. I’d like to get rid of them.

But then, the current scheme is afaik different on Windows, and the new scheme would be less compatible with e.g. windbg (although I don't know if the current scheme is actually compatible with Windows debuggers).

As far as I know, the current scheme is the same for all platforms. I don't think we do anything Windows-specific when targeting MingW or MSVC. If windbg works at the moment, it is probably because it takes function names from debuginfo instead of from symbol names. Debuginfo would not be affected by changing the mangling scheme.

EDIT: Here is the current symbol mangling implementation: rust/src/librustc_codegen_utils/symbol_names.rs at 1c5e9c68ea6c76fe400528de17ebe03e338bac68 (rust-lang/rust on GitHub). It does indeed not seem to include anything platform-specific.

A related case is hygiene: right now we don't have hygiene on items, but we expect to add it. I think you might want to consider using the same approach used by the "def-path". Basically, you "erase" the full item name into some string. That string is normally unique, but if it is not, you disambiguate it with an index within the parent.

If this is not about ABI, I think it would be a better deal to just follow Itanium C++ mangling. We’d at least benefit from instant support by existing tools.
Coming up with an elaborate encoding scheme that is “inspired by” but not compatible with C++ demanglers would seem counter-productive, as it’ll make things worse than the current state in the short-to-medium term (that is, until Rust displaces C++ and everyone adds support for our encoding (/s)).

Itanium doesn't support some of the things we need: Unicode identifiers, different namespaces, and Rust's basic types, for example.