Reserved Names for `#[no_mangle]` and `#[link_name]`

Currently in Rust, anyone can define any symbol name in user code by using #[no_mangle]. While this is desirable in general, it can cause problems. Specifically, it allows users to conflict with symbols defined or used by the compiler or the toolchain. It also permits users to define mangled symbols. For example, a user could inject a definition for a generic function by defining _RINvNtC3std3mem8align_ofdE or (on a compiler using Itanium mangling) _ZNSt3mem8align_ofIdE (and, on ELF platforms at least, this would override the weak definition provided by the compiler). I do not believe this specific issue is documented anywhere. Users can also define names reserved by the C or C++ standard, which the respective runtime libraries may depend on not existing, or on existing in a particular manner.

Would it be reasonable to restrict which names a program may define (merely declaring them would still be fine)?

Specifically, the behaviour is undefined if a program, through the #[no_mangle] or #[link_name] attribute, defines a function or static object with a symbol name that starts with an underscore followed by a capital letter, or that contains two consecutive underscores (but a program may declare such functions in an extern block). An implementation is encouraged to emit a diagnostic if a violation of this rule is detected. Note that link_name and no_mangle are necessarily unsafe, as they can inherently result in undefined behaviour or (C++) ODR violations anyway (the latter by defining a function declared with C language linkage in a C++ translation unit with an incompatible signature).
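To make the proposed rule concrete, here is a minimal sketch of the check a compiler or lint could run on a symbol name before accepting a definition. The function name `is_reserved_symbol_name` is hypothetical, and treating "capital letter" as an ASCII uppercase letter is an assumption; nothing like this exists in rustc as far as this thread establishes.

```rust
/// Hypothetical helper: returns true if `name` falls under the proposed
/// reservation, i.e. it starts with an underscore followed by a capital
/// letter, or it contains two consecutive underscores anywhere.
/// Assumption: "capital letter" means an ASCII uppercase letter.
fn is_reserved_symbol_name(name: &str) -> bool {
    let bytes = name.as_bytes();
    // Rule 1: leading underscore immediately followed by A-Z.
    let underscore_then_capital =
        bytes.len() >= 2 && bytes[0] == b'_' && bytes[1].is_ascii_uppercase();
    // Rule 2: two consecutive underscores at any position.
    let double_underscore = name.contains("__");
    underscore_then_capital || double_underscore
}
```

Under this sketch, both mangled names quoted in this thread (the `_R...` and `_Z...` forms) would be rejected by the first rule, while ordinary names such as `my_symbol` or `memcpy` would not be affected by it.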

It may also be a good idea to specify that (at least a subset of) these identifiers are reserved in a module that uses an attribute macro defined by the standard library. For example, a program compiled with lccc (a work-in-progress competitor to rustc) that declares a static (or const) called _ZN5alloc5alloc18__global_allocatorRCu3dynIN5alloc5alloc15GlobalAllocatorE and that uses the #[global_allocator] attribute would conflict with the declarations generated for that attribute (I don't know off the top of my head which declarations rustc uses). While it is clearly far-fetched that a user would accidentally declare anything with that name (though someone may do so intentionally), having the ability to make such a program ill-formed would be desirable.

Note: This would require a change that is technically breaking. However, in the current situation, programs that violate the proposed rules can already observe unintended and problematic side effects anyway.

I’m not sure what makes this more UB than a program using no-mangle to override a linked C function with an incompatible ABI? Assuming you match ABI and guarantee the same behaviour, why would it not be sound to override a Rust function with an alternative implementation?

Replacing a mangled Rust symbol using link_name is a problem because Rust's mangling scheme is not stable, so the code might break in a future Rust release. It is not undefined behaviour but unspecified behaviour.

Overriding a named C function is unsafe, but not necessarily unsound if used correctly.

The same applies to overriding a mangled Rust symbol; it's just that doing so "correctly" is very difficult (maybe impossible).

In C++, doing so (if it were permitted) would be an immediate violation of the one-definition rule (and would probably break a million more assumptions the compiler has about the standard library).

https://github.com/rust-lang/rust/issues/28179

I'm referring more to the specific issue that you can violate assumptions about the environment, such as the existence, or lack thereof, of certain functions which C or C++ programs cannot define. That issue is more about the fact that you can generally redefine functions that exist (and break stuff that way).

What I am referring to is the fact that, at any time, an implementation could break code that uses any reserved name. Right now, rustc does not use Itanium mangled names, but another compiler may, or rustc could in the future (including by providing the option). In that case, code that defines a symbol named _ZN4core3mem8align_ofIdE would break. This means that you cannot rely on the stability of such names, so they must be reserved in one way or another. Technically, in the presence of no_mangle and link_name, it is incorrect for a Rust implementation to use any name for mangling.

One additional note, re. C functions. According to ISO 9899, a program that redefines any function from the C standard library in a hosted implementation has undefined behaviour. Full stop. In C++, a program that defines a name in the std namespace, other than a template specialization of a standard library template, has undefined behaviour. In these cases, it is legitimately impossible to redefine such names, because merely defining them is UB. I presume that something similar could be applied in Rust (especially since crate names are "protected" by the compiler against injection).

No, it is incorrect for code to guess the name of a Rust symbol, just as it is incorrect to guess the location of allocated memory.

There can be completely legitimate reasons to define functions that are named the same as functions in the C standard library. The most obvious is to implement the C standard library in C, but malloc/free, for example, may also be defined by a custom allocator like jemalloc. Another valid reason would be to instrument calls of certain functions.

Nothing in any RFC I have seen protects any name that is or that can be the name of a Rust symbol. At the very most, the current Rust mangling RFC protects the current mangling scheme, but I do not believe it explicitly reserves the names. There are rules that make it incorrect to guess the location of allocated memory, but (to my knowledge) no rule exists that makes it invalid to "guess" the name of a Rust symbol, including by accident.

Be that as it may, this is undefined behaviour (a notable example: if you try to define memcpy in gcc without -ffreestanding, you get this). In the case of implementing the standard library, this is still UB, but it is valid because the code is part of the implementation, so you know how to avoid it being exploited (I have mentioned the privilege of standard library implementations wrt UB before: "Compiler support and standard libraries get to basically have free reign, because if something isn't defined by the lang, they can just add an extension that defines it").

It is statistically impossible to guess the hash part of Rust symbol names. Using a known value for the hash part is not allowed, as it may and will change for each new rustc version or even for small changes to how the crate is compiled.

Not a few weeks ago, I was debugging some C code that broke because the user defined a custom erf function and GCC had optimized the expression -erf(-x) to erf(x) (breaking the code, as the custom erf was not properly odd). The C function names are reserved in all hosted environments, so it is only safe to define these functions when you compile with -ffreestanding (to shift the compiler from hosted to freestanding).
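As a rough illustration of why that rewrite breaks, here is a small Rust sketch. The real erf is an odd function (erf(-x) = -erf(x)), which is what makes replacing -erf(-x) with erf(x) valid. The `custom_erf` below is a hypothetical stand-in for the user's function, not the code from that incident: it is a crude approximation that forgets to restore the sign for negative inputs, so it is even rather than odd.

```rust
use std::f64::consts::PI;

/// Hypothetical user-written "erf": a crude approximation that only
/// depends on x*x, so it returns the same value for x and -x.
/// It is NOT an accurate erf and is deliberately not odd.
fn custom_erf(x: f64) -> f64 {
    (1.0 - (-4.0 * x * x / PI).exp()).sqrt()
}

/// The expression as the programmer wrote it.
fn as_written(x: f64) -> f64 {
    -custom_erf(-x)
}

/// The expression after a rewrite that assumes erf is odd.
fn as_rewritten(x: f64) -> f64 {
    custom_erf(x)
}
```

For a positive input such as 1.0, `as_written` yields a negative number while `as_rewritten` yields a positive one, so a transformation that is correct for the standard erf silently changes the program's result.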

I think at least a warning would be helpful. There are many libc and POSIX symbols that users could be overriding by accident, potentially breaking things subtly and at a distance.
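A very rough sketch of what such a warning could look like, assuming the compiler has the exported symbol name in hand. Everything here is hypothetical: the function name, the message wording, and the list, which is just the handful of C library names mentioned in this thread rather than any real libc or POSIX symbol table.

```rust
/// Hypothetical, deliberately tiny sample of C library names
/// (only the ones that came up in this thread).
const SAMPLE_C_NAMES: &[&str] = &["malloc", "free", "memcpy", "erf"];

/// Returns a warning message if `symbol` matches one of the sample
/// C library names, or None if no warning applies.
fn c_symbol_warning(symbol: &str) -> Option<String> {
    if SAMPLE_C_NAMES.iter().any(|&name| name == symbol) {
        Some(format!(
            "warning: `{}` has the same name as a C library function",
            symbol
        ))
    } else {
        None
    }
}
```

A real lint would need a far larger list and a way to opt out for the legitimate cases raised above, such as custom allocators or an implementation of the C library itself.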
