Reserved Names for `#[no_mangle]` and `#[link_name]`

InfernoDeity · January 3, 2021, 3:10am

Currently in Rust, anyone can define a symbol with any name in user code by using #[no_mangle]. While this is desirable in general, it can cause some problems. Specifically, it allows users to conflict with symbols defined or used by the compiler or the toolchain. It also permits users to define mangled symbols. For example, a user could inject a definition for an instance of a generic function by defining _RINvNtC3std3mem8align_ofdE or (on a compiler using Itanium mangling) _ZNSt3mem8align_ofIdE, and, on ELF platforms at least, this would override the weak definition provided by the compiler. I do not believe this specific issue is documented anywhere. It also lets users define names reserved by the C or C++ standard, which the respective runtime libraries may depend upon not existing, or existing in a particular manner.

Would it be reasonable to restrict the names a program may define (merely declaring them would still be fine)?

Specifically: the behaviour is undefined if a program, through the #[no_mangle] or #[link_name] attribute, defines a function or static object with a symbol name that starts with an underscore followed by a capital letter, or that contains two consecutive underscores (a program may still declare such functions in an extern block). An implementation is encouraged to emit a diagnostic if it detects a violation of this rule. Note that link_name and no_mangle are necessarily unsafe, as they can inherently result in undefined behaviour or (in C++) ODR violations anyway (the latter by defining a function declared with C language linkage in a C++ translation unit with an incompatible signature).
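As a rough illustration of the diagnostic suggested above, here is a minimal sketch of the proposed check. It assumes "capital letter" means an ASCII uppercase letter; the function names `is_reserved_symbol` and `check_definition` are hypothetical and not part of rustc.

``` rust
/// Returns `true` if `name` falls under the proposed reservation rule:
/// it starts with `_` followed by an ASCII uppercase letter, or it
/// contains two consecutive underscores anywhere.
fn is_reserved_symbol(name: &str) -> bool {
    let bytes = name.as_bytes();
    // `_` immediately followed by `A`..=`Z` at the start of the name.
    let underscore_capital =
        bytes.len() >= 2 && bytes[0] == b'_' && bytes[1].is_ascii_uppercase();
    // `__` anywhere in the name.
    let double_underscore = name.contains("__");
    underscore_capital || double_underscore
}

/// Hypothetical lint hook: only *definitions* (e.g. a `#[no_mangle]` fn or
/// static) are checked; declarations in an `extern` block would be exempt.
fn check_definition(name: &str) -> Result<(), String> {
    if is_reserved_symbol(name) {
        Err(format!(
            "defining reserved symbol `{name}` is undefined behaviour under the proposed rule"
        ))
    } else {
        Ok(())
    }
}
```

Under this reading, both mangling prefixes mentioned above (`_R` for the v0 scheme, `_Z` for Itanium) fall inside the reserved space, while ordinary names such as `convert` or `_start` do not. Names like `memcpy` are not caught either; the rule only covers the reserved-prefix space.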

It may also be a good idea to specify that (at least a subset of) these identifiers are reserved in a module that uses an attribute macro defined by the standard library. For example, a program compiled with lccc (a work-in-progress competitor to rustc) that declares a static (or const) called _ZN5alloc5alloc18__global_allocatorRCu3dynIN5alloc5alloc15GlobalAllocatorE and that uses the #[global_allocator] attribute would conflict with the declarations lccc generates (I don't know off the top of my head which declarations rustc uses). While it is clearly far-fetched that a user would declare anything with that name by accident (though someone may do so intentionally), being able to make such a program ill-formed would be desirable.

Note: This would require a change that is technically breaking. However, in the current situation, programs that violate the proposed rules can already observe unintended and problematic side effects.

Nemo157 · January 3, 2021, 8:13am

github.com/rust-lang/rust

#[no_mangle] is unsafe

opened 11:36PM - 02 Sep 15 UTC

geofft

A-linkage P-low T-lang T-compiler I-unsound C-bug

On some platforms (at least GNU/Linux, but I hear Windows and several others too…), if you link together two static libraries that both export a symbol of the same name, it's undefined which symbol actually gets linked. In practice on my machine, the first library seems to win. This lets you defeat type/memory safety without the `unsafe` keyword, by having two crates export a `#[no_mangle] pub fn` with different signatures but compatible calling conventions:

``` rust
// one.rs
#![crate_type = "lib"]

#[no_mangle]
pub fn convert(x: &'static i32) -> Result<i32, f32> {
    Ok(*x)
}
```

``` rust
// two.rs
#![crate_type = "lib"]

#[no_mangle]
pub fn convert(x: &'static f32) -> Result<i32, f32> {
    Err(*x)
}
```

``` rust
// transmute.rs
extern crate one;
extern crate two;

fn main() {
    static X: f32 = 3.14;
    let y: i32 = two::convert(&X).unwrap();
    println!("{}", y);
}
```

```
geofft@titan:/tmp/transmute$ rustc one.rs
geofft@titan:/tmp/transmute$ rustc two.rs
geofft@titan:/tmp/transmute$ rustc transmute.rs -L .
geofft@titan:/tmp/transmute$ ./transmute
1078523331
```

Despite the stated call to `two::convert`, it's actually `one::convert` that gets called, which interprets the argument as a `&'static i32`. (It may be clearer to understand with this [simpler example](https://gist.github.com/geofft/493c5c17bfdd04b97670), which doesn't break type safety.)

On at least GNU/Linux but _not_ other platforms like Windows or Darwin, dynamically-linked symbols have the same ambiguity.

I don't know what the right response is here. The following options all seem pretty reasonable:

1. Acknowledge it and ignore it. Maybe document it as a possible source of unsafety, despite not using the `unsafe` keyword.
2. Have `#[no_mangle]` export both a mangled and un-mangled name, and have Rust crates call each other via mangled names only, on the grounds that `#[no_mangle]` is for external interfaces, not for crates linking to crates. ("External interfaces" includes other Rust code using FFI, but FFI is unsafe.) This is analogous to how `extern "C" fn`s export both a Rust-ABI symbol as well as a C-ABI shim, and a direct, safe Rust call to those functions happens through the Rust ABI, not through the C ABI. I'm pretty sure that all production uses of `#[no_mangle]` are `extern "C"`, anyway (see e.g. #10025).
3. Deprecate `#[no_mangle]` on safe functions and data, and introduce a new `#[unsafe_no_mangle]`, so it substring-matches `unsafe`. (`#[no_mangle]` on unsafe functions or mutable statics is fine, since you need the `unsafe` keyword to get at them.)

All of these are, I think, backwards-compatible.

I’m not sure what makes this more UB than a program using no-mangle to override a linked C function with an incompatible ABI? Assuming you match ABI and guarantee the same behaviour, why would it not be sound to override a Rust function with an alternative implementation?

Aloso · January 3, 2021, 12:13pm

Replacing a mangled Rust symbol using link_name is a problem because Rust's mangling scheme is not stable, so the code might break in a future Rust release. It is not undefined behaviour but unspecified behaviour.

Overriding a named C function is unsafe, but not necessarily unsound if used correctly.

Nemo157 · January 3, 2021, 1:21pm

The same applies to overriding a mangled Rust symbol, it's just "correctly" is very difficult (maybe impossible) to do.

InfernoDeity · January 3, 2021, 1:26pm

In C++, doing so (if it were permitted) would be an immediate violation of the one-definition rule (and would probably break a million other assumptions the compiler makes about the standard library).

https://github.com/rust-lang/rust/issues/28179

I'm referring more to the specific issue that you can violate assumptions about the environment, such as the existence, or lack thereof, of certain functions which C or C++ programs cannot define. That issue is more about the fact that you can redefine functions that already exist (and break things that way).

What I am referring to is the fact that an implementation could, at any time, break code that uses any reserved name. Right now, rustc does not use Itanium-mangled names, but another compiler may, or rustc could in the future (including by providing an option). In that case, code that defines a symbol named _ZN4core3mem8align_ofIdE would clash with the compiler's own symbol. This means that you cannot rely on the stability of such names, so they must be reserved in one way or another. Technically, in the presence of no_mangle and link_name, it is incorrect for a Rust implementation to use any name for mangling.

One additional note regarding C functions. According to ISO/IEC 9899 (the C standard), a program that redefines any function from the C standard library in a hosted implementation has undefined behaviour. Full stop. In C++, a program that defines a name in the std namespace, other than a template specialization of a standard library template, has undefined behaviour. In these cases, it is legitimately impossible to redefine such names, because merely defining them is UB. I presume that something similar could be applied in Rust (especially since crate names are "protected" by the compiler against injection).

bjorn3 · January 3, 2021, 1:51pm

No, it is incorrect for code to guess the name of a rust symbol, just as it is incorrect to guess the location of allocated memory.

There can be completely legitimate reasons to define functions with the same names as functions in the C standard library. The most obvious is implementing the C standard library itself in C, but malloc/free, for example, may also be defined by a custom allocator like jemalloc. Another valid reason would be to instrument calls to certain functions.

InfernoDeity · January 3, 2021, 3:00pm

Nothing in any RFC I have seen protects any name that is, or could be, the name of a Rust symbol. At most, the current Rust mangling RFC protects the current mangling scheme, but I do not believe it explicitly reserves the names. There are rules that make it incorrect to guess the location of allocated memory, but (to my knowledge) no rule exists that makes it invalid to "guess" the name of a Rust symbol, including by accident.

Be that as it may, this is undefined behaviour (a notable example: if you try to define memcpy in GCC without -ffreestanding, you run into this). In the case of implementing the standard library, this is still UB, but valid because it is part of the implementation, so you know how to avoid it being exploited. (I have mentioned the privilege of standard library implementations with respect to UB before: "Compiler support and standard libraries get to basically have free rein, because if something isn't defined by the lang, they can just add an extension that defines it.")

bjorn3 · January 3, 2021, 3:05pm

It is statistically impossible to guess the hash part of rust symbol names. Using a known value for the hash part is not allowed as it may and will change for each new rustc version or even for small changes to how the crate is compiled.
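To make the hash point concrete: under rustc's legacy mangling scheme, a symbol is, roughly, `_ZN`, then length-prefixed path components, then a final component consisting of `h` plus 16 hex digits (a 64-bit hash), then `E`. The sketch below splits such a name into its components; it handles only this simple legacy shape (not the v0 `_R` scheme or suffixes such as `.llvm.*`), and the hash in the usage example is made up. A blind guess of a specific 64-bit hash succeeds with probability 2^-64.

``` rust
/// Splits a legacy-style Rust symbol (`_ZN` + length-prefixed components + `E`)
/// into its components. Returns `None` if the input does not have that shape.
fn legacy_components(sym: &str) -> Option<Vec<&str>> {
    let mut rest = sym.strip_prefix("_ZN")?;
    let mut parts = Vec::new();
    while !rest.starts_with('E') {
        // Read the decimal length prefix of the next component.
        let digits = rest.bytes().take_while(|b| b.is_ascii_digit()).count();
        if digits == 0 {
            return None;
        }
        let len: usize = rest[..digits].parse().ok()?;
        let end = digits.checked_add(len)?;
        // `get` returns None if the component runs past the end of the input.
        parts.push(rest.get(digits..end)?);
        rest = &rest[end..];
    }
    // Require the closing `E` and nothing after it.
    if rest != "E" {
        return None;
    }
    Some(parts)
}

/// Returns the 16 hex digits of the trailing `h<hash>` component, if present.
fn hash_component<'a>(parts: &[&'a str]) -> Option<&'a str> {
    let last: &'a str = parts.last().copied()?;
    let hex = last.strip_prefix('h')?;
    if hex.len() == 16 && hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        Some(hex)
    } else {
        None
    }
}
```

For example, the made-up name `_ZN4core3mem8align_of17h0123456789abcdefE` splits into `core`, `mem`, `align_of` and `h0123456789abcdef`. The path part is predictable, but only the exact hash produced by a given compiler build and configuration would collide with the real symbol.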

jcranmer · January 3, 2021, 6:16pm

Only a few weeks ago, I was debugging some C code that broke because the user defined a custom erf function and GCC had optimized the expression -erf(-x) to erf(x). That broke the code because the custom erf was not actually odd (the real erf satisfies erf(-x) = -erf(x)). The C function names are reserved in all hosted environments, so it is only safe to define these functions when you compile with -ffreestanding (which switches the compiler from hosted to freestanding mode).
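The rewrite GCC applied, -erf(-x) → erf(x), is valid only because the real erf is odd. The sketch below models the arithmetic in Rust under stated assumptions: the Abramowitz & Stegun 7.1.26 approximation (valid only for x ≥ 0) stands in for a user's custom erf, and all function names are hypothetical. It does not reproduce GCC's optimizer; it only shows that the "before" and "after" expressions agree for an odd function and disagree for one that isn't.

``` rust
/// Abramowitz & Stegun formula 7.1.26: approximates erf(x) for x >= 0 only
/// (absolute error below about 1.5e-7 on that range).
fn erf_nonneg(x: f64) -> f64 {
    const P: f64 = 0.3275911;
    const A: [f64; 5] = [0.254829592, -0.284496736, 1.421413741, -1.453152027, 1.061405429];
    let t = 1.0 / (1.0 + P * x);
    // Horner evaluation of a1*t + a2*t^2 + a3*t^3 + a4*t^4 + a5*t^5.
    let poly = t * (A[0] + t * (A[1] + t * (A[2] + t * (A[3] + t * A[4]))));
    1.0 - poly * (-x * x).exp()
}

/// Correct extension to negative inputs using oddness: erf(-x) = -erf(x).
fn erf_odd(x: f64) -> f64 {
    if x < 0.0 { -erf_nonneg(-x) } else { erf_nonneg(x) }
}

/// Hypothetical buggy "custom erf": applies the x >= 0 formula everywhere,
/// so it is not an odd function.
fn erf_buggy(x: f64) -> f64 {
    erf_nonneg(x)
}

/// The expression as written in the source.
fn as_written(erf: fn(f64) -> f64, x: f64) -> f64 {
    -erf(-x)
}

/// The expression after the rewrite; equal to `as_written` only for odd `erf`.
fn as_optimized(erf: fn(f64) -> f64, x: f64) -> f64 {
    erf(x)
}
```

With `erf_odd` the two forms return the same value for positive inputs, while with `erf_buggy` they differ at x = 1.0 by roughly 0.01, which is the kind of silent change described above.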

kornel · January 4, 2021, 12:34pm

I think at least a warning would be helpful. There are many libc and POSIX symbols that users could be overriding by accident, and potentially subtly breaking things at distance.
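Note that the reserved-prefix rule from the opening post would not catch names like memcpy or erf, so a warning of this kind would need a separate list of known names. A minimal sketch follows; the name list is a small illustrative sample (not exhaustive, and not per-target), and `libc_collision_warning` is a hypothetical lint helper, not an existing rustc lint.

``` rust
/// A small, illustrative (far from exhaustive) set of C standard library
/// and POSIX function names. A real lint would need a complete per-target list.
const KNOWN_LIBC_SYMBOLS: &[&str] = &[
    "malloc", "free", "calloc", "realloc", "memcpy", "memmove", "memset",
    "strlen", "erf", "open", "read", "write", "close",
];

/// Hypothetical lint: returns a warning message if a `#[no_mangle]` item
/// would define a symbol that is also a well-known libc/POSIX name.
fn libc_collision_warning(symbol: &str) -> Option<String> {
    if KNOWN_LIBC_SYMBOLS.iter().any(|&known| known == symbol) {
        Some(format!(
            "warning: `#[no_mangle]` item `{symbol}` defines a symbol that may also be provided by libc"
        ))
    } else {
        None
    }
}
```

Making this a warning rather than an error leaves room for the legitimate cases mentioned earlier in the thread (custom allocators, libc implementations, instrumentation), which could silence it explicitly.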

system · April 4, 2021, 12:34pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
