#[no_mangle] vs modules and pub

Rust uses a combination of pub and mod to control visibility of symbols… except #[no_mangle], which sort-of does the same, but in a parallel C universe. I think this is ugly:

  • it’s extra syntactic noise on extern "C" functions that are meant to be linkable from C,
  • Rust copied C++'s syntax but not C++'s behavior regarding mangling, which sets the wrong expectations,
  • it’s a gotcha: you write pub extern "C" in the root of the crate hoping to export it (as works with Rust functions), and it fails with no warnings and unhelpful linker errors, because Rust is happy to create a hybrid of C-compatible calling and C-incompatible linking.
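
To make the gotcha concrete, here is a minimal sketch of the two spellings in current Rust. The function names demo_mangled_add and demo_exported_add are made up for illustration, and the sketch assumes an edition before 2024 (the 2024 edition writes the attribute as #[unsafe(no_mangle)]).

```rust
// Hypothetical names, for illustration only.

// C calling convention, but the symbol name is still Rust-mangled,
// so C code declaring `int demo_mangled_add(int, int);` will not link.
pub extern "C" fn demo_mangled_add(a: i32, b: i32) -> i32 {
    a + b
}

// C calling convention and a plain `demo_exported_add` symbol,
// which is what C code expects to link against.
// (Edition 2024 spells the attribute `#[unsafe(no_mangle)]`.)
#[no_mangle]
pub extern "C" fn demo_exported_add(a: i32, b: i32) -> i32 {
    a + b
}
```

From the Rust side both functions behave identically; the difference only shows up in the symbol table that a C linker sees.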

I think Rust would be much more elegant if #[no_mangle] was made largely unnecessary by having sensible defaults for the most common cases where it’s currently used. Last time I grepped a lot of Rust code, I found that almost every use falls into one of two categories:

  • mangled extern "C" functions are used as callbacks (function pointers), but they’re actually not expected to have any particular symbol name. Users only care that the symbol doesn’t conflict with anything and that it can be referred to from Rust.
  • no_mangle extern "C" functions are used to export them as public from the crate for the purpose of linking with other libraries.
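
The first category can be sketched as follows. This is only an illustration: apply_twice is a hypothetical stand-in for a C library function that takes a function pointer, and add_one is a made-up callback.

```rust
// Hypothetical callback-style API: `apply_twice` stands in for a C library
// function that accepts a function pointer.
type Callback = extern "C" fn(i32) -> i32;

fn apply_twice(cb: Callback, x: i32) -> i32 {
    cb(cb(x))
}

// Private, mangled, C calling convention. Nobody looks this function up
// by symbol name; only its address is passed around.
extern "C" fn add_one(x: i32) -> i32 {
    x + 1
}
```

Calling apply_twice(add_one, 5) works without #[no_mangle], because the callback is referred to by its Rust path and not by a linker symbol.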

So #[no_mangle] is a special case of visibility and namespacing. Mangled symbols are de-facto bound by Rust’s visibility and namespacing, and no_mangle symbols are public in a global namespace shared with C.

I think there could be implied mangling that nicely bridges C and Rust worlds:

  1. public extern symbols in the root of the crate (including pub use) get unmangled symbols,
  2. public extern symbols in non-root modules are mangled as usual,
  3. private extern symbols anywhere are mangled as usual.

Case 1 is what I see as common ground for Rust and C: the root of the crate is conceptually closest to a global namespace in Rust, so it makes sense to me that public extern "C" symbols there would map to public global symbols in C.

Case 2 I’m unsure about. Some module namespacing is necessary to avoid conflicts. It could either remain Rust mangling, or perhaps the names could be mapped to C++ namespace mangling or C-like mangling with crate & module name prefix (pub mod foo {pub extern fn bar();} -> foo_bar()).

Case 3 is for function pointers/callbacks, so it’s fine.
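
The three cases can be written down as a small decision function. This is only a model of the proposal, not anything the compiler does: Symbol and proposed_symbol are hypothetical names, and case 2 is modelled as staying with Rust mangling since the alternatives are left open.

```rust
/// How a symbol would be emitted under the proposal (hypothetical model).
#[derive(Debug, PartialEq)]
enum Symbol {
    /// Exported under its plain name, as with `#[no_mangle]` today.
    Unmangled(String),
    /// Left to the usual Rust mangling scheme.
    Mangled,
}

/// `public_at_root` means the name is `pub` in the crate root,
/// either directly or through a `pub use`.
fn proposed_symbol(is_extern_c: bool, public_at_root: bool, name: &str) -> Symbol {
    if is_extern_c && public_at_root {
        Symbol::Unmangled(name.to_string()) // case 1
    } else {
        Symbol::Mangled // cases 2 and 3
    }
}
```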

Special casing the root module seems strange; it implies that the root module could get cluttered with callbacks, and that refactoring things into other modules would break things subtly.

Does it depend on where the function is defined, or where it’s made public?

mod thing {
    extern "C" fn callback(...);
}

pub use thing::callback; // mangled or no?

It depends where the name is public. The way I see it through C-colored glasses is that:

mod thing {
    pub extern "C" fn foo(...);
}

pub use thing::foo; 

exists as mangle(thing::foo) and mangle(::foo), where mangling ::name is as simple as exporting name, whereas mangling namespace::foo is complicated.

  1. This would be backwards incompatible and would therefore only be available, as described, in Rust 2.0.

  2. This only results in people not having to type #[no_mangle] and honestly, if you’re writing interop between C and Rust, having to type #[no_mangle] to fix a linker error is really the least of your problems. Though, it’d be cool if there was a lint that warned about a pub symbol in a root module which isn’t #[no_mangle]. That gets 90% of the way there.
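
As a rough idea of what such a lint would look for, here is a toy, line-based check. It is not a real lint: find_unexported is a hypothetical helper that only scans text, whereas an actual lint would work on the compiler's syntax tree and would also need to know whether the item sits in the crate root.

```rust
/// Toy check: returns the 1-based line numbers of `pub extern "C" fn`
/// items whose previous non-empty line is not `#[no_mangle]`.
fn find_unexported(source: &str) -> Vec<usize> {
    let mut hits = Vec::new();
    let mut prev = "";
    for (idx, line) in source.lines().enumerate() {
        let line = line.trim();
        if line.starts_with("pub extern \"C\" fn") && prev != "#[no_mangle]" {
            hits.push(idx + 1);
        }
        // Remember the last non-empty line so blank lines between the
        // attribute and the item are tolerated.
        if !line.is_empty() {
            prev = line;
        }
    }
    hits
}
```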

I don’t think it is incompatible in practice, because mangled Rust symbols are magic and unusable, so I think it’s very, very unlikely that anybody links from C to Rust-mangled names. I did not find any code that does this.

We don’t want to do anything that would get in the way of stabilizing the Rust mangling scheme in the future, so that other languages can make use of it as well (see the other thread about Rust-based ABIs for why we might want to do that).

I think mangling controls should remain orthogonal to visibility controls.

I’m focusing on pub extern "C" here. Stabilisation of current Rust ABI/mangling is not affected. And if some new third ABI/mangling is invented, I presume it’ll require extern "new-abi" anyway.

I think usability of the language suffers from the ideal of orthogonality here. In practice, for exporting of symbols for C, this orthogonality doesn’t exist. C ABI and mangling are highly correlated, and Rust’s default does not match the 99% case.

If you use the crate root then the conflict is simple to trigger: just replace “modules” with “crates” in my example. IOW: have two crates both with a pub extern fn foo(...) {...} at the top of them.

That’s no different than two crates having #[no_mangle] fn foo.

Except pub extern fn foo works right now just fine, no matter how many times you repeat it.
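
A minimal sketch of that point, with two modules standing in for two crates (crate_a and crate_b are made-up names):

```rust
// Two modules stand in for two crates; each defines its own `foo`.
// Both are mangled today, so their symbols do not collide.
mod crate_a {
    pub extern "C" fn foo() -> i32 {
        1
    }
}

mod crate_b {
    pub extern "C" fn foo() -> i32 {
        2
    }
}
```

Today both definitions coexist. If each were exported under the plain name foo, whether by #[no_mangle] or by the proposed root-module rule, they would claim the same symbol and clash.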

That’s why I mentioned in the other topic checking how it’s used. I’m betting nobody uses it like that. It’s either with no_mangle for export, or private or in a module for callbacks.

If the check is mostly syntactical (and it seems so), you can do this with brson/stabworld (tools for analyzing how the Rust ecosystem is using the language) and a custom inspection for intellij-rust/intellij-rust (the Rust plugin for the IntelliJ Platform), like I did for anonymous parameters: “Anonymous parameters usages across crates.io crates”.

@eddyb explained why it’s backwards incompatible. You just seem to be suggesting that no one does that without any data to back up your claim.

Here’s the data I have collected so far: No #[no_mangle]

I’ve previously suggested making all pub extern unmangled, but there were examples where it’d break. Now I’m suggesting a narrower version, limited to only symbols in the root, which I think avoids the problems found previously and is compatible.

It’s not compatible across crates.

Unfortunately, I’m not sure what the data that you provided actually means. It’s late and I’m tired, so perhaps I’m just missing something obvious that should tell me what the data means or how I can use it to determine that this feature is worth its cost.

Note: even if you could prove that this didn’t break any crates on crates.io through a crater run, it could still break code not on crates.io. And at some point that will matter more and more as people continue to use Rust in businesses.

But more importantly, why do you want to change the language? It seems like it would be trivial to write a clippy lint with a warning, or even a refactoring tool that would add #[no_mangle] for you based on the lint. The issue I have with the proposal is that it’s too far-reaching for such a small problem, and the obvious first steps (the lint/refactoring tool) apparently haven’t been tried, or there would be mention of the fact that they were tried and deemed insufficient.

And that’s ignoring the fact that this would create an issue when teaching FFI, because mangling and calling conventions would have a special case where they interact rather than being orthogonal.

Anyways, the most constructive feedback I have is “go write a lint” and let’s see if it gains traction and people use it and turn it on in all their crates. If that happens, you could argue that it’s become a de facto standard in the Rust ecosystem and should be promoted into the compiler, and then once it’s on by default in the compiler, you could eventually promote it to an error and then, eventually, change the default in 2.0.

Your argument about orthogonality is technically true, but I don’t see it as justification for the design. An identical argument could be made about the orthogonality of number of digits and arity of numbers, but it’d be wrong to use that to justify #[no_ternary] u32 as a good design.

Mangling and calling convention are orthogonal, and there can exist a Rust-mangled C-exported symbol, but it’s a very exotic combination, much less common than a C-mangled C-exported symbol.

And mangling and privacy are also orthogonal concepts. You could change mangling of a private symbol that isn’t exported anywhere, but that serves no practical purpose.

In practice these orthogonal concepts have certain combinations that are much, much more useful than the others, and combinations that do nothing but cause problems. I wish Rust recognized that, and had the nice, obvious syntax for the useful combinations, rather than the problematic ones.

One of the sentences in my post mentions orthogonality. It isn’t a requirement and I’m not interested in a philosophical argument about whether it’s always good to have or not in programming language design.

Some of the other sentences in my post are more pragmatic and even make suggestions or ask questions. I’m more interested in discussing those.

It doesn't affect the conclusion of your argument, but cargobomb actually does test some Rust projects on GitHub that aren't on crates.io.
