Pre-RFC: Defining function aliases

Problem

C supports aliasing of function names. To be precise, GCC defines function attributes that can be used to define function aliases with different linkage types. Rust does not, to my knowledge, provide a similar feature.

While translating libxml2 with the C2Rust transpiler, I came across the following pattern relying on function aliases:

int sum(int a, int b);

extern __typeof (sum) sum__internal_alias __attribute((visibility("hidden")));
#define sum sum__internal_alias

// define sum__internal_alias
int sum(int a, int b) {
    return a + b;
}

int (*const int_ptr)(int, int) = &sum;

int call_sum__internal_alias(void) {
    return sum(2, 2);
}

#undef sum
extern __typeof (sum) sum __attribute((alias("sum__internal_alias")));

int (*const ext_ptr)(int, int) = &sum;

int call_sum(void) {
    return sum(2, 2);
}

int return_one(void) {
    return int_ptr == ext_ptr;
}

The above is a condensed version of the pattern used in libxml2 to optimize calls when building a shared library. It lets internal calls (i.e., calls that do not cross a module boundary) avoid indirection through the global offset table, which improves performance. See https://github.com/GNOME/libxml2/blob/mainline/elfgcchack.h and its uses for details.

My colleagues and I attempted two approaches to automatically translate this pattern into Rust:

First Attempt: Function Wrappers

For the purposes of translating C to Rust, we considered emitting wrapper functions for each alias. While this adds a level of indirection when calling a function alias, it has the advantage of letting us control the visibility of each alias. Unfortunately, the function wrapping approach would not fully match the semantics of aliased functions in GNU C, which requires that function pointers to a function and its aliases compare equal.
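
To make the trade-off concrete, here is a minimal sketch of the wrapper approach in plain Rust. It is only an illustration under stated assumptions: the export attributes are left out so it compiles as ordinary code, `same_address` is a hypothetical helper mirroring return_one from the C snippet, and `wrapping_add` is used so the example cannot panic on overflow.

```rust
/// The real implementation. In a translated crate this would be the hidden symbol.
#[allow(non_snake_case)]
pub(crate) extern "C" fn sum__internal_alias(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}

/// The public wrapper. It has its own visibility, but it is a distinct function.
pub extern "C" fn sum(a: i32, b: i32) -> i32 {
    sum__internal_alias(a, b)
}

/// Hypothetical helper mirroring `return_one` from the C snippet.
/// Nothing guarantees that this returns true: wrapper and wrappee are
/// separate functions unless the optimizer happens to merge them.
pub fn same_address() -> bool {
    let int_ptr = sum__internal_alias as extern "C" fn(i32, i32) -> i32;
    let ext_ptr = sum as extern "C" fn(i32, i32) -> i32;
    (int_ptr as *const () == ext_ptr as *const ())
}
```

Both names compute the same result, but whether `same_address` returns true depends on the optimization level and code generation, which is exactly the gap compared to a real alias.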

Second Attempt: Function Attributes

Rust includes two possibly useful function attributes:

  • link_name specifies what symbol to import for a given function inside an extern block. This lets us rename an extern function locally (i.e., inside a single module). Unfortunately, what we want here is a way to assign two names to the same function globally.
  • The symbol names in the compiled output of a crate are derived from the names used in the source code by default. The export_name attribute exports a function under a different name in the compiled output. While it is possible to specify multiple exported names, it does not seem possible to export a function under different names as only one of the export_name attributes has any effect.
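
For readers unfamiliar with the two attributes, the sketch below shows what each one does today. It assumes Rust 1.82 or newer (for the `unsafe extern` and `#[unsafe(...)]` spellings) and a platform C library that provides `strlen`. The names `c_strlen`, `sum_impl`, `demo_sum` and `len_via_local_name` are made up for the illustration.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

// link_name: import the C library's `strlen`, but call it `c_strlen` locally.
// This renames an *imported* symbol inside this module only.
unsafe extern "C" {
    #[link_name = "strlen"]
    fn c_strlen(s: *const c_char) -> usize;
}

// export_name: the source-level name is `sum_impl`, while the symbol in the
// compiled output is `demo_sum`. Only one exported name takes effect.
#[unsafe(export_name = "demo_sum")]
pub extern "C" fn sum_impl(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}

/// Safe helper that calls `strlen` through its local name.
pub fn len_via_local_name(s: &CStr) -> usize {
    // SAFETY: a `CStr` always points to a valid NUL-terminated string.
    unsafe { c_strlen(s.as_ptr()) }
}
```

Neither attribute gives one function two globally visible names, which is the missing piece described above.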

First Solution: Extending the export_name Attribute

Since the export_name attribute comes close to the functionality we seek, we could extend it to allow multiple uses of the attribute to take effect thus exporting a function under multiple aliases like so:

#[no_mangle]
#[export_name("sum__internal_alias")]         // use visibility of fn definition by default
#[export_name("sum", visibility = "default")] // explicit visibility (default|protected|hidden)
pub(crate) unsafe extern "C" fn sum__internal_alias(a: i32, b: i32) -> i32 {
    a + b
}

Having to specify the exported name twice is clunky, so we could optionally change the semantics of export_name to also export the function symbol under its source-level name by default. Since export_name does not create a source-level alias, this solution would require the C2Rust transpiler to track and resolve calls to aliased functions inside the same translation unit.

Second Solution: Adding an alias Attribute

Adding a brand new alias attribute to introduce aliases at the source level is another possibility. In that case, we could simply emit the following Rust to create an alias:

#[no_mangle]
#[alias("sum")]
pub(crate) unsafe extern "C" fn sum__internal_alias(a: i32, b: i32) -> i32 {
    a + b
}

In this case, one would be able to write Rust code that calls sum from anywhere else in the same crate. This approach also minimizes the additional complexity of the C2Rust transpiler and refactoring tool. The alias attribute should support an optional visibility key in this case as well.

We are seeking input on these suggestions; if there is some other way of accomplishing our goals, please let us know. If there is consensus around a particular solution, we're happy to contribute a patch. Looking forward to hearing everybody's thoughts and suggestions!

:+1: in general. I'm weakly in favor of a separate attribute, because aliases are unsupported on some platforms (e.g. Darwin), so it's useful to distinguish them from the functionality that works on all platforms.

(Honestly, I don't know why LLVM doesn't support aliases on Darwin – I can't think of any aspect of the assembly syntax or object format that would keep it from "just working". But Darwin uses LLVM as its native compiler, so that also means that no existing C code uses alias on the platform. Since you're portraying this feature as mainly a compatibility hack for translating C libraries, there wouldn't be much point trying to extend it to do things C can't do.)

If you just want this for making a C-to-Rust translation work, I don't see why you can't just do delegation in the obvious, boring way

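unsafe extern "C" fn sum(a: i32, b: i32) -> i32 {
    // Forward to the real implementation; the inliner can collapse the extra call.
    sum__internal_alias(a, b)
}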

and let the inliner deal with it for you. In my experience, linker aliases are added as optimizations, and I don't see how emitting extra function calls, and assuming trivial inlining deals with it, is any different.

I would expect that, in a C->Rust translation, you wouldn't really care about paying the cost of the extra function call, or the increased code size, since you're going to have to refactor that anyway (and have almost certainly lost whatever wins that hack got you).

Relatedly, what's Rust's behavior for internal calls within a shared object? This doesn't strike me as the sort of optimization you'd want to need to do in Rust... but, seeing as Rust mostly believes in static linking, I could believe we do the unreasonable thing.

Function wrappers are not viable because C code that relies on a function and its alias having the same address would break if we took that approach (see return_one in the first code snippet). As I wrote (apologies if this was not sufficiently clear):

I'm pretty sure this is getting into the type of thing that is verboten in ISO C[1]. I believe your return_one example is absolutely not guaranteed to work. In fact, I think ISO C implies that in certain situations involving dynamic linkage, &f == &f may be false or even UB, though I can't cite the relevant ISO standard sections off the top of my head.

GNU C extension semantics are not something I think Rust should aspire to emulate.

[1] For the most part, Rust inherits a lot of ISO C's semantics by virtue of a de facto "what LLVM does" semantic, until the UCG group finishes their work.

Moreover, this renaming hack is based on How to write shared libraries, which states the following:

What kind of code depends on these function pointers comparing equal? Is there a reason anyone would want them to compare equal other than "legacy code"? If the C->Rust translation was being done by a human, would that human want to propose attributes like this instead of changing the code to not make that assumption? How common is code that does this aliasing? How common is code that depends on libraries doing this kind of aliasing?

Unless I'm missing something huge, this seems like a corner case where the original code is simply weird enough that I doubt we want to support it "verbatim modulo transpilers" in the first place.

By the way, I've just done some experimentation on my own:

Given,

use ::std::os::raw::{c_int, c_uint};

#[no_mangle]
pub
extern "C"
fn sum (x: c_int, y: c_int) -> c_int
{
    sum_wrapped(x, y)
}

#[no_mangle]
pub
extern "C"
fn sum_wrapped (x: c_int, y: c_int) -> c_int
{
    x.wrapping_add(y)
}

#[no_mangle]
pub
extern "C"
fn equal () -> c_uint
{
    (sum as *const () == sum_wrapped as *const ()) as _
}

#[no_mangle]
pub
extern "C"
fn equal_2 () -> c_uint
{
    let sum = 
        sum as *const () as usize + 0
    ;
    let sum_wrapped = 
        sum_wrapped as *const () as usize + 0
    ;
    (sum ^ sum_wrapped == 0) as _
}

running

rustc -C opt-level=2 --edition=2018 --crate-type=cdylib -o libfoo.so foo.rs && gdb -q libfoo.so --ex 'disas sum' --ex 'disas sum_wrapped' --ex 'disas equal' --ex 'disas equal_2' --batch

yields

Dump of assembler code for function sum:
   0x00000000000009b0 <+0>:	lea    (%rdi,%rsi,1),%eax
   0x00000000000009b3 <+3>:	retq   
End of assembler dump.
Dump of assembler code for function sum_wrapped:
   0x00000000000009b0 <+0>:	lea    (%rdi,%rsi,1),%eax
   0x00000000000009b3 <+3>:	retq   
End of assembler dump.
Dump of assembler code for function equal:
   0x00000000000009c0 <+0>:	xor    %eax,%eax
   0x00000000000009c2 <+2>:	retq   
End of assembler dump.
Dump of assembler code for function equal_2:
   0x00000000000009d0 <+0>:	mov    $0x1,%eax
   0x00000000000009d5 <+5>:	retq
End of assembler dump.

  • With opt-level >= 2, a wrapper and its wrappee do coalesce, at least as long as the ABI remains the same;

  • This "optimization" is however not observed when performing a classic logical comparison between the two function pointers.

  • However, if we (ab)use "arithmetic" operations to perform the comparison, Rust does end up using the actual addresses of the functions, thus making the coalescing observable.

    • Indeed, with opt-level=z:

      Dump of assembler code for function equal:
         0x0000000000000a28 <+0>:	xor    eax,eax
         0x0000000000000a2a <+2>:	ret    
      End of assembler dump.
      Dump of assembler code for function equal_2:
         0x0000000000000a2b <+0>:	mov    rcx,QWORD PTR [rip+0x202586]        # 0x202fb8
         0x0000000000000a32 <+7>:	xor    eax,eax
         0x0000000000000a34 <+9>:	xor    rcx,QWORD PTR [rip+0x202565]        # 0x202fa0
         0x0000000000000a3b <+16>:	sete   al
         0x0000000000000a3e <+19>:	ret    
      End of assembler dump.
      

this.

Maybe I'm missing something, but why can't you just implement the alias as a constant function pointer?

fn sum(a: u32, b: u32) -> u32 {
    return a + b;
}
#[allow(non_upper_case_globals)]
const sum__internal_alias: fn(u32, u32) -> u32 = sum;

fn main() {
    assert_eq!(sum(5, 6), 11);
    assert_eq!(sum__internal_alias(5, 6), 11);
    
    let int_ptr = sum__internal_alias;
    let ext_ptr = sum;
    assert_eq!(int_ptr as *const (), ext_ptr as *const ());
}

(playground link)

I agree with this sentiment in general. But I do think supporting/enabling 1:1 C→Rust translations should be a hard goal for Rust, and we should not rule out emulating GNU C extensions out of hand.

That said, while translating a moving-target C codebase, I find it's helpful to have an automated step before C→Rust which applies a patch or some cleanup, in this case removing the function alias pattern. This patch can usually be reapplied on top of other concurrent changes. Perhaps another solution would be to teach C→Rust to simply resolve the alias directly.
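
As a rough idea of what "resolving the alias directly" could look like on the Rust side, here is a sketch that uses a plain `use ... as ...` rename. This is an assumption about one possible translation, not something C2Rust is known to emit. The module name `libsum` and the two caller functions are hypothetical.

```rust
#[allow(non_snake_case)]
mod libsum {
    /// The single definition of the function.
    pub extern "C" fn sum(a: i32, b: i32) -> i32 {
        a.wrapping_add(b)
    }

    // Source-level alias: a second path to the same function item.
    // No additional symbol is emitted for the alias name.
    pub use self::sum as sum__internal_alias;
}

/// Stand-in for an internal caller that used the hidden alias in C.
pub fn call_internal() -> i32 {
    libsum::sum__internal_alias(2, 2)
}

/// Stand-in for a caller that used the public name in C.
pub fn call_public() -> i32 {
    libsum::sum(2, 2)
}
```

This covers calls inside the crate, but it does nothing at the symbol level, so it does not help when a library needs to export the same function under several names.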

I'm totally OK with translating UB-free ISO C99/11 to Rust, since (I think, off the top of my head) ISO C has strictly more undefined behavior than Rust, so there are no real semantic issues there. I think this is a useful goal for converting projects in C to Rust.

GNU C is an unspecified mess[1]. I think trying to emulate its extensions is possible but needlessly excruciating.

This strikes me as a bad idea. I think the value in C->Rust is for C projects that want to move into the 21st century; trying to maintain a translated fork is going to prevent you from doing the sort of invasive refactors you want to do once you're writing Rust (and which you need for performance; I don't think that pointer manipulations are as efficient as reference manipulations, for example).

[1] Anecdata from today: I am so, so done fighting with __attribute__((naked)).

I don't mean to target a moving C codebase over a long period of time, but it's extremely helpful to be able to do testing, benchmarking, and deployments on the new Rust code without pausing development on the C side. Extra manual steps mean any Rust artifact automatically falls behind, or the Rust "switch" needs to be thrown all at once while less than fully verified. In my experience this would leave rustification dead in the water.

After everything is working great and the smoke has cleared, you can cut C and refactor your heart out.

:100:

Can you explain why it should be?

Hmm, that's fine, but if you own the code, you should probably focus on making it proper ISO C before trying to move to Rust? I think my point of "supporting weird GNUisms is a bad idea" stands.

Maybe I'm missing something, but why can't you just implement the alias as a constant function pointer?

In the ELF binary, your two symbols are not equal in value or size. The sum symbol points to the function itself in the .text section, while sum__internal_alias is a function pointer represented as a pointer stored somewhere in .data.

0000000000000000 g     F .text.sum      0000000000000004 sum
0000000000000000 g     O .data.rel.ro.sum__internal_alias       0000000000000008 sum__internal_alias
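
The difference can also be seen from inside Rust. The sketch below assumes the alias is written as a `static` function pointer (a `const` would not produce a data object at all). The helper names `alias_storage_size`, `alias_slot_address` and `function_address` are hypothetical.

```rust
use std::mem::size_of;

/// Function pointer type shared by the function and its "alias".
type SumFn = extern "C" fn(u32, u32) -> u32;

pub extern "C" fn sum(a: u32, b: u32) -> u32 {
    a.wrapping_add(b)
}

// A static function pointer is a pointer-sized *data object* that holds the
// address of `sum`. It is not a second name for the code of `sum`.
#[allow(non_upper_case_globals)]
pub static sum__internal_alias: SumFn = sum;

/// Size of the alias object: one pointer, not the size of the function body.
pub fn alias_storage_size() -> usize {
    size_of::<SumFn>()
}

/// Address of the data slot that stores the pointer.
pub fn alias_slot_address() -> usize {
    &sum__internal_alias as *const SumFn as usize
}

/// Address of the function's code.
pub fn function_address() -> usize {
    sum as SumFn as usize
}
```

Calling through `sum__internal_alias` works, but the slot lives in a data section and differs from the function's address, so a C caller that links against the symbol and expects a function would see a pointer instead.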

Perhaps another solution would be to teach C→Rust to simply resolve the alias directly.

Sure, but this whole discussion started with aliases that were exported from a shared library, i.e., libxml2. In this particular case, libxml2 exports one alias as a default-visibility symbol while keeping the other hidden (which could be handled internally by C2Rust like you said), but another library might export multiple public aliases of the same global. The latter is not currently supported by Rust, and we'd have to work around that in C2Rust in some very ugly ways.