Support symbol versioning with #[export_name]

Sorry, this is a resurrection of a 4-year-old thread, since I wanted to ask a few follow-up questions and have some additional design discussions:

At the moment, one fairly large pain point with using Rust to write C APIs intended for wider use (i.e., pure-Rust .so libraries) is the lack of proper symbol versioning support. At best, you can find a handful of tricks that are claimed to resolve the issue but usually do not work and fail with hard-to-understand linker errors. Maybe I'm just unlucky or missing something, but I have never managed to get these workarounds to work. Obviously some folks don't need or use symbol versioning, but if you are building a library that is going to be used somewhat widely then symbol versioning becomes a necessity.

@zackw (sorry for pinging you) said that LLVM does not support __attribute__((symver)). This does appear to be true, but LLVM does understand raw .symver ... ASM directives in the context of LTO as far as I can see (i.e., it doesn't see them as "opaque ASM instructions"). For instance:

I couldn't find any real documentation about it, but it seems like .symver foo1,foo@A and .symver foo2,foo@@A are fully supported? Is this not sufficient for what we would need to have #[export_name] support symbol versioning?

I think that the API proposed in the above thread by @zackw looks reasonable, and I would be willing to work on it if there aren't any other issues I've missed. Here is the previously proposed API (using the new unsafe(...) wrapper):

#[unsafe(export_name = ("foo", version = "LIBX_1.0"))]
pub fn old_foo () {}

#[unsafe(export_name = ("foo", version = "LIBX_2.0", default_version))]
pub fn new_foo () {}

#[unsafe(export_name = [
    ("bar", version = "LIBX_1.0"),
    ("bar", version = "LIBX_2.0", default_version),
    ("foobar", version = "LIBY_0.5")  // an obsolete fork
])]
pub fn i_am_many_bars () {}

EDIT: I think #[unsafe(export_name = (name = "foo", version = "LIBA_1.0"))] is maybe a little better, if a little verbose...
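To make the proposal a bit more concrete, here is a minimal sketch of how one entry of the proposed attribute could map onto a .symver directive. Everything here is hypothetical: VersionedExport and symver_directive are names invented for illustration, not anything rustc provides. The only fact relied on is the one quoted above, namely that name@VER declares a non-default version and name@@VER declares the default one.

```rust
/// One entry of the *proposed* `#[export_name]` syntax, modelled as plain data.
/// Hypothetical type for illustration only; nothing like it exists in rustc.
pub struct VersionedExport<'a> {
    pub name: &'a str,
    pub version: &'a str,
    pub default_version: bool,
}

/// Render the `.symver` directive that would bind `local_symbol` (the actual
/// function symbol) to the versioned export described by `export`.
pub fn symver_directive(local_symbol: &str, export: &VersionedExport) -> String {
    // `@@` marks the default version, a single `@` a non-default one.
    let sep = if export.default_version { "@@" } else { "@" };
    format!(".symver {}, {}{}{}", local_symbol, export.name, sep, export.version)
}
```

Under this sketch, old_foo from the example above would yield ".symver old_foo, foo@LIBX_1.0" and new_foo would yield ".symver new_foo, foo@@LIBX_2.0".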

I'd certainly encourage you to try working on this!

I don't trust the LLVM back-end support for symbol versioning to be adequate, but the thing is, it's never going to become adequate until there's actual demand from front ends. The situation right now is that all the C libraries that use symbol versioning have kludges that work for them, usually at the price of giving up on LTO, and they're prepared to live with that (honestly I would be scared to turn on LTO for libxcrypt even if the symbol versioning weren't an obstacle, because it's crammed full of code written by cryptographers, who historically have taken the attitude that if the C standard says their code has UB then that's a bug in the standard). So there's no demand. You could be the demand :wink:

Glad to be a guinea-pig! :wink:

I did see some comments about -Wl,--wrap and LTO issues, so I wouldn't be surprised to hear that things might still be half-baked but at least there is something there, I guess? I can try to hack something together and see how far I get (though I guess I would also need to submit an RFC for this).

For context, I am working on a Rust library that I expect to get quite heavy use from Go and C users via a cdylib so I really would like to have symbol versioning as soon as possible (otherwise we're back to endless incompatible SOVERSION fiddling).

Just to add some notes from my previous attempt so they don't get lost to the sands of time. You can kind of do this with global_asm!(".symver func,func@@LIBFOO_1.X") today, it just has a few pretty large caveats:

  • The primary use-case for symbol versions doesn't appear to actually work. If you try to define a new function that implements the old behaviour for compatibility, global_asm!(".symver func_old,func@LIBFOO_1.Y") doesn't actually create the versioned symbol. (This is the furthest I got when I last looked into this.) Basically, you can only define symbol versions if the function name is the same as the exported symbol name, AFAICS.
  • You always need to configure a custom (dummy) version script with cargo:rustc-cdylib-link-arg in build.rs. Linkers prioritise .symver so this is only necessary to create version nodes for the versions you plan to use (#[export_name] could add these automatically in this proposal).
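For the second caveat, a build.rs helper along the following lines is one way to generate the dummy version script. This is only a sketch: render_version_script, emit_version_script and the symbols.map file name are made up for illustration, and it assumes the linker is driven through a gcc/clang-style driver that accepts -Wl,--version-script=.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Render empty version nodes. Each entry is (node name, optional predecessor
/// node). The nodes are empty because `.symver` decides which symbol goes where;
/// the script only has to make the version names exist.
pub fn render_version_script(nodes: &[(&str, Option<&str>)]) -> String {
    let mut out = String::new();
    for (node, parent) in nodes {
        match parent {
            Some(p) => out.push_str(&format!("{} {{ }} {};\n", node, p)),
            None => out.push_str(&format!("{} {{ }};\n", node)),
        }
    }
    out
}

/// Write the script into `out_dir` and ask Cargo to pass it to the linker when
/// building the cdylib. Intended to be called from `main` in build.rs.
pub fn emit_version_script(
    out_dir: &Path,
    nodes: &[(&str, Option<&str>)],
) -> io::Result<PathBuf> {
    let path = out_dir.join("symbols.map"); // hypothetical file name
    fs::write(&path, render_version_script(nodes))?;
    println!(
        "cargo:rustc-cdylib-link-arg=-Wl,--version-script={}",
        path.display()
    );
    Ok(path)
}
```

In build.rs, main would call emit_version_script with the directory Cargo provides in the OUT_DIR environment variable and the list of versions the crate uses.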

Some folks online suggest using -Wl,--wrap=<symbol> or -Wl,--defsym=<sym1>=<sym2> but I couldn't get any of that to work at all.

EDIT: Actually, I managed to get it to work with the following caveats:

  1. Make sure you configure the version dependencies properly. If you forget to do this, the versioned symbol name for the old function gets silently dropped(?).
    LIBPATHRS_0.2 { local: *; };
    LIBPATHRS_0.1 { } LIBPATHRS_0.2;
    
    combined with
    #[no_mangle]
    pub extern "C" fn foo() {
        eprintln!("foo@LIBPATHRS_0.2 (default)");
    }
    global_asm!(".symver foo, foo@@LIBPATHRS_0.2");
    
    #[no_mangle]
    pub extern "C" fn __foo_v1() {
        eprintln!("foo@LIBPATHRS_0.1");
    }
    global_asm!(".symver __foo_v1, foo@LIBPATHRS_0.1");
    
    will give you the right symbols. However...
  2. AFAICS the internal symbol __foo_v1 is still exported even if you use local: *; in the version script (i.e., it shows up in objdump -TC and nm -D).
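To catch the leak described in the second point automatically, something like the following could be run over the output of nm -D in a test. It is a rough sketch with a made-up function name, and it assumes nm prints one "address type name" line per defined symbol with the version appended to the name as @VER or @@VER (whether that happens by default depends on the binutils version).

```rust
/// Return the names of exported text symbols that carry no version suffix,
/// given the textual output of `nm -D` for the built library.
pub fn unversioned_exports(nm_output: &str) -> Vec<String> {
    nm_output
        .lines()
        .filter_map(|line| {
            let mut fields = line.split_whitespace();
            // Defined symbols have three fields: address, type, name.
            // Undefined ones (type `U`) have no address and are skipped.
            let _addr = fields.next()?;
            let kind = fields.next()?;
            let name = fields.next()?;
            // Only global text symbols are considered; the version nodes
            // themselves appear as absolute (`A`) symbols and are ignored.
            (kind == "T" && !name.contains('@')).then(|| name.to_string())
        })
        .collect()
}
```

With the example above, the expectation would be that only __foo_v1 is reported.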

Rustc already generates version scripts for symbol visibility. It should be possible to use them for the intended use case of defining symbol versions without depending on LLVM at all.

Yeah, LLVM support should only be needed to make LTO work correctly. (Link-time optimizers that don't understand symbol versioning are liable to miscalculate which definitions are visible outside the library being linked.)

A poor workaround is to version your symbols manually and then add the unversioned function symbol declaration in your provided header with __attribute__((alias)). That is, in C terms, instead of

__asm__(".symver xyz_old,xyz@VER_1");
__asm__(".symver xyz_new,xyz@@VER_2");

extern void xyz_old(void);
extern void xyz_new(void);

you have

extern void xyz(void) __attribute__((alias("xyz_new")));

extern void xyz_old(void);
extern void xyz_new(void);

This obviously loses any benefits of using the common symbol versioning scheme, such as still being able to link to xyz_old via the name xyz given the correct symbol versioning map file. (AIUI, alias will always result in directly linking to the impl function instead of a name/tag symbol pair.)

While it does feel like Rust should be able to emit ELF versioned symbols, and #[export_name] is sufficiently target-specific to allow using the @ symbol manually, it's worth it to explicitly note that Rust tries to be cross-platform, and the PE format (Windows) doesn't support versioned symbols in any way simply compatible with ELF's symbol versioning, although import libraries can accomplish a similar thing, IIRC.

One (ugly) option would be to call it elf_version or something similar.

For what it's worth, I've done a reasonable amount of testing using a similar setup to the one I described above and have yet to run into any actual issues. @zackw did you have a particular concern in mind that I should test?

I'm testing with a crate with a few dozen functions with several versions per symbol and it does work with everything I've tried so far. There are some sharp corners: if you give the function a different name to the symbol, then the function gets exported as well as the versioned symbol name. This is one of those things that I suspect is difficult to solve outside of rustc, because I think it is an issue with the version script being generated by rustc (I have local: * in my version script).

#[unsafe(export_name = ("foo", elf_version="LIBX_1.0"))] and #[unsafe(export_name = ("foo", elf_default_version="LIBX_2.0"))] aren't too ugly for me, and they do make it clear that this is something ELF-specific.

It's been a while, but I recall these being the major issues:

  • Symbols having the wrong visibility after linking (e.g. your __foo_v1 getting exported when it shouldn't have been).
  • Calls getting resolved to the wrong versions of foo.
  • The compiler thinks a function that's exposed as a versioned symbol isn't exposed at all, and optimizes on that basis. This can cause assembler or linker errors (e.g. the assembler gets .symver __foo_v1, ... but there is no definition of __foo_v1 available to it). Worse, it can cause runtime crashes (e.g. a caller from outside the library invokes foo@LIBPATHRS_0.2 with the standard calling convention, but the compiler thought it could see all the callers of that function, so it messed with the set of call-preserved registers).
  • Link-time optimization makes all of these things more likely to happen.