A Stable Modular ABI for Rust

I'm obviously missing context somewhere, but what's T-Lang? :slight_smile:

The Rust Language Team, often abbreviated T-Lang after the team labels used on GitHub, which all start with T-.

If y'all don't mind me asking, what's the 'new process'?

Note that I am not part of T-Lang; I only know about this from following the discussion on Zulip.

https://hackmd.io/StXzJPw7SriuM4COL_YfEw?edit

This is a draft that Niko posted.

I thought again about what I wrote, and I think it can be made even easier.

Inside a crate, the compiler should be able to use whatever ABI it considers best. There is only one place where the ABI matters: the interface between two libraries or executables. For code that isn't part of the interface, the compiler could choose the Rust ABI, the ABI of the interface, or a mix, whichever is best.

There are two things that cross the interface between two crates: functions and structs.

If the consumed library is in source code form, then there is no need to specify the ABI. The compiler should be able to use whatever ABI it considers best, just as if the consumer and the consumed library were a single crate.

If the consumed library is a binary, the consumer wants to match the ABI of the consumed library. I don't see any motivation for having different entry points with different ABIs¹. So I think the consumer should be able to select the ABI of the consumed library with an additional flag in the dependency section of Cargo.toml. I think this should be required for consuming pre-compiled libraries (but maybe defaulting to the C ABI is better). All structs that cross the ABI boundary (that is, all structs used as an argument of a function of the external library in binary form) should be explicitly marked with #[repr(...)], and the representation should match the ABI specified in Cargo.toml (otherwise it should be a compilation error).

¹ Note: if for some reason a library libA must have some of its public functions consumed using ABI B and the rest using ABI C, it is always possible to split it in three: libA_core would contain all the functionality but shouldn't be consumed directly; libB would be a thin wrapper over the part of libA_core that is consumed with calling convention B (and depends on libA_core); and libC would be a thin wrapper over the part of libA_core that is consumed with calling convention C (and likewise depends on libA_core).

Finally, when creating a binary (either a static or dynamic library) from the source code of a library, we should be able to specify the ABI of all entry points (the public functions and the public structs) as a cargo/rustc flag. If a struct already has a #[repr(...)], it should be a compiler error. This makes it possible to create different binaries, each with entry points using a different calling convention, without having to change anything in the source code.


To sum up:

  • the only place where the ABI is important is the interface
  • the ABI of functions shouldn't be specified in the source code; only the ABI of structs consumed by libraries that are pre-compiled as static or dynamic libraries should be
  • the compiler should be allowed to use whatever ABI it wants internally (match the ABI of the interface, use the Rust ABI, or a mix…)
  • the ABI of a pre-compiled library (static or dynamic) should be set with a rustc/cargo flag when compiling the library, and would be common to all entry points of the library (you can always split a library to have different ABIs for different entry points). This flag would control both the ABI of the public functions and that of the public structs.
  • the ABI when consuming a library in source form should be left to the compiler (it is not considered an interface)
  • the ABI when consuming a pre-compiled library (static or dynamic) should be set in Cargo.toml, as an additional option in the dependency section, and all structs passed as arguments to those functions should have a matching #[repr(...)] (otherwise it's a compilation error).
  • the only place where it is allowed (and required) to add #[repr(...)] is on structs that are used as arguments to functions of pre-compiled libraries.
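
The last two bullets can be illustrated with what Rust already offers today. This is only a sketch: `Point` and `point_manhattan` are hypothetical names, and the function is defined in the same file as a stand-in for an entry point that would really live in a pre-compiled library. The per-dependency ABI option in Cargo.toml proposed above does not exist, so the ABI is still spelled out in the source here.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical struct that crosses the library boundary.
/// `repr(C)` pins its field order, size and alignment to the C rules.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Point {
    pub x: i32,
    pub y: i32,
}

/// Stand-in for an entry point of a pre-compiled library that uses the C ABI.
pub extern "C" fn point_manhattan(p: Point) -> i32 {
    p.x.abs() + p.y.abs()
}

/// Returns (size, alignment) of `Point`, the layout facts a consumer relies on.
pub fn point_layout() -> (usize, usize) {
    (size_of::<Point>(), align_of::<Point>())
}
```

Without the `#[repr(C)]` attribute the compiler would be free to pick a different layout for `Point`, which is exactly the freedom the proposal wants to keep for everything that is not part of an interface.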

Two (possibly dumb) questions:

  • How do you write glue code? It looks like libA_core is unable to call into libB or libC, so if libB tries to pass through to libC, it can't be done.
  • Would this prevent the compiler from doing things intelligently? Under the earlier suggestions, the code (including the representation annotations) is available to the compiler from the start (the pieces are likely to be in the same crate, if not in the same module), whereas this proposal sounds like the compiler will only be able to do linkage-level optimization tricks. If the compiler has full information and access right from the start (with the source code), it may be able to choose a layout in memory that is in the intersection of both ABIs¹, making the translation a zero-cost abstraction.

¹ The example I'm thinking about involves byte alignment of certain types. Some ABIs require alignment along some byte boundary, whereas others will permit misalignment. If the compiler is aware that some pair of objects are going to be interrelated in some manner, it may choose to align everything along the byte boundary that both ABIs find acceptable.
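
To make the alignment point concrete, here is a small sketch using the layout attributes Rust has today. The struct names are made up for illustration, and the numbers in the comments assume a mainstream target where `u32` is 4-byte aligned.

```rust
use std::mem::{align_of, size_of};

/// C layout with natural alignment: 3 padding bytes follow `tag`.
#[repr(C)]
pub struct Natural {
    pub tag: u8,
    pub value: u32,
}

/// Packed layout, as an ABI that permits misalignment might use.
#[repr(C, packed)]
pub struct Packed {
    pub tag: u8,
    pub value: u32,
}

/// Over-aligned layout: also acceptable to any consumer that only
/// requires 4-byte alignment.
#[repr(C, align(8))]
pub struct OverAligned {
    pub tag: u8,
    pub value: u32,
}

/// Returns (size, alignment) for each of the three layouts above.
pub fn layouts() -> [(usize, usize); 3] {
    [
        (size_of::<Natural>(), align_of::<Natural>()),
        (size_of::<Packed>(), align_of::<Packed>()),
        (size_of::<OverAligned>(), align_of::<OverAligned>()),
    ]
}
```

A value aligned to the stricter boundary satisfies both a strict and a lax ABI, which is the kind of "intersection" layout the footnote has in mind.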

#[repr] is still useful even in pure statically-linked Rust for unsafe code.

@robinm But as a more fundamental question: if you compile a crate that links against a distributed-as-binary Rust library, how does the compiler know what API and ABI that library has?

Should the compiler determine it from metadata embedded into the library binary, like today? That could work, but the metadata format is currently an implementation detail, which is possible because you can currently only link against binaries compiled with the same compiler version. This format would have to be stabilized.

Or would there be some kind of "interface crate", like a -sys crate or a C header file, that sets out the API and ABI in source form?

That's right, I should have said that #[repr(...)] should be used for manipulating any externally defined binary object (libraries, network packets, blobs, …).

For optimization purposes, it could also be necessary to specify things (like alignment) even for structs used purely internally, but I think this is a much rarer need (that still needs to be supported).

Both libB and libC are just glue code. Let's say that you have foo(), which needs to use ABI B, and bar(), which needs to use ABI C. libA_core would define both foo() and bar(), then libB would re-export foo() with the correct ABI (ABI B) and libC would re-export bar() with ABI C. At most (if the compiler cannot do it explicitly), libB and libC would just convert the code from a given ABI to something that isn't based on a specific ABI. It's a bit like the current crates that just create an unsafe -> safe wrapper around C libraries.

I don't think so. It may require the cooperation of the build system, however. If your code consumes something from libfoo, which uses the C ABI, and then sends it to libbar, which also uses the C ABI, it should be possible for cargo to tell rustc to use the C ABI everywhere, instead of converting from the C ABI to the Rust ABI for internal use and then doing the reverse (Rust -> C ABI) when sending it to libbar.
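
A rough sketch of that split, with each library modelled as a module so that it fits in one file. The bodies of foo() and bar() are invented, and `extern "C"` and `extern "system"` merely stand in for "ABI B" and "ABI C"; under the proposal the ABI would come from a build flag rather than from the source.

```rust
/// Hypothetical libA_core: the ABI-agnostic implementation,
/// not meant to be consumed directly.
mod lib_a_core {
    pub fn foo(x: i32) -> i32 {
        x + 1
    }

    pub fn bar(x: i32) -> i32 {
        x * 2
    }
}

/// Hypothetical libB: thin wrapper re-exporting foo() with "ABI B".
pub mod lib_b {
    pub extern "C" fn foo(x: i32) -> i32 {
        super::lib_a_core::foo(x)
    }
}

/// Hypothetical libC: thin wrapper re-exporting bar() with "ABI C".
pub mod lib_c {
    pub extern "system" fn bar(x: i32) -> i32 {
        super::lib_a_core::bar(x)
    }
}
```

A consumer then sees `lib_b::foo` as an `extern "C" fn(i32) -> i32` and `lib_c::bar` as an `extern "system" fn(i32) -> i32`, while libA_core itself never mentions an ABI.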

I cannot answer this question properly, because I don't know enough about how it is done today for the various existing ABIs. From my understanding, it needs to be passed explicitly.

In any case, if we want to have Rust libraries that use the full type system, a Rust-compatible ABI needs to be stabilized. Currently the only stable ABI that rustc supports is the C ABI, which isn't rich enough. A possible Swift ABI would make it possible to use the full type system of Rust (according to my understanding of the current discussion), but wouldn't be the most efficient for Rust <-> Rust communication.

Forking from your main points... Actually, I think you just hit on one of the features that a stable ABI should have! What if every binary is guaranteed to have a function with a signature similar to the following:

// Returns a pointer to a NUL-terminated, UTF-8 encoded JSON string.
#[no_mangle]
pub extern "C" fn meta() -> *const c_char;

The return value is a JSON formatted UTF-8 encoded string that contains all the meta information to interface to the binary. Since it is guaranteed to exist, you would just ask the binary for the information on its API/ABI/etc. The ABI information could be parsed out automatically, as could the interface. Tools could be developed that automatically generate the appropriate interfaces for different languages (kind of like swig), etc.

Possible things that could be put into the returned string:

  • A meta section that contains a semver compliant value to indicate what the version of the returned schema is. This could contain additional information about the schema itself, and has nothing to do with the binary that is being discussed. Additional useful information could include a URL to where the schema definition can be found, the issue tracker for the schema, etc.
  • A section that contains meta information about the binary itself. E.g., the version, the authors, the issue tracker, code-signing, etc.
  • The API. I'm personally partial to something that is written in JSON which will permit maximal information to be stored with the API (e.g., in addition to the machine-parseable portion, you could have free-form natural-language documentation for each part of the API stored with the API. You could even have documentation in many different languages, with the end user being able to select the language of their choice). The machine-consumable portion of the JSON spec would probably need to be an offshoot of DWARF or some other debugging spec, instead of ELF. Note that the binary itself could still be fully stripped ELF or some other platform-dependent format; I'm just talking about the contents of the JSON string here.

This idea would need to be very carefully considered, as there are numerous implications. First off, a badly chosen schema will be a security risk. For example, if the natural-language documentation was written in HTML with full JavaScript, then the binary could be safe, but the documentation could be dangerous. This isn't something that we normally need to consider. Furthermore, any standards chosen need to have a version number that they are pegged to so that we can unambiguously state what the contents of meta() are, and how to interpret them. Finally, I don't know how much this would increase the size of the binary being produced, which could be a big problem for embedded systems. In short, this is a starting point that requires significant design thought before proceeding.

I don't think I'm being clear enough. Suppose you have libraries libB and libC, with libB using ABI_B and libC using ABI_C. libC defines the function really_useful_function() that libB would like to use. Using the original proposal, it would be possible to write glue code that translated from ABI_B to ABI_C fairly easily. How do I do that using your proposal?
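
A minimal sketch of what such an entry point could look like, assuming it is exported with the C calling convention and hands back a pointer to a static, NUL-terminated string. The JSON keys (`meta`, `schema_version`, `binary`, `api`) and the helper `read_meta` are invented for illustration; no schema has been proposed.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical metadata document, stored with a trailing NUL byte
/// so that it can be handed out as a C string.
static META_JSON: &[u8] =
    b"{\"meta\":{\"schema_version\":\"0.1.0\"},\"binary\":{\"version\":\"1.2.3\"},\"api\":[]}\0";

/// The entry point every binary would provide under this idea.
pub extern "C" fn meta() -> *const c_char {
    META_JSON.as_ptr() as *const c_char
}

/// Consumer side: ask the binary for its metadata and copy it into a String.
pub fn read_meta() -> String {
    // SAFETY: meta() returns a pointer to a static, NUL-terminated buffer.
    let c_str = unsafe { CStr::from_ptr(meta()) };
    c_str
        .to_str()
        .expect("metadata must be valid UTF-8")
        .to_owned()
}
```

A real export would also need an unmangled symbol name so that other languages can find it, and a tool would feed the returned string to a JSON parser rather than just copying it.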

My premise was "You have a single library, libA, that for some reason (I don't see why this requirement would exist) has some entry points that need to use ABI B and some others that need to use ABI C". So you split it in three: libA_core, libB and libC. libB and libC are there exclusively to be the glue code around the entry points of libA_core. So by construction "libC defines the function really_useful_function() that libB would like to use." cannot exist (really_useful_function() would be part of libA_core).

That's what I thought you were thinking.

The issue is that there is a lot of useful code floating around out in the wild that is unlikely to ever be upgraded and which only has one ABI available. We need some way of continuing to use it from other languages in spite of this fact. FFI can help with this, but it both requires the programmer to do the work manually and requires that the programmer fully understand the ABIs that are in use (which is never guaranteed). That's why those use cases were brought up in earlier comments in this thread. Your proposal doesn't address them.

You are speaking about something else. What you want to do is to be able to consume libC using ABI C from libB. How libB is consumed is independent of the ABI of the libraries it consumes.

libC is compiled in a way that exposes ABI C to its consumers. libB consumes it using ABI C. libB is compiled in a way that exposes ABI B to its consumers. Am I missing something?

I'm not sure if you're missing something, or if we're talking past one another at this point. Your original posts didn't make it clear to me how you would glue libB and libC together; the only thing I saw was that libB could call some functions in libA, and libC could call some functions in libA. I didn't see any method for libA to call functions in both libB and libC because your proposal didn't appear to have a method for specifying more than one ABI at a time (if I'm wrong, please correct me). Does this make sense?

I just had a sudden thought (not good when you've been coding for 14+ hours!). How do function pointers cross an ABI boundary? What about closures? I'm specifically thinking about cases where someone tries to pass a closure as a callback between libraries that have different ABIs. If there is an obvious answer that I'm missing, I apologize, but I really have been coding for about 14+ hours today, so I'm starting to miss the obvious.

The only consistent answer to that is that the calling convention of the function pointer (extern "Rust", extern "C", extern "Swift") is part of the function type, the same as layout (#[repr(Rust)], #[repr(C)], #[repr(Swift)]) is part of the struct type.
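
In current Rust this already holds for function pointers: `fn(i32) -> i32` and `extern "C" fn(i32) -> i32` are different types. Closures have no such pointer type, so the usual workaround is a trampoline. The sketch below assumes a hypothetical C-style callback API (`CCallback`, `apply_c`) that takes a function pointer plus an opaque user-data pointer; `apply_c` is defined locally as a stand-in for a foreign library.

```rust
use std::os::raw::c_void;

/// Callback type with the C calling convention, as a foreign library might declare it.
pub type CCallback = extern "C" fn(user_data: *mut c_void, x: i32) -> i32;

/// Stand-in for a function exported by a library that uses the C ABI.
pub extern "C" fn apply_c(cb: CCallback, user_data: *mut c_void, x: i32) -> i32 {
    cb(user_data, x)
}

/// Trampoline: an `extern "C"` function, instantiated once per closure type,
/// that recovers the closure from the opaque pointer and calls it.
extern "C" fn trampoline<F: FnMut(i32) -> i32>(user_data: *mut c_void, x: i32) -> i32 {
    // SAFETY: `user_data` was created from a live `&mut F` in `call_with_closure`.
    let f = unsafe { &mut *(user_data as *mut F) };
    f(x)
}

/// Passes a Rust closure across the C-ABI boundary via the trampoline.
pub fn call_with_closure<F: FnMut(i32) -> i32>(mut f: F, x: i32) -> i32 {
    let user_data = &mut f as *mut F as *mut c_void;
    apply_c(trampoline::<F>, user_data, x)
}
```

The closure itself never changes calling convention; only the trampoline, whose ABI is part of its type, crosses the boundary.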

If libC calls a function of libA, it depends on libA. If libB calls a function of libA, it depends on libA. If libB also calls a function of libC, libB also depends on libC. I don't see how this is not covered by my initial post.

My proposal didn't say that you can have only one ABI at a time. Maybe it wasn't explicit enough, but the idea is:

  • For each dependency, you consume it globally with a single ABI.
  • Each dependency can be consumed using a different ABI.
  • When compiling a library crate, all the public functions are compiled using a given ABI (and the consumer must match this ABI). All structs that can be consumed by those public entry points must be compatible with that ABI, otherwise it's a compile error. The source code of the entry points should be ABI-agnostic, but once compiled, they are compiled for a given ABI.
  • A given library can be compiled multiple times with different flags to create multiple binaries, each one exposing a different ABI.
  • The public ABI of a compiled library doesn't need to match the ABI of the consumed dependencies.

Example:

libD, when compiled, exposes ABI D. libA depends on libD and, when compiled, exposes ABI A. libB depends on libA and, when compiled, exposes ABI B. libC depends on libA and libB; it consumes libA using ABI A and libB using ABI B and, when compiled, exposes ABI C. Finally, our binary depends on libB and libC and consumes them using ABI B and ABI C respectively.

                                 /-> libB <- ABI B <-+--------------------\
libD <- ABI D -> libA <- ABI A <+                     \                    +-> binary
                                 \---------------------+-> libC <- ABI C -/
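
Part of this chain can be mimicked in a single file to show that the ABI a library exposes is independent of the ABI it consumes. Everything here is a stand-in: the modules `dep_d`, `dep_a` and `dep_b` play the roles of libD, libA and libB, their function bodies are invented, and `extern "C"`, `extern "system"` and the plain Rust ABI play the roles of ABI D, ABI A and ABI B.

```rust
/// Stand-in for libD: exposes "ABI D" (here `extern "C"`).
pub mod dep_d {
    pub extern "C" fn base() -> u32 {
        1
    }
}

/// Stand-in for libA: consumes libD through ABI D,
/// but exposes "ABI A" (here `extern "system"`).
pub mod dep_a {
    pub extern "system" fn plus_one() -> u32 {
        // The function pointer type spells out the ABI being consumed.
        let base: extern "C" fn() -> u32 = super::dep_d::base;
        base() + 1
    }
}

/// Stand-in for libB: consumes libA through ABI A,
/// and exposes "ABI B" (here the default Rust ABI).
pub mod dep_b {
    pub fn doubled() -> u32 {
        let plus_one: extern "system" fn() -> u32 = super::dep_a::plus_one;
        plus_one() * 2
    }
}
```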

There's one small problem with "just globally setting ABI/call convention/repr of all entry points": #[repr(Rust)] isn't completely unspecified.

If I have

use std::{ffi::c_void, ptr};
type ErasedPtr = ptr::NonNull<c_void>;
enum ErasedMaybePtr { Ptr(ErasedPtr), Null }

I can assume that sizeof(ErasedMaybePtr) == sizeof(*mut c_void) and that transmute between ErasedMaybePtr and *mut c_void is sound.

If this type is used at the API boundary, it would be incorrect to compile it with an ABI/repr that doesn't provide this documented and guaranteed behavior of #[repr(Rust)].
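
A quick sketch of the kind of code that relies on this. It uses `Option<NonNull<c_void>>`, for which the standard library documents the pointer-sized layout, alongside the hand-written enum from above; the helper names are made up.

```rust
use std::ffi::c_void;
use std::mem::{size_of, transmute};
use std::ptr::NonNull;

type ErasedPtr = NonNull<c_void>;

/// The hand-written equivalent of `Option<ErasedPtr>` discussed above.
#[allow(dead_code)]
pub enum ErasedMaybePtr {
    Ptr(ErasedPtr),
    Null,
}

/// Sizes of the raw pointer, the `Option` form and the hand-written enum.
pub fn sizes() -> (usize, usize, usize) {
    (
        size_of::<*mut c_void>(),
        size_of::<Option<ErasedPtr>>(),
        size_of::<ErasedMaybePtr>(),
    )
}

/// Reinterprets the optional pointer as a raw pointer, with `None` becoming null.
pub fn option_to_raw(p: Option<ErasedPtr>) -> *mut c_void {
    // SAFETY: `Option<NonNull<T>>` is documented to have the same layout
    // as `*mut T`, with `None` represented as the null pointer.
    unsafe { transmute::<Option<ErasedPtr>, *mut c_void>(p) }
}
```

Unsafe code like `option_to_raw` silently depends on the layout chosen for the type, so swapping that layout with a global flag would break it without any change in the source.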

I absolutely agree.

When using transmute, you used an unsafe block, but you didn't validate the invariant. If transmute were a safe function, and if you compiled your crate with an ABI that doesn't provide that guarantee, the compiler would have stopped you.