If libC calls a function of libA, it depends on libA. If libB calls a function of libA, it depends on libA. If libB also calls a function of libC, libB also depends on libC. I don't see how this isn't covered by my initial post.
My proposal didn't say that you can have only one ABI at a time. Maybe it wasn't explicit enough, but the idea is:
For each dependency, you consume it globally with a single ABI.
Each dependency can be consumed using a different ABI.
When compiling a library crate, all the public functions are compiled using a given ABI (and the consumer must match this ABI). All structs that can be consumed through those public entry points must be compatible with that ABI; otherwise it's a compile error. The source code of the entry points should be ABI agnostic, but once compiled, they are compiled for consuming a given ABI.
A given library can be compiled multiple times with different flags to create multiple binaries, each one exposing a different ABI.
The public ABI of a compiled library doesn't need to match the ABI of its consumed dependencies.
Example:
libD, when compiled, exposes ABI D. libA depends on libD and, when compiled, exposes ABI A. libB depends on libA and, when compiled, exposes ABI B. libC depends on libA and libB; libA is consumed using ABI A, and libB using ABI B. libC, when compiled, exposes ABI C. Finally, our binary depends on libB and libC and consumes them using ABI B and ABI C respectively.
```
                                 /-> libB <- ABI B <-+--------------------\
libD <- ABI D -> libA <- ABI A <-+                    \                   +-> binary
                                 \---------------------+-> libC <- ABI C -/
```
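For comparison, the closest mechanism Rust has today is choosing a calling convention per entry point with `extern`; the proposal generalizes this so that a library's entire public surface targets a selectable ABI per compilation. A minimal sketch (the function names are made up):

```rust
// Today, a library can already expose an entry point under a chosen
// calling convention ("C" here) while internal code keeps the default
// unstable Rust ABI. The proposal would make this choice apply, per
// compilation, to a library's whole public surface.
pub extern "C" fn lib_add(a: u32, b: u32) -> u32 {
    internal_add(a, b)
}

// Internal function: not part of the public ABI, free to use any convention.
fn internal_add(a: u32, b: u32) -> u32 {
    a + b
}

fn main() {
    assert_eq!(lib_add(2, 3), 5);
}
```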
There's one small but important problem with "just globally setting the ABI/calling convention/repr of all entry points": `#[repr(Rust)]` isn't completely unspecified.
If I have
```rust
type ErasedPtr = ptr::NonNull<c_void>;
enum ErasedMaybePtr { Ptr(ErasedPtr), Null }
```
I can assume that `size_of::<ErasedMaybePtr>() == size_of::<*mut c_void>()` and that a transmute between `ErasedMaybePtr` and `*mut c_void` is sound.
If this type is used at the ABI boundary, it would be incorrect to compile it with an ABI/repr that doesn't provide this documented and guaranteed behavior of `#[repr(Rust)]`.
When using transmute, you used an unsafe block, but you didn't validate the invariant. If transmute were a safe function, and you compiled your crate with an ABI that doesn't provide that guarantee, the compiler would have stopped you.
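The layout claim above is checkable: with current rustc, the niche optimization makes this hand-rolled option-like enum pointer-sized, with `Null` occupying the null-pointer niche. A small demonstration (it relies on exactly the `#[repr(Rust)]` guarantees under discussion being upheld):

```rust
use std::ffi::c_void;
use std::mem::{size_of, transmute};
use std::ptr::NonNull;

type ErasedPtr = NonNull<c_void>;

enum ErasedMaybePtr {
    Ptr(ErasedPtr),
    Null,
}

fn main() {
    // Niche optimization: the Null variant occupies the null-pointer
    // niche of NonNull, so the enum is exactly pointer-sized.
    assert_eq!(size_of::<ErasedMaybePtr>(), size_of::<*mut c_void>());

    // Under this layout, Null transmutes to the null pointer...
    let raw: *mut c_void = unsafe { transmute(ErasedMaybePtr::Null) };
    assert!(raw.is_null());

    // ...and Ptr transmutes to the non-null pointer it wraps.
    let x = 42u32;
    let some: *mut c_void =
        unsafe { transmute(ErasedMaybePtr::Ptr(NonNull::from(&x).cast())) };
    assert!(!some.is_null());
}
```

Compiling the same source with an ABI/repr that breaks this layout would silently invalidate the two transmutes, which is the danger the post describes.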
You're right, my mistake! Somehow I got it lodged into my brain that you could only specify one ABI at a time for all of the dependencies, which is definitely not what you said.
Now that my mistake has been resolved, I like your proposal. I'm not 100% sure how it would be handled within the compiler; that is, do we invent one ABI to rule them all, or do we have a complete graph, where the compiler directly translates from one ABI to another?
I still think that a hub and spoke topology (one ABI to rule them all) is going to be the best one. It doesn't need to be a 'real' ABI; as far as I know, LLVM IR doesn't directly run on any hardware, but it works well as the hub that connects high-level languages to real hardware.
If we do decide to go with the hub and spoke model, what needs to be in the ABI spec?
By the way, I propose that this hub ABI be officially called OneRing.
The only other question I've got about @robinm's proposal concerns debugging. I don't know enough about DWARF and other debugging formats to know whether they are ABI agnostic, so I don't know whether they can handle arbitrarily defined ABIs. That could be problematic if you're debugging a function that is internal to a crate and could therefore (under the proposal) have an arbitrary ABI. I know that you can debug Rust code under rust-lldb today (I use it regularly), but I don't know if something special had to be done to make that happen. I also don't know whether something like OneRing would make it possible to automatically debug code whose ABI was unknown to the debugger when the debugger was originally compiled.
Yes, DWARF at least is ABI agnostic: you have to manually specify the exact location of every function parameter (a register, a dereference of a register, ...) and of every type field (its offset).
Just noting, this was mentioned in the first post:
(abi-stable-crates is the repository for the abi-stable crate)
The big downside to this crate, in my opinion, is that it requires annotating every type used and using non-std equivalents of std types like `Option`, `Result`, etc.
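A minimal sketch of why such stand-ins exist: `Option<T>`'s `#[repr(Rust)]` layout is unspecified across compiler versions, so an ABI-stable interface needs an equivalent whose layout is pinned down. The `COption` below is a hypothetical illustration, not abi_stable's actual type:

```rust
// A #[repr(C)] stand-in for Option<T>: a tagged union with a specified
// layout, so it can safely cross a dynamic-library boundary even when
// the two sides were built by different compiler versions.
#[repr(C)]
pub enum COption<T> {
    None,
    Some(T),
}

impl<T> From<Option<T>> for COption<T> {
    fn from(o: Option<T>) -> Self {
        match o {
            Some(v) => COption::Some(v),
            None => COption::None,
        }
    }
}

fn main() {
    let c: COption<u32> = Some(7u32).into();
    assert!(matches!(c, COption::Some(7)));
    let n: COption<u32> = None.into();
    assert!(matches!(n, COption::None));
}
```

The annotation burden the post complains about comes from having to convert between these stand-ins and the std types at every boundary.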
The recurring question about developing plugins as dylib crates got me to think that any widely adopted ABI stability solution in Rust ultimately has to aim for an ABI-stable subset of std, installed as a shared library with stable soname, symbols, and possibly stable metadata for generics, inlines, etc. Otherwise, any dynamically loaded binary needs to statically link a subset of std code that it uses, resulting in image bloat and possible opportunity costs in poorer branch prediction, cache misses etc.
Similarly, popular crates shared by many plugins would need to be provided in ABI-stable variants, either by the crate maintainers themselves or as third-party respins.
I see your point, but I think that might have its own issues, mainly with std changing over time in slightly incompatible ways, which leads to versioning hell (think of a patch release that fixes a bug, but which breaks user code that worked around/with the bug). That said, shared libraries are necessary to reduce bloat. My suggestion is that we borrow from Apple's bundles idea, and develop a method of adding to the set of libraries at all times, while distinguishing different released versions.
The issue with this is that most operating systems aren't set up for working with bundles. A way around the issue is to adopt a filesystem-in-a-file approach. The 'filesystem' consists of a concatenation of loadable libraries. The individual libraries are serialized using something like FlatBuffers, which makes loading much easier: you jump to an offset within the file and load the library as if it were an ordinary file. Alternatively, something like CBOR could be used, but that would require the loader to do a certain amount of copying.
Updating the 'filesystem' file would just involve concatenating the new version to the file, and updating a lookup table. Additional things could be added as well (digitally signing individual libraries, etc.), but as a minimum viable product, a serialized dictionary of bytestrings is probably good enough.
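The minimum viable product described above, an append-only blob plus a lookup table, can be sketched in a few lines; the in-memory structure below stands in for the on-disk format:

```rust
use std::collections::HashMap;

// Sketch of the "filesystem in a file" idea: library payloads are
// appended to one byte blob, and a lookup table maps each name to an
// (offset, length) pair. Updating a library means appending the new
// bytes and repointing the table entry; old bytes are never rewritten.
#[derive(Default)]
struct LibStore {
    blob: Vec<u8>,
    index: HashMap<String, (usize, usize)>,
}

impl LibStore {
    fn add(&mut self, name: &str, bytes: &[u8]) {
        let off = self.blob.len();
        self.blob.extend_from_slice(bytes);
        self.index.insert(name.to_string(), (off, bytes.len()));
    }

    fn get(&self, name: &str) -> Option<&[u8]> {
        let &(off, len) = self.index.get(name)?;
        Some(&self.blob[off..off + len])
    }
}

fn main() {
    let mut store = LibStore::default();
    store.add("liba.so", b"v1 code");
    store.add("liba.so", b"v2 code"); // update: append + repoint
    assert_eq!(store.get("liba.so"), Some(&b"v2 code"[..]));
}
```

A real container would add the signing, framing, and mmap-friendly alignment mentioned above, but the append-plus-index shape is the core of the idea.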
This sounds suspiciously like the composable ostree filesystems used in Flatpak. In fact, this could be a way of providing the Rust shared lib environment exactly up to the compiler version that the application was built with, as required by the application's manifest. Different versions of Rust shared libs can be installed side by side as needed for the applications. The rest of the world is not there yet, I think; maybe Microsoft's SxS, the most underappreciated Windows feature out there, could be similarly utilized?
Thank you for the link! I'd heard of Flatpak, but had never bothered to look into it before. And I agree, ostree looks like a much more advanced version of what I was thinking about. It might be worth investigating what it and all the other solutions out there do before settling on an ABI.
A small note: Flatpak is a fairly complex project implementation-wise, and has had quite a few security vulnerabilities. If a similar system were implemented for distributing Rust, special care would have to be taken.
My summary of past few posts, correct me if I'm missing something:
Multiple ABIs can be defined among multiple libraries; the compiler should be able to 'hoist' functions across ABI boundaries. Rust's main ABI was proposed to be named 'OneRing'.
The ABI information for each lib is embedded in that lib's binary, along with other information.
A subset of the Rust standard library should be ABI compliant - different libs might use different versions of std - different versions of dylibs should be bundled, flatpak'd or the like.
I'm working on a revised spec incorporating the wonderful discussion we've had; I hope to release it soon.
Here's an interesting article semi-relevant to the discussion. Specifically this quote:
Is C, the language the kernel is for the most part written in, being displaced by the likes of Go and Rust, such that there is "a risk that we're becoming the COBOL programmers of the 2030s?" Hohndel asked. "C is still one of the top 10 languages," answered Torvalds. However, he said that for things "not very central to the kernel itself", like drivers, the kernel team is looking at "having interfaces to do those, for example, in Rust... I'm convinced it's going to happen. It might not be Rust. But it is going to happen that we will have different models for writing these kinds of things, and C won't be the only one."
Would a stable ABI increase the possibility of Rust attaining the position necessary for use in the Linux kernel?
Not for the Linux kernel, no. Linux is built from source, and the modules and the kernel will generally be built together. If you upgrade the version of Rust, the next time you build the kernel you'll use the new Rust; the problem of linking old Rust and new Rust together doesn't apply.
It is important to have a robust API/FFI, making it easy to glue C and Rust together, and pass around safe data structures like counted buffers without writing a lot of repetitive unsafe glue code.
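As a sketch of "passing around counted buffers without repetitive unsafe glue": a `#[repr(C)]` pointer-plus-length struct crosses the C boundary, and a single safe wrapper concentrates the unsafety in one place (the type and function names here are hypothetical):

```rust
use std::slice;

/// FFI-safe view of a counted buffer, the kind of structure the post
/// suggests passing between C and Rust. Layout is fixed by #[repr(C)].
#[repr(C)]
pub struct ByteBuf {
    pub data: *const u8,
    pub len: usize,
}

/// One safe wrapper concentrates the unsafe glue instead of scattering it.
pub fn checksum(buf: &ByteBuf) -> u64 {
    // SAFETY: the caller guarantees `data` points to `len` valid bytes.
    let bytes = unsafe { slice::from_raw_parts(buf.data, buf.len) };
    bytes.iter().map(|&b| b as u64).sum()
}

fn main() {
    let v = [1u8, 2, 3];
    let buf = ByteBuf { data: v.as_ptr(), len: v.len() };
    assert_eq!(checksum(&buf), 6);
}
```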
I don't think so. It already has the necessary C ABI for direct interfaces; the main concerns, as far as I remember, were bootstrapping the build system, integration into the kernel, and a generally immature ecosystem for these particular tasks (many no_std use cases in particular only becoming attainable in 1.36 or even later).
I think the stable ABI would make a bigger difference for the feasibility of making a new kernel in Rust, if I understand correctly. Not that the unstable ABI has stopped people like the Redox folks from doing it.
Good point, and I 100% agree that special care should be taken! I view all of the container formats out there as good research subjects; we look at what is available, and either adopt one that is already out there, or design a new one using what we learn from what is available.
How about defining a stable ABI subset for the "easy" parts of Rust, such as non-inline functions and non-generic structs?
Then std could be split internally into std-abi-stable and std-hard-parts. Crates could link to std-abi-stable dynamically to reduce bloat, while linking std-hard-parts statically.
I imagine over time, as more ABI and std features are set in stone, more things would be moved to std-abi-stable part.
For example, path functions in libstd are generic over `P: AsRef<Path>`, but std already uses the pattern of delegating to a non-generic `_join` method to reduce generics bloat. That non-generic implementation could be moved to an ABI-stable library without needing to support generics in the ABI.
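The pattern referred to is std's shim idiom: a generic entry point that immediately delegates to a private non-generic worker (`Path::join` calling `_join` in libstd). A self-contained sketch of the same shape, on a made-up wrapper type:

```rust
use std::path::{Path, PathBuf};

// Hypothetical wrapper to demonstrate the shim pattern outside libstd.
struct MyPath(PathBuf);

impl MyPath {
    // Generic entry point: monomorphized per caller, but tiny -- all it
    // does is coerce the argument and forward it.
    pub fn join<P: AsRef<Path>>(&self, path: P) -> PathBuf {
        self._join(path.as_ref())
    }

    // Non-generic worker: compiled once, so this is the part that could
    // live behind a stable ABI in an std-abi-stable library.
    fn _join(&self, path: &Path) -> PathBuf {
        self.0.join(path)
    }
}

fn main() {
    let p = MyPath(PathBuf::from("/usr"));
    assert_eq!(p.join("bin"), PathBuf::from("/usr/bin"));
}
```

Only the non-generic `_join` symbol would need a stable ABI; the generic shim stays in the "header" side, like a C inline function.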
This is similar to how C libraries use macros and inline functions in the .h files. The header files are the API, but not everything is in the ABI. The ABI is only for whatever these macros expand to.