A Stable Modular ABI for Rust

I discovered that there is an existing crate that provides a stable Rust-to-Rust ABI. I think it's relevant to the discussion.

1 Like

Just noting, this was mentioned in the first post:

(abi-stable-crates is the repository for the abi-stable crate)

The big downside to this crate, in my opinion, is that it requires annotating every type used and using non-std equivalents of std types like Option, Result, etc.
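To make the annotation burden concrete, here is a minimal hand-rolled sketch of what a layout-defined stand-in for Option could look like. The name FfiOption and the function checked_half are made up for illustration; this is not the abi-stable crate's actual API, only the general shape of the idea.

```rust
/// Hypothetical FFI-safe stand-in for `Option<T>` (name invented for this sketch).
/// `repr(C, u8)` pins the layout to a `u8` tag followed by a union of the payloads,
/// instead of leaving it to the compiler.
#[repr(C, u8)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FfiOption<T> {
    None,
    Some(T),
}

impl<T> From<Option<T>> for FfiOption<T> {
    fn from(o: Option<T>) -> Self {
        match o {
            Some(v) => FfiOption::Some(v),
            None => FfiOption::None,
        }
    }
}

impl<T> From<FfiOption<T>> for Option<T> {
    fn from(o: FfiOption<T>) -> Self {
        match o {
            FfiOption::Some(v) => Some(v),
            FfiOption::None => None,
        }
    }
}

/// An exported function can then mention only layout-defined types.
pub extern "C" fn checked_half(x: u32) -> FfiOption<u32> {
    if x % 2 == 0 {
        FfiOption::Some(x / 2)
    } else {
        FfiOption::None
    }
}
```

Every type that crosses the boundary needs this kind of treatment, plus conversions at each call site, which is the cost described above.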

Whoops! I read it, forgot it, then rediscovered the crate in another thread!

1 Like

The recurring question about developing plugins as dylib crates got me to think that any widely adopted ABI stability solution in Rust ultimately has to aim for an ABI-stable subset of std, installed as a shared library with stable soname, symbols, and possibly stable metadata for generics, inlines, etc. Otherwise, any dynamically loaded binary needs to statically link a subset of std code that it uses, resulting in image bloat and possible opportunity costs in poorer branch prediction, cache misses etc.

Similarly, popular crates shared by many plugins would need to be provided in ABI-stable variants, either by the crate maintainers themselves or as third-party respins.

4 Likes

I see your point, but I think that might have its own issues, mainly with std changing over time in slightly incompatible ways, which leads to versioning hell (think of a patch release that fixes a bug, but which breaks user code that worked around/with the bug). That said, shared libraries are necessary to reduce bloat. My suggestion is that we borrow from Apple's bundles idea, and develop a method of adding to the set of libraries at all times, while distinguishing different released versions.

The issue with this is that most operating systems aren't set up for working with bundles. A way around the issue is to adopt a filesystem-in-a-file approach. The 'filesystem' consists of a concatenation of loadable libraries. The individual libraries are serialized using something like Flatbuffers, which makes loading much easier; you jump to an offset within the file, and load the library as if it were an ordinary file. Alternatively, something like CBOR could be used, but that would require the loader to do a certain amount of copying.

Updating the 'filesystem' file would just involve concatenating the new version to the file, and updating a look up table. Additional things could be added as well (digitally signing individual libraries, etc.), but as a minimum viable product, a serialized dictionary of bytestrings is probably good enough.
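As a rough sketch of that minimum viable product, the following keeps everything in memory: library images are appended to one byte buffer and found again through a lookup table keyed by name and version. LibArchive and its methods are hypothetical names, and serialization, signing, and actual loading are all left out.

```rust
use std::collections::HashMap;

/// Hypothetical append-only container: library images are concatenated into
/// one byte buffer and located through a lookup table.
#[derive(Default)]
pub struct LibArchive {
    blob: Vec<u8>,
    // (library name, version string) -> (offset, length) within `blob`
    index: HashMap<(String, String), (usize, usize)>,
}

impl LibArchive {
    pub fn new() -> Self {
        Self::default()
    }

    /// Appends a new image; bytes already in the archive are never moved.
    pub fn append(&mut self, name: &str, version: &str, image: &[u8]) {
        let offset = self.blob.len();
        self.blob.extend_from_slice(image);
        self.index
            .insert((name.to_string(), version.to_string()), (offset, image.len()));
    }

    /// Jumps to the recorded offset and returns the image without copying it.
    pub fn get(&self, name: &str, version: &str) -> Option<&[u8]> {
        let key = (name.to_string(), version.to_string());
        let &(offset, len) = self.index.get(&key)?;
        self.blob.get(offset..offset + len)
    }
}
```

Because old entries stay where they are, several versions of the same library can coexist and be looked up side by side.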

1 Like

This sounds suspiciously like the composable ostree filesystems used in Flatpak. In fact, this could be a way of providing the Rust shared lib environment exactly up to the compiler version that the application was built with, as required by the application's manifest. Different versions of Rust shared libs can be installed side by side as needed for the applications. The rest of the world is not there yet, I think; maybe Microsoft's SxS, the most underappreciated Windows feature out there, could be similarly utilized?

2 Likes

Thank you for the link! I'd heard of Flatpak, but had never bothered to look into it before. And I agree, ostree looks like a much more advanced version of what I was thinking about. It might be worth investigating what it and all the other solutions out there do before settling on an ABI.

A small note: Flatpak is a fairly complex project implementation-wise, and has had quite a few security vulnerabilities. If a similar system were implemented for distributing Rust, special care would have to be taken.

My summary of the past few posts, correct me if I'm missing something:

  • Multiple ABIs can be defined among multiple libraries; the compiler should be able to 'hoist' functions across ABI boundaries. Rust's main ABI was proposed to be named 'OneRing'.
  • The ABI information for each lib is embedded in that lib's binary, along with other information.
  • A subset of the Rust standard library should be ABI compliant - different libs might use different versions of std - different versions of dylibs should be bundled, flatpak'd or the like.

I'm working on a revised spec incorporating the wonderful discussion we've had; I hope to release it soon.

4 Likes

Here's an interesting article semi-relevant to the discussion. Specifically this quote:

Is C, the language the kernel is for the most part written in, being displaced by the likes of Go and Rust, such that there is "a risk that we're becoming the COBOL programmers of the 2030s?" Hohndel asked. "C is still one of the top 10 languages," answered Torvalds. However, he said that for things "not very central to the kernel itself", like drivers, the kernel team is looking at "having interfaces to do those, for example, in Rust... I'm convinced it's going to happen. It might not be Rust. But it is going to happen that we will have different models for writing these kinds of things, and C won't be the only one."

Would a stable ABI increase the possibility of Rust attaining the position necessary for use in the Linux kernel?

Not for the Linux kernel, no. Linux is built from source, and the modules and the kernel will generally be built together. If you upgrade the version of Rust, the next time you build the kernel you'll use the new Rust; the problem of linking old Rust and new Rust together doesn't apply.

It is important to have a robust API/FFI, making it easy to glue C and Rust together, and pass around safe data structures like counted buffers without writing a lot of repetitive unsafe glue code.

8 Likes

I don't think so. Rust already has the C ABI it needs for direct interfaces. As far as I remember, the main concerns were bootstrapping the build system, integration into the kernel, and an ecosystem that was generally too immature for these particular tasks (many no_std use cases in particular only became attainable with 1.36 or even later).

I think the stable ABI would make a bigger difference for the feasibility of making a new kernel in Rust, if I understand correctly. Not that the unstable ABI has stopped people like the Redox folks from doing it.

1 Like

Good point, and I 100% agree that special care should be taken! I view all of the container formats out there as good research subjects; we look at what is available, and either adopt one that is already out there, or design a new one using what we learn from what is available.

How about defining a stable ABI subset for "easy" parts of Rust, such as non-inline functions, non-generic structs?

Then std could be split internally into std-abi-stable and std-hard-parts. Crates could link to std-abi-stable dynamically to reduce bloat, while linking std-hard-parts statically. I imagine over time, as more ABI and std features are set in stone, more things would be moved to std-abi-stable part.

For example, path functions in libstd are generic over P: AsRef<Path>, but std already uses a pattern of

fn join<P: AsRef<Path>>(&self, path: P) -> PathBuf { self._join(path.as_ref()) }

to reduce generics bloat and defer the implementation to a non-generic _join method. That non-generic implementation could be moved to an ABI-stable library without the need to support generics in the ABI.
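A self-contained sketch of the same split, using a made-up Dir type rather than std's own Path: the generic shim is compiled into each caller, while join_inner has one concrete signature and is the only part that would have to live behind a stable ABI. The inline attributes are hints showing the intent, not guarantees.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical type for illustration only.
pub struct Dir {
    base: PathBuf,
}

impl Dir {
    pub fn new(base: impl Into<PathBuf>) -> Self {
        Dir { base: base.into() }
    }

    /// Generic shim: monomorphized in the calling crate, so it never needs
    /// to appear in the shared library's ABI.
    #[inline]
    pub fn join<P: AsRef<Path>>(&self, p: P) -> PathBuf {
        self.join_inner(p.as_ref())
    }

    /// Non-generic body: a single symbol with a concrete signature, which is
    /// the candidate for an ABI-stable shared library.
    #[inline(never)]
    fn join_inner(&self, p: &Path) -> PathBuf {
        let mut out = self.base.clone();
        out.push(p);
        out
    }
}
```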

This is similar to how C libraries use macros and inline functions in the .h files. The header files are the API, but not everything is in the ABI. The ABI is only for whatever these macros expand to.

5 Likes

@kornel I do want to see a "safe ABI" subset, which is bigger than C and smaller than fully general Rust.

However, having a shared library for cases like _join would require making the internal _join interface stable, which would increase our stability surface area. And in practice, I don't think most people who want shared library support specifically want it to save disk storage space. They do sometimes want it to save RAM, and a shared library might help a little with that when multiple Rust programs are running simultaneously. But mostly, shared libraries make distribution maintenance easier: you can upgrade a library without rebuilding the world. This wouldn't necessarily solve that problem. And I think it would in practice make distribution of Rust binaries harder for many people, because then they'd need to supply the matching library.

1 Like

The interesting ABI question when talking about kernels, to my mind, is: Suppose a kernel written entirely in Rust + assembly, instead of C + assembly. What subset of Rust types can safely appear in the signatures of system calls, such that user space programs written in Rust can use them with no friction (and ideally without unsafe), but it's still possible to write user space programs in other languages?

It's important to think about both passing and returning values here. For instance, passing &str and &[u8] is straightforward, but being able to return str and [u8] is also desirable and might be a huge pain.
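One way to see why returning owned data is the painful direction: the callee has to hand over pointer, length, and capacity, and the caller has to give the buffer back to the same allocator to free it. The sketch below assumes both sides share one allocator; OwnedBytes, make_greeting, and free_bytes are hypothetical names.

```rust
use std::mem::ManuallyDrop;

/// Hypothetical owned byte buffer handed across an ABI boundary.
#[repr(C)]
pub struct OwnedBytes {
    ptr: *mut u8,
    len: usize,
    cap: usize,
}

impl OwnedBytes {
    /// Borrows the contents for as long as the buffer has not been freed.
    pub fn as_slice(&self) -> &[u8] {
        // Safety: in this sketch the fields are only ever set by
        // `make_greeting`, from a live Vec<u8>.
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}

/// Returns an owned buffer; the Vec's destructor is suppressed so the
/// allocation survives the return.
pub extern "C" fn make_greeting() -> OwnedBytes {
    let mut v = ManuallyDrop::new(b"hello".to_vec());
    OwnedBytes {
        ptr: v.as_mut_ptr(),
        len: v.len(),
        cap: v.capacity(),
    }
}

/// Gives the buffer back to the library that allocated it.
pub extern "C" fn free_bytes(b: OwnedBytes) {
    // Safety: rebuilds the Vec from the exact pointer, length, and capacity
    // that `make_greeting` recorded, then drops it.
    unsafe { drop(Vec::from_raw_parts(b.ptr, b.len, b.cap)) }
}
```

A caller in another language has to remember the free_bytes call and must never free the pointer itself, which is the friction the post is pointing at.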

5 Likes

This got me thinking about other possibilities; Erlang has the ability to hotswap code. For servers, kernels, and other very long running processes, this could be a Good Thing™. Can the current ABI support this kind of use? If not, what would be required to make this a standard?

I want to toss i128/u128 into the mix here, not because they are all that hard, but because they illustrate that Rust has been extended in the past, and could be again in the future (I don't remember when those types became stable, but I do remember them not being available when I started programming in Rust). If this happens again in the future, a stable and immutable ABI will become a burden.

Thinking about all of this really got me thinking about what we're trying to solve. In my mind, we're trying to turn software into a bolt or a screw; a utilitarian object whose interface and guarantees are easy to determine. The ABI is a method of determining the interface and guarantees of the object under examination (library, application, whatever) in a forwards-compatible manner, so that you can decide at runtime if two 'things' are actually able to communicate, and decide if you can replace one instance with another. The closest analogy I can think of is how I can swap one bolt for another bolt in an engine, provided I know both the size and threading of the bolt (the interface), and the yield strength required (the version). If I try to put in the wrong size bolt, it's immediately obvious that the interfaces don't work; likewise, a loader that can't find certain symbols in a library that an application is asking for will fail to load the application.

But finding symbols is the easy part; the hard part is determining whether the intent of the interface is unchanged. Semver exists in part to tell us whether the intent of an interface has changed. For example, if fn foo() printed Hello, World in 1.0 but now prints Frobnicate, the version number tells me whether the intent has changed, something the loader cannot determine on its own (was the change a fix for a spelling error, or will it have a major impact on how the function is used?). The versioning information is actually the more important part, because computers can't decide intent; only the programmer can. So we need a machine-parseable versioning interface that is guaranteed to remain stable across all versions of the ABI. Ordinary SemVer may not be sufficient (you need a total ordering).
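A minimal sketch of a machine-readable version record with a total order. The three-field layout and the compatibility rule in can_replace are assumptions made for this example, not something the thread has settled on.

```rust
/// Hypothetical version record with a fixed layout. The derived `Ord`
/// compares fields in declaration order, which gives a total order.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct AbiVersion {
    pub major: u32,
    pub minor: u32,
    pub patch: u32,
}

/// Assumed rule for this sketch: a provided library can stand in for the
/// required one if the major version matches and it is not older.
pub fn can_replace(required: AbiVersion, provided: AbiVersion) -> bool {
    required.major == provided.major && provided >= required
}
```

A loader could run a check like this before resolving any symbols, rejecting a library whose declared intent has changed even if all the symbol names still match.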

But that is only half the story. If I continue with the machine analogy, think of a large machine like a ship. It is common for a vessel to be undergoing some kind of maintenance while it is in active use. Erlang recognizes this, and has a method for performing maintenance while the machine is in use. But as far as I know, there is no common ABI specification that permits this to happen. We do have something similar in that we start and stop applications, and in some cases an application is able to save its state in a form that lets a later version pick up where the first stopped, but it would be nice to be able to replace the security module of a running webserver without having to stop and restart the server.
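Within a single process, the replace-while-running idea can be approximated with one level of indirection. The sketch below only shows that indirection; it does not load code from a shared library, and SecurityModule, Slot, AllowAll, and DenyGuests are all hypothetical names.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical interface of the swappable component.
pub trait SecurityModule: Send + Sync {
    fn allow(&self, user: &str) -> bool;
}

pub struct AllowAll;
impl SecurityModule for AllowAll {
    fn allow(&self, _user: &str) -> bool {
        true
    }
}

pub struct DenyGuests;
impl SecurityModule for DenyGuests {
    fn allow(&self, user: &str) -> bool {
        user != "guest"
    }
}

/// Holds the module currently in service.
pub struct Slot {
    current: RwLock<Arc<dyn SecurityModule>>,
}

impl Slot {
    pub fn new(module: Arc<dyn SecurityModule>) -> Self {
        Slot { current: RwLock::new(module) }
    }

    /// Callers take a handle; requests already in flight keep the old module
    /// alive until they finish with it.
    pub fn get(&self) -> Arc<dyn SecurityModule> {
        self.current.read().unwrap().clone()
    }

    /// Installs a replacement without stopping the callers.
    pub fn swap(&self, module: Arc<dyn SecurityModule>) {
        *self.current.write().unwrap() = module;
    }
}
```

Doing the same across separately compiled binaries is where the ABI comes in: the trait's vtable layout and the version of the interface would have to be agreed on by both sides.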

1 Like

Most languages use the C ABI for FFI, so we have to guarantee that our structs and enums fit it. If we want to use dynamically sized types, for example &str or &[u8], we might want an analog of C's struct Dynamic { int size; void* data; }, but that part is obvious. Things become more complicated if we put trait objects in syscalls: C doesn't have vtables, but it does have function pointers. We could invent a method-calling scheme: for example, provide a C function pointer void* (*call)(int, void*) that calls a method from the vtable, where the first argument is the index identifying the method and the second points to a tuple of its arguments.
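Both ideas can be sketched in a few lines. The version below is an adaptation, not the exact signature from the post: the dispatcher takes an explicit object pointer and returns a status code, and the argument is a single i64 instead of a tuple. FfiSlice, FfiObject, Counter, and the METHOD_ constants are hypothetical.

```rust
use std::os::raw::{c_int, c_void};

/// Borrowed slice as a plain C struct: pointer plus length.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct FfiSlice {
    pub data: *const u8,
    pub len: usize,
}

impl FfiSlice {
    pub fn from_bytes(b: &[u8]) -> Self {
        FfiSlice { data: b.as_ptr(), len: b.len() }
    }

    /// # Safety
    /// `data` must point to `len` readable bytes that outlive the result.
    pub unsafe fn as_bytes<'a>(&self) -> &'a [u8] {
        unsafe { std::slice::from_raw_parts(self.data, self.len) }
    }
}

/// Object handle: an opaque pointer plus one dispatcher function that
/// selects the method by index.
#[repr(C)]
pub struct FfiObject {
    pub this: *mut c_void,
    pub call: unsafe extern "C" fn(*mut c_void, c_int, *mut c_void) -> c_int,
}

pub struct Counter {
    pub value: i64,
}

pub const METHOD_ADD: c_int = 0; // arg: *mut i64, read as the amount to add
pub const METHOD_GET: c_int = 1; // arg: *mut i64, written with the current value

/// Dispatcher for `Counter`; returns 0 on success, -1 for an unknown index.
unsafe extern "C" fn counter_call(this: *mut c_void, method: c_int, arg: *mut c_void) -> c_int {
    // Safety: the caller promises `this` is a live Counter and `arg` a live i64.
    let counter = unsafe { &mut *(this as *mut Counter) };
    let slot = unsafe { &mut *(arg as *mut i64) };
    match method {
        METHOD_ADD => {
            counter.value += *slot;
            0
        }
        METHOD_GET => {
            *slot = counter.value;
            0
        }
        _ => -1,
    }
}

pub fn export_counter(c: &mut Counter) -> FfiObject {
    FfiObject { this: c as *mut Counter as *mut c_void, call: counter_call }
}
```

The index constants play the role of the vtable layout: once published, their numbering has to stay fixed, which is exactly the kind of thing an ABI specification would need to pin down.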

If we want to do non-FFI trait objects, then the ABI can be more complex, but I'm not sure what it would look like.

In the case of dynamically sized values, I don't think that we can do that with a plain stack; think of a segmented stack.

1 Like

I'm actually against this. To quote @matklad:

If this is the route that we're taking (and I really, really hope it is!), then we should ignore C ABI compatibility completely, and develop a new, clean ABI that addresses current needs. Furthermore, if you do need C ABI compatibility, we already came up with a solution; see A Stable Modular ABI for Rust above.

I tend to disagree, mainly because C already has all the primitives we actually need: we can do both struct and union in C, and that is sufficient to describe anything we need. Slices are represented as a struct of a size and a pointer to the data. I wrote the definition of a slice above.

The next point is how to deal with representation optimizations. The current Rust ABI is intentionally unspecified in order to allow these optimizations. We have a number of optimizations that change the representation; all of them could be made configurable through a special macro whose effect is determined by its input parameters only. Note that the compiler might then be forced to perform the specified optimizations, as well as to give guarantees about what they produce, leaving it little to no room to apply them at its own discretion.
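To illustrate what is at stake, the sketch below compares a struct with the default representation against the same fields under repr(C), and shows one representation optimization that Rust does guarantee today. The struct names are made up; the size of Fixed assumes a target where u32 has 4-byte alignment.

```rust
use std::mem::size_of;

/// Default representation: field order and padding are up to the compiler.
pub struct Loose {
    pub a: u8,
    pub b: u32,
    pub c: u8,
}

/// C representation: fields stay in declaration order with C padding rules.
#[repr(C)]
pub struct Fixed {
    pub a: u8,
    pub b: u32,
    pub c: u8,
}

// 1 byte + 3 padding + 4 bytes + 1 byte + 3 padding on typical targets.
pub const FIXED_SIZE: usize = size_of::<Fixed>();

// Unspecified; the compiler is free to reorder fields to shrink this.
pub const LOOSE_SIZE: usize = size_of::<Loose>();

// The null-pointer niche: `Option<&T>` is guaranteed to be pointer-sized.
pub const REF_SIZE: usize = size_of::<&u8>();
pub const OPT_REF_SIZE: usize = size_of::<Option<&u8>>();
```

Pinning a layout, whether through repr attributes or a macro, trades away the freedom that makes the smaller default layout possible.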