For a library author, what would maintenance of a stable ABI look like?

In the context of the discussion about adding a stable ABI, I'd like to brainstorm how Rust and Cargo can better support library authors who maintain a stable ABI.

The status quo for #[repr(C)] and C libraries in general is quite poor. It boils down to having to know a bunch of platform-specific implementation details of each target's ABI, and just being careful not to break any of the invisible things. If a library author accidentally makes a breaking ABI change, the failure won't be detected at compile time. It will cause data corruption and crashes at run time. The problems may happen only in specific circumstances/usage scenarios, or only on some platforms, which increases the risk of shipping a broken ABI. This is hard to test for.
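To make the "being careful" part concrete, here is a minimal sketch of the manual guard rails available today: compile-time assertions on size, alignment, and field offsets. The `Config` struct and its fields are hypothetical, and the expected numbers are assumptions that hold on common 64-bit targets, which is why the checks are gated by `cfg`.

```rust
use std::mem::{align_of, offset_of, size_of};

/// Hypothetical struct exposed across a C ABI boundary.
#[repr(C)]
pub struct Config {
    pub version: u32,
    pub flags: u16,
    pub timeout_ms: u64,
}

// Evaluated at compile time: the build fails if an edit to `Config`
// changes its layout. The numbers assume `u64` is 8-byte aligned,
// so the block is only checked on 64-bit targets.
#[cfg(target_pointer_width = "64")]
const _: () = {
    assert!(size_of::<Config>() == 16);
    assert!(align_of::<Config>() == 8);
    assert!(offset_of!(Config, version) == 0);
    assert!(offset_of!(Config, flags) == 4);
    assert!(offset_of!(Config, timeout_ms) == 8);
};
```

This catches accidental changes on the target being compiled, but every number is maintained by hand and per platform, which is exactly the burden described above.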

I wouldn't want Rust to stop at just slapping #[repr(stable_v1)] on items and being in the same "hope it doesn't break!" boat as C. There has to be a better way!

What I imagine:

  • I would explicitly label in the source code which structs and functions are part of my stable ABI. Perhaps with fine control, like I guarantee that a field exists, but don't guarantee struct size remains constant. Or if I guarantee a struct size, I can make that a constant that automatically adds padding for later extensibility (_padding: [u8; 40] is tricky, because the right size may vary depending on field sizes and alignments that vary between platforms, and swapping padding for a field requires recalculating padding size by hand, which is error prone).

  • Rust enforces this when the library is used via the ABI (e.g. dynamically linked): it doesn't allow access to any items that aren't in the ABI, doesn't inline them (unless explicitly allowed by the ABI definition), and doesn't allow storing or passing structs by value if the ABI doesn't commit to their size (I don't want to be forced to wrap every ABI-stable type in Pin). Alternatively, I wouldn't mind having Swift-like "witness tables" for variable-sized memcpy of by-value ABI-stable types.

  • And most importantly: the ABI from the source code can be exported to a text file, something like a Cargo.lock, but describing the binary interface of the crate. The tooling would warn when the source code differs from the "locked" ABI — on any supported target, not just the current host! This will catch unexpected ABI breaks at development time. Thanks to having lock files in version control, I would be able to easily and reliably compare ABI with previous versions of the library to offer accurate changelogs and backwards-compatibility information.
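As a rough illustration of the last bullet, the sketch below renders a struct's layout as text and compares it with a "locked" copy. Everything here is an assumption for illustration: the `Header` type, the line format, and the `LOCKED` contents are invented, and the description only covers the target being compiled. Checking every supported target, as proposed above, would need compiler or Cargo support.

```rust
use std::mem::{align_of, offset_of, size_of};

/// Hypothetical ABI-stable struct.
#[repr(C)]
pub struct Header {
    pub tag: u8,
    pub len: u32,
}

/// Render the layout of `Header` on the current target in a made-up text format.
pub fn describe_header() -> String {
    format!(
        "struct Header size={} align={}\n  tag: u8 @{}\n  len: u32 @{}\n",
        size_of::<Header>(),
        align_of::<Header>(),
        offset_of!(Header, tag),
        offset_of!(Header, len),
    )
}

/// What a checked-in lock file might contain for a target where `u32`
/// is 4-byte aligned (hypothetical format).
pub const LOCKED: &str = "struct Header size=8 align=4\n  tag: u8 @0\n  len: u32 @4\n";

/// Compare the locked description with the current one, line by line,
/// and return a message for every line that differs.
pub fn abi_drift(locked: &str, current: &str) -> Vec<String> {
    let locked_lines: Vec<&str> = locked.lines().collect();
    let current_lines: Vec<&str> = current.lines().collect();
    let mut drift = Vec::new();
    for i in 0..locked_lines.len().max(current_lines.len()) {
        let l = locked_lines.get(i).copied().unwrap_or("<missing>");
        let c = current_lines.get(i).copied().unwrap_or("<missing>");
        if l != c {
            drift.push(format!("locked `{}` but found `{}`", l.trim(), c.trim()));
        }
    }
    drift
}
```

A build script or test could call `abi_drift(LOCKED, &describe_header())` and fail when the result is non-empty, turning an accidental layout change into a development-time error.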


I would love something like the last bullet. Not just for ABI but for public API in general.


This seems like the hardest requirement with regard to tooling. I'd like to embed it into the binary so that it can be loaded and checked by any dll-consumer downstream (and potentially to embed multiple versions of an API by not tying the types to static symbol names?). In fact, my dream tooling for dynamic linking would run full type-checking of the interface in a manner like the Rust compiler so that there is virtually no distinguishable difference between static and dynamic linking. Since not everyone wants that, I get that producing a separate interface description file is also desirable. This same file would then also serve as the declaration file in the consumer, but since it requires type analysis passes first it's not as simple as running include_bytes! on a path produced by a build script.
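A much weaker stand-in for that idea can be sketched today: the library bakes a fingerprint of its interface description into the binary, and the consumer compares it with the fingerprint it was built against before calling anything else. The description string, function names, and use of an FNV-1a hash are all assumptions for illustration. A real library would also need to export the function under a stable symbol name, and a hash only detects a mismatch; it does not type-check the interface.

```rust
/// Hypothetical ABI-stable type.
#[repr(C)]
pub struct Point {
    pub x: f64,
    pub y: f64,
}

/// 64-bit FNV-1a hash, written as a `const fn` so the result can be
/// computed at compile time.
const fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash = 0xcbf29ce484222325u64;
    let mut i = 0;
    while i < bytes.len() {
        hash ^= bytes[i] as u64;
        hash = hash.wrapping_mul(0x100000001b3);
        i += 1;
    }
    hash
}

// --- Library side -------------------------------------------------------

/// Made-up textual description of the interface.
pub const ABI_DESCRIPTION: &str = "Point{x:f64@0,y:f64@8;size=16}";

/// Returns the fingerprint baked into this build of the library.
pub extern "C" fn mylib_abi_fingerprint() -> u64 {
    fnv1a(ABI_DESCRIPTION.as_bytes())
}

// --- Consumer side ------------------------------------------------------

/// Fingerprint of the description the consumer was compiled against.
pub const EXPECTED_FINGERPRINT: u64 = fnv1a(b"Point{x:f64@0,y:f64@8;size=16}");

/// Refuse to use the library if its reported fingerprint differs.
pub fn check_abi(reported: u64) -> Result<(), String> {
    if reported == EXPECTED_FINGERPRINT {
        Ok(())
    } else {
        Err(format!(
            "ABI mismatch: expected {EXPECTED_FINGERPRINT:#x}, library reports {reported:#x}"
        ))
    }
}
```

The consumer would call `check_abi(mylib_abi_fingerprint())` right after loading the library and bail out on `Err`, converting silent corruption into an explicit load-time failure.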


I love this! If the text file is sufficiently detailed, you could also use it to generate bindings, kind of like swig or cbindgen do, but if this is done with #[repr(Interoperable_XXXX)] as well, then any language that obeys the representation could have a binding to any other language[1].


  1. I completely get that the text file will not be sufficient for human understanding; you still need documentation, examples, etc. to be complete, but I can see the docs being extracted and embedded into the text file, which can then be used to create a first-pass set of docs. The doc examples will probably be in the wrong language, but the generated signatures will be correct, which should give end users a way of guessing how to use it from the docs.


One thing I'm unsure about is whether starting from Rust code is the right way to maintain such an ABI, where the goal is to use the portable-between-many-languages subset.

It feels like it wants a definition file authored directly, with the code generated from that -- like a protobuf IDL for data interop, or an OpenAPI spec for RESTful things.

That is actually one of the ways that you can use swig. See the documentation about writing interface files.

As for starting from Rust code to generate the textual ABI, it doesn't have to be an either-or proposition. You could write the file by hand, generate it with a special tool, or pass special flags to rustc/cargo that cause them to emit the files for you. Doing so from the Rust code might require special attributes on the parts that form your ABI, in case the compiler isn't able to glean enough information from compiling your crate. The advantage of doing it from real code is that the compiler checks your spec itself. That can be a big advantage.

I agree that having something better is desirable, but it's worth noting that this is already the case in Rust w.r.t. breaking semver API stability. semverver exists as a tool to check a (relatively large) subset of API stability, but for the most part, we're still relying on social guardrails to catch accidental breakage.

I expect an initial implementation to look like an rdylib with extra dylib metadata sections describing the ABI. It would be very nice to have the native ABI descriptions be a native translation of the Wasm ABI descriptions / interop types; this reduces duplication between native and Wasm, and potentially enlarges the set of compilers supporting interop.

If extern "safe-1" (or extern "wasm-preview-1" if we adopt the Wasm specification wholesale) is supported in its entirety, then it acts as the check that you're only using functionality interoperable between languages that support that ABI surface. (Otherwise you'd get an improper-wasm-types lint.) I think a desirable goal of an ABI interoperability initiative would be to have an interop format optimized for machine readability first, human debuggability second, and human editing third. It's advantageous to authors to write declarations in their native provider language, and then have the tooling complain if they get too ambitious.

Requiring authors to learn a new bindings language is a step that can be avoided with a smart enough technical solution, which costs O(languages) effort to build rather than O(authors) effort to learn the bindings language.

For extern "C" bindings, imho Rust already is one of the (if not the[1]) best options for declaring an API/ABI surface. (The only thing really missing imo is a proper opaque extern type rather than polyfills.)
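For readers unfamiliar with the polyfill mentioned here, this is a minimal sketch of the usual pattern: a zero-sized `#[repr(C)]` struct with a marker field stands in for an opaque type, and the API only ever hands out pointers to it. The `Parser` API is hypothetical and implemented in Rust here so the sketch is self-contained; a real library would also export the functions under stable symbol names.

```rust
use std::marker::{PhantomData, PhantomPinned};

/// Opaque handle as seen by consumers. It has no usable fields, and the
/// marker makes it `!Send`, `!Sync`, and `!Unpin`, so it cannot be
/// constructed outside this module or used in ways the ABI never promised.
#[repr(C)]
pub struct Parser {
    _data: [u8; 0],
    _marker: PhantomData<(*mut u8, PhantomPinned)>,
}

/// Private implementation; its layout is not part of the ABI.
struct ParserImpl {
    bytes_seen: usize,
}

/// Allocate a parser and return it as an opaque pointer.
pub extern "C" fn parser_new() -> *mut Parser {
    Box::into_raw(Box::new(ParserImpl { bytes_seen: 0 })).cast()
}

/// Record `n` more bytes and return the running total.
///
/// # Safety
/// `p` must come from `parser_new` and must not have been freed.
pub unsafe extern "C" fn parser_feed(p: *mut Parser, n: usize) -> usize {
    let inner = unsafe { &mut *p.cast::<ParserImpl>() };
    inner.bytes_seen += n;
    inner.bytes_seen
}

/// Free a parser created by `parser_new`. Null pointers are ignored.
///
/// # Safety
/// `p` must come from `parser_new` and must not be used afterwards.
pub unsafe extern "C" fn parser_free(p: *mut Parser) {
    if !p.is_null() {
        drop(unsafe { Box::from_raw(p.cast::<ParserImpl>()) });
    }
}
```

Because consumers only hold `*mut Parser`, fields of `ParserImpl` can change between releases without affecting the ABI, at the cost of a heap allocation and the pointer casts that a true extern type would make unnecessary.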


If an ABI interoperability initiative does get up and going, please do ping me! I'd love to help set up tooling for translating within the {Rust, C, C++, Wasm, ABIDL} set.


  1. I've not tried Zig, and it's likely a contender due to its 100% automatic C interop rather than Rust's reliance on (c)bindgen to translate between Rust and C headers.

Or .wit files (WebAssembly Interface Types). Interface Types are designed for this use case but specifically for WebAssembly; however, it seems like it would be pretty straightforward to port the tooling to native platforms. On one hand, it's very immature, and its featureset is somewhat limited. On the other hand, it's under active development.


While I'd love to see more widespread machine checking of API stability, I'd argue that ABI mismatches are worse than API mismatches. API mismatches at least usually give you errors at compile time, whereas ABI mismatches usually produce silent memory corruption, which in my experience can be very difficult to debug...
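A small simulation of that failure mode, under stated assumptions: `RecordV1` is the layout a stale consumer was compiled against, and `RecordV2` is what a newer library actually produces after a field was inserted in the middle. Both types are hypothetical. Nothing fails to compile or panics; the consumer simply reads the wrong field.

```rust
/// Layout the consumer was compiled against (hypothetical v1).
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct RecordV1 {
    pub id: u32,
    pub len: u32,
}

/// Layout the library now produces (hypothetical v2): `flags` was
/// inserted before `len`, shifting `len` from offset 4 to offset 8.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
pub struct RecordV2 {
    pub id: u32,
    pub flags: u32,
    pub len: u32,
}

/// Simulate a stale consumer interpreting a v2 record with the v1 layout.
pub fn read_as_v1(record: &RecordV2) -> RecordV1 {
    // SAFETY: `RecordV2` is larger than `RecordV1`, both have the same
    // alignment, and both contain only `u32` fields without padding, so
    // any bit pattern read here is a valid `RecordV1`.
    unsafe { std::ptr::read((record as *const RecordV2).cast::<RecordV1>()) }
}
```

For a record with `id: 7`, `flags: 0xFFFF_0000`, and `len: 16`, the v1 view still reports `id == 7` but reports `len == 0xFFFF_0000`, because the old offset of `len` now holds `flags`. A consumer using that value as a buffer length would corrupt memory far from the actual mistake.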
