When is the ABI stable?

Rust doesn’t have a stable ABI. OK, great.

But it’s not completely unstable. If I build the same code with the same compiler, I expect the ABI won’t change (if nothing else because I expect the output to be identical).

Do any compiler options change the ABI? Optimization level? LTO? Anything else? Building for a different target definitely will, and I’m guessing there might be some llvm ones which change register use.

Is there a document which definitively describes under what circumstances the ABI will remain stable, and ones where it’s not guaranteed?

Context: I have a cache which contains abomonation-serialized objects. It is currently in-process only, but if I make it semi-persistent, can I determine the times when I need to invalidate it (e.g., compiler updates)? (I’ll leave abomonation updates and type definition changes as a separate concern.)

I believe for what you're doing (type-punning memory buffers) only the memory layout (size, alignment, field offsets) of types is relevant. This is already a big ask, but the calling convention part of "the ABI" has been considered even less.
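To make "size, alignment, field offsets" concrete, here is a minimal sketch that prints those three properties for a struct with the default representation. The `Entry` type and the `field_offsets` helper are made up for illustration. The numbers it prints are whatever this particular compiler invocation chose; nothing in the language promises they stay the same in another build.

```rust
use std::mem::{align_of, size_of};

// Hypothetical cache entry (illustrative only). No repr attribute, so the
// compiler is free to choose the field order and padding.
struct Entry {
    tag: u8,
    len: u32,
    id: u64,
}

/// Byte offsets of `tag`, `len`, and `id` inside `Entry`, measured on a live value.
fn field_offsets() -> (usize, usize, usize) {
    let e = Entry { tag: 0, len: 0, id: 0 };
    let base = &e as *const Entry as usize;
    (
        &e.tag as *const u8 as usize - base,
        &e.len as *const u32 as usize - base,
        &e.id as *const u64 as usize - base,
    )
}

fn main() {
    // These values are observations about this build, not guarantees.
    println!("size    = {}", size_of::<Entry>());
    println!("align   = {}", align_of::<Entry>());
    println!("offsets = {:?}", field_offsets());
}
```

Any code that reinterprets raw bytes as an `Entry` is implicitly depending on all of these values matching between the writer and the reader.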

The Unsafe Code Guidelines working group has written up some layout guarantees -- a mix of documenting pre-existing commitments and proposals that are not yet accepted by the language. However, as you will see, that does not include any semblance of "layout stability" for many aggregate types without an explicit repr(..) attribute. Back when this was discussed, we couldn't even get consensus among the (relatively few) people participating that the layout of e.g. structs should be a deterministic function of the entire compilation inputs (e.g., some people want to be able to randomize struct layout), let alone of the layout of the struct's fields (e.g., some people want to be able to lay out each struct differently depending on how its fields are accessed at runtime, like profile-guided layout).

So the answer right now is (for enough types that the exceptions aren't very interesting)

ABI and even layout can change between any two compiler invocations even if they are 100% identical

and this will remain true even if more currently-proposed layout guarantees are accepted.

Doesn't that imply that compilation is non-deterministic? And it seems to even make separate compilation pretty awkward.

(Separate from this question, we have a strong requirement that Rust compilation is deterministic for a given set of inputs/options, so actually introducing this non-determinism would be very bad in practice.)

FWIW, my understanding is that these kinds of non-determinism actually have been introduced, but in ways that are strictly "opt-in" for whoever is invoking rustc, like the "optimization fuel" flag. Of course that still means no non-leaf crate can ever write unsafe code which relies on layouts being deterministic/repeatable, but it does mean that bit-for-bit deterministic builds are opt-out in practice rather than opt-in, which seems like the right compromise to me.

Can anyone confirm whether that is a) correct, and b) an official decision/policy/guarantee?

I think this depends on details and perspectives. The thread title here uses the word "stable", which we tend to use to mean that something won't change across versions ever. I doubt that any such promises have been made or will be made, especially at the language level, since it's so strong. (And thus I agree with rkruppe.)

That said, the practical realities of coding and separate compilation mean that it's likely that any particular version of rustc will have sufficient flags that one can make reproducible builds (or close to -- I remember nuances about timestamps or build folder paths or something). That just might take different flags on different versions, the outputs might never be compatible between rustc and mrustc, etc.

But as usual, something working a particular way now isn't a guarantee in perpetuity.

Well, I mean "stable" as in "when doesn't the ABI change", which simply assumes that there are some circumstances in which it doesn't. @hanna-kruppe's reply undermines this however, if the compiler (designers) want the leeway to change layouts and other details from build to build.

There's the second, meta sense of "stable", which is "are the rules for determining stability themselves stable from version to version?", or do they at least change only in backwards-compatible ways (i.e., tightening the restrictions on when ABI change is allowed). In this sense the rule itself is stable: the ABI is not stable at all. It can only get more strict from there.

I'm fine with any change that's a product of changed inputs. I.e., if I change compiler options, or supply a different profile input for profile-directed optimization, I'm fine with changes. The only case I worry about is identical invocations of an identical compiler binary with identical inputs leading to different outputs.

I think (hope) nobody would actually argue for truly nondeterministic compilation without an option to make it reproducible. Even for the "randomized struct fields" idea, surely it would come with a compiler option that fixes the seed for the randomization, or it might be derived from the input in some deterministic fashion (e.g., hashing the source code).

But there is a distinction between "what does the language guarantee holds true in all cases" versus "what can you achieve in specific cases by finding the right rustc options" as @scottmcm also mentioned.

In addition, even if we agreed that layout should be a deterministic function of something (I really hope we'll be able to do that eventually), we'd still need to figure out the right extent of that "something" so that it's non-trivial and yet allows everything we want the compiler(s) to be able to do. That will be a significant amount of bikeshedding as well.

However, while flags might be good enough for performance tweaking or for certain CI needs, I think one should never rely on such non-guaranteed implementation details for soundness. I would also not want to make layout an implementation defined matter where you can make assumptions about rustc for soundness under certain flags.

It's my understanding that if optimization fuel is enabled, that is the behavior. Multiple invocations of the compiler with the same options, provided one of those options enables optimization fuel, can produce outputs with different struct layouts.

I thought part of the purpose of optimization fuel was to make it possible to “bisect” for an optimization that is causing some bug? Non-determinism would make that very difficult.

My understanding is that the bisection is only at a per-crate level. Inside the crate, if optimization fuel is non-zero, then any structs optimized after running out of optimization fuel don’t get reordered. The layouts would vary depending on which structs are run first?

Then again, maybe I’ve completely misunderstood and the optimization of structs happens in a guaranteed order, in which case you’re completely right and layout would be deterministic given the same flags.

First I'll note that dumping out the contents of (what is essentially random) memory out of a process, with the intention of reading it back, is how you get nasal demons.

The above notwithstanding, I think that the right solution for your problem is to give the structs that you're putting into this cache a repr with a documented ABI (e.g. repr(C) or repr(packed)). There is maybe a separate conversation about introducing a StableAbi marker trait, though I think that's not a terrifically good idea due to there being no notion of a "stable ABI" across platforms (even ignoring endianness and the size of c_char, since BE machines don't exist in practice).

In general, I think that the appropriate thing to do in situations where you want a stable ABI is to just go with the C one. Another scenario where you sort of want this is for plugin systems, with an extern "C" interface (I don't really know what Rust's story for dylinking is, though).

The use-case is in-memory on one machine, so we're not too concerned with the architecture changing. But the memory is a shared memory segment, so its lifetime is decoupled from a process lifetime. So the problem is just seeing if the layout of the structures has changed from process to process. Since I'm assuming it can't change from run to run with one build, it essentially reduces to whether the layout can change from build to build.
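One way to detect a build-to-build layout change, rather than predict it, is to store a layout fingerprint in the segment and compare it on attach. The sketch below is an assumption about how that could look, not something abomonation provides: `Entry`, `layout_fingerprint`, and `cache_is_usable` are hypothetical names, and the hash is a hand-written FNV-1a so that the result does not depend on the standard library's hasher.

```rust
use std::mem::{align_of, size_of};

// Hypothetical cached type (illustrative only).
struct Entry {
    id: u64,
    len: u32,
    tag: u8,
}

/// FNV-1a over the little-endian bytes of each word.
fn fnv1a(words: &[u64]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for w in words {
        for b in w.to_le_bytes() {
            h ^= b as u64;
            h = h.wrapping_mul(0x100000001b3);
        }
    }
    h
}

/// Hashes the size, alignment, and field offsets of `Entry` as this build sees them.
fn layout_fingerprint() -> u64 {
    let e = Entry { id: 0, len: 0, tag: 0 };
    let base = &e as *const Entry as usize;
    let off = |p: usize| (p - base) as u64;
    fnv1a(&[
        size_of::<Entry>() as u64,
        align_of::<Entry>() as u64,
        off(&e.id as *const u64 as usize),
        off(&e.len as *const u32 as usize),
        off(&e.tag as *const u8 as usize),
    ])
}

/// `stored` is the fingerprint the writing process left in the segment header.
fn cache_is_usable(stored: u64) -> bool {
    stored == layout_fingerprint()
}

fn main() {
    let stored = layout_fingerprint(); // what a writer would record
    println!("usable: {}", cache_is_usable(stored));
}
```

This only covers the fields listed for one type; nested types would each need their own entry in the hash, and it says nothing about niche or enum discriminant encoding. It turns "can the layout change?" into a runtime check, but it does not make reading the bytes any more sanctioned by the language.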

#[repr(C)] is an interesting idea, but I don't think it will help here (presumably it only affects the top-most type). Likewise #[repr(packed)] seems to be difficult to use correctly.

Correct. I'd suggest recursively making sure all of the fields are #[repr(C)]. There is, unfortunately, no good way to test this that I know of. However, since you're sending data across a process boundary, you are already doing things that Rust has no hope of protecting you from, so you'll need to be extra careful.
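A partial check is possible at compile time, though. The sketch below uses made-up types (`Key`, `Record`) where every type reachable from the cached value is #[repr(C)], and pins the expected sizes with constant assertions so that the build fails if they drift. It assumes a target where `u32` has 4-byte alignment. It does not prove that every nested type is actually #[repr(C)]; it only catches size and alignment changes.

```rust
use std::mem::{align_of, size_of};

// Hypothetical types: both the outer struct and the struct it embeds are repr(C).
#[repr(C)]
#[derive(Clone, Copy)]
struct Key {
    shard: u32,
    index: u32,
}

#[repr(C)]
#[derive(Clone, Copy)]
struct Record {
    key: Key,         // offset 0, 8 bytes
    flags: u8,        // offset 8
    payload: [u8; 7], // offset 9, fills the struct up to 16 bytes
}

// Evaluated at compile time: a mismatch is a build error, not a runtime surprise.
const _: () = assert!(size_of::<Key>() == 8);
const _: () = assert!(size_of::<Record>() == 16);
const _: () = assert!(align_of::<Record>() == 4);

fn main() {
    let r = Record {
        key: Key { shard: 1, index: 2 },
        flags: 0,
        payload: [0; 7],
    };
    println!("{} {} {}", r.key.shard, r.key.index, size_of::<Record>());
    let _ = (r.flags, r.payload);
}
```

If a field type from another crate is not #[repr(C)], its internal layout is still up to the compiler even when the assertions pass.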

Alternatively, you can use a protobuf and pay the codec costs, which are probably less than IPC costs.

I hope so, but maybe that should be explicitly stated somewhere as well (such as in the UCG documents, to start). Maybe this is a common ground that everyone can actually agree on. I hope it is. :wink:
