repr(C) AIX Struct Alignment

I should add why I think bytemuck could break:

As I understand it, zerocopy basically reimplements the layout algorithm itself to be able to check transmutability safety. I do not know if bytemuck does that too, or uses some other approach. Thus it seems likely that current versions of them will not be aware of the platform layout rule changes, and will thus break if those layout rules change.

Are those platforms also ones that should get changes to match actual C? e.g. this thread is about AIX, and if we do make its repr(C) match those weird alignment rules, it will still have no bearing on other platforms.

Well, right now if they do that with a C program it's unsound. Win one, lose one...

AIX is extremely niche, and maybe 32-bit MSVC can be considered niche, but some of these issues affect all MSVC targets and that's not exactly a niche target.

Ah, I wasn't aware of that. In that case I think @kpreid's idea is the only reasonable one. Because as you said: win one, lose one.

Yes. But as @RalfJung pointed out, even 64-bit MSVC has issues. Which is not a niche target (unfortunately).

repr(C) does more than what a hypothetical repr(linear) would, for example it also disables the scalar ABI optimization. Today there exists only one struct type in Rust that's repr(linear), and that's Box.

I do think we should separate the concept of repr(C) ("match the C layout") from repr(linear) or similar ("lay these things out linearly without being clever"). But I think we should solve this over an edition, such that post-edition repr(C) means "match the C layout" on more targets.

What would be the edition migration procedure? “Read your code to find all repr(C) and figure out whether it should change” doesn't follow the established principles of easy automated migration.

Also, this would introduce subtle ambiguity into every tutorial or example program written that contains repr(C) and doesn't specify an edition.

The reason I specified deprecating repr(C) is so that all newly written or updated code would be entirely unambiguous.

Introduce a new name (e.g. repr(C_2027)) that always has the new semantic (match the target platform), and a new name (e.g. repr(C_2015)) that always has the old semantic. Automatic migration can translate existing repr(C) to repr(C_2015).

The end goal would be for repr(C) to always mean "match the target platform's ABI". repr(C_2015) would still mostly mean that, modulo bugs that we couldn't fix for compatibility reasons, so I don't think there'd be an excessive problem with ambiguity.

I'm confused about how your proposal differs or succeeds. Are you proposing a migration over two editions, then? (Replace C with C_2015, then in a later edition add C as an alias for C_2027.)

If so, then I still think it is a bad idea to use the repr(C) name, because its usage is irreparably confused. Also, it would be a hazard for users migrating code forward by multiple editions.

No, I was proposing doing it over one edition.

I think in practice there won't be much confusion, because we'd be fixing bugs. The main reason to support the old ABI is for compatibility in public interfaces, and even then, some library crates will probably decide that it isn't a breaking change because on the targets in question it never worked.

C interop already has to deal with extern "C" versus extern "system". This feels like an analogous split in repr to me, even if the split is in a different place.
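For readers who have not run into that split, here is a minimal sketch of it. The function names are made up for illustration. On most targets the two ABIs are the same calling convention; they differ on 32-bit x86 Windows, where "system" means stdcall.

```rust
// Hypothetical functions; the names are invented for this example.

// Uses the target's default C calling convention.
pub extern "C" fn add_c(a: i32, b: i32) -> i32 {
    a + b
}

// Uses the calling convention of the platform's system API:
// stdcall on 32-bit x86 Windows, the same as "C" on other targets.
pub extern "system" fn add_system(a: i32, b: i32) -> i32 {
    a + b
}

// The ABI string is part of the function pointer type, so the two
// kinds of pointer are tracked separately by the type checker.
pub fn call_both(a: i32, b: i32) -> (i32, i32) {
    let f: extern "C" fn(i32, i32) -> i32 = add_c;
    let g: extern "system" fn(i32, i32) -> i32 = add_system;
    (f(a, b), g(a, b))
}
```

The analogy to repr would be one spelling for "whatever the platform does" and another for a fixed, named rule.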

There is lots of user code which is in fact using repr(C) to mean “use this deterministic layout algorithm”. You would be introducing bugs into that code. This is not merely an abuse of repr(C); the Rust Reference, the closest thing we have to a spec, explicitly says:

The C representation is designed for dual purposes. One purpose is for creating types that are interoperable with the C Language. The second purpose is to create types that you can soundly perform operations on that rely on data layout such as reinterpreting values as a different type.

But, as this thread demonstrates, these dual purposes conflict, even though the C representation is intended for both. So, choosing one purpose and breaking the other is a breaking change; something the language promised is no longer true. That is why I think the representation must be deprecated and replaced entirely; existing uses of repr(C) are ambiguous, and the right thing to do in cases of ambiguity is to make the author of the program clarify, not to make an assumption that might break the author’s intended layout compatibility.
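To make the second purpose concrete, here is a minimal sketch of code that relies on the documented layout rather than on C interop. The `Header` type and `header_to_bytes` function are hypothetical, not taken from any crate mentioned in this thread.

```rust
// Hypothetical type used only for illustration.
#[repr(C)]
pub struct Header {
    pub tag: u32,
    pub len: u32,
    pub value: u64,
}

/// Reinterprets a `Header` as its raw bytes (native byte order).
pub fn header_to_bytes(h: Header) -> [u8; 16] {
    // SAFETY: this relies on the layout algorithm documented for repr(C):
    // the fields sit at offsets 0, 4 and 8 with no padding in between,
    // so all 16 bytes are initialized. `transmute` also checks at compile
    // time that both types have the same size.
    unsafe { std::mem::transmute::<Header, [u8; 16]>(h) }
}
```

If repr(C) on some target started inserting padding or moving `value`, this function would either stop compiling or start reading padding bytes, which is the kind of breakage I mean.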

(bikeshed) IMO repr(platform) is not specific enough -- maybe repr(C_ABI)? Or perhaps repr(extern(C)) that matches the extern "C", with other externs available too?

The problem with framing it as a conflict between C compatibility and determinism is that you usually want both. After all, C itself expects and relies on determinism; that's what makes it possible for C libraries to have a stable ABI.

Suppose your Rust code links against a C library which has a stable ABI. Should you pick the C-compatible repr (because C), or the deterministic repr (because the fact that code works today indicates that the Rust and C sides currently match ABI, and the C side won't change ABI, so any change to the Rust side of the ABI is likely a mistake)?

Admittedly, if the current trickle of bugs in #[repr(C)] continues forever, then conflicts between C compatibility and determinism will likewise continue forever. But this is a solvable problem. @Gankra recently created abi-cafe, a tool to perform automated ABI compatibility tests between Rust and C. If that can be pulled into rustc's test suite, and if we can start requiring that, say, all tier 2 targets have a test harness that runs abi-cafe against the actual system compiler, then these conflicts should start disappearing. Or at least they'll be relegated to even more obscure situations than the ones they're relegated to today. Which are already pretty obscure.

I actually agree that we should split #[repr(C)] into two: one for C compatibility, and one that tries to be more optimal/flexible. But both should be deterministic. And the C-compatible repr should be called #[repr(C)]; adding another name for the same thing is just creating needless churn. (The other repr should perhaps just be #[repr(crabi)], though the crABI effort seems to have stalled, as far as I can see.)

As for existing conflicts, I like @josh's edition idea, though I think the "lint followed by breaking change" approach might be fine too, given how rare the conflicts are on high-tier targets.

Sorry, “deterministic” was a distraction and I should not have used the word. A more precise description would be “fully specified and platform-independent”: the repr(C) algorithm cares only about the size and alignment of the types of the fields, and not about the platform nor the Rust version. Those are the properties which Rust needs to continue offering to code that needs them, and which would be broken by changing the meaning of the repr(C) name rather than creating a differently-named replacement.

Not quite, even the "correct" repr(C) (using actual platform layout) would actually be deterministic and allow reinterpreting values. It'd just be a different algorithm than it is today. But you already replied to that above, I think.

The actual conflict is with this paragraph:

For each field in declaration order in the struct, first determine the size and alignment of the field. If the current offset is not a multiple of the field’s alignment, then add padding bytes to the current offset until it is a multiple of the field’s alignment. The offset for the field is what the current offset is now. Then increase the current offset by the size of the field.
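The quoted rule is short enough to write down as code. This is a rough sketch, not rustc's implementation; `FieldLayout` and `linear_layout` are names invented here. It also includes the final step from the same section of the Reference: the struct's alignment is that of its most-aligned field, and its size is the final offset rounded up to that alignment.

```rust
/// Size and alignment of one field, in bytes (hypothetical helper type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct FieldLayout {
    pub size: usize,
    pub align: usize,
}

/// Returns (field offsets, struct size, struct alignment) for fields
/// laid out in declaration order.
pub fn linear_layout(fields: &[FieldLayout]) -> (Vec<usize>, usize, usize) {
    let mut offset = 0usize;
    let mut struct_align = 1usize;
    let mut offsets = Vec::with_capacity(fields.len());
    for f in fields {
        // Add padding until the offset is a multiple of the field's alignment.
        offset = offset.next_multiple_of(f.align);
        offsets.push(offset);
        // Then advance past the field.
        offset += f.size;
        struct_align = struct_align.max(f.align);
    }
    // Round the total size up to the struct's alignment.
    (offsets, offset.next_multiple_of(struct_align), struct_align)
}
```

Note that the only inputs are a size and a single alignment per field. For a `u8` followed by an 8-byte field, an alignment of 8 gives offsets 0 and 8 with size 16, while an alignment of 4 gives offsets 0 and 4 with size 12. There is no room in this algorithm for a field whose alignment depends on where it appears.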

Unless we want to somehow argue that "alignment" here refers to something different from align_of, this indeed does conflict with the goal of being "interoperable with the C Language". The source of most of these bugs is that young Rust was a bit naive about there being only one notion of "alignment" that we could use everywhere, when in fact there are at least two (well, three, if you also count the "preferred" alignment, but that seems less problematic).

So yeah, it is a decision between the permanent wart of "the layout that matches the C layout is not called C", and breaking this promise that has been in the type layout section for 7 years.

In terms of tier 1 and tier 2 targets, the breakage based on what I have seen so far would be limited to certain types on MSVC:

(It'd also affect many more types on 32-bit MSVC if we wanted to use this change to fix the mismatch there by setting the align_of of u64 to 4... but we already decided not to do that, and it is unclear whether it's worth revising that decision.)

The last one can be fairly easily linted against. And I think the other two could also be detected with a lint as long as the lint runs during monomorphization, where we can see through all the generics. So this is not insurmountable:

  • add a repr(linear) that follows the rules quoted above
  • add a lint for the problematic types suggesting to use repr(linear)
  • now warning-free code can change its edition without changing behavior

Well... almost. During monomorphization a type is a mix of many editions, and the crate that declares the repr(C) type might not be the place where the warning about changing behavior actually occurs.

@workingjubilee informs me that this target is even more insane than I thought:

#[repr(aix)]
struct Superstruct {
    float: f32,
    substruct: Substruct,
}

#[repr(C)]
struct Substruct {
    byte: u8,
    double: f64,
}

Apparently, double will have offset 8 in Substruct. But if you have x: Superstruct, then x.substruct.double will have offset 8 relative to the beginning of x, which makes no sense at all! The distance between byte and double changes when Substruct is used as a field type in a larger struct.

If that's correct, then that's just ridiculous. It means one cannot take pointers to fields of structs any more. I do not see any future in which Rust supports such nonsensical layout. This layout is just fundamentally incompatible with the concept of having pointers into fields of larger structs, which is a key concept in Rust (and in C, making this target also wildly non-conformant for C).
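The property at stake can be checked directly. The sketch below uses #[repr(C)] on both structs so that it compiles with today's rustc, and the two helper functions are names invented for this example. It measures the distance between `byte` and `double` once from the standalone type and once through pointers into a `Superstruct` value.

```rust
use std::mem::offset_of;
use std::ptr::addr_of;

#[repr(C)]
pub struct Substruct {
    pub byte: u8,
    pub double: f64,
}

#[repr(C)]
pub struct Superstruct {
    pub float: f32,
    pub substruct: Substruct,
}

/// Distance in bytes between `byte` and `double` in a standalone `Substruct`.
pub fn distance_standalone() -> usize {
    offset_of!(Substruct, double) - offset_of!(Substruct, byte)
}

/// The same distance, measured through pointers into a `Superstruct` value.
pub fn distance_nested() -> usize {
    let x = Superstruct {
        float: 0.0,
        substruct: Substruct { byte: 0, double: 0.0 },
    };
    let byte = addr_of!(x.substruct.byte) as usize;
    let double = addr_of!(x.substruct.double) as usize;
    double - byte
}
```

Any code that takes `&x.substruct` and passes it to a function expecting `&Substruct` depends on these two numbers being equal, which is why a layout where they differ cannot work.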

I've long held that if repr(C) matches the platform C compiler for strictly standards compliant C code and compiler behavior, then it is satisfying the dual purpose. Even setting aside how it violates encapsulation, AIX power layout doesn't sufficiently align double for the alignment value that C alignof reports, so any struct impacted by power alignment is outside the domain of C. It's the same with all of the MSVC differences; they require going outside the extent of what is considered "C" as opposed to "MSVC C" or "GCC C" or some other extension.

Obviously we want to match common extensions as much as is reasonable (non-compositional layout is not reasonable), but imo it's not improper if the mismatch exists outside of standard compatible C.

So, I've found out that I was wrong, but I was wrong in a way that is worse for everyone, sort of: The AIX "alignment rule" seems to actually just be an extremely convoluted way of saying "this is sometimes overaligned when it is put on the stack". And this is mostly irrelevant: ABIs cannot resist having ad-hoc modifications to how stack passing works, so they all have some ad hoc rule or another, and the stack layout is otherwise a purely internal implementation detail.

...However, the reason I was wrong is that I was going off what I was being told about how the alignment should work, and despite asking many, many questions, I didn't realize that the alignment should in fact be only 4 for f64, AKA double. I was incorrect because LLVM does not know the alignment of the types for AIX, and because GCC and clang often conflate discussion of "required alignment" and "preferred alignment". Instead, only clang overrides the data layout for AIX, meaning that other frontends for LLVM, like, say, flang or other Fortran compilers, or in this case rustc, don't work for AIX, because the C compiler has been modified to have an ABI that disagrees with every other LLVM front end. The alignment for f64 that rustc currently thinks AIX has is 8.

If the rule implemented in clang, as it currently is, happens to be correct? Then AIX actually has zero modifications from repr(C) and this has been much ado about nothing.
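As a rough illustration of what "alignment 4 for f64 with otherwise ordinary repr(C) rules" would look like, the sketch below emulates it on any target using `packed(4)`, which caps each field's alignment at 4. This is only an emulation with invented type names; it is not how rustc models AIX and says nothing about what the AIX compilers actually emit.

```rust
use std::mem::{align_of, offset_of, size_of};

// Stand-in for a target where f64 needs only 4-byte alignment.
#[repr(C, packed(4))]
pub struct Substruct4 {
    pub byte: u8,
    pub double: f64,
}

#[repr(C)]
pub struct Superstruct4 {
    pub float: f32,
    pub substruct: Substruct4,
}

/// (offset of `double`, size, alignment) of the standalone struct.
pub fn substruct4_layout() -> (usize, usize, usize) {
    (
        offset_of!(Substruct4, double),
        size_of::<Substruct4>(),
        align_of::<Substruct4>(),
    )
}

/// Offset of `substruct.double` from the start of a `Superstruct4`.
pub fn double_offset_in_super() -> usize {
    offset_of!(Superstruct4, substruct) + offset_of!(Substruct4, double)
}
```

Under this emulation `double` sits at offset 4 inside `Substruct4` and at offset 8 from the start of `Superstruct4`, and the layout stays compositional: the nested offset is just the sum of the two ordinary offsets.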

Or, in brief: This discussion has been everyone correctly observing that the rule makes no sense if we assume anything about our current data layouts is correct... but, of course, the problem is that the data layouts make no sense, and this entire discussion has seemingly no reason to occur if they are fixed.
