Making #[repr(uN)] enum FFI-safe (pre-RFC?)


#1

Currently, only enums where none of the variants have data (aka “C-like enums”) are FFI-safe, and only if they have #[repr(C)] or #[repr(uN)] for some integer size N. (Also, as a special case, Option<&T> and equivalent types are FFI-safe.)

In the PR documenting this in the nomicon, I learned that enums with data (“non-C-like”) are not FFI-safe, even if they have #[repr(C)] or #[repr(uN)] and all of the data fields are FFI-safe. This surprises me: I would expect that, for instance,

#[repr(u8)]
enum TurtleAction {
    Forward { distance: u16 },
    Rotate { angle: u16 },
    PenUp,
    PenDown,
}

should be FFI-safe and have a totally stable representation. Specifically, it should have the same representation as #[repr(C)] struct TurtleActionFFI { action: u8, parameter: u16 }.
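A quick way to check this expectation on a given toolchain is to compare size and alignment. This is only a sketch: the enum is written with struct-like variants so that it compiles, layouts_match is a helper name made up for this example, and matching size and alignment is a necessary condition for identical layout, not evidence of an ABI promise.

```rust
use std::mem::{align_of, size_of};

// The enum from the post, with struct-like variants so it is valid Rust.
#[allow(dead_code)]
#[repr(u8)]
enum TurtleAction {
    Forward { distance: u16 },
    Rotate { angle: u16 },
    PenUp,
    PenDown,
}

// The hand-written struct the post expects the enum to match.
#[allow(dead_code)]
#[repr(C)]
struct TurtleActionFFI {
    action: u8,
    parameter: u16,
}

// Hypothetical helper: true when both types agree on size and alignment.
fn layouts_match() -> bool {
    size_of::<TurtleAction>() == size_of::<TurtleActionFFI>()
        && align_of::<TurtleAction>() == align_of::<TurtleActionFFI>()
}

fn main() {
    println!(
        "TurtleAction:    size {}, align {}",
        size_of::<TurtleAction>(),
        align_of::<TurtleAction>()
    );
    println!(
        "TurtleActionFFI: size {}, align {}",
        size_of::<TurtleActionFFI>(),
        align_of::<TurtleActionFFI>()
    );
    println!("layouts match: {}", layouts_match());
}
```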

I also learned that the improper_ctypes lint (see source) already goes to the trouble of checking each enum variant, and only warns you if some field in the variant is itself non-FFI-safe. So the lint would consider TurtleAction to be FFI-safe.

The logic there is pretty similar to that for untagged unions, and my reading of the untagged union RFC is that those are probably intended to be FFI-safe if they contain FFI-safe data. And the only thing that tagged unions add is a tag at the beginning—that is, a C-like enum indicating which variant to look at, and C-like variants with #[repr] are already FFI-safe. So there isn’t really much with an undefined representation here.

What needs to be done to stabilize the representation of (tagged) enums with #[repr(C)] or #[repr(uN)]? Would this be an RFC? I’m hoping this should be easy to stabilize, but I guess people just need to say “Yes, we won’t change the representation of these enums”?

If this is hard to stabilize quickly, I’d like to open a PR removing the above logic from the improper_ctypes lint, and simply warning on all non-C-like enums regardless of contents (besides Option<&T>, of course). It looks like the logic to check field members for FFI-safety was added when the current incarnation of the lint was written (PR #26583), and I see no discussion on the PR about enums either way, so I’d totally believe that it was written as a good-faith hint but not an ABI promise.

(Also, is there some documentation of what’s FFI-safe and what’s not? Even if the improper_ctypes lint were able to be 100% accurate at all times, it’s not a good reference for humans. Is the Nomicon the right place for this?)

cc @ubsan, @Gankro, @steveklabnik (from the Nomicon PR), @eefriedman (author of the current improper_ctypes lint)


#2

Non-expert opinions:

  • #[repr(u8)] on a type makes me think all values of that type are represented as u8s, not that the discriminant of the enum is a u8 and the rest of the value is arbitrarily large, which is what you appear to be saying. So assuming we want this at all (see third bullet point), I’d strongly prefer a less confusing syntax, perhaps #[repr(C, discriminant=u8)]. I’m not familiar with exactly what attribute syntax allows but I assume we can get close to that.

  • I agree the improper_ctypes lint should be more accurate and the rules for FFI-safe enums should be easier to find. I imagine some of those rules will change in the future, but it seems like it shouldn’t be controversial to make it clearer what is guaranteed today and what isn’t.

  • If I’m understanding your proposal correctly, you’re assuming that an enum value almost always consists of X discriminant bits and separate Y value bits, with no overlap. Option<&T> is the most famous counterexample, but Rust could add plenty of others if it wanted to (https://github.com/rust-lang/rfcs/issues/1230), and #[repr(<discriminant size>)] doesn’t really make sense in those cases, so I’m not sure if that feature would be sufficient. Hopefully someone more familiar with FFI needs (cc @retep998 ?) can confirm whether we need any FFI-safe layout-optimized enums in Rust, or if they’re sufficiently niche that this proposal is a good solution for the majority use case.
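To make the contrast in the last bullet concrete, here is a small sketch. The first assertion relies on the documented guarantee that Option<&T> has the same size as &T, with None stored as the null pointer. The Tagged enum is a made-up example, and its exact size is whatever the compiler currently chooses; the sketch only checks that it is larger than its payload, which is what a separate leading tag implies.

```rust
use std::mem::size_of;

// Hypothetical enum with an explicit u8 discriminant and one data variant.
#[allow(dead_code)]
#[repr(u8)]
enum Tagged {
    Value(u32),
    Empty,
}

fn main() {
    // Layout-optimized case: no separate discriminant, None is the null pointer.
    assert_eq!(size_of::<Option<&'static u32>>(), size_of::<&'static u32>());

    // Explicit-tag case: the tag cannot overlap the payload, so the enum
    // must be strictly larger than the u32 it carries.
    assert!(size_of::<Tagged>() > size_of::<u32>());

    println!("Option<&u32>: {} bytes", size_of::<Option<&'static u32>>());
    println!("Tagged:       {} bytes", size_of::<Tagged>());
}
```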


#3

The latter (discriminant size) is the already existing behaviour in Rust.


#4

Wow, I massively misread the Nomicon then. After double-checking it apparently only says “These specify the size to make a C-like enum.”, which probably explains how this happened. We should definitely make this more explicit.


#5

As far as I’m concerned the behaviour of any of the reprs on non-C-like enums is basically completely implementation-defined, in the sense that it’s never been specified or really meaningfully discussed afaik.

If it does something interesting right now that’s probably a total accident. Except for perhaps the “suppresses option-like optimizations” part? (Option is repr(Rust)) Even that is a grey area, though. Changing this requires an RFC, and that will likely be a very hard RFC to pass.

Personally I would prefer that there be a new repr for the kind of thing you want, and an error/warning for applying any of the existing c-like reprs to a tagged enum. Ideally Rust would be able to generate a C-header for the type.

I agree improper_ctypes should be fixed.

Also just for clarity I would like to provide the relevant section from the nomicon:

Due to its dual purpose as “for FFI” and “for layout control”, repr(C) can be applied to types that will be nonsensical or problematic if passed through the FFI boundary.

  • ZSTs are still zero-sized, even though this is not a standard behavior in C, and is explicitly contrary to the behavior of an empty type in C++, which still consumes a byte of space.

  • DSTs, tuples, and tagged unions are not a concept in C and as such are never FFI safe. (editor’s note: Option<&T> excepted)

  • Tuple structs are like structs with regards to repr(C), as the only difference from a struct is that the fields aren’t named.

  • If the type would have any drop flags, they will still be added (editor’s note: no longer relevant as drop flags are dead)

  • This is equivalent to one of repr(u*) (see the next section) for enums. The chosen size is the default enum size for the target platform’s C ABI. Note that enum representation in C is implementation defined, so this is really a “best guess”. In particular, this may be incorrect when the C code of interest is compiled with certain flags.
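Two of the bullets in that quote are easy to see in code. The following is just an illustration with made-up type names (Empty, Named, Tuple): a repr(C) unit struct is still zero-sized in Rust, and a repr(C) tuple struct is laid out like the struct with the same fields in the same order.

```rust
use std::mem::{align_of, size_of};

// A repr(C) ZST: still zero bytes in Rust, unlike an empty type in C++.
#[repr(C)]
struct Empty;

// A named-field struct and a tuple struct with the same field types.
#[allow(dead_code)]
#[repr(C)]
struct Named {
    x: u8,
    y: u32,
}

#[allow(dead_code)]
#[repr(C)]
struct Tuple(u8, u32);

fn main() {
    assert_eq!(size_of::<Empty>(), 0);
    assert_eq!(size_of::<Named>(), size_of::<Tuple>());
    assert_eq!(align_of::<Named>(), align_of::<Tuple>());
    println!("Empty: {} bytes", size_of::<Empty>());
    println!("Named: {} bytes, Tuple: {} bytes", size_of::<Named>(), size_of::<Tuple>());
}
```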


#6

I have absolutely no need for FFI safe Rust enums. The ones with data are completely useless to me, because C libraries very rarely match the exact layout of Rust data enums, and untagged unions cover my needs there just fine. The C-like enums without data are also useless to me because they cannot hold values other than their explicit variants, which is both dangerously unsafe and often impossible due to enums that are used as bitflags, plus by using simple constants I save a ton on compile time.
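For readers unfamiliar with the "simple constants" approach, one common shape for it is sketched below. This is not code from any particular bindings crate: OpenFlags and its constants are invented for the example, and the newtype wrapper is one variation on plain integer constants. The point is that every u32 value is valid, named or not, so values coming from C can never be invalid the way an out-of-range enum discriminant would be.

```rust
// Hypothetical bitflag type: a transparent wrapper over the C integer type.
#[repr(transparent)]
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct OpenFlags(pub u32);

impl OpenFlags {
    // Named values are plain associated constants, not enum variants.
    pub const READ: OpenFlags = OpenFlags(0x1);
    pub const WRITE: OpenFlags = OpenFlags(0x2);
    pub const CREATE: OpenFlags = OpenFlags(0x4);

    /// True when every bit set in `other` is also set in `self`.
    pub fn contains(self, other: OpenFlags) -> bool {
        self.0 & other.0 == other.0
    }
}

impl std::ops::BitOr for OpenFlags {
    type Output = OpenFlags;
    fn bitor(self, rhs: OpenFlags) -> OpenFlags {
        OpenFlags(self.0 | rhs.0)
    }
}

fn main() {
    let flags = OpenFlags::READ | OpenFlags::CREATE;
    assert!(flags.contains(OpenFlags::READ));
    assert!(!flags.contains(OpenFlags::WRITE));

    // A value with no name is still a perfectly valid OpenFlags.
    let unknown = OpenFlags(0x8000_0000);
    assert!(!unknown.contains(OpenFlags::READ));
    println!("{:?} {:?}", flags, unknown);
}
```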


#7

It would help if #[repr(uN)] actually also meant that all values of the underlying type are valid, with only some valid values being named.


#8

I would very much like to see this too: basically, specify repr(C) on a tagged enum to be laid out just like a structure with a separate tag followed by the untagged union. As @Gankro said, this already does “something interesting” by pure accident, so this wouldn’t be a breaking change, but rather a specification of existing behaviour.

The current problem is that even with optimizations suppressed, there is no way to share tagged unions between the C and Rust pieces of one library: on one hand, you can’t define a type on the FFI bindings side to consume it from C, and on the other, you can’t generate safe C headers from a Rust tagged enum either.

I would be happy with either direction working, because for now, whenever you want to work with such enums across the C<->Rust boundary, you need to write highly unsafe code accessing the tag and the union variant separately, which can easily go out of sync by mistake.

The only alternatives currently are either to have yet another intermediate type, which is the same tagged enum but convertible from/to the unsafe structure (again, this requires highly unsafe code plus an extra conversion round trip each time, which is not always feasible), or to suppress improper_ctypes and pray that the current behaviour of repr(C) / repr(uN) will continue to work as expected. Neither option seems good.
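For illustration, the "intermediate type" alternative looks roughly like the sketch below. All names here (RawAction, ActionPayload, the TAG_* constants, Action, to_action) are invented for the example, reusing the turtle enum from the first post. The tag and the union field are handled separately in both directions, which is exactly where the two can drift out of sync.

```rust
// The C-facing side: an explicit tag plus an untagged union.
#[repr(C)]
#[derive(Clone, Copy)]
pub union ActionPayload {
    pub distance: u16,
    pub angle: u16,
}

#[repr(C)]
#[derive(Clone, Copy)]
pub struct RawAction {
    pub tag: u8,
    pub payload: ActionPayload,
}

pub const TAG_FORWARD: u8 = 0;
pub const TAG_ROTATE: u8 = 1;
pub const TAG_PEN_UP: u8 = 2;
pub const TAG_PEN_DOWN: u8 = 3;

// The Rust-facing side: an ordinary enum with no layout promises.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Action {
    Forward(u16),
    Rotate(u16),
    PenUp,
    PenDown,
}

impl From<Action> for RawAction {
    fn from(action: Action) -> RawAction {
        match action {
            Action::Forward(d) => RawAction { tag: TAG_FORWARD, payload: ActionPayload { distance: d } },
            Action::Rotate(a) => RawAction { tag: TAG_ROTATE, payload: ActionPayload { angle: a } },
            // Data-less variants still need some payload value; zero is arbitrary.
            Action::PenUp => RawAction { tag: TAG_PEN_UP, payload: ActionPayload { distance: 0 } },
            Action::PenDown => RawAction { tag: TAG_PEN_DOWN, payload: ActionPayload { distance: 0 } },
        }
    }
}

impl RawAction {
    /// Converts back to the safe enum; returns None for an unknown tag.
    /// Reading the union needs `unsafe`. In this sketch both fields are u16
    /// at the same offset, so either read sees initialized bytes.
    pub fn to_action(self) -> Option<Action> {
        match self.tag {
            TAG_FORWARD => Some(Action::Forward(unsafe { self.payload.distance })),
            TAG_ROTATE => Some(Action::Rotate(unsafe { self.payload.angle })),
            TAG_PEN_UP => Some(Action::PenUp),
            TAG_PEN_DOWN => Some(Action::PenDown),
            _ => None,
        }
    }
}

fn main() {
    let raw = RawAction::from(Action::Rotate(90));
    assert_eq!(raw.to_action(), Some(Action::Rotate(90)));
    println!("round trip ok, tag = {}", raw.tag);
}
```

Every new variant has to be added in four places (the constant, the union, and both conversions), which is the maintenance cost being complained about.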


#9

Note that eddyb is working on optimizing enums and may break your shiz if you’re unsafely relying on things:

https://github.com/rust-lang/rust/pull/45225

I have separately been working on making FFI more automated/robust with native Rust types, but am waiting for eddyb to finish his work.

Note however that tagged unions are the absolutely hardest thing to get across the boundary, especially because the nature of modern ABIs means you can’t just say certain parts are opaque – (u16, u8, u8) can have a different ABI than (u16, u16).

I’m cautiously optimistic it would be possible if repr(C) was made to do something useful (although I’m not a huge fan of implicit placement of the tag).


#10

I hate C enums, because their width is implementation defined. My advice is not to use them across any FFI boundary, and instead to wrap them in something with a known size in C so they are compatible with different compilers.

If you’re writing the C code, just don’t use them in your API.
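On the Rust side, that advice can be followed by taking a fixed-width integer at the boundary and converting it explicitly. A minimal sketch, with PenColor and set_pen_color being made-up names rather than any real API:

```rust
use std::convert::TryFrom;

// Hypothetical C-like enum that lives only on the Rust side.
#[derive(Debug, PartialEq, Clone, Copy)]
#[repr(u32)]
pub enum PenColor {
    Black = 0,
    Red = 1,
    Blue = 2,
}

impl TryFrom<u32> for PenColor {
    // On failure, hand back the raw value that did not match any variant.
    type Error = u32;

    fn try_from(raw: u32) -> Result<Self, Self::Error> {
        match raw {
            0 => Ok(PenColor::Black),
            1 => Ok(PenColor::Red),
            2 => Ok(PenColor::Blue),
            other => Err(other),
        }
    }
}

// Hypothetical function exposed to C: the signature uses a fixed-width
// integer, so no compiler's choice of enum width matters.
pub extern "C" fn set_pen_color(raw: u32) -> bool {
    match PenColor::try_from(raw) {
        Ok(_color) => true, // a real implementation would use the color here
        Err(_) => false,    // unknown value: reject instead of invoking UB
    }
}

fn main() {
    assert_eq!(PenColor::try_from(1), Ok(PenColor::Red));
    assert_eq!(PenColor::try_from(7), Err(7));
    println!("accepted 2: {}, accepted 99: {}", set_pen_color(2), set_pen_color(99));
}
```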


#11

It was meaningfully discussed and it has a test:


#12

@Gankro Is his PR going to touch repr(C) / repr(u*) though?


#13

Yes! And given that there is a test for optimizations being disabled on such reprs, why not document this behaviour as expected? This will still allow layout optimizations on normal repr(Rust) enums, but will give at least some safe-ish interop for Rust<->C users.


#14

I have filed https://github.com/rust-lang/rfcs/pull/2195


#15

@Gankro Thank you so much! If this RFC gets through, we’ll be able to unblock some significant ergonomics improvements in mixed C/Rust projects.