C-style enums in FFI (and a proposal to lump in with unions)


#1

I’ve run into an issue with FFI that I believe requires clarification, or maybe some additions to the language. Many C APIs have functions whose output value types are C enumerations. In Rust FFI, these are “naturally” represented by #[repr(C)] enums with C-style integer members. This is all good as long as we receive values that fit our FFI definition. But if the foreign library grows some new enumeration members and is used without changing the Rust-side definition, receiving and matching a value that the Rust enum does not declare is undefined behavior.
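One defensive pattern, sketched here under assumptions, is to declare the foreign function's result as a plain integer and convert it in a checked step, so that an unknown value becomes an error rather than undefined behavior. The names `Status` and `fake_c_call` are hypothetical; `fake_c_call` only stands in for a real `extern "C"` function, and the sketch assumes the C enum is `int`-sized.

```rust
use std::convert::TryFrom;
use std::os::raw::c_int;

/// Hypothetical Rust-side view of a C enum with two known members.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(C)]
pub enum Status {
    Ready = 0,
    Busy = 1,
}

impl TryFrom<c_int> for Status {
    type Error = c_int;

    /// Accept only the values this binding knows about; hand back the raw
    /// integer otherwise so the caller can decide what to do with it.
    fn try_from(raw: c_int) -> Result<Self, Self::Error> {
        match raw {
            0 => Ok(Status::Ready),
            1 => Ok(Status::Busy),
            other => Err(other),
        }
    }
}

/// Stand-in for a foreign function (assumption). A real binding would
/// declare it in an `extern "C"` block returning `c_int`, not `Status`.
pub fn fake_c_call(n: c_int) -> c_int {
    n
}

fn main() {
    // The value 2 models a member added by a newer version of the C library.
    for raw in [0, 1, 2] {
        match Status::try_from(fake_c_call(raw)) {
            Ok(status) => println!("known: {:?}", status),
            Err(unknown) => println!("unknown value: {}", unknown),
        }
    }
}
```

The cost is an extra conversion at every call site, but no invalid `Status` value is ever constructed.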

In the C world it’s normal to update dynamic libraries and expect binaries built against older versions of those libraries to work as long as binary compatibility is preserved, and adding new enum members is not normally considered an API or ABI break (the standard says enums can be backed by implementation-defined integer types, but in practice they are int-sized unless overridden by compiler-specific means). The soname of GLib has been kept the same for a dozen years, and recent releases of the library probably still work with GNOME 2.0 binaries. I think binaries built from Rust sources should be kept to the same expectations.

Some C APIs are pushing it further by defining bit flags with enums and then using the enum types directly in function signatures and structures. It’s apparently legal in C, where enums are full integral types with some value constants sprinkled over them as a vague hint on the expected domain. This brings issues on the input side as well: input parameters and struct members should be declared in FFI with their corresponding enum types for repr compatibility, but the actual values are composed from bit flags and so need to be transmuted.
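For the bit-flag case, one way to avoid transmuting is to model the type as a transparent wrapper around an integer rather than as a Rust enum. The sketch below is only an illustration: `OpenFlags` and its constants are hypothetical, and the choice of `c_uint` assumes the C compiler gives the enum the size of `unsigned int` (signedness of the underlying type varies between compilers).

```rust
use std::ops::BitOr;
use std::os::raw::c_uint;

/// Hypothetical flag set. The C side is imagined as
/// `typedef enum { OPEN_READ = 1, OPEN_WRITE = 2, OPEN_APPEND = 4 } OpenFlags;`
/// `#[repr(transparent)]` gives the wrapper the layout of its integer field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(transparent)]
pub struct OpenFlags(pub c_uint);

impl OpenFlags {
    pub const READ: OpenFlags = OpenFlags(1);
    pub const WRITE: OpenFlags = OpenFlags(2);
    pub const APPEND: OpenFlags = OpenFlags(4);

    /// True when every bit set in `other` is also set in `self`.
    pub fn contains(self, other: OpenFlags) -> bool {
        self.0 & other.0 == other.0
    }
}

impl BitOr for OpenFlags {
    type Output = OpenFlags;

    fn bitor(self, rhs: OpenFlags) -> OpenFlags {
        OpenFlags(self.0 | rhs.0)
    }
}

fn main() {
    // A combination that matches no single named member is still a valid value.
    let flags = OpenFlags::READ | OpenFlags::APPEND;
    println!("{:?}", flags);
}
```

Because every bit pattern is a valid `OpenFlags`, combined values can be passed to and received from C without any unsafe conversion.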

So how can we deal with that heathen land of C? Can we rely on a tacit assumption that the actual underlying integer value can be received intact by transmuting the enum value into an appropriately repr-sized int, and vice versa? In the long term, would it be worthwhile to add more ergonomic means for passing “vaguely enum” values through FFI?

There is a long-standing Rust issue on the lack of support for C unions. I think that is a very similar problem to the one discussed here, and as such both can be solved by adding a single new Rust type: unsafe enum as described in the discussion on rust-lang/rust#5492. C-style unsafe enums would be used to unsafely match C enum values, while the struct-membered variety would stand in for unions. What do you think?
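For comparison, Rust has since gained a `union` item (stable since 1.19), which covers the union half of this idea in a different form from the `unsafe enum` proposed above. A minimal sketch, with hypothetical names throughout (`Value`, `Tagged`, `TAG_INT`, `TAG_FLOAT`, `describe`) and the tag kept as a raw integer so that unknown tags remain representable:

```rust
use std::os::raw::{c_float, c_int};

/// Hypothetical stand-in for `union Value { int i; float f; };` on the C side.
#[repr(C)]
#[derive(Clone, Copy)]
pub union Value {
    pub i: c_int,
    pub f: c_float,
}

/// Hypothetical tag values, kept as plain integers.
pub const TAG_INT: c_int = 0;
pub const TAG_FLOAT: c_int = 1;

/// Hypothetical tagged pair, as a C API might return it.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Tagged {
    pub tag: c_int,
    pub value: Value,
}

/// Render the payload, refusing to read it when the tag is unknown.
pub fn describe(t: Tagged) -> String {
    match t.tag {
        // SAFETY: reading a union field is only sound if the tag names the
        // field that was actually written; the producer must uphold that.
        TAG_INT => format!("int {}", unsafe { t.value.i }),
        TAG_FLOAT => format!("float {}", unsafe { t.value.f }),
        other => format!("unknown tag {}", other),
    }
}

fn main() {
    let t = Tagged { tag: TAG_INT, value: Value { i: 7 } };
    println!("{}", describe(t));
}
```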


#2

Currently I get around the issue of the lack of C unions, C bitflags, C enums, and even C bitfields by using my own custom macros in winapi.

Also note that #[repr(C)] for enums is wrong, since it will try to fit the discriminant in a usize or isize, while on Windows enums are always 32-bit.


#3

Ah, but there is #[repr(i32)].

On the other hand, if #[repr(C)] on x86_64-pc-windows-msvc ends up being something completely different than what the C compiler uses by default, there is a problem.


#4

Could you clarify? IME repr(C) doesn’t use usize/isize, at least, not on Linux. The following code prints 8 4 on the playground:

use std::mem;

#[repr(C)]
enum C { A, B }

fn main() {
    println!("{} {}", mem::size_of::<usize>(), mem::size_of::<C>());
}

#5

#[repr(C)]
enum C {
    A = -1,
    B = 0x80000000,
}

That prints a size of 8. Enums in the Windows API will sometimes have shenanigans with either negative discriminants or discriminants that exceed the range of an i32. In C/C++ land the enum is always 32-bit, and a discriminant larger than a u32 can hold is a compile error, but in Rust the enum just grows in size and compiles fine.
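A possible workaround, sketched under the assumption that the C compiler keeps the enum at 32 bits and wraps the high-bit value as described above, is to pin the size with #[repr(i32)] and write the large discriminant as a `u32` literal cast to `i32`. The enum name `Pinned` is hypothetical.

```rust
use std::mem;

/// Hypothetical enum pinned to 32 bits. The high-bit member is written as a
/// `u32` literal cast to `i32`, which wraps to `i32::MIN`.
#[repr(i32)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Pinned {
    A = -1,
    B = 0x8000_0000u32 as i32,
}

fn main() {
    // Size in bytes of the pinned enum.
    println!("{}", mem::size_of::<Pinned>());
    // The bit pattern of B, viewed as an unsigned value.
    println!("{:#x}", Pinned::B as i32 as u32);
}
```

This keeps the bit pattern the C side expects, at the price of hard-coding a size that #[repr(C)] was supposed to pick automatically.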


#6

By the standard, one should not expect to be able to use discriminants that cannot be represented by int. So I guess what to do with discriminants that fit in unsigned int, but not int, is implementation-dependent.

What does MSVC make out of:

#include <stdio.h>

typedef enum {
    A = 0x80000000,
    B = -0x80000000
} C;

int main() {
    C val = A;
    printf("sizeof: %u\n", (unsigned)sizeof(val));
    printf("it %s wrap around!\n", (val == B)? "does" : "does not");
    return 0;
}

With GCC on x86_64-redhat-linux, it does wrap around. Curiously, it takes option -Wpedantic for the compiler to start complaining; even -Wall -Wextra -std=c11 will not result in any warning. Yeah, we all know -Wall is a lie…

There is an issue here, indeed. Rust should be compatible, quirk-for-quirk, with the behavior of the de-facto main C compiler on the target.


#7

A couple more discoveries.

This is 64 bits wide according to GCC:

typedef enum {
    A = 0x80000000LL,
    B = -0x80000000LL
} C;

This doesn’t compile in today’s Rust:

#[repr(C)]
enum C {
    A = -1,
    B = 0x80000000u32,
}

So, it appears that GCC uses the discriminants’ types (including implicit value-based coercions) to decide on the size of the enum, while Rust coerces all values to isize and tries to work back from there.


#8

It’s not just the value, the literal’s numeric base also figures: http://en.cppreference.com/w/c/language/integer_constant


#9

Huh. I’m trying to remember the details here, but certainly our intention was that #[repr(C)] would match the “main C compiler”, to the extent that this is possible. I sort of remember that we decided to just use i32 unless explicitly specified, as part of the overflow checking work, but maybe I have it wrong. @pnkfelix do you remember?


#10

I have submitted my findings as a GitHub issue.


#11

(no I don’t remember the discussion here.)


#12

As I recall, it was meant to match the target’s C ABI, and git agrees with my memory: the exact words I used were “match the target’s C ABI for the equivalent C enum”. I remember I spent some time looking at ABI specs for the targets that were supported at the time, but I missed that Windows has a maximum size. So that’s a bug, in my opinion.

Also, there’s probably at least one target that’s been added in the past 4 years where the i32 default is wrong. (Originally there was no default, but the target architecture was changed from an enum to a string at some point.)

But trying to match C exactly is… difficult, as the examples in the above-linked GitHub issue demonstrate; this raises the question of whether “the equivalent C enum” is a well-defined concept.

And there’s the problem where C allows values that are impossible for the Rust enum. This raises the possibility that #[repr(C)] was never the right way to do C enum FFI. I don’t remember whether we just completely overlooked it, maybe because FFI usage was mostly Rust→C rather than C→Rust and being unable to generate those values wasn’t as bad as not being able to handle them, or if we noticed it and meant to do something about it at some point and nobody ever did, or what.

As for what to do about it: it would be nice to have something in the base language, or even a crate like libc, rather than requiring bindgen and/or winding up with code assuming i32 and breaking on uncommon platforms.
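As an illustration of what a library-level answer could look like, here is a sketch of an “open” enum: a transparent wrapper around `c_int` with the known members as associated constants. It is not an existing API of libc or any other crate; `Mode`, its constants, and `name` are hypothetical, and the sketch assumes the C enum has the size of `int`, which the examples above show is not guaranteed.

```rust
use std::os::raw::c_int;

/// Hypothetical "open" enum: a transparent wrapper around the C `int`, with
/// the known members as associated constants. Any other value is still valid.
#[repr(transparent)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Mode(pub c_int);

impl Mode {
    pub const FAST: Mode = Mode(0);
    pub const SAFE: Mode = Mode(1);
}

/// Match on the known members; the wildcard arm handles values that a newer
/// version of the C library might introduce.
pub fn name(mode: Mode) -> &'static str {
    match mode {
        Mode::FAST => "fast",
        Mode::SAFE => "safe",
        _ => "unrecognized",
    }
}

fn main() {
    println!("{}", name(Mode::SAFE));
    println!("{}", name(Mode(42)));
}
```

Using `c_int` instead of a hard-coded i32 lets the wrapper follow the target's definition of `int`, and the compiler requires the wildcard arm, so unknown values cannot be forgotten.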