When structs have a field with only one possible representation, an enum wrapping them should use that field as its discriminant:

#[repr(u8)]
#[derive(Clone, Copy)]
enum FooDiscriminant {
    Disc = 0x01,
}

#[repr(u8)]
#[derive(Clone, Copy)]
enum BarDiscriminant {
    Disc = 0x02,
}

#[repr(C)]
struct Foo {
    x: u8,
    y: u8,
    z: u8,
    disc: FooDiscriminant,
}

#[repr(C)]
struct Bar {
    x: u8,
    y: u8,
    z: u8,
    disc: BarDiscriminant,
}

enum FooBar {
    Foo(Foo),
    Bar(Bar),
}

In this example, the disc field is at the same offset in both Foo and Bar, and both types have the same size and alignment. Contrary to what you might expect the compiler to do, the size of FooBar is 5 rather than 4. The compiler could see through the structure of these inner types and use those fields as the discriminant, which would make the size of FooBar 4.
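The numbers above can be checked with std::mem::size_of. This is a minimal sketch that reuses the definitions from the example; the helper names disc_offset_in_foo and layout_report are made up for illustration, and the size of FooBar is whatever the current layout algorithm picks, not a language guarantee.

```rust
use std::mem::size_of;

#[repr(u8)]
#[derive(Clone, Copy)]
#[allow(dead_code)]
enum FooDiscriminant {
    Disc = 0x01,
}

#[repr(u8)]
#[derive(Clone, Copy)]
#[allow(dead_code)]
enum BarDiscriminant {
    Disc = 0x02,
}

#[repr(C)]
#[allow(dead_code)]
struct Foo {
    x: u8,
    y: u8,
    z: u8,
    disc: FooDiscriminant,
}

#[repr(C)]
#[allow(dead_code)]
struct Bar {
    x: u8,
    y: u8,
    z: u8,
    disc: BarDiscriminant,
}

#[allow(dead_code)]
enum FooBar {
    Foo(Foo),
    Bar(Bar),
}

/// Byte offset of `disc` inside `Foo` (hypothetical helper, computed from addresses).
#[allow(dead_code)]
fn disc_offset_in_foo() -> usize {
    let foo = Foo { x: 0, y: 0, z: 0, disc: FooDiscriminant::Disc };
    let base = &foo as *const Foo as usize;
    let field = &foo.disc as *const FooDiscriminant as usize;
    field - base
}

/// Returns (size of Foo, size of Bar, size of FooBar) in bytes (hypothetical helper).
#[allow(dead_code)]
fn layout_report() -> (usize, usize, usize) {
    (size_of::<Foo>(), size_of::<Bar>(), size_of::<FooBar>())
}
```

The first two sizes are fixed at 4 by repr(C) and repr(u8). The third is the one under discussion: the thread reports 5, and the proposed optimization would bring it down to 4.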

Unfortunately that doesn't work, because the extra bits are padding bits, and padding bits are not guaranteed to be preserved. Consider code that takes a &mut Foo and writes a Foo { ... } into it: the write may overwrite those bits.

We cannot change this behavior because of existing unsafe code, and we also don't want to change it because it enables optimizations.

There have been discussions about some attribute to preserve padding, but nothing concrete.

Actually, now I see your case is even simpler: you don't have padding bytes, you want to use extra bits rather than bytes. This is impossible for a similar reason, though: some code may take &mut FooDiscriminant and write to it, or read from it and expect it to be one of the variants.

I think you have misunderstood the example. Given the repr(C) and repr(u8) layout rules, the struct Foo is guaranteed to have 0x01 stored at offset 3, and the struct Bar is guaranteed to have 0x02 stored at offset 3. Any value which does not meet one of these conditions is not a valid value of Foo or Bar. Thus, neither an &mut Foo nor an &mut FooDiscriminant may be used to write any byte other than a 0x01 at offset 3, so the enum FooBar could use that byte as its discriminant.
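To make the argument concrete, here is a hand-written sketch of the layout being asked for: four bytes, with the byte at offset 3 doubling as the discriminant. PackedFooBar and Unpacked are hypothetical names introduced only for this illustration; the compiler does not generate anything like this today.

```rust
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum FooDiscriminant {
    Disc = 0x01,
}

#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum BarDiscriminant {
    Disc = 0x02,
}

#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct Foo {
    x: u8,
    y: u8,
    z: u8,
    disc: FooDiscriminant,
}

#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct Bar {
    x: u8,
    y: u8,
    z: u8,
    disc: BarDiscriminant,
}

/// Hypothetical hand-packed stand-in for `FooBar`: four bytes, no separate tag.
#[derive(Clone, Copy)]
#[allow(dead_code)]
struct PackedFooBar([u8; 4]);

/// Ordinary tagged view that `PackedFooBar` decodes into (hypothetical).
#[derive(Debug, PartialEq)]
#[allow(dead_code)]
enum Unpacked {
    Foo(Foo),
    Bar(Bar),
}

#[allow(dead_code)]
impl PackedFooBar {
    fn from_foo(foo: Foo) -> Self {
        // Byte 3 is always 0x01 for a valid Foo.
        PackedFooBar([foo.x, foo.y, foo.z, foo.disc as u8])
    }

    fn from_bar(bar: Bar) -> Self {
        // Byte 3 is always 0x02 for a valid Bar.
        PackedFooBar([bar.x, bar.y, bar.z, bar.disc as u8])
    }

    fn unpack(self) -> Unpacked {
        let [x, y, z, tag] = self.0;
        // The byte at offset 3 alone tells the two variants apart.
        match tag {
            0x01 => Unpacked::Foo(Foo { x, y, z, disc: FooDiscriminant::Disc }),
            0x02 => Unpacked::Bar(Bar { x, y, z, disc: BarDiscriminant::Disc }),
            _ => unreachable!("constructors only store 0x01 or 0x02 at offset 3"),
        }
    }
}
```

Because no valid Foo or Bar can hold any other value in that byte, the match on the tag never needs extra storage, which is exactly the property the proposed optimization relies on.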

This is a possible layout optimization, but the compiler does not currently support it. It is similar to the case of wanting to collapse two non-overlapping enums:

#[repr(u8)]
#[derive(Clone, Copy)]
enum Foo {
    F1 = 1,
    F2 = 2,
}

#[repr(u8)]
#[derive(Clone, Copy)]
enum Bar {
    B3 = 3,
    B4 = 4,
}

enum FooBar {
    Foo(Foo),
    Bar(Bar),
}

This could be stored as one byte, but is not (yet).


Ah right, I didn't see that.

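Collapsing the two non-overlapping enums is a harder problem, because it requires range discriminants: each variant would be identified by a range of tag values rather than a single one.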
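A hand-rolled sketch of what a range-based discriminant means for the two non-overlapping enums: the variant is decided by which range the single byte falls into (1..=2 for Foo, 3..=4 for Bar). PackedFooBar and its methods are hypothetical names for illustration only, not something the compiler produces.

```rust
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq)]
#[allow(dead_code)]
enum Foo {
    F1 = 1,
    F2 = 2,
}

#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq)]
#[allow(dead_code)]
enum Bar {
    B3 = 3,
    B4 = 4,
}

#[derive(Debug, PartialEq)]
#[allow(dead_code)]
enum FooBar {
    Foo(Foo),
    Bar(Bar),
}

/// Hypothetical one-byte encoding of `FooBar`.
#[derive(Clone, Copy)]
#[allow(dead_code)]
struct PackedFooBar(u8);

#[allow(dead_code)]
impl PackedFooBar {
    fn pack(value: FooBar) -> Self {
        // The payload byte is stored as-is; no extra tag byte is added.
        match value {
            FooBar::Foo(f) => PackedFooBar(f as u8),
            FooBar::Bar(b) => PackedFooBar(b as u8),
        }
    }

    /// Variant test by range rather than by equality with one tag value.
    fn is_foo(self) -> bool {
        (1..=2).contains(&self.0)
    }

    fn unpack(self) -> FooBar {
        match self.0 {
            1 => FooBar::Foo(Foo::F1),
            2 => FooBar::Foo(Foo::F2),
            3 => FooBar::Bar(Bar::B3),
            4 => FooBar::Bar(Bar::B4),
            _ => unreachable!("pack only stores values in 1..=4"),
        }
    }
}
```

The struct case earlier in the thread only needs an equality check against one value per variant, whereas here the check is a range test, which is presumably why it is called the harder problem.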

If there's one spot with a single value per variant, that's something that could "just work" already if the layout algorithm can find it and put it in the tag_field of Variants::Multiple (https://doc.rust-lang.org/nightly/nightly-rustc/rustc_abi/enum.Variants.html#variant.Multiple.field.tag_field).
