Automated Data Oriented Design (DOD) transformations?

I think the compiler already does this for the limited case of a single bool in an enum, but I'm wondering if these transformations could be applied more broadly.

For example,

enum E {
    A(u64, u32, u16, u8, bool),
    B(u64),
}

is only 16 bytes, even though it would be 24 if the bool weren't being inlined into the tag. However, adding just a single extra bit of data to either variant breaks this optimization and makes the type 24 bytes. Are there any technical limitations that would prevent us from using all 8 bits in the tag?
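For anyone who wants to check the numbers on their own toolchain, here is a minimal sketch. `EWider`, `enum_sizes` and `print_enum_sizes` are names I made up for illustration, and none of these sizes are guaranteed by the language, so treat whatever gets printed as an observation about one compiler version and target.

```rust
use std::mem::{align_of, size_of};

// The enum from the post: variant A's fields add up to exactly 16 bytes.
#[allow(dead_code)]
enum E {
    A(u64, u32, u16, u8, bool),
    B(u64),
}

// Hypothetical variation: one more byte-sized field in the smaller variant.
#[allow(dead_code)]
enum EWider {
    A(u64, u32, u16, u8, bool),
    B(u64, u8),
}

/// Returns (size of E, size of EWider) as chosen by the current compiler.
fn enum_sizes() -> (usize, usize) {
    (size_of::<E>(), size_of::<EWider>())
}

/// Prints the sizes; the exact numbers are a compiler choice, not a guarantee.
fn print_enum_sizes() {
    let (e, wider) = enum_sizes();
    println!("E: {e} bytes, EWider: {wider} bytes, align: {}", align_of::<E>());
}
```

Calling `print_enum_sizes()` from `main` shows whether the second enum still gets the compact layout on your compiler.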

I'm not sure how common these inlinings would be in practice (they only help when the extra byte would otherwise push the type past an alignment boundary), but it'd be neat if this was something we could do.

Ooh, just discovered another fun one:

use std::num::NonZeroU8;

enum E {
    A(u64, u32, u16, u8, NonZeroU8),
    B,
}

Also 16 bytes by using the niche. Of course it breaks if you add another variant since that requires another bit.
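A quick sketch of the niche case, in case it helps. `EThree` and `niche_sizes` are hypothetical names of mine. The only size here that the standard library documents as guaranteed is `Option<NonZeroU8>` being one byte; the two enum sizes are whatever the compiler happens to pick.

```rust
use std::mem::size_of;
use std::num::NonZeroU8;

// The enum from the post: B can be encoded as the forbidden value 0
// in the NonZeroU8 slot, so no separate tag byte is needed.
#[allow(dead_code)]
enum E {
    A(u64, u32, u16, u8, NonZeroU8),
    B,
}

// Hypothetical variation with a second dataless variant. NonZeroU8 has
// only one invalid value (0), so B and C cannot both be encoded in it.
#[allow(dead_code)]
enum EThree {
    A(u64, u32, u16, u8, NonZeroU8),
    B,
    C,
}

/// Returns the sizes of Option<NonZeroU8>, E and EThree, in that order.
fn niche_sizes() -> (usize, usize, usize) {
    (
        size_of::<Option<NonZeroU8>>(),
        size_of::<E>(),
        size_of::<EThree>(),
    )
}
```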

The language-semantics reason is that it must be possible to take a reference, even &mut, to any field of the enum. That requirement can be lifted, as it currently is for #[repr(packed)], but packed doesn't enable bit-level packing. And if there were such a bit-packed repr, it wouldn't be the default representation for an enum, since that would stop the enum from being used in the normal Rust fashion where you can take an & reference to anything.
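To make the reference requirement concrete, here is a small sketch (the function name `set_flag_and_read_byte` is just something I picked). It takes an ordinary `&mut bool` into the enum and then looks at the byte behind it, which has to be a real, addressable byte holding 0 or 1 no matter how the compiler encodes the variant.

```rust
#[allow(dead_code)]
enum E {
    A(u64, u32, u16, u8, bool),
    B(u64),
}

/// Sets the bool field through a plain `&mut bool` and returns the raw
/// byte stored behind that reference, or None for the other variant.
fn set_flag_and_read_byte(e: &mut E) -> Option<u8> {
    match e {
        E::A(_, _, _, _, flag) => {
            let r: &mut bool = flag; // an ordinary reference into the enum
            *r = true;
            let p: *const bool = r;
            // SAFETY: `p` comes from a live reference, and a `bool` is a
            // single byte that is always 0 or 1, so reading it as u8 is sound.
            Some(unsafe { *p.cast::<u8>() })
        }
        E::B(_) => None,
    }
}
```

Since the callee only sees a `&mut bool`, it has no way of knowing the byte doubles as part of the tag, which is why only the values a bool can never take are available for encoding other variants.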


Dang, forgot you could do that. So then is the reason the current optimizations only work on a single bit that they're using the tag as the relevant byte when taking a reference? For example, the bool example I showed could have its tag be 0 if it's A with bool=false, 1 if it's A with bool=true, and 2 if it's B. That still doesn't explain why changing B to be more than 64 bits breaks this though. Maybe a bug?

Yes — and if you have two bool fields, you need two distinct bytes that hold 0 or 1. So two bools can never be packed into one byte.
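A sketch of why two bools need two bytes, with made-up names (`Flags`, `bool_addresses`): both `&mut bool` borrows are live at the same time, so they have to point at different addresses.

```rust
#[allow(dead_code)]
enum Flags {
    Two(bool, bool),
    Empty,
}

/// Writes through both fields while both `&mut` borrows are live and
/// returns the address of each field, or None for the dataless variant.
fn bool_addresses(f: &mut Flags) -> Option<(usize, usize)> {
    match f {
        Flags::Two(a, b) => {
            let (ra, rb): (&mut bool, &mut bool) = (a, b);
            *ra = true;
            *rb = false;
            // Two live mutable references to one-byte values cannot alias,
            // so these two addresses are necessarily distinct.
            Some((ra as *mut bool as usize, rb as *mut bool as usize))
        }
        Flags::Empty => None,
    }
}
```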

That still doesn't explain why changing B to be more than 64 bits breaks this though. Maybe a bug?

No idea. It's not a bug in the sense that no layout optimizations of this sort are guaranteed, but it does seem like it ought to be possible.

My guess: u64 alignment means that (u64, bool) or (u64, u8) is already 16 bytes, but where the last byte is part of the padding and can thus be in any bit pattern; if you assigned over the entire payload for B, it might alter the state of that last byte.
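The padding part of that guess is easy to check with a sketch like this one (`padded_sizes` is a hypothetical helper name). On targets where u64 is 8-byte aligned, a 9-byte payload has to round up to the next multiple of 8, so both tuples come out at 16 bytes with 7 bytes of padding.

```rust
use std::mem::{align_of, size_of};

/// Returns the sizes of (u64, bool) and (u64, u8), plus the alignment
/// of u64 on the current target.
fn padded_sizes() -> (usize, usize, usize) {
    (
        size_of::<(u64, bool)>(),
        size_of::<(u64, u8)>(),
        align_of::<u64>(),
    )
}
```

This only shows that the padding exists; it says nothing about whether the layout algorithm treats those padding bytes as unavailable for a tag.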

Neat, thanks!

Not sure I understand: whatever happens, the tag needs to be assigned correctly. Are you saying that having padding bytes between the u8 and a tag at the end is a problem?

If you could take a mutable reference to the entire payload and assign over it, there wouldn’t be anywhere to also set the tag. But Rust doesn’t let you do the thing I just said, so maybe that’s not the actual explanation.

Oh gotcha, maybe that makes sense? I created the issue "Continue performing Data Oriented Design optimization in enums with variants of similar size" (Issue #106026 in rust-lang/rust on GitHub) to try to eventually confirm.