Store Option discriminant in Containing type (optimization)

I've been testing with alignments today and I think I found two optimizations that could be done in regards to usage of padding bytes/unused bits. If I'm missing something here please tell me.

As a reminder: Option<T> (and enums in general) make use of empty/free bits in the types they hold, to avoid the need for an extra byte just to store which variant it is (size_of::<Option<bool>>() == 1).

Usage of padding bytes (due to alignment requirements)

As shown in the example below it does not make use of free bits in the padding bytes needed to satisfy alignment, which makes Option<(u64, u8, bool)> smaller than Option<(u64, u8)>. I think/assume this is because a Copy can/is allowed to skip those padding bytes, which would mean the information is lost. Unless I'm mistaken Rust could make use of these padding bytes by not skipping (some of) those bytes, which - due to inlining - likely results in a smaller code size a smaller memory footprint and the same performance.

See godbolt: Adding an extra dummy: bool (C) makes it less efficient to copy C but more efficient to copy MyEnum<C> (or Option<C> for that matter). I do understand why it is this way, but since these Copy function are/should be almost always inlined the <Option<C>>::Copy function doesn't have to call C::Copy. And (at least in this simple case) using one of the alignment padding bytes works (I may be missing other places where its assumed that padding bytes can be dropped, but all of them should be working on Option<C>).

As far as I can tell the compiler/optimizer could automatically insert dummy if (and only if) it is handling Option<T>, using different functions for copying the bytes.

Usage of free bits in the outer type

As said in the introduction, enums make use of unused bits in types like bool to store which variant it is. But this check only runs in one direction (understandably, the other direction is more difficult to implement). But in principle Rust could make use of unused bits (from other enums or bool) in the structure the enum is stored in as well. Or to be more precise: When looking at the type that contains an option that didn't manage to fit the discriminant in somewhere Rust could look in this containing type:

In the output below Option<Container<Large>> wastes 1000 bytes because of not using the padding bytes, while it could reduce that to wasting about 500 bytes by recognizing that Option and Container don't have the 512 alignment restriction. The fix for this is of course to add a dummy bool in Container, which provides 7 free bits for Option to use. Something the Compiler could do by looking at Option<Container<T>> as one data structure (assuming that most of the functions (especially small ones) will be inlined anyways and thus don't result in a significant (or any) code size overhead.

The second case is more extreme: When swapping Option and Container Container<Option<Large>> the Option doesn't know about the free padding bytes in Container. Even with ContainerWithDummy<Option<Large>> the free bits in Container are unused because Rust looks at Option<Large> in isolation and (see my first point) doesn't combine the structs to make use of the padding. Thus resulting in a size of 3x512 instead of 2x512.

The only way I've found to avoid this is to write a a custom Option that combines the two: CustomOption<Large>, which is really painful from a usability perspective.

Finally I'd like to mention why this is even relevant to think about. Such large alignments are rare, so I'm not sure if doing this would really be worth the implementation and maintenance effort. On the other hand this situation does happen quite often:

  • The enums below showcase that a decent amount of bytes can get wasted with some common situations like storing an Option inside an enum variant, worse if the variant contains some data and an option (MyEnumB).
  • Another situation where this becomes relevant is in a struct with multiple optional fields:
    struct Test {
        a: Option<u32>,
        b: Option<u32>,
        c: Option<u32>,
        d: Option<u32>,
    }
    
    This struct is 32 bytes large and would fit into 20 bytes (16 bytes for the u32 + one byte for the option bits + 3 bytes alignment padding).

All of this of course becomes less relevant the smaller the alignment requirement of the inner type is.

use std::mem::{size_of, align_of};
use std::any::type_name;

#[repr(align(512))]
pub struct Large(pub [u8; 512]);

pub struct Container<T> {
    pub value: T,
    pub extra: u64,
}
pub struct ContainerWithDummy<T> {
    pub value: T,
    pub extra: u64,
    pub dummy: bool,
}

pub enum MyEnumA {
    A(Option<Large>),
    B,
}
pub enum MyEnumB {
    A(Option<Large>, bool),
    B,
}
pub enum MyEnumC {
    A(Option<(Large, bool)>),
    B,
}

pub enum CustomOption<T> {
    Some{
        value: T,
        extra: u64,
        dummy: bool,
    },
    None{
        extra: u64,
        dummy: bool,
    },
}

pub fn main() {
    println!("Options");
    print::<Option<(u64, u8)>>();
    print::<Option<(u64, bool)>>();
    print::<Option<(u64, u8, bool)>>();

    println!("Types in use:");
    print::<Large>();
    print::<Option<Large>>();
    print::<Container<Large>>();
    
    println!("Extra and dummy only exist when Some");
    print::<Option<Container<Large>>>();
    print::<Option<ContainerWithDummy<Large>>>();

    println!("Extra and dummy always exist");
    print::<Container<Option<Large>>>();
    print::<ContainerWithDummy<Option<Large>>>();
    print::<CustomOption<Large>>();

    println!();
    println!("Enum Usecase (these differ in behavior)");
    print::<MyEnumA>();
    print::<MyEnumB>();
    print::<MyEnumC>();
}

fn print<T>() {
    println!(
        "    {}: {}/{}", 
        type_name::<T>(), 
        size_of::<T>(), 
        align_of::<T>(),
    );
}

Output:

Options
    core::option::Option<(u64, u8)>: 24/8
    core::option::Option<(u64, bool)>: 16/8
    core::option::Option<(u64, u8, bool)>: 16/8
Types in use:
    example::Large: 512/512
    core::option::Option<example::Large>: 1024/512
    example::Container<example::Large>: 1024/512
Extra and dummy only exist when Some
    core::option::Option<example::Container<example::Large>>: 1536/512
    core::option::Option<example::ContainerWithDummy<example::Large>>: 1024/512
Extra and dummy always exist
    example::Container<core::option::Option<example::Large>>: 1536/512
    example::ContainerWithDummy<core::option::Option<example::Large>>: 1536/512
    example::CustomOption<example::Large>: 1024/512

Enum Usecase (these differ in behavior)
    example::MyEnumA: 1024/512
    example::MyEnumB: 1536/512
    example::MyEnumC: 1024/512

Final words

I'm not sure if it makes sense to have these optimizations (especially in terms of compiler complexity), but I am interested in what you think about this (or if I'm missing something here besides the difficulty to implement this).

I think you have the misconception that niche exploit unused bits, but that's not true, they exploit invalid bit patterns. For example a NonZeroU8 uses all its bits, but the all-zero bit pattern is invalid. In comparison however padding bytes allow any bit pattern and thus cannot be exploited for niches.

This is not the issue. The problem is that you can get a &mut (u64, u8) from that option, and then you can write to it another (u64, u8), padding bytes included, which would overwrite the Option discriminator, and that's unsound.

And making a write of (u64, u8) always avoid the padding bytes won't solve the issue, since it is currently assumed in the ecosystem that you can write a value of type T by writing std::mem::size_of::<T>() bytes, at which point the notion of which byte is a padding byte or not is totally lost. Making std::mem::size_of::<T>() ignore the trailing padding bytes is also problematic because the size of a type is currently assumed to be the same as its stride (the difference between the addresses of two consecutive elements in an array/slice, which must include the trailing padding bytes).

Making padding bytes guaranteed to be zero would enable this optimization, since the other 255 bit patterns would become invalid, but the cost would be that the compiler would have to write zeros to the padding bytes whenever it isn't sure they will be zero. It's also unclear how this would interact with unsafe code that initialized structs field-by-field, since that doesn't include zeroing the padding bytes.

Arguably a better fix would be using:

#[repr(u8)]
enum ZeroedPaddingU8 {
    Zero = 0
}

This will force the padding bytes to be 0, thus disallowing any other bit pattern. It's IMO better than bool because it carries no information, while bool does, and thus has no chance to be mistaken with some kind of flag.

That's because Option<Large> actually must work in isolation. How would taking references to those fields work otherwise?

4 Likes

This is the crux of the issue that creates this counter-intuitive wastage.

Another counter-intuitive example is ((((u32, u8), u8), u8), u8). Without this size requirement it would only take 8 bytes, but it takes 20 bytes.

4 Likes

True, I haven't thought about taking references to the option and passing that to some other code. That wouldn't be inlined and thus would prevent this, so I guess this could only be done if no reference to the option exists after optimizations, which is basically impossible to ensure so this would only be useful if there where no references to the option field(s) at all. As that other code unfortunately wouldn't be aware of the structure the option is in.

i think this is an application for move-only fields, where you can't take a reference to the field, you can only read it or assign to it.

1 Like