Niche optimization makes use of niches, but padding bytes are neglected

Try playing around with these types:
(Playground)

#[repr(C)]
struct MyStruct<H, const N: usize, T> {
    head: H,
    body: [u8; N],
    tail: T,
}

#[repr(u8)]
enum U8SingleVariant {
    OnlyVariant,
}

fn main() {
    show_size!(header); // see macro definition in playground
    show_size!(U8SingleVariant);
    show_size!(MyStruct<u8, 6, u8>);
    show_size!(MyStruct<u8, 6, NonZeroU8>);
    show_size!(MyStruct<u8, 6, U8SingleVariant>);
    show_size!(MyStruct<u8, 6, ()>);
    // ...
}

We got:

Type                                  T    Option<T>
U8SingleVariant                       1    1
MyStruct<u8, 6, u8>                   8    9
MyStruct<u8, 6, NonZeroU8>            8    8
MyStruct<NonZeroU8, 6, u8>            8    8
MyStruct<(), 7, NonZeroU8>            8    8
MyStruct<NonZeroU8, 7, ()>            8    8
MyStruct<u8, 6, U8SingleVariant>      8    8
MyStruct<U8SingleVariant, 6, u8>      8    8
MyStruct<u8, 6, ()>                   7    8
MyStruct<u16, 5, u8>                  8   10
MyStruct<u16, 5, NonZeroU8>           8    8
MyStruct<u16, 5, U8SingleVariant>     8    8
MyStruct<u16, 5, ()>                  8   10 // <- doesn't look smart
MyStruct<u16, 4, NonZeroU16>          8    8
MyStruct<u16, 4, U8SingleVariant>     8    8
MyStruct<u16, 4, u8>                  8   10 // <- doesn't look smart
MyStruct<u16, 4, ()>                  6    8
MyStruct<u64, 7, NonZeroU8>          16   16
MyStruct<u64, 7, U8SingleVariant>    16   16
MyStruct<u64, 7, ()>                 16   24 // <- doesn't look smart

Can niche optimization be enhanced to make use of padding bytes in the future?


P.S. There's something else that could theoretically have its layout automated, basically:

enum Category0 {
    Variant0 = 0,
    Variant1 = 1,
}
enum Category1 { // valid patterns of `Category0` & `Category1` do not overlap
    Variant2 = 2,
    Variant3 = 3,
}
enum MyEnum { // make sizeof::<MyEnum>() == 1
    Category0(Category0),
    Category1(Category1),
}

But I guess this might be too heavy.

On the other hand, making use of padding bytes doesn't seem too complicated.

It can't because padding bytes can be overwritten:

12 Likes

I suppose that it might be possible to add a #[repr(...)] that would force a type's padding bytes to be zeroed at all times. That adds a bunch of illegal states that the niche optimization could take advantage of to store a discriminant.

I'm sure that there's a tradeoff for this choice somewhere, which may make this more trouble than it's worth to implement.

7 Likes

Another possibility is to allow size != stride, so that size doesn't have to be a multiple of alignment (Separate size and stride for types · Issue #1397 · rust-lang/rfcs · GitHub). This would allow MyStruct<u16, 4, u8> to have size 7 (if you remove the #[repr(C)]) and Option<MyStruct<u16, 4, u8>> would have size 8.

2 Likes

That one is unlikely to happen.

2 Likes

Some arguments in that FAQ seem wrong:

The combination of std::array::from_ref and array indexing is a stable guarantee that a pointer (or reference) to a type is convertible to a pointer to a 1-array of that type, and vice versa.

But [T; 1] wouldn't need padding at the end for the same reason T wouldn't, so they would be same size.

In general, the size of [T; n] would be n * sizeof(T) + (n-1) * padding (except when n=0).

Unsafe code may also assume that overwriting trailing padding is allowed, which would conflict with the repurposing of such padding for data storage.

This seems wrong too: there would never be a need for any trailing padding. Padding would only be needed between struct or array elements.

1 Like

I don't think that's going to happen, because it's a pretty drastic change to what it means to use a type, and it would make a whole bunch of plausible proc macros no longer sound.

If someone wants a niche in a type, I think it's better to represent that by adding an actual field.

Once can make a

#[repr(u8)] enum ZeroByte { Zero = 0 }

and add a field of that type, and the niche optimizations will start to happen.

Then by it being a real field, the "this byte needs to be zero" property is a normal consequence of an ordinary validity invariant.

(And obligatory mention of Pre-pre-RFC: syntactic sugar for `Default::default()` - #75 by ekuber that would let such a field be defaulted and thus less annoying to use.)

2 Likes

The downside to that approach is there's no way to apply it where padding would be in generic types, e.g.:

struct Foo<T> {
    header: u16,
    value: T,
}

There's no way to make Option<Foo<u8>> fit in 4 bytes[1] without making Foo<u16> no longer fit in 4 bytes[2].

I'm not sure if that's important enough to implement such a new repr (the few times I wanted that, I've been able to find a workaround that optimizes well enough), but that is a case where adding a field won't suffice.


  1. Even if you assume rustc takes the optimization when it can, which IME it usually does, but there's no guarantee ↩︎

  2. You probably could work for u8 and u16 specifically with enough trait hackery, but there's no way to make it generic for the layout of any type argument ↩︎

7 Likes

if we loosen the requirements for constant expressions, we could do _zeroed_padding: [ZeroByte; 2.saturating_sub(mem::size_of::<T>())]

Yeah, for those two specific sizes, but that won't work for Foo<u32> (which has 2 padding bytes from u32's higher alignment).

Something like:

const fn padding_bytes(align: usize, u16_size: usize, t_size: usize) -> usize {
    let size_without_padding = u16_size + t_size;
    if size_without_padding % align == 0 {
        0
    } else {
         size_without_padding % align
    }
}

struct Foo<T> {
    header: u16,
    value: T,
    padding: [ZeroByte; padding_bytes(mem::size_of::<u16>(), mem::size_of::<T>(), mem::align_of::<Self>())]
}

would work fully generally[1], but I doubt we'll ever be able to reference Self's layout in the definition of Self because of all the ways you could break things through circular references.


  1. Assuming we can trust rustc to arrange the fields in the optimal order, which I think it does, but they definitely don't promise. ↩︎

1 Like

that seems wayy overkill, in 90% of cases a single padding byte should be enough.

a better way to do this might be an attribute macro that constructs a constant expression finding the maximum alignment of any of a struct's fields.

If this can be emulated by adding fields, this could be automated by a proc macro that would receive a type without such zero bytes and add them as needed. And in principle this macro could be provided by the stdlib. Then the compiler could also have a deny by default lint against any access to those fields (or even make it a hard error)

At this point this is pretty much identical to just providing this feature in the language itself.

There's a big difference whether the field is actually there.

Today, you can initialize a struct correctly by initializing all its fields in a MaybeUninit, then call assume_init. And a proc macro can emit that code, as it sees all those fields syntactically.

It has massive semantic implications to change the rules to add new types that need more than that.

8 Likes

Adding an unnecessary zero byte to create a niche for an enum is a hacky suboptimal solution to an artificial problem: an enum can add a byte for a discriminant all by itself when necessary, and when not in an enum, that byte is unnecessary overhead. So I don't think this should have a special built-in language support.

I think a clearly better solution is to not round structs up to a multiple of alignment, i.e. mem::size_ofmem::stride_of. Unfortunately the documentation states that mem::size_of is the stride, so it would be a breaking change.

2 Likes

Just having size < stride would be kind of lackluster. In general you'd want types that own only a subset of the bytes within their stride.

View types could be a more general solution to that, but for these layout optimizations you would just need move-only fields, a way for a type to opt-out of taking references to the field.

Unfortunately this wouldn't immediately help Option<T>, since it does allow references to the inner T, but you could define a move-only wrapper to at least optimize Option<Move<T>>:

struct Move<T>(#[move_only] T);

Because the field is move-only, its contents can be packed arbitrarily, e.g. by removing padding. (There are certainly tradeoffs with how much packing is worthwhile.)

1 Like