Missed layout optimization

struct Foo {
    a: u16,
    b: u8,
}

struct Bar {
    a: u16,
    b: u8,
    c: bool,
}

Option<Foo> is 6 bytes while Option<Bar> is only 4 bytes.

Option could store its discriminant in the spare byte of Foo. Is there anything preventing this optimization?
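For reference, the sizes in question are easy to check with a quick sketch (numbers as reported here for current rustc; repr(Rust) layout is not guaranteed):

use std::mem::size_of;

fn main() {
    assert_eq!(size_of::<Foo>(), 4);          // u16 + u8 + 1 padding byte
    assert_eq!(size_of::<Bar>(), 4);          // u16 + u8 + bool, no padding
    assert_eq!(size_of::<Option<Foo>>(), 6);  // separate discriminant + padding
    assert_eq!(size_of::<Option<Bar>>(), 4);  // discriminant stored in bool's niche
}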

2 Likes

Padding bytes cannot contain niches, because:

Suppose you have an &mut Foo pointing at the value of an Option<Foo>. It's allowed to assign to this Foo by copying mem::size_of::<Foo>() bytes (4 bytes), which would then overwrite the discriminant if it were stored there.
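A minimal sketch of that scenario, using the Foo from the first post (helper names made up):

fn overwrite(dst: &mut Foo, src: Foo) {
    // A plain assignment may copy all size_of::<Foo>() bytes, padding included.
    *dst = src;
}

fn demo(opt: &mut Option<Foo>) {
    if let Some(foo) = opt.as_mut() {
        // If the None discriminant lived in Foo's padding byte, this write
        // could clobber it and silently change which variant `opt` holds.
        overwrite(foo, Foo { a: 1, b: 2 });
    }
}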

In general, only byte values that cannot be a part of a valid value of Foo can be used as a niche in Foo. Padding bytes are allowed to be anything, so they cannot make the value invalid, so the Option can't rely on the Foo user not writing the None-niche-value to them.

Bar gets optimized because the c byte has two valid values (0 and 1) and 254 invalid ones, so Option gets to pick any of the invalid values for its own use and rely on it never being overwritten: writing a valid bool can only ever put 0 or 1 there.
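The same mechanism in its simplest form: bool itself has 254 invalid bit patterns, so Option<bool> needs no extra byte at all (on current rustc):

assert_eq!(std::mem::size_of::<Option<bool>>(), 1); // None uses one of bool's invalid values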

17 Likes

OK yes, thank you, this makes it impossible.

I'll additionally note that the optimization would be possible if size didn't have to be a multiple of alignment, in which case the size of Foo would be 3 and Option<Foo> would be 4. It's what my intuition says should happen here, but the reference already says that size is always a multiple of alignment, so I guess that's a no-go.

I guess one solution for this could be an attribute #[repr(zeroed_padding)]; then None could use a non-zero value in the padding position.

3 Likes

If you want, you can make zeroed "padding" yourself:

#[repr(u8)]
enum Zp { Zp }

struct Foo {
    a: u16,
    b: u8,
    _zp: Zp,
}

[Edited to correct repr as discussed below.]

6 Likes

This still makes Option<Foo> 6 bytes; you need to add another variant to make it 4 bytes.

Interesting. Maybe something like it could be added to the standard library:

// core::mem

#[repr(u8)]
pub enum Zeroed {
    Zeroed = 0
}

Or maybe, if we want to hide the implementation details:

// core::mem

#[repr(u8)]
enum ZeroedInner {
    Zero = 0
}

pub struct Zeroed {
    inner: ZeroedInner
}

pub const Zeroed: Zeroed = Zeroed { inner: ZeroedInner::Zero };
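
Either way, a sketch of how it might be used for the case in this thread (Zeroed here is the hypothetical type proposed above, not anything in core today):

struct Foo {
    a: u16,
    b: u8,
    // Hypothetical: fills the would-be padding byte with a type that has
    // invalid values, so layout can use it instead of leaving padding.
    _pad: core::mem::Zeroed,
}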

3 Likes

That seems like a missed optimization opportunity / bug, though.

Nevermind, see below

You need to add #[repr(u8)] to the enum, otherwise it's a ZST and naturally has no niches.

4 Likes

Fixed in my post (so there's a clean full example).

Tempting, since as I just demonstrated it's easy to get wrong. But I think that it'd be nice to wait till it can be more general:

#[repr(u8)]
enum Zero { Z }

#[repr(C)]
pub struct Zeroed<T> {
    alignment: [T; 0],
    zeroes: [Zero; core::mem::size_of::<T>()],
}

Of course, this isn't valid code yet ("generic parameters may not be used in const operations"), but I think it'd be useful to be able to write Zeroed<u16> instead of [Zero; 2]. Though, not for the case brought up in this thread, since to handle that you need to calculate the full struct layout “on paper” anyway.

1 Like

Doesn't size have to be a multiple of alignment to make arrays work?

You can define the "stride" to be the size increased until it is a multiple of the alignment, and use that number for array layout. (However, that would imply that either arrays don't have a size of element stride × length, or [T; 1] is bigger than T.)
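Spelled out as code (my own sketch of that rounding rule):

/// Round `size` up to the next multiple of `align` (alignments are powers of two).
fn stride_of(size: usize, align: usize) -> usize {
    (size + align - 1) & !(align - 1)
}

// Today size_of already returns this rounded value; with size != stride,
// a hypothetical Foo could have size 3 and stride 4.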

I hear that Swift has stride distinct from size, but I don't know how they handle it.

(Another place the stride concept comes up is: if you can pick the stride dynamically (rather than statically from the element type) then you can go from &[MyStruct] to a slice of any of its fields, by keeping the stride the same and applying the appropriate offset — sort of a whole-slice version of AsRef.)
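Today's Rust can't hand out an actual &[u16] with a larger stride, but the stride-plus-offset walk itself can be sketched (names made up; the stride is just size_of here):

use core::mem::size_of;

struct MyStruct { x: u32, y: u16 }

/// Read the `y` field of every element by stepping with an explicit stride.
fn collect_ys(items: &[MyStruct]) -> Vec<u16> {
    let stride = size_of::<MyStruct>();
    let offset = core::mem::offset_of!(MyStruct, y);
    let base = items.as_ptr() as *const u8;
    (0..items.len())
        .map(|i| unsafe { *(base.add(i * stride + offset) as *const u16) })
        .collect()
}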

3 Likes

Some previous discussion here, including notes from Swift:

Obligatory link to https://github.com/rust-lang/lang-team/blob/master/src/frequently-requested-changes.md#size--stride

5 Likes

This says:

Rust makes several guarantees that make supporting size != stride difficult in the general case. The combination of std::array::from_ref and array indexing is a stable guarantee that a pointer (or reference) to a type is convertible to a pointer to a 1-array of that type, and vice versa.
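That combination in code (just restating the stable guarantee):

let foo = Foo { a: 1, b: 2 };
let as_array: &[Foo; 1] = std::array::from_ref(&foo);
// Indexing the 1-array must yield a valid &Foo again, so a single Foo and a
// [Foo; 1] have to be layout-compatible.
let _back: &Foo = &as_array[0];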

What the page doesn't mention is that this problem could be solved by having arrays have padding (stride - size) between elements rather than after elements (so there is no padding after the last array element).

So the only remaining objection is that existing code assumes stride = size.

1 Like

Technically you are right. But it seems the compiler does not facilitate this in some cases.

It's still copying Foo by moving a word then a byte instead of a dword.

Edit: Just after posting this, I noticed Foo is passed in 2 registers while Bar is passed in just 1 register, weird.
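For anyone who wants to reproduce this, a minimal pair to look at with --emit asm or a compiler explorer (my own sketch):

#[derive(Clone, Copy)]
pub struct Foo { pub a: u16, pub b: u8 }

#[no_mangle]
pub fn copy_foo(dst: &mut Foo, src: &Foo) {
    // Observed above: lowered as a 16-bit move plus an 8-bit move,
    // not a single 32-bit move that would also copy the padding byte.
    *dst = *src;
}

#[no_mangle]
pub fn take_foo(foo: Foo) -> u16 {
    // The edit above notes Foo arriving in two registers here.
    foo.a + foo.b as u16
}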

The compiler still tries its hardest to not touch padding (you can see this in C++ codegen as well), but that doesn't mean unsafe code isn't allowed to blindly copy padding.

1 Like

I figured this would be the case. But why though? What do we gain from this?

Yes, I'm aware of this. I'm just pointing out a case of missed optimization (if it is one).

I'm speculating, but one reason could be to avoid making extra copies of bytes that might contain fragments of sensitive data previously stored at the same addresses.

Beyond the "don't leak sensitive data", you also have sanitizers/valgrind which may notice that said padding was never initialized and trigger a read-from-uninit diagnostic.