Missed layout optimization

struct Foo {
    a: u16,
    b: u8,
}

struct Bar {
    a: u16,
    b: u8,
    c: bool,
}

Option<Foo> is 6 bytes while Option<Bar> is only 4 bytes.

Option could store its discriminant in the spare byte of Foo. Is there anything preventing this optimization?
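For reference, the sizes in question are easy to check with a quick sketch (numbers as reported here for current rustc; repr(Rust) layout is not guaranteed):

use std::mem::size_of;

fn main() {
    assert_eq!(size_of::<Foo>(), 4);          // u16 + u8 + 1 padding byte
    assert_eq!(size_of::<Bar>(), 4);          // u16 + u8 + bool, no padding
    assert_eq!(size_of::<Option<Foo>>(), 6);  // separate discriminant + padding
    assert_eq!(size_of::<Option<Bar>>(), 4);  // discriminant stored in bool's niche
}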

2 Likes

Padding bytes cannot contain niches, because:

Suppose you have an &mut Foo pointing at the value of an Option<Foo>. It's allowed to assign to this Foo by copying mem::size_of::<Foo>() bytes (4 bytes), which would then overwrite the discriminant if it were stored there.
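A minimal sketch of that scenario, using the Foo from the first post (helper names made up):

fn overwrite(dst: &mut Foo, src: Foo) {
    // A plain assignment may copy all size_of::<Foo>() bytes, padding included.
    *dst = src;
}

fn demo(opt: &mut Option<Foo>) {
    if let Some(foo) = opt.as_mut() {
        // If the None discriminant lived in Foo's padding byte, this write
        // could clobber it and silently change which variant `opt` holds.
        overwrite(foo, Foo { a: 1, b: 2 });
    }
}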

In general, only byte values that cannot be a part of a valid value of Foo can be used as a niche in Foo. Padding bytes are allowed to be anything, so they cannot make the value invalid, so the Option can't rely on the Foo user not writing the None-niche-value to them.

Bar gets optimized because the c byte has two valid values (0 and 1) and 254 invalid ones, so Option gets to pick any of the invalid values for its own use and rely on it never being overwritten: writing a valid bool can only ever put 0 or 1 there.
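The same mechanism in its simplest form: bool itself has 254 invalid bit patterns, so Option<bool> needs no extra byte at all (on current rustc):

assert_eq!(std::mem::size_of::<Option<bool>>(), 1); // None uses one of bool's invalid values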

17 Likes

OK yes, thank you, this makes it impossible.

I'll additionally note that the optimization would be possible if size didn't have to be a multiple of alignment, in which case the size of Foo would be 3 and Option<Foo> would be 4. It's what my intuition says should happen here, but the reference already says that size is always a multiple of alignment, so I guess that's a no-go.

I guess one solution for this could be an attribute #[repr(zeroed_padding)]; then None could use a non-zero value in the padding position.

3 Likes

If you want, you can make zeroed "padding" yourself:

#[repr(u8)]
enum Zp { Zp }

struct Foo {
    a: u16,
    b: u8,
    _zp: Zp,
}

[Edited to correct repr as discussed below.]

6 Likes

This still makes Option<Foo> 6 bytes; you need to add another variant to make it 4 bytes.

Interesting. Maybe something like it could be added to the standard library:

// core::mem

#[repr(u8)]
pub enum Zeroed {
    Zeroed = 0
}

Or maybe, if we want to hide the implementation details:

// core::mem

#[repr(u8)]
enum ZeroedInner {
    Zero = 0
}

pub struct Zeroed {
    inner: ZeroedInner
}

pub const Zeroed: Zeroed = Zeroed { inner: ZeroedInner::Zero };
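
Either way, a sketch of how it might be used for the case in this thread (Zeroed here is the hypothetical type proposed above, not anything in core today):

struct Foo {
    a: u16,
    b: u8,
    // Hypothetical: fills the would-be padding byte with a type that has
    // invalid values, so layout can use it instead of leaving padding.
    _pad: core::mem::Zeroed,
}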

3 Likes

That seems like a missed optimization opportunity / bug, though.

Nevermind, see below

You need to add #[repr(u8)] to the enum, otherwise it's a ZST and naturally has no niches.

4 Likes

Fixed in my post (so there's a clean full example).

Tempting, since as I just demonstrated it's easy to get wrong. But I think that it'd be nice to wait till it can be more general:

#[repr(u8)]
enum Zero { Z }

#[repr(C)]
pub struct Zeroed<T> {
    alignment: [T; 0],
    zeroes: [Zero; core::mem::size_of::<T>()],
}

Of course, this isn't valid code yet ("generic parameters may not be used in const operations"), but I think it'd be useful to be able to write Zeroed<u16> instead of [Zero; 2]. Though, not for the case brought up in this thread, since to handle that you need to calculate the full struct layout “on paper” anyway.

1 Like

Doesn't size have to be a multiple of alignment to make arrays work?

You can define the "stride" to be the size increased until it is a multiple of the alignment, and use that number for array layout. (However, that would imply that either arrays don't have a size of element stride × length, or [T; 1] is bigger than T.)
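Spelled out as code (my own sketch of that rounding rule):

/// Round `size` up to the next multiple of `align` (alignments are powers of two).
fn stride_of(size: usize, align: usize) -> usize {
    (size + align - 1) & !(align - 1)
}

// Today size_of already returns this rounded value; with size != stride,
// a hypothetical Foo could have size 3 and stride 4.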

I hear that Swift has stride distinct from size, but I don't know how they handle it.

(Another place the stride concept comes up is: if you can pick the stride dynamically (rather than statically from the element type) then you can go from &[MyStruct] to a slice of any of its fields, by keeping the stride the same and applying the appropriate offset — sort of a whole-slice version of AsRef.)
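Today's Rust can't hand out an actual &[u16] with a larger stride, but the stride-plus-offset walk itself can be sketched (names made up; the stride is just size_of here):

use core::mem::size_of;

struct MyStruct { x: u32, y: u16 }

/// Read the `y` field of every element by stepping with an explicit stride.
fn collect_ys(items: &[MyStruct]) -> Vec<u16> {
    let stride = size_of::<MyStruct>();
    let offset = core::mem::offset_of!(MyStruct, y);
    let base = items.as_ptr() as *const u8;
    (0..items.len())
        .map(|i| unsafe { *(base.add(i * stride + offset) as *const u16) })
        .collect()
}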

3 Likes

Some previous discussion here, including notes from Swift:

Obligatory link to https://github.com/rust-lang/lang-team/blob/master/src/frequently-requested-changes.md#size--stride

5 Likes

This says:

Rust makes several guarantees that make supporting size != stride difficult in the general case. The combination of std::array::from_ref and array indexing is a stable guarantee that a pointer (or reference) to a type is convertible to a pointer to a 1-array of that type, and vice versa.
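That combination in code (just restating the stable guarantee):

let foo = Foo { a: 1, b: 2 };
let as_array: &[Foo; 1] = std::array::from_ref(&foo);
// Indexing the 1-array must yield a valid &Foo again, so a single Foo and a
// [Foo; 1] have to be layout-compatible.
let _back: &Foo = &as_array[0];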

What the page doesn't mention is that this problem could be solved by having arrays have padding (stride - size) between elements rather than after elements (so there is no padding after the last array element).

So the only remaining objection is that existing code assumes stride = size.

1 Like

Technically you are right. But it seems the compiler does not facilitate this in some cases.

It's still copying Foo by moving a word then a byte instead of a dword.

Edit: Just after posting this, I noticed Foo is passed in 2 registers while Bar is passed in just 1 register, weird.
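For anyone who wants to reproduce this, a minimal pair to look at with --emit asm or a compiler explorer (my own sketch):

#[derive(Clone, Copy)]
pub struct Foo { pub a: u16, pub b: u8 }

#[no_mangle]
pub fn copy_foo(dst: &mut Foo, src: &Foo) {
    // Observed above: lowered as a 16-bit move plus an 8-bit move,
    // not a single 32-bit move that would also copy the padding byte.
    *dst = *src;
}

#[no_mangle]
pub fn take_foo(foo: Foo) -> u16 {
    // The edit above notes Foo arriving in two registers here.
    foo.a + foo.b as u16
}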

The compiler still tries its hardest to not touch padding (you can see this in C++ codegen as well), but that doesn't mean unsafe code isn't allowed to blindly copy padding.

1 Like

I figured this would be the case. But why though? What do we gain from this?

Yes, I'm aware of this. I'm just pointing out a case of missed optimization (if it is one).

I'm speculating, but one reason could be to avoid making extra copies of bytes that might contain fragments of sensitive data previously stored at the same addresses.

Beyond the "don't leak sensitive data", you also have sanitizers/valgrind which may notice that said padding was never initialized and trigger a read-from-uninit diagnostic.