I've been testing with alignments today and I think I found two optimizations that could be done in regards to usage of padding bytes/unused bits. If I'm missing something here please tell me.
As a reminder: Option<T>
(and enums in general) make use of empty/free bits in the types they hold, to avoid the need for an extra byte just to store which variant it is (size_of::<Option<bool>>() == 1
).
Usage of padding bytes (due to alignment requirements)
As shown in the example below it does not make use of free bits in the padding bytes needed to satisfy alignment, which makes Option<(u64, u8, bool)>
smaller than Option<(u64, u8)>
. I think/assume this is because a Copy
can/is allowed to skip those padding bytes, which would mean the information is lost. Unless I'm mistaken Rust could make use of these padding bytes by not skipping (some of) those bytes, which - due to inlining - likely results in a smaller code size a smaller memory footprint and the same performance.
See godbolt:
Adding an extra dummy: bool
(C
) makes it less efficient to copy C
but more efficient to copy MyEnum<C>
(or Option<C>
for that matter). I do understand why it is this way, but since these Copy
function are/should be almost always inlined the <Option<C>>::Copy
function doesn't have to call C::Copy
. And (at least in this simple case) using one of the alignment padding bytes works (I may be missing other places where its assumed that padding bytes can be dropped, but all of them should be working on Option<C>
).
As far as I can tell the compiler/optimizer could automatically insert dummy
if (and only if) it is handling Option<T>
, using different functions for copying the bytes.
Usage of free bits in the outer type
As said in the introduction, enums make use of unused bits in types like bool
to store which variant it is. But this check only runs in one direction (understandably, the other direction is more difficult to implement). But in principle Rust could make use of unused bits (from other enums or bool
) in the structure the enum is stored in as well. Or to be more precise: When looking at the type that contains an option that didn't manage to fit the discriminant in somewhere Rust could look in this containing type:
In the output below Option<Container<Large>>
wastes 1000 bytes because of not using the padding bytes, while it could reduce that to wasting about 500 bytes by recognizing that Option and Container don't have the 512 alignment restriction. The fix for this is of course to add a dummy bool in Container
, which provides 7 free bits for Option
to use. Something the Compiler could do by looking at Option<Container<T>>
as one data structure (assuming that most of the functions (especially small ones) will be inlined anyways and thus don't result in a significant (or any) code size overhead.
The second case is more extreme: When swapping Option and Container Container<Option<Large>>
the Option
doesn't know about the free padding bytes in Container
. Even with ContainerWithDummy<Option<Large>>
the free bits in Container
are unused because Rust looks at Option<Large>
in isolation and (see my first point) doesn't combine the structs to make use of the padding. Thus resulting in a size of 3x512 instead of 2x512.
The only way I've found to avoid this is to write a a custom Option
that combines the two: CustomOption<Large>
, which is really painful from a usability perspective.
Finally I'd like to mention why this is even relevant to think about. Such large alignments are rare, so I'm not sure if doing this would really be worth the implementation and maintenance effort. On the other hand this situation does happen quite often:
- The enums below showcase that a decent amount of bytes can get wasted with some common situations like storing an Option inside an enum variant, worse if the variant contains some data and an option (
MyEnumB
). - Another situation where this becomes relevant is in a struct with multiple optional fields:
This struct is 32 bytes large and would fit into 20 bytes (16 bytes for the u32 + one byte for the option bits + 3 bytes alignment padding).struct Test { a: Option<u32>, b: Option<u32>, c: Option<u32>, d: Option<u32>, }
All of this of course becomes less relevant the smaller the alignment requirement of the inner type is.
use std::mem::{size_of, align_of};
use std::any::type_name;
#[repr(align(512))]
pub struct Large(pub [u8; 512]);
pub struct Container<T> {
pub value: T,
pub extra: u64,
}
pub struct ContainerWithDummy<T> {
pub value: T,
pub extra: u64,
pub dummy: bool,
}
pub enum MyEnumA {
A(Option<Large>),
B,
}
pub enum MyEnumB {
A(Option<Large>, bool),
B,
}
pub enum MyEnumC {
A(Option<(Large, bool)>),
B,
}
pub enum CustomOption<T> {
Some{
value: T,
extra: u64,
dummy: bool,
},
None{
extra: u64,
dummy: bool,
},
}
pub fn main() {
println!("Options");
print::<Option<(u64, u8)>>();
print::<Option<(u64, bool)>>();
print::<Option<(u64, u8, bool)>>();
println!("Types in use:");
print::<Large>();
print::<Option<Large>>();
print::<Container<Large>>();
println!("Extra and dummy only exist when Some");
print::<Option<Container<Large>>>();
print::<Option<ContainerWithDummy<Large>>>();
println!("Extra and dummy always exist");
print::<Container<Option<Large>>>();
print::<ContainerWithDummy<Option<Large>>>();
print::<CustomOption<Large>>();
println!();
println!("Enum Usecase (these differ in behavior)");
print::<MyEnumA>();
print::<MyEnumB>();
print::<MyEnumC>();
}
fn print<T>() {
println!(
" {}: {}/{}",
type_name::<T>(),
size_of::<T>(),
align_of::<T>(),
);
}
Output:
Options
core::option::Option<(u64, u8)>: 24/8
core::option::Option<(u64, bool)>: 16/8
core::option::Option<(u64, u8, bool)>: 16/8
Types in use:
example::Large: 512/512
core::option::Option<example::Large>: 1024/512
example::Container<example::Large>: 1024/512
Extra and dummy only exist when Some
core::option::Option<example::Container<example::Large>>: 1536/512
core::option::Option<example::ContainerWithDummy<example::Large>>: 1024/512
Extra and dummy always exist
example::Container<core::option::Option<example::Large>>: 1536/512
example::ContainerWithDummy<core::option::Option<example::Large>>: 1536/512
example::CustomOption<example::Large>: 1024/512
Enum Usecase (these differ in behavior)
example::MyEnumA: 1024/512
example::MyEnumB: 1536/512
example::MyEnumC: 1024/512
Final words
I'm not sure if it makes sense to have these optimizations (especially in terms of compiler complexity), but I am interested in what you think about this (or if I'm missing something here besides the difficulty to implement this).