(I don’t know if this is the right category for a post about optimization of compiled code, rather than of the compiler itself. Let me know.)
## Why this matters
Reading @camlorn’s Optimizing Rust Struct Size post on struct field reordering made me think of the memory layout of enums in particular. Because of alignment constraints, we often end up reserving 4 or 8 bytes for an enum discriminant even though the vast majority of enum types have far fewer than 256 variants (so one byte would suffice). This can be very wasteful, in particular with nested enums. As an extreme example, `Option<Option<Option<Option<*const ()>>>>` is 40 bytes on 64-bit systems.
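A quick way to see this (a minimal sketch; the numbers are for a 64-bit target with the compiler behavior discussed here, and may be smaller on newer compilers that reuse spare discriminant values):

```rust
use std::mem::size_of;

fn main() {
    // *const () may be null, so Option cannot use a null-pointer niche and
    // must add a separate discriminant, padded up to pointer alignment.
    println!("{}", size_of::<*const ()>());          // 8
    println!("{}", size_of::<Option<*const ()>>());  // 16
    // Each level of nesting adds another aligned discriminant:
    println!("{}", size_of::<Option<Option<Option<Option<*const ()>>>>>()); // 40
}
```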
The example above is exaggerated, but this problem happens in practice. This is a real type, typical of Servo’s `style` crate:
```rust
pub enum LengthOrKeyword {
    Length(pub enum Length {
        NoCalc(pub enum NoCalcLength {
            Absolute(pub enum AbsoluteLength {
                Px(f32),
                In(f32),
                Cm(f32),
                Mm(f32),
                Q(f32),
                Pt(f32),
                Pc(f32),
            }),
            FontRelative(pub enum FontRelativeLength {
                Em(f32),
                Ex(f32),
                Ch(f32),
                Rem(f32),
            }),
            ViewportPercentage(pub enum ViewportPercentageLength {
                Vw(f32),
                Vh(f32),
                Vmin(f32),
                Vmax(f32),
            }),
            ServoCharacterWidth(i32),
        }),
        Calc(
            Box<CalcLengthOrPercentage>,
            pub enum AllowedLengthType {
                All,
                NonNegative,
            }
        ),
    }),
    Keyword(pub enum SizeKeyword {
        ClosestSide,
        FarthestSide,
        ClosestCorner,
        FarthestCorner,
    }),
}
```
(Pseudo-syntax with the definitions of nested enums inlined.)
On x86_64, `size_of::<LengthOrKeyword>()` is 24 bytes. If alignment were not a constraint (and if I’m counting correctly) it would be 11 bytes: 13 of the 24 bytes, or 54%, are wasted space.
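For anyone who wants to reproduce the number, here is a compilable version of the pseudo-syntax above, with `CalcLengthOrPercentage` stubbed out (the real struct is larger, but it sits behind a `Box` so its size does not affect `LengthOrKeyword`):

```rust
#![allow(dead_code)]
use std::mem::size_of;

enum AbsoluteLength { Px(f32), In(f32), Cm(f32), Mm(f32), Q(f32), Pt(f32), Pc(f32) }
enum FontRelativeLength { Em(f32), Ex(f32), Ch(f32), Rem(f32) }
enum ViewportPercentageLength { Vw(f32), Vh(f32), Vmin(f32), Vmax(f32) }
enum NoCalcLength {
    Absolute(AbsoluteLength),
    FontRelative(FontRelativeLength),
    ViewportPercentage(ViewportPercentageLength),
    ServoCharacterWidth(i32),
}
struct CalcLengthOrPercentage; // stand-in: boxed, so only the pointer counts
enum AllowedLengthType { All, NonNegative }
enum Length {
    NoCalc(NoCalcLength),
    Calc(Box<CalcLengthOrPercentage>, AllowedLengthType),
}
enum SizeKeyword { ClosestSide, FarthestSide, ClosestCorner, FarthestCorner }
enum LengthOrKeyword {
    Length(Length),
    Keyword(SizeKeyword),
}

fn main() {
    // 24 on x86_64 with the compiler discussed here. The 11-byte ideal is
    // the Calc path: 1 + 1 + 8 + 1 (three one-byte discriminants plus the
    // Box pointer), with no alignment padding.
    println!("{}", size_of::<LengthOrKeyword>());
}
```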
24 bytes by themselves may not seem like a lot, but these things add up. For example, the `border-radius` property defines 8 lengths (horizontal and vertical ellipse arc radii in each of 4 corners). And then there is `PropertyDeclaration`, an enum with hundreds of (generated) variants, one for every CSS property. It was several hundred bytes in size until we started boxing variants over a certain threshold. Boxing does not reduce total memory usage (and even costs time to allocate), but it does reduce the amount of `memmove` traffic that shows up in profiles.
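A contrived sketch of that boxing strategy (the names here are made up; Servo’s real variants are generated):

```rust
// Hypothetical large payload; imagine a variant holding many lengths.
struct BorderRadiusData([f32; 8]);

enum PropertyDeclarationSketch {
    Opacity(f32),                        // small payload, stored inline
    BorderRadius(Box<BorderRadiusData>), // large payload, boxed: the enum
                                         // stays at pointer size plus a tag
}
```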
We sometimes “flatten” enums (`LengthOrPercentageOrAutoOrContent` is another real type), but this is not practical to do pervasively.
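For illustration, flattening replaces a nested discriminant with extra variants in a single enum (payloads simplified here):

```rust
// Nested: two discriminants.
enum LengthOrPercentage { Length(f32), Percentage(f32) }
enum NestedAuto { LengthOrPercentage(LengthOrPercentage), Auto }

// Flattened: one discriminant covers all the cases.
enum LengthOrPercentageOrAuto {
    Length(f32),
    Percentage(f32),
    Auto,
}
```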
@Gankra is currently experimenting with replacing WebRender’s serialization for inter-process communication, switching from transmuting to `&[u8]` to bincode. Initial results show half of the space used (and therefore half the time spent sending these bytes). Padding bytes inserted for alignment (present in memory but not in bincode’s output) are presumably a large part of the difference.
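To make the padding point concrete, here is a minimal sketch with a hypothetical item type (WebRender’s real display items are more complex), using serde and the bincode 1.x API:

```rust
use serde::{Deserialize, Serialize};
use std::mem::size_of;

#[derive(Serialize, Deserialize)]
struct Item {
    kind: u8,
    x: f32,
    y: f32,
}

fn main() {
    let item = Item { kind: 3, x: 1.0, y: 2.0 };
    // In memory: 12 bytes (the u8 is padded up to f32 alignment).
    assert_eq!(size_of::<Item>(), 12);
    // On the wire: bincode writes the fields back to back, 1 + 4 + 4 = 9 bytes.
    let bytes = bincode::serialize(&item).unwrap();
    assert_eq!(bytes.len(), 9);
}
```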
## What to do
I’ve had some optimization ideas, and RFC issue #1230 (More Exotic Enum Layout Optimizations) has some more, but some of them are made impossible or difficult by two constraints:
- Matching can take a reference to any field of an enum’s variant, so each field needs to have the same layout (and size, and alignment) as a value of the same type created independently.
- Such a reference can be mutable, and assigned to. Assignment to `*(foo: &mut T)` presumably overwrites `size_of::<T>()` bytes, including any “unused” padding bytes, so these bytes can’t (easily) be used by a type that contains `T`. If we changed assignment to write a minimal byte range, there’s still `ptr::write`. And if we changed that too, there’s a risk that these optimizations break assumptions made by unsafe code. (Or conversely, writing unsafe code would require more care about subtle constraints.) See the sketch after this list.
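A minimal sketch of how these constraints bite: safe code can reach into a variant and overwrite a whole field through a reference, padding included, so the field’s bytes must belong entirely to it:

```rust
struct Padded {
    a: u8,  // followed by 3 padding bytes
    b: u32,
}

enum Holder {
    Value(Padded),
    Nothing,
}

fn clobber(h: &mut Holder) {
    if let Holder::Value(ref mut v) = *h {
        // Assignment through the reference may copy all size_of::<Padded>()
        // bytes, padding included, so Holder cannot stash its own
        // discriminant inside Padded's padding bytes.
        *v = Padded { a: 1, b: 2 };
    }
}
```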
The idea from issue #1230 that sounds most promising to me is to try to pick discriminant values that do not overlap with the discriminant values of inner enums, and share their storage. For example:
```rust
enum Foo {
    A(u32),
    B(Bar),
    C,
}
```
If `Bar` is an enum whose discriminant can be 0 to 4, the compiler could pick 5 for `Foo::A`, 6 for `Foo::C`, and have 0 to 4 both indicate `Foo::B` and serve as `Bar`’s discriminant.
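A hand-rolled sketch of what that layout would look like (hypothetical; today’s compiler does not do this, and the real optimization would be invisible to the programmer):

```rust
#[derive(Clone, Copy)]
enum Bar { B0, B1, B2, B3, B4 } // discriminants 0 to 4

enum Foo { A(u32), B(Bar), C }

// One shared tag byte plus the payload, instead of two discriminants:
//   tag 0..=4 => Foo::B, and the byte doubles as Bar's discriminant
//   tag 5     => Foo::A
//   tag 6     => Foo::C
struct PackedFoo {
    tag: u8,
    payload: u32,
}

fn pack(foo: Foo) -> PackedFoo {
    match foo {
        Foo::B(bar) => PackedFoo { tag: bar as u8, payload: 0 },
        Foo::A(x) => PackedFoo { tag: 5, payload: x },
        Foo::C => PackedFoo { tag: 6, payload: 0 },
    }
}
```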
Matching `Foo::B` would require a range comparison rather than simple equality. I don’t know whether the impact of that on runtime performance is significant.
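Decoding the sketch above shows the shape of that check:

```rust
fn unpack(p: &PackedFoo) -> Foo {
    if p.tag <= 4 {
        // The shared byte is also Bar's discriminant.
        let bar = match p.tag {
            0 => Bar::B0,
            1 => Bar::B1,
            2 => Bar::B2,
            3 => Bar::B3,
            _ => Bar::B4,
        };
        Foo::B(bar)
    } else if p.tag == 5 {
        Foo::A(p.payload)
    } else {
        Foo::C
    }
}
```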
Even on layout alone, this would help but it’s not perfect. It requires the inner enum to be the first field of its variant in memory, so there may be a tradeoff between sharing discriminant space and reordering fields to reduce padding. (Or maybe not? I can’t find an example where putting the nested enum first is worse.) There could also be a tradeoff between promoting the discriminant to a wider type to support larger values and doing less discriminant-space sharing. (Also maybe not?) And there can be more than one variant with nested enums (like `NoCalcLength` above), whose discriminant ranges might overlap. Trying to make them not overlap would require a global analysis (looking at where a type is used before deciding its layout), which at best would be heuristic past the crate boundary.
What do you all think? Other ideas?