Pre-RFC - Add alignment niches for references

moulins · November 30, 2021, 7:19pm

Summary

Add new "alignment niches" to references (&'a T, &'a mut T), Vec<T>, and all smart pointer types, allowing better packing for enums containing these types.

Motivation

Consider the following enum:

enum E<'a> {
    A(&'a u16),
    B,
    C,
}

Currently, mem::size_of::<E>() == 16:

8 bytes for the &'a u16;
1 byte for the discriminant;
7 bytes of padding.

However, this is suboptimal; because &'a u16 must be well-aligned (in particular, 2-aligned), there are unused bit patterns that can be exploited to pack the enum into a single usize:

E::A(&'a u16) stores the reference directly;
E::B is stored as the bit pattern 0b0;
E::C is stored as the bit pattern 0b1.

This RFC aims to introduces new niches to references types and smart pointer types, so that such enum optimizations become possible.

Guide-level explanation

"Pointer-like" containers (&T, &mut T, etc) gain new niches, corresponding to the scalar values strictly below align(T) and strictly above align(T).wrapping_neg(). For 1-aligned types, this is equivalent to the existing null niche.

The full list of affected types is: &T, &mut T, Box<T>, Arc<T>, Rc<T>, Vec<T>.

This has the consequence that enums of the form:

enum E {
  A(&T or &mut T or ...),
  B,
  C,
  // ...
}

where the total number of variants is less than or equal to 2*align(T) may be optimized by the compiler to fit in the space of a single reference.

Stability guarantees

Unlike null pointer optimization for Option<&T>-like types, this optimization isn't guaranteed by the Rust language.

Reference-level explanation

`&T` and `&mut T`

References must always be valid, non-null and well-aligned. This means that a reference's scalar value must lie in align(T)..=align(T).wrapping_neg(); values outside this range can be used as niches by the compiler.

`Unique<T>`

To enable this optimization on Box<T> and Vec<T>, the validity invariant of the std-internal type std::ptr::Unique<T> is modified.

Introduce two special values align and align_high for the attributes #[rustc_layout_scalar_valid_range_start/end] (bikesheddable syntax). These values are only applicable to types with a single generic parameter T.

align stands for mem::align_of::<T>();
align_high stands for mem::align_of::<T>().wrapping_neg().

Then, tighten the invariant on std::ptr::Unique<T> to require that the wrapped pointer is correctly aligned, and add the following attributes, enabling the alignment niches:

#[rustc_layout_scalar_valid_range_start(align)]
#[rustc_layout_scalar_valid_range_end(align_high)]

`Aligned<T>`

To handle other smart pointers (Arc<T>, Rc<T>), a new std-internal type std::ptr::Aligned<T> is introduced.

This type behaves mostly like NonNull<T>, but also requires that the wrapped pointer is well-aligned, enabling the use of the attributes presented in the previous section.

std implementors can then replace their use of NonNull<T> with this type to enable alignment niches.

Note: std::{rc, sync}::Weak<T> already uses a non-aligned pointer as a sentinel, and so can't benefit from this optimization.

Drawbacks

None known.

Rationale and alternatives

An alternative would be to fully exploit the unused bit patterns in well-aligned references, and say that all unaligned bit patterns can be used as a niche: e.g. this would mean that &'a u16 has a niche for 0, and for every odd integer value.

While this massively increases the number of niches, and thus the maximum number of enum variants that can be optimized, this would require adding support for non-contiguous scalar value ranges to the compiler, a difficult task.

In comparison, this RFC builds upon existing support for contiguous scalar value ranges in the compiler, and requires only minimal changes to the compiler logic.

Prior art

None known.

Unresolved questions

Should the Aligned<T> type be part of stdlib's public API? This would allow library authors (like hashbrown) to also exploit these niches.
"size niches" could also be introduced: e.g. &[u8; 100] has no alignment niches, but has size niches above 100.wrapping_neg() (thanks scottmcm for the idea!)
- This somewhat complicates the validity invariant of Aligned<T>, is it acceptable?

Future possibilities

Combined with smarter enum packing optimizations, this RFC (in the "all unaligned values are niches" form) could allow for automatic tagged-pointer-like enum layouts in some restricted cases, e.g.

enum E<'a> {
  A(&'a u32),
  B(u16),
  C(u16),
  D(u16),
}

Could have the following layout on 32-bit targets:

     [             &'a u32               ]
E::A: ******** ******** ******** ******00
     [       u16       ]               ^^
E::B: ******** ******** < pad. > 00000001
     [       u16       ]               ^^
E::C: ******** ******** < pad. > 00000010
     [       u16       ]               ^^
E::D: ******** ******** < pad. > 00000011
                                       ^^
                                 discriminant

EDIT 1 (2021-30-11): Added high alignment niches above align(T).wrapping_neg().

EDIT 2 (2021-03-12): Added the Aligned<T> type, enabling the optimizations for more types. Clarify that these niches aren't guaranteed by the Rust language.

Thoughts?

CAD97 · November 30, 2021, 7:47pm

Another, very similar note: we need to be very sure that niche optimization for Option-like enums chooses the 0 niche, to maintain the specification guaranteed null pointer optimization.

Adding more niches up to alignment and just assuming the lowest one is always used first isn't necessarily enough to keep this a guarantee.

scottmcm · November 30, 2021, 8:29pm

If this is going to make niches dependent on alignment, then might as well add the high impossible values too. align.wrapping_neg() and up are also impossible because the one-past-the-end-of-the-allocated-object pointer would wrap the address space.

(This would, of course, make CAD's point about ensuring that the guaranteed option optimizations continue to work all the more important.)

moulins · November 30, 2021, 9:10pm

IIRC, null-pointer optimization is only guaranteed for Option<NonNull<T>>-like types, right?

For these types, there is always a single field, so -- as long as we pick niches starting from 0 -- the proposed scheme should have null-pointer optimization as a special case.

moulins · November 30, 2021, 10:05pm

align.wrapping_neg() is a valid reference though, because it could point to a type of size 0. (e.g. let a: &() = &*(-1 as usize as *const ()) is sound)

However, you're right about values above that; I'll add it to the document.

scottmcm · November 30, 2021, 10:53pm

Oh, right, ZSTs

Hmm, I guess this could also add size niches to references. &[u8; 100] has no alignment niches (other than the null), but it has 100 niches available in 100.wrapping_neg()..=usize::MAX.

(That might make the risk of cycles in layout computation that much harder, though. niche optimizations only affect size, not alignment, so adding alignment niches might be fundamentally easier than size ones.)

josh · November 30, 2021, 11:13pm

github.com/rust-lang/rust

Enum should prefer discriminant zero for niche

rust-lang:master ← bonega:enum_niche_prefer_zero

opened 02:18PM - 05 Aug 21 UTC

bonega

+82 -9

Given an enum with unassigned zero-discriminant, rust should prefer it for niche… selection. Zero as discriminant for `Option<Enum>` makes it possible for LLVM to optimize resulting asm. - Eliminate branch when expected value coincides. - Use smaller instruction `test eax, eax` instead of `cmp eax, ?` - Possible interaction with zeroed memory? Example: ```rust pub enum Size { One = 1, Two = 2, Three = 3, } pub fn handle(x: Option<Size>) -> u8 { match x { None => {0} Some(size) => {size as u8} } } ``` In this case discriminant zero is available as a niche. Above example on nightly: ```asm mov eax, edi cmp al, 4 jne .LBB0_2 xor eax, eax .LBB0_2: ret ``` PR: ```asm mov eax, edi ret ``` I created this PR because I had a performance regression when I tried to use an enum to represent legal grapheme byte-length for utf8. Using an enum instead of `NonZeroU8` [here](https://github.com/bonega/yore/blob/d683304f5dfe2e99f769e6ab8adf8d60a0d1d9b3/src/internal/decoder_incomplete.rs#L90) resulted in a performance regression of about 5%. I consider this to be a somewhat realistic benchmark. Thanks to @ogoffart for pointing me in the right direction! Edit: Updated description

felix.s · December 1, 2021, 1:37pm

Wait, there is a specification?

bestouff · December 1, 2021, 3:32pm

This is not true of all CPU architectures.

moulins · December 1, 2021, 4:05pm

That was simply an example; the proposed niche mechanism will use the correct alignment for all platforms (which is the one returned by std::mem::align_of::<T>())

bestouff · December 1, 2021, 4:59pm

Good - and I mixed up CPU architectures and ABIs. Of course there are more constraints in ABIs, so more reasons to use your trick.

chrefr · December 2, 2021, 3:19pm

I don't think we should guarantee that, especially since alignment for repr(Rust) types is not guaranteed and these niche rules are pretty complex. Null pointer optimization is only good to guarantee since it's useful for FFI.

That being said, I think that's a nice suggestion.

moulins · December 3, 2021, 5:02pm

Added the Aligned<T> type, enabling the optimizations for more types.

Also added a section on stability guarantees, mentioning that these niches aren't guaranteed by the Rust language.

moulins · December 6, 2021, 9:06pm

Opened an official RFC: RFC: Alignment niches for references types. by moulins · Pull Request #3204 · rust-lang/rfcs · GitHub

H2CO3 · December 8, 2021, 8:58am

I like the idea of guaranteeing such niches, it would also make tagged pointers more reliable if someone were to need them. So overall, I support this proposal.

The only thing I strongly disagree with is special casing even more library types. Presumably Rc, Arc and Vec will always have to store a pointer to the allocation somewhere, so instead of baking these types into the language, why not generalize this mechanism in a way that it works across recusively contained types?

Also, I'm not sure if it is implied by &T/&mut T, but it seems to me that slices should also be eligible for this.

carbotaniuman · December 8, 2021, 12:31pm

Personally, I think that baking in optimizations for common types that most users of Rust can benefit from shouldn't be blocked by having to commit to guarantees about it occuring for every type. There should be the implementation freedom to experiment I think, without having to bake it as a language feature, adding more complexity.

CAD97 · December 8, 2021, 1:17pm

Nothing in this proposal makes them any more special cased than they already are. We actually only make the null pointer optimization guarantee for &_, &mut _, ptr::NonNull<_>, and Box<_>. That it happens for other smart pointers is derived from what they're built out of. (And how Rc, Arc, and Vec are represented are library implementation details^[1].)

This RFC recommends three things, and three things only:

Exploit the already existing validity requirement of references to be well-aligned to increase the number of niches they provide from one (null) to the non-consecutive and max-consecutive insufficiently aligned values;
Add a ptr::Aligned<_> type which adds the alignment validity invariant to ptr::NonNull<_>, enabling the above mentioned niches, but without requiring reference validity; and
Use ptr::Aligned<_> to implement std smart pointer types where appropriate, so they also can communicate the alignment requirement to the compiler such that it can be taken advantage of by layout niching.

Nothing about that makes any existing library type more special. This niche optimization is not guaranteed for any type, and is just the compiler happening to optimize your data layout better now.

^[1] to be fully correct, Vec<_, GlobalAlloc> is guaranteed to be 3×usize as a (ptr, len, cap) triple, but the order isn't guaranteed. The pointer is guaranteed nonnull, but the niche optimization isn't guaranteed, it's just provided by a fact of how it's implemented.

chrefr · December 9, 2021, 10:18pm

Tagged pointers are used for optimization. For thay, you can rely on these niches as they are going to change in the foreseeable future. But guaranteeing them will make them impossible to change, even if, for example, we will find something more performant. Thus, I think it is only good to guarantee if people rely on it for correctness, like the non-null optimization.

sollyucko · December 10, 2021, 11:44am

It could be controlled or guaranteed by an attribute, e.g. #[repr(bitpacked)].

system · March 10, 2022, 11:44am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Could we make the pointer niche bigger?	23	2358	July 10, 2022
Optimized representations ideas (deprecated)	5	1934	March 25, 2019
Optimizing layout of nested enums? compiler	19	12249	March 25, 2019
[Pre RFC] Prefer all-zeros niche for Niche Variant Optimization Unsafe Code Guidelines	11	1509	December 22, 2024
Niche optimization makes use of niches, but padding bytes are neglected compiler	15	517	October 22, 2024