Pre-RFC - Add alignment niches for references

Summary

Add new "alignment niches" to references (&'a T, &'a mut T), Vec<T>, and all smart pointer types, allowing better packing for enums containing these types.

Motivation

Consider the following enum:

enum E<'a> {
    A(&'a u16),
    B,
    C,
}

Currently, mem::size_of::<E>() == 16:

  • 8 bytes for the &'a u16;
  • 1 byte for the discriminant;
  • 7 bytes of padding.

However, this is suboptimal; because &'a u16 must be well-aligned (in particular, 2-aligned), there are unused bit patterns that can be exploited to pack the enum into a single usize:

  • E::A(&'a u16) stores the reference directly;
  • E::B is stored as the bit pattern 0b0;
  • E::C is stored as the bit pattern 0b1.

This RFC aims to introduces new niches to references types and smart pointer types, so that such enum optimizations become possible.

Guide-level explanation

"Pointer-like" containers (&T, &mut T, etc) gain new niches, corresponding to the scalar values strictly below align(T) and strictly above align(T).wrapping_neg(). For 1-aligned types, this is equivalent to the existing null niche.

The full list of affected types is: &T, &mut T, Box<T>, Arc<T>, Rc<T>, Vec<T>.

This has the consequence that enums of the form:

enum E {
  A(&T or &mut T or ...),
  B,
  C,
  // ...
}

where the total number of variants is less than or equal to 2*align(T) may be optimized by the compiler to fit in the space of a single reference.

Stability guarantees

Unlike null pointer optimization for Option<&T>-like types, this optimization isn't guaranteed by the Rust language.

Reference-level explanation

&T and &mut T

References must always be valid, non-null and well-aligned. This means that a reference's scalar value must lie in align(T)..=align(T).wrapping_neg(); values outside this range can be used as niches by the compiler.

Unique<T>

To enable this optimization on Box<T> and Vec<T>, the validity invariant of the std-internal type std::ptr::Unique<T> is modified.

Introduce two special values align and align_high for the attributes #[rustc_layout_scalar_valid_range_start/end] (bikesheddable syntax). These values are only applicable to types with a single generic parameter T.

  • align stands for mem::align_of::<T>();
  • align_high stands for mem::align_of::<T>().wrapping_neg().

Then, tighten the invariant on std::ptr::Unique<T> to require that the wrapped pointer is correctly aligned, and add the following attributes, enabling the alignment niches:

#[rustc_layout_scalar_valid_range_start(align)]
#[rustc_layout_scalar_valid_range_end(align_high)]

Aligned<T>

To handle other smart pointers (Arc<T>, Rc<T>), a new std-internal type std::ptr::Aligned<T> is introduced.

This type behaves mostly like NonNull<T>, but also requires that the wrapped pointer is well-aligned, enabling the use of the attributes presented in the previous section.

std implementors can then replace their use of NonNull<T> with this type to enable alignment niches.

Note: std::{rc, sync}::Weak<T> already uses a non-aligned pointer as a sentinel, and so can't benefit from this optimization.

Drawbacks

None known.

Rationale and alternatives

An alternative would be to fully exploit the unused bit patterns in well-aligned references, and say that all unaligned bit patterns can be used as a niche: e.g. this would mean that &'a u16 has a niche for 0, and for every odd integer value.

While this massively increases the number of niches, and thus the maximum number of enum variants that can be optimized, this would require adding support for non-contiguous scalar value ranges to the compiler, a difficult task.

In comparison, this RFC builds upon existing support for contiguous scalar value ranges in the compiler, and requires only minimal changes to the compiler logic.

Prior art

None known.

Unresolved questions

  • Should the Aligned<T> type be part of stdlib's public API? This would allow library authors (like hashbrown) to also exploit these niches.
  • "size niches" could also be introduced: e.g. &[u8; 100] has no alignment niches, but has size niches above 100.wrapping_neg() (thanks scottmcm for the idea!)
    • This somewhat complicates the validity invariant of Aligned<T>, is it acceptable?

Future possibilities

Combined with smarter enum packing optimizations, this RFC (in the "all unaligned values are niches" form) could allow for automatic tagged-pointer-like enum layouts in some restricted cases, e.g.

enum E<'a> {
  A(&'a u32),
  B(u16),
  C(u16),
  D(u16),
}

Could have the following layout on 32-bit targets:

     [             &'a u32               ]
E::A: ******** ******** ******** ******00
     [       u16       ]               ^^
E::B: ******** ******** < pad. > 00000001
     [       u16       ]               ^^
E::C: ******** ******** < pad. > 00000010
     [       u16       ]               ^^
E::D: ******** ******** < pad. > 00000011
                                       ^^
                                 discriminant

EDIT 1 (2021-30-11): Added high alignment niches above align(T).wrapping_neg().

EDIT 2 (2021-03-12): Added the Aligned<T> type, enabling the optimizations for more types. Clarify that these niches aren't guaranteed by the Rust language.


Thoughts?

22 Likes

Another, very similar note: we need to be very sure that niche optimization for Option-like enums chooses the 0 niche, to maintain the specification guaranteed null pointer optimization.

Adding more niches up to alignment and just assuming the lowest one is always used first isn't necessarily enough to keep this a guarantee.

9 Likes

If this is going to make niches dependent on alignment, then might as well add the high impossible values too. align.wrapping_neg() and up are also impossible because the one-past-the-end-of-the-allocated-object pointer would wrap the address space.

(This would, of course, make CAD's point about ensuring that the guaranteed option optimizations continue to work all the more important.)

4 Likes

IIRC, null-pointer optimization is only guaranteed for Option<NonNull<T>>-like types, right?

For these types, there is always a single field, so -- as long as we pick niches starting from 0 -- the proposed scheme should have null-pointer optimization as a special case.

align.wrapping_neg() is a valid reference though, because it could point to a type of size 0. (e.g. let a: &() = &*(-1 as usize as *const ()) is sound)

However, you're right about values above that; I'll add it to the document. :slight_smile:

1 Like

Oh, right, ZSTs 🤦

Hmm, I guess this could also add size niches to references. &[u8; 100] has no alignment niches (other than the null), but it has 100 niches available in 100.wrapping_neg()..=usize::MAX.

(That might make the risk of cycles in layout computation that much harder, though. niche optimizations only affect size, not alignment, so adding alignment niches might be fundamentally easier than size ones.)

3 Likes
1 Like

Wait, there is a specification?

This is not true of all CPU architectures.

That was simply an example; the proposed niche mechanism will use the correct alignment for all platforms (which is the one returned by std::mem::align_of::<T>())

10 Likes

Good - and I mixed up CPU architectures and ABIs. Of course there are more constraints in ABIs, so more reasons to use your trick.

2 Likes

I don't think we should guarantee that, especially since alignment for repr(Rust) types is not guaranteed and these niche rules are pretty complex. Null pointer optimization is only good to guarantee since it's useful for FFI.

That being said, I think that's a nice suggestion.

2 Likes

Added the Aligned<T> type, enabling the optimizations for more types.

Also added a section on stability guarantees, mentioning that these niches aren't guaranteed by the Rust language.

Opened an official RFC: RFC: Alignment niches for references types. by moulins · Pull Request #3204 · rust-lang/rfcs · GitHub

I like the idea of guaranteeing such niches, it would also make tagged pointers more reliable if someone were to need them. So overall, I support this proposal.

The only thing I strongly disagree with is special casing even more library types. Presumably Rc, Arc and Vec will always have to store a pointer to the allocation somewhere, so instead of baking these types into the language, why not generalize this mechanism in a way that it works across recusively contained types?

Also, I'm not sure if it is implied by &T/&mut T, but it seems to me that slices should also be eligible for this.

1 Like

Personally, I think that baking in optimizations for common types that most users of Rust can benefit from shouldn't be blocked by having to commit to guarantees about it occuring for every type. There should be the implementation freedom to experiment I think, without having to bake it as a language feature, adding more complexity.

Nothing in this proposal makes them any more special cased than they already are. We actually only make the null pointer optimization guarantee for &_, &mut _, ptr::NonNull<_>, and Box<_>. That it happens for other smart pointers is derived from what they're built out of. (And how Rc, Arc, and Vec are represented are library implementation details[1].)

This RFC recommends three things, and three things only:

  • Exploit the already existing validity requirement of references to be well-aligned to increase the number of niches they provide from one (null) to the non-consecutive and max-consecutive insufficiently aligned values;
  • Add a ptr::Aligned<_> type which adds the alignment validity invariant to ptr::NonNull<_>, enabling the above mentioned niches, but without requiring reference validity; and
  • Use ptr::Aligned<_> to implement std smart pointer types where appropriate, so they also can communicate the alignment requirement to the compiler such that it can be taken advantage of by layout niching.

Nothing about that makes any existing library type more special. This niche optimization is not guaranteed for any type, and is just the compiler happening to optimize your data layout better now.

[1] to be fully correct, Vec<_, GlobalAlloc> is guaranteed to be 3×usize as a (ptr, len, cap) triple, but the order isn't guaranteed. The pointer is guaranteed nonnull, but the niche optimization isn't guaranteed, it's just provided by a fact of how it's implemented.

4 Likes

Tagged pointers are used for optimization. For thay, you can rely on these niches as they are going to change in the foreseeable future. But guaranteeing them will make them impossible to change, even if, for example, we will find something more performant. Thus, I think it is only good to guarantee if people rely on it for correctness, like the non-null optimization.

1 Like

It could be controlled or guaranteed by an attribute, e.g. #[repr(bitpacked)].