Pre-RFC: Allow array stride != size

(Posted for: "Separate size and stride for types", rust-lang/rfcs issue #1397.)

[Pre-RFC] Allow stride != size

Summary

Rust should allow values to be placed at the next aligned position after the previous value, ignoring the tail padding of that previous value. This requires changing the meaning of "size", so that a value's size in memory (for the purpose of reference semantics and layout) is not definitionally the same as the distance between consecutive values of that type (its "stride").

Motivation

Some other languages (C++ and Swift, in particular) can lay out values more compactly than Rust conventionally can. This gives them better performance with greater convenience, and makes Rust interoperability with them less than ideal.

Optimization opportunity

Consider the difference between (u16, u8, u8) and ((u16, u8), u8). The first can fit in 4 bytes, while the second requires 6. A (u16, u8) is a 4 byte value with 1 byte of tail padding. And a (T, u8) can't just stuff the u8 inside the tail padding for T! If, instead, we declared that (u16, u8) were a 3 byte value with alignment 2, then ((u16, u8), u8) could be 4 bytes instead of 6. This is not possible today.

(For backwards compatibility reasons described later, we can't literally do this for tuples, but only for user-defined types. But this gives the gist of the optimization opportunity this proposal supports.)
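
To make the numbers above concrete, here is a small sketch of what Rust reports today. It uses repr(C) structs as stand-ins for the tuples so that the layout is fully specified; the type and function names are made up for illustration.

```rust
use std::mem::size_of;

// repr(C) stand-ins for the tuples in the text, so the layout is fully specified.
#[repr(C)]
struct Inner(u16, u8); // stands in for (u16, u8)

#[repr(C)]
struct Nested(Inner, u8); // stands in for ((u16, u8), u8)

#[repr(C)]
struct Flat(u16, u8, u8); // stands in for (u16, u8, u8)

/// Returns (size of Inner, size of Nested, size of Flat) as laid out today.
fn sizes_today() -> (usize, usize, usize) {
    (size_of::<Inner>(), size_of::<Nested>(), size_of::<Flat>())
}
```

Inner is padded from 3 to 4 bytes, so the trailing u8 of Nested cannot reuse that padding byte and the whole struct grows to 6 bytes.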

By inventing the concept of a "data size", which doesn't need to be a multiple of the alignment, we can allow fields in specially-designed types to be packed closer together than they would be today, saving space. This is similar to the performance benefits of #[repr(packed)], but safer: all values would still be correctly aligned, just placed more closely together.

This optimization has already been implemented in other programming languages. Swift applies this to every type and every field: a type's size excludes tail padding, and a neighboring value can be laid out immediately next to it when stored in the same type, with no padding between the two. In C++, the optimization automatically applies to base classes ("EBO", the Empty Base Optimization), and is opt-in on fields via the [[no_unique_address]] attribute.

Here is an example in Swift and one in C++. These types are compact! Rust does not work like this today.

Interoperability with C++ and Swift

(Note that the author works on C++ interop; Swift is mentioned for completeness.)

In fact, exactly because this optimization is already implemented in other languages, those languages are theoretically not as compatible with Rust as they are with each other. In C++ and Swift, writing to a pointer or reference does not write to neighboring fields. But if that pointer or reference were passed to Rust, and you used any Rust facility to write to it -- whether it were vanilla assignment or ptr::write -- Rust could overwrite that neighboring field. Because the use of this optimization is pervasive in both Swift and C++, interoperating with these languages is difficult to do safely.

Concretely, consider the following C++ struct:

struct MyStruct {
    [[no_unique_address]] T1 x;
    [[no_unique_address]] T2 y;
    ...
};

Which is equivalent to this Swift struct:

struct MyStruct {
    let x: T1
    let y: T2
    ...
}

If you are working with cross-language interop, and obtain in Rust a &mut T1 which refers to x, and a &mut T2 which refers to y, it may be immediately UB, because these references can overlap in Rust: y may be located inside what Rust would consider the tail padding of the T1 reference.

For the same reason, even if you avoid aliasing, if you obtain a &mut T1 for x, and then write to it, it may partially overwrite y with garbage data, causing unexpected or undefined behavior down the line.

This also cannot be avoided by forbidding the use of MyStruct: even if you do not directly use it from Rust, from the point of view of Swift and C++, it is just a normal struct, and Swift and C++ codebases can freely pass around references and pointers to its interior. Someone passing a reference to a T1 may have no idea whether it came from MyStruct (unsafe to pass to Rust) or an array (safe). You would need to ban (or correctly handle) any C++ and Swift type which can have tail padding, in case that padding contains another object.

(To add insult to injury, the struct MyStruct itself -- not just references to fields inside it -- cannot be directly represented in Rust, either.)

And anyway, such structs are unavoidable. In Swift, this is the default behavior, and pervasive. In C++, [[no_unique_address]] is permitted to be used pervasively in the standard library, and it is impractical to only interoperate with C++ codebases that avoid the standard library.

In order for C++ and Swift pointers/references to be safely representable in Rust as mut references, a &mut T1 would need to exclude the tail padding, which means that Rust would need to separate out the concept of a type's interior size from its array stride. And in order to represent MyStruct in Rust, we would need a way to use the same layout rules that are available in these other languages.

Explanation

(I haven't separated this out to guide-level vs reference-level -- this is a pre-RFC! Also, all names TBD.)

As a quick summary, the proposal is to introduce the following new traits, functions, attributes, and behaviors:

  • std::mem::data_size_of::<T>(), returning the size but not necessarily rounded to alignment / not necessarily the same as stride.
  • In the memory model, pointers and references only refer to data_size_of::<T>() bytes.
  • AlignSized, a trait for types where the data size and stride are the same.
  • #[repr(compact)], to mark a type as not implementing AlignSized, and thus having a potentially smaller data size.
  • #[compact], to mark a field as laid out using the data size instead of the stride.

Data size vs stride

Semantically, Rust types would gain a new kind of size: "data size". This is the size of the type, minus the tail padding. In fact, it's in some sense the "true" size of the type: array stride is the data size rounded up to alignment.

Data size would be exposed via a new function std::mem::data_size_of::<T>(); array stride continues to be returned by std::mem::size_of::<T>().

The semantics of a write (e.g. via ptr::write, mem::swap, or assignment) are to only write "data size" number of bytes, and a &T or &mut T would only refer to "data size" number of bytes for the purpose of provenance and aliasing semantics. (&[T; 1], in contrast, continues to refer to size_of::<T>() bytes.)
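
The relationship between the two sizes can be checked by hand on today's Rust. The sketch below computes a stand-in for the proposed data_size_of for one repr(C) struct as "end of the last field", then rounds it up to get the stride. Pair, round_up, pair_data_size and pair_stride are illustrative names, not part of the proposal.

```rust
use std::mem::{align_of, offset_of, size_of};

#[repr(C)]
struct Pair {
    a: u16,
    b: u8,
}

/// Rounds `n` up to the next multiple of `align` (a power of two).
fn round_up(n: usize, align: usize) -> usize {
    (n + align - 1) & !(align - 1)
}

/// Hand-computed stand-in for the proposed `data_size_of::<Pair>()`:
/// the end of the last field, with no tail padding added.
fn pair_data_size() -> usize {
    offset_of!(Pair, b) + size_of::<u8>()
}

/// The stride is the data size rounded up to the alignment.
fn pair_stride() -> usize {
    round_up(pair_data_size(), align_of::<Pair>())
}
```

For Pair this gives a data size of 3 and a stride of 4, and the stride agrees with what size_of::<Pair>() returns today.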

The AlignSized trait and std::array::from_ref

It is fundamentally a backwards-incompatible change to make stride and size not the same thing, because of functions like std::array::from_ref and std::slice::from_ref. The existence of these functions means that Rust guarantees that for an arbitrary generic type today, that type has identical size and stride.
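
As a small illustration of that guarantee, the following compiles today for any sized T with no extra bounds. The wrapper function names are invented for this sketch; array::from_ref and slice::from_ref are the real standard library functions.

```rust
use std::{array, mem::size_of, slice};

/// Compiles for every sized `T` today, with no extra bounds.
fn as_single_element_views<T>(value: &T) -> (&[T; 1], &[T]) {
    (array::from_ref(value), slice::from_ref(value))
}

/// The array view covers exactly `size_of::<T>()` bytes, i.e. one stride.
fn one_element_array_size<T>() -> usize {
    size_of::<[T; 1]>()
}
```

Because the returned &[T; 1] spans a full stride, a &T that only covered the data size could not be converted this way without also claiming the tail padding.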

This means that if we want to allow for data size and stride to be different, they must not be different for any generic type as written today. Existing code without trait bounds can call from_ref! So we must add an implicit trait bound on AlignSized : Sized, which, like Sized, guarantees that the data size and the stride are the same. This trait would be automatically implemented for all pre-existing types, which retain their current layout rules.

In other words, the following two generics are equivalent:

fn foo<T>() {}
fn foo<T: Sized + AlignSized>() {}

... and to opt out of requiring AlignSized, one must explicitly remove a trait bound:

fn foo2<T: ?AlignSized>() {}
// AlignSized requires Sized, and so this will also do it:
fn foo3<T: ?Sized>() {}

To opt out of implementing this trait, and to opt in to being placed closer to neighboring types inside a compound data structure, types can mark themselves as #[repr(compact)]. This causes the data size not to be rounded up to alignment:

#[repr(C, compact)]
struct MyCompactType(u16, u8);
// data_size_of::<MyCompactType>() == 3
// size_of::<MyCompactType>() == 4

Taking advantage of non-AlignSized types with #[compact] storage

If a field is marked #[compact], then the next field is placed after the data size of that field, not after the stride. (These can only differ for a non-AlignSized type.) This provides easy control, and provides compatibility with C++, where this behavior can be configured per-field.

It is an error to apply this attribute on non-#[repr(C)] types.

#[repr(C, compact)]
struct MyCompactType(u16, u8);

#[repr(C)]
struct S {
    #[compact]
    a: MyCompactType,  // occupies the first 3 bytes
    b: u8,             // occupies the 4th byte
}
// data_size_of::<S>() == size_of::<S>() == 4

Example

Putting everything together:

#[repr(C, compact)]
struct MyCompactType(u16, u8);
// data_size_of::<MyCompactType>() == 3
// size_of::<MyCompactType>() == 4

#[repr(C)]
struct S {
    #[compact]
    a: MyCompactType,  // occupies the first 3 bytes
    b: u8,             // occupies the 4th byte
}

// data_size_of::<S>() == size_of::<S>() == 4

We can take mut references to both fields a and b, and writes to those references will not overlap:

let mut x : S = ...;
let S {a, b} = &mut x;
*a = MyCompactType(4, 2);  // writes 3 bytes
*b = 0;  // writes 1 byte

If we had not applied the repr(compact) attribute, or had not applied the #[compact] attribute, then data_size_of::<S>() would have been 6, and so would size_of::<S>(). The assignment *a = ... would have (potentially) written 4 bytes.
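
The 4-versus-6 arithmetic can be reproduced with a toy layout calculator. This is only a sketch of the in-order, C-style placement rule as I read it from the text above; TypeLayout and layout_struct are hypothetical names, and the struct being laid out is assumed not to be repr(compact) itself, so its data size equals its stride.

```rust
/// Size information for one field's type (names are hypothetical).
#[derive(Clone, Copy)]
struct TypeLayout {
    data_size: usize, // bytes actually occupied, excluding tail padding
    align: usize,     // power-of-two alignment
}

impl TypeLayout {
    /// Stride: the data size rounded up to the alignment.
    fn stride(self) -> usize {
        (self.data_size + self.align - 1) & !(self.align - 1)
    }
}

/// Lays out fields in order, C-style. `fields[i].1` says whether field `i`
/// is marked `#[compact]`. Returns (field offsets, stride of the struct).
fn layout_struct(fields: &[(TypeLayout, bool)]) -> (Vec<usize>, usize) {
    let mut offsets = Vec::new();
    let mut end = 0; // first byte not yet claimed by a field
    let mut align = 1;
    for &(ty, compact) in fields {
        let offset = (end + ty.align - 1) & !(ty.align - 1);
        offsets.push(offset);
        // A compact field only claims its data size; others claim the full stride.
        end = offset + if compact { ty.data_size } else { ty.stride() };
        align = align.max(ty.align);
    }
    (offsets, (end + align - 1) & !(align - 1))
}
```

With MyCompactType described as data size 3, alignment 2, and u8 as data size 1, alignment 1, marking the first field compact places b at offset 3 for a total of 4 bytes; without the marker b lands at offset 4 and the total is 6.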

Drawbacks

Backwards compatibility and the AlignSized trait

In order to be backwards compatible, this change requires a new implicit trait bound, applied everywhere. However, that makes this change substantially less useful. If that became the way things worked forever, then #[repr(compact)] types would be very difficult to use, as almost no generic functions would accept them. Very few functions actually need AlignSized, but every generic function would get it implicitly.

We could change this at an edition boundary: a later edition could drop the implicit AlignSized bound on all generics, and automated migration tooling could remove the implicit bound from any generic function which doesn't use the bound, and add an explicit bound for everything that does. After enough iterations, the only code with a bound on AlignSized would be code which transmutes between T and [T]/[T; 1]. Though this would be a disruptive and long migration.

Alternatively, we could simply live with repr(compact) types being difficult and usually not usable in generic code. They would still be useful in non-generic code, and in cross-language interop.

alloc::Layout

std::alloc::Layout might not work as is. Consider the following function:

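use std::alloc::{Layout, LayoutError};

fn make_c_struct<T1, T2>() -> Result<Layout, LayoutError> {
    // Start from an empty layout and append each field in declaration order.
    let layout = Layout::from_size_align(0, 1)?
        .extend(Layout::new::<T1>())?.0
        .extend(Layout::new::<T2>())?.0
        .pad_to_align();
    Ok(layout)
}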

This function was intended to return a Layout that is interchangeable with this Rust struct:

#[repr(C)]
struct S {
  x: T1,
  y: T2,
}

In order for this to continue returning the same Layout, it must work the same even if T1 is changed to be repr(compact). In other words, if Layout::new is to accept ?AlignSized types, it must use the stride as the size. The same applies to for_value*.
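
For reference, this is the equivalence that existing code relies on, checked for one concrete pair of field types. The sketch uses only the current, stable Layout API; the struct S and the function name are illustrative, with u16 and u32 chosen arbitrarily for T1 and T2.

```rust
use std::alloc::{Layout, LayoutError};

#[repr(C)]
struct S {
    x: u16,
    y: u32,
}

/// Builds the layout of `S` by hand, field by field.
fn manual_layout_of_s() -> Result<Layout, LayoutError> {
    let (with_x, _x_offset) = Layout::from_size_align(0, 1)?.extend(Layout::new::<u16>())?;
    let (with_y, _y_offset) = with_x.extend(Layout::new::<u32>())?;
    Ok(with_y.pad_to_align())
}
```

The hand-built layout compares equal to Layout::new::<S>() (8 bytes, alignment 4). Code written this way keeps working only if Layout::new continues to report the stride.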

(Alternatively, it may be okay to reject non-AlignSized types.)

One assumes, then, that we need *_compact versions of all the layout functions, which use data size instead of stride. And then:

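fn make_c_struct<T1, T2>() -> Result<Layout, LayoutError> {
    // Layout::new_compact is the proposed variant that uses data size instead of stride.
    let layout = Layout::from_size_align(0, 1)?
        .extend(Layout::new_compact::<T1>())?.0
        .extend(Layout::new::<T2>())?.0
        .pad_to_align();
    Ok(layout)
}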

Would generate the same Layout as for the following struct:

#[repr(C)]
struct S {
  #[compact] x: T1,
  y: T2,
}

Alternatively, perhaps we could introduce separated data_size and stride fields into the Layout, and have extend and extend_compact, supplementing from_size_align(stride, align) with from_data_size_stride_align(data_size, stride, align).

... but this author is very interested to hear opinions about how this should all work out.

It's yet another (implicit) size/alignment trait

There is also some desire for an Aligned trait or a DynSized trait. This would be yet another one, which may require changes throughout the Rust standard library and ecosystem to support everywhere one would ideally hope.

Rationale and alternatives

Alternative: manual layout

One could in theory do it all by hand.

User-defined padding-less references

Instead of references, one could use Pin-like smart pointer types which forbid direct writes and reads. To avoid aliasing UB, this cannot actually be Pin<&mut T> etc. -- it must be a (wrapper around a) raw pointer, as one must never actually hold a &mut T or even a &T. This must be done for all Swift or C++ types which contain (what Rust would consider) tail padding, unless it is specifically known that they are held in an array, where it's safe to use Rust references.

Something like this:

struct PadlessRefMut<'a, T>(*mut T, PhantomData<&'a mut T>);

Unfortunately, today, a generic type like PadlessRefMut is difficult to use: you cannot use it as a self type for methods, for instance, though there are workarounds.

Even there, various bits of the Rust ecosystem expect references: for instance, you can't return a PadlessRef or PadlessRefMut from an Index or IndexMut implementation. This, too, could be fixed by replacing the indexing traits (and everything else with similar APIs) with a more general trait that uses GATs... but we can see already that, at least right now, this type would be quite unpleasant.
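
To give a feel for what such a type involves, here is a minimal sketch under stated assumptions: the data size is passed in by the caller (there is no data_size_of today), the set method and the Pair and Storage types are invented for the example, and T is restricted to Copy so that no destructor needs to run.

```rust
use std::marker::PhantomData;
use std::ptr;

/// A mutable "reference" that never materializes a `&mut T`.
struct PadlessRefMut<'a, T> {
    ptr: *mut T,
    data_size: usize, // bytes of `T` excluding tail padding, supplied by the caller
    _marker: PhantomData<&'a mut T>,
}

impl<'a, T: Copy> PadlessRefMut<'a, T> {
    /// Safety: `ptr` must be aligned and valid for reads and writes of
    /// `data_size` bytes for `'a`, with no other access to those bytes.
    unsafe fn new(ptr: *mut T, data_size: usize) -> Self {
        PadlessRefMut { ptr, data_size, _marker: PhantomData }
    }

    /// Overwrites the pointee with `value`, copying only `data_size` bytes,
    /// so whatever lives in the tail padding is left untouched.
    fn set(&mut self, value: T) {
        // Safety: guaranteed by the contract of `new`; the copy is untyped.
        unsafe {
            ptr::copy_nonoverlapping(
                &value as *const T as *const u8,
                self.ptr as *mut u8,
                self.data_size,
            );
        }
    }
}

#[repr(C)]
#[derive(Clone, Copy)]
struct Pair(u16, u8); // data size 3, stride 4

#[repr(C, align(2))]
struct Storage([u8; 4]); // bytes 0..3 hold a Pair, byte 3 holds a neighbor

/// Writes a Pair into the first 3 bytes and returns all 4 bytes.
fn write_without_clobbering() -> [u8; 4] {
    let mut storage = Storage([0xAA; 4]);
    let mut x = unsafe { PadlessRefMut::new(storage.0.as_mut_ptr() as *mut Pair, 3) };
    x.set(Pair(0x0102, 7));
    storage.0
}
```

The neighbor byte at offset 3 keeps its old value after the write, which is the behavior the proposal wants plain references to have. Everything else a reference offers (method calls, Deref, Index) is missing from this type, which is the usability problem described above.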

Layout

For emulating the layout rules of Swift and C++, you could manually lay out structs (e.g. via a proc macro) and use the same Pin-like pointer type:

// instead of C++:
//     `struct Foo {[[no_unique_address]] T1 x; [[no_unique_address]] T2 y; }`
#[repr(C, align( /* max(align_of::<T1>(), align_of::<T2>()) */ ... ))]
struct Foo {
    // These arrays are not of size size_of::<T1>() etc., but rather the same as the proposed data_size_of::<T1>().
    x: [u8; SIZE_OF_T1_DATA],
    y: [u8; SIZE_OF_T2_DATA],
}

impl Foo {
    fn x_mut(&mut self) -> PadlessRefMut<'_, T1> {
        PadlessRefMut::new((&mut self.x).as_mut_ptr() as *mut _)
    }
    // etc.
}

(This is especially easy to do when writing a bindings generator, since you can automatically query the other language to find the struct layout, and automatically generate the corresponding Rust.) But otherwise, it's quite a pain -- one would hope, perhaps, for a proc macro to automate this, similar to how Rust automatically infers layout for paddingful structs and types.

Conclusion: manual layout is unpleasant

Almost nothing is impossible in Rust, including this. But it does mean virtually abandoning Rust in a practical sense: Rust's references cannot exclude tail padding, so we use raw pointers instead. Rust's layout rules cannot omit padding, and so we replace the layout algorithm with a pile of manually placed u8 arrays and manually specified alignment. And the result integrates poorly with the rest of the Rust ecosystem, where most things expect conventional references, and things that don't or can't use references are difficult to work with.

Alternative: repr(packed), but with aligned fields

We could replicate the layout of C++ and Swift structs, but make them very unsafe to use, similar to repr(packed). One would still, like repr(packed), avoid taking or using references to fields inside such structs, and these are still going to be difficult to work with as a result.

Prior art

Languages with this feature

Swift: Swift implicitly employs this layout strategy for all types and all fields. A type has three size-related properties: its "size", meaning the literal size taken up by its fields, not including padding; its "stride", meaning the difference between addresses of consecutive elements in an array; and its alignment.

C++: Unlike Swift, C++ does not separate out size and stride into separate concepts. Instead, it claims that array stride and size are the same thing, as they are in Rust and C, but that objects can live inside the tail padding of other objects and that you are simply mutably aliasing into the tail padding in a way which the language defines the behavior for. C++ nominally allows this for the tail padding of all types, but only when they are stored in certain places: objects may be placed inside the tail padding of the previous object when that previous object is a subobject in the same struct (not, for instance, a separate local variable), and it is either a base class subobject (so-called "EBO"), or a [[no_unique_address]] data member ("field"). In practice, however, the compiler is free to not reuse the tail padding for some types. In the Itanium ABI, C-like structs ("POD" types, with an Itanium-ABI-specific definition of "POD") do not allow their tail padding to be reused.

Papers and blog posts

  • I worked around this in Crubit, a C++/Rust bindings generator. The design is described in unpin.md in the google/crubit repository. tl;dr: if we assume that the only source of this layout phenomenon is base classes, then only non-final classes need to get the uncomfortable Pin-like API. Unfortunately, this does not work if [[no_unique_address]] becomes pervasive.

Unresolved questions

  • What do we do about std::alloc::Layout?
  • What's the long term future of the AlignSized bound?
  • Clearly, for compatibility reasons if nothing else, Rust types must not have reusable tail padding unless specially marked. But what about fields: should it be opt-in per field (like C++), or automatic (like Swift)? In this doc, it's assumed to be opt-in per field for repr(C) (for C++-compatibility), and automatic for repr(Rust).
  • How free should Rust be to represent fields compactly in repr(Rust) types?
  • Is repr(C) allowed to use this new layout strategy with specially marked fields using a new attribute, or do we need a new repr? The documentation is very prescriptive.
  • This is part of a family of issues with interop, where Rust reference semantics do not match other languages' reference semantics. (The other prominent member of the family is "aliasing".) Part of the reason for wanting to use Rust references is simply the raw ergonomics: generic APIs take and return &T, self types require Deref (which requires reference-compatible semantics), etc. It is worth asking: rather than modifying references, does this cross the line to where we should instead make it more pleasant to use pointers that cannot safely deref?
  • "Language lawyering": how does this interact with existing features? For example, is a repr(transparent) type also repr(compact)? (I believe the answer should be yes.)
  • TODO: better names for everything. For example, repr(compact), "data size" and data_size_of. AlignSized especially.
  • How much of the standard library should be updated to ?AlignSized?

Is it at all possible to say that the last element of an array does not have tail padding, and thus [T;1] continues to be identical to T?

array::from_ref is not the only problem... there's also tons of unsafe code out there that will access size_of many bytes starting at some pointer. So adding ?AlignSized to low-level unsafe functions will require a careful re-review of that function to check which (explicit or implicit) uses of size_of should become data_size_of.

Note that every time this has come up -- ?Move, ?Pinned, etc -- the answer has been "we're not adding more of these".

What would an alternative look like that doesn't have the implicit trait bound?

Hello from Swift! (though I didn’t myself work much on this)

The two drawbacks to this I know:

  • The array last-element thing was already brought up; it’s worse in Swift because homogeneous tuples are semi-promised to have the same layout as the corresponding array, and they don’t.

  • There’s no simple guaranteed-optimal layout algorithm anymore. Rust’s current (non-contractual) layout algorithm sorts by alignment, guaranteeing no interior padding between fields (because size == stride). If you want to pack fields into inner tail padding, though, that’s no longer sufficient. Consider ((u32, u8), u16, [u8; 3]): the optimal packing here requires moving the array before the u16. (I think this comes out to one of the NP-complete problems.) Swift punted on solving this; right now it just does in-order layout no matter how bad that is.

I may come back later with Rust-specific thoughts, but thought I’d pass along the Swift part first.

Even if true, I don't see this as a drawback. This only creates additional optimization opportunities, even if you can't get to the optimum 100% of the time. Surely the total size would not increase compared to what it does without this freedom.

For an n-element struct, there is an O(2^n · n) algorithm for optimal packing: use dynamic programming to compute the optimal size for each of the 2^n subsets of elements.

So for any struct that has less than say 16 elements it's fast to compute the optimal packing.

There are additional optimizations to this.

For instance, if there are several elements of the same size and alignment, this significantly reduces the number of subsets you have to consider.

If the largest alignment element happens to have size that is divisible by its alignment, it can always be placed first, and the optimization problem is reduced.

I am not sure if the general case with a very large number of elements all of different sizes and alignments is solvable optimally... maybe not (still thinking about this). But this is a rare case. You can always do some sort of approximation or local search algorithm for this case, this should get you close to the optimum.
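
A minimal sketch of the subset dynamic program described above, assuming each field is given as a (data size, alignment) pair and may start right after the previous field's data. The function names are made up. The state kept per subset is the smallest end offset, which suffices because rounding up to an alignment is monotone: a smaller end offset never forces a later start for the next field.

```rust
/// Smallest end offset over all orderings of `fields`, where each field is
/// (data_size, align) and may start right after the previous field's data.
fn optimal_packed_size(fields: &[(usize, usize)]) -> usize {
    let n = fields.len();
    // best[mask] = smallest end offset after placing exactly the fields in `mask`.
    let mut best = vec![usize::MAX; 1 << n];
    best[0] = 0;
    for mask in 1usize..(1 << n) {
        for (i, &(size, align)) in fields.iter().enumerate() {
            if mask & (1 << i) == 0 {
                continue;
            }
            // Place field `i` last, after an optimal packing of the others.
            let prev = best[mask & !(1 << i)];
            let start = (prev + align - 1) / align * align;
            best[mask] = best[mask].min(start + size);
        }
    }
    best[(1 << n) - 1]
}

/// End offset when the fields are placed in the given order.
fn in_order_size(fields: &[(usize, usize)]) -> usize {
    fields
        .iter()
        .fold(0, |end, &(size, align)| (end + align - 1) / align * align + size)
}
```

For the ((u32, u8), u16, [u8; 3]) example, described as [(5, 4), (2, 2), (3, 1)], in-order placement ends at byte 11 while the best ordering ends at byte 10.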

Rust’s current layout algorithm would prefer ((u32, u8), u16, [u8; 3]) over ((u32, u8), [u8; 3], u16). I think I can contrive a case where more space is wasted than the current tail padding, but it would indeed be contrived. But pointing out that this is a dynamic programming problem with a low N reassures me quite a bit.

My point was that even if the algorithm doesn't improve, this change would still decrease the size of this tuple from 16 bytes to 11 bytes. So that's not really a drawback of the proposal. The second, optimal choice is 10 bytes, so if the algorithm is improved then it's even better of course.

I don't think that's possible. Even if you don't change the order, the amount of required padding can only decrease.

More generally, you can reduce all field sizes modulo the alignment of the struct. So even with arbitrarily many fields of alignment at most A, you only have (A-1)*lb(A) "types" of field to consider. For example in a struct with align 4, two fields with equal alignment and sizes 5 and 13 can always be freely swapped, and any fields in between just shift by 8 bytes.

I’ve convinced myself you’re right, interior padding can only come from what would have been field tail padding if you continue to sort by alignment. And everyone else’s remarks are encouraging. So maybe it’s just the array vs. tuple / single-element array concern that carries over.

This is a really clever idea. I think it might be possible, modulo the "unsafe code already assumes otherwise" issue you bring up.

I can find no reason this would not work. The current documented guarantees about size are all really about array stride, and don't constrain us from adding a new notion of size. Any code assuming that array stride == object size was already UB as far as I know.

If this works, I'd be very happy to strip out the trait and its bounds and make it work this way instead.

Can you elaborate a bit more? Are you saying this won't work for some reason?

Unless we can make from_ref (etc.) work in some other way (e.g. as above), the only alternative is to take a leaf from Pin and make new reference and pointer types which only refer to the data, and not the full array stride. All C++ interop would only occur via such references. All #[repr(compact)] structs would be unsafe to use / take references to the inner fields (similar to #[repr(packed)]).

It's tempting to think this is not so bad, because Pin does it, but this is significantly less usable than Pin.

Unlike Pin, we can't just wrap reference types / smart pointers: references are UB to form to these values, because it would violate the aliasing rules. And so instead of one Compact<P> type, we need at least two, maybe at least four types: CompactRef<'a, T>, CompactMutRef<'a, T>, CompactPtr<T>, CompactMutPtr<T>. Also, since none of these can implement Deref (because references are UB to form), none of these can be used as a self type, and we can't make methods available via either self or Deref.

Like with Pin, they also don't compose well: any API using references isn't usable with these. Unlike Pin, they don't compose with anything: they can't even implement Deref. So, for example, what is the type of a pinned compact reference? Perhaps it is Pin<CompactMutRef<'a, T>>, but since CompactMutRef<'a, T> doesn't implement Deref, the entire Pin API is gone, and it has no methods at all -- we need to reimplement Pin, and anything similar to Pin, so that it works with our new reference types. (This isn't a corner case, but the common case in interop: the safe assumption is that a C++ reference is both pinned and to a compact / no_unique_address field.) This is true across the board. Nothing works with these references.

I am not seriously suggesting this, but if you really wanted to make this exactly as user-friendly, it probably looks like this. Allow for self : CompactRef<'_, T> to define methods which are reachable from both references and CompactRefs, and replace &self with this for all types and traits shared with C++. Replace Deref with a new DerefCompact trait everywhere possible, and replace Index with a new Index2 trait with body type Output: ?Sized; type Ref : DerefCompact<'a, Output> + 'a; fn index2(self: DerefCompact<Self>, idx: Idx) -> Self::Ref; , and so on like this.

(Something similar happens with aliasing, btw.)

From a language design perspective it will all work without any extra work, though -- these can just be library types written on top of raw pointers. So long as you never create a reference you won't get UB. Low bar though: the resulting references will be very annoying.

I like the idea of removing padding in nested structs. It'd be great to have packed enums where a 1-byte discriminant doesn't double their size due to alignment padding.

However, I'm concerned about truncating arrays. There are cases where it's useful to reinterpret arrays as bytes, and iterate over them in chunks.

Nevermind, any padding makes it UB anyway

If the arrays are fully padded, then it's safe and simple to iterate over stride-sized chunks. However, if the last element were shorter, then having a stride-sized slice at the last element would be UB. Handling of the last, shorter element is cumbersome:

std::slice::from_raw_parts([t; N].as_ptr() as *const u8, N * stride_of::<T>()) // UB
// it needs to be:
std::slice::from_raw_parts([t; N].as_ptr() as *const u8, if N > 0 { (N-1) * stride_of::<T>() + size_of::<T>() } else { 0 })

and then it needs iterating via chunks rather than chunks_exact, which optimizes worse due to variable element size.

Pin has the exact same problem. To be precise, using Pin<&mut T> still means creating the &mut T and thus getting all of the aliasing requirements of &mut T.

The (current, implementation detail) solution is that containing UnsafeCell makes your type !Freeze, where Freeze is an auto trait that enables noalias when lowering &; containing PhantomPinned makes your type !Unpin[1], where an Unpin guarantee enables noalias when lowering &mut.

It would be fully possible to introduce a new magic auto trait "SizeAligned" which controlled whether size == stride without introducing a new ?bound.

The difficulty in this approach is ensuring that old code doesn't get silently turned into UB by this change. The simplest way is of course to make the bound a default bound in current editions and add the explicit bound when migrating editions.

If the structs contain any padding, this is already unsound at best[2], assuming byte means u8. This is why bytemuck and similar safe abstractions require Pod or similar[3] traits to ensure there isn't any padding involved.

It is still an important restriction to keep in mind that

fn noop_write<T: ?Sized>(it: &mut T) {
    let len = std::mem::size_of_val(it);
    let ptr = it as *mut T as *mut u8;
    unsafe { std::ptr::copy(ptr, ptr, len); }
}

is presumably sound though, as is all of the other documentation stating that size_of is stride_of.


I haven't thought enough about the implications to say one way or the other whether decoupling size and stride is doable, desirable, or even necessary.

So long as we only are trying to solve the FFI use case where the type(s) aren't owned on the Rust stack[4], then a #[repr(packed)] solution is workable, even if not fully ideal — as far as the Rust side is aware, the structs all have align 1. You can still use &mut MyStruct and get &mut T1 from it, because they're aligned by virtue of being constructed on the side of the FFI which is fully aware of the alignment requirements.

The limitation of this design is that it becomes a soundness requirement that the Rust side never be able to hold MyStruct by value, because then it'd be able to misalign it. This is already very similar to the limitation on address-aware C++ types, which must be considered pinned as soon as their constructor is called, so potentially not overly further restrictive.

[aside] That [[no_unique_address]] permits putting one object within the trailing padding of another (for non-empty types) is new information to me. I previously thought it only worked like the empty base class optimization, allowing "logically zero sized" types to share their address with the following member. This part of [[no_unique_address]] isn't a concern for Rust, since it actually allows zero-sized types with a zero stride, whereas C++ without [[nua]] or EBO requires every (sub)object to have a distinct address. In retrospect, it seems reasonable that the wording of [[nua]] would allow overlapping any trailing padding (since empty types are exclusively trailing padding), but that didn't prevent the initial belief it only allowed removing the entire object's storage rather than exclusively the trailing. [/aside]


  1. It's somewhat likely that in the future the pinning auto trait will be an unsafe implementation detail of some UnsafeCellMut like with Freeze, as Unpin being public and safe to implement has some "fun" implications. ↩︎

  2. Whether it's UB depends on whether the validity of &T requires T to be valid. This indirection-recursive validity requirement currently seems unlikely, making a reference to bytes invalid at their implied type only unsound. ↩︎

  3. bytemuck actually only requires the source type to be NoUninit and the latter type to be AnyBitPattern for by-move and by-ref transmutes rather than the full Pod guarantee. ↩︎

  4. This restriction is essentially required for C++ since objects are address-aware; I'm not sure if Swift objects can be address-aware, although given GC I think everything Swift works on has to be Swift-managed. Allowing safe construction on the Rust side can be done by exposing new MyStruct over FFI or by exposing construction instead of Aligned<MyStruct> which reattaches the alignment requirement (and the trailing padding to stride). ↩︎

It’s pretty much what kornel covered about truncation. Swift doesn’t have fixed-size arrays, but you can see the discrepancy by comparing Arrays (Vec, basically) and tuples. In Swift, an Array’s storage size in bytes is always a multiple of stride, whereas a tuple gets laid out element by element and will thus omit tail padding on the last element; and this difference can be observed if you get a byte-view of each. But that’s pretty much the only time it comes up today; if you’re dealing with typed buffer pointers (roughly *const [T] or *mut [T] instead of *const [u8]), I don’t think there’s any operation that would distinguish them short of someone manually doing stride-based math. That said it might become more important if Swift adds fixed-size arrays as a type separate from tuples, because then people will want to transmute between them.

(I was thinking about this a few months ago because Swift zero-sized types have a size of 0 but a stride of 1, which exacerbates this difference in that one edge case. But Rust has truly-zero-sized types and slices already handle that, so that’s not an issue here.)
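For reference, the Rust half of that comparison is easy to check today. This is only a minimal sketch: `Empty` and `bytes_for` are names made up for illustration, not anything from the proposal.

```rust
use std::mem::size_of;

// Hypothetical zero-sized type, used only for illustration.
#[derive(Clone, Copy)]
pub struct Empty;

/// Bytes occupied by `n` consecutive `T`s under today's rules,
/// where the stride of `T` is exactly `size_of::<T>()`.
pub fn bytes_for<T>(n: usize) -> usize {
    size_of::<T>() * n
}
```

`bytes_for::<Empty>(1000)` is 0, matching `size_of::<[Empty; 1000]>()`, so a zero-sized type really does have both size 0 and stride 0 in Rust.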

You can get the size of the array this way:

std::mem::data_size_of::<[T; N]>()

One thing we could do without breaking changes: If we added "move-only fields", where it's entirely disallowed to take their addresses, we could even pack existing types like this, with no breaking changes at all. And that's wanted for other stuff too, like bitpacking enums together.

No, it wasn't. size_of is very explicitly documented to be

the offset in bytes between successive elements in an array with that item type including alignment padding

(And, practically, it's always been that, and there's no other API that would give the stride, so there's really no other option that code could have used. Pervasive assumptions like this are the kind of thing we attempt not to break because the impact is hard to assess with crater -- at least if we break parsing we can get a good idea of the impact.)
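To make the "size_of is the stride" point concrete, here is a small sketch. `Pair` and the two helpers are hypothetical names; `#[repr(C)]` is used only so the layout is pinned down, since the default tuple layout is unspecified.

```rust
use std::mem::{align_of, size_of};

// Hypothetical example type: 3 bytes of field data, alignment 2,
// so today it carries 1 byte of tail padding.
#[repr(C)]
#[derive(Clone, Copy, Default)]
pub struct Pair {
    pub a: u16,
    pub b: u8,
}

/// (size, alignment) of `Pair` as reported by the standard library.
pub fn pair_layout() -> (usize, usize) {
    (size_of::<Pair>(), align_of::<Pair>())
}

/// Distance in bytes between element 0 and element 1 of an array of `T`.
pub fn observed_stride<T>(arr: &[T; 2]) -> usize {
    let first = &arr[0] as *const T as usize;
    let second = &arr[1] as *const T as usize;
    second - first
}
```

On mainstream targets `pair_layout()` gives `(4, 2)`, and `observed_stride(&[Pair::default(); 2])` equals `size_of::<Pair>()`. That equality is what existing code has been relying on.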

Not really, it just means you need to use [MaybeUninit<u8>; N] for the chunks -- ptr::swap_nonoverlapping already does this today.

Or use copy_nonoverlapping instead of reads/writes, since "ptr::copy and ptr::swap are doing untyped copies" (RalfJung, Pull Request #97712, rust-lang/rust) defined that to be OK for padding.
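A rough sketch of what chunked, untyped copying can look like in user code. `swap_bytes_chunked` and `CHUNK` are hypothetical names, and this is not how the standard library implements `ptr::swap_nonoverlapping`; it only illustrates moving bytes through `[MaybeUninit<u8>; N]` so that padding is never read as an initialized `u8`.

```rust
use std::mem::{size_of, MaybeUninit};
use std::ptr;

/// Swap two values byte by byte through a small scratch buffer.
/// Every byte, padding included, is handled as `MaybeUninit<u8>`.
pub fn swap_bytes_chunked<T>(a: &mut T, b: &mut T) {
    const CHUNK: usize = 8; // arbitrary chunk size for this sketch
    let len = size_of::<T>();
    let pa = a as *mut T as *mut MaybeUninit<u8>;
    let pb = b as *mut T as *mut MaybeUninit<u8>;
    let mut off = 0;
    while off < len {
        let n = CHUNK.min(len - off);
        let mut tmp = [MaybeUninit::<u8>::uninit(); CHUNK];
        // SAFETY: `a` and `b` are distinct exclusive borrows, each valid for
        // `len` bytes, and `off + n <= len`, so all three copies stay in
        // bounds and none of the ranges overlap.
        unsafe {
            ptr::copy_nonoverlapping(pa.add(off), tmp.as_mut_ptr(), n);
            ptr::copy_nonoverlapping(pb.add(off), pa.add(off), n);
            ptr::copy_nonoverlapping(tmp.as_ptr(), pb.add(off), n);
        }
        off += n;
    }
}
```

Note that this still walks `size_of::<T>()` bytes, i.e. the full stride, which is exactly the kind of code that would be affected if size and stride were separated.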

What's the implication for that when I take a pointer to that field in C++? Can I not write the whole sizeof(*ptr) bytes any more? Was that always UB in C++?

I guess C++ just pessimizes every read/write of every padded type in order to enable this? At least https://godbolt.org/z/M67hGdrEf appears to copy the fields individually instead of just doing one copy for the full size.

(Also, as I understand EBO, it's just because C++ doesn't have true ZSTs. It's nothing to do with stride-vs-size, because the derived type will still have size at least 1, even if also empty. Or am I misunderstanding something?)

I don’t think there’s a problem saying references—all references—are only allowed to touch the data size. The value of padding bits thus far has been unspecified, and it is UB to rely on them; this is “just” the additional guarantee that writes will not change the contents of tail padding. If you wanted to be conservative rolling this out, you could say that allocations continue to round up to the stride, so that code doing bytewise copies based on strides is less likely to stop working. But it’s definitely something that would be hard to catch if someone did use stride * count instead of mem::data_size_n::<T>(count).
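To illustrate the difference between the two byte counts, here is a sketch under a loud assumption: that the data size of `count` consecutive values is "full stride for all but the last element, plus the data size of the last". No such API exists today, so this stand-in takes the stride and data size as plain arguments; the name and signature are hypothetical.

```rust
/// Hypothetical stand-in for the byte count of `count` consecutive values
/// when tail padding of the last element is excluded.
/// `stride` and `data_size` are passed in because Rust has no data-size query.
pub fn data_size_n(stride: usize, data_size: usize, count: usize) -> usize {
    if count == 0 {
        0
    } else {
        // Every element but the last keeps its padding so the next one stays aligned.
        stride * (count - 1) + data_size
    }
}

/// What code written against today's rules computes instead.
pub fn stride_times_count(stride: usize, count: usize) -> usize {
    stride * count
}
```

For a type with stride 4 and data size 3, three elements give 11 bytes by the first function and 12 by the second. That last byte is the one a stride-based bytewise copy would touch and a data-size-based one would not.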

I’d be curious what a crater run says about applying this change to every struct. Besides hoping for correctness, I’d also expect code size to increase.

It's UB to rely on them because it's allowed, but not required, to write to them when writing a value.

For example, typed-copying a (u64, u32, u16, u8) can just copy 128 bits at once and not worry about the fact that this overwrites the padding. But because typed copies don't need to copy padding, it's also OK for a copy of (u128, u8) to move just the 136 bits that matter and not touch the 120 bits that don't.

(Untyped copies, like ptr::copy_nonoverlapping, do need to copy the values even of padding.)
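The bit counts above can be sanity-checked with a short sketch. `Small`, `Wide`, and `padding_bits` are hypothetical names, and `#[repr(C)]` structs stand in for the tuples because default tuple layout is unspecified. The 120-bit figure additionally assumes u128 is 16-byte aligned, which depends on target and toolchain.

```rust
use std::mem::size_of;

// Stand-in for (u64, u32, u16, u8): 15 bytes of field data.
#[repr(C)]
pub struct Small {
    pub a: u64,
    pub b: u32,
    pub c: u16,
    pub d: u8,
}

// Stand-in for (u128, u8): 17 bytes (136 bits) of field data.
#[repr(C)]
pub struct Wide {
    pub a: u128,
    pub b: u8,
}

/// Padding in bits, given how many bytes of `T` are actual field data.
pub fn padding_bits<T>(data_bytes: usize) -> usize {
    (size_of::<T>() - data_bytes) * 8
}
```

`padding_bits::<Small>(15)` is 8, so one 128-bit copy covers the whole value. `padding_bits::<Wide>(17)` comes out to 120 where u128 has alignment 16.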


Outside of very careful unsafe code using MaybeUninit in very particular ways.

I feel like you’re conflating what the compiler is allowed to do with what user code is allowed to do. Of course today the compiler can do a single store where after this change it would have to do two. But we can change that for some or all structs if we want to. The only question is whether there’s user code that would break. (And there definitely is: everybody using size to do bytewise copies/moves, rather than doing typed copies/moves, is potentially in trouble here. But I’d kind of like to know how much.)

As far as I'm concerned, users using memcpy was blessed by RFC 1419 in 2016.

There is perhaps still a loophole for trailing padding only (of a slice or array, and thus of a single instance), but since you don't have a guarantee of where your padding is today, you have to copy any potentially trailing padding when doing something memcpy-like on a default layout struct (or slice or array of such structs).

I.e. it's still a pretty rude breaking change, and I don't think editions help.
