Pre-RFC: Allow array stride != size

AFAIK it's completely allowed for user code to copy a Copy type using copy_nonoverlapping(a, b, 1), which definitely (per #97712) copies the padding.

That’s exactly what I mean by user vs. compiler. Changing that to copy data size instead of stride cannot have any observable effect, because it’s undefined behavior to read that padding (before and after the change).

EDIT: If you’re talking about padding bits rather than padding bytes, yes, those get copied. But that’s consistent with the data size of bool being an entire byte even though there are a number of unused/invalid representations within that byte.

Except it's actually defined sometimes -- that's the whole point of the clarification in the linked PR.

For example, Miri confirms that the following is perfectly well-defined:

use std::ptr::{addr_of, addr_of_mut, copy_nonoverlapping};

#[repr(C, align(2))]
struct OneByteOnePaddingByte {
    x: u8,
}

fn main() {
    let a = 0xFFFF_u16;
    let mut b = 0x1111_u16;
    unsafe {
        copy_nonoverlapping(
            addr_of!(a).cast::<OneByteOnePaddingByte>(),
            addr_of_mut!(b).cast::<OneByteOnePaddingByte>(),
            1,
        );
    }
    assert_eq!(b, 0xFFFF_u16);
}

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=3a752f06183db33ee6accdaa3949e47d

It's defined because copy_nonoverlapping does an untyped copy, and thus must copy both bytes, even though that OneByteOnePaddingByte type has a padding byte.

(If that were rewritten to be a typed copy, it would be allowed to fail.)
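To make the distinction concrete, here is a rough sketch of both flavours side by side. The helper names untyped_copy and typed_copy_x are made up for illustration. After the typed copy, the sketch only reads the field x, since that is the only byte the typed copy is sure to transfer.

```rust
use std::mem::MaybeUninit;
use std::ptr::{addr_of, addr_of_mut, copy_nonoverlapping};

#[derive(Clone, Copy)]
#[repr(C, align(2))]
struct OneByteOnePaddingByte {
    x: u8,
}

/// Untyped copy: both bytes of `src` land in `dst`, padding byte included.
fn untyped_copy(src: u16, mut dst: u16) -> u16 {
    unsafe {
        copy_nonoverlapping(
            addr_of!(src).cast::<OneByteOnePaddingByte>(),
            addr_of_mut!(dst).cast::<OneByteOnePaddingByte>(),
            1,
        );
    }
    dst
}

/// Typed copy: only the field `x` is sure to be transferred. The padding byte
/// of the destination may be left uninitialized, so it is never read here.
fn typed_copy_x(src: u16, dst_init: u16) -> u8 {
    let mut dst = MaybeUninit::new(dst_init);
    unsafe {
        let s = addr_of!(src).cast::<OneByteOnePaddingByte>();
        let d = dst.as_mut_ptr().cast::<OneByteOnePaddingByte>();
        *d = *s; // assignment through the struct type is a typed copy
        (*d).x
    }
}
```

Reading dst back as a whole u16 after the typed copy is exactly the step that would be allowed to fail, which is why the second helper does not do it.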

That’s specifically a repr(C) type, which must always have stride = data size. But you’ve come up with a non-UB example where this is observable, so I retract my previous statement; I just doubt any real code would care if this started producing a mixed value.

I just used that to avoid a conversation about layout guarantees. This is also sound, I believe:

use std::ptr::{addr_of, addr_of_mut, copy_nonoverlapping};

assert!(std::mem::size_of::<(u16, u8)>() <= 16);

let a = 0xFFFFFFFF_FFFFFFFF_FFFFFFFF_FFFFFFFF_u128.to_be();
let mut b = 0x11111111_FFFFFFFF_FFFFFFFF_FFFFFFFF_u128.to_be();
unsafe {
    copy_nonoverlapping(
        addr_of!(a).cast::<(u16, u8)>(),
        addr_of_mut!(b).cast::<(u16, u8)>(),
        1,
    );
}
assert_eq!(b, 0xFFFFFFFF_FFFFFFFF_FFFFFFFF_FFFFFFFF_u128.to_be());

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=a3dfe7ba44598f3bc3b1e7223ec4e5cc

The only possible out there is "well, it'll panic if (u16, u8) takes more than 128 bits". And it's true that there's no such guarantee de jure. It would be legal to make that tuple 136 bits, making this code panic. But I also think that nobody would use any such implementation...

I’m confused. That tuple is currently 32 bits; the proposed change would make it 24 bits of data size. (And the final assertion would indeed fail.)

Unfortunately, Miri is currently conservative w.r.t. padding. Miri currently copies padding bytes nondestructively even with typed copies (example).

I agree that the current intent is that typed copies always convert padding back to uninitialized bytes, and that ptr::copy[_nonoverlapping] are untyped copies which copy all bytes, including padding.

The latter is essentially non-negotiable, as the documentation says

Copies count * size_of::<T>() bytes from src to dst. [...] The initialization state is preserved exactly.

The code makes two u128 places, then copies the first size_of::<(u16, u8)> bytes from the former to the latter. This is observing that copying (u16, u8) is copying at least 4 bytes, because 4 bytes in the destination are getting converted from 0x11 to 0xFF.

If we made size_of::<(u16, u8)>() < 4, this code would be allowed to panic. While (4..=16).contains(size_of::<(u16, u8)>()), this code is defined to run without panicking by the documentation.
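A generic version of the same observation, as a sketch only: Buf and bytes_overwritten are names invented here, and the 16-byte buffer is an arbitrary choice. It counts how many destination bytes flip from 0x11 to 0xFF when one T is copied with copy_nonoverlapping.

```rust
use std::mem::{align_of, size_of};
use std::ptr::copy_nonoverlapping;

/// 16-byte scratch buffer, over-aligned so it can stand in for any small `T`.
#[repr(align(16))]
struct Buf([u8; 16]);

/// Copies one `T` worth of bytes from an all-0xFF buffer into an all-0x11
/// buffer and counts how many destination bytes changed.
fn bytes_overwritten<T>() -> usize {
    assert!(size_of::<T>() <= 16 && align_of::<T>() <= 16);
    let src = Buf([0xFF; 16]);
    let mut dst = Buf([0x11; 16]);
    unsafe {
        // Untyped copy of `1 * size_of::<T>()` bytes; no value of type `T`
        // is ever created, so the byte contents do not have to be a valid `T`.
        copy_nonoverlapping(src.0.as_ptr().cast::<T>(), dst.0.as_mut_ptr().cast::<T>(), 1);
    }
    dst.0.iter().filter(|&&b| b == 0xFF).count()
}
```

As far as I can tell, the count always comes out equal to size_of::<T>(), padding included, which is the observable fact the proposal would change.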

With a lower-level language such as Rust, a lot of "what the compiler does" ends up guaranteed and observable. Even "X is UB" is a guarantee that user code can rely on just like the compiler does, that X is never done.

unsafe allows user code to remove the nice handrails and observe a lot of the raw (abstract) machine state.

There is no such thing as "padding bytes" in memory. Rust's memory is untyped and is merely a (segmented) linear array of (abstract machine) bytes.

It's perfectly defined behavior to read "padding bytes;" that read gives me the value currently stored in that byte in memory. What makes it "padding" is that when doing a typed copy, the value of that byte is not preserved, and is reset back to uninitialized in the new location. But so long as no typed copies are done, the memory is just memory like any other memory and it's perfectly defined to write a value to it then read it back.
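A small sketch of that claim, reusing the OneByteOnePaddingByte layout from earlier in the thread. The function name padding_round_trip is hypothetical. The storage is a MaybeUninit, so nothing ever performs a typed copy of the struct.

```rust
use std::mem::MaybeUninit;

#[allow(dead_code)]
#[repr(C, align(2))]
struct OneByteOnePaddingByte {
    x: u8,
}

/// Writes `value` into the byte that is padding for the struct type,
/// then reads it back through a raw `u8` pointer.
fn padding_round_trip(value: u8) -> u8 {
    let mut slot = MaybeUninit::<OneByteOnePaddingByte>::uninit();
    let bytes = slot.as_mut_ptr().cast::<u8>();
    unsafe {
        bytes.write(0); // byte 0 holds the field `x`
        bytes.add(1).write(value); // byte 1 is padding as far as the type is concerned
        bytes.add(1).read() // no typed copy happened, so the byte is still there
    }
}
```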

C++ can "get away" with [[no_unique_address]] overlapping subobjects because there it's "just" not allowed to use memcpy/memmove "raw" copying to write overlapping subobjects, as that would clobber the overlap. We don't have that freedom with Rust, because we consider making previously sound code unsound when using it in conjunction with a new feature not a great idea.

It's the other way around -- we agree that size_of returns the stride. It doesn't return the object size, because there is no concept of object size today in Rust, only stride and layout. What I meant was that you weren't AFAIK allowed to write size_of::<T>() bytes to a &mut T, only to a &mut [T; 1].

Reading the code example you gave, however, it seems this is defined behavior after all. You are allowed to cast between types with/without tail padding, and at least per that PR it is not UB. So for all existing types, the size must be the same as the stride, anything else would be a backwards compatibility break. And all existing generic code may assume that the size and stride are the same, so any new types where that isn't true must not be able to be passed into existing generic APIs.
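For reference, this is the kind of assumption existing generic code can make today, written as a quick sketch. The two helper names are invented for this post.

```rust
use std::mem::{align_of, size_of};

/// Checks the two facts generic code relies on today: arrays are exactly
/// `N * size_of::<T>()` bytes, and the size is a multiple of the alignment.
fn stride_equals_size<T>() -> bool {
    size_of::<[T; 4]>() == 4 * size_of::<T>() && size_of::<T>() % align_of::<T>() == 0
}

/// Measures the distance in bytes between two adjacent array elements.
fn second_element_offset<T: Default>() -> usize {
    let pair: [T; 2] = [T::default(), T::default()];
    (&pair[1] as *const T as usize) - (&pair[0] as *const T as usize)
}
```

Any type whose data size differed from its stride would break the second helper's equality with size_of::<T>(), which is why such types could not be passed into existing generic APIs.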

AFAIK this requires either that we don't use references, Deref, or any other existing Rust machinery, or that we use references but introduce new implicit trait bounds.

memcpying bytes like this was always unsound, in that it would result in UB on some types or values. It has defined behavior if:

  • the type of the pointee is trivially copyable (and thus can be written to via bytes).
  • the pointee is not a "potentially-overlapping subobject" of another type. This means:
    1. (true since very old C++) it is not a base class subobject
    2. (since C++20) it is not a no_unique_address member variable

So before C++20, I believe you could do the write you suggest after guarding it with some test like:

using T = decltype(*ptr);

std::is_trivially_copyable_v<T> && (
    !std::is_class_v<T> || std::is_final_v<T>
)

... since any final or non-class type cannot be a potentially overlapping subobject in C++17, because it can't be a base class.

However, there is no test or protective measure for (2): any type may be stored in a no_unique_address member variable, and it is per the standard UB to perform the write in that case.

See e.g. the cppreference.com page for std::memcpy, which says:

If the objects are potentially-overlapping or not TriviallyCopyable, the behavior of memcpy is not specified and may be undefined.

Though I think you may be able to copy the underlying bytes of a non-trivially-copyable class if it's an implicit-lifetime type, so long as you copy it to a character array. I'm super unclear about this part, but see Object creation.

Yes, EBO (and no_unique_address) is the workaround C++ has because it doesn't have true ZSTs. The problem is that, as a consequence of that workaround, distinct C++ objects can live in each others' padding, and writes do not touch padding -- you can choose to think of this as meaning that, for the purpose of assignment and so on, object size != stride in C++. This does not match Rust's reference and pointer semantics, where overlapping is assumed not to exist, stride matches the size, and so on. And so it is unsound (today) to have a Rust reference to a C++ type if that type has (reusable) tail padding, even if it would otherwise be safe in every way (e.g. the C++ code could otherwise match Rust semantics on aliasing, nullability, etc.)


Ah. This rules out giving [T; 1] and T the same data size for all types in the proposal, even aside from the untyped copy semantics mentioned earlier.

Even more fun, I believe [[no_unique_address]] technically allows overlapping subobjects such that one is entirely contained within the interior padding of another.

It also rules out &mut T having write permissions over fewer than 1 * size_of::<T>() bytes, because it's perfectly allowed to ptr::copy a count of 1 into &mut. An example backing this up is the example for ptr::write, which uses copy_nonoverlapping to implement a version of mem::swap:

use std::ptr;

fn swap<T>(a: &mut T, b: &mut T) {
    unsafe {
        // Create a bitwise copy of the value at `a` in `tmp`.
        let tmp = ptr::read(a);

        // Exiting at this point (either by explicitly returning or by
        // calling a function which panics) would cause the value in `tmp` to
        // be dropped while the same value is still referenced by `a`. This
        // could trigger undefined behavior if `T` is not `Copy`.

        // Create a bitwise copy of the value at `b` in `a`.
        // This is safe because mutable references cannot alias.
        ptr::copy_nonoverlapping(b, a, 1);

        // As above, exiting here could trigger undefined behavior because
        // the same value is referenced by `a` and `b`.

        // Move `tmp` into `b`.
        ptr::write(b, tmp);

        // `tmp` has been moved (`write` takes ownership of its second argument),
        // so nothing is dropped implicitly here.
    }
}

Also relevant are swap[_nonoverlapping] (the latter has a count parameter but the former doesn't).

I couldn't find a documentation reference for a pointer function which doesn't take a count parameter stating how many bytes are written, but that's not because they're all typed copies (ptr::swap does untyped copies); it's because they're implicitly understood to be copying size_of::<T>() bytes to preserve initialization state exactly. I think the only one that is an untyped copy is ptr::swap, which reads

Swaps the values at two mutable locations of the same type, without deinitializing either.

But for the following exceptions, this function is semantically equivalent to mem::swap:

  • It operates on raw pointers instead of references. When references are available, mem::swap should be preferred.
  • The two pointed-to values may overlap. If the values do overlap, then the overlapping region of memory from x will be used. This is demonstrated in the second example below.
  • The operation is “untyped” in the sense that data may be uninitialized or otherwise violate the requirements of T. The initialization state is preserved exactly.

... actually, I found a quote, from the std::ptr docs,

Many functions in this module take raw pointers as arguments and read from or write to them. For this to be safe, these pointers must be valid. Whether a pointer is valid depends on the operation it is used for (read or write), and the extent of the memory that is accessed (i.e., how many bytes are read/written). Most functions use *mut T and *const T to access only a single value, in which case the documentation omits the size and implicitly assumes it to be size_of::<T>() bytes.


Hmm; I think it depends on how you define 'padding'. Two sibling subobjects have to occupy disjoint bytes:

[intro.object]:

Two objects with overlapping lifetimes that are not bit-fields may have the same address if one is nested within the other, or if at least one is a subobject of zero size and they are of different types; otherwise, they have distinct addresses and occupy disjoint bytes of storage.

However, the bytes don't have to be contiguous:

An object of trivially copyable or standard-layout type ([basic.types.general]) shall occupy contiguous bytes of storage.

So if one subobject's type is not trivially copyable or standard-layout, then you could take the bytes that would otherwise be considered 'interior padding' and say that they're not actually part of the object, making that object non-contiguous, and freeing those bytes to be used for another subobject. In fact, this can happen even without [[no_unique_address]]! …I think.


Instead of (or in addition to) letting the compiler pick a more compact representation for slices, would it be useful to add a second type parameter Stride=T to [T], so that you can field project slices?

We’d want to add some compiler support to allow an operation like this without the need for unsafe:

/// offset is the relative location of a U within the
/// memory allocation of a T
unsafe fn project_slice<T, U, S>(slice: &[T, Stride=S], offset: usize) -> &[U, Stride=S];

unsafe fn project_slice_mut<T, U, S>(slice: &mut [T, Stride=S], offset: usize) -> &mut [U, Stride=S];

An alternative would be to add a separate strided slice type that stores the stride inside the metadata in addition to the length, and add some way to get a [T strided] from [T] (perhaps a coercion or a Borrow implementation). In an edition, we could switch the [T] syntax from the packed slice type to the one with a dynamic stride, and provide an alternative type name for the current packed slice.
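The dynamic-stride variant can be approximated in a library today. What follows is only a sketch under my own assumptions: Strided, project, get, Point and ys_of are all hypothetical names, the view is read-only, and the stride is stored next to the length rather than in real pointer metadata.

```rust
use std::marker::PhantomData;
use std::mem::{offset_of, size_of};

/// Hypothetical read-only strided view: `len` elements of type `U`,
/// each `stride` bytes after the previous one.
struct Strided<'a, U> {
    ptr: *const U,
    len: usize,
    stride: usize,
    _marker: PhantomData<&'a U>,
}

impl<'a, U> Strided<'a, U> {
    /// Safety: `offset` must be the byte offset of a field of type `U` inside `T`.
    unsafe fn project<T>(slice: &'a [T], offset: usize) -> Self {
        Strided {
            // `wrapping_add` keeps this well-defined even for an empty slice.
            ptr: slice.as_ptr().cast::<u8>().wrapping_add(offset).cast::<U>(),
            len: slice.len(),
            stride: size_of::<T>(),
            _marker: PhantomData,
        }
    }

    fn get(&self, i: usize) -> Option<&'a U> {
        if i >= self.len {
            return None;
        }
        // SAFETY: `i < len`, so the address is the `U` field of the `i`-th `T`
        // in the borrowed slice, which stays alive for `'a`.
        unsafe { Some(&*self.ptr.cast::<u8>().add(i * self.stride).cast::<U>()) }
    }
}

/// Example element type, chosen only for illustration.
#[allow(dead_code)]
#[repr(C)]
struct Point {
    x: u32,
    y: u16,
}

/// Projects a slice of points onto its `y` fields.
fn ys_of(points: &[Point]) -> Strided<'_, u16> {
    // SAFETY: `offset_of!(Point, y)` is the offset of a `u16` field of `Point`.
    unsafe { Strided::project(points, offset_of!(Point, y)) }
}
```

With compiler support, the unsafe offset argument could presumably be replaced by a checked field path, which is the part a library cannot express.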

Why do we need to have different size and stride? If we talk about "1-byte holes" in nested structs, they can be situated not only at the tail. For example:

struct Nested { x: i32, y: u8, z: u16 }
struct Big { n: Nested, flag: bool }

The best place for the field flag is between y and z.

In your specific example without a repr attribute, I believe the compiler will actually reorder the fields of Nested as x,z,y, leaving the padding element at the tail.

However, more generally that's true; some types will have padding in the middle (e.g. a pair of types that have tail-padding). A more general solution should support types that don't own all the bytes in the span 0..size.

So by default the compiler shuffles fields to pack them tightly, leaving the empty space at the end, doesn't it? Yes, in that case we can expect the best empty space to be at the end. Anyway, why does the user need a difference between size and stride? Does anybody expect that the offset of the field flag will be greater than or equal to the size of struct Big? Or does somebody expect that adding a field will strictly increase the size of a struct? I think that is necessary only in C bindings. So I think the information about empty space at the end can be internal compiler info:

struct Nested { x: i32, y: u8, z: u16 } // size=8 bytes
struct Big { n: Nested, flag: bool } // size is still 8 bytes

The problem is that given a &mut Nested, the current expectation is to be able to write std::mem::size_of::<Nested>() bytes to it, which includes the padding bytes. However, with your approach flag is placed in n's padding byte, which means you would end up writing garbage/invalid values to it, which is UB!

And you could argue that you could make Nested's size equal to 7 so as not to include the padding byte where flag is placed, but another expectation is that elements of slices are placed at an offset that is a multiple of the element size. But if Nested's size is 7 then the second element of a [Nested] would end up at offset 7, which is not aligned, and thus UB again.
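The arithmetic behind that, as a tiny sketch. The helper misaligned_elements is made up, and the numbers 7, 8 and 4 are the size and alignment discussed above for Nested.

```rust
/// Returns the indices of slice elements that would start at a misaligned
/// offset if elements were laid out back to back, `size` bytes apart.
fn misaligned_elements(size: usize, align: usize, len: usize) -> Vec<usize> {
    (0..len).filter(|&i| (i * size) % align != 0).collect()
}
```

With size 7 and alignment 4, the offsets are 0, 7, 14 and 21, so every element after the first is misaligned. With size 8 they are 0, 8, 16 and 24, all multiples of 4.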
