Repr(binned) - making padding fully-UB to read/write if you don't own the type

The Idea

We've seen threads about "size != stride" and guaranteed-zero-padding. They're not good enough.

repr(packed) is nice. it's also generally UB to use references to repr(packed) structs. as such, repr(packed) is effectively an unsafe-only tool. What if we could have something similar (but very different) for safe rust, that happens to preserve alignment, yet allows much smaller structs?

To put it simple, repr(binned) would introduce the concept of "bins". They're like padding, except they're completely UB to interact with, if you don't own the type. They also kinda decay into padding when put inside a non-binned type, at least if they're not used for said type's fields.

This basically expands on "size != stride", so for example, a type like (u16, u8) would have a size of 4, but it does have space for another u8 at the end. "size != stride" would make this a size of 3 and a stride of 4, enabling ((u16, u8), u8) to have a size of 4 and stride of 4. But this isn't good enough, and can't really be introduced backwards-compatibly. With repr(binned) (which would NOT be the default for tuples, for backwards compatibility), this would still have a size of 4. It would also have a bin of size 1 at position 3.

The key is what happens when you start introducing more nesting:

#[repr(binned)]
struct Foo(u32, u8);
#[repr(binned)]
struct Bar(Foo, u16);
#[repr(binned)]
struct Baz(Bar, u8);

This currently has a size of 16. With binned, it would have a size of 8. To understand this, let's look at their byte-wise representation, with i being an u32, b being an u8, s being an u16 and _ being a bin.

Foo = iiiib___
Bar = iiiib_ss
Baz = iiiibbss

Between the first and second you just have normal "size != stride": the Foo has some space at the end, so you add a correctly-aligned u16 to it. But between Bar and Baz something else happens: the space is no longer at the end, and yet we can still fit a whole byte into it. The important thing here is to keep alignment.

So this is the basic idea: the bins need to be UB to read/write, because they could be part of another struct.

Unsafe Code

Of course this affects unsafe code. The only way to add this backwards compatibly is to create a new trait like Sized, but for bins, and have it as an implicit bound everywhere. Unsafe code that wants to work with bins needs to be aware of bins, and existing unsafe code wouldn't be allowed to work with bins. There's... not much else we can say here. Ofc, if the unsafe code only works with owned types, then it should probably work with bins as well - in particular, Vec would be sound with bins, mem::replace wouldn't.

Performance

You might think this is really slow, yeah? After all, you need to avoid touching the padding...

But here's the thing, how often do you actually move a type into another type through a reference?

Personally, we find ourselves doing that with Vec, but not so much with arbitrary types, and as we said above, containers like Vec are zero-cost with bins, altho it might still be handy to wrap the types into something that makes the bins decay into padding. (This could be handled by Vec having an iter_mut_binned which unsafely transmutes the &mut T into &mut (T,) or something, but we digress.) Other cases involve putting things into Option, which, as a repr(Rust) type, also decays the bins into padding, and thus is also zero-cost. (This also enables Option to put its discriminant in the bins, while at it.)

All in all, we think the size savings from this would make up for the performance cost, which isn't that high to begin with, and can be trivially opted-out of if really needed. In fact we'd expect much code to even get a performance boost out of this, after some careful tuning.

Of course, applying #[repr(binned)] to a type is a breaking change, but that doesn't mean std can't benefit from it. In particular, it's often possible to split public types into an repr(binned) inner part, and use that inner part throughout std, while just delegating the public part into the inner part. Further, wrapping a binned type into a Rust type is always free - bins vs padding are a compile-time construct, and as long as we don't get guaranteed-zero padding, this is always gonna be exactly free. And moving a Rust type, even one that contains repr(binned) types, is also always free.

What do you actually gain from binned types? Can't you just inline other types yourself?

It's true that you can generally inline other types into your type yourself, and this can save more memory than this repr(binned) stuff. But repr(binned) is as zero-cost as it can be, whereas inlining other types requires you to duplicate code (because e.g. you can't just take an &mut Url to pass it to the Url methods), makes it really difficult to convert between your type and the target type, among other things.

Arguably Rust could have a way to inline types, but inlinable types would have to be fully-public, including all functions and methods that interact with the inlinable type, and you wouldn't be able to take references to it and conversions would be more expensive. All in all, repr(binned) is significantly cheaper, both in code size (can just use references, no need to monomorphize inlinable types), general move performance, and simply... user cost. repr(binned) is much easier to use, even with the binned trait.

Sure, repr(binned) may be somewhat niche. But it's also zero-cost. Unlike repr(packed) or any of the alternatives.

1 Like

Actually fairly often. I assign to &mut T quite frequently. One issue is that repr(binned) types end up with O(n) writes, n is the number of fields. For example, on x86_64, (u32, u8, u16) can be copied with one size 8 read/write, but #[repr(binned)] struct Foo(u32, u8, u16); needs at least 2 writes, but probably 3 (as x86 can't do reads of size 3).

Also, this heavily impacts the memory model of rust, as a borrow could be interrupted accross repr(binned), which currently does not occur - if a borrow of a particular kind borrows n bytes, then the same type of borrow applies to all n bytes. Consider the following safe code:

#[repr(binned)]  #[derive(Debug)]struct Foo(u32,u8);
#[repr(binned)] struct Bar(Foo, u16);

let mut f = Bar(Foo(0,1),2);
let x = &f.0;
f.1 = 3;
println!("{:?}",x);

Under the current stacked borrows model, that code has undefined behaviour. How would you propose to change the model such that it doesn't, while not undermining all of the fairly useful optimizations the model provides.

3 Likes

Oh, interesting, we only find ourselves doing that with owned types (&mut T into a Vec<T>, whole Option<T>, etc). Even if we weren't, the trivial opt-out (T,) would likely cover most other cases, thus binned types would have no performance impact for our use-cases. We also use object construction a lot which is no more costly than the status quo (simply a matter of changing a field's offset). All in all, we would expect performance improvements from this, in addition to generally lower RAM usage. (Note that being able to fit more things in cache generally improves performance as well as efficiency.)

We do notice the issues with borrowing, and that's why both reads and writes of bins through a reference to a binned type must be UB. Pointers get some slack (particularly for Vec<Binned>, etc) but if they come from a reference it's still UB. Thankfully the binned-related trait and implicit bound hopefully covers that...?

Some example codegen for this:

/*#[repr(binned)]*/ struct Tuple(u32, u8);
pub fn store(x: &mut Foo, y: Foo){
     *x = y;
}

x86_64

store:
    mov [rdi],rsi
    ret
#[repr(binned)] struct Tuple(u32, u8);
pub fn store(x: &mut Foo, y: Foo){
     *x = y;
}

x86_64

store:
     mov [rdi],esi
     rsh esi, #32
     mov [rdi+4],sil
     ret

If the compiler doesn't know where a &mut T comes from, it must assume the worst possible case.

This would mean that taking a reference to a repr(binned) field of a repr(binned) type would be ill-formed or unsafe. While this is marginally better than the case of repr(packed) (taking a reference to any field of a repr(packed) type is unsafe), I don't know that this is useful.

Do you have any actual quantifications of the expected performance or size impact?

Also note that you cannot soundly pointer-cast &mut T to &mut (T,). It may also be ill-formed to transmute ((T,) is repr(Rust) and thus not guaranteed to have the same layout as T), so this "opt-out" not only requires unsafe code, but cannot actually be done by unsafe code in general, without further relying on the unspecified layout of (T,).

Yes. Our uses of Vec<Binned> would have to be Vec<(Binned,)>. We're okay with this.

Why would it be? The inner binned can't touch the outer binned's fields, nothing unsafe about this.

We do have plenty of structs that we put in vecs that would become smaller to various extents. We can't measure it (because this feature doesn't exist), but we do expect size reductions all around.

Additionally, the codegen for this:

#[repr(binned)] struct Foo(u32, u8);
struct Bar(Foo, u16);

Bar(foo, 0)

is literally free.

For sure. Vec could do it (Vec::iter_mut_binned or something), but yes, we're assuming you can control where the &mut comes from, and hoping the cases where you can, make up for the cases where you can't.

Note that any proposal requiring this is giving itself a massive penalty.

Things like that were discussed extensively as part of the futures conversations about immovable things -- in the form of ?Move -- and the takeaway I got from all those conversations is that lang wants to never have more traits like that. (Not that that means it's impossible, just that the motivation would need to be extraordinarily strong.)

5 Likes

Wait, why would this proposal require a new implicit trait? I thought it was already UB to read or write padding bytes, and only the compiler was allowed to exploit padding for efficiency. So this would just be a new hint that you want the compiler to exploit it for memory size rather than speed.

The problem is that there's existing unsafe code that makes assumptions about copying padding bytes. In particular, std::mem::swap is one of those.

While we could just say all of that is UB, that seems like a massive footgun. The trait basically solves that.

1 Like

Imagine you have a type T with some data (.) and some padding (P): [..PP.P.P].

And you have another type U(T, A=u8, B=u16). Maybe it gets binned as [..bb.a.P].

Now when you copy some other &T into &mut U.0 for example, you can't just memcpy 8 bytes because

  • You'd be overwriting U.1 and U.2
  • If your U is in some V that exploits the last pad byte, you overwrote that too
  • If the other T is part of some other U (or another binned type), you might be reading A or B data that is behind an exclusive borrow

Related.

Unlike Pin, there's no way to provide the intended semantics without language support. The thing about Pin that avoids the Move trait requirement is that it's purely advisory, and a pinned type can freely define semantics like "can technically be moved, but will lead to an abort".

std::mem::swap is part of std, so it's just one of the various rust internals that would have to be tweaked to implement this proposal - it's allowed to be privileged to know about things user code doesn't. Outside of std, using unsafe code to mess with raw bytes is always going to be a massive footgun - having an "actually you can write into padding bytes" implicit trait seems to provide only a very small amount of protection relative to how much complexity it adds. But my main point is that adding such a trait would be tangential to this proposal, because I think it is already UB for unsafe user code to write into padding bytes in all cases.

Yeah, that's the reason this would be an attribute rather than the default layout for structs. I don't see how it addresses my post that you replied to, though? User code already isn't allowed to throw around a memcpy like that without invoking UB, and the compiler and std would still be able to use the memcpy for non-binned structs. There's definitely an argument that it's not worth adding complexity to the compiler and std for this, but I really don't think it would inherently require a trait that is exposed to user code, like Sized.

I can write the equivalent of std::mem::swap though, since it's pretty much just:

fn swap<T>(x: &mut T, y: &mut T) {
    // Safety: x and y are exclusive references, so they are valid for read and writes, 
    // aligned and non overlapping
    unsafe { swap_nonoverlapping(x, y, 1) }
}

This is totally sound today, however swap_nonoverlapping is defined to swap count * size_of::<T>() bytes and that includes padding bytes, so it would become unsound if you could safely read/write to #[repr(binned)] structs. You need a way to make my function not safely callable with references to binned fields.

3 Likes

It's not guaranteed and not RFC'd, so that's technically correct. But it's still done a lot and will probably be considered defined ala that UCG topic. Things like the guaranteed layout of slices imply it's okay, as do parts of the documentation.

Oh wow, that copy_from_slice doc really does conflict with this proposal (in the absence of the implicit trait), even without more-general guarantees about padding. I retract my point.

I got more curious about it, and it turns out that copy_on_slice being memcpy was an RFC (and so e.g. not just accidental stabilization-via-documentation). Lots of conversation to that effect in the PR, too. So that's a nod for slices of Copy types, direct implication for Copy types anywhere, and an argument for Rust data structures more generally.

Fwiw slices, like all Rust types, would also decay the bins. So that particular method would still be sound in the presence of binned types. But we digress.

That doesn't seem right to me - if the Baz from your original example contained a [Bar; 1] instead of a Bar, I would hope that it would be binned the same way, but then you could take a slice reference to the Bar part and copy_from_slice into it.

Note that you don't even need the array originally -- you can use array::from_mut to get a &mut [T; 1] from a &mut T.

3 Likes

Ah, interesting. I was thinking "$Precedent for [T] implies $Precedent for T" as there's no way you're going to track what's nominally in a slice or not (or track what is a slice or not for that matter, given its guaranteed layout). But that just kills any "technically you could" argument; every T "is" in a [T; 1] and therefore in a slice already, in some sense.

1 Like