The Idea
We've seen threads about "size != stride" and guaranteed-zero-padding. They're not good enough.
repr(packed) is nice, but it's also generally UB to take references to the fields of repr(packed) structs, since those fields may be misaligned. As such, repr(packed) is effectively an unsafe-only tool. What if we could have something similar (but very different) for safe Rust, one that happens to preserve alignment, yet allows much smaller structs?
To put it simply, repr(binned) would introduce the concept of "bins". They're like padding, except that they're completely UB to read or write if you don't own the value. They also kinda decay into padding when put inside a non-binned type, at least if they're not used for said type's fields.
This basically expands on "size != stride". For example, a type like (u16, u8) has a size of 4, but it does have space for another u8 at the end. "size != stride" would give it a size of 3 and a stride of 4, enabling ((u16, u8), u8) to have a size of 4 and a stride of 4. But this isn't good enough, and can't really be introduced backwards-compatibly. With repr(binned) (which would NOT be the default for tuples, for backwards compatibility), (u16, u8) would still have a size of 4. It would also have a bin of size 1 at offset 3.
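For reference, the numbers above can be checked against today's layout. This is a minimal sketch, assuming #[repr(C)] structs as stand-ins for the tuples (tuple layout is unspecified, so repr(C) is only used to make the numbers reproducible); the names Inner, Outer, inner_layout and outer_layout are ours, not part of the proposal.

```rust
use std::mem::{align_of, size_of};

// Stand-in for (u16, u8). repr(C) pins the field order and padding.
#[repr(C)]
pub struct Inner(pub u16, pub u8);

// Stand-in for ((u16, u8), u8).
#[repr(C)]
pub struct Outer(pub Inner, pub u8);

/// (size, alignment) of the inner type as laid out today.
pub fn inner_layout() -> (usize, usize) {
    (size_of::<Inner>(), align_of::<Inner>())
}

/// (size, alignment) of the nested type as laid out today.
pub fn outer_layout() -> (usize, usize) {
    (size_of::<Outer>(), align_of::<Outer>())
}
```

Today the inner type takes 4 bytes (one of them padding) and the nested type takes 6, because the trailing u8 cannot reuse the inner padding byte. Both "size != stride" and bins aim to bring the nested type down to 4.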
The key is what happens when you start introducing more nesting:
#[repr(binned)]
struct Foo(u32, u8);
#[repr(binned)]
struct Bar(Foo, u16);
#[repr(binned)]
struct Baz(Bar, u8);
Today, Baz has a size of 16. With binned, it would have a size of 8. To understand this, let's look at the byte-wise representation of each type, with i being a byte of a u32, b being a u8, s being a byte of a u16 and _ being a bin.
Foo = iiiib___
Bar = iiiib_ss
Baz = iiiibbss
Between the first and second you just have normal "size != stride": the Foo has some space at the end, so you add a correctly-aligned u16 to it. But between Bar and Baz something else happens: the space is no longer at the end, and yet we can still fit a whole byte into it. The important thing here is to keep alignment.
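To make the comparison concrete, here is a sketch of what the compiler does today next to a hand-flattened struct that has the same layout as the proposed Baz = iiiibbss. It assumes #[repr(C)] so the sizes are deterministic; BazFlat and sizes are illustrative names, and repr(binned) itself does not exist, so it is not used here.

```rust
use std::mem::size_of;

// The three nested types from the example, laid out as they are today.
#[repr(C)]
pub struct Foo(pub u32, pub u8);

#[repr(C)]
pub struct Bar(pub Foo, pub u16);

#[repr(C)]
pub struct Baz(pub Bar, pub u8);

// Hand-flattened equivalent of the proposed `iiiibbss` layout:
// u32 at offset 0, two u8s at offsets 4 and 5, u16 at offset 6.
#[repr(C)]
pub struct BazFlat {
    pub i: u32,
    pub b0: u8,
    pub b1: u8,
    pub s: u16,
}

/// Sizes of Foo, Bar, Baz and the hand-flattened BazFlat, in that order.
pub fn sizes() -> [usize; 4] {
    [
        size_of::<Foo>(),
        size_of::<Bar>(),
        size_of::<Baz>(),
        size_of::<BazFlat>(),
    ]
}
```

The nested version grows to 8, 12 and 16 bytes, while the flattened one fits in 8 with every field still naturally aligned. Bins would let the nested version reach the same 8 bytes without flattening by hand.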
So this is the basic idea: the bins need to be UB to read/write, because they could be part of another struct.
Unsafe Code
Of course this affects unsafe code. The only way to add this backwards-compatibly is to create a new trait like Sized, but for bins, and have it as an implicit bound everywhere. Unsafe code that wants to work with bins needs to be aware of bins, and existing unsafe code wouldn't be allowed to work with bins. There's... not much else we can say here. Of course, if the unsafe code only works with owned types, then it should probably work with bins as well. In particular, Vec would be sound with bins, while mem::replace wouldn't.
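Why mem::replace is the problem can be sketched with a plain byte array. This is only a hypothetical model of Baz = iiiibbss, not real layout code: bytes 0..5 and 6..8 stand for the inner Bar, and byte 5 is Bar's bin, which Baz uses for its own u8. BAR_BIN, naive_write and bin_aware_write are names we made up for the illustration.

```rust
// Offset of the bin inside the modelled Bar (the `_` in `iiiib_ss`).
pub const BAR_BIN: usize = 5;

/// Models a write through `&mut Bar` as done today:
/// all 8 bytes are overwritten, including the bin.
pub fn naive_write(dest: &mut [u8; 8], src: [u8; 8]) {
    *dest = src;
}

/// Models a bin-aware write: every byte except the bin is copied,
/// so whatever the outer type stored in the bin survives.
pub fn bin_aware_write(dest: &mut [u8; 8], src: [u8; 8]) {
    for (i, (d, s)) in dest.iter_mut().zip(src.iter()).enumerate() {
        if i != BAR_BIN {
            *d = *s;
        }
    }
}
```

If the modelled Baz holds 9 in byte 5, the naive write replaces it with whatever happened to be in the source's bin, while the bin-aware write leaves it alone. That is the behaviour existing unsafe code can't be assumed to have.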
Performance
You might think this is really slow, yeah? After all, you need to avoid touching the padding...
But here's the thing, how often do you actually move a type into another type through a reference?
Personally, we find ourselves doing that with Vec, but not so much with arbitrary types, and as we said above, containers like Vec are zero-cost with bins, although it might still be handy to wrap the types into something that makes the bins decay into padding. (This could be handled by Vec having an iter_mut_binned which unsafely transmutes the &mut T into &mut (T,) or something, but we digress.) Other cases involve putting things into Option, which, as a repr(Rust) type, also decays the bins into padding, and thus is also zero-cost. (This also enables Option to put its discriminant in the bins, while at it.)
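For context on the Option point, here is a small sketch of what happens today. It assumes #[repr(C)] for Foo so its size is fixed at 8 (5 bytes of data, 3 of padding); foo_size and option_foo_size are illustrative helpers. As far as we know, current compilers don't place the discriminant in padding, which is the gap bins would close.

```rust
use std::mem::size_of;

// Same shape as the Foo from the example: 5 bytes of data, 3 of padding.
#[repr(C)]
pub struct Foo(pub u32, pub u8);

/// Size of Foo as laid out today.
pub fn foo_size() -> usize {
    size_of::<Foo>()
}

/// Size of Option<Foo> as laid out today. Neither u32 nor u8 has a
/// spare bit pattern, so the discriminant needs space of its own.
pub fn option_foo_size() -> usize {
    size_of::<Option<Foo>>()
}
```

With bins, the three unused bytes at the end of Foo would be available to Option, so Option<Foo> could stay at 8 bytes.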
All in all, we think the size savings from this would make up for the performance cost, which isn't that high to begin with, and can be trivially opted-out of if really needed. In fact we'd expect much code to even get a performance boost out of this, after some careful tuning.
Of course, applying #[repr(binned)] to a type is a breaking change, but that doesn't mean std can't benefit from it. In particular, it's often possible to split public types into a repr(binned) inner part, and use that inner part throughout std, while just delegating the public part to the inner part. Further, wrapping a binned type into a Rust type is always free - bins vs padding are a compile-time construct, and as long as we don't get guaranteed-zero padding, this is always gonna be exactly free. And moving a Rust type, even one that contains repr(binned) types, is also always free.
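The inner/outer split could look roughly like the sketch below. Span and SpanInner are hypothetical types invented for the illustration, and since repr(binned) isn't real syntax it only appears in a comment; everything else is ordinary Rust.

```rust
// Hypothetical inner type. Under the proposal it would be marked
// #[repr(binned)], so code that embeds it in other binned types
// could reuse the 4 spare bytes after `nanos`.
struct SpanInner {
    secs: u64,
    nanos: u32,
}

impl SpanInner {
    fn as_millis(&self) -> u128 {
        u128::from(self.secs) * 1000 + u128::from(self.nanos) / 1_000_000
    }
}

// Public wrapper with the default representation. Its layout stays
// what it is today, so exposing it is not a breaking change.
pub struct Span {
    inner: SpanInner,
}

impl Span {
    pub fn new(secs: u64, nanos: u32) -> Self {
        Span { inner: SpanInner { secs, nanos } }
    }

    // The public API just delegates to the inner part.
    pub fn as_millis(&self) -> u128 {
        self.inner.as_millis()
    }
}
```

Internal code would pass SpanInner around and nest it in other binned types, while users only ever see Span.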
What do you actually gain from binned types? Can't you just inline other types yourself?
It's true that you can generally inline other types into your type yourself, and this can save more memory than this repr(binned) stuff. But repr(binned) is as zero-cost as it can be, whereas inlining other types requires you to duplicate code (because e.g. you can't just take an &mut Url to pass it to the Url methods) and makes it really difficult to convert between your type and the target type, among other things.
Arguably Rust could have a way to inline types, but inlinable types would have to be fully public, including all functions and methods that interact with the inlinable type. You also wouldn't be able to take references to them, and conversions would be more expensive. All in all, repr(binned) is significantly cheaper in code size (can just use references, no need to monomorphize inlinable types), in general move performance, and simply... in user cost. repr(binned) is much easier to use, even with the binned trait.
Sure, repr(binned) may be somewhat niche. But it's also zero-cost, unlike repr(packed) or any of the alternatives.