Repr(binned) - making padding fully-UB to read/write if you don't own the type

Yes. Our uses of Vec<Binned> would have to be Vec<(Binned,)>. We're okay with this.

Why would it be? The inner binned can't touch the outer binned's fields, nothing unsafe about this.

We do have plenty of structs that we put in vecs that would become smaller to various extents. We can't measure it (because this feature doesn't exist), but we do expect size reductions all around.

Additionally, the codegen for this:

#[repr(binned)] struct Foo(u32, u8);
struct Bar(Foo, u16);

Bar(foo, 0)

is literally free.

For sure. Vec could do it (Vec::iter_mut_binned or something), but yes, we're assuming you can control where the &mut comes from, and hoping the cases where you can, make up for the cases where you can't.

Note that any proposal requiring this is giving itself a massive penalty.

Things like that were discussed extensively as part of the futures conversations about immovable things -- in the form of ?Move -- and the takeaway I got from all those conversations is that lang wants to never have more traits like that. (Not that that means it's impossible, just that the motivation would need to be extraordinarily strong.)

5 Likes

Wait, why would this proposal require a new implicit trait? I thought it was already UB to read or write padding bytes, and only the compiler was allowed to exploit padding for efficiency. So this would just be a new hint that you want the compiler to exploit it for memory size rather than speed.

The problem is that there's existing unsafe code that makes assumptions about copying padding bytes. In particular, std::mem::swap is one of those.

While we could just say all of that is UB, that seems like a massive footgun. The trait basically solves that.

1 Like

Imagine you have a type T with some data (.) and some padding (P): [..PP.P.P].

And you have another type U(T, A=u8, B=u16). Maybe it gets binned as [..bb.a.P].

Now when you copy some other &T into &mut U.0 for example, you can't just memcpy 8 bytes because

  • You'd be overwriting U.1 and U.2
  • If your U is in some V that exploits the last pad byte, you overwrote that too
  • If the other T is part of some other U (or another binned type), you might be reading A or B data that is behind an exclusive borrow

Related.

Unlike Pin, there's no way to provide the intended semantics without language support. The thing about Pin that avoids the Move trait requirement is that it's purely advisory, and a pinned type can freely define semantics like "can technically be moved, but will lead to an abort".

std::mem::swap is part of std, so it's just one of the various rust internals that would have to be tweaked to implement this proposal - it's allowed to be privileged to know about things user code doesn't. Outside of std, using unsafe code to mess with raw bytes is always going to be a massive footgun - having an "actually you can write into padding bytes" implicit trait seems to provide only a very small amount of protection relative to how much complexity it adds. But my main point is that adding such a trait would be tangential to this proposal, because I think it is already UB for unsafe user code to write into padding bytes in all cases.

Yeah, that's the reason this would be an attribute rather than the default layout for structs. I don't see how it addresses my post that you replied to, though? User code already isn't allowed to throw around a memcpy like that without invoking UB, and the compiler and std would still be able to use the memcpy for non-binned structs. There's definitely an argument that it's not worth adding complexity to the compiler and std for this, but I really don't think it would inherently require a trait that is exposed to user code, like Sized.

I can write the equivalent of std::mem::swap though, since it's pretty much just:

fn swap<T>(x: &mut T, y: &mut T) {
    // Safety: x and y are exclusive references, so they are valid for read and writes, 
    // aligned and non overlapping
    unsafe { swap_nonoverlapping(x, y, 1) }
}

This is totally sound today, however swap_nonoverlapping is defined to swap count * size_of::<T>() bytes and that includes padding bytes, so it would become unsound if you could safely read/write to #[repr(binned)] structs. You need a way to make my function not safely callable with references to binned fields.

3 Likes

It's not guaranteed and not RFC'd, so that's technically correct. But it's still done a lot and will probably be considered defined ala that UCG topic. Things like the guaranteed layout of slices imply it's okay, as do parts of the documentation.

Oh wow, that copy_from_slice doc really does conflict with this proposal (in the absence of the implicit trait), even without more-general guarantees about padding. I retract my point.

I got more curious about it, and it turns out that copy_on_slice being memcpy was an RFC (and so e.g. not just accidental stabilization-via-documentation). Lots of conversation to that effect in the PR, too. So that's a nod for slices of Copy types, direct implication for Copy types anywhere, and an argument for Rust data structures more generally.

Fwiw slices, like all Rust types, would also decay the bins. So that particular method would still be sound in the presence of binned types. But we digress.

That doesn't seem right to me - if the Baz from your original example contained a [Bar; 1] instead of a Bar, I would hope that it would be binned the same way, but then you could take a slice reference to the Bar part and copy_from_slice into it.

Note that you don't even need the array originally -- you can use array::from_mut to get a &mut [T; 1] from a &mut T.

3 Likes

Ah, interesting. I was thinking "$Precedent for [T] implies $Precedent for T" as there's no way you're going to track what's nominally in a slice or not (or track what is a slice or not for that matter, given its guaranteed layout). But that just kills any "technically you could" argument; every T "is" in a [T; 1] and therefore in a slice already, in some sense.

1 Like

This would definitely be T: !Binned.

(Having bins decay even in arrays would make Vecs faster. Altho it also means you can't fold your fields into the array's bins. And yeah we know we talked about opting out with (T,) but meh. We don't assume we have full creative control over the feature tho, so feel free to choose your own tradeoffs.)

Oh cool, I didn't realize we had committed to that guarantee about array layout!

Meta: I appreciate that you took the time to wrote out a significant amount of text about how this would work, how it compares to other possibilities, and its advantages and drawbacks.

Though, personally I would be more interested in a way to inline struct fields, despite the disadvantages you pointed out. In my opinion they just solve more problems, including:

  • "Struct full of Options"-like use cases, where a struct has multiple enum-typed fields and their tags could be combined into a bitfield.
  • Enabling arbitrary bitfields.
  • Simulating alignment requirements with offsets, like "multiple of four plus two", for structs like:
    struct Foo {
        a: u16,
        b: (u16, u32), // would be nice if this didn't need padding
        c: u32,
    }
    

I don't see why types which are inlined would have to be fully public or why visibility would be involved at all. I also don't believe this would require monomorphizing any code more than it already is. References would be an issue, but they can be made to work to at least some extent. It's true that there may be a bit of overhead when calling non-inlined methods taking references, so binned would be better for use cases that do that frequently, but on the other hand, inlining allows for significantly more space optimizations. And the design I have in mind doesn't require adding a new default trait bound; as backwards compatibility risks go, at most it might cause issues with applying derive macros to structs that opt into their fields being inlined.

2 Likes

You could have methods like

#[inlinable]
struct Foo { ... }

impl Foo {
  fn foo(&mut self) -> &mut NonInlinable { // can't *return* references to inlinable types but that's okay
    ...
  }
}

But this involves creating synthetic methods for everything that inlines these, so e.g.

struct Bar ( #[inline] Foo );

impl Bar {
  fn bar(&mut self) {
    *self.0.foo() = something;
  }
}

would still compile, but would require generating a copy of Foo::foo specific to Bar's inlined instance of Foo. which we would argue is a form of monomorphization.

This quickly stacks up. We're not sure how fast this grows but we wouldn't be surprised if it grew exponentially. Even if, at first, it looks more efficient than our proposal, it really can't be. And if you disallow this kind of composition/delegation, it becomes practically useless in real-world use-cases.

#[inline] applies to the caller:

#[inline]
fn foo() {
}

fn bar() {
  foo()
}

so an #[inline] struct should just be a form of reverse monomorphization:

#[inline] struct Foo(...);
struct Bar(Foo);

one way to "desugar" inline structs is like so:

trait Foo {
  fn get_field_0();
  fn get_field_1();
  ...
}

impl<T: Foo> T { // this would be "given methods", similar to provided methods but not overridable
  ...
}

in fact, we'd even argue inline structs should actually use a mix of struct/enum and trait syntax:

trait struct Foo(i32, u64);
trait enum Bar<T> {
  Some(T),
  None
}

struct Baz(Foo);

fn do_foo<T: Foo>(...) {
}

impl<T: Foo> T {
  ...
}

this clearly shows the monomorphization cost involved, and is likely easier to deal with in the compiler.

still would prefer to have repr(binned) but this trait-struct-as-inline-struct stuff is also exciting.

(also note the lack of dyn! this requires 2021 edition obviously.) cc @comex

sorry, we know this one isn't as clear as the OP, but it's mostly a vague idea of how inline types could work. obviously the compiler would implicitly create types and stuff based on these, so something like:

trait struct Foo(u64, u32);
struct Bar(Foo, u32);

fn foo<T: Foo>(x: &mut T) {
}
fn bar(x: &mut Bar) {
  foo(&mut x.0);
}

would implicitly create an

struct MutFooInBar<'a> {
  a: &'a mut u64,
  b: &'a mut u32,
}

and use it for the &mut x.0.