Pre-Pre-RFC: Field offsets

I thought about the fact that there isn’t a good way to talk about offsets in a struct (i.e., C++'s pointer-to-member) while putting together my unnamables post. Some prior discussion can be found here.

Currently, you can’t project fields through a custom reference type like Pin. The workaround is unsafe { ptr.map(|x| &mut x.field) }, which has to be unsafe because the closure passed to map could move out of the Pin. To solve this, I propose introducing the type T.U, for types T and U. With it, we could write a function like this:

impl<'a, T> Pin<'a, T> {
    fn project<'b, U>(this: &'b mut Pin<'a, T>, field: T.U) -> Pin<'b, U> {
        Pin { inner: &mut this.inner.(field) }
    }
}
// ..
struct Foo { x: i32 }
Pin::project(&mut ptr, offsetof Foo.x)
Pin::project(&mut ptr, offsetof _.x) // with type inference!
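For comparison, here is roughly what a single hand-written projection looks like in today's standard library, where the pinned reference is spelled Pin<&'a mut T> rather than the Pin<'a, T> used above, and the unsafe workaround is Pin::map_unchecked_mut. This is a sketch assuming a Foo with one field x: i32; project_x and bump are made-up names. Without something like T.U you need one such method per field.

```rust
use std::pin::Pin;

struct Foo {
    x: i32,
}

impl Foo {
    /// Hand-written projection from Pin<&mut Foo> to Pin<&mut i32>.
    /// A generic `project` taking a `T.U` would replace one method per field.
    fn project_x<'a>(self: Pin<&'a mut Self>) -> Pin<&'a mut i32> {
        // SAFETY: the closure only reborrows the field and never moves out of
        // `Foo`. (`Foo` is `Unpin` here, so this projection is trivially sound.)
        unsafe { self.map_unchecked_mut(|foo| &mut foo.x) }
    }
}

/// Hypothetical helper: pins `foo`, bumps `x` through the projection, reads it back.
fn bump(foo: &mut Foo) -> i32 {
    // `Pin::new` is allowed because `Foo: Unpin`.
    let mut pinned = Pin::new(foo);
    // `Pin<&mut i32>` implements `DerefMut` because `i32: Unpin`.
    *pinned.as_mut().project_x() += 1;
    pinned.x
}
```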

The type T.U is the type of offsets of fields of type U within type T (i.e., it is just an offset). This syntax is currently a parse error, so we can claim it for this use. We can use a field offset in place of a hardcoded identifier to access a field:

struct Foo { x: i32 }
let field: Foo.i32 = offsetof Foo.x; // take offset-of-field
let foo: Foo = ..;
foo.(field) // access the field pointed to by `field`

It is also acceptable to do a primitive cast to a usize, since this really is just an offset.
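Most of this can be approximated as a library today, which may help pin down what the compiler would have to provide. The sketch below assumes Rust 1.77+, where std::mem::offset_of! is stable; FieldOffset and field_offset! are hypothetical names standing in for T.U and offsetof, and get/get_mut stand in for foo.(field). The closure in the macro makes the compiler infer U from the field's declared type, so the offset cannot be mistyped.

```rust
use std::marker::PhantomData;
use std::mem::offset_of;

/// Hypothetical stand-in for the proposed type `T.U`:
/// the byte offset of some field of type `U` inside `T`.
pub struct FieldOffset<T, U> {
    offset: usize,
    _marker: PhantomData<fn(&T) -> &U>,
}

// Manual impls so that `T` and `U` need not be `Clone`/`Copy`.
impl<T, U> Clone for FieldOffset<T, U> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<T, U> Copy for FieldOffset<T, U> {}

impl<T, U> FieldOffset<T, U> {
    /// # Safety
    /// `offset` must be the byte offset of a field of type `U` within `T`.
    pub unsafe fn new_unchecked(offset: usize) -> Self {
        FieldOffset { offset, _marker: PhantomData }
    }

    /// Counterpart of `foo.(field)` through a shared reference.
    pub fn get(self, t: &T) -> &U {
        let base = t as *const T as *const u8;
        // SAFETY: the constructor's contract puts a valid `U` at `base + offset`.
        unsafe { &*(base.add(self.offset) as *const U) }
    }

    /// Counterpart of `foo.(field)` through a mutable reference.
    pub fn get_mut(self, t: &mut T) -> &mut U {
        let base = t as *mut T as *mut u8;
        // SAFETY: as in `get`; the `&mut T` guarantees exclusive access.
        unsafe { &mut *(base.add(self.offset) as *mut U) }
    }

    /// Counterpart of casting a field offset to `usize`.
    pub fn as_usize(self) -> usize {
        self.offset
    }
}

/// Hypothetical stand-in for `offsetof Foo.x`.
macro_rules! field_offset {
    ($T:ty, $field:ident) => {{
        fn typed<T, U>(_: impl Fn(&T) -> &U, offset: usize) -> FieldOffset<T, U> {
            // SAFETY: `offset` comes from `offset_of!` for a field of type `U`.
            unsafe { FieldOffset::new_unchecked(offset) }
        }
        typed(|t: &$T| &t.$field, offset_of!($T, $field))
    }};
}

struct Foo {
    x: i32,
    y: i32,
}

fn demo() -> (i32, i32) {
    let off_x = field_offset!(Foo, x); // FieldOffset<Foo, i32>
    let off_y = field_offset!(Foo, y);
    let mut foo = Foo { x: 1, y: 2 };
    *off_y.get_mut(&mut foo) += 40;
    (*off_x.get(&foo), *off_y.get(&foo))
}
```

Like the proposal, offset_of! only accepts fields that are visible at the call site, so the macro cannot be used to reach private fields.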

While Foo.x would be symmetrical with the field access foo.x and the type name Foo.i32, it is ambiguous: whether Foo.x means “offset of” or “access field” would depend on whether Foo is a type or a binding. I’m using offsetof as a placeholder until we come up with a better syntax. One alternative is <Foo>.x, though I don’t think the angle brackets are much better…

The .() operator is used for the same reason: having foo.bar mean different things depending on whether a binding bar is in scope is a bad idea. Given that we already need parentheses in a strange way to call a function pointer stored in a field, I think this is acceptable (see (foo.f)(bar), which is needed because fields and methods live in different namespaces).
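The (foo.f)(bar) precedent is easy to check in today's Rust: a struct can have a field and a method with the same name, and only the parentheses select the field. The names below (Foo, double, call_both) are made up for illustration.

```rust
struct Foo {
    // A field holding a function pointer...
    f: fn(i32) -> i32,
}

impl Foo {
    // ...and a method with the same name; the two live in different namespaces.
    fn f(&self, x: i32) -> i32 {
        x * 10
    }
}

fn double(x: i32) -> i32 {
    x * 2
}

fn call_both(foo: &Foo, bar: i32) -> (i32, i32) {
    // `foo.f(bar)` resolves to the method; `(foo.f)(bar)` reads the field
    // and then calls through the function pointer.
    (foo.f(bar), (foo.f)(bar))
}
```

With foo = Foo { f: double }, call_both(&foo, 3) evaluates the method to 30 and the field to 6.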

The syntax isn’t amazing, but it’s better than the alternatives I thought of. An analogy with C++'s T::*U would be confusing (not to mention that it’s an awful sigil), since users might expect a T::*mut U, even though mutability is meaningless for a field offset. T::&U is worse, since it insinuates there’s a lifetime involved. Granted, since this is a fairly advanced feature (I have never encountered T::*U in C++ outside of a manual), I don’t think it needs to be supremely ergonomic.

Edit: I wondered whether we could go further and overload field projection completely:

trait Project {
    type Target<'a, T>; // requires GAT
    fn project<'a, T>(&'a self, offset: Self.T) -> Self::Target<'a, T>;
}

However, C++ does not allow overloading operator., for good reason. Unless we come up with a better way to drop down to vanilla projection than unsafe { *((&foo as *const Foo as *const u8).add(offsetof Foo.x as usize) as *const i32) }, this is a bad idea. I think this is about as good an idea as overloading & (which C++ allows!)
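One detail of that raw-pointer fallback is easy to get wrong: offset and add on a *const T count in units of T, not bytes, so a byte offset has to be applied to a *const u8. A sketch using today's std::mem::offset_of! (Rust 1.77+) in place of the proposed offsetof Foo.x; the repr(C) layout and the read_x name are assumptions for the example.

```rust
use std::mem::offset_of;

#[repr(C)]
struct Foo {
    a: u64,
    x: i32,
}

/// "Vanilla" projection through a raw pointer and a byte offset.
fn read_x(foo: &Foo) -> i32 {
    let offset = offset_of!(Foo, x);
    // Cast to `*const u8` first so that `add` advances by `offset` bytes;
    // `(foo as *const Foo).add(offset)` would advance by `offset * size_of::<Foo>()`.
    let base = foo as *const Foo as *const u8;
    // SAFETY: `offset` is the byte offset of the `i32` field `x` inside `Foo`,
    // and `foo` is a valid, aligned reference.
    unsafe { *(base.add(offset) as *const i32) }
}
```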

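We can probably do some clever things with the size of T.U, since it only has to distinguish between the fields of type U, and structs very rarely have more than 255 fields. T.U will often be u8-sized and, for structs with only one field of type U, a ZST. If there is no field of type U, T.U can safely be considered uninhabited! This allows for a rather silly function that tests whether T has more than one field of type U (a ZST and an uninhabited type both have size zero, so the size alone cannot tell whether T has a field of type U at all):

const fn has_several_fields_of<T, U>() -> bool { mem::size_of::<T.U>() != 0 }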
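Today's Rust already shows the sizes this relies on, and it shows why a size check alone cannot separate the "exactly one field" case from the "no field" case. A small sketch, where SingleField is a made-up stand-in for T.U with a single candidate field, std::convert::Infallible plays the uninhabited case, and u8 plays the several-fields case:

```rust
use std::convert::Infallible;
use std::mem::size_of;

/// Stand-in for `T.U` when exactly one field has type `U`: one value, zero bytes.
#[allow(dead_code)]
enum SingleField {
    Only,
}

fn sizes() -> [usize; 3] {
    [
        size_of::<SingleField>(), // inhabited ZST
        size_of::<Infallible>(),  // uninhabited: std's empty enum
        size_of::<u8>(),          // several candidate fields: an index byte
    ]
}
```

Both of the first two report zero, so a nonzero size only tells you that T has more than one field of type U.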

Interesting idea. Some miscellaneous notes:

  1. offsetof is unreserved in the beta compiler. The unreservation goes into stable on 2018-08-02. Here is the RFC that unreserved it.

  2. For project, it would be important that T is not polymorphic, which it isn’t in your snippet. Otherwise, you could leak private information about Foo which violates privacy (and thus leads to unsoundness).

  3. The function project quantifies over U which is a type. This works well when all types in a type definition are disjoint. But what if we extend this to struct Bar { x: u8, y: u8 }? Then we have to rely on the address to get the right place… Did you consider quantifying over field member names instead?

  4. Continuing on 3. – would it / should it be possible to get the type of a field via Type.field_name?

  5. Overloading field projection is interesting; but I think a central question is if it is overloaded for type constructors or for each type. In other words, are we extending it to &'a or to &'a Foo? If the former is the case, then we can do very interesting things given HKTs.

Indeed, I noticed that you're responsible for this. =P It's a placeholder syntax; I've also considered <Foo>.x and Foo.&x, which aren't much better, but I'm open to ideas.

Elaborate?

Here's how I imagine the sizes of field offsets: let T and U be types.

  • If T is a struct, and there is exactly one field of type U in T, then T.U is an inhabited ZST, since we know at compile time exactly what offset it refers to.
  • If T is a struct and there are multiple such fields, then T.U is the smallest integer type that can identify which of those fields is meant (almost always a u8... who the hell has 256+ field structs?!).
  • If T is a union, T.U is a ZST if T has a field of type U (reading union.(field) is unsafe, obviously).
  • [T; n].T is as wide as the smallest integer type needed to hold n. (Unsure how we should allow for [T; n].T to be constructed... I think coercing from usize isn't an awful idea.)
  • In all other cases, T.U is uninhabited.
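The struct cases above can be modeled by hand today: a fieldless enum with one variant per candidate field is exactly an index that fits in a u8, and a field with a unique type needs no runtime data at all. A sketch for a hypothetical Bar { x: u8, y: u8, tag: u32 }, where BarU8 and BarU32 model Bar.u8 and Bar.u32. Storing an index rather than a raw byte offset, and converting with offset_of! (Rust 1.77+) when a usize is wanted, is my own choice for the sketch, not something the compiler is known to do.

```rust
use std::mem::offset_of;

struct Bar {
    x: u8,
    y: u8,
    tag: u32,
}

/// Hand-written model of `Bar.u8`: an index over the `u8` fields of `Bar`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum BarU8 {
    X,
    Y,
}

impl BarU8 {
    /// Counterpart of `bar.(field)`: a safe `match` instead of pointer arithmetic.
    fn get(self, bar: &Bar) -> &u8 {
        match self {
            BarU8::X => &bar.x,
            BarU8::Y => &bar.y,
        }
    }

    /// Counterpart of casting the offset to `usize`: index to byte offset.
    fn byte_offset(self) -> usize {
        match self {
            BarU8::X => offset_of!(Bar, x),
            BarU8::Y => offset_of!(Bar, y),
        }
    }
}

/// Hand-written model of `Bar.u32`: only one `u32` field, so it is a ZST.
#[derive(Clone, Copy, Debug, PartialEq)]
struct BarU32;

impl BarU32 {
    fn get(self, bar: &Bar) -> &u32 {
        &bar.tag
    }
}
```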

I think that quantifying over the names of fields is... interesting, but I like the fact that it is possible to quantify over the type. Perhaps I could imagine giving each field of a struct a unique ZST, which coerces to a typed field offset. You certainly would not be able to use such name-specific types in a polymorphic context without adding identifiers to the list of things that can exist in angle brackets (no thanks, imo).

Sure, this should be allowed. Continuing the analogy with fnptrs, you can totally do this via FnOnce::Output. Actually, it would probably make sense for T.U to implement Fn(T) -> U...

Though recall that you can't name a field offset by Type.field_name due to the parsing ambiguity I mentioned.

I don't quite understand what you're talking about here... I'd love an example!

Consider:

  1. Vec in some module A.
  2. project in some module B where A != B and defined as:
    fn project<T, U>(this: &mut T, field: T.U) -> &mut U {
        this.(field)
    }
    
  3. T = Vec<u8>
  4. In some module C where C != A
    fn main() {
        let mut vec = vec![0];
        // If you forbid `offsetof Vec.len` here then all should be fine:
        let len = project(&mut vec, offsetof Vec.len);
        *len += 1;
        do_stuff(vec[1]);  // Buffer overflow.
    }
    

Actually, given that you could make Vec.len not well-formed, maybe this isn't a problem.

Well reasoned :slight_smile:

This is a bit of a bummer. Does that not entail that we can't project to Option<MyEnum::Variant>? Perhaps this can be solved with variant types.

Could be solved with: Type.<field_name> ?

Very unbaked:

trait Project {
    fn project<'a, T, U>(self: Self<'a, T>, offset: T.U) -> Self<'a, U>;
}

impl Project for |'b, T: type| &'b T { // Invented syntax for type lambdas...
    fn project<'a, T, U>(self: &'a T, offset: T.U) -> &'a U { magic }
}

impl Project for |'b, T: type| &'b mut T {
    fn project<'a, T, U>(self: &'a mut T, offset: T.U) -> &'a mut U { magic }
}

impl Project for |'b, T: type| Pin<'b, T> {
    fn project<'a, T, U>(self: Pin<'a, T>, offset: T.U) -> Pin<'a, U> { ... }
}

...

No way you'll get to make offsets of fields you can't see! The callee of project should rest assured that the caller has visibility for field. Formally: offsetof T.x will compile iff you can read t.x for t: T locally. If you do something like that to a type you own... well, your funeral. I should also point out that Vec.len is definitely ill-formed, because Vec does not have kind *. You want Vec::<u8>.len.
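For what it's worth, that is exactly the rule today's std::mem::offset_of! (Rust 1.77+) follows: it compiles only where the field is visible. A sketch with a made-up module a and type Buf whose len field is private:

```rust
use std::mem::offset_of;

mod a {
    pub struct Buf {
        pub cap_hint: usize,
        len: usize, // private: only code inside `a` may name it
    }

    impl Buf {
        pub fn new() -> Buf {
            Buf { cap_hint: 0, len: 0 }
        }

        pub fn len(&self) -> usize {
            self.len
        }

        /// Inside `a`, `len` is visible, so taking its offset is allowed.
        pub fn len_offset() -> usize {
            std::mem::offset_of!(Buf, len)
        }
    }
}

fn public_offset() -> usize {
    // Outside `a`, only public fields can be named:
    offset_of!(a::Buf, cap_hint)
    // offset_of!(a::Buf, len) would not compile here, because `len` is private.
}
```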

Yeah I avoided HKTs in my example since GATs seem more developed. All you've done here is replaced my Target with Self which is... more constrained than I'd intended, but still gets the point across. I've also toyed with making Project: Deref, since overloading projection seems like something that should only be allowed with a smart pointer. Project doesn't use Deref::Target though, so it seems a bit artificial. I'm interested in what your thoughts are on the relationship between Deref and Project.

Yeah; realized that after writing it down -- at least we agree on what should not be allowed now :wink:

And it should be Vec::<u8>.len, agreed.

Not quite. There's also a difference in that you use &'a self while my variant permits you to implement it for self and Constructor<T> as well. This is necessary if we want to capture partial moves of fields in the trait. Consider:

struct Freeze<T> { field: T }

impl Project for |'b, T: type| Freeze<T> {
    fn project<'a, T, U>(self: Self<'a, T>, offset: T.U) -> Self<'a, U> {
        Freeze { field: self.field.(offset) }
    }
}

Or perhaps instead, using the family pattern and GATs:

struct FreezeFamily;

trait Project {
    type Source<'a, T>;
    type Target<'a, U>;
    fn project<'a, T, U>(src: Self::Source<'a, T>, offset: T.U)
        -> Self::Target<'a, U>;
}

impl Project for FreezeFamily {
    type Source<'a, T> = Freeze<T>;
    type Target<'a, U> = Freeze<U>;
    fn project<'a, T, U>(src: Self::Source<'a, T>, offset: T.U)
        -> Self::Target<'a, U> {
        Freeze { field: src.field.(offset) }
    }
}
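Generic associated types are stable in today's Rust (1.65+), so the family-pattern version compiles almost as written if a closure stands in for the proposed T.U. In the sketch below, offset is any FnOnce(T) -> U, which echoes the earlier idea that T.U could implement Fn(T) -> U; that substitution is an assumption of the sketch. Freeze, FreezeFamily and Project are as in the posts, and the projected field has to be re-wrapped in Freeze.

```rust
/// Wrapper from the thread; projecting keeps the result wrapped.
struct Freeze<T> {
    field: T,
}

struct FreezeFamily;

trait Project {
    type Source<'a, T>;
    type Target<'a, U>;

    /// `offset` stands in for the proposed `T.U`.
    fn project<'a, T, U, F>(src: Self::Source<'a, T>, offset: F) -> Self::Target<'a, U>
    where
        F: FnOnce(T) -> U;
}

impl Project for FreezeFamily {
    type Source<'a, T> = Freeze<T>;
    type Target<'a, U> = Freeze<U>;

    fn project<'a, T, U, F>(src: Self::Source<'a, T>, offset: F) -> Self::Target<'a, U>
    where
        F: FnOnce(T) -> U,
    {
        // Moves the selected field out of `src` and re-wraps it.
        Freeze { field: offset(src.field) }
    }
}
```

Calling FreezeFamily::project(frozen, |p: Pair| p.b) on a hypothetical struct Pair { a: u8, b: String } moves b out of the frozen value and returns a Freeze<String>.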

I will point out that the size of T.U does leak “private” information, insofar as I can tell whether a struct has more than one field of a particular type by measuring size_of::<T.U>(), but doing that would just be plain silly!
