[Idea] Pointer to Field

RustyYato · May 8, 2019, 6:59pm

This discussion was started from

but it is a bit off topic there, so I started a new thread for it.

This is the third edit of this draft; to see previous versions, check out the pencil in the top corner.

To make safe pointers to fields, we could introduce a type

struct Field<Parent, Type> {
    offset: usize
}

which is auto-generated by the compiler like so

struct Foo {
    bar: u32,
    quaz: i8
}

fn main() {
    assert_eq!(Foo::bar, Field::<Foo, u32> { offset: 0 });
    assert_eq!(Foo::quaz, Field::<Foo, i8> { offset: 4 }); // (assuming fields aren't reordered by compiler)
}

The syntax for this is not important (i.e. Foo::bar could be changed to something else).

Another idea for syntax is Type.field, which has the benefit of being unambiguous.
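For comparison, the values the compiler would generate can be approximated by hand today. The following is only a sketch, assuming a toolchain with `std::mem::offset_of!` (stable since Rust 1.77); the `PhantomData` marker, the associated constants `BAR` and `QUAZ`, and the `#[repr(C)]` attribute (added so the offsets are predictable) are illustration choices and not part of the proposal.

```rust
use std::marker::PhantomData;
use std::mem::offset_of;

/// Hand-written stand-in for the proposed compiler-generated type.
/// The marker ties the offset to a parent type and a field type.
struct Field<Parent, Type> {
    offset: usize,
    _marker: PhantomData<fn(Parent) -> Type>,
}

// `repr(C)` is an assumption made here so that field order is fixed.
#[repr(C)]
struct Foo {
    bar: u32,
    quaz: i8,
}

impl Foo {
    /// Hypothetical replacement for the proposed `Foo::bar` syntax.
    const BAR: Field<Foo, u32> = Field {
        offset: offset_of!(Foo, bar),
        _marker: PhantomData,
    };
    /// Hypothetical replacement for the proposed `Foo::quaz` syntax.
    const QUAZ: Field<Foo, i8> = Field {
        offset: offset_of!(Foo, quaz),
        _marker: PhantomData,
    };
}
```

With `repr(C)`, `Foo::BAR.offset` is 0 and `Foo::QUAZ.offset` is 4; without it the compiler is free to choose a different layout.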

We could then declare it safe to use these field offsets to project valid pointers to fields. For example, we could provide something like this

// Always safe to project using this trait given that `self` and `field` are valid
trait Project<FTy> {
    type Type;
    type Projection;
    
    fn project(self, field: Field<Self::Type, FTy>) -> Self::Projection;
}

impl<T> *const T {
    unsafe fn project_inbounds<FTy>(self, field: Field<T, FTy>) -> *const FTy {
        (self as *const _ as *const u8)
            .add(field.offset) as *const FTy
    }
}

impl<T, FTy> Project<FTy> for *const T {
    type Type = T;
    type Projection = *const FTy;
    
    fn project(self, field: Field<T, FTy>) -> Self::Projection {
        (self as *const _ as *const u8)
            .wrapping_add(field.offset) as *const FTy
    }
}

impl<'a, T, FTy: 'a> Project<FTy> for &'a T {
    type Type = T;
    type Projection = &'a FTy;
    
    fn project(self, field: Field<T, FTy>) -> Self::Projection {
        unsafe { &*(self as *const _).project_inbounds(field) }
    }
}

Which could then be used generically to project to a field.
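As a rough illustration of what "used generically" could look like, here is a self-contained sketch. It is not the proposed implementation: `Field` is built by hand with `offset_of!`, and `project_generic`, `foo_bar` and `foo_quaz` are hypothetical helper names. The safe `project` is only sound here on the assumption that every `Field` value carries a correct offset, which the real proposal would have the compiler guarantee.

```rust
use std::marker::PhantomData;
use std::mem::offset_of;

struct Field<Parent, Type> {
    offset: usize,
    _marker: PhantomData<fn(Parent) -> Type>,
}

trait Project<FTy> {
    type Type;
    type Projection;

    fn project(self, field: Field<Self::Type, FTy>) -> Self::Projection;
}

impl<'a, T, FTy: 'a> Project<FTy> for &'a T {
    type Type = T;
    type Projection = &'a FTy;

    fn project(self, field: Field<T, FTy>) -> &'a FTy {
        // SAFETY (assumption): `field.offset` is the offset of a real `FTy`
        // field inside `T`, so the result stays inside the referenced `T`.
        unsafe { &*(self as *const T).cast::<u8>().add(field.offset).cast::<FTy>() }
    }
}

/// Generic code only needs the `Project` bound, not the concrete pointer type.
fn project_generic<P, FTy>(ptr: P, field: Field<<P as Project<FTy>>::Type, FTy>) -> P::Projection
where
    P: Project<FTy>,
{
    ptr.project(field)
}

#[repr(C)]
struct Foo {
    bar: u32,
    quaz: i8,
}

/// Hypothetical stand-in for the proposed `Foo::bar`.
fn foo_bar() -> Field<Foo, u32> {
    Field { offset: offset_of!(Foo, bar), _marker: PhantomData }
}

/// Hypothetical stand-in for the proposed `Foo::quaz`.
fn foo_quaz() -> Field<Foo, i8> {
    Field { offset: offset_of!(Foo, quaz), _marker: PhantomData }
}
```

Calling `project_generic(&foo, foo_quaz())` yields a `&i8` borrowed from `foo`; the same function would accept any other pointer type that implements `Project`.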

playground example:

HeroicKatora · May 8, 2019, 7:42pm

Not really. Projection on pointers must be unsafe since they do not assert that the structure to project upon is actually pointed to. But maybe the converse could be true, one trait that exposes it as an unsafe fn and another subtrait that declares it safe.

Why would Field need to be a trait? It's an internal, so I think a const-enabled struct could suffice. That would buy us stricter typing over usize and a value representation that one would be used to from T::*U. Or alternatively, it could be a trait with an associated constant of such a type:

/// Internal, const constructed by the syntax `Foo::bar` or equivalent.
struct Field<Parent, Child> {
    offset: usize,
}

trait Project<T> {
    type Type;
    type Projection;
    fn project(self, field: Field<Self::Type, T>) -> Self::Projection;
}

This is more like what went through my head previously. But Type appearing as associated type in the trait locks Self into choosing one base type, this is most likely a plus.

mcy · May 8, 2019, 7:53pm

I suspect that you need an unsafe variant of Project, since projecting through a raw pointer is always unsafe.

I do wonder if we should introduce bounds like T: ?unsafe Trait, which would allow the impl of Trait for T to use unsafe fns where safe fns are required. (By generalization, T: const Trait would require the implementation to use const fns.)

RustyYato · May 8, 2019, 8:22pm

Ok, I wasn't aware of that. The sub-trait solution does seem acceptable though.

I was thinking that you could use it to make some generic libraries that operate on the fields of types. But if that is not necessary or wouldn't work, that's fine. I wrote this up as a draft, so I am expecting changes.

HeroicKatora:

/// Internal, const constructed by the syntax `Foo::bar` or equivalent.
struct Field<Parent, Child> {
    offset: usize,
}

trait Project<T> {
    type Type;
    type Projection;
    fn project(self, field: Field<Self::Type, T>) -> Self::Projection;
}

How would you impl Project? For example could you do it for &T?

RustyYato · May 8, 2019, 8:24pm

I've seen this in a few places, but I don't understand why this is the case. Could you explain it, please? I think that projecting through a raw pointer to a field should be safe; at least the docs for offset seem to say that it is safe to offset a pointer as long as you stay within the size of the pointee + 1 byte. This should always be true for field offsets.

HeroicKatora · May 8, 2019, 8:49pm

By providing the basic pointer method on the Field type.

// Forgot: this type should likely be Clone + Copy
impl<P, C> Field<P, C> {
    /// Or even an intrinsic internally? Don't need to know or care
    ///
    /// Some probably decently long list of preconditions, ...
    pub unsafe fn offset_ptr(self, ptr: *const P) -> *const C {
        let ptr = ptr as *const u8;
        // `add` takes a usize count of bytes here because the pointee is u8.
        let field = ptr.add(self.offset);
        field as *const C
    }
}

impl<'a, T, U: 'a> Project<U> for &'a T {
    type Type = T;
    type Projection = &'a U;
    fn project(self, field: Field<T, U>) -> Self::Projection {
        // SAFETY: some text about how this fulfills the preconditions.
        unsafe { &*field.offset_ptr(self) }
    }
}

Part of the reason for doing it this way is that no stability guarantees are made about the offset itself, only about the pointer operation, under the preconditions listed at the definition of offset_ptr. It also avoids suggesting that manual byte-pointer offsetting is a good idea: it almost never is, unless you have to do it yourself and it cannot be put behind a core implementation.
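The encapsulation argument can be made concrete with module privacy. This is a minimal sketch under stated assumptions: the module name `field`, the unsafe constructor `from_offset` and the helper `read_quaz` are hypothetical, and `offset_of!` stands in for whatever the compiler would generate. Code outside the module can project through a `Field` but never sees or depends on the raw number.

```rust
mod field {
    use std::marker::PhantomData;

    pub struct Field<P, C> {
        // Private: callers cannot read or rely on the raw offset.
        offset: usize,
        _marker: PhantomData<fn(P) -> C>,
    }

    // Manual impls so that `P` and `C` do not need to be `Copy` themselves.
    impl<P, C> Clone for Field<P, C> {
        fn clone(&self) -> Self {
            *self
        }
    }
    impl<P, C> Copy for Field<P, C> {}

    impl<P, C> Field<P, C> {
        /// # Safety
        /// `offset` must be the offset of a field of type `C` inside `P`.
        pub const unsafe fn from_offset(offset: usize) -> Self {
            Field { offset, _marker: PhantomData }
        }

        /// # Safety
        /// `ptr` must point into a live allocation that holds a whole `P`.
        pub unsafe fn offset_ptr(self, ptr: *const P) -> *const C {
            unsafe { ptr.cast::<u8>().add(self.offset).cast::<C>() }
        }
    }
}

use field::Field;

#[repr(C)]
struct Foo {
    bar: u32,
    quaz: i8,
}

fn read_quaz(foo: &Foo) -> i8 {
    // SAFETY: the offset comes from `offset_of!` for exactly this field.
    let quaz: Field<Foo, i8> = unsafe { Field::from_offset(std::mem::offset_of!(Foo, quaz)) };
    // SAFETY: `foo` is a valid reference, so it points to a whole `Foo`.
    unsafe { *quaz.offset_ptr(foo) }
}
```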

HeroicKatora · May 8, 2019, 8:52pm

I can also explain, the pointer could be such that the pointer to field has no representation (i.e. overflows addressable memory). A pointer does not need to guarantee that an object of the size of its referent could be placed at its location.

(A bit more even, I think it is technically undefined behaviour if you offset a pointer that was constructed from a reference such that it points outside the region where the original one comes from–and I don't know if you are even allowed to have others. Pointers are allowed to track their underlying reference in llvm. Also, answer likely depends on resolution of the unsafe guidelines. This one in particular is interesting)

RustyYato · May 8, 2019, 8:52pm

Ok, that seems like a good idea. I like the idea of hiding the actual offset.

eaglgenes101 · May 8, 2019, 8:54pm

You can throw a pointer absolutely anywhere, and it’s considered safe. Only actually dereferencing it is considered unsafe.

// Creates a perfectly well-formed pointer, which just happens to point into the middle of nowhere
let x: *const char = 0x0FE203FE203FE203_usize as *const char;

So no need to have an unsafe version for raw pointers. No dereferencing is going on in this suggestion, just arithmetic on addresses.

RustyYato · May 8, 2019, 8:59pm

Instead of an intrinsic, we could simply provide an impl of Project for *const T/*mut T and have everyone else base their Project off of that.

/// Projections may be unsafe, don't use this directly in generic code
trait Project<T> {
    type Type;
    type Projection;
    
    unsafe fn project(self, field: Field<Self::Type, T>) -> Self::Projection;
}

/// Projection is always safe
unsafe trait SafeProjection<T>: Project<T> {}

impl<F, T> Project<F> for *const T {
    type Type = T;
    type Projection = *const F;
    
    unsafe fn project(self, field: Field<T, F>) -> *const F {
        (self as *const u8).add(field.offset) as *const F
    }
}

impl<'a, T, F: 'a> Project<F> for &'a T {
    type Type = T;
    type Projection = &'a F;
    
    unsafe fn project(self, field: Field<T, F>) -> &'a F {
        &*(self as *const T).project(field)
    }
}

unsafe impl<'a, T, F: 'a> SafeProjection<F> for &'a T {}
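To show how generic code might consume this split, here is a self-contained sketch. The hand-built `Field`, the helper `foo_bar` and the wrapper `project_safely` are hypothetical additions for illustration; the wrapper is sound only on the assumption that every `Field` value carries a correct offset.

```rust
use std::marker::PhantomData;
use std::mem::offset_of;

struct Field<P, F> {
    offset: usize,
    _marker: PhantomData<fn(P) -> F>,
}

/// Projections may be unsafe; generic code should not call this directly.
trait Project<F> {
    type Type;
    type Projection;

    unsafe fn project(self, field: Field<Self::Type, F>) -> Self::Projection;
}

/// Implementors promise that `project` has no preconditions beyond a valid `Field`.
unsafe trait SafeProjection<F>: Project<F> {}

impl<T, F> Project<F> for *const T {
    type Type = T;
    type Projection = *const F;

    unsafe fn project(self, field: Field<T, F>) -> *const F {
        unsafe { self.cast::<u8>().add(field.offset).cast::<F>() }
    }
}

impl<'a, T, F: 'a> Project<F> for &'a T {
    type Type = T;
    type Projection = &'a F;

    unsafe fn project(self, field: Field<T, F>) -> &'a F {
        unsafe { &*(self as *const T).project(field) }
    }
}

unsafe impl<'a, T, F: 'a> SafeProjection<F> for &'a T {}

/// Safe entry point: only pointer types that opted into `SafeProjection` are accepted.
fn project_safely<P, F>(ptr: P, field: Field<<P as Project<F>>::Type, F>) -> P::Projection
where
    P: SafeProjection<F>,
{
    // SAFETY: guaranteed by the `SafeProjection` contract, assuming `field` is correct.
    unsafe { ptr.project(field) }
}

#[repr(C)]
struct Foo {
    bar: u32,
    quaz: i8,
}

/// Hypothetical stand-in for the proposed `Foo::bar`.
fn foo_bar() -> Field<Foo, u32> {
    Field { offset: offset_of!(Foo, bar), _marker: PhantomData }
}
```

`project_safely(&foo, foo_bar())` compiles, while passing `&foo as *const Foo` is rejected because `*const T` does not implement `SafeProjection`.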

matt1985 · May 8, 2019, 9:00pm

With const generics, a trait to get the offset of a field could be defined like this (requiring no new syntax):

unsafe trait FieldOffset<const NAME: &'static str> {
    /// Type of the field
    type Type;
    /// The offset of the field inside the parent
    const OFFSET: usize;
}

struct Foo<'a> {
    bar: u32,
    baz: &'a str,
}

// Compiler defined
unsafe impl<'a> FieldOffset<"bar"> for Foo<'a> {
    type Type = u32;
    const OFFSET: usize = 0;
}

// Compiler defined
unsafe impl<'a> FieldOffset<"baz"> for Foo<'a> {
    type Type = &'a str;
    const OFFSET: usize = 4;
}
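String-typed const generic parameters are, as far as I know, not available on stable Rust, so the trait above cannot be tried directly. A rough approximation that compiles today replaces each string with a marker type. Everything here is a sketch: the marker types `Bar` and `Baz`, the helper `read_field`, the `#[repr(C)]` attribute and the use of `offset_of!` in place of compiler-generated constants are all assumptions for illustration.

```rust
use std::mem::offset_of;

/// Same shape as the trait above, keyed by a marker type instead of a string.
unsafe trait FieldOffset<Name> {
    /// Type of the field
    type Type;
    /// The offset of the field inside the parent
    const OFFSET: usize;
}

#[repr(C)]
struct Foo<'a> {
    bar: u32,
    baz: &'a str,
}

// Hypothetical marker types, one per field.
struct Bar;
struct Baz;

// SAFETY: offset and type both describe the field `bar`.
unsafe impl<'a> FieldOffset<Bar> for Foo<'a> {
    type Type = u32;
    const OFFSET: usize = offset_of!(Foo<'a>, bar);
}

// SAFETY: offset and type both describe the field `baz`.
unsafe impl<'a> FieldOffset<Baz> for Foo<'a> {
    type Type = &'a str;
    const OFFSET: usize = offset_of!(Foo<'a>, baz);
}

/// Copies a field out of `parent`, selected purely by the marker type.
fn read_field<Name, P>(parent: &P) -> <P as FieldOffset<Name>>::Type
where
    P: FieldOffset<Name>,
    <P as FieldOffset<Name>>::Type: Copy,
{
    let base = (parent as *const P).cast::<u8>();
    // SAFETY: implementors of the unsafe trait promise that `OFFSET` locates
    // a field of type `Type` inside `P`, and `parent` is a valid reference.
    unsafe {
        let ptr: *const <P as FieldOffset<Name>>::Type =
            base.add(<P as FieldOffset<Name>>::OFFSET).cast();
        *ptr
    }
}
```

A call such as `read_field::<Baz, _>(&foo)` selects the field at compile time, and a misspelled marker name fails to compile just as a misspelled string would.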

RustyYato · May 8, 2019, 9:01pm

I’m a bit worried about using stringly typed things, especially for unsafe code.

matt1985 · May 8, 2019, 9:08pm

Why? Those impls are compiler generated, and if you mistype the strings it would just fail to compile.

RustyYato · May 8, 2019, 9:09pm

These are just normal traits, so anyone could implement them. Also, strings don’t work well with IDEs, so, I would prefer not to use them.

More importantly, this would make field names part of the public interface for all types. This is bad. With syntax, we could prevent fields from being accessed where they are not allowed to be accessed (maintaining privacy rules).

matt1985 · May 8, 2019, 9:14pm

With syntax, we could prevent fields from being accessed where they are not allowed to be accessed (maintaining privacy rules).

Ah ok, I hadn't thought about how this feature interacts with privacy.

RustyYato · May 8, 2019, 9:36pm

I have made some updates to the original post. I split Project into unsafe and safe versions and changed over to using a compiler-generated type instead of a trait.

eaglgenes101 · May 8, 2019, 9:57pm

I’m still not clear on why this is unsafe on raw pointers. What unsoundness could result from this if Project and UnsafeProject were separated? From what I can tell, getting a raw pointer to field ends up just boiling down to a bit of arithmetic on addresses and a typecast (to another raw pointer), neither of which are considered unsafe.

RustyYato · May 8, 2019, 9:59pm

Basically we want to use ptr::offset, but offsetting past usize::max_value() is UB, and we can create a raw pointer anywhere, which means that it isn't safe to project raw pointers in general.

The offset being in bounds cannot rely on "wrapping around" the address space. That is, the infinite-precision sum, in bytes must fit in a usize.

eaglgenes101 · May 8, 2019, 10:02pm

I was under the impression that raw pointer offsetting followed usize overflow rules in either diverging or performing two's complement wraparound. Is this just incidental behavior, or is it documented as an intentional decision somewhere?

RustyYato · May 8, 2019, 10:03pm

This is likely incidental behavior, because the docs state that it is UB. You can use wrapping_offset if you want that behavior.
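A small sketch of the difference, using only safe operations. The helper names `project_wrapping` and `checked_field_addr` are made up for illustration. `wrapping_add` is the safe, wraparound-tolerant counterpart of `add`; calling `add` with the same arguments would be undefined behavior, so it is deliberately not shown.

```rust
/// Safe address arithmetic that is allowed to wrap around the address space.
/// The result may not be dereferenceable, but computing it is never UB.
fn project_wrapping(base: *const u8, offset: usize) -> *const u8 {
    base.wrapping_add(offset)
}

/// Returns the projected address only if `base + offset` fits in a `usize`,
/// which is one of the conditions the non-wrapping `add`/`offset` require.
fn checked_field_addr(base: *const u8, offset: usize) -> Option<usize> {
    (base as usize).checked_add(offset)
}
```

For a pointer at address `usize::MAX`, `project_wrapping(p, 4)` lands at address 3, while `checked_field_addr(p, 4)` returns `None`, which is exactly the case where the non-wrapping projection would be UB.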
