[Idea] Pointer to Field

Actually, the trait should probably insist that the implementation diverge (safely) upon overflow, as even if the address obtained by a wrapped offset is well-defined, it’s almost certainly not pointing towards anything useful.

This may be a stupid question, but:

ptr::offset is UB when using it on a bogus pointer because it provides an inbounds assertion to enable optimization. ptr::wrapping_offset is safe because it doesn’t require remaining inbounds of the allocation, and is defined to wrap at pointer size boundaries.
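To make the distinction concrete, here is a minimal sketch. The helper names `step_wrapping`, `step_inbounds`, and `third_element` are made up for illustration; only `offset` and `wrapping_offset` are real standard library methods.

```rust
/// Moves `base` by `count` elements with wrapping arithmetic.
/// Safe to call on any pointer; the result may only be dereferenced
/// if it ends up inside the allocation that `base` was derived from.
pub fn step_wrapping(base: *const i32, count: isize) -> *const i32 {
    base.wrapping_offset(count)
}

/// Same computation using `offset`, which carries the inbounds assertion.
///
/// # Safety
/// `base` and the resulting pointer must both lie within (or one element
/// past the end of) the same allocation.
pub unsafe fn step_inbounds(base: *const i32, count: isize) -> *const i32 {
    unsafe { base.offset(count) }
}

/// Reads element 2 of the array through both kinds of offset.
pub fn third_element(data: &[i32; 4]) -> (i32, i32) {
    let base = data.as_ptr();
    let a = step_wrapping(base, 2);
    // SAFETY: index 2 is inside the 4-element array.
    let b = unsafe { step_inbounds(base, 2) };
    // SAFETY: both pointers refer to `data[2]`, which is live and initialized.
    unsafe { (*a, *b) }
}
```

Calling `step_wrapping` with a bogus pointer or a huge count is fine as long as the result is never dereferenced; the same call through `step_inbounds` would be UB.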

Is it possible to define ptr-to-member such that it’s always safe (i.e. it doesn’t give an inbounds assertion unless that is already inferred, e.g. because you use the ptr), but its overflow is poison rather than undef or wrapping? You’ll get a bogus value, but you were going to get a bogus value anyway.

And honestly, I’d personally prefer it to use wrapping_offset instead of offset unless it can be shown that the latter is demonstrably better in some use case. My intuition suggests inbounds is trivially inferred for ptr-to-member whenever it’s actually useful for optimization.

I am not that well versed in unsafe code and codegen, so I will defer to someone else.

I think diverging on overflow fits the bill. If the offset overflows, the calling code never gets its hands on the invalid pointer (or we assume it doesn’t); if it doesn’t, then we can be assured that the pointer that comes out the other end and into usage by code makes sense if the original one did.

Does offset generate better code due to the inbounds assertion? If so, I am inclined to keep it, but if that seems like too much overhead (a whole new trait just to special case raw pointers) then it could be changed to use wrapping_offset.

The docs for wrapping_offset claim that offset gives better optimization, and the docs for offset say (paraphrased) “if it’s hard to guarantee this, use wrapping_offset instead”.

I think the ideal case would be that project works safely on both safe types and raw pointers, and project_inbounds is an unsafe operation that provides the inbounds assumption. (Modulo naming, of course.)

So project : wrapping_offset :: project_inbounds : offset.
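A rough sketch of what that analogy could look like for raw pointers. Everything here is an assumption for illustration: `Field` is a simplified stand-in that only stores a byte offset, and `project`, `project_inbounds`, `Pair`, and `pair_b` are hypothetical names, not the API from the proposal.

```rust
use std::marker::PhantomData;

/// Simplified stand-in for the thread's `Field<T, FTy>`:
/// the byte offset of a field of type `F` inside `T`.
pub struct Field<T, F> {
    offset: usize,
    _marker: PhantomData<fn(T) -> F>,
}

impl<T, F> Field<T, F> {
    /// # Safety
    /// `offset` must be the byte offset of a field of type `F` within `T`.
    pub unsafe fn new(offset: usize) -> Self {
        Field { offset, _marker: PhantomData }
    }
}

/// Safe projection: the analogue of `wrapping_offset`.
/// Never UB to call; the result is only usable if `ptr` was valid.
pub fn project<T, F>(ptr: *const T, field: &Field<T, F>) -> *const F {
    ptr.cast::<u8>().wrapping_add(field.offset).cast::<F>()
}

/// Unsafe projection: the analogue of `offset`.
///
/// # Safety
/// `ptr` must point into a live allocation that is large enough for a `T`.
pub unsafe fn project_inbounds<T, F>(ptr: *const T, field: &Field<T, F>) -> *const F {
    unsafe { ptr.cast::<u8>().add(field.offset).cast::<F>() }
}

/// Example type and a field descriptor for its `b` member.
#[repr(C)]
pub struct Pair {
    pub a: u8,
    pub b: u32,
}

pub fn pair_b() -> Field<Pair, u32> {
    let sample = Pair { a: 0, b: 0 };
    let base = &sample as *const Pair as usize;
    let field = std::ptr::addr_of!(sample.b) as usize;
    // SAFETY: the offset was measured on a real `Pair`, so it is the offset of `b`.
    unsafe { Field::new(field - base) }
}
```

With a valid pointer both functions return the same address; only `project` may also be called on a dangling or made-up pointer.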


Ok, that looks reasonable. I think we will put project_inbounds on raw pointers directly, and not expose them to Project

@RalfJung Does this api look sound to you?

I tried to add some implementations of my own, and maybe it is better to give the Project trait an additional method fn safe_project(...) that is actually safe to call, instead of an external wrapper for it. Then again, the usefulness of having unsafe on the trait just for the default impl is open to critique, so maybe this is not the way.

See (added Ref, RefMut and Pin): https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=b542bc2b1ee075df8321363a50756f3c

So, some things have changed while you were hacking away. Due to @CAD97’s comments on raw pointers I have decided to change to only have a safe Project trait. For raw pointers it will use ptr::wrapping_add instead of ptr::add. So I think that clears up that issue.

That sounds like a good idea. Definitely a clear api with a single trait and a safe method for most cases.

And presumably the unsafe projection on Pin<_> would be left out, or get the same treatment of an unsafe extra method. It may be too easy to misuse in any case?

What about pointers to ?Sized types, can they be fit in here as well? One of the most annoying things to deal with on stable is adding dynamically sized structs and not having good ways to even initialize their fields manually. I suppose it's a bit of a challenge, but maybe?

Pin<P<_>> can always use ptr::offset (the unsafe version, or just do it with references directly) safely as it always points to a real object. (Pin<*[const|mut] _> is a valid type IIRC but not an actual pin.)

That said, pin projection is a whole mess of its own. It’s only safe to project the pin iff you only ever project with the pin.

So would this be sound for Pin?

impl<'a, T: ?Sized, FTy: 'a> Project<FTy> for Pin<&'a mut T> {
    type Type = T;
    type Projection = Pin<&'a mut FTy>;
    
    fn project(self, field: Field<T, FTy>) -> Self::Projection {
        unsafe {
            Pin::map_unchecked_mut(
                self,
                |x| x.project(field)
            )
        }
    }
}

Or are the Unpin bounds required?

I think it’s fully sound. (Though the generic version should be generic over Project I think? I’m not sure.) When you have a Pin<P<T>>, you can’t move the whole type. If you only have access to &Field or Pin<&mut Field>, it’s sound.
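For reference, a hand-written projection of the same shape for one concrete struct, using only `Pin::map_unchecked_mut` from the standard library. `Wrapper` and `project_inner` are made-up names, and the safety comment states the assumptions this sketch relies on.

```rust
use std::pin::Pin;

/// Hypothetical example type with one field we want to project to.
pub struct Wrapper {
    pub inner: u32,
    pub label: String,
}

/// Projects a pinned `Wrapper` to its pinned `inner` field.
pub fn project_inner(this: Pin<&mut Wrapper>) -> Pin<&mut u32> {
    // SAFETY: `inner` is never moved out of a pinned `Wrapper`; in
    // particular `Wrapper` has no `Drop` impl that could move it, and
    // the returned reference points into the value that `this` pins.
    unsafe { this.map_unchecked_mut(|w| &mut w.inner) }
}
```

Here both `Wrapper` and `u32` are `Unpin`, so nothing can actually go wrong; the safety comment matters once the field type is not `Unpin`.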


I don’t think we could write the generic version, because we don’t have a way to directly get the pointer inside of Pin<P<T>>.

Can I ask why wrapping offset rather than an overflow-checking offset? If an offset calculation overflows, the resulting offsetted pointer is almost certainly not pointing at anything useful in relation to the base pointer.

Because pointers offer offset and wrapping_offset (and add/wrapping_add that take usize), but don’t offer a checked version. Casting to a usize to do the math is less correct, as it makes LLVM lose track of pointer-metadata.

And if the offset overflows, then the original pointer wasn’t pointing at anything useful either, so the only useful data would be the difference between the two values, which given wrapping math, is still correct.
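A small sketch of that last point, assuming a made-up helper `round_trip`. It never dereferences anything, so it is fine to call with a bogus base pointer near the top of the address space.

```rust
/// Offsets `base` by `offset` bytes with wrapping arithmetic, then
/// returns (the pointer moved back again, the measured address difference).
pub fn round_trip(base: *const u8, offset: usize) -> (*const u8, usize) {
    let moved = base.wrapping_add(offset);
    // The address may have wrapped past zero, but the wrapping
    // difference between the two addresses is still `offset`.
    let diff = (moved as usize).wrapping_sub(base as usize);
    (moved.wrapping_sub(offset), diff)
}
```

Even when the addition wraps, undoing it yields the original pointer and the measured difference equals the offset that was applied.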


When you have a Pin<P<T>>, you can’t move the whole type. If you only have access to &Field or Pin<&mut Field>, it’s sound.

It's only sound if T cooperates, as the type could move the field in its Drop impl. That's why, in general, projecting to fields needs extreme care; sadly, it is not as simple as never moving the value as a whole and only accessing fields structurally. Only if T: Unpin is there no problem :frowning:

One more idea for syntax, that also encompasses enum variants: somehow reuse patterns with a slight modification, such as in:

// Foo::member
Foo { member: *, .. }
// Foo::0
Foo(*, ..) 
// <no equivalent syntax>
Bar::Variant(*)

I don't think this is totally unwieldy. The best case would be that most explicit creation is hidden behind syntax for generic pointer traversal, such as proposed in the other thread, where the compiler creates them implicitly (e.g. ptr~member). And in many other cases you write the type itself and the member only once. This also can never collide with the notation for accessing methods explicitly.

Pointer-to-enum-member seems like a pretty bad idea, because x.*y can now fail; this seems deeply questionable to me. In general I am suspicious of the value of doing pointer arithmetic on enums, given that they have no C equivalent.

Regarding unsafety of pointer offsets: making x.y always safe for a pointer x seems like an actively bad idea, since we lose out on optimization opportunities and make alias analysis extremely sad (alias analysis is one of the most important sources of optimizations; in particular, it helps with vectorization). In particular, we actively want the following to be UB:

((&raw 0i32) as *const (i32, i32)).1

If this is not UB, then LLVM must assume that x.y aliases with every pointer!

Pointers are not numbers, address spaces are not usize. The fact that Rust allows this cast is irrelevant: LLVM treats pointers as far more complicated objects than just a 48-bit address.
