Need for -> operator for Unsafe Code Guidelines

The first time I saw this wording in this thread I thought I was misreading, but from repetition and context here I must conclude: is this proposal really that x.y, which so far always evaluates to a place, should instead evaluate to a pointer value if x is a raw pointer?

Let me try to clarify by giving some examples that highlight the difference.

  • *x.y = ...; would involve different amounts (and kinds) of memory indirection depending on the types involved -- for example, it might write to a field of the memory pointed to by x: *mut Foo, or it might write through the pointer y: *mut Bar stored in the memory pointed to by x: &mut Foo.
  • Consider the idiom for swapping two fields of a struct that you own or hold a reference to: mem::swap(&mut x.field1, &mut y.field2) -- if x is a raw pointer instead, then this would rather silently swap two temporary raw pointers (which has no effect).
  • (More examples involving longer place expressions occur to me but they depend on further details, e.g., would this syntax "auto-deref" through raw pointers?)

It's probably easy to tell that I picked these examples "adversarially", that is, to illustrate what I believe to be an evident flaw of the proposal as I understood it. But these differences haven't been brought up at all as far as I saw, so I wonder whether I've misunderstood?
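To make the swap example concrete, here is a small sketch in today's Rust. The `Pair` type and both function names are made up for illustration, and `addr_of_mut!` stands in for what `x.field` would evaluate to under the proposal as I understand it. The first function swaps the fields; the second only swaps two temporary pointer values and leaves the struct untouched.

```rust
use std::mem;
use std::ptr::addr_of_mut;

#[derive(Debug, PartialEq)]
pub struct Pair {
    pub a: u32,
    pub b: u32,
}

/// Today's meaning: `p.a` and `p.b` are places, so the fields are swapped.
pub fn swap_places(p: &mut Pair) {
    mem::swap(&mut p.a, &mut p.b);
}

/// Emulates the proposed meaning for a raw pointer: each "field access"
/// yields a pointer value, so only two temporaries get swapped.
pub fn swap_pointer_values(pair: &mut Pair) {
    let p: *mut Pair = pair;
    // SAFETY: `p` comes from a live `&mut Pair`; we only compute addresses.
    let (mut pa, mut pb) = unsafe { (addr_of_mut!((*p).a), addr_of_mut!((*p).b)) };
    // Swaps the two local pointers; the pointed-to memory is never touched.
    mem::swap(&mut pa, &mut pb);
}
```

Both calls look almost the same at the use site, which is the readability concern.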

Yes, that is the correct interpretation. There would never be autoderef on pointers if ptr.field evaluated to the address of the field.

*x.y would always parse as *(x.y), as it does today. So (*x).y remains what it is today, and *x.y evaluates to the same thing by being essentially *(&raw (*x).y).
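A rough sketch of that equivalence in today's Rust, using `addr_of!` as a stand-in for `&raw` (the `Foo` type and function names are only illustrative):

```rust
use std::ptr::addr_of;

pub struct Foo {
    pub y: i32,
}

/// Today's spelling: dereference the pointer, then select the field.
///
/// # Safety
/// `x` must point to a live, initialized `Foo`.
pub unsafe fn read_today(x: *const Foo) -> i32 {
    unsafe { (*x).y }
}

/// What `*x.y` would mean under the proposal: first compute the pointer to
/// the field (no reference is created), then dereference that pointer.
///
/// # Safety
/// `x` must point to a live, initialized `Foo`.
pub unsafe fn read_desugared(x: *const Foo) -> i32 {
    unsafe {
        let field_ptr: *const i32 = addr_of!((*x).y);
        *field_ptr
    }
}
```

Both functions read the same memory; the difference is only in which intermediate value (place or pointer) the expression passes through.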

I admit freely that &ptr.field could then become confusing (especially if field is pointer typed); I had not considered that problem. We could lint against it, as I doubt it would ever be useful.

I believe that it should function rather intuitively, though I have no data to back this up in my favor. The main advantage it offers over -> is keeping all pointer derefs with * and not requiring the address-of operator for an operation that shouldn’t deref (calculating pointer-to-field) (potentially on pain of UB depending on how we define things).

If -> were to be ‘pointer value’ to ‘pointer value’, that would not only be useful to unsafe code guidelines. It would also smooth out some parts of writing accessor functions. Note how only the function header changes:

struct Foo {
    bar: usize,
}

impl Foo {
    fn get(&self) -> &usize {
        self->bar
    }

    fn get_mut(&mut self) -> &mut usize {
        self->bar
    }

    unsafe fn get_ptr(*const self) -> *const usize {
        self->bar
    }

    unsafe fn get_ptr_mut(*mut self) -> *mut usize {
        self->bar
    }
}
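For comparison, this is roughly how the same four accessors have to be written in today's Rust. It is only a sketch: raw pointer `self` types do not exist, so the pointer versions take a plain argument (named `this` here), and the `new` constructor is added just to make the example usable.

```rust
use std::ptr::{addr_of, addr_of_mut};

pub struct Foo {
    bar: usize,
}

impl Foo {
    pub fn new(bar: usize) -> Self {
        Foo { bar }
    }

    pub fn get(&self) -> &usize {
        &self.bar
    }

    pub fn get_mut(&mut self) -> &mut usize {
        &mut self.bar
    }

    /// # Safety
    /// `this` must point into an allocation large enough for a `Foo`.
    pub unsafe fn get_ptr(this: *const Self) -> *const usize {
        // Computes the field address without creating a reference.
        unsafe { addr_of!((*this).bar) }
    }

    /// # Safety
    /// `this` must point into an allocation large enough for a `Foo`.
    pub unsafe fn get_ptr_mut(this: *mut Self) -> *mut usize {
        unsafe { addr_of_mut!((*this).bar) }
    }
}
```

Here every body differs, whereas in the `->` version only the headers do.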

I don’t consider -> to be perfect, due to the obvious presumptions carried over from C and related languages. Still, I always found it odd that there was an operator transitioning from pointer to value semantics, but that no actual parallel of . had been created operating within pointer semantics.

Aside: Such an operator could prepare a notation without the issues that effectively postponed the delegation rfcs. Addressing the first of the remaining concerns by cramertj.

I think the idea of -> doing anything other than “go from a pointer-to-struct to a field lvalue from that struct” would produce surprising behavior.

But I do think a pointer-to-member mechanism makes sense. This seems closely related to the ongoing desire for offsetof.

What about Pinned pointers? The current way to access fields of a Pinned pointer is quite ugly (and unsafe as well).

Using ~ as the operator syntax for now. I do not like offsetof as the fundamental operator, because it doesn’t add any safety and unlike pointer::align_offset this unsafety is mostly not required. But I could imagine an ops:: based design for customization:

/// Encapsulates, safely, a field of `S` with type `T` 
struct Field<S, T> { .. }

impl<S, T> Field<S, T> {
    // Both take `&self`, matching the `field.offset_ptr(..)` calls below.
    unsafe fn offset_ptr(&self, base: *const S) -> *const T;
    fn offset(&self, base: &S) -> &T;
    // etc. for mut
}

trait std::ops::Traverse<S, T> {
    type Output;
    unsafe fn traverse(self, field: Field<S, T>) -> Self::Output;
}

The common implementations for that trait could then be done for *const _ and *mut _, but also, as I would imagine, for Ref (!), Pin<P>, and maybe even exotic pointer types such as Cow?

impl<S, T> Traverse<S, T> for *const S {
    type Output = *const T;
    unsafe fn traverse(self, field: Field<S, T>) -> *const T {
        field.offset_ptr(self)
    }
}

impl<'a, S, T> Traverse<S, T> for &'a S {
    type Output = &'a T;
    // Note relaxed bound, no unsafe
    fn traverse(self, field: Field<S, T>) -> &'a T {
        field.offset(self)
    }
}

impl<'a, S, T> Traverse<S, T> for Ref<'a, S> {
    type Output = Ref<'a, T>;
    fn traverse(self, field: Field<S, T>) -> Ref<'a, T>{
        // `Ref::map` is an associated function, not a method.
        Ref::map(self, |ptr| field.offset(ptr))
    }
}

with the usage:

struct Foobar { member: usize }
let x = RefCell::new(Foobar::default());
let r = x.borrow();
let m = r~member; // Ref<usize>

Similar for Pin but with some bounds on the impl.

This design has an additional interesting property: One could have associated constants of type Field<Self, T> to expose some struct attributes without making their names public.

Seems to me the actual problem here is the & operator not being flexible enough.

Adding &raw or even an &_ that would use the same pointer type would allow using &_ ptr.field or &raw ptr.field to get a pointer, which avoids having to introduce a new -> operator and allows the “.” operator to have the same behavior for references and pointers.

In @HeroicKatora’s example, “&_ self.bar” would work for all the method bodies and be much more intuitive than the “self->bar” syntax.

That makes it impossible to write correct code in some cases currently. But I don't see how resolving this fixes the ergonomics issues around raw pointers that @Gankra and I mentioned.

That would implicitly dereference an unsafe raw pointer, which seems rather dangerous.

Adding &raw or even an &_ that would use the same pointer type would allow using &_ ptr.field or &raw ptr.field to get a pointer, which avoids having to introduce a new -> operator and allows the “.” operator to have the same behavior for references and pointers.

This works for all builtin examples of references. However, I do not see any way to generalize it to user types in the manner of other operators, and then we have already lost the opportunity to also provide it for Ref and Pin where appropriate. The problem is that any unary operator here seems to have to choose some representation for the self argument, but none of them is simultaneously safe (a pointer type would not be) and applicable to all possible representations (we would already be missing lifetimes if we didn't choose pointers). Thus a value self seems the only possibility for such an operator, but it cannot possibly take the targeted field by value, and as @RalfJung notes this is implicitly unsafe.

There is no real ambiguity for the compiler or backwards incompatible change, sure. But humans' interpretation of x.y now has to vary greatly in novel and confusing ways depending on subtle (and possibly quite far-removed) type differences. The case without dereferencing (foo(x.y) passing a projection of x instead of copying or moving a value out of its field, as the syntax suggests) is equally confusing, IMO. I do not want to sound dismissive but I feel like the only way this can be "intuitive" is if one's intuition does not incorporate the distinctions between values and places, or pointers and pointed-at values.

References to raw pointers can definitely occur (when generic code that handles &Ts is instantiated with T being a raw pointer). But in any case, that such a completely reasonable looking expression is so misleading that it needs to be linted against IMO illustrates how this proposal goes against the grain of the language.

And quite frankly, it seems to be trying to solve a non-issue. There's already multiple perfectly serviceable proposals for solving projections through raw pointers that don't have this confusion, and they're more general and orthogonal as they aren't tailored to field accesses exclusively (e.g., you could combine &raw mut with hypothetical indexing and slicing of *mut [T], if we ever add it).

It's perfectly clear how to define evaluation of place expressions such that intermediate derefs don't imply anything wrong that causes UB (e.g., a claim of validity for field a when the place expression actually navigates into the disjoint field b). In fact, I am pretty sure we already have that. The only potential UB that proposals like &raw ... dissolve is the temporary reference being created at the end, after the place expression is evaluated.
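As an illustration of that last point, here is a sketch of initializing a struct field by field through a raw pointer. It uses `addr_of_mut!`, today's stand-in for `&raw mut`; the `Config` type and `build_config` name are invented for the example. Writing `&mut (*p).name` instead would create a temporary reference to an uninitialized `String`, which is exactly the step a raw borrow avoids.

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

pub struct Config {
    pub name: String,
    pub retries: u32,
}

pub fn build_config() -> Config {
    let mut slot = MaybeUninit::<Config>::uninit();
    let p: *mut Config = slot.as_mut_ptr();
    unsafe {
        // The place expression `(*p).name` is evaluated, but no reference
        // to the uninitialized field is ever created.
        addr_of_mut!((*p).name).write(String::from("demo"));
        addr_of_mut!((*p).retries).write(3);
        // SAFETY: every field has been initialized above.
        slot.assume_init()
    }
}
```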

On its own, I’m sort of unenthusiastic about the idea of adding a new operator for this for the obvious reasons (niche use case, uses syntax, new thing for people to learn, etc), and so I was sort of more inclined to just let you use the . operator on raw pointers.

However, I think this is an area that deserves a properly holistic examination. Unsafe code is just kind of a PITA right now in a number of ways. For me, the most annoying thing is that NonNull’s APIs often make me feel like I should just use raw pointers, even though my pointers are nonnull. I wonder if this proposal would make sense as part of a more complete look at how to make it easier to effectively deal with all potentially dangling pointers?
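To give a feel for the NonNull friction, here is a sketch of projecting to a field through a `NonNull` with the current APIs. `Node` and `value_ptr` are hypothetical names; the point is only how many conversions are needed for what is conceptually one field offset.

```rust
use std::ptr::{addr_of_mut, NonNull};

pub struct Node {
    pub value: i32,
}

/// # Safety
/// `node` must point into a live allocation holding a `Node`.
pub unsafe fn value_ptr(node: NonNull<Node>) -> NonNull<i32> {
    unsafe {
        // NonNull -> raw pointer -> field pointer -> NonNull again.
        let raw: *mut i32 = addr_of_mut!((*node.as_ptr()).value);
        // SAFETY: a field of a non-null, in-bounds object is non-null.
        NonNull::new_unchecked(raw)
    }
}
```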

We could special case &_ foo.field to use a special trait and also have "foo.field" work by behaving as * &_ foo.field

The alternative is to introduce a new operator like "->" and have them be foo->field and make foo.field behave as *foo->field respectively.

To put my thoughts a bit more into context:

In the OP, @Gankra showed two examples that both used references to access a field of a raw pointer. My ptr::drop_in_place example was a variant of their second example. I don't think that the primary problem is writing (*ptr).field because you don't need to use &mut or & for that anyway. The problem is getting pointers to those fields. For that reason I'm advocating for the "Field Access on Raw Pointers as Sugar for Offset" solution proposed in the OP:

Both in MIR and in my head, both &_ and *_ count as a form of pointer, and it would be surprising to me if the operator treated them in different ways (one resolving to an lvalue/place and one to an rvalue/value).

It seems unclear to me how this may be backwards compatible and how it is supposed to interact with auto-deref? As a middle ground of not adding new operators for accessing members themselves, but also adding pointer traversal while also not colliding with currently in-use place expressions, maybe this is possible?

&.foo.field

I can't say that I find it intuitively optimal but an exploration of the possible design for syntax can't hurt.

This is the high order bit for me as well. I think unsafe code is unnecessarily difficult to write. I used to be opposed to adding "special syntax" around unsafe code, but I was persuaded as part of the union discussion that, indeed, it sometimes makes sense to extend the language with support for "unsafe abstractions" in direct ways. I am not 100% sure if -> is such a case, but it seems plausible.

If I had my druthers I would make NonZero a proper lang item, *T, but I haven't seen any bugs that result from NonZero being unergonomic, and I am slightly concerned with encouraging people to use the covariant internal mutability type more.

I don't think it is sound to use NonNull for internal mutability on its own.

Something like this makes me want to repeat what Josh said upthread, which is that we really, really need

  • A ptr-to-member type (and an equivalent of C++'s .* operator).
  • A trait to overload .*, which, in effect, lets us overload foo.bar too. One could imagine implementing it for *?mut T so that a T::*U ptr projects to a *?mut U, instead of a &?mut U (something something associated type ctors...).

I think citing C++ here is confusing, insofar as the operator-> semantics work as follows:

  1. We are presented with t->u
  2. We start with some value of type T, which is either a pointer or a reference type
  3. As long as T is not of pointer type:
    • a. If T has an overload for auto operator->([const] T&) -> S, check the constness and follow it
      • Note: There are no value semantics for overloadable operators
    • b. If T does not have such an overload, fail.
    • We now have a value of type S
  4. T is some pointer type Foo const*, check if Foo has the wanted member
    • a. If so, dereference the pointer to the member (perform (*t).u)
    • b. If not, fail

This is similar for operator->* and .*, to connect this more closely to the topic of pointer to member. That is, C++ ptr-to-member does not project T* to U* but rather T* to U&. Seems benign in terms of C++ but inconsistent for Rust.

To be perfectly honest, I find these semantics rather confusing (drastically put, insane). In my eyes this fails for Rust not only in the implicit dereferencing of the pointer at the end, which defeats the purpose we want it for. The return type of the overload itself must nevertheless be an actual pointer in the end, meaning we can never use it for NonNull<T>. Also, value semantics could provide clearer self types, but that is likely orthogonal.

The concept of ptr-to-member from C++ suffers from similar failures in my eyes. They were/are defined to work solely on pointers (which is why .* can't be overloaded separately) and also do an implicit dereferencing on access, i.e. are part of an lvalue expression and not an rvalue one... If we are to provide a concept similar to this, I would very much regret having it specialized to built-in pointers again.

In conclusion, . in Rust works on places, not on references (modulo auto-ref-deref). It seems incomplete to port ptr-to-member but present it in terms of raw pointers only, skipping both embedding to reference pointers and custom pointers. And forcing the result type to be one of the options does not seem complete either.

And something else that I had mentioned last time this surfaced, a holistic solution for Rust should in my eyes consider enum and union as first-class citizens. Mostly for enum, the C++ syntax and semantics for creating a ptr-to-member lose most of their meaning.

I think I expressed myself incorrectly. If this is Too Intense of a derail, let me know.

When I say I want Rust to add ptr-to-member, what I suggest is adding the following (re-using C++ syntax, even though we can do better, since syntax is a silly bikeshed):

  • A pointer to member type T::*U for all T, U which is just a typed offset.
  • A .* operator, such that given a place t: T, t.*field simply offsets that place.

This is roughly equivalent to C++'s T::*U, .*, respectively.

There is, of course, one very unpleasant detail. Deref requires that you return a reference, which is kind of the whole reason we’re discussing this. Similarly, we can’t have a DerefField in any interesting way, since DerefField wants to take Ptr<'a, T> and spit out Ptr<'a, U> (think of Pin).

Unfortunately, associated type ctors (which are sort-of required to be able to play this game) have a host of problems, so I imagine that we’d just either special-case raw pointers (like we already do with *ptr), or just add

impl<T> *const T {
  // Takes `self` so that it can be called as `ptr.field(..)`.
  fn field<U>(self, f: T::*U) -> *const U;
}

and accept having to write ptr.field(&_::my_field).
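A library-level approximation of that `field` method, as a sketch only: `MemberPtr<T, U>` plays the role of `T::*U`, an extension trait (`FieldExt`, an invented name) stands in for the inherent impl, and the `POINT_Y` constant replaces the `&_::my_field` syntax. It assumes a toolchain with `std::mem::offset_of!`.

```rust
use std::marker::PhantomData;
use std::mem::offset_of;

/// Stand-in for `T::*U`: a typed byte offset from a `T` to a `U` field.
pub struct MemberPtr<T, U> {
    offset: usize,
    _marker: PhantomData<fn(T) -> U>,
}

impl<T, U> MemberPtr<T, U> {
    /// # Safety
    /// `offset` must be the byte offset of a field of type `U` within `T`.
    pub const unsafe fn new(offset: usize) -> Self {
        MemberPtr { offset, _marker: PhantomData }
    }
}

/// Extension trait emulating an inherent `field` method on `*const T`.
pub trait FieldExt<T> {
    fn field<U>(self, f: MemberPtr<T, U>) -> *const U;
}

impl<T> FieldExt<T> for *const T {
    fn field<U>(self, f: MemberPtr<T, U>) -> *const U {
        // Pure address computation; nothing is dereferenced here.
        self.cast::<u8>().wrapping_add(f.offset).cast::<U>()
    }
}

pub struct Point {
    pub x: f32,
    pub y: f32,
}

pub const POINT_Y: MemberPtr<Point, f32> =
    unsafe { MemberPtr::new(offset_of!(Point, y)) };

pub fn read_y(p: &Point) -> f32 {
    let raw: *const Point = p;
    // SAFETY: `raw` comes from a valid reference and `POINT_Y` names `y`.
    unsafe { *raw.field(POINT_Y) }
}
```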