Need for -> operator for Unsafe Code Guidelines

197g · May 7, 2019, 7:42am

Maybe this indicates that reference patterns were disallowed too early in my complementary, match-based raw pointer RFC? With that design, this could be:

let Node { ref mut next, ref mut prev, .. } = node_ptr;

to avoid materializing a ref on the other attributes of node.

But it doesn’t generalize to NonNull and Unique, which -> could.

gnzlbg · May 7, 2019, 7:46am

@Gankra

As such, we often see code like:

// OOPS, we had a mutable ref to node.elem, and this aliases it!
let node = &mut *node_ptr;
// Code which doesn't actually access node.elem, but it doesn't matter
node.next = ...;
node.prev = ...;

let node = &mut *node_ptr;
ptr::drop_in_place(&mut node.elem);

So the correct code here using &raw would be:

&mut raw (*node).next = ...;
&mut raw (*node).prev = ...;

ptr::drop_in_place(&mut raw (*node).elem);

?
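For reference, here is a minimal sketch of the same pattern in stable Rust as it exists today. It assumes a hypothetical `Node` layout (the thread never shows one) and uses `std::ptr::addr_of_mut!`, the macro form of the `&raw mut` operator from RFC 2582. The function name is made up for illustration.

```rust
use std::ptr;

// Hypothetical node layout; the field names follow the quoted snippet.
pub struct Node {
    pub elem: String,
    pub next: *mut Node,
    pub prev: *mut Node,
}

/// Clears the links of `node_ptr` and drops its element in place, without
/// ever creating a `&mut Node` that covers the whole node.
///
/// # Safety
/// `node_ptr` must point to a live, properly aligned `Node` whose `elem` is
/// initialized, and `elem` must not be used or dropped again afterwards.
pub unsafe fn unlink_and_drop_elem(node_ptr: *mut Node) {
    unsafe {
        // Each `addr_of_mut!` yields a raw pointer to a single field.
        ptr::addr_of_mut!((*node_ptr).next).write(ptr::null_mut());
        ptr::addr_of_mut!((*node_ptr).prev).write(ptr::null_mut());
        ptr::drop_in_place(ptr::addr_of_mut!((*node_ptr).elem));
    }
}
```

Whether a live `&mut` to `elem` elsewhere still conflicts with these accesses depends on the aliasing model; the sketch only shows that no reference to the whole node is needed.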

comex · May 7, 2019, 8:59am

I’m not a fan of adding an -> operator simply because having an operator that’s only used in unsafe code would contribute to making unsafe seem ‘mysterious’ (unless you already know what it does due to a C/C++ background). If anything I’d rather just start allowing auto-deref for raw pointers.

phil_opp · May 7, 2019, 9:22am

I don’t think that we need syntactic sugar for (*ptr).field, but we definitely need a way to get a *const FieldType from a *const StructType without constructing a reference.

Is there even a safe way to get a field pointer currently? I only know of &(*ptr).field as *const FieldType, but I don’t know if it’s safe because it creates a reference for a short time.
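As an illustration of what a reference-free field pointer looks like, here is a minimal sketch using `std::ptr::addr_of_mut!`, which was added to the standard library after this thread. The `Header` type and `build_header` function are hypothetical. The struct is initialized field by field, a case where forming `&mut (*base).field` would create a reference to uninitialized memory.

```rust
use std::mem::MaybeUninit;
use std::ptr;

// Hypothetical struct used only for this sketch.
pub struct Header {
    pub id: u32,
    pub name: String,
}

/// Builds a `Header` field by field through raw field pointers.
pub fn build_header(id: u32, name: &str) -> Header {
    let mut slot = MaybeUninit::<Header>::uninit();
    let base: *mut Header = slot.as_mut_ptr();
    unsafe {
        // `*mut Header` to `*mut u32` / `*mut String`: no reference to the
        // uninitialized struct or its fields is formed along the way.
        let id_ptr: *mut u32 = ptr::addr_of_mut!((*base).id);
        let name_ptr: *mut String = ptr::addr_of_mut!((*base).name);
        id_ptr.write(id);
        name_ptr.write(name.to_owned());
        // Every field has been written, so the value is fully initialized.
        slot.assume_init()
    }
}
```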

Ixrec · May 7, 2019, 11:09am

My only concern with this is a holistic one about what “unsafe-only syntax” should look like in general, and how much of it we should be adding.

We already have the proposal for &raw mut/const, and now we’re thinking about ->. Both of these are well-motivated in isolation. While -> has the obvious problem of being C-like enough to immediately encourage deep confusion (as it’s already doing in this thread!) it’s not obvious that we can come up with anything better. The bigger issue to me is that, unlike &raw, -> does nothing to indicate it’s a niche feature that only applies to raw pointers and is only useful in unsafe code.

And it seems unlikely that these are the only two syntaxes we’re going to need to get unsafe code into the right balance of ergonomic syntax and intuitive UB-related semantics. As a concrete example, in the past I’ve seen proposals to replace .get_unchecked() with some kind of “raw indexing syntax”.

Basically, I think we should have a general brainstorm about what other “unsafe-only syntax” we might want to add someday before we commit to -> as the new operator. Maybe a lot of them could have a raw keyword. Maybe a lot of them could reasonably be done with named macros/functions rather than syntax. Maybe a lot of them share this general pattern of “do two things, and pretend the intermediate result didn’t happen for UB purposes”.

RalfJung · May 7, 2019, 11:29am

One goal here is to make code like this easier to write and read:

(*new_root.as_mut().as_leaf_mut()).parent = ptr::null();

I don't think raw-ptr-patterns help there.

That is exactly the point of this discussion though -- I think not having this is one reason why people tend to use references over raw pointers when they really should not.

That would be RFC 2582.

phil_opp · May 7, 2019, 11:36am

Thanks for the pointer!

197g · May 7, 2019, 12:33pm

My focus was on the example before, where one wants references to multiple members but not all of them. With this example, what about that seems hard to read and write to you? The most annoying part of this expression seems to me to be that the left side of an assignment (i.e. the place expression) needs to jump between left and right side, because the dereference operator * is not postfix and has no postfix variant.

I do not see how this is directly connected to retrieving members as pointer though. The place expression itself does not materialize the reference, or so I had assumed. Specifically, for (*ptr).field to be valid within unsafe code, only the memory of (*ptr).field needs to be initialized but not the whole of (*ptr) itself. Same for aliasing rules, &(*ptr).field only establishes that field can be validly referenced and not ptr. Right? If not, then I need to revisit my thoughts on the concerns of the syntax.

But in my reading here, this seems to be motivated not by the desire to get &raw itself but by it still requiring one to write &raw const (*ptr).field, and that this presents a lot of overhead for what is one of the simplest forms of expression (.) on references. And that one is thus much more likely to (be tempted to) use the less safe form, deriving a reference on everything first.

phil_opp · May 7, 2019, 12:37pm

I think the lack of a safe and easy way to get field pointers is a reason for this too. There is currently no way to do something like this without constructing a reference:

// node_ptr is a *mut NodeType raw pointer

ptr::drop_in_place(node_ptr.field); // not possible

It seems like the current default for constructing the field pointer is:

ptr::drop_in_place(&mut (*node_ptr).field);

This seems very hacky and unsafe. We construct a (temporary) reference anyway, so it is not apparent why this should be more safe than working with references directly:

let node = &mut *node_ptr;
ptr::drop_in_place(&mut node.field);

I think that resolving the RFC you posted might remove the confusion around this, so that the difference between the two approaches becomes more apparent and no extra syntax sugar is needed.

bill_myers · May 7, 2019, 3:41pm

Why not make raw pointers work like references with the dot operator as usual, so that ptr.field gives the value?

In general, raw pointers would be just like references except they have no aliasing, validity, non-nullness or alignment assumptions.

phil_opp · May 7, 2019, 4:35pm

I like the idea, but I would prefer if the dot operator would give a field pointer instead. Then you could do struct_ptr.field to get the field pointer and *struct_ptr.field to access the field. Most importantly, we would not need &mut/& for working with raw pointers anymore.

197g · May 7, 2019, 5:08pm

I like the idea, but I would prefer if the dot operator would give a field pointer instead. Then you could do struct_ptr.field to get the field pointer and *struct_ptr.field to access the field. Most importantly, we would not need &mut / & for working with raw pointers anymore.

I would find that confusing. Technically, the only reason that struct.field works when struct is a &_ is that auto-dereference desugars this to (*struct).field, or in general applies dereferencing and Deref::deref as often as necessary. (Aside: for packed structs this is part of the problem, as a customized Deref::deref materializes an actual reference sooner than is apparent from the code.)

A new operator that works the same for raw pointers and references would be more appropriate. Bikeshedding aside, I'd also expect this to work:

struct Foo (usize);
fn foo(input: &Foo) -> &usize {
    input->0
}
// Unsafe because no inbounds guarantees for general struct members
unsafe fn bar(input: *Foo) -> *usize {
    input->0
}

CAD97 · May 7, 2019, 6:10pm

Quoting earlier in the thread:

The operation that's being solved here is

As such, I think the OP solutions are the most viable solutions. Whatever solution we pick is tailored for use with raw pointers, so it makes sense that it would work only with raw pointers (though it could potentially work with references by coercing them).

My favorite solution is making ptr.field "just work" to get the address of the field, though ptr->field to get the place (lvalue) also would work in tandem with &raw [const|mut]. I think &raw [const|mut] is still required anyway to get pointers from a place without going through a reference, however.

ptr->field would be best off yielding the place (lvalue), as it'd mainly be used for "C in Rust", and an unnecessary departure from C expectations there is a dangerous game. ptr.field just working, however, I think makes sense. Whatever solution we take, though, shouldn't go through user code, because of how it's intrinsically trusted in unsafe contexts. That matters especially if we want it to omit the inbounds assumption, which is all too easy to sneak in with a & somewhere; avoiding that is the point of this.

hanna-kruppe · May 7, 2019, 6:35pm

The first time I saw this wording in this thread I thought I was misreading, but from repetition and context here I must conclude: is this proposal really that x.y, which so far always evaluates to a place, should instead evaluate to a pointer value if x is a raw pointer?

Let me try to clarify by giving some examples that highlight the difference.

*x.y = ...; would involve different amounts (and kinds) of memory indirection depending on the types involved -- for example, it might write to a field of the memory pointed to by x: *mut Foo, or it might write through the pointer y: *mut Bar stored in the memory pointed to by x: &mut Foo.
Consider the idiom for swapping two fields of a struct that you own or hold a reference to: mem::swap(&mut x.field1, &mut y.field2) -- if x is a raw pointer instead, then this would rather silently swap two temporary raw pointers (which has no effect).
(More examples involving longer place expressions occur to me but they depend on further details, e.g., would this syntax "auto-deref" through raw pointers?)
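The swap pitfall can be reproduced in today's Rust by spelling out what the proposed `ptr.field` would evaluate to. This is a sketch with a hypothetical `Pair` type; `ptr::addr_of_mut!((*p).a)` stands in for the proposed `p.a`.

```rust
use std::{mem, ptr};

// Hypothetical struct used only for this sketch.
pub struct Pair {
    pub a: u32,
    pub b: u32,
}

/// Mimics the proposal: `p.a` and `p.b` are field *pointers*, and swapping
/// those only exchanges two local pointer values.
///
/// # Safety
/// `p` must point to a live, properly aligned `Pair`.
pub unsafe fn swap_field_pointers(p: *mut Pair) {
    let mut pa: *mut u32 = unsafe { ptr::addr_of_mut!((*p).a) };
    let mut pb: *mut u32 = unsafe { ptr::addr_of_mut!((*p).b) };
    mem::swap(&mut pa, &mut pb); // the `Pair` itself is untouched
}

/// Swaps the pointed-to field values, which is what the idiom intends.
///
/// # Safety
/// `p` must point to a live, properly aligned `Pair`.
pub unsafe fn swap_fields(p: *mut Pair) {
    unsafe { ptr::swap(ptr::addr_of_mut!((*p).a), ptr::addr_of_mut!((*p).b)) }
}
```

Calling `swap_field_pointers` leaves `a` and `b` as they were, while `swap_fields` exchanges them.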

It's probably easy to tell that I picked these examples "adversarially", that is, to illustrate what I believe to be an evident flaw of the proposal as I understood it. But these differences haven't been brought up at all as far as I saw, so I wonder whether I've misunderstood?

CAD97 · May 7, 2019, 9:59pm

Yes, that is the correct interpretation. There would never be autoderef on pointers if ptr.field evaluated to the address of the field.

*x.y would always parse as *(x.y), as it does today. So (*x).y remains what it is today, and *x.y evaluates to the same thing by being essentially *(&raw (*x).y).
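In current Rust terms, the claimed equivalence can be written out as below. This is only a sketch: `Foo` is a hypothetical struct, and `ptr::addr_of_mut!((*x).y)` stands in for what `x.y` would mean under the proposal.

```rust
use std::ptr;

// Hypothetical struct used only for this sketch.
pub struct Foo {
    pub y: u32,
}

/// Today's spelling: dereference, then select the field place.
///
/// # Safety
/// `x` must point to a live, properly aligned `Foo`.
pub unsafe fn write_via_place(x: *mut Foo, v: u32) {
    unsafe { (*x).y = v };
}

/// The proposal's `*x.y = v`, spelled out: compute the field pointer first,
/// then dereference that pointer.
///
/// # Safety
/// `x` must point to a live, properly aligned `Foo`.
pub unsafe fn write_via_field_ptr(x: *mut Foo, v: u32) {
    unsafe { *ptr::addr_of_mut!((*x).y) = v };
}
```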

I admit freely that &ptr.field could then become confusing (especially if field is pointer typed); I had not considered that problem. We could lint against it, as I doubt it would ever be useful.

I believe that it should function rather intuitively, though I have no data to back this up in my favor. The main advantage it offers over -> is keeping all pointer derefs with * and not requiring the address-of operator for an operation that shouldn’t deref (calculating pointer-to-field) (potentially on pain of UB depending on how we define things).

197g · May 7, 2019, 10:23pm

If -> were to be ‘pointer value’ to ‘pointer value’, that would not only be useful to unsafe code guidelines. It would also smooth out some parts of writing accessor functions. Note how only the function header changes:

struct Foo {
    bar: usize,
}

impl Foo {
    fn get(&self) -> &usize {
        self->bar
    }

    fn get_mut(&mut self) -> &mut usize {
        self->bar
    }

    unsafe fn get_ptr(*const self) -> *const usize {
        self->bar
    }

    unsafe fn get_ptr_mut(*mut self) -> *mut usize {
        self->bar
    }
}

Although I don’t consider -> to be perfect, due to obvious presumptions from C and related languages, I always found it odd that there was another operator transitioning from pointer to value semantics, but that no actual parallel of . had been created operating within pointer semantics.

Aside: Such an operator could prepare a notation without the issues that effectively postponed the delegation rfcs. Addressing the first of the remaining concerns by cramertj.

josh · May 8, 2019, 1:21am

I think the idea of -> doing anything other than “go from a pointer-to-struct to a field lvalue from that struct” would produce surprising behavior.

But I do think a pointer-to-member mechanism makes sense. This seems closely related to the ongoing desire for offsetof.
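To make the connection concrete: recent stable Rust has a `std::mem::offset_of!` macro, and a field pointer can be derived from it with plain pointer arithmetic. The `Packet` type and `len_ptr` function below are hypothetical, and this is a sketch rather than a recommended replacement for `addr_of!`.

```rust
use std::mem::offset_of;

// Hypothetical struct used only for this sketch.
#[repr(C)]
pub struct Packet {
    pub tag: u8,
    pub len: u32,
}

/// Computes a pointer to `len` from a pointer to the whole `Packet`,
/// using only the field's byte offset.
///
/// # Safety
/// `p` must point into an allocation large enough to hold a `Packet`,
/// because `add` requires the result to stay in bounds.
pub unsafe fn len_ptr(p: *const Packet) -> *const u32 {
    unsafe { p.cast::<u8>().add(offset_of!(Packet, len)).cast::<u32>() }
}
```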

earthengine · May 8, 2019, 1:27am

What about pinned pointers? The current way to access fields through a Pin is quite ugly (and unsafe as well).
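For context, this is roughly what manual field access through a `Pin` looks like with the standard library alone. The `Outer` and `Inner` types are hypothetical; the sketch assumes `inner` is meant to stay pinned while `polls` is not.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

// Hypothetical types used only for this sketch.
pub struct Inner {
    pub value: u32,
    _pinned: PhantomPinned,
}

pub struct Outer {
    pub inner: Inner,
    pub polls: u32,
}

impl Outer {
    pub fn new(value: u32) -> Self {
        Outer { inner: Inner { value, _pinned: PhantomPinned }, polls: 0 }
    }

    /// Pinned projection: the field stays pinned along with `Outer`.
    pub fn inner<'a>(self: Pin<&'a mut Self>) -> Pin<&'a mut Inner> {
        // SAFETY: `inner` is never moved out of a pinned `Outer`.
        unsafe { self.map_unchecked_mut(|outer| &mut outer.inner) }
    }

    /// Unpinned projection: a plain `&mut` to a field that is never
    /// treated as pinned.
    pub fn polls<'a>(self: Pin<&'a mut Self>) -> &'a mut u32 {
        // SAFETY: only `polls` is exposed, and nothing is moved out.
        unsafe { &mut self.get_unchecked_mut().polls }
    }
}
```

Both accessors need `unsafe` and a hand-written safety argument for what is conceptually a single field access.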

197g · May 8, 2019, 5:42am

Using ~ as the operator syntax for now. I do not like offsetof as the fundamental operator, because it doesn’t add any safety and unlike pointer::align_offset this unsafety is mostly not required. But I could imagine an ops:: based design for customization:

/// Encapsulates, safely, a field of `S` with type `T` 
struct Field<S, T> { .. }

impl<S, T> Field<S, T> {
    unsafe fn offset_ptr(base: *const S) -> *const T;
    fn offset(base: &S) -> &T;
    // etc. for mut
}

trait std::ops::Traverse<S, T> {
    type Output;
    unsafe fn traverse(self, field: Field<S, T>) -> Self::Output;
}

The common implementations for that trait could then be done for *const _, *mut _, but also, as I would imagine, on Ref (!), Pin<P>, and maybe even exotic pointer types such as Cow?

impl<S, T> Traverse<S, T> for *const S {
    type Output = *const T;
    unsafe fn traverse(self, field: Field<S, T>) -> *const T {
        field.offset_ptr(self)
    }
}

impl<'a, S, T> Traverse<S, T> for &'a S {
    type Output = &'a T;
    // Note relaxed bound, no unsafe
    fn traverse(self, field: Field<S, T>) -> &'a T {
        field.offset(self)
    }
}

impl<'a, S, T> Traverse<S, T> for Ref<'a, S> {
    type Output = Ref<'a, T>;
    fn traverse(self, field: Field<S, T>) -> Ref<'a, T>{
        self.map(|ptr| field.offset(ptr)) 
    }
}

with the usage:

struct Foobar { member: usize }
let x = RefCell::new(Foobar::default());
let r = x.borrow();
let m = r~member; // Ref<usize>

Similar for Pin but with some bounds on the impl.

This design has an additional interesting property: One could have associated constants of type Field<Self, T> to expose some struct attributes without making their names public.
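Part of this design can be approximated in stable Rust today. The sketch below is not the proposed `std::ops::Traverse` and does not model the `~` operator. It assumes a `Field` that stores a byte offset obtained from `std::mem::offset_of!`, and it exposes the field as an associated constant as described above. `Field::new`, `Foobar::MEMBER`, the extra `other` field and `member_of` are hypothetical names.

```rust
use std::cell::{Ref, RefCell};
use std::marker::PhantomData;
use std::mem::offset_of;

/// Hypothetical stand-in for `Field<S, T>`: the byte offset of a `T`
/// stored inside an `S`.
pub struct Field<S, T> {
    byte_offset: usize,
    _marker: PhantomData<fn(S) -> T>,
}

impl<S, T> Field<S, T> {
    /// # Safety
    /// `byte_offset` must be the offset of a field of type `T` within `S`.
    pub const unsafe fn new(byte_offset: usize) -> Self {
        Field { byte_offset, _marker: PhantomData }
    }

    /// # Safety
    /// `base` must point into an allocation large enough to hold an `S`.
    pub unsafe fn offset_ptr(&self, base: *const S) -> *const T {
        unsafe { base.cast::<u8>().add(self.byte_offset).cast::<T>() }
    }

    /// Safe for references: `base` is known to be valid and in bounds.
    pub fn offset<'a>(&self, base: &'a S) -> &'a T {
        unsafe { &*self.offset_ptr(base) }
    }
}

#[derive(Default)]
pub struct Foobar {
    pub other: u8,
    pub member: usize,
}

impl Foobar {
    /// Exposes the field as a constant instead of by name.
    pub const MEMBER: Field<Foobar, usize> =
        unsafe { Field::new(offset_of!(Foobar, member)) };
}

/// What `r~member` on a `Ref<Foobar>` would roughly desugar to.
pub fn member_of(cell: &RefCell<Foobar>) -> Ref<'_, usize> {
    Ref::map(cell.borrow(), |s| Foobar::MEMBER.offset(s))
}
```

Soundness here rests entirely on the `unsafe` constructor being called with a correct offset, which is the part a language-level `Field` would check for you.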

bill_myers · May 8, 2019, 9:46am

Seems to me the actual problem here is the & operator not being flexible enough.

Adding &raw, or even an &_ that would use the same pointer type, would allow using &_ ptr.field or &raw ptr.field to get a pointer. That avoids having to introduce a new -> operator and allows the “.” operator to have the same behavior for references and pointers.

In @197g’s example, “&_ self.bar” would work for all the method bodies and be much more intuitive than the “self->bar” syntax.
