Need for -> operator for Unsafe Code Guidelines

I lost a long detailed draft of this, so I’ll keep it short this time. In my experience, the two most common Miri-caught violations of the Unsafe Code Guidelines we have are:

This post is about the former. I am bothered by the former because it tends to be the consequence of a common problem: it’s a pain in the neck to offset a raw pointer to fields. As such, we often see code like:

// OOPS, we had a mutable ref to node.elem, and this aliases it!
let node = &mut *node_ptr;
// Code which doesn't actually access node.elem, but it doesn't matter
node.next = ...;
node.prev = ...;

or:

let node = &mut *node_ptr;
ptr::drop_in_place(&mut node.elem);

I contend that this would be less likely if we had the arrow operator, or some other way to ergonomically offset to fields without instantiating a reference.
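
For comparison, here is a rough sketch of how the two snippets above can be written in today's Rust without instantiating a `&mut Node`. The `Node` layout and the names `unlink_and_drop_elem` and `unlink_demo` are made up for illustration; `ptr::addr_of_mut!` is the standard-library macro that produces a raw field pointer directly.

```rust
use std::mem::ManuallyDrop;
use std::ptr;

// Hypothetical node type, for illustration only.
struct Node {
    elem: String,
    next: *mut Node,
    prev: *mut Node,
}

/// Clears the links of `node_ptr` and drops its element without ever
/// creating a `&mut Node`.
///
/// Safety: `node_ptr` must point to a live, aligned `Node`, and `elem`
/// must not be used or dropped again afterwards.
unsafe fn unlink_and_drop_elem(node_ptr: *mut Node) {
    unsafe {
        // Assigning through the place `(*node_ptr).next` touches only that
        // field; no reference to the whole node is materialized.
        (*node_ptr).next = ptr::null_mut();
        (*node_ptr).prev = ptr::null_mut();
        // `addr_of_mut!` yields `*mut String` with no intermediate `&mut`.
        ptr::drop_in_place(ptr::addr_of_mut!((*node_ptr).elem));
    }
}

fn unlink_demo() -> bool {
    // `ManuallyDrop` keeps the compiler from dropping `elem` a second time.
    let mut node = ManuallyDrop::new(Node {
        elem: String::from("payload"),
        next: ptr::null_mut(),
        prev: ptr::null_mut(),
    });
    let node_ptr: *mut Node = &mut *node;
    unsafe {
        (*node_ptr).next = node_ptr; // self-link so there is something to clear
        unlink_and_drop_elem(node_ptr);
        (*node_ptr).next.is_null() && (*node_ptr).prev.is_null()
    }
}
```

It works, but the `(*node_ptr).` and `addr_of_mut!` noise is exactly the ergonomic cost being complained about.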

Related: the raw reference RFC is also contending with similar issues.

Possible solutions:

Arrow operator:

ptr->field is an ergonomic alternative to (*ptr).field, especially relevant when ptr is a complex expression.

May alternatively be .* if you object to arrow syntax.

Postfix deref:

Make ptr* work, from which ptr*.field naturally falls out.

NOTE: this proposal is ambiguous with x*.0, as this may be “multiply x by floating point literal 0.0”

Field Access on Raw Pointers as Sugar for Offset

Make ptr.field equivalent to ((&?mut (*ptr).field) as *?mut FieldTy) (or more accurately &raw (*ptr).field as defined by the raw reference RFC). This would allow things like ptr.field.drop_in_place(), which is very nice. Although *ptr.field = val is a bit surprising?

I believe there is no language-level ambiguity here though, as raw pointers have no fields, only methods. And foo.read when foo has a .read() method seems to be unambiguous since we don’t support getting a function pointer in this way. Adding such function pointer sugar would likely be a footgun, and would conflict with the common idiom of foo.field() as a getter.
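
To make the desugaring concrete, this is roughly what the proposed `ptr.name` would have to be spelled as in today's Rust. `Pair`, `name_ptr`, and `init_demo` are hypothetical names for the sketch; the point is that the result is a `*mut String`, so raw-pointer methods such as `.write()` (or `.drop_in_place()`) chain directly onto it.

```rust
use std::mem::MaybeUninit;
use std::ptr;

// Hypothetical struct, for illustration only.
struct Pair {
    id: u32,
    name: String,
}

/// What the proposed `p.name` sugar would expand to today.
///
/// Safety: `p` must point to an allocation large enough to hold a `Pair`.
unsafe fn name_ptr(p: *mut Pair) -> *mut String {
    unsafe { ptr::addr_of_mut!((*p).name) }
}

fn init_demo() -> (u32, String) {
    let mut slot = MaybeUninit::<Pair>::uninit();
    let p = slot.as_mut_ptr();
    unsafe {
        // Initialize field by field; no reference to uninitialized memory
        // is ever created.
        ptr::addr_of_mut!((*p).id).write(7);
        name_ptr(p).write(String::from("node"));
        let pair = slot.assume_init();
        (pair.id, pair.name)
    }
}
```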


I really want -> too, for another reason. Sometimes you just have to write a lot of unsafe code, and in that case the biggest thing that prevents Rust from being “a nicer C” is the lack of arrow. In such cases, I’d rather folks use Rust than C++.


Personally, I think this solution makes the most sense for this problem.

Interestingly, this is even a safe operation, as you're just doing math on a pointer. It also removes the potentially dangerous deref entirely, making it clear this is safe! Wouldn't this also solve a portion of "pointer to field"?

(How much of the &raw [const|mut] problem is solved by this? I'm not exactly sure. You'd need (&packed as *const _).field, I think, to do &raw const packed.field.)
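
For reference, the packed-field case looks like this in today's Rust, using `ptr::addr_of!` as the spelling of `&raw const`. The `Packed` struct and `read_value` are invented for the sketch.

```rust
use std::ptr;

// Hypothetical packed struct: `value` sits at offset 1, so it is
// under-aligned for a `u32`.
#[repr(C, packed)]
struct Packed {
    tag: u8,
    value: u32,
}

fn read_value(packed: &Packed) -> u32 {
    // `&packed.value` is rejected by the compiler because the field may be
    // misaligned. `addr_of!` produces a `*const u32` without creating an
    // intermediate reference.
    let field: *const u32 = ptr::addr_of!(packed.value);
    // Safety: `field` points into a live `Packed`, and `read_unaligned`
    // tolerates the misalignment.
    unsafe { field.read_unaligned() }
}
```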

Also, *ptr.field = expr currently lints as "did you mean (*ptr).field = expr?". Making this work I think makes sense.

I think this also would almost eliminate the want for a C style ->, as the single-deref case "just works" and it's only a multiple-deref case that needs parentheses.

The one question comes up with ptr::NonNull. We'd probably want to add a NonNull::map(self) so you can nn.map(|ptr| ptr.field) rather than NonNull::new(nn.as_ptr().field).
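
A minimal sketch of what such a helper might look like as a free function. `map_non_null` is hypothetical (it is not a std API), and I'm assuming it has to be `unsafe` because nothing stops the closure from returning null.

```rust
use std::ptr::{self, NonNull};

// Hypothetical node type, for illustration only.
struct Node {
    elem: u64,
    len: usize,
}

/// Hypothetical stand-in for the suggested `NonNull::map`.
///
/// Safety: `f` must return a non-null pointer.
unsafe fn map_non_null<T, U>(
    nn: NonNull<T>,
    f: impl FnOnce(*mut T) -> *mut U,
) -> NonNull<U> {
    unsafe { NonNull::new_unchecked(f(nn.as_ptr())) }
}

fn map_demo() -> u64 {
    let mut node = Node { elem: 42, len: 0 };
    let nn = NonNull::from(&mut node);
    // Field projection: today's spelling of the proposed `ptr.elem`.
    let project = |p: *mut Node| unsafe { ptr::addr_of_mut!((*p).elem) };
    // Safety: projecting a field of a non-null pointer cannot yield null here.
    let elem: NonNull<u64> = unsafe { map_non_null(nn, project) };
    // Safety: `elem` points at the live `node.elem`.
    unsafe { elem.as_ptr().read() }
}
```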


No, pointer offsets are in fact unsafe.

Notably for this case, you are effectively asserting that the input and output pointers are non-dangling. In practice I do not consider this a significant pitfall, though.


NonNull::dangling, which just “makes up” an aligned non-null pointer, is sound, so I don’t see how doing pointer math with an arbitrary pointer isn’t sound, but I’ll acquiesce that I’m not an expert and the docs contradict my assumption. I agree that using a pointer that violates one of the listed requirements is insta-UB.

If it is potentially UB, whatever syntax has to be unsafe, of course. It probably should be anyway, as unsafe around pointers “as a lint” probably isn’t too bad.

Cc wg-unsafe again, I guess.

The best intuitive answer I can give you is that if it's legal to pointer offset into anyone's memory, then every pointer offset has to conservatively be assumed to be potentially aliasing literally all memory. This would be very bad for codegen. At the end of the day, this is all inherited semantics from LLVM, so nothing can be done.

edit: anyway, not relevant to the thread. Feel free to ask about this more on discord/irc.


Both offset() and wrapping_offset() lead to the LLVM getelementptr instruction, but the former uses the inbounds keyword:
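
The difference shows up at the API level too: `offset` is an `unsafe fn` because the result must stay inside (or one past the end of) the allocation, while `wrapping_offset` is safe to call and only dereferencing an out-of-bounds result would be UB. A small sketch (the function name `offset_demo` is mine):

```rust
fn offset_demo() -> (i32, bool) {
    let data: [i32; 3] = [10, 20, 30];
    let base = data.as_ptr();

    // `offset` is unsafe: the caller promises the result stays within
    // (or one past) the same allocation.
    let second = unsafe { *base.offset(1) };

    // `wrapping_offset` is a safe call even when the result lands far
    // outside the array; the pointer just must not be dereferenced there.
    let far = base.wrapping_offset(1000);
    let back = far.wrapping_offset(-1000);

    (second, back == base)
}
```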

Mixing and matching two of the things:

What about a -> operator that preserves pointer-ness and reference-ness?

So a->b would be a &mut if a is &mut, a *const if a is a *const, etc

I think that would be a surprising departure from otherwise C-like syntax.

C has value.field and pointer->field, and then C++ adds reference.field that makes the reference look like a value too. It's kind of nice that Rust references work the same way, thanks to auto-deref. But if we add pointer->field, I would expect it to also work like C in dereferencing the pointer to get the field's value.

On the other hand, pointer.field is a newish thing, so I'm a little more comfortable with that being sugar for pointer offset.


TIL about C++'s pointer-to-member .* and ->* operators:

https://en.cppreference.com/w/cpp/language/operator_member_access#Built-in_pointer-to-member_access_operators

Yes please! I fully agree that we want a postfix-operator for getting from a raw ptr of a struct to either the address or the place (lvalue) of its field. This has also been one of the annoying aspects of fixing some of the issues Miri found.

IMO we should use non-inbounds GEP for this, to keep the risk down.

In fact IMO we should also use non-inbounds GEP for things like (*x).field. Basically, I think we should use getelementptr inbounds only if we already know for other reasons (like having a reference that we know is dereferencable) that the offset will definitely be in-bounds -- so the act of doing the access does not actually make any new inbounds assertions. Then the fact that we add inbounds there becomes entirely a detail of our LLVM backend that Rust authors do not have to be concerned with.


This seems pretty reasonable to me. I think I would prefer that -> is restricted to raw pointers (which might reasonably be thought of as “C pointers”). This seems to somewhat overlap with @josh’s interests around “C parity” or “FFI”


Given that references coerce to raw pointers, is it possible to restrict -> to them?

We control coercion sites, so it should be trivial to only support them on raw pointers

Just chiming in here with my two cents, I think the -> operator would make sense on everything that implements the Deref trait. This would probably mean a special case for pointers would need to exist.

As far as I can tell, the “arrow operator” ptr->field also allows for ptr->field.drop_in_place().

That said, I’d personally like to restrict -> to being a purely unsafe operator for raw pointers and not intermix any other concepts with it. Unsafe code should obviously have as few confusing elements as possible, and having one operator be “the one for raw pointers” is very clear. We don’t need new ways to access Deref things in general, we already have . for that (unless I’m horribly missing something).


The thing is, raw pointers specifically don't implement Deref, so adding -> to Deref wouldn't even solve the original issue. Further, . is already -> for Deref types (except the very specific case where the Pointer and Pointee have a name conflict).

No, drop_in_place, read, write, etc are methods on the raw pointer type. ptr->field gets you an instance of FieldTy, and not *mut FieldTy.

Oh, right, you were assuming that field is a non-pointer type. My mistake.

I frequently need to grapple with pointers that point to structures that themselves contain pointers: linked lists, trees, that sort of stuff. So I think that whatever is examined and evaluated must be considered from both the single-depth and the multi-depth perspective.

By this metric, I think that arrow does best, “sugar for offset” is an okay-ish second, and then postfix deref doesn’t do so well (a little too close to a*b*c for me).
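
To illustrate the multi-depth case, here is what a two-hop traversal looks like with today's syntax. The `Node` type and function names are hypothetical; with an arrow operator the same expression would presumably read `head->next->next->value`.

```rust
use std::ptr;

// Hypothetical singly linked node, for illustration only.
struct Node {
    value: i32,
    next: *mut Node,
}

/// Safety: `head` and the two nodes after it must be live and aligned.
unsafe fn third_value(head: *mut Node) -> i32 {
    // Every hop needs its own parenthesized deref, so the expression nests
    // from the inside out instead of reading left to right.
    unsafe { (*(*(*head).next).next).value }
}

fn chain_demo() -> i32 {
    let mut c = Node { value: 3, next: ptr::null_mut() };
    let mut b = Node { value: 2, next: &mut c };
    let mut a = Node { value: 1, next: &mut b };
    unsafe { third_value(&mut a) }
}
```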

Yes, please, -> is missing in Rust. But please introduce it alongside a trait similar to Deref so it can be implemented for NonNull<T> and Unique<T>. Maybe this way?

pub trait DerefPtr {
    type Target: ?Sized;

    unsafe fn deref_ptr(&self) -> &Self::Target; 
} 

And similarly DerefPtrMut.

@TimDiekmann that is not a good type, this is all about raw pointers after all; if we are forced to create references here, that entirely defeats the purpose.