Need for -> operator for Unsafe Code Guidelines

Gankra · May 6, 2019, 4:05pm

I lost a long detailed draft of this, so I’ll keep it short this time. In my experience, the two most common miri-caught violations of the Unsafe Code Guidelines we have are:

We materialize or hold onto a too-large reference when working with raw pointers
We mess up pointer alignment in low-level code, often in niche cases like with ZSTs

This post is about the former. I am bothered by the former because it tends to be the consequence of a common problem: it’s a pain in the neck to offset a raw pointer to fields. As such, we often see code like:

// OOPS, we had a mutable ref to node.elem, and this aliases it!
let node = &mut *node_ptr;
// Code which doesn't actually access node.elem, but it doesn't matter
node.next = ...;
node.prev = ...;

or:

let node = &mut *node_ptr;
ptr::drop_in_place(&mut node.elem);

I contend that this would be less likely if we had the arrow operator, or some other way to ergonomically offset to fields without instantiating a reference.
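For comparison, here is a minimal sketch of how the two snippets above can be written today without materializing a reference to the whole node. It assumes a toolchain that provides `std::ptr::addr_of_mut!`; the generic `Node<T>` type and the function name `unlink_and_drop_elem` are hypothetical, chosen only to mirror the field names used above.

```rust
use std::ptr;

// Hypothetical node type; the field names mirror the snippets above.
struct Node<T> {
    elem: T,
    next: *mut Node<T>,
    prev: *mut Node<T>,
}

/// Clears the links and drops the element without creating `&mut Node<T>`.
///
/// # Safety
/// `node_ptr` must point to a live, aligned `Node<T>` with an initialized
/// `elem`, and `elem` must not be used or dropped again afterwards.
unsafe fn unlink_and_drop_elem<T>(node_ptr: *mut Node<T>) {
    unsafe {
        // Assigning through the place `(*node_ptr).next` touches only that
        // field; no reference to the whole node is created.
        (*node_ptr).next = ptr::null_mut();
        (*node_ptr).prev = ptr::null_mut();
        // `addr_of_mut!` produces a `*mut T` directly, with no `&mut` step.
        ptr::drop_in_place(ptr::addr_of_mut!((*node_ptr).elem));
    }
}
```

This works, but the nesting of `(*node_ptr)` inside a macro call is the kind of noise that tempts people to write `let node = &mut *node_ptr;` instead.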

Related: the raw reference RFC is also contending with similar issues.

Possible solutions:

Arrow operator:

ptr->field is an ergonomic alternative to (*ptr).field, especially relevant when ptr is a complex expression.

May alternatively be .* if you object to arrow syntax.

Postfix deref:

Make ptr* work, from which ptr*.field naturally falls out.

NOTE: this proposal is ambiguous with x*.0, as this may be “multiply x by floating point literal 0.0”

Field Access on Raw Pointers as Sugar for Offset

Make ptr.field equivalent to ((&?mut (*ptr).field) as *?mut FieldTy) (or more accurately &raw (*ptr).field as defined by the raw reference RFC). This would allow things like ptr.field.drop_in_place(), which is very nice. Although *ptr.field = val is a bit surprising?

I believe there is no language-level ambiguity here, though, as raw pointers have no fields, only methods. And foo.read, when foo has a .read() method, seems unambiguous, since we don’t support getting a function pointer this way. Adding such function pointer sugar would likely be a footgun, and would cause trouble for the common idiom of foo.field() as a getter.
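To make the sugar concrete, here is a hedged sketch of what the field-as-offset form would stand for, written with `std::ptr::addr_of_mut!` as it exists today. The `Pair` struct and `init_pair` function are hypothetical. Initializing uninitialized memory is used as the example because creating a `&mut` to the struct or its fields there would not be allowed.

```rust
use std::ptr;

// Hypothetical struct used only for illustration.
struct Pair {
    count: u32,
    label: String,
}

/// Initializes every field of `*p` in place.
///
/// # Safety
/// `p` must be aligned and point to writable memory large enough for a
/// `Pair`; the memory may be uninitialized.
unsafe fn init_pair(p: *mut Pair) {
    unsafe {
        // Under the proposed sugar this might read `p.count.write(7)`.
        ptr::addr_of_mut!((*p).count).write(7);
        // `write` does not drop the (uninitialized) old value.
        ptr::addr_of_mut!((*p).label).write(String::from("hi"));
    }
}
```

A caller would typically pass `MaybeUninit::<Pair>::as_mut_ptr()` and call `assume_init()` once all fields are written.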

pcwalton · May 6, 2019, 4:57pm

I really want -> too, for another reason. Sometimes you just have to write a lot of unsafe code, and in that case the biggest thing that prevents Rust from being “a nicer C” is the lack of arrow. In such cases, I’d rather folks use Rust than C++.

CAD97 · May 6, 2019, 5:51pm

Personally, I think this solution makes the most sense for this problem.

Interestingly, this is even a safe operation, as you're just doing math on a pointer. It also removes the potentially dangerous deref entirely as well, making it clear this is safe! Wouldn't this also solve a portion of "pointer to field" as well?

(How much of the &raw [const|mut] problem is solved by this? I'm not exactly sure. You'd need (&packed as *const _).field, I think, to do &raw const packed.field.)

Also, *ptr.field = expr currently lints as "did you mean (*ptr).field = expr?". Making this work I think makes sense.

I think this also would almost eliminate the want for a C style ->, as the single-deref case "just works" and it's only a multiple-deref case that needs parentheses.

The one question comes up with ptr::NonNull. We'd probably want to add a NonNull::map(self) so you can nn.map(|ptr| ptr.field) rather than NonNull::new(nn.as_ptr().field).
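As a rough sketch of the `NonNull` case, the free function below stands in for the suggested `NonNull::map`. Both `map_non_null` and the `Header` struct are hypothetical and not part of the standard library; the field projection is spelled with `addr_of_mut!` since the `ptr.field` sugar does not exist.

```rust
use std::ptr::{self, NonNull};

// Hypothetical struct used only for illustration.
struct Header {
    len: usize,
    cap: usize,
}

/// Hypothetical stand-in for the suggested `NonNull::map`.
///
/// # Safety
/// `f` must return a non-null pointer.
unsafe fn map_non_null<T, U>(
    nn: NonNull<T>,
    f: impl FnOnce(*mut T) -> *mut U,
) -> NonNull<U> {
    unsafe { NonNull::new_unchecked(f(nn.as_ptr())) }
}

/// Projects from a header pointer to a pointer to its `cap` field.
///
/// # Safety
/// `header` must point to a live, aligned `Header`.
unsafe fn cap_ptr(header: NonNull<Header>) -> NonNull<usize> {
    // The address of a field inside a live allocation is never null.
    unsafe { map_non_null(header, |p| ptr::addr_of_mut!((*p).cap)) }
}
```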

Gankra · May 6, 2019, 6:14pm

No, pointer offsets are in fact unsafe.

Notably for this case, you are effectively asserting that the input and output pointers are non-dangling. In practice I do not consider this a significant pitfall, though.

CAD97 · May 6, 2019, 7:05pm

NonNull::dangling, which just “makes up” an aligned non-null pointer, is sound, so I don’t see how doing pointer math with an arbitrary pointer isn’t sound, but I’ll acquiesce that I’m not an expert and the docs contradict my assumption. I definitely agree that using a pointer that violates one of the listed requirements is insta-UB.

If it is potentially UB, whatever syntax has to be unsafe, of course. It probably should be anyway, as unsafe around pointers “as a lint” probably isn’t too bad.

Cc wg-unsafe again, I guess.

Gankra · May 6, 2019, 7:15pm

The best intuitive answer I can give you is that if it's legal to pointer offset into anyone's memory, then every pointer offset has to conservatively be assumed to be potentially aliasing literally all memory. This would be very bad for codegen. At the end of the day, this is all inherited semantics from llvm, so nothing can be done.

edit: anyway, not relevant to the thread. Feel free to ask about this more on discord/irc.

cuviper · May 6, 2019, 7:19pm

Both offset() and wrapping_offset() lead to the LLVM getelementptr instruction, but the former uses the inbounds keyword:

If the inbounds keyword is present, the result value of the getelementptr is a poison value if the base pointer is not an in bounds address of an allocated object, or if any of the addresses that would be formed by successive addition of the offsets implied by the indices to the base address with infinitely precise signed arithmetic are not an in bounds address of that allocated object. The in bounds addresses for an allocated object are all the addresses that point into the object, plus the address one byte past the end. The only in bounds address for a null pointer in the default address-space is the null pointer itself. In cases where the base is a vector of pointers the inbounds keyword applies to each of the computations element-wise.

If the inbounds keyword is not present, the offsets are added to the base address with silently-wrapping two’s complement arithmetic. If the offsets have a different width from the pointer, they are sign-extended or truncated to the width of the pointer. The result value of the getelementptr may be outside the object pointed to by the base pointer. The result value may not necessarily be used to access memory though, even if it happens to point into allocated storage. See the Pointer Aliasing Rules section for more information.
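On the Rust side, the difference shows up in the signatures: `offset` is an `unsafe fn`, while `wrapping_offset` is safe to call and only the later dereference carries obligations. A small sketch (the function name `offset_demo` is made up for this example):

```rust
/// Returns the third element read via `offset`, and whether a round trip
/// through `wrapping_offset` lands back on the base address.
fn offset_demo() -> (u8, bool) {
    let data = [10u8, 20, 30, 40];
    let base = data.as_ptr();

    // `offset` is unsafe: the result must stay within the same allocation
    // (or one byte past its end). Index 2 is in bounds here.
    let third = unsafe { *base.offset(2) };

    // `wrapping_offset` may be called with any offset, even one that leaves
    // the allocation. Dereferencing `far` would be undefined behavior.
    let far = base.wrapping_offset(1_000);
    let back = far.wrapping_offset(-1_000);

    (third, back == base)
}
```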

scottmcm · May 6, 2019, 8:24pm

Mixing and matching two of the things:

What about a -> operator that preserves pointer-ness and reference-ness?

So a->b would be a &mut if a is &mut, a *const if a is a *const, etc

cuviper · May 6, 2019, 8:46pm

I think that would be a surprising departure from otherwise C-like syntax.

C has value.field and pointer->field, and then C++ adds reference.field that makes the reference look like a value too. It's kind of nice that Rust references work the same way, thanks to auto-deref. But if we add pointer->field, I would expect it to also work like C in dereferencing the pointer to get the field's value.

On the other hand, pointer.field is a newish thing, so I'm a little more comfortable with that being sugar for pointer offset.

cuviper · May 6, 2019, 9:06pm

TIL about C++'s pointer-to-member .* and ->* operators:

https://en.cppreference.com/w/cpp/language/operator_member_access#Built-in_pointer-to-member_access_operators

RalfJung · May 6, 2019, 9:17pm

Yes please! I fully agree that we want a postfix-operator for getting from a raw ptr of a struct to either the address or the place (lvalue) of its field. This has also been one of the annoying aspects of fixing some of the issues Miri found.

IMO we should use non-inbounds GEP for this, to keep the risk down.

In fact IMO we should also use non-inbounds GEP for things like (*x).field. Basically, I think we should use getelementptr inbounds only if we already know for other reasons (like having a reference that we know is dereferenceable) that the offset will definitely be in-bounds -- so the act of doing the access does not actually make any new inbounds assertions. Then the fact that we add inbounds there becomes entirely a detail of our LLVM backend that Rust authors do not have to be concerned with.

nikomatsakis · May 6, 2019, 9:28pm

This seems pretty reasonable to me. I think I would prefer that -> is restricted to raw pointers (which might reasonably be thought of as “C pointers”). This seems to somewhat overlap with @josh’s interests around “C parity” or “FFI”

CAD97 · May 6, 2019, 9:35pm

Given that references coerce to raw pointers, is it possible to restrict -> to them?

Gankra · May 6, 2019, 10:18pm

We control coercion sites, so it should be trivial to only support them on raw pointers

mooman219 · May 6, 2019, 11:19pm

Just chiming in here with my two cents, I think the -> operator would make sense on everything that implements the Deref trait. This would probably mean a special case for pointers would need to exist.

Lokathor · May 6, 2019, 11:24pm

as far as i can tell, “arrow operator” ptr->field also allows for ptr->field.drop_in_place().

That said, I’d personally like to restrict -> to being a purely unsafe operator for raw pointers and not intermix any other concepts with it. Unsafe code should obviously have as few confusing elements as possible, and having one operator be “the one for raw pointers” is very clear. We don’t need new ways to access Deref things in general, we already have . for that (unless I’m horribly missing something).

Gankra · May 6, 2019, 11:36pm

The thing is, raw pointers specifically don't implement Deref, so adding -> to Deref wouldn't even solve the original issue. Further, . is already -> for Deref types (except the very specific case where the Pointer and Pointee have a name conflict).

No, drop_in_place, read, write, etc are methods on the raw pointer type. ptr->field gets you an instance of FieldTy, and not *mut FieldTy.

Lokathor · May 7, 2019, 12:13am

oh, right, you were assuming that field is a non-pointer type. My mistake.

I frequently enough need to grapple with pointers that point to structures that themselves contain pointers. linked lists, trees, that sort of stuff. So, I think that whatever is examined and evaluated must be considered from the single depth and also from the multi-depth perspective.

By this metric, I think that arrow does best, “sugar for offset” is an okay-ish second, and then postfix deref doesn’t do so well (a little too close to a*b*c for me).
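For the multi-depth case, this is roughly what following one link and then projecting to a field looks like with current syntax. The `Node` struct and `next_elem_ptr` function are hypothetical; with a C-style arrow the place expression would presumably be spelled `node->next->elem`.

```rust
use std::ptr;

// Hypothetical singly linked node used only for illustration.
struct Node {
    elem: u32,
    next: *mut Node,
}

/// Returns a raw pointer to the `elem` field of the node after `node`.
///
/// # Safety
/// `node` and `(*node).next` must both point to live, aligned `Node`s.
unsafe fn next_elem_ptr(node: *mut Node) -> *mut u32 {
    // `(*node).next` loads the link; the outer `*` and `.elem` then name a
    // place inside the second node without creating any reference.
    unsafe { ptr::addr_of_mut!((*(*node).next).elem) }
}
```

Each extra level of indirection adds another pair of parentheses, which is where the arrow form pulls ahead of the alternatives.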

TimDiekmann · May 7, 2019, 6:30am

Yes, please, the -> is missing in Rust. But please introduce it alongside a trait similar to Deref so it can be implemented for NonNull<T> and Unique<T>. Maybe this way?

pub trait DerefPtr {
    type Target: ?Sized;

    unsafe fn deref_ptr(&self) -> &Self::Target; 
}

And similarly DerefPtrMut.

RalfJung · May 7, 2019, 7:29am

@TimDiekmann that is not a good type, this is all about raw pointers after all – if we are forced to create references here, that entirely defeats the purpose.
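One way to picture the objection: a trait in this role would have to hand back a raw pointer rather than a reference. The sketch below is purely illustrative; the trait name `AsRawTarget` and its method are hypothetical and were not proposed in this thread.

```rust
use std::ptr::NonNull;

// Hypothetical trait: exposes the pointee as a raw pointer, so no
// reference is created on the way to a field projection.
trait AsRawTarget {
    type Target: ?Sized;

    fn as_raw_target(&self) -> *mut Self::Target;
}

impl<T: ?Sized> AsRawTarget for NonNull<T> {
    type Target = T;

    fn as_raw_target(&self) -> *mut T {
        // `NonNull::as_ptr` is a safe conversion to `*mut T`.
        self.as_ptr()
    }
}
```

Note that obtaining the pointer is safe here; the unsafe step is whatever the caller later does with it.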
