Cleaning up `ref`, `&`, `*` confusion

Summary:

  1. In patterns, replace ref x with new syntax *x.
  2. In patterns, replace ref mut x with new syntax mut *x.
  3. In types, replace *const T with new syntax &raw T.
  4. In types, replace *mut T with new syntax &raw mut T.
  5. In expressions, replace ptr::addr_of!(x) with new syntax &raw x.
  6. In expressions, replace ptr::addr_of_mut!(x) with new syntax &raw mut x.

Rationale:

Ad 1. Using * in place of ref in patterns would be consistent with expressions:

    match Some(x) {
        Some(x) => {}
        _ => {}
    }
    
    match &x {
        &x => {}
    }
    
    match *x {
        *x => {} // proposed pattern syntax; today this is written `ref x`
    }

Moreover, ref is confusing because it sounds like & rather than *, but &x is the opposite of ref x. It really means "deref" not "ref" if patterns are to match expressions.
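To make the "opposite" point concrete, here is a minimal sketch in today's stable syntax. The function names are made up for illustration; only the two `let` patterns matter.

```rust
/// `ref` in a pattern borrows from the matched value, so `r` is a `&i32`.
fn bind_with_ref(value: i32) -> i32 {
    let ref r = value; // behaves like `let r = &value;`
    let borrowed: &i32 = r;
    *borrowed
}

/// `&` in a pattern strips a reference, so `n` is a plain `i32`.
fn bind_with_ampersand(reference: &i32) -> i32 {
    let &n = reference; // behaves like `let n = *reference;`
    n
}
```

So the `ref` pattern corresponds to the `&` expression, and the `&` pattern corresponds to the `*` expression, which is the mismatch the proposal is trying to remove.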

It seems like a reason to avoid * in patterns is because it might be confused with raw pointers. This is addressed by points 3 and 4.

Ad 2. ref mut x could be replaced by *mut x, but mut *x is more consistent, because it's really *x that is mutable, not x.

And also *mut is the current notation for raw pointers.

Ad 3. Using &raw T is consistent with &T and &mut T. Having *const T is confusing, because * is kind of the opposite of &. And it would make idea 1 more plausible because * in patterns wouldn't be confused with raw pointers.

Ad 4. Not sure there really needs to be a distinction between *const and *mut, since the two are semantically equivalent and can be freely converted to each other. But assuming there does need to be a distinction, &raw mut is consistent with &raw.

Ad 5, 6. We're already introducing the raw keyword, so might as well use it here. The notation &raw x is less clunky than either of ptr::addr_of!(x) or &x as *const T (the latter of which doesn't always work).
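One case where the cast form doesn't work is a field of a packed struct, sketched below. The struct and function names are hypothetical. The sketch uses ptr::addr_of! so it builds on older toolchains too; on recent ones `&raw const p.value` should be the equivalent spelling.

```rust
use std::ptr;

#[repr(C, packed)]
struct Packed {
    tag: u8,
    value: u32, // sits at offset 1, so it is generally misaligned
}

fn read_fields(p: &Packed) -> (u8, u32) {
    // `&p.value as *const u32` is rejected by current compilers, because it
    // would first create a reference to a misaligned field.
    let raw: *const u32 = ptr::addr_of!(p.value);
    // SAFETY: `raw` points into `*p`, which is live and initialized, and
    // `read_unaligned` places no alignment requirement on the pointer.
    let value = unsafe { raw.read_unaligned() };
    // Copying a packed field out by value is fine; only references are a problem.
    (p.tag, value)
}
```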

Edit: Added &raw and &raw mut in expressions.


Make sure to clarify what kind of syntax you are talking about. As far as I can tell, the first two are about patterns, and the second two about types. I would suggest you indicate that clearly in the summary already. Feel free to also indicate that the syntax on the right hand side is new syntax in each case; otherwise a reader might fall back to thinking in terms of existing syntax (for different syntactical categories), e.g. thinking of *x as a dereferencing expression (even though it's supposed to be new pattern syntax), or of &raw T as the &raw x-style raw_ref_op operator (for expressions, not types).

Also note that raw_ref_op operators are currently &raw mut PLACE and &raw const PLACE, not &raw PLACE. Note the additional "const".


Done

TBH I think the time for this was during the design of the match ergonomics feature. Introducing this now would likely:

  1. Cause massive ecosystem breakage, which may or may not be fixable by something like cargo fix
  2. Cause mass confusion. The proposed syntax just doesn't mesh at all with my mental model of match ergonomics.

So then the question becomes: is this worth the technical and social (in the form of asking masses of people to relearn something that already works fine) cost? I do not believe so.


I completely agree that ref and *const are confusing. This has been discussed before, and it's one of the few language design decisions that felt wrong to me when it was made and still feels wrong to me today.

I like what you are suggesting very much, but I doubt such an impactful breaking change to the language would be accepted at this stage without a stronger argument than language consistency.


I do want to point out that thanks to editions, it seems like this wouldn't be technically a breaking change. Whether it's worth the confusion of two editions having completely different syntax for the same thing is a different matter.

Although I'm not sure the churn is worth it, I don't think the change even needs to be split across an edition boundary: unless I'm missing something, all editions could continue to accept both syntaxes, but just show a deprecation warning for the old one. (This would also help with macro compatibility between code using the old and new syntaxes.)


I didn't realize this already exists as an unstable feature.

The reason for the extra const is probably to match *const T and *mut T in types, but if those types get renamed to &raw T and &raw mut T as proposed, then it is more natural to have &raw x and &raw mut x in expressions respectively.

Getting rid of const in raw pointers, analogous to regular references, would make sense because this way const would always indicate a compile-time constant.


There's a reason the current syntax is &raw const «place» and not &raw «place»: lacking the const makes it a breaking change to add. Why? & raw ( a , b ) could be either a reference to the output of the function raw(a, b) or a raw reference to the tuple (a, b). raw const removes the ambiguity.
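A small sketch of the first reading, which is already valid code today. The function named `raw` is hypothetical; any user-defined function with that name would do.

```rust
/// An ordinary function that happens to be called `raw`.
fn raw(a: i32, b: i32) -> i32 {
    a + b
}

fn borrow_call_result() -> i32 {
    // Parsed as `&(raw(1, 2))`: a shared reference to the call's return value.
    // If a bare `&raw <place>` were accepted, the same tokens could also mean
    // "raw pointer to the tuple `(1, 2)`".
    let r: &i32 = &raw(1, 2);
    *r
}
```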

I'm generally in favor of &raw [const|mut] T becoming the main pointer type in the future. However, this very much should not just be a new name for * [const|mut] T; working with pointers in Rust is much harder than it should be. A bit of syntactic salt is okay to push people towards safe references, but the current state makes unsafe Rust much harder to write than the equivalent C or C++ not just because of having to deal with the aliasing requirements (void if you solely use pointers) but because of countless fixable syntactic pitfalls.

At a minimum imho, &raw mut T should strongly consider[1] being non-null, and Option<&raw mut T> the nullable type. There's also the question of whether you can hang an advisory lifetime off of the raw reference. This ties into proposals for ptr::WellFormed<T> / "the 'unsafe lifetime" as well, which is a well-formed reference which was a valid reference at some point, but is unsafe to dereference because it may have been invalidated / been made to dangle; basically references without the borrow checker.
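For comparison, this is roughly what the non-null/nullable split looks like with what std offers today, using ptr::NonNull as a stand-in for the hypothetical non-null &raw mut T. The helper names are made up for this sketch.

```rust
use std::mem::size_of;
use std::ptr::{self, NonNull};

/// Turn a possibly-null raw pointer into the "nullable" form.
fn to_option<T>(p: *mut T) -> Option<NonNull<T>> {
    NonNull::new(p)
}

/// Go back to a plain raw pointer, mapping `None` to null.
fn to_raw<T>(p: Option<NonNull<T>>) -> *mut T {
    p.map_or(ptr::null_mut(), NonNull::as_ptr)
}

/// `Option<NonNull<T>>` uses the null value for `None`, so it costs no extra space.
fn same_size_as_raw_pointer() -> bool {
    size_of::<Option<NonNull<u8>>>() == size_of::<*mut u8>()
}
```

The round trip works, but every API boundary that still speaks *mut T needs one of these conversions, which is the tedium the footnotes complain about.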

This of course also has to overcome the inertia of all of std's raw APIs dealing in *mut T, and the decision not to expose versions that deal in ptr::NonNull. From the current place in time, I think that this was the correct choice (because we can and should do better than ptr::NonNull), but I think properly migrating to a better &raw mut T type will involve some unfortunate implicit conversion tricks between *mut T and Option<&raw mut T> to happen cleanly[2].

There's a large amount of design space out there for Unsafe Rust to do better at being a better C. Imho, Safe Rust already is a better C, but if you want to write Unsafe Rust, the value proposition is solely in seamless Safe Rust interop, and not in the utility of Unsafe Rust. If we do so deliberately, I think we can make writing Unsafe Rust an order of magnitude nicer with careful design.


  1. The practice of e.g. pointer tagging a null pointer means that sometimes treating a maybe-null pointer equivalently to a not-null pointer is desirable, so I could be convinced otherwise. Plus, doing things "properly" with Option<ptr::NonNull<T>> currently is exhausting[3], so there would need to be more new convenience functionality available.

  2. Short draft: all std APIs switch from dealing in *mut T to &raw mut T (if required/guaranteed non-null) or Option<&raw mut T> (if null is possible). On previous editions (but removed in future editions), there are free coercions *mut T <=> &raw mut T and *mut T <=> Option<&raw mut T>, such that existing editions can continue using the APIs as currently. (This needs testing to determine how bad it'd be for type inference fallout...)

  3. To the point where most people's advice is to use ptr::NonNull at rest, and downgrade to *mut T for interfaces and actually doing any pointer work.


Ah got it. I was thinking raw would be a keyword (in a future edition) so the ambiguity wouldn't exist. On the other hand, ref would no longer have to be a keyword.

While this is possible, the lang team much prefers to avoid new keywords that clash with commonly used identifiers in the wild. And, well, raw is used in std for std::os::raw.


However the documentation for std::os::raw says:

Use core::ffi instead.

So maybe it's OK to make std::os::raw harder to access.

That's not the point. Having a module named raw which directly binds to a low-level API is already a blessed pattern, so you can expect to see it in many real-world crates.

But even if it wasn't, adding a new keyword is always a big backwards compatibility risk.

With regard to the proposal, the renaming of pointer types feels unmotivated, except for the consistency with the pattern syntax. However, the confusion around ref/ref mut is pretty much solved by match ergonomics, and I barely see it in new code. For this reason optimizing anything for ref patterns feels strongly counterproductive (a major backwards compatibility issue is traded for a basically solved problem).
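A quick sketch of why ref rarely shows up in new code. Both functions below do the same thing; the names are made up for illustration.

```rust
/// Pre-ergonomics style: dereference the scrutinee, then borrow back with `ref`.
fn len_with_ref(opt: &Option<String>) -> usize {
    match *opt {
        Some(ref s) => s.len(),
        None => 0,
    }
}

/// With match ergonomics: matching on the reference directly makes `s` a `&String`.
fn len_with_ergonomics(opt: &Option<String>) -> usize {
    match opt {
        Some(s) => s.len(),
        None => 0,
    }
}
```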

The pointer types are also named this way specifically to be familiar to C(++) people, and for that reason it is unlikely to ever change.

I'm totally with you that this is unintuitive, but the syntax is very deliberate.

Patterns are duals of expressions. The reason why & does the opposite thing in patterns is not any syntactical ambiguity, but consistency with the fact that Some(x) in patterns also does the opposite thing! It unwraps Some instead of wrapping in Some.

https://h2co3.github.io/pattern/

If I was a time-travelling overlord of Rust, I'd make * dereference in patterns. Unfortunately, at this point Rust syntax is done.


I'll toss in that it might want to also consider having alignment in the type somehow. Imagine if we had Pointer<T, ALIGN = align_of::<T>()>, for example -- then .read() could always use its statically-known alignment, there'd be Pointer<T>: From<&T>, etc.

*mut T is invariant in T, while *const T is covariant in T.
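The variance difference can be seen with lifetimes. A minimal sketch with hypothetical function names; the *mut version is left commented out because, as far as I can tell, the borrow checker rejects it.

```rust
/// Compiles: `*const T` is covariant in `T`, so a pointer to a longer-lived
/// reference can be returned where a pointer to a shorter-lived one is expected.
fn shorten_const<'short, 'long: 'short>(p: *const &'long str) -> *const &'short str {
    p
}

// The `*mut` counterpart should be rejected, because `*mut T` is invariant in `T`:
//
// fn shorten_mut<'short, 'long: 'short>(p: *mut &'long str) -> *mut &'short str {
//     p
// }

/// Converting between the two pointer kinds is still a plain cast.
fn to_mut<T>(p: *const T) -> *mut T {
    p as *mut T
}
```

So the two types convert freely with `as`, but they are not interchangeable where subtyping is concerned.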

Yes but that can be done with PhantomData. The only reason they're really needed is as a lint, but I do think it is important.

I'm just pointing out that the two, as Rust is defined today, have different semantics, i.e. correcting a false statement.


I wouldn't go that far, but I think switching to * in patterns seems like more churn than value at this point.


I read it as "referent", i.e. ref id as "the referent of (newly created binding) id".

Is it? I assumed it's just because * isn't an exact inverse of & (note Deref) and ref's pattern semantics are therefore not a mirror of * in the same way other patterns mirror their expression counterparts.
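A small sketch of the Deref point, with hypothetical function names: `*` also works on values that are not references at all, so it cannot simply be "undo &".

```rust
/// `s` is an owned `String`, not a reference, yet `*s` is allowed: it goes
/// through `Deref` and names the underlying `str`, so `&*s` is a `&str`.
fn through_deref(s: String) -> usize {
    let view: &str = &*s;
    view.len()
}

/// `Box<T>` can likewise be dereferenced, here moving the value out of the box.
fn through_box(b: Box<i32>) -> i32 {
    *b
}
```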

Since match ergonomics the "mirroring" between patterns and expressions is not quite as exact as it used to be, so special syntax is perhaps less justified. But then match ergonomics has also greatly diminished the need for ref, so fiddling around with it now would only affect a tiny fraction of code.