I'm wondering if it would now make sense to introduce a split NonNull?
I appreciate lifetimes and the different raw pointer types, and having standard library types that express whether a NonNull pointer might be mutated or is strictly read-only seems like a useful addition to me.
The truly general solution would be to have a wrapper type that makes whatever is inside have a niche. That would work for all of: non-null raw pointers, non-zero integers, non-max integers, and whatever funky datatypes users may want to implement.
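For reference, this is the niche behaviour that NonNull and the NonZero integers already get from the compiler today. A minimal sketch using only std; the helper name `option_is_free` is made up for illustration:

```rust
use std::mem::size_of;
use std::num::NonZeroU32;
use std::ptr::NonNull;

/// Returns true when wrapping `T` in `Option` costs no extra space,
/// i.e. when `T` has a niche the compiler can use to encode `None`.
/// (Hypothetical helper, not part of std.)
pub fn option_is_free<T>() -> bool {
    size_of::<Option<T>>() == size_of::<T>()
}

/// Collects the answer for a few representative types.
pub fn niche_report() -> [(&'static str, bool); 4] {
    [
        ("NonNull<u8>", option_is_free::<NonNull<u8>>()),
        ("NonZeroU32", option_is_free::<NonZeroU32>()),
        ("u32", option_is_free::<u32>()),
        ("*mut u8", option_is_free::<*mut u8>()),
    ]
}
```

A general niche wrapper would let user types join the first two rows without compiler-internal attributes.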
I'd rather see us move away from raw pointers and NonNull. We can already use references at various levels of composable granularity.
&(mut) T - all guarantees
&(mut) MaybeUninit<T> - opts out of init
&(mut) UnsafeCell<T> - opts out of noalias
Option<&(mut) T> - opts out of nonnull
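To make the list above concrete, here is a rough sketch of what each opt-out looks like with today's std types. The function names are only illustrative:

```rust
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;

/// `&mut MaybeUninit<T>`: the reference itself is valid, the contents may not be.
/// Writing initializes the slot and hands back an ordinary `&mut T`.
pub fn init_slot(slot: &mut MaybeUninit<u32>) -> &mut u32 {
    slot.write(7)
}

/// `&UnsafeCell<T>`: a shared reference that does not promise immutability.
pub fn bump(cell: &UnsafeCell<u32>) {
    // SAFETY: `UnsafeCell` is `!Sync` and this function leaks no reference
    // into the contents, so the write cannot race or alias a live reference.
    unsafe { *cell.get() += 1 };
}

/// `Option<&T>`: a reference that may be absent, with the size of one pointer.
pub fn read_or_zero(p: Option<&u32>) -> u32 {
    p.copied().unwrap_or(0)
}
```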
There are some cases that aren't currently covered by the above:
Pointers without a lifetime known at compile time: unsafe<'a> &'a (mut) T?
Offsets (&T only has provenance over the specific T, not adjacent memory)
Casts
Unaligned and volatile accesses
etc
We could focus our efforts on enabling these things. Then we wouldn't need raw pointers at all, and the programmer could opt out of exactly the guarantees they need to.
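For comparison, this is roughly how the uncovered cases (offsets, casts, unaligned and volatile accesses) are spelled with raw pointers today. A small sketch with made-up function names, using only std:

```rust
use std::ptr;

/// Reads a `u32` starting at byte `offset` of `bytes`, regardless of alignment.
/// Returns `None` if the read would run past the end of the slice.
pub fn read_u32_at(bytes: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    if end > bytes.len() {
        return None;
    }
    // Offset: step inside the slice's allocation. Cast: reinterpret as *const u32.
    // SAFETY: `offset` is in bounds, checked above.
    let p = unsafe { bytes.as_ptr().add(offset) }.cast::<u32>();
    // SAFETY: the four bytes at `p` are in bounds and initialized;
    // `read_unaligned` does not require `p` to be aligned for `u32`.
    Some(unsafe { ptr::read_unaligned(p) })
}

/// Volatile read: the compiler may not elide or merge this access.
pub fn read_volatile_u32(r: &u32) -> u32 {
    // SAFETY: `r` is a valid, aligned, initialized reference.
    unsafe { ptr::read_volatile(r) }
}
```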
The other thing I've seen mentioned is to, with an edition, switch raw pointers to be non-null by default. Then *mut T would be what NonNull<T> is today, and Option<*mut T> would be what *mut T is today.
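Under that proposal the conversion between the two forms would presumably look much like today's NonNull round trip. A sketch of the current spelling (the function names are mine):

```rust
use std::ptr::{self, NonNull};

/// Today's version of "nullable pointer in, optional non-null pointer out".
pub fn from_nullable<T>(p: *mut T) -> Option<NonNull<T>> {
    NonNull::new(p)
}

/// And back: `None` becomes the null pointer.
pub fn to_nullable<T>(p: Option<NonNull<T>>) -> *mut T {
    p.map_or(ptr::null_mut(), NonNull::as_ptr)
}
```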
I'd love to nuke our current raw pointers and replace them with something better.
Because just non-null isn't enough, to really be good we also want the option to have a validity invariant for alignment too. It'd be nice to nuke read_unaligned and just be able to read whatever using the known alignment of the pointer, whether that's bigger or smaller than the type's ABI alignment.
(And it'd be nice to just say that the only way to have null pointers is via None.)
My use case is FFI, where I can either use *const T and *mut T but am unable to convey that they can't be null, or use NonNull<T>, which feels more Rust-y but then loses the ability to provide the const/mut distinction. This is a bit frustrating and makes data structure definitions worse than they could be.
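To illustrate what I mean, here is a sketch of a "split NonNull" done by hand. `NonNullConst`, `NonNullMut` and `Request` are hypothetical names, not std types, and this is just one possible shape:

```rust
use std::ptr::NonNull;

/// Hypothetical read-only non-null pointer (not a std type).
#[repr(transparent)]
pub struct NonNullConst<T>(NonNull<T>);

/// Hypothetical mutable non-null pointer (not a std type).
#[repr(transparent)]
pub struct NonNullMut<T>(NonNull<T>);

impl<T> NonNullConst<T> {
    pub fn new(p: *const T) -> Option<Self> {
        NonNull::new(p as *mut T).map(Self)
    }
    /// Only ever hands out a `*const T`.
    pub fn as_ptr(&self) -> *const T {
        self.0.as_ptr()
    }
}

impl<T> NonNullMut<T> {
    pub fn new(p: *mut T) -> Option<Self> {
        NonNull::new(p).map(Self)
    }
    pub fn as_ptr(&self) -> *mut T {
        self.0.as_ptr()
    }
    /// Mutable-to-const is always fine; the reverse is deliberately absent.
    pub fn as_const(&self) -> NonNullConst<T> {
        NonNullConst(self.0)
    }
}

/// Example FFI-facing struct: the field types now document intent.
#[repr(C)]
pub struct Request {
    pub input: NonNullConst<u8>,
    pub output: NonNullMut<u8>,
}
```

It works, but every project that wants this ends up writing its own copy, which is why a std version seems worthwhile to me.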
Making the default pointer type non-null and using Option<*const T> to express a pointer that can be null looks way more appealing to me than the current status quo. It would also make raw pointers more consistent with regular references (which can be wrapped in Option<> for a similar effect). After all, in my limited experience with C/C++, non-null raw pointers seem to be the more common case.
In fact it may make them more similar than different, only lifetime presence stands out. Maybe a way to erase the lifetime like proposed above for <'a> &'a T or even something simpler like &'' T is all that is really necessary to abandon raw pointers?
Digression about numbers
After writing the above, I thought about numbers. We sort of have a similar situation there with u* and NonZeroU*, but there, having the default include zero seems to make more sense.
Deprecating/replacing raw pointers is a bold proposition, even over an edition, though I agree that it'd be a positive step forward.
I don't like the wrapper types because the nesting order often doesn't matter if you're just toggling orthogonal properties. What's the difference between MaybeUninit<UnsafeCell<T>> and UnsafeCell<MaybeUninit<T>>? Especially if you want to add or remove one of those properties but keep the others, that can require yoinking one of the types out of the middle of the stack, which is quite ugly.
A Ptr<T, NOALIAS=true, NULLABLE=true, ALIGNED=false, ...> would represent this orthogonality better imo. To avoid the boilerplate one could then define a bunch of aliases for the commonly used combinations.
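Something along these lines can already be sketched with const generics and defaults. This is a toy version with only two of the flags; `Ptr`, its methods and the alias names are all hypothetical:

```rust
/// Hypothetical pointer type whose guarantees are independent const flags.
pub struct Ptr<T, const NULLABLE: bool = true, const ALIGNED: bool = false> {
    raw: *mut T,
}

impl<T> Ptr<T, true, false> {
    /// Any raw pointer is acceptable when no guarantee is claimed.
    pub fn new(raw: *mut T) -> Self {
        Ptr { raw }
    }
}

impl<T, const ALIGNED: bool> Ptr<T, true, ALIGNED> {
    /// Toggles nullability without touching the alignment flag.
    pub fn check_non_null(self) -> Option<Ptr<T, false, ALIGNED>> {
        if self.raw.is_null() {
            None
        } else {
            Some(Ptr { raw: self.raw })
        }
    }
}

impl<T, const NULLABLE: bool> Ptr<T, NULLABLE, false> {
    /// Toggles alignment without touching the nullability flag.
    pub fn check_aligned(self) -> Option<Ptr<T, NULLABLE, true>> {
        if (self.raw as usize) % std::mem::align_of::<T>() == 0 {
            Some(Ptr { raw: self.raw })
        } else {
            None
        }
    }
}

impl<T, const NULLABLE: bool, const ALIGNED: bool> Ptr<T, NULLABLE, ALIGNED> {
    pub fn as_raw(&self) -> *mut T {
        self.raw
    }
}

/// Aliases for commonly used combinations.
pub type RawPtr<T> = Ptr<T>; // nullable, possibly unaligned
pub type NonNullPtr<T> = Ptr<T, false, false>;
pub type WellFormedPtr<T> = Ptr<T, false, true>;
```

Each property gets its own impl block, so adding or dropping one never requires unwrapping the others.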
Another thing I've been pondering: we could also have the one primitive pointer type be entirely untyped, to avoid the variance questions and such. Would also help avoid the bazillion casts we currently end up with in MIR when you try to do stuff with pointers.
Then you basically just end up with Ptr<const A: ptr::Alignment>, and pass the type as which you want to read/write, and other types can be wrappers around that as needed.
Would this be like void* in C? How is it different than *mut ()?
While there are use cases for type erased pointers, I think the goal should be to guide developers to opt out of as few safety guards as possible. 99.9% of the time I absolutely want typed pointers so the compiler ensures I don't screw up.
As an internal representation detail it is fine, as a rust user I don't have any opinions on the internals of MIR, but as a user visible type I'd be wary of it being abused and making bugs harder to find. Writing unsafe rust is too tricky as it is already.
(I'm going to say that writing unsafe Rust with raw pointers today is harder than writing the same thing in C or C++. This is in stark contrast to safe Rust and "library-unsafe" Rust[1] which is much easier than those two languages. For context: I have a decade of systems level C++ experience, and about 3 years of systems level Rust experience at this point.)
[1] E.g. unsafe around str UTF-8, unsafe Sync/Send, that sort of thing. Basically any unsafe that isn't about raw pointers.
Well, for one, you don't have to pick between *mut () and *const (). But also it'd be more like *(mut|const) CanonicalExternType, so you can't uselessly .read() it as a unit.
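To show what I mean by the useless unit read, here is how type erasure via *const () behaves today. A small sketch; the function names are just for illustration:

```rust
/// Erases the pointee type, as one does today with `*const ()`.
pub fn erase<T>(r: &T) -> *const () {
    (r as *const T).cast::<()>()
}

/// Recovers a typed value from an erased pointer.
///
/// # Safety
/// `p` must come from `erase::<T>` on a referent that is still live.
pub unsafe fn read_as<T: Copy>(p: *const ()) -> T {
    unsafe { p.cast::<T>().read() }
}

/// Compiles fine, but reads zero bytes and tells you nothing.
pub fn read_unit<T>(r: &T) {
    let p = erase(r);
    // SAFETY: `p` is non-null, and `()` has size 0 and alignment 1.
    let _unit: () = unsafe { p.read() };
}
```

With an extern-type-like pointee, `read_unit` would simply not compile, because the pointee would have no known size.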
I think essentially all raw pointers should be wrapped, because there should be more safety guards. Even if just in something like a CppRef<'a, T>, as discussed in the custom receivers RFC. But that type would add PhantomInvariant or whatever as desired.
Once I'm dropping fully to raw pointers, I find that it's often because I'm doing something weird anyway, and getting rid of the cast/cast_mut/cast_const calls would be nice.
Rust's raw pointers were a major pain point in zerocopy, where we found it difficult to keep track of the relationship between implicit invariants we imposed on our raw pointers, and the safety requirements of the stdlib's unsafe methods.
We've since migrated to our own abstraction over the pointer types, Ptr<'a, T, I>, where 'a is the lifetime of the referent, T is the type of the referent, and I is a typestate encoding the invariants known about the referent.
For example, Ptr<'static, NonZeroU16, (Shared, Aligned, Initialized)> is a static, shared pointer to initialized memory that's well-aligned for a NonZeroU16 (but not necessarily non-zero). The zero-cost transition methods between different typestates are unsafe, but once the right typestate is reached, operations like turning a Ptr into a &T are safe.
This has drastically improved the quality of our unsafe code. Rather than prove long lists of safety properties on pointer ops, we instead provide narrowly-scoped safety comments on the typestate transitions (e.g., unsafely asserting that the referent of the pointer is well-aligned).
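To give a flavour of the idea, here is a heavily simplified sketch that tracks only alignment. It is not zerocopy's actual API: the marker names `Unknown` and `Aligned` and all method names are made up for this example, and it assumes every pointer is shared and initialized:

```rust
use std::marker::PhantomData;
use std::ptr::NonNull;

/// Marker types for the alignment typestate (illustrative names).
pub struct Unknown;
pub struct Aligned;

/// Hypothetical typestate pointer: `A` records what is known about alignment.
pub struct Ptr<'a, T, A> {
    raw: NonNull<T>,
    _marker: PhantomData<(&'a T, A)>,
}

impl<'a, T> Ptr<'a, T, Aligned> {
    /// A reference already proves validity, alignment and initialization.
    pub fn from_ref(r: &'a T) -> Self {
        Ptr { raw: NonNull::from(r), _marker: PhantomData }
    }

    /// Safe: the typestate carries the proof obligations.
    pub fn as_ref(&self) -> &'a T {
        // SAFETY: in this sketch every `Ptr<_, _, Aligned>` points to a live,
        // initialized, aligned `T` that is only shared for `'a`.
        unsafe { self.raw.as_ref() }
    }

    /// Safe: forgetting a guarantee never needs a proof.
    pub fn forget_aligned(self) -> Ptr<'a, T, Unknown> {
        Ptr { raw: self.raw, _marker: PhantomData }
    }
}

impl<'a, T> Ptr<'a, T, Unknown> {
    /// Unsafe typestate transition with one narrowly scoped obligation.
    ///
    /// # Safety
    /// The caller asserts that the referent is aligned for `T`.
    pub unsafe fn assume_aligned(self) -> Ptr<'a, T, Aligned> {
        Ptr { raw: self.raw, _marker: PhantomData }
    }
}
```

The safety comment on `assume_aligned` is the only place alignment has to be argued; `as_ref` needs no caller-side proof at all.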
Yes! It's a tricky component to spin out, but we're working on it!
One challenge we've stumbled upon is abstracting over variance. Ideally, Ptr<'a, T, (Shared, ..)> would be covariant over T, and Ptr<'a, T, (Unique, ..)> would be invariant over T. We initially thought we might be able to solve this with GATs, but GATs are always invariant.
This has led us to begrudgingly consider that we might need two Ptr types, one covariant and the other invariant. We really don't want to duplicate that much code! If anyone has ideas of how to abstract over variance, we'd love to hear them.
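For anyone following along, the difference is easy to see with two tiny wrappers. Variance is inferred from a struct's fields, so the choice of field type fixes it. `CovPtr` and `InvPtr` are illustrative names only:

```rust
/// Covariant in `T`, because `*const T` is covariant in `T`.
pub struct CovPtr<T>(pub *const T);

/// Invariant in `T`, because `*mut T` is invariant in `T`.
pub struct InvPtr<T>(pub *mut T);

/// Compiles: `CovPtr<&'static str>` is a subtype of `CovPtr<&'a str>`.
pub fn shorten<'a>(p: CovPtr<&'static str>) -> CovPtr<&'a str> {
    p
}

// The same function for `InvPtr` is rejected by the compiler
// ("lifetime may not live long enough"):
//
// pub fn shorten_inv<'a>(p: InvPtr<&'static str>) -> InvPtr<&'a str> {
//     p
// }
```

A single generic struct gets exactly one inferred variance per parameter, which is why picking it from a typestate parameter is hard to express.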
If there is a relatively simple language level change/addition that could make this easier I'd love to hear your thoughts on that. Something like that might be worth it for this Ptr type alone.
Variance only makes sense with respect to certain generic parameters. Would it be possible to treat the mutability[1] of the pointer specially by passing the T as an argument to the appropriate typestate struct?
It isn't actually C++ FFI, but Rust<->Rust (across a VM abstraction). The point is not whether some checks can be done manually, but rather whether certain contracts can be expressed at the type-system level. And it looks like there are plenty of people wanting to express various invariants that are not possible to express today, at least not ergonomically or with the standard library.
On looking at just giving *const T its own type:
Is the challenge for Rust to define what is meant by *const T? For example, Rust can permit inner mutability of any field in struct T, so would *const T mean a read-only version of a data type where the once mutable fields have been made read-only const? For the provenance of a new const pointer? Also, would the definition of *const T exclude any impl added to T, for simplicity, to pass just the data? If so, would that suit the use case? I think it might be the simplest and most straightforward version, e.g.
*const T = NonNullConstAllDataOnly ?
Would the ask then be for a const T* to be a stripped-down T, with only the public fields of the struct, omitting impls, omitting traits, and changing inner mutable fields to constant pointers to the owner reference? Then lastly, the developer would be responsible for knowing that there would be thread safety and lifetime safety for all the raw pointers contained within NonNullConstAllDataOnly?
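As a data point for these questions: in today's Rust, *const T does not make the referent read-only. A short sketch (the function names are mine) showing that interior mutability still works through it, and that it converts to *mut T with a safe call:

```rust
use std::cell::Cell;

/// Mutates through a `*const` pointer using interior mutability.
///
/// # Safety
/// `p` must point to a live `Cell<i32>`.
pub unsafe fn bump_through_const(p: *const Cell<i32>) {
    let cell: &Cell<i32> = unsafe { &*p };
    cell.set(cell.get() + 1);
}

/// `*const T` converts to `*mut T` with a safe method call;
/// only a later write through the result is unsafe.
pub fn const_to_mut<T>(p: *const T) -> *mut T {
    p.cast_mut()
}
```

So a strictly read-only pointer type would be a new guarantee, not just a rename of what *const T means now.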