Two flavors of NonNull (again)

I'm interested in having separate NonNull variants for *const T and *mut T pointers.

This was already suggested a while ago in "Seperating NonNull into NonNullMut and NonNullConst", with relevant discussions being "Why do we need two kinds of raw pointers?" and "NonNull for *const T - help - The Rust Programming Language Forum".

"Seperating NonNull into NonNullMut and NonNullConst" was posted more than 5 years ago, and a lot has happened since, most notably strict provenance.

I'm wondering if it would now make sense to introduce a split NonNull?

I appreciate lifetimes and the different raw pointer types. Having standard library types that express whether a NonNull pointer might be mutated through or is strictly read-only seems like a useful addition to me.

The truly general solution would be to have a wrapper type that makes whatever is inside have a niche. That would work for all of: non-null raw pointers, non-zero integers, non-max integers, and whatever funky datatypes users may want to implement.

I'd rather see us move away from raw pointers and NonNull. We can already use references at various levels of composable granularity.

  • &(mut) T - all guarantees
  • &(mut) MaybeUninit<T> - opts out of init
  • &(mut) UnsafeCell<T> - opts out of noalias
  • Option<&(mut) T> - opts out of nonnull
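The layout side of this list can be checked on stable Rust. The sketch below is only an illustration (the function names are made up here): each "opted-out" reference is still a single thin pointer, and `Option<&T>` reuses the null niche.

```rust
use std::cell::UnsafeCell;
use std::mem::{size_of, MaybeUninit};

/// Sizes of a plain reference and of each variant from the list above.
pub fn reference_sizes() -> [usize; 4] {
    [
        size_of::<&u64>(),
        size_of::<&MaybeUninit<u64>>(),
        size_of::<&UnsafeCell<u64>>(),
        size_of::<Option<&u64>>(),
    ]
}

/// Writing through a reference that opts out of the init guarantee.
/// `MaybeUninit::write` initializes the slot and hands back `&mut u64`.
pub fn init_through_ref(slot: &mut MaybeUninit<u64>) -> u64 {
    *slot.write(7)
}
```

All four sizes are equal to `size_of::<usize>()`, so opting out of a guarantee this way costs nothing in representation.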

There are some cases that aren't currently covered by the above:

  • Pointers without a lifetime known at compile time: unsafe<'a> &'a (mut) T?
  • Offsets (&T only has provenance over the specific T, not adjacent memory)
  • Casts
  • Unaligned and volatile accesses
  • etc

We could focus our efforts on enabling these things, then we wouldn't need raw pointers at all, and the programmer can opt out of exactly which guarantees they need to.

The other thing I've seen mentioned is to, with an edition, switch raw pointers to be non-null by default. Then *mut T would be what NonNull<T> is today, and Option<*mut T> would be what *mut T is today.
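For reference, the correspondence this would rely on already holds today between `*mut T` and `Option<NonNull<T>>`. A small sketch (the helper names are invented for illustration):

```rust
use std::mem::size_of;
use std::ptr::NonNull;

/// Today's nullable raw pointer, expressed as "Option of non-null".
pub fn to_option<T>(p: *mut T) -> Option<NonNull<T>> {
    NonNull::new(p)
}

/// The reverse direction: `None` becomes the null pointer.
pub fn to_raw<T>(p: Option<NonNull<T>>) -> *mut T {
    p.map_or(std::ptr::null_mut(), NonNull::as_ptr)
}

/// The `Option` wrapper is free thanks to the null niche.
pub fn same_size() -> bool {
    size_of::<Option<NonNull<u8>>>() == size_of::<*mut u8>()
}
```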

I'd love to nuke our current raw pointers and replace them with something better.

Because just non-null isn't enough, to really be good we also want the option to have a validity invariant for alignment too. It'd be nice to nuke read_unaligned and just be able to read whatever using the known alignment of the pointer, whether that's bigger or smaller than the type's ABI alignment.

(And it'd be nice to just say that the only way to have null pointers is via None.)
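To make the status quo concrete: today the pointer type says nothing about alignment, so the caller has to pick `read_unaligned` by hand. A minimal sketch (`read_u32_at` is a made-up helper):

```rust
use std::ptr;

/// Reads a `u32` starting at `offset` in `bytes`, regardless of alignment.
/// Returns `None` if the read would run past the end of the slice.
pub fn read_u32_at(bytes: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    if end > bytes.len() {
        return None;
    }
    // SAFETY: `offset..offset + 4` is in bounds, and `read_unaligned`
    // places no alignment requirement on the pointer.
    Some(unsafe { ptr::read_unaligned(bytes.as_ptr().add(offset).cast::<u32>()) })
}
```

With alignment tracked in the pointer type, the choice between `read` and `read_unaligned` could presumably be made by the compiler instead.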

My use case is FFI, where I can either use *const T and *mut T, but am then unable to convey that they can't be null, or NonNull<T>, which feels more Rust-y, but then loses the ability to express the const/mut distinction. This is a bit frustrating and makes data structure definitions worse than they could have been.

Making the default pointer type non-null and using Option<*const T> to express a pointer that can be null looks way more appealing to me than the current status quo. It would also make raw pointers more consistent with regular references (which can be wrapped in Option<> for a similar effect). After all, non-null raw pointers seem to be the more common case, from my limited experience with C/C++.
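Roughly what I have in mind, as a user-space sketch. `NonNullConst` and `Request` are hypothetical names, not std types; the wrapper simply hides the `*mut T` side of `NonNull<T>`:

```rust
use std::ptr::NonNull;

/// Hypothetical read-only counterpart of `NonNull<T>` (not in std).
#[repr(transparent)]
pub struct NonNullConst<T>(NonNull<T>);

impl<T> Clone for NonNullConst<T> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<T> Copy for NonNullConst<T> {}

impl<T> NonNullConst<T> {
    /// Returns `None` for a null pointer.
    pub fn new(p: *const T) -> Option<Self> {
        NonNull::new(p.cast_mut()).map(Self)
    }

    pub fn from_ref(r: &T) -> Self {
        Self(NonNull::from(r))
    }

    /// Only a `*const T` is ever handed back out.
    pub fn as_ptr(self) -> *const T {
        self.0.as_ptr().cast_const()
    }
}

/// Hypothetical FFI struct: both fields are non-null,
/// and the types say which one may be written through.
#[repr(C)]
pub struct Request {
    pub input: NonNullConst<u8>,
    pub output: NonNull<u8>,
    pub len: usize,
}
```

Because the wrapper is `#[repr(transparent)]`, `Option<NonNullConst<T>>` keeps the null niche and stays pointer-sized.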

In fact it may make them more similar than different, only lifetime presence stands out. Maybe a way to erase the lifetime like proposed above for <'a> &'a T or even something simpler like &'' T is all that is really necessary to abandon raw pointers? :thinking:

Digression about numbers

After writing the above, I thought about numbers. We sort of have a similar situation there with u* and NonZeroU*, but there having the nullable type as the default seems to make more sense.

Deprecating/replacing raw pointers is a bold proposition, even over an edition, though I agree that'd be a positive step forward.
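For comparison, the integer side of this as it exists in std today (the function names below are just for illustration):

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// Sizes of `u32`, `Option<NonZeroU32>` and `Option<u32>`, in that order.
pub fn sizes() -> (usize, usize, usize) {
    (
        size_of::<u32>(),
        size_of::<Option<NonZeroU32>>(),
        size_of::<Option<u32>>(),
    )
}

/// Zero maps to `None`, everything else to `Some`.
pub fn checked(n: u32) -> Option<NonZeroU32> {
    NonZeroU32::new(n)
}
```

`Option<NonZeroU32>` uses zero as its niche and stays at 4 bytes, while `Option<u32>` needs a separate tag.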

I have seen various people talk about this over the years. (E.g. Rust's Unsafe Pointer Types Need An Overhaul - Faultlore)

Is anyone actually working on this?

I don't like the wrapper types because the nesting order often doesn't matter if you're just toggling orthogonal properties. What's the difference between MaybeUninit<UnsafeCell<T>> and UnsafeCell<MaybeUninit<T>>? Especially if you want to add or remove one of those properties but keep the others: that can require yoinking one of the types out of the middle of the stack, which is quite ugly.

A Ptr<T, NOALIAS=true, NULLABLE=true, ALIGNED=false, ...> would represent this orthogonality better imo. To avoid the boilerplate one could then define a bunch of aliases for the commonly used combinations.
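Something close to this can be sketched with const generics on stable. Everything below is hypothetical (names, flags, methods), and unlike a real language feature the flags here do not give the type a niche:

```rust
/// Hypothetical flag-parameterised pointer; defaults match today's `*mut T`.
pub struct Ptr<T, const NULLABLE: bool = true, const ALIGNED: bool = false> {
    raw: *mut T,
}

impl<T, const ALIGNED: bool> Ptr<T, true, ALIGNED> {
    /// Any raw pointer is a valid nullable `Ptr`.
    pub fn from_raw(raw: *mut T) -> Self {
        Self { raw }
    }

    /// A null check flips only the NULLABLE flag.
    pub fn non_null(self) -> Option<Ptr<T, false, ALIGNED>> {
        if self.raw.is_null() {
            None
        } else {
            Some(Ptr { raw: self.raw })
        }
    }
}

impl<T, const NULLABLE: bool> Ptr<T, NULLABLE, false> {
    /// An alignment check flips only the ALIGNED flag.
    pub fn check_aligned(self) -> Option<Ptr<T, NULLABLE, true>> {
        if (self.raw as usize) % std::mem::align_of::<T>() == 0 {
            Some(Ptr { raw: self.raw })
        } else {
            None
        }
    }
}

impl<T: Copy> Ptr<T, false, true> {
    /// # Safety
    /// The pointer must refer to a live, initialized `T`.
    pub unsafe fn read(self) -> T {
        unsafe { self.raw.read() }
    }
}

// Aliases for commonly used combinations.
pub type NullablePtr<T> = Ptr<T>;
pub type StrictPtr<T> = Ptr<T, false, true>;
```

Each property is toggled independently, so there is no nesting order to worry about.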

const and mut is not really sufficient, we really should have a third type for pointers to owned memory:

Another thing I've been pondering: we could also have the one primitive pointer type be entirely untyped, to avoid the variance questions and such. Would also help avoid the bazillion casts we currently end up with in MIR when you try to do stuff with pointers.

Then you basically just end up with Ptr<const A: ptr::Alignment>, and pass the type as which you want to read/write, and other types can be wrappers around that as needed.

(Somewhat inspired by the good things that came from the Opaque Pointers — LLVM 21.0.0git documentation transition)
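A rough user-space approximation of that shape, as a sketch only. `RawPtr` is a made-up name, and the alignment is a plain `usize` const parameter here (an assumption made to stay on stable) rather than `ptr::Alignment`:

```rust
/// Hypothetical untyped pointer that only knows its alignment.
#[derive(Clone, Copy)]
pub struct RawPtr<const ALIGN: usize>(*mut u8);

impl<const ALIGN: usize> RawPtr<ALIGN> {
    /// Returns `None` if the address is not a multiple of `ALIGN`.
    pub fn new<T>(p: *mut T) -> Option<Self> {
        assert!(ALIGN.is_power_of_two());
        if (p as usize) % ALIGN == 0 {
            Some(Self(p.cast()))
        } else {
            None
        }
    }

    /// The type is chosen at the access, not stored in the pointer.
    ///
    /// # Safety
    /// The pointer must refer to live, initialized memory holding a valid `T`.
    pub unsafe fn read<T: Copy>(self) -> T {
        let p = self.0.cast::<T>();
        if ALIGN >= std::mem::align_of::<T>() {
            unsafe { p.read() }
        } else {
            // Known alignment is too small for `T`: fall back automatically.
            unsafe { p.read_unaligned() }
        }
    }

    /// # Safety
    /// The pointer must be valid for writes of `size_of::<T>()` bytes.
    pub unsafe fn write<T>(self, value: T) {
        let p = self.0.cast::<T>();
        if ALIGN >= std::mem::align_of::<T>() {
            unsafe { p.write(value) }
        } else {
            unsafe { p.write_unaligned(value) }
        }
    }
}
```

Note that no cast appears at the call sites: the same `RawPtr<4>` can be read as a `u32` or as `[u8; 4]`.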

Would this be like void* in C? How is it different than *mut ()?

While there are use cases for type erased pointers, I think the goal should be to guide developers to opt out of as few safety guards as possible. 99.9% of the time I absolutely want typed pointers so the compiler ensures I don't screw up.

As an internal representation detail it is fine; as a Rust user I don't have any opinions on the internals of MIR. But as a user-visible type, I'd be wary of it being abused and making bugs harder to find. Writing unsafe Rust is too tricky as it is already.

(I'm going to say that writing unsafe Rust with raw pointers today is harder than writing the same thing in C or C++. This is in stark contrast to safe Rust and "library-unsafe" Rust[1], which are much easier than those two languages. For context: I have a decade of systems-level C++ experience, and about 3 years of systems-level Rust experience at this point.)


  1. E.g. unsafe around str UTF-8, unsafe Sync/Send, that sort of thing. Any unsafe that isn't about raw pointers, basically. ↩︎

Well, for one you don't have to pick if *mut () or *const () :stuck_out_tongue: But also it'd be more like *(mut|const) CanonicalExternType, so you can't uselessly .read() it as a unit.

I think essentially all raw pointers should be wrapped, because there should be more safety guards. Even if just in something like a CppRef<'a, T>, as discussed in the custom receivers RFC. But that type would add PhantomInvariant or whatever as desired.

Once I'm dropping fully to raw pointers, I find that it's often because I'm doing something weird anyway, and getting rid of the cast/cast_mut/cast_consts would be nice.
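For anyone who hasn't hit this: a small illustration of the cast chains in question, using the existing `cast`, `cast_mut` and `cast_const` methods (the two function names are made up):

```rust
/// Reinterprets a shared byte pointer as a mutable `u16` pointer.
pub fn bytes_to_u16_mut(p: *const u8) -> *mut u16 {
    p.cast::<u16>().cast_mut()
}

/// And back again: two more casts for the round trip.
pub fn u16_mut_to_bytes(p: *mut u16) -> *const u8 {
    p.cast_const().cast::<u8>()
}
```

None of these casts change the address; they only satisfy the type checker.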

Rust's raw pointers were a major pain point in zerocopy, where we found it difficult to keep track of the relationship between implicit invariants we imposed on our raw pointers, and the safety requirements of the stdlib's unsafe methods.

We've since migrated to our own abstraction over the pointer types, Ptr<'a, T, I>, where 'a is the lifetime of the referent, T is the type of the referent, and I is a typestate encoding the invariants known about the referent.

For example, Ptr<'static, NonZeroU16, (Shared, Aligned, Initialized)> is a static, shared pointer to initialized memory that's well-aligned for a NonZeroU16 (but not necessarily non-zero). The zero-cost transition methods between different typestates are unsafe, but once the right typestate is reached, operations like turning a Ptr into a &T are safe.

This has drastically improved the quality of our unsafe code. Rather than prove long lists of safety properties on pointer ops, we instead provide narrowly-scoped safety comments on the typestate transitions (e.g., unsafely asserting that the referent of the pointer is well-aligned).
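To illustrate the general pattern (this is not zerocopy's actual API): a heavily simplified sketch with a single alignment invariant. All names below are hypothetical.

```rust
use std::marker::PhantomData;
use std::ptr::NonNull;

/// Typestate markers for the alignment invariant.
pub struct Unknown;
pub struct Aligned;

/// Hypothetical typestate pointer: `A` records what is known about alignment.
pub struct Ptr<'a, T, A> {
    raw: NonNull<T>,
    _marker: PhantomData<(&'a T, A)>,
}

impl<'a, T> Ptr<'a, T, Unknown> {
    /// # Safety
    /// `raw` must point to `size_of::<T>()` bytes holding a valid `T`,
    /// readable and not mutated for the whole of `'a`.
    pub unsafe fn new(raw: NonNull<T>) -> Self {
        Ptr { raw, _marker: PhantomData }
    }

    /// Unsafe typestate transition with one narrow obligation.
    ///
    /// # Safety
    /// The pointer must be aligned for `T`.
    pub unsafe fn assume_aligned(self) -> Ptr<'a, T, Aligned> {
        Ptr { raw: self.raw, _marker: PhantomData }
    }

    /// Checked version of the same transition.
    pub fn try_aligned(self) -> Option<Ptr<'a, T, Aligned>> {
        if (self.raw.as_ptr() as usize) % std::mem::align_of::<T>() == 0 {
            Some(Ptr { raw: self.raw, _marker: PhantomData })
        } else {
            None
        }
    }
}

impl<'a, T> Ptr<'a, T, Aligned> {
    /// Safe: every requirement of `&T` was established by earlier transitions.
    pub fn as_ref(self) -> &'a T {
        // SAFETY: validity comes from `new`, alignment from the typestate.
        unsafe { self.raw.as_ref() }
    }
}
```

The safety comments end up attached to the transitions, while the final conversion to `&T` needs no caller-side proof.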

Any chance of spinning that out into its own crate?

Yes! It's a tricky component to spin out, but we're working on it!

One challenge we've stumbled upon is abstracting over variance. Ideally, Ptr<'a, T, (Shared, ..)> would be covariant over T, and Ptr<'a, T, (Unique, ..)> would be invariant over T. We initially thought we might be able to solve this with GATs, but GATs are always invariant.

This has led us to begrudgingly consider that we might need two Ptr types, one covariant and the other invariant. We really don't want to duplicate that much code! If anyone has ideas of how to abstract over variance, we'd love to hear them.
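For readers less familiar with variance, here is how the difference shows up with the existing std pointer types (the function names are only for illustration):

```rust
use std::ptr::NonNull;

/// Compiles because `*const T` is covariant in `T`:
/// a pointer to a `&'static str` can be used as a pointer to a shorter-lived `&'a str`.
pub fn shorten_const<'a>(p: *const &'static str) -> *const &'a str {
    p
}

/// `NonNull<T>` is covariant in `T` as well.
pub fn shorten_non_null<'a>(p: NonNull<&'static str>) -> NonNull<&'a str> {
    p
}

// `*mut T` is invariant in `T`, so the analogous function is rejected
// with a lifetime error:
//
// pub fn shorten_mut<'a>(p: *mut &'static str) -> *mut &'a str {
//     p
// }
```

A single generic `Ptr` would need to behave like the first two functions in the shared case and like the commented-out one in the unique case.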

If there is a relatively simple language level change/addition that could make this easier I'd love to hear your thoughts on that. Something like that might be worth it for this Ptr type alone.

Variance only makes sense with respect to certain generic parameters. Would it be possible to treat the mutability[1] of the pointer specially by passing the T as an argument to the appropriate typestate struct?

I mean something like this:

Ptr<'a, Covariant<T>, (invariants..)>
Ptr<'a, Invariant<T>, (invariants..)>
Ptr<'a, impl Variance<T = T>, (invariants..)>
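A minimal sketch of how that could be spelled out, under my own assumptions (one marker parameter, no invariants tuple, a type-erased pointer field). The key point is that no field mentions the projection `V::T`, so the variance of `Ptr` in `T` is whatever the marker type has:

```rust
use std::marker::PhantomData;
use std::ptr::NonNull;

pub trait Variance {
    type T;
}

/// Marker that is covariant in `T`.
pub struct Covariant<T>(PhantomData<T>);
/// Marker that is invariant in `T`.
pub struct Invariant<T>(PhantomData<fn(T) -> T>);

impl<T> Variance for Covariant<T> {
    type T = T;
}
impl<T> Variance for Invariant<T> {
    type T = T;
}

pub struct Ptr<'a, V: Variance> {
    // Type-erased so that no field mentions `V::T`.
    raw: NonNull<()>,
    _marker: PhantomData<(&'a (), V)>,
}

impl<'a, V: Variance> Ptr<'a, V> {
    pub fn from_ref(r: &'a V::T) -> Self {
        Ptr { raw: NonNull::from(r).cast(), _marker: PhantomData }
    }

    pub fn as_ptr(&self) -> *const V::T {
        self.raw.as_ptr().cast()
    }
}

/// Compiles: with the `Covariant` marker, `Ptr` is covariant in the referent type.
pub fn shorten<'a>(p: Ptr<'static, Covariant<&'static str>>) -> Ptr<'a, Covariant<&'a str>> {
    p
}

// The same signature with `Invariant` in place of `Covariant` is rejected.
```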

Playground


  1. uniqueness/sharedness ↩︎

Note that only &UnsafeCell<T> opts out; &mut needs UnsafePinned (Tracking Issue for RFC 3467: UnsafePinned · Issue #125735 · rust-lang/rust · GitHub).

Why not handle the 3 conditional cases on the Rust side before calling into the C++ FFI function?

It isn't actually C++ FFI but Rust<->Rust (across a VM abstraction). The point is not whether some checks can be done manually, but whether certain contracts can be expressed at the type-system level. And it looks like plenty of people want to express various invariants that can't be expressed today, at least not ergonomically or with the standard library.

On looking at just *const T having its own type:

Is the challenge for Rust to define what is meant by *const T? For example, Rust permits interior mutability in any field of a struct T, so would *const T mean a read-only version of the data type, where the once-mutable fields have been made read-only? What about the provenance of a new const pointer? Also, would the definition of *const T exclude any impl added to T, for simplicity, to pass just the data? If so, would that suit the use case? I think it might be the simplest and most straightforward version, e.g.

*const T = NonNullConstAllDataOnly ?

Would the ask then be for a *const T to be a stripped-down T: only the public fields of the struct, with impls and traits omitted, and interior-mutable fields changed to constant pointers to the owner reference? Lastly, the developer would then be responsible for knowing that thread safety and lifetime safety hold for all the raw pointers contained within NonNullConstAllDataOnly?