Hey all. I’ve had this on the back-burner for a while. Since I’m disappearing, I figured I’d just drop this here for people to discuss, and maybe for someone else to pick up.
Summary
- ptr::Unique -> ptr::Owned
- add ptr::Shared
- add many ptr fns as methods on raw pointers
Motivation
Raw pointers are… annoying… to work with in Rust. Not necessarily for good reasons. The best thing about *const and *mut is that they exactly mirror C pointers, which I’m told is really important for FFI.
However, there are other reasons to use raw pointers: when trying to model stuff that is indirected, but where references are inappropriate. Several peculiarities of raw pointers make them incredibly frustrating for this use case.
*mut T is invariant over T
Even though *mut is the natural pointer to use to model an owned value, its invariance interferes with this. Most owning things want to be able to mutate the data they own (in which case *const doesn’t work), but are sound to be variant over the data they own because mutating operations only happen through an &mut self, which enforces invariance as appropriate.
For the most part, I believe this to just be a papercut, though. I don’t think many people would run into much trouble if their custom container/pointer types were needlessly invariant. Also since you can get an &mut T out of an &*mut T, this is technically the right default. You definitely don’t want to be incorrectly variant.
However invariance is a “no going back” thing. Although you can opt out of variance, you can’t opt out of invariance once one of your type parameters establishes this (variance computation is like a boolean that gets AND’d together).
To work around this, you need to store a *const T and cast it to a *mut T whenever you want to do mutations.
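A minimal sketch of that workaround, assuming a made-up single-element container called MyBox (the name and its methods are purely illustrative, not anything in std). The pointer is stored as *const T so the type stays variant over T, and is cast to *mut T only where mutation happens:

```rust
// Hypothetical single-element container, for illustration only.
struct MyBox<T> {
    // Stored as *const T so that MyBox<T> stays variant over T.
    ptr: *const T,
}

impl<T> MyBox<T> {
    fn new(value: T) -> Self {
        MyBox { ptr: Box::into_raw(Box::new(value)) as *const T }
    }

    fn get(&self) -> &T {
        unsafe { &*self.ptr }
    }

    fn set(&mut self, value: T) {
        // Cast back to *mut T only at the point of mutation.
        // The assignment drops the old value and stores the new one.
        unsafe { *(self.ptr as *mut T) = value; }
    }
}

impl<T> Drop for MyBox<T> {
    fn drop(&mut self) {
        // Rebuild the Box so the value is dropped and the memory freed.
        unsafe { drop(Box::from_raw(self.ptr as *mut T)); }
    }
}

// This only compiles because MyBox is variant over T: a
// MyBox<&'static str> is accepted where a MyBox<&'a str> is expected.
// With a *mut T field instead, this function would be rejected.
fn shorten<'a>(b: MyBox<&'static str>) -> MyBox<&'a str> {
    b
}
```

Mutation still goes through &mut self, so the borrow checker enforces invariance at exactly the points where it matters.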
Raw pointers do not claim to own their referents
This is a problem for dropcheck. If you contain a *mut T or *const T which points to data that you own (read: that you will drop), dropcheck needs to know that to do its job. If you do own your referent and don’t claim it, it’s possible to create unsound Drop impls.
This situation is the polar opposite of the variance issue: rather than being a restrictive but safe default, it’s a liberal and unsafe default. However this behaviour is motivated by the desire to not be an inescapable trap like the invariance of *mut T is. The only tool we provide for specifying ownership of T is PhantomData<T>. Like invariance, once you’ve established ownership of a type, there’s no way to undo this. There’s no way to use PhantomData<!T> or something to specify that although you appear to own a T, you in fact don’t.
Since it’s plausible to use raw pointers in a non-owning context, this is the most general default to take. Unlike the variance problem, there wouldn’t be a usable way to opt out of this problem like using *const T. We would have had to introduce two new pointer types at the lang level to enable opting out of this.
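For concreteness, here is roughly what claiming ownership looks like. This is only a sketch: OwnedPtr and its field names are invented for illustration. The raw pointer alone says nothing about ownership, and the PhantomData<T> field is what tells the compiler that a T is logically stored here:

```rust
use std::marker::PhantomData;

// Hypothetical owning pointer, for illustration only.
struct OwnedPtr<T> {
    ptr: *const T,
    // Zero-sized marker: "treat me as if I contain a T".
    _owns: PhantomData<T>,
}

impl<T> OwnedPtr<T> {
    fn new(value: T) -> Self {
        OwnedPtr {
            ptr: Box::into_raw(Box::new(value)) as *const T,
            _owns: PhantomData,
        }
    }

    fn get(&self) -> &T {
        unsafe { &*self.ptr }
    }
}

impl<T> Drop for OwnedPtr<T> {
    fn drop(&mut self) {
        // We really do drop a T here, which is the fact the marker declares.
        unsafe { drop(Box::from_raw(self.ptr as *mut T)); }
    }
}
```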
Raw pointers are nullable
C pointers are nullable, raw pointers are C pointers, ergo raw pointers should be nullable. However this is really frustrating to work with for a Rust programmer. These are the only nullable pointers in Rust. Admittedly, they’re also the only pointers that can be dangling, which is a much more pernicious and undetectable problem!
The biggest nuisance is being unable to efficiently talk about an absent raw pointer via Option. You can manually encode it as just “if it’s null it’s absent”, but this means that you have to remember to check, and it’s not at all enforced or represented by your types.
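A small sketch of both halves of that complaint (the function names are made up for illustration). Wrapping a raw pointer in Option costs extra space because every bit pattern of the pointer is already a valid value, whereas references and Box get None for free. The manual “null means absent” encoding works, but nothing in the types forces the check:

```rust
use std::mem::size_of;

// A raw pointer may legitimately be null, so Option needs room for a tag.
fn option_raw_is_bigger() -> bool {
    size_of::<Option<*mut u8>>() > size_of::<*mut u8>()
}

// References and Box are known to be non-null, so None reuses the null value.
fn option_ref_is_pointer_sized() -> bool {
    size_of::<Option<&u8>>() == size_of::<*mut u8>()
        && size_of::<Option<Box<u8>>>() == size_of::<*mut u8>()
}

// Manual encoding: null means absent. The caller must remember to check;
// forgetting the is_null test would still compile.
fn value_or_zero(p: *const u32) -> u32 {
    if p.is_null() {
        0
    } else {
        unsafe { *p }
    }
}
```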
This is a bit of a moot point for std abstractions, which actually use 0x01 (heap::EMPTY) as the sentinel for “absent” so that they themselves can be null-pointer optimized (Box, Rc, Arc, Vec, HashMap, etc…). Although it is perhaps worth noting that this sentinel value is never (to my knowledge) checked. These types always check whether they have a valid pointer using other state, whether it be size_of::<T>() == 0, len == cap, or cap == 0. heap::EMPTY is just an agreed upon garbage value to put there that isn’t null.
Raw pointers aren’t Send or Sync
Another case of a curious default. This one can actually be overridden – you can impl both Send and !Send as appropriate. Since raw pointers are just integers, they are in principle trivially Send and Sync. Not being Send or Sync is actually basically a lint. If you’re doing stuff with raw pointers, it’s non-trivial and you may not have thought about thread-safety. So in order to force you to consider thread safety, any type that contains raw pointers must manually impl Send or Sync.
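As a sketch of what that manual opt-in looks like, assuming a made-up Handle type that uniquely owns a heap allocation (the type and function names are illustrative only). Without the unsafe impl, the thread::spawn call below would not compile:

```rust
use std::thread;

// Hypothetical wrapper around a heap allocation, for illustration only.
struct Handle {
    ptr: *mut u32,
}

// Assumption for this sketch: Handle is the sole owner of its allocation,
// so moving it to another thread cannot introduce a data race.
unsafe impl Send for Handle {}

impl Handle {
    fn new(value: u32) -> Self {
        Handle { ptr: Box::into_raw(Box::new(value)) }
    }

    // Consumes the handle, increments the value, and frees the allocation.
    fn bump_and_take(self) -> u32 {
        unsafe {
            *self.ptr += 1;
            *Box::from_raw(self.ptr)
        }
    }
}

fn bump_on_other_thread(value: u32) -> u32 {
    let h = Handle::new(value);
    // The closure moves the whole Handle, which requires Handle: Send.
    thread::spawn(move || h.bump_and_take()).join().unwrap()
}
```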
Raw pointers often require importing ptr
This is another paper-cut. Having to use ptr::read(ptr) or ptr::write(ptr, val) is just kinda annoying. Free functions are exposed because this allows passing & or &mut to them and having them coerced as appropriate.
Historically this also hacked around the fact that doing anything other than directly re-exporting an intrinsic severely penalized the compiler. Literally a trivial wrapper function around the intrinsics would make it fall over.
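The coercion point is easier to see in code. A minimal sketch, with replace_value as a made-up helper name: the references are passed straight to the free functions and coerce to raw pointers at the call site, with no explicit cast:

```rust
use std::ptr;

// Swaps a new value into `dest` and returns the old one.
fn replace_value<T>(dest: &mut T, new: T) -> T {
    unsafe {
        // &mut T coerces to *const T here.
        let old = ptr::read(dest);
        // &mut T coerces to *mut T here. write does not drop the old
        // contents, which is what we want since `old` now owns them.
        ptr::write(dest, new);
        old
    }
}
```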
Unique
ptr::Unique is the solution to many of these woes. It’s defined as follows:
struct Unique<T: ?Sized> {
    _ptr: NonZero<*const T>,
    _boo: PhantomData<T>,
}
And exposes the following functionality:
impl<T> Deref for Unique<T> {
    type Target = *mut T;
    // ...
}

impl<T: ?Sized + Send> Send for Unique<T> {}
impl<T: ?Sized + Sync> Sync for Unique<T> {}
Semantically, it’s specified to behave as if you literally contain a value of type T in your struct – which as “merely” an implementation detail is indirected to the heap. It’s not clear to me if the data must necessarily be on the heap, though it seems like that’s the only way to properly resolve its semantics (in particular, Send and Sync).
Consequently, you get the following behaviour:
- variant over T (yay!)
- claims to own T (yay!)
- derefs to *mut T, so unique.offset(idx) produces a *mut T (yay!)
- non-null in a way the language understands (so null ptr optimizable) (yay!)
- derives Send and Sync as if you contained a T (yay!)
However it exacerbates the ptr::read problem. Now you need to ptr::read(*self.ptr), which is incredibly confusing.
Also, while it’s perfect for Vec and Box, it’s semantically inappropriate for Rc and Arc (which do not uniquely own their data, but rather share it). In practice the only thing it does wrong is derive Send and Sync (which e.g. Rc could opt out of). However it is in principle desirable to apply alias analysis to Unique: we should be able to use the fact that Box, Vec, etc. contain a pointer to some uniquely owned data on the heap. In LLVM parlance, pointers derived from a Unique can only be aliased by other pointers derived from it. We do not currently provide this information to LLVM (to my knowledge).
Finally, Unique is a bit of a confusing name, as it suggests a stronger claim than is really intended. It is not that this is a unique pointer to that data (it’s fine to take references into it), but rather that it is the owner of that data. I personally was very confused when I first encountered Unique because of this. I wasn’t sure if some things could be marked as Unique because they didn’t seem to be unique in practice. But that wasn’t the issue at hand. What mattered was that the pointers were the only owner of the data.
Detailed Design
To better resolve all these issues, this RFC proposes the following changes:
Rename Unique to Owned
This, in my mind, provides a better intuition as to the meaning of this type.
Add ptr::Shared for Rc and Arc
At worst this is just a nice de-duplication of work between Rc and Arc. Note that this is blocked on the fact that PhantomData prevents DST coercions of Shared<T> or Unique<T> (which would break Rc and Arc). Box dodges this by being legitimately magic, and having its definition totally ignored by the compiler for DST stuff. I have a PR and RFC open for this.
Add ptr functions as methods on raw pointers
In particular:
- read, copy, copy_nonoverlapping on *const and *mut
- write, write_bytes, replace, swap on *mut
This enables the following expressions:
ptr.offset(idx).read();
ptr.offset(idx).write(elem);
unique.read();
This requires fewer imports, is more pleasant to read, and also avoids the nastiness of manually derefing a Unique (or Shared) to be passed to read or write (it genuinely looks like a normal ptr).
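To make the comparison concrete, here is the same operation written both ways. This is a sketch under one loud assumption: that the toolchain provides read and write as methods on raw pointers, which is what this proposal asks for (present-day stable Rust does have them). The two function names are invented for illustration:

```rust
use std::ptr;

// Free-function style: needs the ptr import and reads inside-out.
fn bump_free_fn(buf: &mut [u32], idx: usize) {
    assert!(idx < buf.len());
    let p = buf.as_mut_ptr();
    unsafe {
        let old = ptr::read(p.offset(idx as isize));
        ptr::write(p.offset(idx as isize), old + 1);
    }
}

// Method style, as proposed: reads left to right, no import needed.
// Assumes raw pointers expose read and write as methods.
fn bump_method(buf: &mut [u32], idx: usize) {
    assert!(idx < buf.len());
    let p = buf.as_mut_ptr();
    unsafe {
        let old = p.offset(idx as isize).read();
        p.offset(idx as isize).write(old + 1);
    }
}
```

Both functions do the same thing; the second just chains off the pointer expression.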
Drawbacks
Duplicated functionality in functions and methods for the ptr stuff.
Shared seems to have limited scope – worth adding just for Rc and Arc?
Unresolved Questions
Is the intrinsic wrapping problem still a thing? Will moving over to methods on raw pointers cause serious regressions in compile time or codegen quality?
Can anything else use Shared?
Do FFI authors want some types with better semantics (non-nullable, owned, etc), or are *mut and *const “good enough”? acrichto and sfackler seemed to think just mirroring headers was the right way to go.