[Idea] Return NonNull instead of raw pointers on `into_raw` methods of smart pointers

KirilMihaylov · March 23, 2024, 6:10pm

Preface

The standard library features APIs like Box::into_raw which strip the smart pointer down to its raw form. This is really useful when defining custom smart pointers.

The problem

It feels strange, though, that an API of a data structure that verifies its allocation succeeded, by null-checking the allocation result, returns *mut T instead of NonNull<T>.
I understand that it became stable later than Rust 1.0 which most likely is the reason for it not being the case right now.

Due to the approach that Rust takes to handle raw pointers, it defines "safer" APIs, such as <*const T>::as_ref.
The thing is that users who wish to use those APIs have to either pattern match or call Option::{unwrap|expect}, which is at best unnecessary when dealing with pointers returned from APIs like Box::into_raw. While NonNull can be used to check once and then work directly from there, it still comes at the cost of having to go through the same process, even if only once.
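To make the friction concrete, here is a minimal sketch of what callers write today. The helper names `leak_checked` and `read_and_free` are hypothetical and exist only for illustration; the standard library calls (Box::into_raw, NonNull::new, <*mut T>::as_ref, Box::from_raw) are the real ones.

```rust
use std::ptr::NonNull;

/// Hypothetical helper: leaks a box and wraps the pointer in `NonNull`,
/// paying for a null check that can never fail.
fn leak_checked(b: Box<u32>) -> NonNull<u32> {
    let raw: *mut u32 = Box::into_raw(b);
    NonNull::new(raw).expect("Box::into_raw never returns null")
}

/// Hypothetical helper: reads through the raw pointer with the "safer"
/// `as_ref` API, then frees the allocation.
///
/// # Safety
/// `raw` must come from `Box::into_raw` and must not be used afterwards.
unsafe fn read_and_free(raw: *mut u32) -> u32 {
    // `as_ref` returns an `Option`, so the caller has to handle a `None`
    // that cannot occur for a pointer that came out of a `Box`.
    let value = match unsafe { raw.as_ref() } {
        Some(r) => *r,
        None => unreachable!("a pointer from Box::into_raw is never null"),
    };
    // Rebuild the box so the allocation is released exactly once.
    drop(unsafe { Box::from_raw(raw) });
    value
}

fn main() {
    let nn = leak_checked(Box::new(7));
    // SAFETY: `nn` was produced from a `Box` just above and is used once.
    let value = unsafe { read_and_free(nn.as_ptr()) };
    println!("{value}");
}
```

Both the `expect` and the `None` arm are dead code in practice, which is the redundancy described above.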

The idea

APIs which strip smart pointers and return the underlying raw pointers, *mut T, should instead return NonNull<T> when it is guaranteed that the underlying raw pointer is non-null.
There are already APIs which currently don't guarantee that, such as rc::Weak::into_raw according to its documentation. This will only target APIs that can make such guarantees.

Drawbacks

This change would require different APIs between pre-change and post-change editions.
While maintenance should be minimal, it still wouldn't be non-existent. Furthermore, it should be synced between said pre- and post-change editions.

pitaj · March 23, 2024, 6:52pm

Pattern types could help resolve this:

pub fn into_raw(this: Box<T>) -> *mut T is 1..

Vorpal · March 23, 2024, 8:54pm

That is neat. What is the status of those? I have seen them discussed for a long time, but so far not much has actually happened, last I looked (a few weeks ago).

EDIT: I can't even find an RFC for them, was one even submitted?

scottmcm · March 23, 2024, 9:45pm

It's not obvious to me that overloading these on edition would be worth it, especially since the migration isn't necessarily trivial.

Adding different versions that return nonnull would be way easier, and we could soft-deprecate the old ones if needed.
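As a rough sketch of what "different versions" could look like from the outside, the extension trait below adds a NonNull-returning conversion next to the existing into_raw. The trait name `IntoNonNull` and its method are hypothetical and not part of the standard library; the sketch also assumes sized T for brevity.

```rust
use std::ptr::NonNull;
use std::rc::Rc;
use std::sync::Arc;

/// Hypothetical extension trait; not a standard library API.
trait IntoNonNull {
    type Target;
    /// Consumes the smart pointer and returns its pointer as `NonNull`.
    fn into_non_null(this: Self) -> NonNull<Self::Target>;
}

impl<T> IntoNonNull for Box<T> {
    type Target = T;
    fn into_non_null(this: Self) -> NonNull<T> {
        // SAFETY: a `Box` always holds a non-null pointer.
        unsafe { NonNull::new_unchecked(Box::into_raw(this)) }
    }
}

impl<T> IntoNonNull for Rc<T> {
    type Target = T;
    fn into_non_null(this: Self) -> NonNull<T> {
        // `Rc::into_raw` returns `*const T`; the cast does not grant
        // permission to write through the pointer.
        // SAFETY: a live `Rc` always points at a real allocation.
        unsafe { NonNull::new_unchecked(Rc::into_raw(this) as *mut T) }
    }
}

impl<T> IntoNonNull for Arc<T> {
    type Target = T;
    fn into_non_null(this: Self) -> NonNull<T> {
        // SAFETY: a live `Arc` always points at a real allocation.
        unsafe { NonNull::new_unchecked(Arc::into_raw(this) as *mut T) }
    }
}

fn main() {
    // Called through the trait path to avoid clashing with any inherent
    // method of the same name.
    let p = IntoNonNull::into_non_null(Box::new(String::from("hi")));
    // SAFETY: `p` came from a `Box`, so rebuilding it frees the value once.
    let b = unsafe { Box::from_raw(p.as_ptr()) };
    println!("{b}");

    let q = IntoNonNull::into_non_null(Rc::new(5u8));
    // SAFETY: `q` came from `Rc::into_raw` and is converted back exactly once.
    let rc = unsafe { Rc::from_raw(q.as_ptr() as *const u8) };
    println!("{rc}");
}
```

Because the old methods stay untouched, nothing here depends on the edition, which is the appeal of this route over overloading on edition.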

But really, I think a big part of the problem is that neither the existing raw pointers nor NonNull are the best form for the type, so I'd be tempted to just do nothing until we find a better type -- one that could have an optional alignment niche, for example, since most raw pointers actually have that just fine anyway.

KirilMihaylov · March 24, 2024, 2:03pm

I see where you are coming from. Separate APIs might really be the better thing to do.
It would reduce some burdens mentioned.

About better forms of pointer representation, I didn't quite catch what you meant there. Could you please elaborate on what you meant by "optional alignment niche"?

pitaj · March 24, 2024, 2:16pm

NonNull has a null niche: because it can't be 0, the compiler can use that for layout optimizations. Option<NonNull> is the same size as NonNull.
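The null niche can be observed directly with size_of. The helper `option_overhead` below is a hypothetical name used only for this sketch.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

/// Hypothetical helper: extra bytes that wrapping `T` in `Option` costs.
fn option_overhead<T>() -> usize {
    size_of::<Option<T>>() - size_of::<T>()
}

fn main() {
    // `None` is stored as the null bit pattern, so no separate tag is needed.
    assert_eq!(option_overhead::<NonNull<u64>>(), 0);
    assert_eq!(option_overhead::<Box<u64>>(), 0);
    assert_eq!(option_overhead::<&u64>(), 0);

    // A raw pointer may legitimately be null, so `Option` needs its own tag.
    assert!(option_overhead::<*mut u64>() > 0);

    println!("Option<NonNull<u64>>: {} bytes", size_of::<Option<NonNull<u64>>>());
    println!("Option<*mut u64>:     {} bytes", size_of::<Option<*mut u64>>());
}
```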

In many cases, an allocation has greater than byte alignment. For instance, u64 has an alignment of 8. This means the least significant bits in the address will always be 0. In the u64 case, the address will always end in three zeros.
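A quick way to check the always-zero low bits is to mask the address with the alignment. This is a small sketch; `low_bits` is a hypothetical helper, and the exact alignment of u64 depends on the target (8 on common 64-bit platforms).

```rust
use std::mem::align_of;

/// Hypothetical helper: the address bits that alignment forces to zero.
fn low_bits<T>(ptr: *const T) -> usize {
    // Alignment is a power of two, so `align - 1` masks the low bits.
    (ptr as usize) & (align_of::<T>() - 1)
}

fn main() {
    let b = Box::new(0u64);
    let ptr: *const u64 = &*b;
    // For any properly aligned pointer the masked bits are zero.
    assert_eq!(low_bits(ptr), 0);
    println!("align_of::<u64>() = {}, low bits = {:#b}", align_of::<u64>(), low_bits(ptr));
}
```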

So we can use the same niche property here. For instance, you could have Result<AlignedNonNull<u64>, bool> be just the size of a pointer.

The type could look something like this:

pub struct AlignedNonNull<T> {
    inner: *mut T is mem::align_of::<T>()..
}

CAD97 · March 24, 2024, 4:35pm

The refined type which has self % align_of::<T>() == 0 as a validity requirement [1] has been informally referred to as ptr::Aligned<T> or sometimes ptr::WellFormed<T>. This also ties into the informal concept of &'unsafe T, which would be a pointer with a requirement that it was at some point a valid reference, but may have since been arbitrarily invalidated.

General consensus atm seems to be to use ptr::NonNull "at rest" and *mut T "in motion." &raw syntax (what unstably powers addr_of!) is waiting on T-lang discussion on whether we want to pursue making current pointer kinds nicer to work with or a new pointer kind. Pattern refinement is a decent possibility.

Any changes to existing API should probably wait on such decision.

Disclaimer: member of T-opsem, but this is entirely my own opinion and recollection.

[1] Safety: if this is violated, safe code may result in violating validity. Validity: if this is violated, you've done a UB and all bets are off.

KirilMihaylov · March 24, 2024, 5:30pm

So what you are suggesting is to have a pointer type which ensures alignment. That indeed would also be useful for optimization purposes as well as propagating safety guarantees.
Such a thing would require at least a few tweaks to the compiler itself for it to be supported, though. I am unaware whether such a route is being explored as of now. If it is, that would fit in nicely.

Another option is to define a higher-order type which can be used as a factory and have the into_raw_v2 (placeholder name) take it as a generic parameter. After all, since smart pointers already operate on valid references, any raw pointer type-constructor will be fine, as references already have the strictest requirements.

Vorpal · March 24, 2024, 9:25pm

Doesn't this just reserve the first few values? As opposed to the least significant bits? A more correct representation of this should be something like & !0x7 (for reserving 3 bits as unused).

That makes a lot more niches available, since now the remaining bits don't all have to be zero to use the niche, but the lowest bits of the pointer have to be all zeros.

You kind of need that to do proper pointer tagging without unsafe.
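For comparison, this is roughly what manual pointer tagging looks like today, with unsafe and explicit masking. It is only a sketch under stated assumptions: the `Tagged` type and `TAG_MASK` constant are hypothetical, and plain `as usize` casts are used instead of the strict-provenance APIs to keep it short.

```rust
use std::mem::align_of;

/// Bits of a `*mut u64` address that alignment leaves at zero.
const TAG_MASK: usize = align_of::<u64>() - 1;

/// Hypothetical tagged pointer: owns a `Box<u64>` and hides a small tag
/// in the low address bits.
struct Tagged {
    bits: usize,
}

impl Tagged {
    fn new(value: Box<u64>, tag: usize) -> Self {
        assert!(tag <= TAG_MASK, "tag does not fit in the alignment bits");
        let addr = Box::into_raw(value) as usize;
        debug_assert_eq!(addr & TAG_MASK, 0);
        Tagged { bits: addr | tag }
    }

    fn tag(&self) -> usize {
        self.bits & TAG_MASK
    }

    fn value(&self) -> &u64 {
        // The mask step: clear the tag bits to recover the real address.
        let ptr = (self.bits & !TAG_MASK) as *const u64;
        // SAFETY: the masked address is the pointer produced by
        // `Box::into_raw` in `new`, and `self` still owns that allocation.
        unsafe { &*ptr }
    }
}

impl Drop for Tagged {
    fn drop(&mut self) {
        let ptr = (self.bits & !TAG_MASK) as *mut u64;
        // SAFETY: same pointer as in `new`; it is freed exactly once here.
        unsafe { drop(Box::from_raw(ptr)) };
    }
}

fn main() {
    let t = Tagged::new(Box::new(99), 3);
    println!("tag = {}, value = {}", t.tag(), t.value());
}
```

A pointer type whose validity requirement is "low bits are zero" would let the compiler do this packing itself, instead of every library re-deriving the mask by hand.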

jrose · March 24, 2024, 10:50pm

Right, with full generality you want Result<Aligned<A>, Aligned<B>> to be a single pointer in size. (I forget if this happens with references and boxes today.)

CAD97 · March 25, 2024, 1:27am

This can't be done, because you need to be able to create &Aligned<A> or &Aligned<B>. It's the same reason why niche optimization can't use padding. You need something like "move only" fields to be able to do this kind of repr adjusting niche optimization.
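The constraint can be seen with ordinary references as the payload: borrowing the Ok value yields a reference that points into the enum's own storage, so the payload has to sit there in its normal, unmasked representation. This is a small sketch; `ok_ref` and `points_inside` are hypothetical helpers.

```rust
use std::mem::size_of_val;

/// Hypothetical helper: borrows the `Ok` payload in place.
fn ok_ref<A, B>(r: &Result<A, B>) -> Option<&A> {
    r.as_ref().ok()
}

/// Hypothetical helper: true when `inner` lies within the bytes of `outer`.
fn points_inside<O, I>(outer: &O, inner: &I) -> bool {
    let start = outer as *const O as usize;
    let addr = inner as *const I as usize;
    addr >= start && addr + size_of_val(inner) <= start + size_of_val(outer)
}

fn main() {
    let x = 5u64;
    let r: Result<&u64, &u32> = Ok(&x);

    // `inner` is a reference to the payload stored inside `r`.
    let inner: &&u64 = ok_ref(&r).expect("r is Ok");
    assert!(points_inside(&r, inner));

    // Reading through it is a plain load with no mask step, so the stored
    // bits must already be a valid `&u64`.
    assert_eq!(**inner, 5);

    println!("size of Result<&u64, &u32>: {} bytes", size_of_val(&r));
}
```

If the discriminant were packed into the payload's low bits, `inner` would point at bytes that are not a valid `&u64` until masked, which is what rules out that layout.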

But the extra validity restriction does still matter, because it would allow niche optimization to niche data into the high bytes of the pointer if the low byte is an always-invalid value.

jrose · March 25, 2024, 2:11am

I keep forgetting.* I guess it’s still possible to have more than 7 other cases in a (possibly nested) enum, but that does make it much less interesting in the most common case of “pointer-aligned on a 64-bit platform”. Having niches for the first N values covers the most common case.

* Swift made the opposite tradeoff: you can’t get the address of arbitrary enum payloads, or even struct fields, but that means when you do need a pointer or even a value there might need to be a mask step.

system · June 23, 2024, 2:11am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
