Idea: introduce core::ptr::read_owned

newpavlov · June 21, 2023, 4:14pm

I have a relatively large type Foo which has a consuming method foo. In my code I work with a *mut Foo which semantically "owns" the underlying Foo value (i.e. we can safely call drop_in_place on it).

I want to call foo on the value behind this pointer. I can write something like this:

let p: *mut Foo = ...;
// ptr::read is an unsafe fn, so the call needs an unsafe block
let val = unsafe { core::ptr::read(p) };
val.foo();

This code works, but it has a big problem: the compiler cannot modify the data behind p, so it has to copy Foo onto the stack first and then call foo on the copy. Obviously, that is inefficient for large types.
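For readers who want to try this, here is a minimal self-contained sketch of the pattern described above. The definition of Foo, its 4096-byte payload, and the helper name consume_via_read are assumptions made for illustration; they are not from the original post.

```rust
use std::mem::MaybeUninit;
use std::ptr;

// Hypothetical stand-in for the "relatively large" type from the post.
struct Foo {
    data: [u8; 4096],
}

impl Foo {
    // Consuming method: takes `self` by value.
    fn foo(self) -> u64 {
        self.data.iter().map(|&b| u64::from(b)).sum()
    }
}

// Safety: `p` must point to a valid, initialized `Foo` that the caller owns,
// and the pointee must not be read or dropped again afterwards.
unsafe fn consume_via_read(p: *mut Foo) -> u64 {
    // `ptr::read` makes a bitwise copy of `*p` into a local value...
    let val = unsafe { ptr::read(p) };
    // ...and the consuming method is then called on that copy.
    val.foo()
}

fn main() {
    let p: *mut Foo = Box::into_raw(Box::new(Foo { data: [1; 4096] }));
    let sum = unsafe { consume_via_read(p) };
    // The value has been moved out, so free only the allocation:
    // dropping a Box<MaybeUninit<Foo>> never runs a destructor for Foo.
    unsafe { drop(Box::from_raw(p.cast::<MaybeUninit<Foo>>())) };
    println!("sum = {sum}");
}
```

Whether the copy into val survives optimization depends on the compiler version and on what foo does with the data, so checking the generated assembly is the only reliable way to tell.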

I wonder if it's feasible to add something like:

/// Reads the value from `src` by moving it.
///
/// The memory behind `src` must not be read or written before the
/// return value is dropped. If `T` is not `Copy`, reading the memory after
/// the return value is dropped can cause undefined behavior. 
pub unsafe fn read_owned<T>(src: *mut T) -> T { ... }

It would be equivalent to pointer dereferencing, but without the Copy bound.

HeroicKatora · June 21, 2023, 5:33pm

What might also work is a form of builtin owning pointer type here. Maybe the best parallel we have for this is Box<dyn FnOnce()>. By builtin magic it's somehow possible for op-call to consume that by-value despite only having a pointer, as an unsized type.

In general, a pointer owning the value but not its allocation is something still missing from the core language. Sometimes that's called stack-box or similar in crates. Maybe it would be possible to have it as a builtin, motivated by providing the ability to call methods that consume its value, somehow without those methods needing to be aware of it or naming it as the receiver type? (Or via DerefTake, but that seems quite a huge feature to introduce at the same time.)

newpavlov · June 21, 2023, 7:50pm

Yeah, something like *own Foo would work as well. It could also be a good hint for C APIs written in Rust. Maybe it could be implemented as *mut Foo, but with dereferencing of non-Copy types allowed?

scottmcm · June 21, 2023, 10:20pm

The call to ptr::read compiles to val = copy *p; in MIR now.

I wonder if we could largely address this with a read_move that compiles to val = move *p in MIR, so that MIR optimizations could then just reuse the *p place instead of actually making the copy to the temporary.

But that might get entangled in the "what can you do after moving out of something" conversation...

newpavlov · June 21, 2023, 11:06pm

I think the semantics should be similar to drop_in_place, no? After you have moved the data out of the pointer, the memory behind it effectively becomes uninitialized, i.e. you cannot read it before writing a new value into it.
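The "read, then treat the memory as uninitialized until it is written again" discipline can already be followed by hand with ptr::read and ptr::write. The sketch below only illustrates that discipline; the helper name take_and_refill is made up for this example, and it does not give the compiler the extra freedom being asked for in this thread.

```rust
use std::ptr;

// Moves the String out of `slot`, then re-initializes the slot before
// anything can read it again.
// Safety: `slot` must be valid for reads and writes and must point to an
// initialized String.
unsafe fn take_and_refill(slot: *mut String, replacement: String) -> String {
    // After this read the slot must be treated as uninitialized: reading or
    // dropping it again would duplicate ownership of the heap buffer.
    let old = unsafe { ptr::read(slot) };
    // `ptr::write` stores the new value without dropping the stale bytes.
    unsafe { ptr::write(slot, replacement) };
    old
}

fn main() {
    let mut s = String::from("first");
    let old = unsafe { take_and_refill(&mut s, String::from("second")) };
    println!("{old} -> {s}");
}
```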

WorldSEnder · June 22, 2023, 12:00am

That is because it has to assume that you wanted to leave a valid bit pattern for Foo in whatever was pointed to by that pointer. To allow you to move destructively out of that place, it should be possible to let the caller pass in a *mut MaybeUninit<Foo> instead.

I think a better approach is outlined in this gist. We have a SurelyInit struct which encapsulates the guarantee that a specific MaybeUninit is initialized. Crucially, it has a take() method which moves out that MaybeUninit and leaves behind whatever garbage the compiler wants to (but marks the memory uninitialized). You would then write your method consuming the data, passing a SurelyInit<'_, LargeStruct> instead of *mut LargeStruct:

fn consume(place: SurelyInit<'_, LargeStruct>) -> u8 {
    let val = place.take(); // This might look like it moves the struct
    // in reality, the copy is avoided under mild optimizations
}

Godbolt example to show assembly.
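Since the gist itself is not reproduced in the thread, the following is only a guess at what such a SurelyInit wrapper could look like. It uses mem::replace, which a later reply says the gist relies on. The field name, the constructor, and the LargeStruct definition are all assumptions, and the sketch makes no claim about whether the copy is actually elided.

```rust
use std::mem::{self, MaybeUninit};

// Hypothetical reconstruction; the gist's real definition may differ.
struct SurelyInit<'a, T> {
    slot: &'a mut MaybeUninit<T>,
}

impl<'a, T> SurelyInit<'a, T> {
    // Safety: `slot` must currently hold an initialized `T`, and nothing
    // else may read or drop that value afterwards.
    unsafe fn new(slot: &'a mut MaybeUninit<T>) -> Self {
        SurelyInit { slot }
    }

    // Moves the value out and leaves the slot uninitialized.
    // If the wrapper is dropped without calling `take`, the value leaks.
    fn take(self) -> T {
        // Sound because `new` promised the slot is initialized.
        unsafe { mem::replace(self.slot, MaybeUninit::uninit()).assume_init() }
    }
}

// Assumed stand-in for the large type.
struct LargeStruct {
    bytes: [u8; 1024],
}

fn consume(place: SurelyInit<'_, LargeStruct>) -> u8 {
    let val = place.take();
    val.bytes[0]
}

fn main() {
    let mut slot = MaybeUninit::new(LargeStruct { bytes: [7; 1024] });
    let first = consume(unsafe { SurelyInit::new(&mut slot) });
    println!("first byte = {first}");
}
```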

newpavlov · June 22, 2023, 12:31am

Raw pointers to MaybeUninit are virtually useless. Raw pointers do not provide any guarantees about validity. They can easily point to uninitialized or unaligned memory, or even be null pointers. Your mem::replace trick also does not work properly as can be demonstrated by this snippet: Compiler Explorer (the same happens with your snippet as well).

As noted by @scottmcm the problem is simply that the language currently does not expose an operation with the necessary semantics.

WorldSEnder · June 22, 2023, 12:56am

Seems I was under the mistaken assumption that mem::replace was sufficient here. But I don't understand why my original godbolt example works as intended (generates equal assembly), but taking a &mut to the data taken out of the MaybeUninit and passing that ref/pointer to some other function seems to force a mem-copy, see commenting/uncommenting this marked line

newpavlov · June 22, 2023, 1:04am

It's because your consumption code does not perform any mutation, so the compiler is able to use the "replaced" memory directly. Even a simple side-effect-free mutation can trigger insertion of a memcpy: Compiler Explorer

scottmcm · June 22, 2023, 5:15am

This is not quite what happens today, though. It can't be what drop_in_place does, because ManuallyDrop::drop exists, and ManuallyDrop preserves niches, which means that it must preserve validity invariants, which means that drop_in_place overwriting the memory with undef can't happen because that would be UB.
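The niche argument can be observed through type sizes. The sketch below compares Option wrapped around Box<u8>, ManuallyDrop<Box<u8>> and MaybeUninit<Box<u8>>; the choice of Box<u8> as the niche-carrying type is just a convenient assumption for the demonstration.

```rust
use std::mem::{size_of, ManuallyDrop, MaybeUninit};

fn main() {
    // Box<u8> is never null, so Option<Box<u8>> can use null for None
    // and stays pointer-sized.
    assert_eq!(size_of::<Option<Box<u8>>>(), size_of::<Box<u8>>());

    // ManuallyDrop<T> keeps that niche, which only works if its contents
    // must remain a valid T at all times.
    assert_eq!(
        size_of::<Option<ManuallyDrop<Box<u8>>>>(),
        size_of::<Box<u8>>()
    );

    // MaybeUninit<T> may hold arbitrary bytes, so it has no niche and
    // Option needs a separate discriminant.
    assert!(size_of::<Option<MaybeUninit<Box<u8>>>>() > size_of::<Box<u8>>());

    println!("niche checks passed");
}
```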

SkiFire13 · June 22, 2023, 7:12am

That doesn't prevent other code from writing to it though, overwriting the value that was supposed to be moved out. To prevent this you would also need the same semantics of &mut T, which guarantees it will be the only pointer from which writes are allowed.

HeroicKatora · June 22, 2023, 10:29am

Right, that's what 'stack box' usually does. For instance, some code I wrote a few months back: LeakBox in static_alloc::leaked - Rust . With several unsafe methods for passing a value as a mutable reference such as from_raw for a pointer etc.

This usually requires promising that there are no other underlying mutable references active after the call (i.e. it is sound to deinitialize the value, similar to ManuallyDrop::take). However, for some types it is not actually necessary. For instance, using MaybeUninit<T> is always sound and so are T: Copy types. (All types that would be sound as union fields, but that's just an unproven observation!). This is because 'dropping' these types does not actually invalidate any other later read from that place. The MU in particular makes it possible to provide a parallel to Box::assume_init and its safe alternative, which provides a fully safe path for creating the type if the underlying place was declared with MU<T> as its type. That is, I think, very very cute.

There are a few other observations about the soundness of such a wrapper struct hidden in the documentation, under somewhat confusing naming.

If added to std the implementation could be switched from NonNull<T> to Unique<T> for the pointer, too, thus resembling a normal &mut _ more closely.
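To make the "stack box" idea concrete, here is a minimal sketch of a pointer that owns a value but not its storage. It is not the LeakBox API from static_alloc; the type name Own and its methods are invented for this example. The safe constructor takes a place declared as MaybeUninit<T>, in the spirit of the safe path described above.

```rust
use std::marker::PhantomData;
use std::mem::{self, MaybeUninit};
use std::ptr::{self, NonNull};

// Hypothetical "stack box": owns the value at `ptr`, but not the storage.
struct Own<'a, T> {
    ptr: NonNull<T>,
    _place: PhantomData<&'a mut T>,
}

impl<'a, T> Own<'a, T> {
    // Safety: `ptr` must point to an initialized `T` that stays valid for
    // 'a, and no other code may read, write or drop it afterwards.
    #[allow(dead_code)]
    unsafe fn from_raw(ptr: NonNull<T>) -> Self {
        Own { ptr, _place: PhantomData }
    }

    // Safe path: the place is typed MaybeUninit<T>, so leaving it
    // deinitialized later cannot invalidate any safe read of it.
    fn write(slot: &'a mut MaybeUninit<T>, value: T) -> Self {
        let r: &mut T = slot.write(value);
        Own { ptr: NonNull::from(r), _place: PhantomData }
    }

    // Moves the value out of the place.
    fn into_inner(self) -> T {
        let value = unsafe { ptr::read(self.ptr.as_ptr()) };
        mem::forget(self); // the value has moved; skip our Drop impl
        value
    }
}

impl<T> Drop for Own<'_, T> {
    fn drop(&mut self) {
        // Still owning the value at this point: drop it in place.
        unsafe { ptr::drop_in_place(self.ptr.as_ptr()) };
    }
}

fn main() {
    let mut slot = MaybeUninit::uninit();
    let owned = Own::write(&mut slot, String::from("hello"));
    let s: String = owned.into_inner();
    println!("{s}");
}
```

Note that into_inner still goes through ptr::read, so this wrapper expresses ownership in the type system but does not by itself avoid the copy discussed at the top of the thread.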

jrose · June 22, 2023, 5:17pm

std::mem::needs_drop is probably close to the right condition, including the part where it says “this can conservatively return true for everything”.
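For reference, this is how needs_drop can be queried. The documentation allows it to conservatively return true for any type, so only a false result carries a guarantee; the particular types printed here are just examples.

```rust
use std::mem::needs_drop;

fn main() {
    // Plain-data types: current compilers report that no drop is needed,
    // although the documented contract would also permit `true` here.
    println!("u32:      {}", needs_drop::<u32>());
    println!("[u8; 64]: {}", needs_drop::<[u8; 64]>());

    // Types that own heap memory must report `true`, because dropping
    // them has an observable effect.
    println!("String:   {}", needs_drop::<String>());
    println!("Vec<u8>:  {}", needs_drop::<Vec<u8>>());
}
```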

HeroicKatora · June 22, 2023, 6:58pm

Expanding the condition from Copy to !needs_drop() seems a little rushed, and I'm not sure of the reason. For the Copy trait the absence of Drop is a compile-time guarantee; for the function it is probably somewhat right, but I would not currently use it for that purpose. In particular, we're implicitly passing a (retained) value of T back to the owner of the place after *own is dropped. There should be a justification for where it comes from, and this is not explicitly what needs_drop provides when returning false.

newpavlov · June 26, 2023, 6:03pm

Unfortunately, it does not work: Add `ptr::read_move` by newpavlov · Pull Request #113066 · rust-lang/rust · GitHub

HeroicKatora · June 26, 2023, 8:58pm

(Edit: below are somewhat raw thoughts on whether the problem is a deep property of places, the notion how places are fundamentally created, how they are named, and where Rust requires creating a place but maybe it should not. I think I'm still trying to really understand where the problem of the thread comes from in the first place, before finding a solution for it).

There's a pretty fundamental property in a lot of the high-level semantics: places are disjoint when their declaration points differ. (Which makes the temporary value sequence points in dereference chains somewhat devious to pure 'place pivoting' / place naming parts of expressions.) For that reason an assignment like that always creates a conceptually new place, and optimizations will definitely want to exploit the disjointness property. Trying to do this via any intermediate form that does declare a new place will thus require very carefully revisiting each optimization, which is probably infeasible. This disjointness property is even much stricter than noalias, since it's a fact not about the current borrow/tag stack but about the allocations underneath. (Even though I don't think such a fully proper attribute is available in LLVM currently, so maybe it's feasible to solve in HIR/MIR for the moment. That would bar adding some of these attributes as function attributes, though.)

Your syntax request seems to have two major effects to me:

Change the compiler's initialization bool of an existing place, i.e. treat it as partially moved if necessary. This is also what moves the value ownership.
Introduce a new name for that partial place that will alias with existing places (though that existing one may be a temporary one in the case of *raw_ptr).

A bit of that is also fairly engrained in surface language. Some syntax only works by creating new places in the semantics. I'd imagine the above would only be ergonomic if all of these instances had some alternative form where one could provide (and consume/fill) such an owning reference to an existing place instead, in order to keep the substitution principle somewhat working. Maybe I'm overestimating the problem, a look at the syntax reference would be helpful. In particular:

function arguments are always new places with respect to the caller scope.
return values are always new places, but there is actually quite a lot of motivation for relaxing this: multiple return values, construct-in-place, multiple return paths could all potentially be enabled? The details would need to be fleshed out though.
other value-returning expressions; in particular, struct construction is always into a new place, which makes DSTs 'inconstructible'. The overlap between this inverse operation and yours is somewhat interesting?

Sure, it would be useful if we could decouple the concept of value from its underlying place where the language currently mixes it quite often, as in function parameters. Incidentally, providing more explicit ways to manage places distinct from the values they might contain would also relate to in-place initialization as the inverse as well as multiple return values.

But maybe I'm just hallucinating some unfulfillable mess of language features working together

newpavlov · June 26, 2023, 9:27pm

Honestly, I am not knowledgeable enough about language semantics and compiler internals, so most of your post goes over my head.

The thing I want is to efficiently bridge unsafe pointer-based code (e.g. in FFI or certain unsafe wizardry) with consuming functions and methods implemented in safe Rust. I know that ABI-wise the compiler passes large arguments as pointers even if in Rust code they are passed as values. So if we have an "owning" pointer *own T, I would like to use it in place of T; then, depending on the function ABI, the compiler would either use the pointer as-is or insert an implicit read. I know that there is a whole world of language semantics into which we have to fit this feature, but, unfortunately, I cannot make meaningful suggestions in this area.

Vorpal · June 26, 2023, 9:41pm

I'm not a language expert, just a somewhat novice user, but isn't it possible to work around this whole issue using a wrapper that holds the pointer internally? This wrapper could be consumed, with the proper semantics, calling into underlying unsafe methods on the inner type.

This seems similar to a smart pointer to me. If arbitrary self types was stabilised, then support for generic such smart pointers with consuming behaviour could be implemented directly on the inner type. As of right now I think you need the logic on the outer wrapper type.
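A rough sketch of the wrapper approach described here, with the consuming logic living on the outer type. Everything in it is hypothetical: Foo, the in-place method foo_in_place, and the wrapper OwnedFoo are illustration names, and the approach presupposes that you control the inner type's methods.

```rust
use std::mem::MaybeUninit;
use std::ptr;

struct Foo {
    data: Vec<u8>,
}

impl Foo {
    // Hypothetical in-place counterpart of a consuming method.
    // Safety: `this` must point to an initialized Foo; the pointee is
    // left deinitialized when this returns.
    unsafe fn foo_in_place(this: *mut Foo) -> usize {
        let len = {
            let foo: &Foo = unsafe { &*this };
            foo.data.len()
        };
        // Consume the value where it lives instead of copying it out.
        unsafe { ptr::drop_in_place(this) };
        len
    }
}

// Wrapper that holds the owning pointer internally.
// If it is never consumed, the Foo is leaked (no Drop impl in this sketch).
struct OwnedFoo {
    ptr: *mut Foo,
}

impl OwnedFoo {
    // Safety: `ptr` must point to an initialized Foo that the caller
    // hands over; nothing else may use or drop it afterwards.
    unsafe fn from_raw(ptr: *mut Foo) -> Self {
        OwnedFoo { ptr }
    }

    // Consuming the wrapper forwards to the in-place method.
    fn foo(self) -> usize {
        unsafe { Foo::foo_in_place(self.ptr) }
    }
}

fn main() {
    // MaybeUninit storage never drops its contents, so there is no double drop.
    let mut slot = MaybeUninit::new(Foo { data: vec![0; 3] });
    let owned = unsafe { OwnedFoo::from_raw(slot.as_mut_ptr()) };
    println!("len = {}", owned.foo());
}
```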

HeroicKatora · June 26, 2023, 9:46pm

The largest contention point is only being able to call existing and independently defined by-value functions (including, via an fn-value where everything must work with the existing ABI for such a function). Maybe we just compare the current LLVM-IR for that.

; struct Large { val: [u8; 512] }
; fn test(_: Large)
define void @_ZN10playground4test17h78c2f8a40f6d0b1dE(ptr noalias nocapture noundef dereferenceable(512) %val) unnamed_addr #1 personality ptr @rust_eh_personality

; fn test2(_: &mut Large) or ABI equivalent type
define void @_ZN10playground5test217h1bf6053f48c8c597E(ptr noalias noundef align 1 dereferenceable(512) %val) unnamed_addr #1 {

It definitely looks like the 'owning pointer' could be passed to the by-value function directly with suitable language primitives to enable this confounding of nominally different argument types and the proper translation (read: move) for types which are not passed by-pointer in the ABI. At least for now the signature has no attributes with caller-requirements beyond those for a normal &mut _. Only nocapture which is a requirement on the callee.
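For anyone wanting to reproduce the comparison, Rust source along the following lines should yield similar signatures when compiled with rustc -O --emit=llvm-ir. The function bodies and the use of black_box are assumptions added so the functions are not optimized away, and the exact attributes will vary between compiler versions.

```rust
use std::hint::black_box;

pub struct Large {
    pub val: [u8; 512],
}

// By-value parameter: large aggregates are passed indirectly in the Rust ABI.
#[inline(never)]
pub fn test(large: Large) {
    black_box(&large);
}

// By-reference parameter, for comparison.
#[inline(never)]
pub fn test2(large: &mut Large) {
    black_box(&*large);
}

fn main() {
    let mut a = Large { val: [1; 512] };
    test2(&mut a);
    test(a);
    println!(
        "size = {}, align = {}",
        std::mem::size_of::<Large>(),
        std::mem::align_of::<Large>()
    );
}
```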

newpavlov · June 26, 2023, 9:49pm

Consuming functions/methods are external to my code, so arbitrary self types will not help. As mentioned earlier, there is a certain magic which allows calling consuming methods on Box<T>, but AFAIK it's currently not accessible to users.
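The Box magic mentioned here can be seen in a small example: moving out of a Box through the dereference operator is built into the language, whereas the same expression on a user-defined Deref type is rejected with error E0507. The Foo type and the helper consume_boxed below are placeholders for illustration.

```rust
struct Foo {
    name: String,
}

impl Foo {
    // Consuming method defined on the plain type, unaware of Box.
    fn foo(self) -> String {
        self.name
    }
}

fn consume_boxed(boxed: Box<Foo>) -> String {
    // `*boxed` moves the Foo out of the heap allocation; the compiler
    // then frees the allocation without dropping the Foo a second time.
    (*boxed).foo()
}

fn main() {
    let boxed = Box::new(Foo { name: String::from("boxed") });
    println!("{}", consume_boxed(boxed));
}
```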
