Idea: introduce core::ptr::read_owned
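
For context, a minimal sketch of the shape such a function might take (hypothetical; with today's semantics the body is just ptr::read, and the point of the proposal is that the compiler would be allowed to reuse *src as the result place instead of introducing a fresh copy):

pub unsafe fn read_owned<T>(src: *const T) -> T {
    // Semantically a plain read for now; the proposed primitive would let
    // the result live in the source place rather than in a new one.
    unsafe { core::ptr::read(src) }
}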

There are two separate kinds of magic. The first is the ability to DerefMove, i.e. to deallocate the box and put the value into a temporary place with a single application of operator *. This does, however, move the value before calling its by-value method.

pub fn test_box(val: Box<Large>) {
    val.hey()
}
fn test_box(_1: Box<Large>) -> () {
// …
// Since this place is local it must be disjoint from the allocation of the box.
    let mut _2: Large;                   // in scope 0 at src/lib.rs:10:5: 10:14
// fragment where we move out of the box:

    bb0: {
        StorageLive(_2);                 // scope 0 at src/lib.rs:10:5: 10:14
        _5 = (((_1.0: std::ptr::Unique<Large>).0: std::ptr::NonNull<Large>).0: *const Large); // scope 0 at src/lib.rs:10:5: 10:14
// actual move introduced here
        _2 = move (*_5);                 // scope 0 at src/lib.rs:10:5: 10:14
        _0 = Large::hey(move _2) -> [return: bb1, unwind: bb4]; // scope 0 at src/lib.rs:10:5: 10:14
                                         // mir::Constant
                                         // + span: src/lib.rs:10:9: 10:12
                                         // + literal: Const { ty: fn(Large) {Large::hey}, val: Value(<ZST>) }
    }

    bb1: {
        StorageDead(_2);                 // scope 0 at src/lib.rs:10:13: 10:14
        _3 = alloc::alloc::box_free::<Large, std::alloc::Global>(move (_1.0: std::ptr::Unique<Large>), const std::alloc::Global) -> bb2; // scope 0 at src/lib.rs:11:1: 11:2
                                         // mir::Constant
                                         // + span: src/lib.rs:11:1: 11:2
                                         // + literal: Const { ty: unsafe fn(Unique<Large>, std::alloc::Global) {alloc::alloc::box_free::<Large, std::alloc::Global>}, val: Value(<ZST>) }
                                         // mir::Constant
                                         // + span: no-location
                                         // + literal: Const { ty: std::alloc::Global, val: Value(<ZST>) }
    }

(Interestingly, the sequencing of the call before the deallocation is actually what one would expect with owning-pointer deinitialization. This was very surprising to me.)

The other, real kind of magic is restricted to Box<dyn FnOnce> and its variants. These really can be called without a move, which would otherwise be impossible since it would require moving a dynamically sized value. This only works for the FnOnce call trait specifically, since it is a lang item, and I do not really know the details.
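
A small compiling example of this (the closure is never moved out of the box before the call):

fn call_boxed(f: Box<dyn FnOnce() -> u32>) -> u32 {
    // dyn FnOnce is unsized, so this call cannot go through an ordinary
    // by-value move; special compiler support makes it work regardless.
    f()
}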

Yeah, I thought that calling a consuming method on Box<T> does not needlessly copy data around, but I was wrong (note the memcpy on line 40, even though the compiler could have immediately called foo). I would say it's a clear deficiency which ideally should be fixed.

Consuming functions/methods are external to my code, so arbitrary self types will not help.

In that case a simple newtype wrapper around the pointer should work, and would in fact be safer than the raw pointer (since the newtype wrapper can implement proper ownership and aliasing semantics internally). If the freestanding functions took those wrappers instead, the whole problem would be solved, and in a less error-prone way.

I think you misunderstand the problem. Please examine the examples in this thread carefully. We are not talking about how to use raw pointers with consuming functions, but about how to eliminate the unnecessary memcpy generated by the compiler.

I might indeed have misunderstood. But I'm proposing that, for an inner type T, you have a function taking a mutable pointer to T, not consuming (as far as the compiler knows). This function is unsafe (since it has consuming behavior that makes it illegal to use the object afterwards).

Then you have a newtype wrapper over mutable T pointers that implements a wrapper function with ownership semantics (where you only need to copy the pointer itself). This can work for both free and member functions.

I believe this sidesteps the original problem.

I would provide a code example, but I'm writing this on my phone.
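
The shape would be something like this (a rough sketch; ext_consume_raw is a hypothetical stand-in for the unsafe non-consuming entry point described above):

use core::marker::PhantomData;

// Hypothetical external function: takes a mutable pointer, non-consuming as
// far as the compiler knows, but the pointee must not be used afterwards.
unsafe fn ext_consume_raw(_p: *mut u64) {}

// Newtype wrapper that layers ownership semantics over the raw pointer.
struct Owned<T> {
    ptr: *mut T,
    _marker: PhantomData<T>,
}

impl Owned<u64> {
    // Safety: ptr must point to a valid, initialized value whose ownership
    // the caller hands over to the wrapper.
    unsafe fn new(ptr: *mut u64) -> Self {
        Owned { ptr, _marker: PhantomData }
    }

    // Consuming wrapper: only the pointer itself is copied, never the pointee.
    fn consume(self) {
        unsafe { ext_consume_raw(self.ptr) }
    }
}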

As I wrote earlier, that's not possible, since T and its consuming methods are defined in external code.

Try to eliminate the memcpy in this example without modifying the code before the "my crate" comment and without changing the signature of ffi_foo. The code generated for ffi_foo should ideally look like this:

example::ffi_foo:
        jmp     qword ptr [rip + example::Foo::foo@GOTPCREL]

Dunno if this would fix it, but you could try resurrecting rust-lang/rust PR #112157 ("rustc_target: Add alignment to indirectly-passed by-value types, correcting the alignment of byval on x86 in the process", by erikdesjardins) and see whether the improvements it makes are enough to get LLVM to optimize away the memcpy.

The standard name used for such an owning reference is usually &move T or &own T. There are a few crates that offer a library version of a MoveRef type; I'm personally partial to the moveit crate (although it's more focused on address-aware initialization than move references).

For types which are not Copy, we can and do overlap multiple places when the places' liveness does not overlap. When moving from one place to another (such as when calling a function), the source place's liveness ends before the liveness of the new place starts.

This is perhaps somewhat questionable when the addresses of the places are (potentially) observed, and thus the overlap is (potentially) observed, but compilers do this in practice, and it is likely that T-opsem will pick an opsem that permits it. In any case, it is of course allowed whenever it's provable that the address of one or the other place is never inspected.

(I am a member of T-opsem but not speaking on behalf of the team.)

Example
Rust
pub struct Data([u64; 4]);

extern "Rust" {
    fn by_ref(_: &Data);
    fn by_val(_: Data);
}

#[no_mangle]
pub unsafe fn test(x: Data) {
    by_ref(&x);
    by_val(x);
}

Optimized MIR
fn test(_1: Data) -> () {
    debug x => _1;                       // in scope 0 at src/lib.rs:9:20: 9:21
    let mut _0: ();                      // return place in scope 0 at src/lib.rs:9:29: 9:29
    let _2: ();                          // in scope 0 at src/lib.rs:10:5: 10:15
    let _3: &Data;                       // in scope 0 at src/lib.rs:10:12: 10:14
    let _4: ();                          // in scope 0 at src/lib.rs:11:5: 11:14

    bb0: {
        _3 = &_1;                        // scope 0 at src/lib.rs:10:12: 10:14
        _2 = by_ref(_3) -> bb1;          // scope 0 at src/lib.rs:10:5: 10:15
                                         // mir::Constant
                                         // + span: src/lib.rs:10:5: 10:11
                                         // + literal: Const { ty: for<'a> unsafe fn(&'a Data) {by_ref}, val: Value(<ZST>) }
    }

    bb1: {
        _4 = by_val(move _1) -> bb2;     // scope 0 at src/lib.rs:11:5: 11:14
                                         // mir::Constant
                                         // + span: src/lib.rs:11:5: 11:11
                                         // + literal: Const { ty: unsafe fn(Data) {by_val}, val: Value(<ZST>) }
    }

    bb2: {
        return;                          // scope 0 at src/lib.rs:12:2: 12:2
    }
}

Optimized LLVM-IR
; Function Attrs: nonlazybind uwtable
define void @test(ptr noalias nocapture noundef dereferenceable(32) %x) unnamed_addr #0 {
start:
  tail call void @by_ref(ptr noalias noundef nonnull readonly align 8 dereferenceable(32) %x)
  tail call void @by_val(ptr noalias nocapture noundef nonnull dereferenceable(32) %x)
  ret void
}

No copies in sight!


This is accurate; the current abi-by-ref passing convention for extern "Rust" is to pass a pointer to the owned/mutable value.

The base example where move elision should be possible but isn't currently done is a simple lift of the previous example from taking (abi-by-ref) by-value to taking a box that needs deallocation. Eliding the copy should imho be just as possible here: the box is known to be uniquely owned and is deallocated immediately afterwards without its contents being inspected.

Example
Rust
pub struct Data([u64; 4]);

extern "Rust" {
    fn by_val(_: Data);
}

#[no_mangle]
pub unsafe fn test(mut x: Box<Data>) {
    by_val(*x);
}

Optimized MIR
fn test(_1: Box<Data>) -> () {
    debug x => _1;
    let mut _0: ();
    let _2: ();
    let mut _3: Data;
    let mut _4: &mut std::boxed::Box<Data>;
    let mut _5: ();
    let mut _6: &mut std::boxed::Box<Data>;
    let mut _7: ();
    let mut _8: *const Data;

    bb0: {
        StorageLive(_3);
        _8 = (((_1.0: std::ptr::Unique<Data>).0: std::ptr::NonNull<Data>).0: *const Data);
        _3 = move (*_8);
        _2 = by_val(move _3) -> [return: bb1, unwind: bb4];
    }

    bb1: {
        StorageDead(_3);
        _4 = &mut _1;
        _5 = <Box<Data> as Drop>::drop(move _4) -> bb3;
    }

    bb2 (cleanup): {
        resume;
    }

    bb3: {
        return;
    }

    bb4 (cleanup): {
        _6 = &mut _1;
        _7 = <Box<Data> as Drop>::drop(move _6) -> [return: bb2, unwind terminate];
    }
}

Optimized LLVM-IR
; Function Attrs: nonlazybind uwtable
define void @test(ptr noalias noundef nonnull align 8 %0) unnamed_addr #1 personality ptr @rust_eh_personality {
start:
  %_3 = alloca %Data, align 8
  call void @llvm.lifetime.start.p0(i64 32, ptr nonnull %_3)
  call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(32) %_3, ptr noundef nonnull align 8 dereferenceable(32) %0, i64 32, i1 false)
  invoke void @by_val(ptr noalias nocapture noundef nonnull dereferenceable(32) %_3)
          to label %bb1 unwind label %cleanup

cleanup:                                          ; preds = %start
  %1 = landingpad { ptr, i32 }
          cleanup
; call <alloc::boxed::Box<T,A> as core::ops::drop::Drop>::drop
  tail call fastcc void @"_ZN72_$LT$alloc..boxed..Box$LT$T$C$A$GT$$u20$as$u20$core..ops..drop..Drop$GT$4drop17he5d5ae873e05d42bE"(ptr nonnull %0) #7
  resume { ptr, i32 } %1

bb1:                                              ; preds = %start
  call void @llvm.lifetime.end.p0(i64 32, ptr nonnull %_3)
  tail call void @__rust_dealloc(ptr noundef nonnull %0, i64 noundef 32, i64 noundef 8) #6
  ret void
}

Even a version with a write, ensuring that any reader external to the function would cause LLVM UB by violating noalias, doesn't get LLVM to elide the copy. (The fact that the write of [0] goes to the stack copy directly indicates that LLVM does in fact know for certain that nobody else could read the value from behind the pointer.)
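
The write variant was along these lines (a reconstruction reusing the Data and by_val definitions from the example above, not the exact code):

pub unsafe fn test(mut x: Box<Data>) {
    x.0[0] = 0; // any external reader of the pointee now violates noalias
    by_val(*x);
}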


This is, unfortunately, just a case of a missed optimization in LLVM. Short of a significant change to how we lower MIR moves and doing more move elision optimization ourselves, there are no small tweaks to the emitted MIR that we could make to convince LLVM to perform the optimization.

It's always been the case, for better or for worse, that box deallocation happens at scope exit, not when the box is moved out of. This is why you're able to reinitialize the box's value (e.g. drop(*boxed); *boxed = new(); drop(boxed);) and use the refilled box again.
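
As a compiling sketch of that reinitialization pattern (reusing the Data type from the example above):

fn reuse(mut boxed: Box<Data>) {
    let v = *boxed;        // move out; the husk keeps its heap allocation
    drop(v);
    *boxed = Data([1; 4]); // refill the same allocation through the husk
    drop(boxed);           // deallocation only happens here
}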

Notably, this means that if you have a bunch of boxes in a scope and then move out of them, their moved-from husks will stick around taking up stack space until scope end.

If you want deallocation to happen first, you need to introduce an extra function so the temporary can't be hoisted to drop after the function call. In some cases (e.g. [playground]), doing so can result in better codegen by eliminating the need for a landing pad. Alternatively, something like become could be used to reorder the drop timing.
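
A sketch of that extra-function trick, again assuming the Data and by_val definitions from the earlier example:

fn take(b: Box<Data>) -> Data {
    *b // the emptied box is deallocated at the end of this scope
}

pub unsafe fn test(x: Box<Data>) {
    // The box is deallocated inside take, before by_val runs.
    by_val(take(x));
}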

Unfortunately, no matter the implementation of the function, the copy cannot be eliminated with this signature without further constraints.

extern "C" {
    fn callback(x: &mut [u64; 4]);
}

pub struct Data([u64; 4]);

unsafe fn muck(mut data: Data) {
    callback(&mut data.0);
}

// only change this
#[no_mangle]
pub unsafe fn entry(data: *mut Data) {
    muck(ptr::read(data))
}
Optimized LLVM-IR
; Function Attrs: nonlazybind uwtable
define void @entry(ptr nocapture noundef readonly %data) unnamed_addr #0 {
start:
  %_2 = alloca %Data, align 8
  call void @llvm.lifetime.start.p0(i64 32, ptr nonnull %_2)
  call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(32) %_2, ptr noundef nonnull align 8 dereferenceable(32) %data, i64 32, i1 false)
  call void @callback(ptr noalias noundef nonnull align 8 dereferenceable(32) %_2)
  call void @llvm.lifetime.end.p0(i64 32, ptr nonnull %_2)
  ret void
}

This is because, barring further constraints, the "other side" of the FFI could be

use core::hint;
use core::mem::MaybeUninit;
use core::ptr::{self, addr_of_mut};

static mut EVIL: *mut [u64; 4] = ptr::null_mut();

#[no_mangle]
unsafe extern "C" fn callback(x: &mut [u64; 4]) {
    // the original and callbacked values are disjoint
    hint::assert_unchecked(!ptr::eq(x, EVIL));
    // this is known because your side moved out of the old place
    // if you did a `read_move`, my reading the place would be UB
    // (since a move logically deinitializes the moved-from place)
    // but I can still write to both places and they don't alias

    ptr::write(EVIL, [0, 1, 2, 3]);
    let evil = &mut *EVIL;
    *x = [4, 5, 6, 7];

    hint::assert_unchecked(*evil == [0, 1, 2, 3]);
    hint::assert_unchecked(*x == [4, 5, 6, 7]);
}

fn main() { unsafe {
    let mut data = MaybeUninit::new(Data([0, 0, 0, 0]));
    let ptr = data.as_mut_ptr();
    EVIL = addr_of_mut!((*ptr).0);
    entry(ptr);
    // the life region of the *ptr place only ends here
}}

With the signature as provided, this code will never cause any UB. Thus an optimization which eliminates the copy would be incorrect.

What is needed to make this optimization valid is some kind of &move T which confers not just scoped-unique access like &mut T, but instead globally-unique ownership permission, carrying exclusive root-level provenance for the pointed-to place. This is necessary to be able to say that the above code example contains UB (accessing *EVIL would dereference a pointer with invalidated provenance), which in turn justifies eliminating the copy.

I'm not 100% convinced that this alone would be sufficient to make the optimization sound (it reduces to something similar to the previously discussed case where disjointly-live places can observably overlap, except that the source place is actually still live, just protected-and-clobbered for the entire live region of the new place), but it is for certain necessary.

But also, if the language has some sort of &move, you can use it to avoid relying on move elision optimizations at all, by reifying the owned-by-ref passing convention into your code.

(Unfortunately, &move proposals are typically bogged down by trying to provide some amount of drop guarantee such that Pin<&move T> can be used. Despite previously being one of the people pushing for such a guarantee, I've eventually come around to it just not being possible without impractical compromise[1].)


  1. Two workarounds exist. The first is to make &move include a reference to drop flags in the borrowed scope, such that the parent can perform the drop if the child doesn't. (This makes &move T invariant over T, like &mut T, instead of covariant, like Box<T>. It basically becomes a fancy &mut Option<T>. C++ rvalue references are basically this.) The second is still theoretical, consisting of making &move a second-class type and essentially just a parameter passing mode, such that a function receiving some &move parameter can always perfectly track the drop flag for the referenced place (or delegate doing so to someone else). How to permit putting such a reference in single-ownership containers (e.g. Pin<&move _>) but not multiple-ownership containers (e.g. Rc) is unclear at best. (It'd probably have to live in an unsizing slot to be properly tracked.) ↩︎
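
A library-level sketch of that first workaround, with illustrative names (essentially the "fancy &mut Option<T>" shape):

struct MoveRef<'a, T> {
    // Some(_) while the child still owns the referenced value.
    slot: &'a mut Option<T>,
}

impl<'a, T> MoveRef<'a, T> {
    fn take(self) -> T {
        self.slot.take().expect("value already moved")
    }
}

impl<T> Drop for MoveRef<'_, T> {
    fn drop(&mut self) {
        // If the child never consumed the value, drop it here; the parent
        // observes None and performs no second drop.
        drop(self.slot.take());
    }
}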


Yes, thank you. I should have been more precise. Those are static properties of the CFG though, right? At no point is there a consideration like: "if the drop this flag guards has already executed, we can reuse the storage space for this unrelated place".

Anyway, this is still a new operation with regard to these semantics. In val = move *p, they really want to keep the place *p live. At least, that's the only way I can imagine enforcing the semantics of guaranteeing it is not clobbered by other place choices: as soon as it's dead, it might alias with anything else. This should also be transparent over whether the place is a dynamic or a static allocation. All current semantics of value expressions turn it into a static allocation in the current frame (and thus allow freeing the memory if p refers to alloc'd memory, or reassigning the allocation to a different box in the box case, as observed above).

I'm not sure, but maybe I can understand the problem this way: a value (_: Self) at the moment always refers to a particular, full allocation, in the SB sense too. I think what's also being asked is to decouple these notions a little and allow a value to be merely a tuple of an owning pointer and a drop flag. This is necessary to make code such as this optimize as intended:

let va = Box::new((0, large_val));
va.1.consume();

There are three allocations here atm: the dynamic box's allocation, va itself, and the temporary place created for the va.1 value passed to consume. These allocations must all be live at the same time in the code as written.

Conceptually this is currently required for by-value passing, since the dynamic allocation does not contain only the value: the value we want to own and the underlying allocation differ. So a new place is declared locally to hold the moved va.1 argument. But since *va is still live at that point, this local must be disjoint from the existing allocation, hence a full memcpy ensues.

What an owning pointer would achieve is to create a virtual place instead, where we partially move out of *va as far as initialization state is concerned. But instead of introducing a new allocation for the value to move into, we create only a tuple of a pointer to the memory we want the place to occupy, &(*va).1, and a drop flag. Significantly, the pointer is not assumed to be disjoint from other live allocations but merely unaliased. (I'm unsure whether there are existing LLVM attributes for locals that could express the difference.) It would indeed possibly not be a root for SB, which is another way in which it conflicts with the existing implication (value argument => has its place => borrow root)!

Such a thing needs a language primitive; it cannot be written in user code, only partially emulated by changing the consume method as discussed above.
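
For illustration, the partial emulation could look roughly like this, with consume_raw as a hypothetical raw-pointer variant of the external consume (a sketch, not a drop-in replacement):

use core::mem::ManuallyDrop;

struct LargeVal([u64; 64]);

// Hypothetical unsafe variant of consume; after the call the pointee must
// be treated as logically deinitialized.
unsafe fn consume_raw(_p: *mut LargeVal) {}

fn use_box(va: Box<(u32, LargeVal)>) {
    // Suppress the box's normal drop glue for the consumed contents.
    let mut va = ManuallyDrop::new(va);
    unsafe {
        consume_raw(&mut va.1); // only the pointer is copied, never the value
        // Free the heap allocation without dropping what it contained.
        let raw = Box::into_raw(ManuallyDrop::take(&mut va));
        drop(Box::from_raw(raw.cast::<ManuallyDrop<(u32, LargeVal)>>()));
    }
}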

Would the copy be elided if the MIR representation were changed to:

fn test(_1: Box<Data>) -> () {
    debug x => _1;
    let mut _0: ();
    let _2: ();
    let mut _4: &mut std::boxed::Box<Data>;
    let mut _5: ();
    let mut _6: &mut std::boxed::Box<Data>;
    let mut _7: ();
    let mut _8: *const Data;

    bb0: {
        _8 = (((_1.0: std::ptr::Unique<Data>).0: std::ptr::NonNull<Data>).0: *const Data);
        //_3 = move (*_8);
        _2 = by_val(move *_8) -> [return: bb1, unwind: bb4];
    }

    bb1: {
        _4 = &mut _1;
        _5 = <Box<Data> as Drop>::drop(move _4) -> bb3;
    }

    bb2 (cleanup): {
        resume;
    }

    bb3: {
        return;
    }

    bb4 (cleanup): {
        _6 = &mut _1;
        _7 = <Box<Data> as Drop>::drop(move _6) -> [return: bb2, unwind terminate];
    }
}

I was using the extern function as a simple optimization barrier to simplify the resulting assembly. This modification can be used as an alternative demonstration.

Sigh... Yet another thing broken by Pin... I will need to read a bit more about &move proposals.


After quite some thought, I'm no longer sure I really follow. It rings true-ish but not tautologically correct. In part, the parallel to &mut T actually seems to support this being possible to add, not serve as an example against it. Indeed, if &mut T had literally the same semantics for the sake of provenance, then the optimization would be sound under SB at least, and probably under most other useful semantics of Rust as well.

The argument x starts with read-write provenance, and since it appears as an argument, the implementor must guarantee to uphold this until return. This, in particular, makes ptr::write(EVIL, …) itself undefined behavior (or at least makes returning from the function after that write UB), since it invalidates the protected provenance. The optimizer need not uphold the program semantics after that UB occurs. With these semantics the counterexample vanishes.

The second part of the paragraph I follow even less; I don't know where the connection to Pin really comes from. Pin is (only) a temporal relation between a value's Drop and its memory invalidation, currently. An elided or mistakenly performed move does not by itself change any temporal ordering here whatsoever. Indeed, the owning pointer should behave like a value by most measures. Granted, it is an open question how to obtain an owning pointer to a pinned value, and how Pin<&own T> might behave. But that, I think, does not directly impede the design of an owning pointer itself.

There's some vague sense that pinning should somehow undo the uniqueness guarantees of a contained mut-ptr for the sake of alias analysis, but again there isn't a Pin argument type in any of the examples. And similarly, if such an exemption can be made, then surely it could also hold for Pin<&own T>. Maybe you have a more precise link to the semantic problems?

It's not UB to write through EVIL in callback, specifically because x describes a different place than EVIL points to. For this purpose it doesn't matter whether the x pointee place was created from a copy or a move; it's still a distinct place.

In order for the use of EVIL to be UB, you'd need protectors on the pointer argument to entry. Which is why I bring up &move at all, since it would invalidate EVIL's provenance and conveys the semantic permission to drop the pointee.

The place reuse optimization isn't done with Box currently, but I do think the LLVM noalias semantics (combined with the deallocation of the box within the function scope) would make the optimization valid. (I don't know the exact specifics of noalias, though; it's possible that it's not strong enough to prove the lack of aliasing reads between the read out and the deallocation, depending on exactly how it's defined. SB/TB are both clearly strong enough to justify the transform unless I've fundamentally misunderstood something.) If we can't do the optimization with Box where it should be the theoretically easiest, it's not going to happen with any other weaker pointer type.

Pin isn't related to this case at all; I only mentioned it (in a parenthetical) as part of the reason &move is hard. The conflict boils down to the fact that the pinning guarantee requires either the pointee to be dropped or the place to stay valid indefinitely, whereas an owning &move T carries the responsibility to drop the T (or to leak/forget to do so) independently of the ability to free/reuse the memory.

But that's extremely off topic here and not relevant. Make a new thread and ping me if you want more thoughts.

Now I'm more confused about the example. The idea of an owning pointer, for me, is less about the compiler eliding the copy and more about being able to write code that contains no moves in the first place but still deals in values.

To be explicit, the question is whether the type semantics of &own T could be engineered in such a way that the following would be allowed, and be extremely similar for the caller. Exhibits a.1 and a.2:

fn extern_fn(val: &mut BigVal);

fn main() {
    let mut val = MaybeUninit::uninit();
    // ..
    extern_fn(unsafe { val.assume_init_mut() })
}

fn extern_fn(val: BigVal);

fn main() {
    let mut val = MaybeUninit::uninit();
    // ..
    // Note: probably a good idea to add `core::add_own!`.
    // Only consumes the owning pointer, no move at any semantic level.
    extern_fn(&own { *val.as_ptr() });
}

I no longer understand whether your example relates to this or disproves its viability at all. On an ABI level, the code in a.2 can be fulfilled by passing the pointer directly. With the provenance of &mut this should work. For small values, a register read would have to be made by the caller.

If the called function takes &move, there'd of course be no move of the underlying place. I'm referring instead to code which passes source-by-value, and, as an optimization, replacing the fresh temporary place introduced by a "ptr::read_owned" with a reuse of the source place.

The ability to pass &move T where T is expected would be a nice way to ask for this behavior, rather than relying on an optimization to recover it. If place-address uniqueness rules prevent observable place colocation, passing &move makes sense as a way to permit it. Colocation depending on the ABI makes it a bit awkward, though it doesn't prevent it.

Even though at an ABI level abi-by-reference can be logically understood as taking a kind of &move, it's still source-by-value, and its observable semantics (e.g. place uniqueness) are derived from that.

(I opened a UCG issue to determine concretely if address uniqueness would prevent the optimization without a source hint like &move.)


This is already the case for unsized_fn_params, and if I use #[custom_mir] to force it for a sized parameter, we do get the behavior of passing the pointer directly instead of making a stack copy with a new address.

Given that place reuse/colocation is required for unsized function parameters, that makes me think we'll want to specify function calls to permit this kind of place reuse.

Unfortunately, a ptr::read_move function couldn't provide this benefit, since its return value is certainly a fresh place. It'd have to be a directly exposed intrinsic (like mem::transmute) to give the desired semantics of producing a place. (Which is probably possible without major compiler changes, but I'm not sure, given that intrinsics are still function-shaped.)
