A sketch for `&move` semantics

Also known as: &own, rvalue references, first class places

At a high level, &move T is a theoretical third reference type which conveys ownership of and responsibility for dropping the T, but not its backing memory. A major reason it's considered potentially beneficial to Rust is that it enables library code to model the ABI behavior of pass-by-reference directly, eliminating repeated memcpy moves of a value in the source, without dropping down to the use of &mut MaybeUninit<T>. (Depending on the exact address uniqueness rules, it's not too uncommon to see LLVM not optimizing out such repeated stack copies for somewhat large values.) A second major reason boils down to feature compatibility with C++, though how exactly that gets expressed is highly dependent on the individual.

This sketch only handles "always uninitializing" move references, in essence serving as a safer version of &mut ManuallyDrop<T>. A separate but related concept which can be called "typestate references" also handles the initialization case, which would today be best unsafely modeled with &mut MaybeUninit<T>. It is the author's belief that the former can be made coherently sound even in the face of unwinding, but the latter cannot. For the purpose of the Abstract Machine reborrowing, &move T acts like &mut ManuallyDrop<T> (assuming a non-memory-recursive borrow validity).
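For readers who have not written the unsafe version by hand, here is a minimal sketch of what "modeling `&move T` with `&mut ManuallyDrop<T>`" looks like in today's Rust. The function names `consume` and `pass_by_move_ref` are made up for illustration, and the safety contract is exactly the part a real `&move` would let the compiler check.

```rust
use std::mem::ManuallyDrop;

/// Hypothetical stand-in for `fn consume(s: &move String) -> usize`.
///
/// # Safety
/// `slot` must hold an initialized value, and the caller must never use or
/// drop that value again after this call returns.
unsafe fn consume(slot: &mut ManuallyDrop<String>) -> usize {
    // Take over ownership of the pointee; the backing memory stays with the caller.
    let owned: String = unsafe { ManuallyDrop::take(slot) };
    owned.len()
    // `owned` is dropped here, by the callee.
}

fn pass_by_move_ref() -> usize {
    let mut slot = ManuallyDrop::new(String::from("hello"));
    // SAFETY: `slot` is initialized and is not touched again afterwards.
    unsafe { consume(&mut slot) }
    // `slot` leaves scope without running `String`'s destructor,
    // so the value is not dropped a second time.
}
```

Calling `pass_by_move_ref()` returns the string length; nothing in the type system stops the caller from reading `slot` afterwards, which is the gap a language-level `&move` is meant to close.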

The first question to ask is a two-pronged syntax one, but this syntax question has heavy implications on what semantics are reasonable.

  • When calling a function taking a &move reference, do you pass an argument by-value or by-move-ref? E.g. to call fn f(_: &move i32), given let x: i32: f(x), like a non-move-ref function, like how C++ references behave; or f(&move mut x), like other Rust references behave?
  • When passing a &move reference binding as a function argument, does using the value act like using a reference, or like using the referenced place by-value? E.g. to call fn g(_: i32), given let y: &move i32: g(*y), like how other Rust references behave; or g(y), (sort of) like C++ references behave?

I'll consistently refer to the concept as &move T, but note that the expression construction syntax cannot just be &move $place[1], since &move || {} is a valid expression (reference to move closure temporary).

I currently lean weakly towards implicitly creating move references, because the function interface behaves identically to accepting the move by value (assuming destructive moves). Exception: Copy types, since the existing value must not be clobbered. We might want a way to explicitly get a destructive move on them for optimization purposes[2] anyway, though. Using implicit construction also keeps C++-minded readers from assuming that &move mut $place is Rust's version of C++'s std::move(lvalue). Or worse, mem::ref_move!($place).

For the use of move references, I think they should remain only usable as references unless dereferenced, to avoid the incidental complexity present in C++ with forwarding references in templates and the use of std::reference_wrapper. (Patterns get a new binding mode of ref move which is the default when binding behind &move.) It would be allowed to call fn f(_: &move i32) as f(x) with x: i32 or with x: &move i32, to minimize the burden of this mismatch.

The big semantic question for &move is how do you guarantee destruction, such that Pin<&move T> is functional[3]? If you simply drop the pointee when the move reference is dropped (or moved out of), forgetting (or leaking etc) the reference would forget to drop the pointee.

The simplest answer is just not to guarantee destruction. If the move reference is forgotten, that's equivalent to forgetting the pointee, just as if it had been manipulated by value. Pin<&move T> is then unsound to create for T: !Unpin. This is unfortunate, because address-stable types are a major use case for owning references, since they cannot be owned and passed around by value.

The answer which the moveit crate takes is to make MoveRef into a fat reference type (extra-fat for an unsized pointee), including an extra reference to the appropriate drop flag in the memory's owning scope, such that that scope can run the drop glue if the called function fails to actually drop the value. Unfortunately, this scheme isn't without its flaws: it increases the size of move references to handle an edge case, and the library still has to provide "drop flags" which abort if the value wasn't dropped, since sometimes the lending scope can't actually drop the value at scope end.

My now preferred solution (and the reason I wrote out this sketch) is deceptively subtle for how simple it is: put the drop flags in the scope manipulating the move reference. Not for the move reference itself (though it would still have them, as an owned value), but for the indirectly owned pointee. Effectively, immediately after the binding of some let x: &move T, insert the moral equivalent of defer! { if builtin#initialized(*x) { drop(*x); } }. The only "issue" is that this must be done as part of the language implementation, which has access to the drop flags, rather than as a library.

The impact of this is that the referenced place is always dropped as if it were a local place binding, because the point of move references is manipulating places as-if they were moved into local scope, but without actually changing the pointee's address. Essentially, this gives fn(x: &move T) identical semantics to fn(ref move x: T), except for the address of the pointee staying the same as in the caller's scope. (This equivalence also ties into why I think I like the pass-as-value call syntax.)

Thus, forgetting the move reference does nothing: the drop flags for the referenced place are tracked separately and the remaining value at that place is still dropped at the end of scope. Of course, if the move reference is dropped, that drops the pointee then and there, manipulating the drop flags such that it won't get double dropped, and much the same goes for moving out the value or creating a new fresh reborrowed move reference which takes over responsibility for dropping the pointee. Similarly, it's still valid to write mem::forget(*x) if forgetting the pointee is what is desired; there's just (somewhat unfortunately) no way to forget it without moving it (to counteract and enable maintaining the pinning guarantee), unless we add a mem::forget_in_place function.
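The observable behavior described above (forgetting the reference leaks nothing, and the pointee is dropped exactly once) can be imitated in today's Rust with `Option<T>`, whose discriminant plays the role of a runtime drop flag. This is only an analogy under that assumption: here the flag lives next to the storage rather than in the scope manipulating the reference, and the names `Noisy`, `forgets`, and `takes` are invented for the example.

```rust
use std::cell::Cell;
use std::mem;

/// Counts how many times a value of this type has been dropped.
struct Noisy<'a>(&'a Cell<u32>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Callee that forgets the reference it was handed.
#[allow(forgetting_references)]
fn forgets(slot: &mut Option<Noisy<'_>>) {
    // Forgetting a reference never affects the pointee.
    mem::forget(slot);
}

/// Callee that moves the value out, clearing the "drop flag" (the `Option` tag).
fn takes(slot: &mut Option<Noisy<'_>>) {
    let value = slot.take();
    drop(value);
}

fn drops_when_forgotten() -> u32 {
    let drops = Cell::new(0);
    {
        let mut place = Some(Noisy(&drops));
        forgets(&mut place);
        // `place` is still `Some`, so the lending scope drops the value here.
    }
    drops.get()
}

fn drops_when_taken() -> u32 {
    let drops = Cell::new(0);
    {
        let mut place = Some(Noisy(&drops));
        takes(&mut place);
        // `place` is `None` now, so nothing is left to drop at scope end.
    }
    drops.get()
}
```

Both functions report exactly one drop, whichever side ended up running the destructor.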

And the majority of the indirect place drop flag tracking functionally already exists in the compiler, for Box. The box itself and the heap place's initialization states are tracked separately; this is what allows you to move out of a box (the "DerefMove") and still free the box allocation at the end of scope, or to even move a value back in, recompleting the box and allowing you to manipulate the complete box again. The "only" change that would need to be made for move references is decoupling the dropping of the "stack part" from the dropping of the "heap part," such that the latter can happen without the former. (Plus of course, all of the rules for creation of move references serving as a mut region on the borrowed place and considering the place deinitialized once access is regained.)
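The existing Box behavior referred to here can be seen in stable Rust. The following sketch (the function name `box_round_trip` is just for illustration) moves a value out of a box, then moves a new one back in and uses the box as a whole again.

```rust
fn box_round_trip() -> (String, String) {
    let mut boxed: Box<String> = Box::new(String::from("first"));

    // Move the value out of the heap place; the allocation itself stays alive.
    let taken: String = *boxed;

    // At this point `boxed` is partially moved, and using it as a whole
    // would be rejected by the compiler.

    // Move a new value back in, which makes the box complete again.
    // No destructor runs for the old contents, since they were moved out.
    *boxed = String::from("second");

    // The complete box can be used normally again.
    let restored: String = (*boxed).clone();
    (taken, restored)
    // `boxed` is dropped here, freeing both the new string and the allocation.
}
```

The compiler tracks the initialization state of `*boxed` separately from `boxed` itself, which is the machinery the sketch proposes to reuse for move references.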


The semantics feels both reasonably workable and simple in hindsight to the point I'm somewhat surprised I haven't seen this approach of handling drop flags in previous discussion — or perhaps I just missed it or don't recall it; that's quite possible.

This seems almost too obvious and simple, and I fear I've overlooked some concern that would make this interpretation of move references impossible.

I don't think Rust is likely to accept an RFC for move references in the near term, but if Rust does support move references in the future, I currently believe that these drop semantics have the most straightforward and predictable behavior of any potential option (with the exception of just ignoring the issue and Pin<&move T>), so they should be used. (I'm less confident about the implicit construction syntax, but it does seem to have its benefits.)


  1. I've used &move mut $place as the expression construction syntax here, following behind the unstable &raw mut $place syntax behind ptr::addr_of_mut!. This syntax should be unambiguous, but it does clash with another far-future pseudo-proposal that offers mut || {} as potential syntax to indicate that a yield closure loops on return rather than poisoning and panicking on later resumes. ↩︎

  2. The short version of it is that if the address of the source place has escaped (i.e. been given to a noninlined function), it's impossible for the compiler to prove that any use of the value is the last in order to eliminate a defensive copy before pass-by-reference, which is necessary to ensure the two values, which are both "live" at the same time, have disjoint addresses. *&mut $place is an opaque way of navigating around this and invalidating any extant references, but only so long as that place computation isn't just completely dropped from MIR, losing knowledge of that side effect. ↩︎

  3. As discussed in @mcy's second blog post about the moveit crate. ↩︎


I always thought that when we were discussing &move, we were transferring the drop flag check to the function which manipulates the &move (like what you proposed in this post). But since you are making this post now, it seems that wasn't the case? At least in my head, &'a move T is equivalent to T + 'a when borrow checking. And the only reason it's not being implemented is that there's not enough motivation and "it's too fundamental".

How would this work when the &move reference is put in a struct/Rc/whatever? Would that not be allowed?

This feels like it could cause ambiguities around generics, fn f(_: &move impl Sized) called with f(x: &move i32) could match either rule.


Not OP. But I think it's no different from putting a regular short-lived T into Rc. Could you provide a code snippet where you think it's problematic?

So one of the future possibilities we had with our may_dangle stabilization proposal was to, well...

Our may_dangle proposal currently only concerns generics for lifetimes that end before the function. In practice this means functions get monomorphized based on drop flags. But what if it could be extended to lifetimes that start after the function, too? Or at least, what if the lifetimes could end in the function?

Further, with selfref, we kinda wish we could do uh something along these lines:

let x: Pin<Box<Holder<'_, MyType<ANoneSelfRef>>>> = ...;
let x: Pin<Box<Holder<'_, MyType<ASomeSelfRef>>>> = set_ref(x, |x| x);
x.operate_in(|x| x.selfref.get().whatever); // instead of needing to .unwrap() some Options.

specifically so that interacting with the self reference wouldn't require calling .unwrap() on an Option every single time, as we currently do. Which we feel like it has some overlap with &move? But we never figured out how to make this work, and we don't know how to describe what it's meant to do.

That just means that this special case would require disambiguation. It's the same situation as with e.g. { expr } & b, which requires explicit parentheses. The requirement of disambiguation could be introduced over an edition, or we could leave it as a lint, since the types of &(move || {}) and &move (|| {}) are different, and the coercion between them would strictly reduce reference permissions.

Resounding no. "Reference semantics" are one of the worst ideas in C++, making the code entirely unreadable at the call site. We could have made & and &mut implicit as well if we wanted to, but that was rejected for a very good reason.

Also, like any proposal for implicit conversions, it would cause issues with type inference, and would cause confusion with generic parameters (does fn foo<T>(_:T) pass the value by move or by &move? both would be possible).

They would get a type error when using any foreign API, and resolve their confusion. I don't think this objection is any more likely than e.g. "C++ users will think that &T means a pass by reference, like in C++". Occasionally some users indeed get confused, but it's pretty easy to clear up, and there is no possibility of using the API wrong since if you use it with an invalid mental model, it likely won't compile.

Maybe I'm missing something, but imho that entire post is based on a misunderstanding of Pin's poorly worded documentation. Pin itself doesn't have any language-level invariants, only library-level ones. And on the library level Pin's drop guarantee exists to prevent inconsistent access to the pointee's contents and invalidation of self-references.

The post's discussion is based around the following issue: Pin says that memory must not ever be reused until the contents are dropped. But if someone calls mem::forget on a Pin<&move T>, then it will never be dropped, so we must never reuse the backing storage, even though the point of passing Pin<&move T> into a function is to be able to reuse the storage once it returns.

But that's not really what the safety contract of Pin requires. That's just the way the docs worded it. What's really required is that once someone produces Pin<P>, all other code can manipulate it under the assumption that P's pointee keeps its memory address, and the methods which take Pin<P> always work with it in a way which doesn't invalidate any pointers to P::Target's contents.

It's the value behind Pin<P> which must be protected. Saying "backing storage must not be reused until Pin<P> is dropped" is just the simplest sound way to express that requirement. But forgetting Pin<P> is also fine --- as long as we can guarantee that no one will ever access P or P::Target as a live value ever again.

Now, current Rust doesn't have the means to really enforce that restriction, nor does it need to. &T and &mut T are non-owning, which means that forgetting Pin<&T> or Pin<&mut T> cannot forget the pointee, making the drop guarantee above the only sound approach.

Pin<Box<T>> could in principle allow it. But forgetting Pin<Box<T>> makes the pointee heap entirely inaccessible, so there is no way to reuse it anyway. We could in principle allow moving the pinned T out of a Pin<Box<T>> in a way which is sound assuming self-references, but there is currently no way to express such an API. The process would work basically like a C++ move assignment: we could soundly convert Pin<Box<T>> into a Pin<&move T>, and then T must specifically provide a "move assignment" method which moves a self-referential T between two Pin<&move T> storages, properly invalidating the former and activating the latter. However, while current Rust supports a "moved-out" Box, there is no way to talk about a "moved-out" Pin<Box<T>>. This means that the typestate after such a move would be impossible to express, or would require an unsafe and unergonomic Pin<Box<MaybeUninit<T>>>, thus making such an API unlikely to ever be added to std.

Other owning pointer types (Rc, Arc, RefCell etc) have no way at all to express "moved-from" state.

But with a first-class &move T type, all of this wouldn't be an issue. If we make a p: Pin<&move T>, we would mark the original place inaccessible for the lifetime of p, and dead/uninitialized once p is no longer live, including leaked p. The borrow checker makes sure that any pointer derived from p wouldn't outlive it, and thus that we can only access the storage once no one can access the (leaked or dead) stored value.


Normal short-lived T can be leaked, but this proposal requires the T pointed to by a &move T to not be leaked. However, when put in, for example, an Rc, the &move T could be leaked while still being usable to e.g. move out the T. So how do you guarantee it is dropped once and only once?

I don't agree. For example a pinned struct could register a pointer to itself with some worker thread to get some data later on written there, and de-register on drop. The safety of this relies on the fact that the memory of the pinned struct remains valid until drop is called, when the struct is de-registered. From another point of view, guaranteeing that nobody will access the P::Target value ever again is not doable without dropping it, because Pin's documentation gives it permission to access itself whenever it wants until it is dropped.

I guess here you meant the pinned value, rather than the Pin<P> pointer.

Which falls under the syntax being "not just" &move $place.

I absolutely agree w.r.t. regular references, because passing T, &T, or &mut T have very different ownership/borrowing semantics in the caller. However, &move is different -- the ownership/borrowing semantics for the caller are identical between passing T by value or passing &move T, in that you're passing ownership of the T to the called function.

So from the perspective of the caller, if I have let x: String and call f(x), it doesn't* matter whether it is fn f(_: String) or fn f(_: &move String), because both behave identically.

*Actually, there's one difference: if (and only if) the lifetime of the &move is not an elided lifetime, then the specifics of that lifetime become meaningful to the caller, because the moved-from place remains locked until the lifetime expires. It does make sense to limit implicit coercion to elided lifetime &move T.

As you mention, the same applies to other coercions, which for function arguments covers autoderef and unsizing coercions. (And you could technically consider place-to-value a coercion, I suppose.)

... no. What the docs say are exactly what people are allowed to rely on.

Given a T: !Unpin + 'static, it's perfectly allowed to, given Pin<&'a T>, communicate to a separate thread to give it access to read *const T indefinitely ('static), so long as a) this does not make any other provided APIs unsound, and b) when the T is dropped, that background thread's permission is revoked and it will not read the *const T anymore.

This would be unsound when combined with a Pin<&move T> which does not guarantee destruction of the T or that a non-destructed T's memory remains valid.

A Pin<&move T> must guarantee destruction if you allow &move to reference locations which the provider will later deallocate.

There is, assuming the presence of some drop-guaranteeing move reference. The moveit library does so by including library-level drop flag/guards to ensure the moved-from place gets uninitialized. Like how C++ move constructors work, it functions to uphold the pinning requirements, even for address-sensitive values, by logically still "dropping" the value in the old location. The difference from C++ is that because Rust moves are destructive, the value state and "drop" done for move_new don't need to be handled in the normal value state, up to and including the "moved from drop" being a no-op.

The "must be dropped" guarantee must be upheld. A better wording may be "must be uninitialized," such that moving out of in a way defined by the type is included in the definition. But leaving the value as-is without informing it that the memory is going to be invalidated is unequivocally disallowed.

That's the core of the proposal here: the scope calling Rc::new(&move mut place) would still own the drop flag for place and be in charge of dropping it if the Rc doesn't. Specifying exactly how this works is the difficult edge which I didn't consider in the original sketching; putting the drop flag in the owner works somewhat reasonably for short-lived, lifetime-elided &move but falls apart when passing &move to a function which returns a value capturing the &move lifetime.

To rephrase this in a more generous manner, consumers of Pin<P<T>> for unknown T must uphold the guarantees exactly as documented in the Pin documentation, because the T may be relying on any and all of them. &move T knows nothing about the T, and represents that ownership of the value has been taken from some place which will be later deallocated, so it must fully respect the guarantees and guarantee the T to be dropped.

For specific T, they are of course allowed to weaken the guarantees that T requires. Such a weakening could be Unpin, where all of the Pin requirements are dropped, or something in between, such as a fn(&move T) -> T or other move constructor shape, which can move from the T's location in a way compatible with the T's address-sensitivity, independent of whether that includes a logical call to drop_in_place::<T>.

It can't access itself if no one has a reference to it, which is what happens when you leak it.

You can't rely on Drop for safety since leaking is safe, thus such API isn't sound in the first place. It's probably OK for Pin<Box<T>> specifically because the pointers stay valid even if the box is leaked. But for stack allocations it's never sound.

let foo: Foo = make_foo();
let pinned = unsafe { Pin::new_unchecked(&foo) };
register_pinned(&pinned);
mem::forget(foo);
// And now the registered handler has UB, even though we used safe code
// (apart from the pinning itself).

To put it another way, if you assume that your handler is sound, then the unsoundness above comes from the use of stack pinning. You cannot rely on destructors running since leaks are safe. If you need to rely on the memory not being repurposed without destructor running, your only option is to make sure that the memory isn't repurposed even if the data is leaked. Which means that you must use exclusively heap allocated data --- that's what heap allocation is for. If you're designing a module with that API, you can make a safe function which creates Pin<Box<T>>, registers the handler and returns the box. You cannot just accept arbitrary Pin<&move T>, because you cannot guarantee that the backing memory lives for 'static, even if you accept Pin<&'static move T> (the pin can be leaked and the local variable overwritten).
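The uncontested part of this argument, that safe code can skip a destructor, is easy to demonstrate. In this small sketch the type `Registered` and its "deregister on drop" behavior are hypothetical stand-ins for the handler being discussed. Note that `mem::forget` takes its argument by value, so the value is moved into the call rather than abandoned in place.

```rust
use std::cell::Cell;
use std::mem;

/// Hypothetical value that "deregisters" itself when dropped.
struct Registered<'a>(&'a Cell<bool>);

impl Drop for Registered<'_> {
    fn drop(&mut self) {
        // Deregistration happens only if the destructor actually runs.
        self.0.set(false);
    }
}

fn leak_skips_deregistration() -> bool {
    let registered = Cell::new(true);
    let value = Registered(&registered);
    // Safe code: the value is moved into `forget` and its destructor never runs.
    mem::forget(value);
    // Still `true`: the deregistration in `Drop` was skipped.
    registered.get()
}
```

Whether this makes a given pinned-registration API unsound depends on what happens to the original storage, which is exactly what the rest of the thread disputes.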

You cannot demand anything Drop-related about local variables. The abstract machine doesn't have a concept of stack and doesn't guarantee you that the values live on the stack (otherwise you could do stuff like (&raw 0).add(10) to read the stack). From the PoV of language semantics, stack is infinite and local variables never alias, no matter how you enter and exit scopes. This is also true at the level of LLVM registers, but obviously not true in compiled code.

Basically, before you say stuff like "leaking Pin<&move T> is unsound" you should tell me where you got that Pin<&move T> from --- because you can't get it without unsafe code, so that's the obvious culprit for unsoundness.

They are of course not identical, because they involve different types with different semantics. For example, the called function can hold on to the reference's lifetime, e.g.

// this lifetime can be elided
fn foo<'a>(_: &move T) -> &'a T;

This means you can't use the referent again in any way until the returned borrow ends, which is something that should be explicit, if only for easier error fixing. Note that the returned type could also have nontrivial Drop, meaning that you can't just invalidate it silently, like a &T.

It's also important whether you can reuse the backing storage or not.

Then there is an issue of argument coercions, including the Deref (and ideally DerefMove) coercion.

That's the core of the proposal here: the scope calling Rc::new(&move mut place) would still own the drop flag for place and be in charge of dropping it if the Rc doesn't. Specifying exactly how this works is the difficult edge which I didn't consider in the original sketching; putting the drop flag in the owner works somewhat reasonably for short-lived, lifetime-elided &move but falls apart when passing &move to a function which returns a value capturing the &move lifetime.

I unfortunately believe this is unsolvable in the general case without carrying along a runtime drop flag together with the move reference; consider that Rc::new may decide to drop its contents based on runtime conditions, which means that it needs to communicate back to its parent scope somehow.

Of course, one solution would be to simply prevent the move reference from ever 'escaping' its owning scope (either by forbidding calls, or automatically 'decaying' &move Ts into &mut Ts), but then it means &move T isn't a "real type" anymore; so maybe the syntax should reflect that?

fn call_by_move(move val: T) {
  // `val` behaves like a standard `mut` binding here
}

However, this is very restricted and is closer to syntax for a "copy-eliding" calling convention than full-blown move references.


P.S.: Note that if you're okay with giving up Pin<&move T>, you can get a very convincing version of 'callee-dropped' move refs by using a Box with a custom noop allocator: see this playground.

P.P.S.: Syntax-wise, I personally prefer the 'owning reference'/&own T terminology, for two reasons:

  • it says what type of reference it is, instead of saying what you can do with it;
  • using &own expr avoids the ambiguity with move closures.

Compare it with how the DerefMove coercion for Box would work in this case. It's reasonable that we should be able to get Pin<&move T> from Pin<Box<T>>, similarly to Pin::as_mut or Pin::as_ref. But that coercion would have the signature

impl<P: DerefMove> Pin<P> {
    fn as_move(&move self) -> Pin<&move P::Target> { .. }
}

If we do mem::forget(Pin::as_move(&move b)), for b: Pin<Box<T>>, then b would also be leaked, because we move &move b: &move Pin<Box<T>> into Pin::as_move, marking b as moved (but not dropped, because neither function calls Drop on it).


With regards to your specific proposal, I don't think I fully understand it. Do I understand you correctly: you propose to turn &move T into a fat pointer, consisting of an owning pointer to the value of T and a &mut reference to its drop flags, living in the same scope as T itself. Is this correct? If so, I believe that the old proposals suggested something similar, and there are the following issues.

Most importantly, how do you deal with panics? The function foo(&move T) could panic inside, in a way which makes the referenced T invalid. You may not observe that panic, because some intermediate function may use catch_unwind. For example, perhaps we try to drop T, but its destructor panics. What is the state of the drop flags? If you don't clear the drop flag until the destructor finishes correctly, a panic in destructor would cause the caller to observe the value T as live, even if some of its contents were already dropped. If you clear the flag before running the destructor, then a panic would effectively leak T, returning us to the original issue (&move T may leak T or its storage). Also I'm not sure you can be panic-safe even if panics in destructors were forbidden, as some people propose.

Another issue is that you would make &move T into a fat pointer, increasing its size and making it inconsistent with other references. That's a cost which the called function would have to pay even if it doesn't intend to leak &move T or even to move it at all, so this makes your move references into a non-zero cost abstraction.

There is also an issue of ABI. It is undesirable to specify ABI for drop flags, since that would likely make them observable. But currently drop flags are an internal implementation detail, and are usually entirely optimized out in release builds. I don't think specifying them explicitly at the language level is desirable.

If you don't specify ABI for &move T, then you can't use it in FFI. That's also undesirable, and inconsistent with &T, &mut T, Box<T>.

Another issue is that I can't understand what &move !Sized would mean. Unsized types are an important use case for move references, because &move T would make it possible to entirely eliminate the half-working feature of unsized locals and unsized function parameters. If you need to pass a trait object into a function, you could just pass &move dyn Trait. But &move T is supposed to be a fat pointer, so &move dyn Trait is... even more fat? Doesn't seem to directly clash with anything, but probably a hard sell.

For &move [T], on the other hand, I have no idea at all what the "drop flags" would be supposed to mean. You can't pass a drop flag for every item in a buffer, and a single drop flag for the whole buffer but not its items doesn't make much sense (in particular, doesn't help with any potential leak issues).

Move closures are a very unlikely edge case. Closures are typically passed by value, not by reference, so &(move || {}) is unlikely to appear in practice, outside of some code dealing with trait objects.

It would also require introducing a new keyword for a single use case, while the move keyword is already reserved, arguably works just as well, and is very underused in current Rust.

But does it own the value of T, the backing buffer, or both? I don't think it's much more clear. &move T, on the other hand, describes exactly what you can do: move T. Or not move T --- that's also a valid option! For something like impl DerefMove for Box that's what's desirable. It should be possible to turn &move Box<(A, B)> into &move A and &move B, and move or not move each field separately. With &own, it's imho less clear, since the goal is not to pass ownership of Box to some foreign code, but just allow to move individual fields of the wrapped type.

I'm not saying this. Forgetting Pin<&move T> must be sound.

The entire point of this sketch is to make Pin<&move T> to places which will be deallocated sound. Thus, the sketch is laying out how to ensure that it is sound, by guaranteeing the consumption of the T.

If dropping &move T drops the pointee, and forgetting &move T forgets to drop the pointee, then Pin<&move T> is certainly unsound for T which utilize the pinning guarantee. The whole point of this sketch is an adjustment to &move semantics which ensure this is sound, by guaranteeing the pointee to be consumed somehow, even if the &move T is leaked.

To perhaps oversimplify, the behavior of the following two signatures is identical by this sketch:

fn f(x: String) {
    let x = &move x; // shadows owning binding

    // code
}

fn f(x: &move String) {
    // code
}

On the contrary, this is exactly how Pin<&mut T> is used. A function which takes Pin<&'a T> where T: !Unpin + 'self can actually utilize the &T until either the 'self lifetime expires or the T's destructor is finished, whichever comes sooner. The common case is 'self = 'static. (Caveats around how exactly the dynamic aliasing rules will pan out still apply[1].)

Here, let me sketch an example using only async and an unsafely scoped thread:

async {
    // abort on unwinds
    let nounwind = UnwindBomb::arm();

    let mut place = String::from("example");
    let ptr = &mut place as *mut String;
    let exit_flag = AtomicBool::new(false);

    let handle = thread::spawn(|| unsafe {
        while !exit_flag.load(Relaxed) {
            dbg!(&mut *ptr);
        }
    });
    let handle = join_on_drop(handle);

    poll_fn(|_| Poll::Pending).await;

    handle.join();
    nounwind.defuse();
}

A scoped thread API relying on drop being run is unsound because a user can forget a join-on-drop handle. However, the language still does guarantee that stack-allocated bindings are dropped, including in async contexts. The user is fully allowed to rely on drops occurring for soundness.

This future is sound. If you disagree, we have a much larger disagreement which has nothing to do with any potential &move.

That's not how lifetimes work. Such an API is absolutely sound if it requires specifically &'static mut T, because the backing memory is promised by that signature to live until the 'static lifetime expires (which never occurs). It can be written in purely safe code, by sending the &'static mut T cross-thread.
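As a concrete version of the safe-code claim, here is a minimal sketch that obtains a `&'static mut String` with `Box::leak` and hands it to another thread. The function name `append_on_other_thread` is invented for the example, and leaking a box is only one way to get such a reference.

```rust
use std::thread;

fn append_on_other_thread() -> String {
    // `Box::leak` yields a `&'static mut String`; the backing memory is never freed.
    let place: &'static mut String = Box::leak(Box::new(String::from("example")));

    // The exclusive reference is moved into the spawned thread. No `unsafe` is
    // needed: the signature alone promises the memory stays valid forever.
    let handle = thread::spawn(move || {
        place.push_str(" (edited)");
        place.clone()
    });

    handle.join().expect("worker thread panicked")
}
```

After the `move` into the closure, the spawning thread has no way to touch the place again, which mirrors the "lender could never touch the place again" point made about `&'static move T`.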

&move T is strictly more powerful than &mut T. Passing &'static move T to a function would give that function 'static permission to manipulate that place, and the lender could never touch the place again, since the 'static loan will never expire.

No, it can't, not if the return value borrows from the input value. As written, this returns an unbounded lifetime which is unrelated to the elided lifetime.

The elided form is

fn foo(_: &move T) -> &T;
fn foo<'a>(_: &'a move T) -> &'a T;

I presume this was just a typo.
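Since `&move` does not exist yet, the same elision rule can be checked with `&mut` in today's Rust. This is only an analogy, and the function names are made up for the sketch: with a single elided input lifetime, the output lifetime is tied to it, so the elided and explicit forms have the same type.

```rust
// Elided form: the single input lifetime is reused for the output.
fn elided(s: &mut String) -> &str {
    s.push('!');
    s.as_str()
}

// Explicit form of the same signature.
fn explicit<'a>(s: &'a mut String) -> &'a str {
    s.push('?');
    s.as_str()
}

pub fn elision_demo() -> (String, String) {
    // Both items coerce to one function-pointer type, which shows that the
    // two signatures really are the same.
    let fns: [for<'a> fn(&'a mut String) -> &'a str; 2] = [elided, explicit];

    let mut a = String::from("hi");
    let mut b = String::from("hi");
    let ra = fns[0](&mut a).to_owned();
    let rb = fns[1](&mut b).to_owned();
    (ra, rb)
}
```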

I did actually attempt to call this out:

though I did forget that if there is only one input elided lifetime, it's usable as the output elided lifetime, which of course also captures it. But this only means that my rule is "lifetime isn't captured" rather than "lifetime is elided."

My sketched proposal boils down to this: if we make fn(ref move x: T) and fn(x: &move T) have identical semantics, then &move T covers a notable use case (guaranteed move coalescing through multiple function call layers) and the concept carries over to Pin<&move T> in a fairly straightforward manner.

The case which I had overlooked which makes this sketch not simple is in fact exactly the ownership case where the lifetime is captured. Disallowing capturing &move lifetimes is quite limiting — in fact, it prevents even constructing Pin<&move T> without further language support — and thus isn't a useful limitation to apply.

But I do think there's a looser limitation which forbids Rc<&move T> but allows Pin<&move T>, and has some pseudo precedent in the language already[2]: only allow direct ownership of &move T. That restriction I do think is reasonable.

To the point of calling syntax, implicitly creating move references from naming a place should be limited to only non lifetime capturing &move. Lifetime capturing cases should have to write &move to create a lifetime which outlives the temporary (and nonextendable) scope.

I'm not quite considering DerefMove yet, since that has further complications (e.g. how exactly does drop checking for the pointee place work, and how does dropping the container work; very #[may_dangle] related).

But for what it might be worth, extending valid &move positions to include indirect places reachable by DerefMove, the chain of which is now also subject to the same restrictions, seems possible. That would allow you to use Box<&move T>.

Given I'm nominally at the front of the storages proposal, I'm well aware :slightly_smiling_face:

(Sorry I haven't made any progress there; it's unfortunately quite a low priority at the moment.)

If limited to the point of only being for explicit owning pass-by-reference, I actually do like this style of specifying... though it's quite unfortunate that you get pass-by-reference by specifying move val: T, not ref val: T which is allowed and does something completely unrelated.

The middle ground would be to commit to not having a move pattern binding mode and spell it ref move val: T, which produces val: &move T with the not-quite-a-type version of &move T.

Existing Rust prefers the "what you can do" naming. &mut is a reference that you can mutate through, as opposed to "&uniq" because the reference is unique.

If the only case the &move pointee is dropped is when the &move is dropped, yes, but not in this sketched proposal.

That's close to but not quite accurate. Rather than &move T being (&mut ManuallyDrop<T>, DropFlags<T>), the "DropFlags<T>" are included as part of DropFlags<&move T>. Forgetting the &move T means the &move T won't get dropped, but the pointee T is tracked separately and will still get dropped. The only way to not drop it would be forget(*moveref) (moving it out of the previous place) or a theoretical fn mem::forget_in_place<T: ?Sized>(_: &move T) which would need to be a compiler built-in like mem::forget was before ManuallyDrop.

My device is about dead and this post is already overlong; I'll come back to the individual points/questions later today/tomorrow.


  1. Currently, it's the case that async utilizes this to maintain self references through integration with the compiler, as well as it being permitted for captured variables for future::poll_fn; formally, it's not yet permitted to do this kind of shenanigan manually, and you're relying on implementation details to make it sound. I have a sneaking suspicion your objection is at least partially rooted in this direction. ↩︎

  2. The related restriction is how unsizing works. &move T would only be allowed to be used in some Container<T> if &Container<impl Trait> can be coerced to &Container<dyn Trait>. For non-generic types, the same applies; &move T takes the spot of the unsized tail, acting similarly to a new unsized kind with the drop flag as its unsize metadata, except not at all since the drop flags are only present for the owning scope, not borrowing scopes, and it's still sized. ↩︎

ooh, that's cool!

hmm, can you drop a Vec that contains an &move if the &move is already dropped (somehow)? or does it have to be live at that point?

My example is not relying on Drop being called for safety. It's perfectly ok for drop not to be called in my example, but if that doesn't happen the memory of my struct must remain valid (i.e. not repurposed). This is fine because it is pinned, and Pin specifically gives me this guarantee:

for pinned data you have to maintain the invariant that its memory will not get invalidated or repurposed from the moment it gets pinned until when drop is called

From std::pin - Rust

Your code is unsound because it pins foo without upholding the Drop guarantee of Pin, thus your unsafe is wrong.

Stack pinning with the pin! macro is sound because it's guaranteed that temporaries will be dropped (in sync functions, and async functions which are dropped) or the stack space will not be repurposed (by the Drop guarantee of Pin which must be upheld when polling the async function). Your code however does not do that since it leaves foo accessible even after pinning.

The goal was to make creating Pin<&move T> safe in the first place, so it is natural to wonder how to guarantee that. Pin requires that either the T will be dropped or its underlying storage will never be repurposed, but we know that the underlying storage of &move T will surely be repurposed since it's on the stack. Thus to make this safe we must guarantee that the T will be dropped, even if e.g. the Pin<&move T> itself is leaked.

How does it distinguish the case where Rc::new (or any other function!) stopped using the &move reference without dropping it from the one where it stored it somewhere for later? Lifetime annotations in the signature? What if e.g. the function stores it temporarily in some static place and manually guarantees the lifetime will be upheld in some other way?

I may be mistaken but isn't &move T just a static version of &mut Option<T>? I don't feel a need for &mut Option<T> that often to actually need it as a language construct.
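To make that comparison concrete, here is a minimal sketch of the dynamic `&mut Option<T>` model in today's Rust. The function names are hypothetical. The `Option` discriminant plays the role that a compiler-tracked drop flag would play for `&move T`.

```rust
/// Callee that takes ownership through the slot: the dynamic analogue of
/// moving out of a move reference.
pub fn consume(slot: &mut Option<String>) -> usize {
    let value = slot.take().expect("slot was already emptied");
    value.len() // `value` is dropped here, by the callee
}

/// Callee that never touches the slot; the value stays with the lender,
/// which drops it when the owning binding goes out of scope.
pub fn ignore(_slot: &mut Option<String>) {}

pub fn option_slot_demo() -> (usize, bool, bool) {
    let mut taken = Some(String::from("example"));
    let mut kept = Some(String::from("example"));

    let len = consume(&mut taken);
    ignore(&mut kept);

    // The runtime tag records which callee actually moved the value out.
    (len, taken.is_none(), kept.is_some())
}
```

The difference from the sketched `&move T` is that here the "was it moved?" check happens at run time, and a second `consume` on the same slot panics instead of being rejected at compile time.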

Heh, as the author of:

as well as:

I have thought quite a bit about this topic :smile:

Naming-wise, own-ing references is better than move references

Indeed, the whole point of the references is not to move stuff. The "responsibility to drop" is actually more often called ownership. So, to keep things clearer, I'll keep referring to them as &own T references. Whilst move being a keyword seems convenient at first glance (and the reason back in the day I too called them &move references), I have since changed my mind: that naming confuses too many people ("something by reference is not moved").

  • I think we could afford a contextual keyword here, with another edition; incidentally it would remove the need to disambiguate &move ||, which is not that niche, since a good motivation for owned references is that of constructing &own dyn FnOnces and the like (see below).

Semantics are already fleshed out and exist in the aforementioned stackbox crate.

And they happen to be quite simple!

&own value would be equivalent (modulo lifetime extension) to doing:

Storage::SLOT.init(value)

with:

pub struct Storage<T>(MaybeUninit<T>);

impl<T> Storage<T> {
    pub const SLOT: Self = Self(MaybeUninit::uninit());

    pub fn init(self: &mut Storage<T>, value: T) -> Own<'_, T> {
        Own(self.0.write(value))
    }
}

which would result in a &'local own T, or Own<'local, T> in user-library parlance, with:

pub struct Own<'storage, T: ?Sized>(&'storage mut T);
// and all the good `{Coerce,}Unsize`  impls for nice unsizing.

impl<T : ?Sized> Deref{,Mut} for Own<'_, T> {
    type Target = T;

    ...
}

impl<T : ?Sized> Drop for Own<'_, T> {
    fn drop(&mut self) {
        unsafe { <*mut T>::drop_in_place(self.0) }
    }
}

And that's it.
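For readers who want to try this, here is a compilable sketch of the same idea for current stable Rust. It is an approximation and not the stackbox implementation: the unsizing impls are omitted, `Storage::new` replaces the `SLOT` constant, and `Noisy` and `own_demo` are hypothetical helpers added only to observe when the drop glue runs.

```rust
use std::cell::Cell;
use std::mem::MaybeUninit;
use std::ops::{Deref, DerefMut};
use std::ptr;

/// Caller-provided backing memory; it never runs drop glue itself.
pub struct Storage<T>(MaybeUninit<T>);

impl<T> Storage<T> {
    pub fn new() -> Self {
        Storage(MaybeUninit::uninit())
    }

    /// Writes `value` into the slot and hands back an owning reference.
    pub fn init(&mut self, value: T) -> Own<'_, T> {
        Own(self.0.write(value))
    }
}

/// Owns the pointee's drop glue, but only borrows its storage.
pub struct Own<'storage, T: ?Sized>(&'storage mut T);

impl<T: ?Sized> Deref for Own<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        &*self.0
    }
}

impl<T: ?Sized> DerefMut for Own<'_, T> {
    fn deref_mut(&mut self) -> &mut T {
        &mut *self.0
    }
}

impl<T: ?Sized> Drop for Own<'_, T> {
    fn drop(&mut self) {
        let pointee: *mut T = &mut *self.0;
        // SAFETY: `Own` is only created by `Storage::init`, which initialized
        // the pointee, and nothing else will drop it afterwards.
        unsafe { ptr::drop_in_place(pointee) }
    }
}

/// Hypothetical helper type that counts how often it is dropped.
struct Noisy<'a>(&'a Cell<u32>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Returns the drop count before and after the `Own` is dropped.
pub fn own_demo() -> (u32, u32) {
    let drops = Cell::new(0);
    let mut slot = Storage::new();
    let owned = slot.init(Noisy(&drops));
    let before = drops.get();
    drop(owned);
    (before, drops.get())
}
```

If the `Own` is passed to `mem::forget` instead, the value is leaked in place. That is safe, and it is the behavior the pinning discussion elsewhere in this thread turns on.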


The missing part is thus ergonomics: creating a &own reference with library code is currently cumbersome (look at all the offered constructors in stackbox!), especially related to lifetime extension.

  • Lifetime extension would be key for this to be ergonomic

  • Being able to use &own self receivers too

    since it would allow for &own self methods in dyn Traits, thereby resolving the classic conundrum of "dyn-safe trait with an owned receiver without alloc/Box" (e.g., in no_std environments).

Supporting Pinning is more trouble than it is worth.

Conceptually, a Pin<&own T> cannot offer the Pinning guarantees, since it does not own the T's backing allocation (it only owns T's drop glue, so, if forgotten, the pointee will be deallocated without its drop glue being run, to summarize what has already been mentioned in this thread).

So, while maybe an effort could be made to support it, we'd be "swimming against the tide", of sorts, so it does not seem wise to start with that.

  • "Remote drop flags" would probably help tackle this design space, but it does not seem to be worth focusing on this for a first implementation: as mentioned, scoped APIs (or macros?) could let third-party libraries polyfill this design space initially; there is no need to rush language sugar for this initially. (Moreover, a Pin<&mut Option<T>> wrapper which would auto-unwrap, and Pin::set(it, None) on Drop, seems quite equivalent to this suggested language magic, so the magic seems unwarranted?)

I suspect drop flags may lead to a bunch of design questions, and thus an impression of lack of clarity around the design, which is very much not the case, as I've shown in the aforementioned code.

Benefits of &own T

It "fits the picture"

First and foremost, it would fill the missing third mode for references:

Reference | Semantics for T                    | For the backing allocation
&T        | Shared access                      | Borrowed
&mut T    | Exclusive access                   | Borrowed
&own T    | Owned access (drop responsibility) | Borrowed

That way the troïka/trinity/trifecta triumvirate of Rust design would finally apply to the &-indirection mode of references.

Some people, back in the day, complained about this point, because of a beginner mixup between ownership and being : 'static. You can be : 'static without being responsible for any drop glue (e.g., &'static ... references), and you can be 'lt-infected while being responsible for drop glue (e.g., BoxFuture<'lt, ...>). So the fact we have a &'locally-lived reference with drop glue ought not to be surprising (as a matter of fact, there is the tangentially related dyn* Trait + 'local design which runs into the same paradigm).

  • In fact, a Own<'lt, T> can be conceptualized with the storage API as a Box<T, Storage = Borrowed<'lt>> of sorts (hence my original StackBox name in the crate; but since no actual heap-Boxing occurs, I find the "stronger &mut" naming to better fit the picture than talking about boxes).

  • we already have one instance of this concept in the standard library: the pin! macro consumes ownership of the given value, and returns a temporary borrow to it (it's just that because of the aforementioned issues with Pin<&own _>, the macro "downgrades" its output to Pin<&mut _> for soundness).
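The `pin!` behavior mentioned in the last bullet can be seen in a short sketch using the real `std::pin::pin!` macro. `Unmovable`, `read` and `pin_demo` are hypothetical names for illustration.

```rust
use std::marker::PhantomPinned;
use std::pin::{pin, Pin};

/// A type that opts out of `Unpin`, so pinning it is meaningful.
pub struct Unmovable {
    pub data: u32,
    _pin: PhantomPinned,
}

fn read(this: Pin<&mut Unmovable>) -> u32 {
    // `Pin<&mut T>` dereferences to `T` for shared access.
    this.data
}

pub fn pin_demo() -> u32 {
    let value = Unmovable { data: 41, _pin: PhantomPinned };

    // `pin!` takes `value` by value (the binding is moved from) and yields a
    // `Pin<&mut Unmovable>` pointing at a hidden local slot.
    let pinned: Pin<&mut Unmovable> = pin!(value);

    // Using `value` here would be a use-after-move compile error.
    read(pinned)
}
```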

It supersedes, with less magic / more honest and transparent semantics, unsized_fn_params.

That is, it trivially solves the Box<dyn FnOnce> : FnOnce ? and any other such occurrences wanting to take a dyn Trait "by value", ideally in an allocation-agnostic way:

/// This trait is object/`dyn`-safe
trait DynSafe {
    // No need for unsized_fn_params, thanks to `&own ?Sized` references:
    fn example(&self, f: &own dyn FnOnce()) {
        f() // `FnOnce()` takes no arguments
    }
}
// this is an example of `dyn`-safe polymorphism over an ownership-based trait.
  • For instance, this supersedes the usual &mut Option<impl FnOnce()> dance that so pervasively polyfills this API gap.
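For reference, here is a minimal sketch of that `Option` dance as it is commonly written today. The trait and function names are illustrative, and the trait takes `&mut dyn FnMut()` because `&own dyn FnOnce()` is not available.

```rust
/// Object-safe stand-in for the trait above, using what stable Rust offers.
pub trait DynSafe {
    fn example(&self, f: &mut dyn FnMut());
}

pub struct CallOnce;

impl DynSafe for CallOnce {
    fn example(&self, f: &mut dyn FnMut()) {
        f()
    }
}

/// Adapts an `FnOnce` to the `FnMut`-based trait by parking it in an `Option`.
pub fn call_once_through_dyn<F: FnOnce()>(obj: &dyn DynSafe, f: F) {
    let mut slot = Some(f);
    obj.example(&mut || {
        // The first call takes the closure out and consumes it; any later
        // call finds `None` and does nothing.
        if let Some(f) = slot.take() {
            f()
        }
    });
}

pub fn option_dance_demo() -> Vec<String> {
    let owned = String::from("owned");
    let mut out = Vec::new();
    // This closure moves `owned` out of its captures, so it is only `FnOnce`.
    call_once_through_dyn(&CallOnce, || out.push(owned));
    out
}
```

The cost of the workaround is a runtime check on every call and no static guarantee that the callee invokes the closure at most once.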

This does not exclude unsized_fn_params sugar, afterwards, if deemed ergonomic enough to warrant all the extra language magic, from being added; but at that point it would amount to:

fn example(f: dyn FnOnce()) {
    let g = f; // what does this do??
    g()
}

example(|| { ... })

being sugar for:

fn example(f: &own dyn FnOnce()) {
    let g = f; // Ok, it just "copies the ptr" / moves the owning pointer.
    g();
}

example(&own || { ... })

It unlocks "Future Possibilities"

Returning dyn Traits or [] slices

With the Storage basic API shown above, we could even start featuring returned dyn Traits:

type ActualFn = impl Sized;

fn create(storage: &mut Storage<ActualFn>)
  -> &own dyn FnOnce()
{
    storage.init(|| { ... })
}

type ActualArray = impl Sized;

fn create(storage: &mut Storage<ActualArray>)
  -> &own [String]
{
    storage.init([
        String::from("hello"),
        String::from("world"),
    ])
}
  • which is something unsized_fn_params can't even dream of, since it hides all the 'storage semantics from the picture;

  • which could help with the -> impl Trait in Trait effort;
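The caller-provided-storage shape can be approximated on stable Rust today, at the price of manual cleanup. The sketch below is an assumption-laden stand-in: it uses `MaybeUninit` directly instead of `Storage`, returns `&mut [String]` instead of `&own [String]`, and names the array type concretely instead of hiding it behind `impl Sized`.

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Initializes caller-provided storage and returns an unsized view of it.
pub fn create(storage: &mut MaybeUninit<[String; 2]>) -> &mut [String] {
    // `write` returns `&mut [String; 2]`, which unsizes to `&mut [String]`.
    storage.write([String::from("hello"), String::from("world")])
}

pub fn create_demo() -> String {
    let mut storage = MaybeUninit::uninit();
    let slice: &mut [String] = create(&mut storage);
    let joined = slice.join(" ");

    // `&mut` carries no drop glue, so the caller has to clean up by hand.
    // SAFETY: `create` initialized both elements and nothing uses them later.
    unsafe { ptr::drop_in_place(slice as *mut [String]) };

    joined
}
```

The `unsafe` cleanup step is the part an owning reference would absorb: dropping the returned `&own [String]` would run the element destructors automatically.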

In-place initialization

Here &'storage out T, coupled with a generative API, would allow writing in-place constructors in a way that can be perfectly unwind-safe.

Basically this:

but with my own correction that it can be made sound (precisely by having each in-place initialization yield a Own<'storage, Field, Brand<'_>> token): see GitHub - moulins/tinit: An experiment for safe & composable in-place initialization in Rust.

Move constructors

Probably from the previous point and adding Pin into the mix, cf. the moveit crate.

I don't agree. That's like saying "something &mut may not be mutated", that's irrelevant. The API guarantee is that a function taking &move T can move its contents anytime, even if it doesn't move it now. Consequently, this means that from the PoV of the caller, the value is moved into the function.

I also disagree that &move T is about responsibility to drop stuff. No, that's incidental. It allows you to drop T simply because it allows you to move T, thus you can always do drop(*move_ref). The autodrop behaviour is just a safeguard, to free the user from having to call drop manually, and to guarantee drop correctness in the presence of panics.

Most importantly, &move T doesn't force you to drop T, but it should allow you to move out an instance of T, and then move in a different instance later, just like Box allows you.

let mut v = vec![0];
let move_ref = &move v;
let _ = *move_ref;
*move_ref = vec![1];

In this example, it is exactly the capability to move in and out of a place that is important. The dropping is irrelevant and done in some other place. Note that between the move-out and the move-in, the reference owns no value, and the final value is entirely unrelated to the starting one. This highlights that ownership semantics are also misleading in some important cases.

The moveability is important if you want to be able to reuse allocations, which you can currently do with Box and, to an extent, Vec. It's also critical for implementing partial initialization and placement new semantics.
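The Box half of that comparison already works on stable Rust. A small sketch (the function name is hypothetical) that moves out of a box, moves a new value in, and checks that the heap slot stayed the same:

```rust
/// Moves a value out of a `Box`, moves a different one in, and reports
/// whether the heap slot was reused.
pub fn box_reuse_demo() -> (Vec<i32>, Vec<i32>, bool) {
    let mut boxed = Box::new(vec![0]);
    let before: *const Vec<i32> = &*boxed;

    // Move out: `*boxed` is now uninitialized, but the allocation is kept.
    let old = *boxed;

    // Move in: the same slot is re-initialized with an unrelated value.
    *boxed = vec![1];
    let after: *const Vec<i32> = &*boxed;

    (old, *boxed, before == after)
}
```

Between the two assignments the box owns no value at all, which is the state the `&move` example above passes through as well.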

Support for pinning is absolutely critical. Without it, you can't implement the moveit crate in safe ergonomic Rust. Being able to safely (or at least more safely) work with self-referential and non-moveable types is a major reason to introduce &move in the first place. It's something which cannot be implemented in current Rust without copious unsafe code.

Besides, Pin<&move T> is a very natural type to construct. Even if you don't provide it in the stdlib, people will still use it in their code, but with more unsafety risk. Note that one can construct it via transmute.
