Pre-RFC: Move references

This is not true. At its simplest, &move out is just a borrow where you logically own the value, and are responsible for dropping it before the end of the borrow's lifetime. It's identical to Box, except that someone else owns the memory, not you. In fact, I could even see &mut MaybeUninit<T> being a valid storage, so you could theoretically spell &move out T as Box<T, &mut MaybeUninit<T>>.
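To make the "Box, except someone else owns the memory" reading concrete, here is a rough sketch in today's Rust. The `MoveOut` type, its methods, and `demo` are hypothetical names invented for illustration; nothing here is part of any RFC, and a real `&move out T` would be a language feature rather than a library wrapper.

```rust
use std::mem::{self, MaybeUninit};
use std::ops::{Deref, DerefMut};

/// Hypothetical stand-in for `&'a move out T`:
/// owns the value, but only borrows the storage it lives in.
pub struct MoveOut<'a, T> {
    slot: &'a mut MaybeUninit<T>,
}

impl<'a, T> MoveOut<'a, T> {
    /// Writes `value` into caller-provided storage and takes ownership of it.
    pub fn new_in(slot: &'a mut MaybeUninit<T>, value: T) -> Self {
        slot.write(value);
        MoveOut { slot }
    }

    /// Moves the value out, leaving the storage logically uninitialized.
    pub fn into_inner(self) -> T {
        // SAFETY: the slot is initialized for as long as `self` exists.
        let value = unsafe { self.slot.assume_init_read() };
        mem::forget(self); // skip our Drop: ownership moved to `value`
        value
    }
}

impl<T> Deref for MoveOut<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        // SAFETY: initialized by construction.
        unsafe { self.slot.assume_init_ref() }
    }
}

impl<T> DerefMut for MoveOut<'_, T> {
    fn deref_mut(&mut self) -> &mut T {
        // SAFETY: initialized by construction.
        unsafe { self.slot.assume_init_mut() }
    }
}

impl<T> Drop for MoveOut<'_, T> {
    fn drop(&mut self) {
        // The holder is responsible for dropping the value
        // before the borrow of the storage ends.
        // SAFETY: initialized, and never read again afterwards.
        unsafe { self.slot.assume_init_drop() }
    }
}

/// Example use: the storage is a plain stack local owned by this frame.
pub fn demo() -> (String, usize) {
    let mut storage = MaybeUninit::<String>::uninit();
    let mut r = MoveOut::new_in(&mut storage, String::from("hello"));
    r.push_str(", world");
    let len = r.len();
    (r.into_inner(), len)
}
```

The value is dropped exactly once, either by `MoveOut`'s destructor or by whoever receives it from `into_inner`, while the `MaybeUninit` storage itself never runs a destructor.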

You're projecting your own desire for PIT onto &move out. Nothing about &move out requires PIT or any extra behavior w.r.t. drop flags. Simply treat the value behind &move out the same way the value behind Box or even a local binding is treated: emit drop flags if necessary, and don't allow the value to escape except in a well-formed fully-initialized form.

&move in does generalize to being able to talk about piecewise initialization of values (roughly, PIT). But even &move in doesn't require PIT or exposing drop flag machinery in any way: either you fully initialize the place, or give the &move in to someone else to fully initialize, tracked, again, as a normal value would be (i.e. a basic let x;). And as I have said multiple times before, &move out is both simple and useful enough to be valuable to talk about on its own, distinct from the barrel of issues that need to be solved to even get to the point where adding &move in or PIT is possible in a sound manner.

Please stop hijacking threads by pulling discussion to your own pet features, @Soni. Even if PIT or whatever feature you're discussing is a clean superset of the other threads' discussion, it is useful to discuss a smaller, simpler subset without needing to also solve the extra issues of the fully general case.

A bit off-topic, but where can I find out which panics always abort and which I can catch?

This is, in fact, the exact extra behaviour that PITs add. But we digress.

You still can't have the ability to return move references. Unless you also wanna pass a reference to drop flags around with them. Just imagine panicking after returning from a function that claims to initialize a move reference, but just returns the uninitialized move reference back to you. This is why returning them is unsound. Not only that, but it's also completely unnecessary. You don't need the ability to return move references for them to be useful. It'll just cause a ton of trouble.

It moves the entire drop flags machinery to the type system, how isn't that increasing complexity?

It doesn't simplify anything, it just moves all the complexity to types, meaning type checking will be much more complex because now it also has to track initialization. Also, you're claiming that it doesn't expose drop flags, but actually it's the opposite, you're embedding drop flags in userfacing types.

That's not true, it should be perfectly safe to return &move out references if the storage will be valid after the function returns.

Given that an &move out reference implies that it is fully initialized there's no need to pass drop flags around with them.

That's a problem of &move in references, not &move out.

The drop flags machinery should not be confused with initialization state.

With PITs, drop flags stay the same. What you get instead is monomorphization based on initialization state. You just use the existing drop flags machinery to select the correct monomorphization. That's not drop flags because drop flags is about dynamic dispatch whereas monomorphization is about static dispatch. The programmer would never see the actual drop flags, so any important optimizations (like, for example, the extremely important dead drop flag elimination) can still be applied.

While y'all were too busy claiming PITs aren't worth thinking about we were thinking about all the ways they could go wrong. It's much better to add the concept of 'self (lifetime of current stack frame) and bind all &move as &'self move so that you can't return the things than add unsoundness and/or workarounds because you didn't think about the caveats before trying to do it.

(And, y'know, you don't need to be able to return them.)

Wouldn't that conflict with the lifetime proposed for self-referential types? Also, that's not "much better", you're proposing to just cripple this feature and call it a day.

Again, &move out T doesn't add any unsoundness.

There are valid usecases that need to return them. Think for example splitting an &move out [T]/&move in [T], or an owned iterator over &move out [T].

Highly skeptical about this "need" thing. It doesn't look like it's actually needed. Besides, isn't that just Vec.drain and the like? (is Box<[T]> as IntoIterator a thing yet?)

Those (currently) assume you have a heap allocated value, which is not always the case, and you can't split them. I guess you could implement them in the future if Box<[T], &mut [MaybeUninit<T>]> becomes possible, but that won't help make Box less special.

As we've already mentioned, move references on their own won't help make Box less special either.

Latest version:

You'd want &mut DropFlags<T> there. They need to be able to propagate back to the caller/original thing. This is also slower than the usual stuff with drop flags on the stack due to the extra indirection and is also non-optimizable as you can't simply eliminate drop flags you don't care about.

No, I can. The drop flag of each binding is stored on the stack; when creating a move reference, I just modify the flags if something was moved out. Moreover, in the PIT extension (see future possibilities), these move/drop flags are stored inside the move reference and are accessible from both the caller's and the callee's stack frames (they sit at the junction of the two). In this PIT extension I don't touch anything that isn't involved in an operation, so I don't have to store flags that are unused.

Drop flags and initialization state are equivalent. Drop flags are the (potentially runtime, potentially compile time) tracker for whether a place needs to be dropped, i.e. whether it is in an initialized state.

If the drop flag is "on," the place it is tracking is initialized and needs to be dropped. If the drop flag is "off," the place it is tracking is not initialized (either pre-initialization or post-drop, it doesn't matter) and needs to be not dropped or otherwise referenced (safely).

Drop flags typically are discussed w.r.t. being emitted at runtime, but the exact same system is used for tracking pre-init and post-drop state of places, even when they're known statically.
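As a small illustration in current Rust (the `Noisy` type and the `run` function are made up for this sketch), a conditional move is the classic case where the compiler cannot know statically whether a local is still initialized at scope exit, so it needs a runtime drop flag:

```rust
use std::cell::RefCell;

/// Hypothetical helper type that records its name when dropped.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Returns the order of events for one run.
pub fn run(move_out: bool) -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let x = Noisy { name: "x", log: &log };
        if move_out {
            // `x` is moved from here: its flag is now "off".
            drop(x);
            log.borrow_mut().push("after explicit drop");
        }
        log.borrow_mut().push("end of scope");
        // At scope exit `x` is dropped only if its flag is still "on".
    }
    log.into_inner()
}
```

With `run(true)` the value is dropped at the explicit `drop` and not again at scope exit; with `run(false)` it is dropped once, at scope exit. If the move were unconditional, the same bookkeeping would be resolved entirely at compile time.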

This doesn't even need to matter! &move out is returnable. Proof:

fn proof<'a, T>(r: &'a move out T) -> &'a move out T {
    r
}

It is more consistent to have move references follow the normal borrowck rules than to introduce new rules to forbid something like this.

So... you mean the normal borrowck rules for locals? There's no need to add a concept for lifetime of the current stack frame, because we already have it based on the way lifetimes work. Just take a reference to a stack local, and you have a reference which is most loosely constrained (before you add additional constraints) to begin no earlier than when the stack value was initialized, and end no later than when the stack value is dropped.

If you're assuming general typestate changing references (see later in this post): you may be right! But for the basic case of &move out, the standard lifetime rules handle it perfectly fine.

No they don't. As written, &move inout does not even expose PIT in any way. At the point you obtain &move inout, the pointee is fully constituted. At the point you give away &move inout (incl. by end of scope), the pointee is fully constituted. At the point you obtain &move out, the pointee is fully constituted. At the point you exit scope with &move out, the pointee is fully dropped. There is no dynamism, no runtime sharing of partially dropped places, and everything on function borders is known to be either fully initialized or fully dropped.

However, @tema2: I maintain that &move inout T is semantically equivalent to &mut T w.r.t. ownership and responsibilities as observed by the outside, and that you haven't done anything to solve the unwinding problem by separating them. I still think that in and inout both, as well as DerefMove, need to be deferred to a later RFC step, but if you want to keep &move inout as part of the RFC, please show in a concrete way how it behaves differently than &mut, and show how I can move out of a &move inout and then panic safely.

I actually do see where @Soni got the impression of the need for &mut DropFlags, though: a sound &move inout in the face of panics does require communicating back to the caller which fields have been moved from in the face of a panic.

Still, @Soni: please, rather than asserting that something is true, show why it must be true. You've obviously convinced yourself, so convince others by showing them why what you're saying is true, not just asserting that it is. If someone currently disagrees with you, telling them (multiple times) what your position is isn't going to change anyone's mind. Showing evidence can.


Also, @Soni, IIUC your PIT proposal doesn't handle the case that &move inout is trying to address; that is, typestate does nothing to allow fn(&mut T(..)->T(..)) (using your syntax) to move from the reference temporarily, as a panic would then leave the place in a more uninitialized state. (And if you want the "magic word" to communicate what you're trying to do with your PIT proposal, I believe it is in fact typestate.)

But this is still not the thread to discuss PIT, except perhaps in how &move needs to behave to remain forward compatible with an initialization typestate future. That thread is ?Uninit types [exist today]. Also let's talk about DerefMove.

Couldn't the drop responsibility be temporarily moved with the &move inout? This way the caller only has to track whether the panic happened during the call where it passed the &move inout, while the callee will drop any &move inout on panic, and only on panic. This should avoid the need for passing drop flags along with the move reference, however it doesn't prevent the caller from catching panics.

@tema2 regarding the RFC:

We introduce a new pattern kind ref move NAME: this produces NAME of type &move T!. The reason for the ! obligation is that we may not want to leave a binding (partially) deinitialized after execution of a pattern-matching construct.

The reason for this design is that we may not want to allow pattern matching to partially deinitialize a binding, as it would require far more complex analysis and would give pattern matching powers that do not align well with its original purpose (expressing "high level" logic).

However box patterns allow this, although on nightly only.

I also propose design of DerefMove:

trait DerefMove {
 type Output;

 fn deref(&move self!) -> &move Self::Output!;
}

Could you explain why:

  • DerefMove doesn't extend DerefMut
  • The method name is deref (clashes with Deref::deref)
  • It only works for &move T!
self.ptr as &move T! //just cast the pointer to a reference

This seems pretty weird. You can't currently cast between a *mut T and a &mut T, you need to do something like &mut *ptr. Also, is it supposed to be a safe cast? I see no unsafe there.
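For reference, this is roughly what the conversion looks like today. The `bump` function is a hypothetical example, and its safety contract is an assumption stated in the comments:

```rust
/// Increments the pointee and returns the new value.
///
/// # Safety
/// (Assumed contract) `ptr` must be non-null, aligned, point to a live
/// `i32`, and not be aliased for the duration of the call.
pub unsafe fn bump(ptr: *mut i32) -> i32 {
    // `ptr as &mut i32` is rejected by the compiler: `as` cannot turn a
    // raw pointer into a reference. A reborrow through a deref is needed,
    // and the deref of a raw pointer requires `unsafe`.
    let r: &mut i32 = unsafe { &mut *ptr };
    *r += 1;
    *r
}
```

The opposite direction (`&mut T` to `*mut T`) is a safe coercion, which is why a caller can write `unsafe { bump(&mut x) }`.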

Could you explain how it allows Box to not be special cased in the compiler? In particular it isn't clear:

  • How would this method be called? e.g. if I move out of two fields and then back in, is it one or two calls to deref?
    • If one, that needs to be documented where and when it is one.
    • If two, I can't see how the second one would be valid since you didn't initialize back the previous field.
  • How would this allow moving out of a Box without moving back in?

The issue with panics is that they may interrupt modification of referred binding thus resulting in inconsistent state. But this is also true for &mut references, so it may cause only logical bugs.

This is different than exclusive references in that move references allow temporarily moving out of fields. You covered how you would avoid double drops, however there's nothing on what should happen if the caller catches the panic and considers the data to still be initialized.

In any scope of the program, move references created as described above must fulfil their obligations, if any. This means that any data structure holding such a reference is required to use the move reference.

How does this relate to Drop? Is initialization in Drop allowed? Is it even possible since it only provides an &mut T? What about moving them into closures?

This, for instance, means that &move T!, if something was moved from it, must be initialized back in the same scope in all possible branches. The analysis must also take diverging expressions into account: move references have to be initialized before return, loop {} and loop {..} resolving to uninhabited types. break is included in the list only if a move reference was created inside the loop that a particular break breaks.

Can't we just ignore unreachable assignments like we do for local bindings? Is there a reason it has to be initialized before loop {}? Since it never returns, the uninitialized state will never be observed and no drop will ever happen.

Due to the fact that moving a value of !Unpin type most likely will corrupt the data, we may not want it to be moved into and from a binding via such a reference.

That's not true. It's perfectly fine to move an !Unpin type if you have an &mut reference, so it must be safe if you have an &move T!, because it can be coerced into an &mut T.
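A minimal example of that point in stable Rust (the `NotUnpin` type and `swap_ids` are illustrative names only): an `!Unpin` value that has not been pinned can be moved freely through `&mut`, entirely in safe code.

```rust
use std::marker::PhantomPinned;
use std::mem;

/// A type that opts out of `Unpin` via `PhantomPinned`.
pub struct NotUnpin {
    pub id: u32,
    _pin: PhantomPinned,
}

pub fn swap_ids() -> (u32, u32) {
    let mut a = NotUnpin { id: 1, _pin: PhantomPinned };
    let mut b = NotUnpin { id: 2, _pin: PhantomPinned };
    // Both values are moved through `&mut`; no `unsafe`, no `Pin` involved.
    mem::swap(&mut a, &mut b);
    (a.id, b.id)
}
```

`!Unpin` only restricts what you may do once the value sits behind a `Pin`; it says nothing about moves made before pinning.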

impl<P> DerefMove for Pin<P>
 where P: DerefMove,
{
 type Output = P::Output;

 fn deref(&move self!) -> &move Self::Output! {
   self.pointer
 }
}

This is unsound, &move Self::Output! can be coerced into an &mut Self::Output, so this would be the equivalent of making Pin::get_unchecked_mut a safe function.

Future possibilities

&move T* kind

This kind of move reference obligates the holder to move a value into the referenced binding, and doesn't require the binding to be initialized.

Currently, we have no traits to describe mandatory operations on this kind (it's, in fact, a refined type).

Introducing this would also require Leak and !ImplicitDrop auto traits to describe things correctly.

The reason for not introducing this is that we could not fix the soundness issues by only turning off GCE.

It's not clear from this section why &move T* would have problems that &move T! doesn't have.

  • &move T! are neither Send nor Sync: this way we can't run into cross-thread initialization problems or issues with scoped threads.
  • &move T! are subjects of analysis.
  • if we move out of &move T! we get a byte-level copy of the value => if it's Unpin (we can move out) then it's the same value and we do stuff with it in local scope, then we move a new value in place of the old one. In case of panic the caller could observe the old value in the binding, not uninitialized memory (UB), as in the case of &move T*. (With the known caveats around Drop.)
    This is still logically confusing and WILL lead to behavioral bugs if a program is written incorrectly. Today we have exactly the same situation with &mut T.

Isn't that still UB since the ownership has already been moved out? You can easily cause use-after-free, double-frees, and violation of other safety invariants.

fn foo(bar: &move inout Foo) {
  let x = bar.x;
  panic!();
  bar.x = x;
}

Yes, PITs do define it as going more deinitialized. This is why. But there's no reason the "going more deinitialized" part should be only for PITs - a move references proposal that doesn't bother with PITs should also have it, or be unsound.

Handling this doesn't require passing drop flags around, as you can use the separate unwind/return branches to select the appropriate initialization state/drop flags... That is, unless you add the ability to return move references and catch unwinds involving them and whatnot, ofc.
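One way to picture the "going more deinitialized" outcome in today's Rust is to model the place as an `Option<T>`. That modelling choice and the function names below are assumptions of this sketch, not something either proposal specifies:

```rust
use std::panic::{self, AssertUnwindSafe};

/// Callee: moves the value out, then unwinds before moving one back in.
pub fn take_then_panic(slot: &mut Option<String>) {
    // Moving out records "uninitialized" in the slot itself.
    let _owned = slot.take();
    // `_owned` is dropped exactly once, by the callee, during unwinding.
    panic!("interrupted before a value was moved back in");
}

/// Caller: catches the unwind and inspects the place afterwards.
pub fn observe() -> Option<String> {
    let mut slot = Some(String::from("data"));
    let result = panic::catch_unwind(AssertUnwindSafe(|| take_then_panic(&mut slot)));
    assert!(result.is_err());
    // On the unwind path the caller sees the place as empty,
    // so there is nothing left for it to drop.
    slot
}
```

Here the `Option` discriminant plays the role of the initialization state, which is exactly the runtime cost that a typestate or drop-flag based design tries to resolve statically on the return and unwind edges.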

Fair enough; scoped threads require Send, which &mut _ is, but I still don't see a reason why Send is directly problematic for &move inout _ beyond just trying to ignore the existence of threads entirely. catch_unwind has no requirements, though.

This is still trivially UB, unless the value is Copy. Example:

fn bad<T>(r: &move inout T) {
    let v: T = *r;
    drop(v);
    panic!()
}

let b = Box::new(Data { .. });
catch_unwind(|| bad( &move inout b ));
// b was dropped but here I still think it's valid
drop(b); // double free

I believe you are correct about &move inout. Most people when they say move references, though, are just talking about &move out (&move in is "placement", and &move inout is take_mut), which is a perfectly normal type that has none of these restrictions.

This is actually clever, and the first time you've explicitly stated this.

To summarize:

  • A typestate changing reference is some &mut T->U that changes the type of the value behind a reference within some kind (typically, the set of types with varying partial initialization, but generally any owning-ptr-transmute-compatible types).
  • You can only create typestate changing references to some locally managed binding (i.e. the ones you can move from).
  • When the lifetime or scope, whichever is shorter, of a typestate changing reference terminates normally, the owner of the reference is obligated to have changed the value to conform to the new type.
  • When the lifetime of a typestate changing reference terminates normally, the owner of the pointee value now treats it as type U instead of type T.
  • On an unwind edge, the owner of a typestate changing reference is in charge of dropping the entire value, as it's the only place that knows the typestate of the pointee value.
  • On an unwind edge, the owner of the pointee place (not value) of a typestate changing reference treats the value as being in the bottom state, i.e. fully uninitialized.
  • And, importantly, a typestate changing reference is not a type itself, despite appearing in type position as if it were a type. As such, it cannot itself be stored in a binding, a generic type argument, or a field in an aggregate type, and cannot be composed upon; it is just usable as part of a function signature.
    [-- begin note: This is the biggest departure from current Rust semantics, what I dislike about C++'s design the most ((rvalue) references being not types (well, except inasmuch as std::reference_wrapper still exists)), and I think the biggest flaw in this proposal. Not being manipulable as a type like everything else means that it sticks out as an alien, not first-class part of the language. "First class functions" aren't when a language has functions that can do everything that functions can do, it's when functions are a type unto themselves that can be manipulated in bindings. I personally don't see a way to accomplish the goals of your proposal without a new non-type extension to function signatures, however.
    Plus, ideally, a typestate proposal would make fn(&mut T) and fn(&mut T->T) identical. The difference in drop responsibilities makes this impossible, though. end note --]

With this whole picture, I can understand how typestate could potentially integrate into Rust's type system. I still disagree that this is a desirable change, as we lose out on the uniformity of everything just being a normal, composable type, but I can understand the benefit.

I also still strongly disagree that the specific refined case of &move out T needs to undergo the extra "not really a type" restrictions, and both can and should be a normal type with normal type usability. I think it's useful enough to deserve its own name, but it can equivalently be spelled Box<T, &mut MaybeUninit<T>> with a proper storage-aware Box.

I see. I will rework this.

What happens in detail:

  • the callee moves out, gets a value, and marks the referred binding as uninitialized;
  • it processes the data (where a panic can occur);
  • it moves a new value into the referred binding and marks it as initialized.

A panic handler can see the referred binding in only one state: it contains a value, but is marked as uninitialized => no Drop would be called for it.

As stated here not calling a destructor is not unsafe.
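For context, skipping a destructor is indeed expressible in safe code today. A small sketch (the function name is made up) using `Rc`'s reference count as an observable side effect of a destructor that never runs:

```rust
use std::mem;
use std::rc::Rc;

pub fn leak_is_safe() -> usize {
    let rc = Rc::new(5);
    let clone = Rc::clone(&rc);
    // `mem::forget` is a safe function: the clone's destructor never runs,
    // so the strong count is never decremented.
    mem::forget(clone);
    Rc::strong_count(&rc)
}
```

This only shows that a missed drop is a leak rather than undefined behavior. It does not address the opposite hazard raised earlier in the thread, where a value that was already dropped is still treated as initialized.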

I don't think that in Rust we can go further in terms of correctness.

I know. It was supposed to be a safe cast, but I'm not sure: an arbitrary pointer may not be blindly cast to a reference that is safe by design, so I'll document it as unsafe.