Move Constructors in Rust: Is it possible?

https://mcyoung.xyz/2021/04/26/move-ctors/

I wrote up some work I did to figure out how to port the notion of move constructors from C++ to Rust. Hopefully it's of interest to this crowd. It includes something that resembles the fabled DerefMove, but which is mostly a hack to get certain properties that preserve Pin semantics to work.

This is not about move references and I hope people don't start bikeshedding those in this thread.

To alleviate some confusion: this is about making C++ move constructors callable from Rust, not about stapling move constructors to Rust as a language feature. The latter is... not the right solution to this problem, frankly.

I've only skimmed your blog post, but I don't see where you address the actual problem with move constructors. Rust allows unsafe code to move objects by memcpying their bytes to a new location, without necessarily running any code at all. That is, all types are guaranteed to be movable without calling a move constructor, a guarantee which a lot of unsafe code relies on. Thus move constructors cannot be guaranteed to run, even if they were run every time the compiler moves a value. That makes them only as useful as destructors (which also cannot be guaranteed to run), and not useful for upholding memory safety.
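To make the memcpy point concrete, here is a minimal sketch. The `Tracked` type and its methods are made up for illustration; it just remembers its own address, the way a C++ object with a self-pointer would, and then gets moved by an ordinary Rust move.

```rust
/// Hypothetical type that remembers the address it was set up at,
/// the way a C++ object holding a pointer to itself would.
struct Tracked {
    home: usize,
}

impl Tracked {
    /// Record the current address of `self`.
    fn record_home(&mut self) {
        self.home = self as *const Tracked as usize;
    }

    /// True if `self` still lives at the recorded address.
    fn is_home(&self) -> bool {
        self.home == self as *const Tracked as usize
    }
}

/// Returns (self-pointer valid before the move, valid after the move).
fn self_pointer_survives_move() -> (bool, bool) {
    let mut t = Tracked { home: 0 };
    t.record_home();
    let before = t.is_home();
    // Plain Rust move: the bytes are copied to the heap and no user code
    // gets a chance to fix up `home`.
    let boxed = Box::new(t);
    (before, boxed.is_home())
}
```

Calling `self_pointer_survives_move()` should show the recorded address going stale after the move, which is exactly the situation a move constructor would exist to repair.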

Your blog post seems to focus on design issues of how move constructors would be implemented which don't relate to this problem, but this problem is the reason move constructors haven't been pursued seriously as an extension to Rust.

It first goes into detail on how to safely call constructors directly constructing their value at their final location. Next it describes how to safely call copy constructors and finally it describes safe move constructors. All of this uses Pin to ensure that no object is moved without explicitly invoking the copy constructor.

I think I understand now that this is about literally porting move constructors from C++ to Rust, as in, only concerning C++ FFI types which have move constructors in C++. Yes, this concept makes sense: if you provide an API which only gives you a binding to the address of the value (even if the value is on the stack) and only allows you to move the value using an API which calls the C++ move constructor, that should be sound. And the use of Pin seems like a correct use of Pin, though it might need to be another type with the same API as Pin, because it might not conform exactly to the Pin guarantees, which, as I recall, say that the value will never be moved.

This lets you allocate these types on the stack, which is a nice improvement over heap allocating them, but there are still unavoidable limitations because of the problem I discussed. For example, you can't push them into a Vec, because the Vec will move them around (i.e., to a new backing buffer when it reallocates) without calling their move constructors. But you could create your own vector type which does call move constructors properly (or, more likely, move them into a C++ std::vector).

I thought about whether I was in violation of Pin's guarantees for quite some time. I believe that I'm not, for reasons I attempt to formalize in the article, which boil down to "C++ moves are not blind memcpys like Rust moves are".

Correct. The expectation is that you are forced to re-implement a Vec that knows how to call move constructors upon reallocation. I actually started doing this but got distracted. You get more sad if you want to use HashMap, but I think that at that point you give up and just box up the type. Ultimately, this is about avoiding a heap allocation for most "reasonable" calls across the language boundary.
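For the curious, a rough sketch of what such a vector could look like. This is not the article's library: the `MoveCtor` trait, `MoveVec`, `emplace_with`, and the `SelfRef` test type are all hypothetical names, and the sketch ignores zero-sized types and most of the API a real container would need.

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Hypothetical stand-in for a C++-style move constructor.
trait MoveCtor: Sized {
    /// Safety: `src` is initialized and `dst` is uninitialized storage.
    /// Afterwards both are initialized; the caller still destroys `src`.
    unsafe fn move_ctor(src: *mut Self, dst: *mut Self);
}

struct MoveVec<T: MoveCtor> {
    buf: Box<[MaybeUninit<T>]>,
    len: usize,
}

impl<T: MoveCtor> MoveVec<T> {
    fn new() -> Self {
        MoveVec { buf: Vec::new().into_boxed_slice(), len: 0 }
    }

    /// Reallocate, relocating each element through `move_ctor`
    /// instead of a blind byte copy.
    fn grow(&mut self) {
        let new_cap = (self.buf.len() * 2).max(1);
        let mut new_buf: Box<[MaybeUninit<T>]> = (0..new_cap)
            .map(|_| MaybeUninit::uninit())
            .collect::<Vec<_>>()
            .into_boxed_slice();
        // Zero the length first so a panic leaks elements instead of
        // double-dropping them.
        let len = std::mem::replace(&mut self.len, 0);
        for i in 0..len {
            let src = self.buf[i].as_mut_ptr();
            let dst = new_buf[i].as_mut_ptr();
            unsafe {
                T::move_ctor(src, dst);
                ptr::drop_in_place(src); // destroy the moved-from source
            }
        }
        self.buf = new_buf; // the old buffer only holds destroyed slots
        self.len = len;
    }

    /// Construct a new element in place.
    /// Safety: `init` must fully initialize the slot it is handed.
    unsafe fn emplace_with(&mut self, init: impl FnOnce(*mut T)) {
        if self.len == self.buf.len() {
            self.grow();
        }
        init(self.buf[self.len].as_mut_ptr());
        self.len += 1;
    }

    fn get(&self, i: usize) -> Option<&T> {
        if i < self.len { Some(unsafe { &*self.buf[i].as_ptr() }) } else { None }
    }
}

impl<T: MoveCtor> Drop for MoveVec<T> {
    fn drop(&mut self) {
        for i in 0..self.len {
            unsafe { ptr::drop_in_place(self.buf[i].as_mut_ptr()) };
        }
    }
}

/// Hypothetical address-sensitive type: `me` must always point at `self`.
struct SelfRef {
    id: u32,
    me: *const SelfRef,
}

impl MoveCtor for SelfRef {
    unsafe fn move_ctor(src: *mut Self, dst: *mut Self) {
        unsafe { dst.write(SelfRef { id: (*src).id, me: dst }) };
    }
}

/// Push `n` elements (forcing reallocations) and check every self-pointer.
fn all_self_pointers_valid(n: u32) -> bool {
    let mut v = MoveVec::<SelfRef>::new();
    for id in 0..n {
        unsafe {
            v.emplace_with(|slot: *mut SelfRef| slot.write(SelfRef { id, me: slot }));
        }
    }
    (0..n as usize).all(|i| {
        let e = v.get(i).unwrap();
        e.id as usize == i && ptr::eq(e.me, e)
    })
}
```

The interesting part is `grow`: where `Vec` would copy the bytes of the old buffer, this version move-constructs each element at its new address and then runs the destructor on the old one. Elements are built in place via `emplace_with` because passing them by value would itself be a memcpy move.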

(FWIW, it's not any worse than !Unpin types today. You can't directly put futures into a Vec, although async-generated futures probably have a viable MoveCtor in this scheme, even if users cannot implement it.)

I think this is the best we can do without language support, and I frankly don't think it's worth pursuing for the reasons you address. I suspect that the motivation isn't phrased super well in the blogpost, but this is entirely about C++/Rust interop. =)

For move construction (T(T&&)), I believe this is typically true. (Obviously, it is possible to subvert this.) However, this is distinctly not true for move assignment (T& T::operator=(T&&)), as it's somewhat common practice to implement it as a swap.

Although the first rule about using the rvalue reference as scratch space is that it will be dropped, the rule we care about is the second: you don't know when the value will be dropped. General advice is that if it's okay for state to be dropped "eventually," you can leave owned state in it, but if you want more deterministic destruction, drop eagerly, and leave the rvalue reference in a null object state. (Obviously, it is possible to strengthen the guarantees of move assignment for a specific type to specify the state of a moved-from value.)
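As a rough illustration of the swap-style move assignment, modeled in Rust rather than C++ (the `Buf` type and function names are made up for the example):

```rust
/// Hypothetical stand-in for a C++ class that owns a buffer.
struct Buf {
    bytes: Vec<u8>,
}

/// Models a swap-based `T& operator=(T&&)`: nothing is freed here.
fn move_assign_by_swap(dst: &mut Buf, src: &mut Buf) {
    std::mem::swap(dst, src);
}

/// Returns (destination contents, source contents) after the assignment.
fn swap_assign_demo() -> (Vec<u8>, Vec<u8>) {
    let mut dst = Buf { bytes: vec![1, 2, 3] };
    let mut src = Buf { bytes: vec![9] };
    move_assign_by_swap(&mut dst, &mut src);
    // `src` now owns the destination's old buffer, which stays alive until
    // `src` itself is destroyed.
    (dst.bytes, src.bytes)
}
```

The moved-from value is not empty here; it carries the destination's old state, and that state is only released whenever the source happens to be destroyed.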

Because the type itself defines what it means for that type to be pinned (e.g. Unpin opts out of all pin-added guarantees), I think a model wherein a PinMove trait opts out of the "same address until destruction" guarantee in a very controlled way (i.e. you can use the "move constructor" to copy-move it to a new location) could work. I think Pin would have to agree to this, though, under strict language lawyering. (Especially since Pin impacts the memory model's rules.)

However, we can work around this (for C++ types) rather trivially: just do the C++ thing, and leave a valid value there. Then you follow Pin's rules exactly: the value is still there, and you drop it before destroying the storage. You just did a weird copy-and-mutate earlier.

This would work transparently for C++ types, as it would just use the C++ semantics for move and then destruct. It wouldn't work for Rust types, though, because (most) don't have an equivalent "valid but unspecified" null object state that would skip doing work in the destructor.

I think there is a way to define PinMove such that it follows the law, though it may be kind of tricky to do so. The TL;DR is that you say that it copies the value to the new location, potentially mutating the existing value for efficiency, knowing that it then "drops" the existing value, satisfying Pin's requirements. And since the type itself defines what that operation is, I think that we can skip actually calling Drop::drop and "drop" the old value inline.

FWIW I have no good answers for move or copy assignment yet. Those tend to be slightly less interesting, since you can (albeit inefficiently) model it as new (this) auto(that). In practice I know few C++ programmers who aren't language lawyers that actively know the difference.

I think I mumbled something about this in a draft of the post but maybe it got snipped out. In general the correct mental model for C++ code is that all types are immobile, all dtors run unconditionally, and there's these weird mutating/non-mutating (move and copy, resp.) clone() and clone_from() that give the illusion of values moving around. What you describe falls out of this model.

This is precisely how my library defines MoveCtor and its userland DerefMove. To move a C++ object, you do the weird mutating copy, then immediately call ~CxxType(), and then you need to somehow destroy the storage as well. Unique ownership of the pinned value is required for this to work.
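A minimal safe-Rust model of that sequence (mutating copy, then destructor at the source), in case it helps. `CxxLike` and `move_construct` are hypothetical and are not the library's actual API; the `frees` counter exists only so the effect is observable.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for a C++ class that owns a resource.
struct CxxLike {
    /// `None` models the moved-from "valid but empty" state.
    data: Option<Vec<u8>>,
    /// Counts how many times the resource is actually released.
    frees: Rc<Cell<u32>>,
}

impl CxxLike {
    /// Models `T(T&&)`: a mutating copy that steals the resource and
    /// leaves `src` valid, so its destructor can still run.
    fn move_construct(src: &mut CxxLike) -> CxxLike {
        CxxLike { data: src.data.take(), frees: Rc::clone(&src.frees) }
    }
}

impl Drop for CxxLike {
    fn drop(&mut self) {
        // The destructor always runs, but a moved-from value has nothing
        // left to release.
        if self.data.take().is_some() {
            self.frees.set(self.frees.get() + 1);
        }
    }
}

/// Returns (bytes owned by the destination,
///          releases after the source destructor,
///          releases after the destination destructor).
fn relocate_demo() -> (usize, u32, u32) {
    let frees = Rc::new(Cell::new(0));
    let mut src = CxxLike { data: Some(vec![1, 2, 3]), frees: Rc::clone(&frees) };
    let dst = CxxLike::move_construct(&mut src); // the "weird mutating copy"
    drop(src); // destructor on the moved-from source
    let after_src_dtor = frees.get();
    let len = dst.data.as_ref().map_or(0, |d| d.len());
    drop(dst);
    (len, after_src_dtor, frees.get())
}
```

Both destructors run unconditionally, but the resource is released exactly once, by whichever object owns it at the end.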

The upshot is that this allows you to move-construct Unpin Rust types if you really want to. The difference here is that you don't call a dtor at the source location. None of the Pin guarantees are relevant since this is an Unpin type. Whether pinnable Rust types can have move constructors is a little bit more of an open question, but I think we can probably make it work.

My main worry is that the Pin object is left in a slightly messed-up state that no one but the "move glue" witnesses. I have no idea if the memory model will notice, because the rules around what it's ok to do with a pinned value aren't... crystal clear in this corner case.

Well, turns out my solution for assignment was unsound, so there's still more work to be done. =)