Idea: Limited custom move semantics through explicitly specified relocations


#1

Please take my ideas lightly, as these ‘ideas’ aren’t particularly thought through. I simply want to seek feedback. Also, this post contains pseudocode, not valid Rust code.

This post collects some thoughts on the difficulties of self-referential structs and a possible solution. In particular, it is based on the notion that we could let types specify their own move semantics, but in a very limited way (just enough to support self-referential structs).

To give you some context, please see this conversation between redditors discussing a similar idea:

Tl;dr: The idea is to support a custom trait which when implemented applies custom move semantics where you can ‘reinitialize’ fields which contain self-ref-lifetimes.

But this idea is flawed because you couldn’t possibly know where the references were pointing to. Also re-initializing heap space would be problematic in the same way.

This is where my idea comes in:

We may introduce two new entities: a trait called Relocatable and an internal struct called Relocation.

The basic idea is to make references relocatable by constructing them through a closure at creation time. The following pseudocode shows the core idea behind this approach.

// This trait must be implemented on structs which contain self-ref-lifetimes
trait RefMove {
    // This method is called whenever a move has occurred.
    // Object moves -> object is memcpy-ed (default move semantics applied) -> move() is called
    // This method shall relocate each field which contains self-ref-lifetimes
    unsafe fn move(&mut self);
}

// This trait is implemented by entities which can be relocated, based on an 'new_obj' with a generic type O.
trait Relocatable<O> {
    unsafe fn relocate(&mut self, new_obj: &O);
}

// Magically generated struct for self-ref-lifetimes
// ----

// This is a Relocation, which is basically a relocatable reference: the idea is to have a relocator which can generate a new reference based on the base object.
struct Relocation<T, O> {
    cached_ref: *const T,
    relocator: Box<dyn Fn(&O) -> &T>
}

impl<T, O> Relocation<T, O> {
    unsafe fn to_ref<'a>(&self) -> &'a T {
        std::mem::transmute::<*const T, &'a T>(self.cached_ref)
    }
}

impl<T, O> Relocatable<O> for Relocation<T, O> {
    // This performs the actual relocation by calling the relocator with the new base object.
    unsafe fn relocate(&mut self, new_obj: &O) {
        self.cached_ref = (self.relocator)(new_obj) as *const T;
    }
}

// ----

// The Relocatable trait may additionally be implemented by Vec, Option, ... :

impl<O, R: Relocatable<O>> Relocatable<O> for Vec<R> {
    unsafe fn relocate(&mut self, new_obj: &O) {
        for elem in self.iter_mut() {
            elem.relocate(new_obj);
        }
    }
}

// ----

struct Data {
    a: u32,
    b: u32
}

struct Foo {
    data: Data,

    // Internally self-ref-lifetimes are represented by a compiler-generated 'Relocation<X, Y>'.
    // Where X is the reference type and Y is the enclosed type.
    a_or_b: &'data u32,   // Actually: Relocation<u32, Foo>
    heap: Vec<&'data u32> // Actually: Vec<Relocation<u32, Foo>>
}

impl RefMove for Foo {
    unsafe fn move(&mut self) {
        // This is actually invalid rust. But you get the idea.
        self.a_or_b.relocate(self);
        self.heap.relocate(self);
    }
}

impl Foo {
    fn new(c: bool) -> Self {
        let mut ret = Self {
            data: Data { a: 0, b: 1 },
            heap: Vec::new()
        };
        
        // relocate may be a library function which creates a 'Relocation':
        // fn relocate<'a, O, T>(obj: &O, relocator: impl Fn(&O) -> &'a T) -> &'a T
        // ^ This is probably wrong but I simply want to express that the closure
        //   returns a reference with a lifetime which is derived from the &ret reference

        ret.heap.push(relocate(&ret, move |obj| {
            if c {
                &obj.data.a
            } else {
                &obj.data.b
            }
        }));
        
        ret
    }
}
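
For readers who want something that compiles today, here is a minimal sketch of the same relocator idea, under loudly stated assumptions: the names are illustrative, the base object is the `data` field rather than the whole `Foo` (so the borrows stay disjoint), and the hypothetical `after_move` hook has to be called by hand, since the language never calls anything on a move. It is not the proposal itself, only an approximation of it with raw pointers.

```rust
// Hypothetical sketch in today's Rust; all names are illustrative.
struct Data {
    a: u32,
    b: u32,
}

// A cached raw pointer plus a closure that can recompute it from the base object.
struct Relocation<T, O> {
    cached: *const T,
    relocator: Box<dyn Fn(&O) -> &T>,
}

impl<T, O> Relocation<T, O> {
    fn new<F>(relocator: F) -> Self
    where
        F: Fn(&O) -> &T,
        F: 'static,
    {
        // The cached pointer starts out null; the first relocation fills it in.
        Relocation { cached: std::ptr::null(), relocator: Box::new(relocator) }
    }

    // Recompute the cached pointer against the base object's current address.
    fn relocate(&mut self, base: &O) {
        self.cached = (self.relocator)(base) as *const T;
    }
}

struct Foo {
    data: Data,
    // Assumption: the base object is `data`, not the whole `Foo`.
    a_or_b: Relocation<u32, Data>,
}

impl Foo {
    fn new(c: bool) -> Self {
        Foo {
            data: Data { a: 0, b: 1 },
            a_or_b: Relocation::<u32, Data>::new(move |d| if c { &d.a } else { &d.b }),
        }
    }

    // Stand-in for the proposed RefMove hook; nothing calls it automatically.
    fn after_move(&mut self) {
        self.a_or_b.relocate(&self.data);
    }

    // Safety: `after_move` must have run after the most recent move of `self`.
    unsafe fn get(&self) -> u32 {
        unsafe { *self.a_or_b.cached }
    }
}

fn demo(c: bool) -> (u32, bool) {
    let foo = Foo::new(c);         // constructed in one place
    let mut boxed = Box::new(foo); // moved to the heap
    boxed.after_move();            // explicit relocation at the new address
    let expected: &u32 = if c { &boxed.data.a } else { &boxed.data.b };
    let fresh = std::ptr::eq(boxed.a_or_b.cached, expected);
    (unsafe { boxed.get() }, fresh)
}
```

Calling `demo(true)` reads `data.a` through the relocated pointer, and `demo(false)` reads `data.b`. Forgetting the `after_move` call leaves the pointer null or stale, which is exactly the step the proposal wants the compiler to perform.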

#2

“Let’s add custom move semantics to Rust” seems to be a popular idea!

I personally think that “move is memcopy” is a HUGELY important property of Rust. For example, you can just grow a vector with realloc.


#3

I agree but IMHO having to explicitly opt-in to enable some types to have their own move-semantics wouldn’t hurt. I mean it’s not something one has to do every day. The possible problem I see here with the idea of having custom move semantics with relocators is that relocators can panic whereas before, simple move operations wouldn’t panic.


#4

It’s not just panicking. It’s the fact that now, in Rust, let x = y; has no possible side effects beyond moving the value. It can’t hide a slow computation. It can’t touch any global state. That makes it much simpler to reason about than e.g. C++, where a move (or copy) can happen automatically and invoke arbitrary (arbitrarily buggy) code.
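
A small sketch of that property, assuming a hypothetical `Noisy` type whose only user code is a destructor that bumps a counter. Moving the value, pushing it into a vector, and growing that vector many times never runs any code belonging to the type; the counter only changes once the values are actually dropped.

```rust
use std::cell::Cell;

// Hypothetical type that records every time its destructor runs.
struct Noisy<'a> {
    drops: &'a Cell<u32>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

// Returns the drop count observed after all the moves, and after the final drop.
fn moves_run_no_user_code() -> (u32, u32) {
    let drops = Cell::new(0);
    let x = Noisy { drops: &drops };
    let y = x; // plain bitwise move; `x` can no longer be used
    let mut v = Vec::with_capacity(1);
    v.push(y); // moved into the vector
    for _ in 0..100 {
        // Growing the vector relocates the existing elements as raw bytes.
        v.push(Noisy { drops: &drops });
    }
    let during = drops.get(); // no destructor has run for any of the moves
    drop(v); // now all 101 elements are dropped
    (during, drops.get())
}
```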

Furthermore, you can just take a type, pass it to C, let it copy it around as it likes and get it back some time later. The type will still work. The optimiser can assume a lot of things about it.

I think this is just too nice a thing to give up without a fight :innocent:

Anyway, withoutboats promised to have a solution somewhere towards the end of the series. Let’s wait for that and see how good it is.


#5

I think that the problem is even larger than exception safety. Currently, Vec just literally calls realloc when doubling capacity, regardless of the element type.

If you add custom move semantics, you’ll have to replace all of this with a for-loop that moves elements one by one, which would be pretty sad in terms of both complexity and performance.


#6

Excuse me if I am wrong, but I don’t see a problem here. With my idea of relocators, you wouldn’t have to touch the existing implementation; you would just make the type relocatable, because a self-referential field only needs to be relocated based on where the base object has been moved to. In my view, custom move semantics should be very limited, just enough to serve self-referential structs.


#7

Yeah, I am just trying to throw in some more ideas of what might be done :slight_smile:


#8

Maybe I’m missing something, but it sounds like you’re saying you wouldn’t need to change Vec’s push() implementation (i.e. it would still implement resizing as just a single realloc() call regardless of its element type) and at the same time saying that any self-referential fields in those elements would get “relocated” during a push() (i.e. arbitrary user-defined code associated with that type would get invoked) which I’m pretty sure is a contradiction.

Maybe you’re thinking of using specialization to make it so that Vec’s implementation “remains the same” for any trivially moveable T? That’s arguably correct if by “implementation” you’re talking about the code in the final executable rather than the source code, and I believe that would solve the performance issue, but it’s still a significant increase in complexity for the source code implementation and the API contract of all container types (and probably a lot of other generic types).
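
To make the container-side cost concrete, here is a rough sketch under stated assumptions: `Relocate`, `NEEDS_FIXUP`, `fixup`, `SelfPtr` and `grow` are all hypothetical names, and the associated constant stands in for the choice that specialization would make. Trivially movable types keep the cheap path, but the container's source still has to know about the hook and call user code for everything else.

```rust
// Hypothetical opt-in hook. `NEEDS_FIXUP` stands in for the decision that
// specialization would make in a real container.
trait Relocate {
    const NEEDS_FIXUP: bool = false;
    fn fixup(&mut self) {}
}

// Trivially movable: keeps the defaults, so no user code runs on growth.
impl Relocate for u32 {}

// Raw-pointer stand-in for a self-referential struct.
struct SelfPtr {
    value: u32,
    ptr: *const u32,
}

impl Relocate for SelfPtr {
    const NEEDS_FIXUP: bool = true;
    fn fixup(&mut self) {
        self.ptr = &self.value as *const u32;
    }
}

// Moves every element into a larger buffer, then repairs the ones that asked for it.
fn grow<T: Relocate>(old: Vec<T>, extra: usize) -> Vec<T> {
    let mut new = Vec::with_capacity(old.len() + extra);
    new.extend(old); // ordinary moves, no hooks involved
    if T::NEEDS_FIXUP {
        for elem in new.iter_mut() {
            elem.fixup(); // arbitrary user code now runs inside the container
        }
    }
    new
}

// True if every element's pointer refers to its own `value` field.
fn all_fixed(v: &[SelfPtr]) -> bool {
    v.iter().all(|e| std::ptr::eq(e.ptr, &e.value))
}
```

Every generic container that moves its elements would need an equivalent of the `if T::NEEDS_FIXUP` branch, which is the API-contract cost described above.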


#9

Actually, bigger picture response. I don’t think we’re in a position to even debate the pros and cons of any specific proposal for move constructors* yet. As far as I know, every use case for move constructors is motivated either by optimization opportunities, or by corner cases in existing language rules for which move constructors would constitute an escape hatch. In other words, nobody wants move constructors “for their own sake”, so each of those use cases is at least potentially an XY problem which may have viable alternative solutions.

In particular, the fact that self-referential types are by far the most popular motivation for proposing move constructors (maybe the only? did anybody suggest move constructors in a custom DSTs proposal yet?) indicates that we really need to finish exploring alternative solutions for this specific problem before we can even conclude that a change as fundamental as move constructors is the best solution. Assuming we did become convinced that move constructors are the only way to get async/await or some other “necessary” feature, and that it is worth paying that price, we’d then have to find out what other potential use cases for move constructors there are (I can at least conceive of custom DSTs using them), and only then could we finally attempt to answer “does the Relocation flavor of move constructors have any advantages over [other flavor]?”

At the very least, we need to wait for @withoutboats to finally tell us what his idea is :slight_smile: For all I know it might be as straightforward as giving MIR some new “make me immovable”/“moveable” primitives that only the borrow checker consumes and are only exposed to the surface language via generators, and worry about other kinds of self-referential types later. Or maybe that last sentence was total gibberish.

*In case clarification is needed, I’m using “move constructors” to refer to any Rust language change that would introduce the possibility of arbitrary user-defined code getting executed on an “implicit” move. If the move requires some kind of explicit annotation like x.move(), or is done outside the core language with some user-defined library function like X::move(&x, &y), that’s a totally different kind of proposal.


#10

Another important application for move constructors is wiping secret data. Today we can do it on drop, but we are unable to protect a program from leaving copies behind due to implicit moves. It can probably be done with a marker trait with support baked into the compiler, but I think a more general solution that also covers other use cases would be preferable.
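
A minimal sketch of the wipe-on-drop half of this, assuming a hypothetical `Secret` type with a fixed-size key. It uses `std::ptr::write_volatile` so the zeroing stores are not optimized away, and the comment on `store` marks the gap being described: the move itself leaves the old bytes untouched.

```rust
use std::ptr;

// Hypothetical secret holder; the type and field names are illustrative.
struct Secret {
    key: [u8; 16],
}

impl Secret {
    // Overwrite the key in place. Volatile writes keep the compiler from
    // treating the stores as dead and removing them.
    fn wipe(&mut self) {
        for byte in self.key.iter_mut() {
            unsafe { ptr::write_volatile(byte as *mut u8, 0) };
        }
    }
}

impl Drop for Secret {
    fn drop(&mut self) {
        self.wipe(); // covers only the final location of the value
    }
}

// Moving the secret to the heap is a bitwise copy: the bytes left behind in
// the old location are not wiped, and no user code runs to fix that.
fn store(secret: Secret) -> Box<Secret> {
    Box::new(secret)
}
```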


#11

I think I misunderstood you. Just to confirm the current situation: the issue with a Vec and a self-referential struct is that the Vec can reallocate, and then references to the Vec’s items would be invalidated? If this is what you meant, then I don’t think my “proposal” would work with that. This “proposal” assumes that a self-referential reference points into the object itself, which is eventually going to be moved; thus it is incompatible with heap space (Vec).

Something like this would not be possible with this idea as a solution:

struct A {
    a: Vec<u32>,
    b: &'a u32
}
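
For what it's worth, a sketch of how this case behaves today, using a raw pointer as a stand-in for the `&'a u32` field (an assumption, since that lifetime syntax does not exist). The pointee lives in the vector's heap buffer, so moving the struct leaves the pointer valid; what invalidates it is the vector reallocating, which a hook that runs when `A` moves would never see.

```rust
struct A {
    a: Vec<u32>,
    b: *const u32, // raw-pointer stand-in for `&'a u32`
}

fn make() -> A {
    let a = vec![10, 20, 30];
    let b: *const u32 = &a[1]; // points into the heap buffer, not into `A`
    A { a, b }
}

fn demo() -> (bool, u32) {
    let first = make();          // `A` is moved out of `make`
    let moved = Box::new(first); // and moved again, onto the heap
    // The heap buffer did not move, so `b` still points at element 1.
    let still_points_in = std::ptr::eq(moved.b, &moved.a[1]);
    let value = unsafe { *moved.b };
    // Pushing enough elements onto `moved.a` could reallocate the buffer,
    // after which `b` would dangle; nothing here would notice.
    (still_points_in, value)
}
```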

#12

I think making self-referenced fields inaccessible through safe code, as proposed in “on movable self-referential structs”, is a better solution, and it can work with heap-allocated data (although the compiler will have to be able to understand that this reference points to data inside a heap-allocated structure). It’s also logical: if you have created a reference to owned data, you normally cannot use the variable which owns this data until the reference goes out of scope. The main problem in my proposal is how to handle construction of such structs.


#13

The compiler back end can make arbitrary copies of any data for its own purposes, so you need something else entirely to prevent that from happening.


#14

That reminds me: One thing I failed to fit into my last post is that move constructors are only a partial solution for many of the use cases that motivate them. As @jpernst occasionally mentions, the broader issue with self-referential types is not just about movability but also the need for something like a “generative existential lifetime”. As you just said, the wiping secret data use case seems like it would need at least some language-level guarantees to be truly secure even if move constructors were available. And as the withoutboats blog post that prompted this thread said, we need a way to tell which generators should be immovable generators in addition to a mechanism for immovability itself.

In particular, the not-covered-by-move-constructors parts of these problems appear to have no overlap at all.


#15

Again, one step in the direction of C++ (and the associated feature creep). I would have to agree with @matklad on this one — Rust’s memory model is fundamentally dependent on the property that "move is memcpy". It beautifully fits basically all the non-niche use cases.

I strongly doubt that changing such a fundamental core idea of the language from trivial to moderately complex would be a good idea in any case. Just think about how many places rely on this behavior; I don’t want to imagine the number of bugs (security or otherwise) a change like this would inevitably incur.

Also, a comparatively smaller, social problem is that people will think “oh, Rust now has move ctors, let’s use them all over the place because they are how we make programs fast; after all, they were how we made programs fast in C++!” — and it will creep into the non-niche code as well, and no amount of linting will be able to undo that damage (because, let’s be honest, in reality people mostly just #[allow()] if the linter is complaining.)

So, I wholeheartedly disagree with the idea of making moves non-trivial for the sake of some corner-case problems or features, especially given that it might not be the (only) solution to the aforementioned minority problems.


#16

Perhaps there could be a new trait with the same relation to Move as Clone has to Copy. Ideally we’d have &in/&out references to use for this:

trait Relocate {
    fn relocate(&out self, dst: &in Self);
}
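
Since `&out` and `&in` references do not exist, the closest approximation today is an explicit call, much like `clone()`. The sketch below assumes one reading of the signature: the old value is consumed (taken by value here) and the destination is uninitialized memory, modeled with `MaybeUninit`. The `Node` type is hypothetical.

```rust
use std::mem::MaybeUninit;

// Explicit, opt-in relocation: nothing here runs on an implicit move.
trait Relocate: Sized {
    // Consumes the old value and initializes `dst` with a fixed-up copy.
    fn relocate(self, dst: &mut MaybeUninit<Self>) -> &mut Self;
}

// Raw-pointer stand-in for a self-referential struct.
struct Node {
    value: u32,
    ptr: *const u32,
}

impl Relocate for Node {
    fn relocate(self, dst: &mut MaybeUninit<Self>) -> &mut Self {
        let slot = dst.write(self); // bitwise move into the destination
        slot.ptr = &slot.value as *const u32; // re-point at the new location
        slot
    }
}
```

Because the call is explicit, this falls under the "totally different kind of proposal" category from earlier in the thread rather than under move constructors proper.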