A thought on in-place

Hi folks. I’ve been moving my workplace over to Rust wherever possible. One of the things we occasionally miss is placement new from C++, which is why I’ve been following the in-place work that’s being done.

A lot of the proposals (out pointers, etc.) are pretty neat, but are perhaps not as ergonomic as other Rust features. I thought I might throw out a possibility that, while not fully worked out, might be a useful idea to pick over for other proposals (even if it ends up in the 'what definitely not to do' section). It is modelled a bit on C++.

(Note that this has some similarities to Pre-RFC: placement box with Placer trait, which we only found a little while ago. However, the sketch here is simpler (perhaps a mistake?) and feels more flexible, at least to us.)

Here is the user-facing syntax concept (with silly example types):


// Declared somewhere
struct AType {
    a: u64,
}

struct BType(String);

// Elsewhere...

let spot = MaybeUninit::<AType>::uninit();

let place = AType in (spot) {
    a: 42
};

// Or...

let mut vec = Vec::<BType>::new();

// Possible future addition to std container classes
// Closure takes a destination provided by the container.
vec.emplace_back(|spot| {
    BType in (spot) ( "Text".into() );
});

Here spot is a value whose type implements the InPlaceDestination trait (all names subject to bikeshedding, of course). For example, MaybeUninit could implement it, as could raw pointers. The idea is that the trait describes a 'place': how initialization should work with the memory it provides, and, based on the semantics of that place, what type the expression returns.

A sketch of the trait might look like this:


unsafe trait InPlaceDestination<T> {
    type Output;

    fn destination(&mut self) -> *mut T;

    fn finish(self) -> Self::Output;
}
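To make the trait a little more concrete, here is one way it might be implemented for a borrowed MaybeUninit<T>, with Output = &mut T. This is an assumption: the post doesn't pin down whether spot is passed by value or by reference, and the impl is hypothetical, not an existing std API. It compiles on stable Rust because it only leans on MaybeUninit::as_mut_ptr and MaybeUninit::assume_init_mut; what it cannot do by itself is guarantee that finish is reached only after initialization, which the proposed syntax would have to enforce.

```rust
use std::mem::MaybeUninit;

// The trait as sketched in the post (a proposal, not a std API).
unsafe trait InPlaceDestination<T> {
    type Output;

    fn destination(&mut self) -> *mut T;

    fn finish(self) -> Self::Output;
}

// Hypothetical impl: a borrowed MaybeUninit<T> slot hands out its raw pointer,
// and `finish` yields a borrow of the now-initialized value.
// Assumed contract: `finish` is only reached after every byte behind
// `destination()` has been initialized.
unsafe impl<'a, T> InPlaceDestination<T> for &'a mut MaybeUninit<T> {
    type Output = &'a mut T;

    fn destination(&mut self) -> *mut T {
        self.as_mut_ptr()
    }

    fn finish(self) -> &'a mut T {
        // SAFETY: relies on the contract above (the slot is fully initialized).
        unsafe { self.assume_init_mut() }
    }
}

fn main() {
    let mut slot = MaybeUninit::<u64>::uninit();
    let mut dest = &mut slot;
    // SAFETY: the pointer comes from a live, exclusively borrowed slot.
    unsafe { dest.destination().write(42) };
    let value: &mut u64 = dest.finish();
    *value += 1; // the Output can be mutated afterwards
    println!("{value}");
}
```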

It's a rough sketch, to be sure, but for a line like AType in (spot) { a: 42 }; the flow would be something like this:

  1. spot is evaluated to obtain a value whose type implements InPlaceDestination.
  2. spot.destination() is evaluated to obtain a writable location. (Should destination be able to exit early here? I'm not sure a failure path would be a good idea.) The returned pointer must point to valid, writable storage that is properly aligned for the type and uniquely mutable, etc.
  3. Each field is initialized using the same rules as for stack objects, with its content written at the field's offset from the destination pointer. The destination cannot be dropped here, since it is borrowed.
  4. When done, spot.finish() is called and provides the result (so the value can be mutated afterwards, etc.).
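The four steps can be spelled out by hand for AType, assuming the destination expression has type &mut MaybeUninit<AType>; a hypothetical impl of the trait for &mut MaybeUninit<T> is included so the block stands alone. emplace_atype is an illustrative name for what the compiler might generate, not part of any proposal. Writing through std::ptr::addr_of_mut! keeps step 3 from ever creating a &mut AType that points at uninitialized memory.

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

// Trait from the sketch (a proposal, not a std API).
unsafe trait InPlaceDestination<T> {
    type Output;
    fn destination(&mut self) -> *mut T;
    fn finish(self) -> Self::Output;
}

// Assumed impl: a borrowed MaybeUninit slot; `finish` returns `&mut T`.
unsafe impl<'a, T> InPlaceDestination<T> for &'a mut MaybeUninit<T> {
    type Output = &'a mut T;
    fn destination(&mut self) -> *mut T {
        self.as_mut_ptr()
    }
    fn finish(self) -> &'a mut T {
        // SAFETY: only called after every field has been written (step 4).
        unsafe { self.assume_init_mut() }
    }
}

struct AType {
    a: u64,
}

// Hypothetical hand expansion of `AType in (spot) { a: value }`.
fn emplace_atype(spot: &mut MaybeUninit<AType>, value: u64) -> &mut AType {
    // 1. Evaluate the destination expression.
    let mut dest = spot;
    // 2. Ask the destination for a writable, aligned, uniquely owned location.
    let ptr: *mut AType = dest.destination();
    // 3. Initialize each field in place through its offset from `ptr`.
    // SAFETY: `ptr` is valid for writes per the trait's contract.
    unsafe { addr_of_mut!((*ptr).a).write(value) };
    // 4. Every field is written: hand back the destination's Output.
    dest.finish()
}

fn main() {
    let mut spot = MaybeUninit::<AType>::uninit();
    let place = emplace_atype(&mut spot, 42);
    place.a += 1;
    println!("{}", place.a);
}
```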

I think this approach has a few nice points (especially over just fiddling with MaybeUninit::write/as_mut_ptr):

  • Most types can be emplaced (right now thinking about structs and tuple structs), without a heavy auto-trait
  • It already uses the init rules for local aggregate construction (no partial initialization, can't forget a field) and looks familiar. It can also use the default shortcuts.
  • The same rules can apply for panics during initialization: drop the fields that have already been initialized, maybe call a failure hook to let the destination clean up, and then continue propagating the panic.
  • The trait implementation determines the pre- and post-initialization handling. (I suspect a failed() or similar function on the trait could be useful for cleaning up a destination after a panic.) For example, Box<T> (with a static emplace function a bit like the vector above) can return itself, since it owns the resource. A vector would return a reference, as it controls a number of resources. A pointer destination could return a pointer; you probably know what you are doing, and so should have some mechanism to know when to drop the pointed-to content. I think the destination call could be pretty useful for things like smart pointers replacing their inner content, not just for allocating.
  • Should work well with Pin. I know there are a lot of Pin and Move things being worked on, but with either approach (the pinned wrapper or the movability trait) I think the destination trait here can convey that in its output.
  • Doesn't need return value optimisation.
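As a sketch of the "Box<T> can return itself" point, here is a hypothetical owning destination, BoxSlot<T> (a made-up name), whose finish hands back a Box<T>. It uses only stable std calls (Box::into_raw, Box::from_raw, pointer::cast) and relies on MaybeUninit<T> having the same layout as T. Box::new(MaybeUninit::uninit()) is just a convenient way to get an uninitialized heap slot here; the interesting part is the Output type. boxed_btype is an illustrative hand expansion, not proposed API.

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

// Trait from the sketch (a proposal, not a std API).
unsafe trait InPlaceDestination<T> {
    type Output;
    fn destination(&mut self) -> *mut T;
    fn finish(self) -> Self::Output;
}

// Hypothetical owning destination: a heap slot that becomes a Box<T> on finish.
struct BoxSlot<T>(Box<MaybeUninit<T>>);

impl<T> BoxSlot<T> {
    fn new() -> Self {
        BoxSlot(Box::new(MaybeUninit::uninit()))
    }
}

unsafe impl<T> InPlaceDestination<T> for BoxSlot<T> {
    // The destination owns the allocation, so it returns ownership.
    type Output = Box<T>;

    fn destination(&mut self) -> *mut T {
        self.0.as_mut_ptr()
    }

    fn finish(self) -> Box<T> {
        let raw: *mut MaybeUninit<T> = Box::into_raw(self.0);
        // SAFETY: MaybeUninit<T> has the same layout as T, and the contract
        // says the value was fully initialized before `finish`.
        unsafe { Box::from_raw(raw.cast::<T>()) }
    }
}

struct BType(String);

// Hypothetical hand expansion of `BType in (BoxSlot::new()) ( text.to_string() )`.
fn boxed_btype(text: &str) -> Box<BType> {
    let mut dest = BoxSlot::<BType>::new();
    let ptr = dest.destination();
    // SAFETY: `ptr` points into the live, uninitialized heap slot.
    unsafe { addr_of_mut!((*ptr).0).write(text.to_string()) };
    dest.finish()
}

fn main() {
    let b = boxed_btype("Text");
    println!("{}", b.0);
}
```

If a BoxSlot is dropped without finish, the allocation is freed but any value already written into it is leaked rather than dropped. That is safe, but it shows why a failure hook on the trait might be wanted.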

Some immediate issues:

  • More syntax to learn. Never great. But, maybe in this case it's warranted.
  • in is kinda confusing, but it's already a keyword. A contextual keyword is probably no better, and a sigil just turns things into symbol soup.
  • Arrays and enums? Arrays would look pretty awkward. Enum initialization just doesn't have a good place for more syntax.
  • Parens around the destination: maybe optional in some contexts, but without them, building a tuple struct would be ambiguous whenever the destination expression is a function call (is the parenthesized part the call's arguments or the struct's fields?).
  • I'm not excellent at soundness, so I'm sure there is plenty I haven't considered.

In any case, thanks for the read, and thanks to all the folks working on in-place support.

How does it know when you've actually initialized the memory?

vec.emplace_back(|spot| {
    return; // do nothing 
})

and:

vec.emplace_back(|spot| {
    panic!()
})

vs

vec.emplace_back(|spot| {
    BType in (spot) ( text );
    panic!();
})

and how does that work with nesting when a struct has unmovable fields to initialize with their own constructor functions?

Excellent questions, definitely demonstrating why I shouldn't be in language design. That closure aspect definitely isn't well thought out...

Using the sketch, I'd say the vector might provide a specific destination type that panics if it is dropped without being completed. That would handle the first two cases, but it doesn't feel very satisfying. The third case would at least have initialized the content, but there is no good way to signal that back to the host.

I'm wondering if it could be turned on its head: an EmplaceBack-ish wrapper around the vector that is itself a destination:

let mut vec = Vec::<AType>::new();

AType in (EmplaceBack(&mut vec)) { .. };

But one could imagine an explosion of terrible wrapper types if that was the approach.
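Here is a rough sketch of the EmplaceBack idea, assuming the trait as posted; the impl and emplace_back_atype are illustrative names, not std APIs, though Vec::reserve, Vec::as_mut_ptr and Vec::set_len are real. destination() reserves room and returns the first slot past len; finish() bumps the length and returns &mut T. One side effect bears on the questions above: if the destination is dropped without finish, the Vec's length never changes, so an abandoned slot is never treated as an element (a value written into it would leak, but nothing is dropped twice).

```rust
use std::ptr::addr_of_mut;

// Trait from the sketch (a proposal, not a std API).
unsafe trait InPlaceDestination<T> {
    type Output;
    fn destination(&mut self) -> *mut T;
    fn finish(self) -> Self::Output;
}

// Hypothetical wrapper: it borrows the Vec and is itself a destination.
struct EmplaceBack<'a, T>(&'a mut Vec<T>);

unsafe impl<'a, T> InPlaceDestination<T> for EmplaceBack<'a, T> {
    type Output = &'a mut T;

    fn destination(&mut self) -> *mut T {
        // Make sure there is spare capacity for one more element.
        self.0.reserve(1);
        let len = self.0.len();
        // SAFETY: after reserve(1), offset `len` is inside the allocation.
        unsafe { self.0.as_mut_ptr().add(len) }
    }

    fn finish(self) -> &'a mut T {
        let vec = self.0;
        let len = vec.len();
        // SAFETY: per the contract, the element at `len` is now initialized.
        unsafe { vec.set_len(len + 1) };
        &mut vec[len]
    }
}

struct AType {
    a: u64,
}

// Hypothetical hand expansion of `AType in (EmplaceBack(vec)) { a: value }`.
fn emplace_back_atype(vec: &mut Vec<AType>, value: u64) -> &mut AType {
    let mut dest = EmplaceBack(vec);
    let ptr = dest.destination();
    // SAFETY: `ptr` is the spare slot reserved by `destination`.
    unsafe { addr_of_mut!((*ptr).a).write(value) };
    dest.finish()
}

fn main() {
    let mut vec = Vec::<AType>::new();
    emplace_back_atype(&mut vec, 1);
    emplace_back_atype(&mut vec, 2).a += 40;

    // Abandoned emplacement: `finish` never runs, so the length is unchanged.
    {
        let mut dest = EmplaceBack(&mut vec);
        let _ = dest.destination();
    }

    println!("{} elements, last a = {}", vec.len(), vec[1].a);
}
```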

For the last question, if I understand you right, yep that would be another issue (unless, as was suggested to me, there is some scheme of passing parts of the destination onward, in a sort of recursive fashion).

I guess it all comes down to the lack of a constructor concept in Rust; I'm not sure anything will be quite as clean as that.

Thanks for the great points!

What would be the type of place here? Or equivalently, what would be the implementation of InPlaceDestination for MaybeUninit?


Another tricky but important detail is: how does dropping work? In particular, what is dropped (spot, the InPlaceDestination::Output, the emplaced value and/or all of them) and by whom?

This is also important for pinning, because something like Pin<OwnRef<T>>, where OwnRef<T> is responsible for dropping T while something else is responsible for "freeing" its underlying storage, is unsound due to being mem::forgettable.

I feel like correctly doing in-place initialisation with this approach requires the ability to do a strong update – you take a reference to an uninitialised MaybeUninit and return a reference to an initialised value within it, but somehow prove that they're the same reference so that the caller can treat the referenced value as now having a different type. (The thread I linked has some discussion about doing those proofs in the type system, but I guess it could be as simple as "conceptually do an address comparison, but it gets optimised out".)
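For reference, std already offers the "weak" version of this: MaybeUninit::write returns a &mut T into the slot, but the slot itself keeps its MaybeUninit<T> type, so the caller still needs an unsafe assume_init to treat it as a T later. The debug_assert! in the hypothetical helper init_in stands in for the "conceptual address comparison"; nothing in the type system records it.

```rust
use std::mem::MaybeUninit;

// Hypothetical helper: initialize `slot` and return a reference to the value,
// checking that the reference really is the slot's own storage.
fn init_in<T>(slot: &mut MaybeUninit<T>, value: T) -> &mut T {
    let slot_addr: *const T = slot.as_ptr();
    let init: &mut T = slot.write(value);
    // The "proof" that both refer to the same place, done at runtime.
    debug_assert!(std::ptr::eq(slot_addr, &*init));
    init
}

fn main() {
    let mut slot = MaybeUninit::<u32>::uninit();
    let r = init_in(&mut slot, 7);
    *r += 1;
    // Weak update only: `slot` is still typed MaybeUninit<u32>, so reading it
    // as a u32 still needs an unsafe assertion from the caller.
    let n = unsafe { slot.assume_init() };
    println!("{n}");
}
```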

For partially initialised structures, Rust keeps "drop flags": a hidden boolean variable for every value that can become uninitialized within a function. Additionally, there's autogenerated extra code around assignments that restores values to some sensible state when code panics during assignment.

This tracking of state can be invisible and performant enough when it happens within a function. (Early Rust actually had drop flags inside the structs themselves, with a hidden bool field added to every droppable struct, but that was super annoying for FFI and an obvious waste everywhere that didn't drop or deinitialize the structs.)
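A small illustration of what a drop flag does: in maybe_keep, whether x must be dropped at the end depends on a runtime bool, so the compiler tracks it with a hidden flag. maybe_keep_manual is a hand-written approximation of that bookkeeping using ManuallyDrop and an explicit bool; it is conceptual, not the exact code rustc generates (which can often optimize the flag away). Noisy and both function names are made up for the example.

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;

// Counts how many times it has been dropped, so the flag's effect is visible.
struct Noisy<'a>(&'a Cell<usize>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

// What you write: whether `x` still needs dropping at the end of the function
// depends on `keep`, so the compiler tracks it with a hidden flag.
fn maybe_keep<'a>(drops: &'a Cell<usize>, keep: bool, kept: &mut Vec<Noisy<'a>>) {
    let x = Noisy(drops);
    if keep {
        kept.push(x);
    }
}

// Roughly the same bookkeeping, spelled out with an explicit flag.
fn maybe_keep_manual<'a>(drops: &'a Cell<usize>, keep: bool, kept: &mut Vec<Noisy<'a>>) {
    let mut x = ManuallyDrop::new(Noisy(drops));
    let mut x_is_live = true; // the "drop flag"
    if keep {
        // SAFETY: `x` is live, and clearing the flag ensures it is never used again.
        kept.push(unsafe { ManuallyDrop::take(&mut x) });
        x_is_live = false;
    }
    if x_is_live {
        // SAFETY: the flag says `x` has not been moved out.
        unsafe { ManuallyDrop::drop(&mut x) };
    }
}

fn main() {
    let drops = Cell::new(0);
    let mut kept = Vec::new();
    maybe_keep(&drops, false, &mut kept); // dropped at the end of the call
    maybe_keep(&drops, true, &mut kept); // moved into `kept` instead
    maybe_keep_manual(&drops, false, &mut kept);
    maybe_keep_manual(&drops, true, &mut kept);
    println!("dropped so far: {}, kept: {}", drops.get(), kept.len());
}
```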

The problem with placement is very similar to the drop flags, but performs the operation in reverse.

Closures make it harder to pass the drop flags around. And you can't pass the equivalent of &mut Option<T> to track the initialised state, because when T is part of a larger struct or a dense array, there's no room there to store an extra flag.

You'd probably need to make the places a type like (DropFlagsSet<T>, &mut MaybeUninit<T>), where DropFlagsSet is some kind of bitfield, stored outside the object being initialised, that tracks the initialisation state of the object and all of its fields. Redundant drop flag variables are easily optimized out within functions, but making a smaller DropFlagsSet would require looking at the initialisation code before making the type, which is a bit circular.
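One possible shape for the (DropFlagsSet<T>, &mut MaybeUninit<T>) idea, written by hand for a single two-field struct: the flags are a u8 with one bit per field, stored beside the pointer rather than inside the object. Pair, PairPlace and the per-field setters are hypothetical; a real design would need the compiler to generate this per type, which is exactly the circularity mentioned above. A per-field &mut Option<T> would not fit: Option<u64>, for instance, is larger than u64 because u64 has no spare bit pattern to encode None. The fields are Rc<()> only so that drops can be counted.

```rust
use std::marker::PhantomData;
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;
use std::rc::Rc;

// Illustrative struct; Rc fields only so that drops can be counted.
struct Pair {
    first: Rc<()>,
    second: Rc<()>,
}

// One bit per field of `Pair`.
const FIRST: u8 = 0b01;
const SECOND: u8 = 0b10;

// Hypothetical "flags + place": the bits live outside the `Pair` being built.
struct PairPlace<'a> {
    flags: u8,
    ptr: *mut Pair,
    _slot: PhantomData<&'a mut MaybeUninit<Pair>>,
}

impl<'a> PairPlace<'a> {
    fn new(slot: &'a mut MaybeUninit<Pair>) -> Self {
        PairPlace { flags: 0, ptr: slot.as_mut_ptr(), _slot: PhantomData }
    }

    fn set_first(&mut self, value: Rc<()>) {
        assert_eq!(self.flags & FIRST, 0, "`first` already initialized");
        // SAFETY: `ptr` points to the exclusively borrowed slot.
        unsafe { addr_of_mut!((*self.ptr).first).write(value) };
        self.flags |= FIRST;
    }

    fn set_second(&mut self, value: Rc<()>) {
        assert_eq!(self.flags & SECOND, 0, "`second` already initialized");
        // SAFETY: as above.
        unsafe { addr_of_mut!((*self.ptr).second).write(value) };
        self.flags |= SECOND;
    }

    // All bits set: the value is complete, so disarm the cleanup and hand it out.
    fn finish(self) -> &'a mut Pair {
        assert_eq!(self.flags, FIRST | SECOND, "`Pair` not fully initialized");
        let ptr = self.ptr;
        std::mem::forget(self);
        // SAFETY: every field is initialized, and the borrow is tied to `'a`.
        unsafe { &mut *ptr }
    }
}

impl Drop for PairPlace<'_> {
    // Reached on early exit or unwinding: drop exactly the fields that were written.
    fn drop(&mut self) {
        // SAFETY: a set bit means that field holds an initialized value.
        unsafe {
            if self.flags & FIRST != 0 {
                addr_of_mut!((*self.ptr).first).drop_in_place();
            }
            if self.flags & SECOND != 0 {
                addr_of_mut!((*self.ptr).second).drop_in_place();
            }
        }
    }
}

fn main() {
    let tracker = Rc::new(());
    {
        let mut slot = MaybeUninit::<Pair>::uninit();
        let mut place = PairPlace::new(&mut slot);
        place.set_first(Rc::clone(&tracker));
        // `place` goes out of scope with only `first` written.
    }
    println!("clones left after abandoning: {}", Rc::strong_count(&tracker) - 1);
}
```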