Pre-RFC: Move references

Edit: Rev. 2
Edit: Rev. 3
...
Edit: Rev. 6
Thank you all.

Original proposal

Summary

Move references are a new kind of reference intended to allow moving a value, but not the memory it is stored in, and to ease the story of initialization, essentially allowing more cases of its deferred and partial variants.

Motivation

  • Make Box less special by creating a mechanism (DerefMove) for moving out of a reference.
  • Create a sound mechanism for partial initialization and deinitialization of bindings.
  • Enable placement new to be just a language construct.

Guide-level explanation

&move references provide a way to borrow the memory of a binding while preserving the logic of moving its value. The type &move T is, in fact, a reference, but unlike other references it allows moving the initialized value out of the referenced binding.

There are a few types of move references: plain and annotated with ! or *.

About the functionality:

&move T             | &move T*           | &move T!
Allows moving out   | Allows moving in   | Allows both

Permission to move a value out implies that the value is initialized, so referencing an uninitialized binding by &move T or &move T! is prohibited.

Moving a value into a binding, thus making or keeping it initialized, is part of the &move T* and &move T! contracts. The compiler assumes this holds and uses it in the corresponding checks.

Partial initialization of a binding of a known type C is described via the following syntax: &move C(a!,b*,c,...).

An example:

struct C {
  a: String,
  b: String,
  c: String,
  d: u32,
}

/// ...Promises to init `b`, keep `a` and uninit `c`, doesn't touch `d` at all.
fn work(arg: &move C(a!,b*,c,.d)) { //dot prefixed `d` may have been omitted.
  let mut tmp = arg.a; //we moved the String to `tmp`
  tmp.push_str(&arg.c); //we don't have to move out of `arg.c`, and we gave no promise to initialize it back.

  arg.a = tmp; //we initialized `arg.a` back; removing this line is a hard error.

  arg.b = "init from another function!".into();

  //println!("{:?}",arg.d ); //error: use of possibly uninitialized value.
}

fn main() {
  let trg: C;
  trg.a = "Hello ".into();
  trg.c = " Hola".into();

  work(&move trg);
  println!("{}", trg.b); //legal, as work gave a promise to initialize
  println!("{}", trg.a); //legal
  //println!("{}", trg.c); //error: use of definitely uninitialized value.

}

The use case for the &move T* reference family is initialization: these references may be used to describe placement features.
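
For comparison, the closest thing stable Rust offers today to an initializing reference is an out-parameter built on MaybeUninit. The following is only a sketch of that existing pattern, not of the proposed feature: the names init_greeting and out_param_demo are hypothetical, and unlike &move T* nothing forces the callee to actually write to the slot.

```rust
use std::mem::MaybeUninit;

/// Hypothetical out-parameter style initializer: writes into caller-provided
/// storage and hands back a reference to the now-initialized value.
fn init_greeting(slot: &mut MaybeUninit<String>) -> &mut String {
    slot.write(String::from("init from another function!"))
}

/// The caller owns the storage; the callee fills it in place.
fn out_param_demo() -> String {
    let mut storage = MaybeUninit::<String>::uninit();
    let s: &mut String = init_greeting(&mut storage);
    s.push('!');
    // SAFETY: `init_greeting` wrote a valid `String` into `storage`.
    unsafe { storage.assume_init() }
}
```

The unsafe assume_init call at the end is exactly the step a compiler-checked &move T* contract would be expected to make unnecessary.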

&move T! is a move reference to an initialized binding with the ability to move from it. In fact, it can be viewed as a mutable reference. The reasons for introducing it are simple:

  • It doesn't change existing behavior of &mut T.
  • It follows as a logical continuation of the rest of the feature's syntax.

Fields of a known type that are unaffected by an operation are simply not mentioned in the header of a move reference type.

The syntax of &move references to tuples is as follows:

Given a tuple (u32,i64,String,&str), the move reference syntax looks like &move (.u32,i64,String!,&str*). Note the dot-prefixed u32: it will not be touched by a consumer of the reference, but it is there to distinguish different tuple types from one another (for named structures, untouched fields are simply not mentioned).

Reference-level explanation

Creation

It may be obvious, but creating a &move .. reference is only possible for local bindings and bindings referenced via another &move ...

Interaction with patterns:

We introduce a new pattern kind, ref move NAME, which produces NAME of type &move T!. The reason for the ! obligation is that we may not want to leave a binding (partially) deinitialized after execution of a pattern-matching construct.

Subtyping:

move references have the following subtyping rules:

  • &move T! is a subtype of &move T* for all T.
  • &move T(*list of covered fields*) is a subtype of &move T(*another list*) for all T if and only if the first list mentions exactly the same fields as the second, and every mentioned field of the first guarantees the same as the corresponding field of the second.

DerefMove trait

I also propose a design for DerefMove:

trait DerefMove {
  type Output;

  fn deref(&move self!) -> &move Self::Output!;
}

The reason for this design is that we may not want to allow pattern matching to partially deinitialize a binding, as that would require far more complex analysis and would give pattern matching powers that do not align well with its original purpose (expressing "high level" logic).

Aliasing:

Given that all move references are intended to modify the referenced binding, they must all be unique, as &mut T is.

Interaction with panics:

The proposal relies on the current behavior of moves, a byte-level copy; however, these copies may be removed during optimization. This raises concerns such as "what if a placement function, or a function that performs a temporary move, panics?":

It's a matter of observability: if no one will ever see the corrupted data, why avoid it in the first place?
Given Rust's aliasing policy and the fact that a panic never returns, not even the current user code in the thread will see data corruption.

Summary of syntax

//TODO: EBNF for clarity?

Drawbacks

  • This adds an entire kind of references.
  • We may want separate mechanisms for partial initialization and for &move/&own references.

Rationale and alternatives

The feature serves two distinct needs: partial initialization, and moving a value but not its memory. The benefit of this proposal is that it lets us talk about both needs while not letting us forget their deep connection. Moreover, both functions are coupled here in one distinct mechanism, making it easier to learn and to work with.

The main alternative is to have two different features for both partial initialization and move references.

Prior art

Unresolved questions

  • Should we allow moving Unpin types out of a move reference?
  • Should we allow coercing &mut T to &move T! and vice versa?
  • Is the DerefMove trait defined correctly here? What about other kinds of move references?

Future possibilities

I can't think of anything valuable yet.

&move _ should be both by default, and I'm not sure if we need the restricted versions (especially for the MVP).

This is false. Unwinding panics can be caught using std::panic::catch_unwind, and language features shouldn't depend on the panic strategy.
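
A small sketch of why this matters: with std::panic::catch_unwind, code in the same thread can keep running after a panic and observe whatever state an interrupted in-place update left behind. The function names here are made up for illustration.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical in-place update that panics halfway through.
fn update_in_place(v: &mut Vec<u32>) {
    v.push(1);
    panic!("boom");
}

/// Catches the panic and returns the data the caller can still see afterwards.
fn observe_after_panic() -> Vec<u32> {
    let mut data = vec![0];
    // `AssertUnwindSafe` is needed because the closure captures `&mut data`.
    let result = panic::catch_unwind(AssertUnwindSafe(|| update_in_place(&mut data)));
    assert!(result.is_err());
    // The partial modification is observable after the unwind was caught.
    data
}
```

With a Vec the intermediate state is still a valid value; if the callee had instead left the referent deinitialized, the same caller would be reading garbage.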

This can already be done, and give a safe interface. Rust Playground (note the box_in_place function).

edit: I should really take more time on these examples (fixed)

I don't think this is the right RFC for views. Tacking lots of other features onto an RFC can make it harder to sell.

I don’t understand this section, could you clarify? In particular this:

seems to indicate that &mut T and &move T! are “the same”… Mutable references can't be moved out of (unless you immediately move something else back in with mem::swap etc.; alternatively, you might use a library such as replace_with/take_mut, which makes sure the program aborts on panic or provides a scope mechanism that puts some value back in the panic case), because doing so would otherwise leave “uninitialized/garbage” values behind the reference. I don’t understand your explanation of how this problem is solved in your proposal. In particular:

sounds like it is indeed supposed to be a requirement that passing an &move T! reference to a function means that the value behind the reference is still initialized after that call.
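
For reference, here is a minimal sketch of the "move something else back in immediately" approach that &mut T supports today, using std::mem::replace and std::mem::take. The helper names are hypothetical.

```rust
use std::mem;

/// Moves the current value out of `slot`, leaving `replacement` behind.
fn move_out_with_replacement(slot: &mut String, replacement: String) -> String {
    mem::replace(slot, replacement)
}

/// Moves the current value out of `slot`, leaving `Default::default()` behind.
fn move_out_with_default(slot: &mut Vec<u8>) -> Vec<u8> {
    mem::take(slot)
}

fn replace_demo() -> (String, String) {
    let mut name = String::from("old");
    let taken = move_out_with_replacement(&mut name, String::from("new"));
    (taken, name)
}

fn take_demo() -> (Vec<u8>, Vec<u8>) {
    let mut bytes = vec![1, 2, 3];
    let taken = move_out_with_default(&mut bytes);
    (taken, bytes)
}
```

In both cases the referent is never left uninitialized, so a panic at any later point is harmless.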

In addition to the above: I don't even think this solves DerefMove. Any DerefMove RFC should show the implementation for Box, and it needs to maintain the "box is like a local binding, but on the heap" properties, where you can move out with a deref, and then later move back in with a deref, and the heap allocation is not freed until the Box is dropped, whether it's "complete" or not. (This is #[may_dangle], the borrowck eyepatch.)

Thanks, now I see. I'll update this soon.

The problem is about what happens when you deinitialize an obligated reference and then panic. How do you ensure the data is correct if all modifications are done in place, but then something blows up?

That's why I put the question about coercions between these into the unresolved questions.
The alternative is to allow a "temporary move out" of a plain &mut T, but I don't think it's a good idea in terms of complexity.

Yes, that's intended. This is to describe in-place mutations of value: operations on &move references are intended to be as optimizable as possible.

Okay. For context, this observation of mine was intended as an argument in favor of the hypothesis that &mut T and &move T! are essentially the same. And on that point

I clearly disagree. I don’t see how introducing a second, equivalent kind of reference type and calling it &move T! is supposed to reduce any complexity. The unresolved question that remains really should be “are there any differences between &mut T and the proposed &move T!?”. And if there aren’t any we don’t need the type &move T! at all. I think it is a terrible idea in terms of language design to have two distinct but completely equivalent types of references. Even if we didn’t offer a coercion for some reason, as long as &mut T and &move T! really are equivalent, unsafe code in a library could offer the coercion.


I know the problem. I said “I don’t understand this section” because I thought the section was trying to give a solution to the problem. I guess it doesn’t (yet?) do so… If someone comes up with a solution for the interaction between panics and &move T!, it should most likely also be applicable to &mut T directly. I guess that would already be enough content for a whole RFC by itself; there would be no need to also include two more new kinds of references, a DerefMove trait, and special references tracking the initialization of individual fields, along with a whole bunch of cryptic, potentially confusing new syntax, in the same bundle.

Another point now that we seem to agree that there’s a requirement for &move T! references to get initialized by the end of a function call:

this table is probably more accurate if you also say that &move T* requires you to move something in (correct me if this isn’t what you had in mind). There could probably be a fourth type that is initially uninitialized and is also required to contain no value at the end. On that note, I guess that &move T also requires the callee to move out of it; however, the compiler could ensure that by introducing an implicit move-out + drop in case the move-out doesn’t happen. Regarding panics, again, &move T* would come with the same problem as &move T!: how do you ensure that something gets moved in if there is a panic? &move T is unproblematic, OTOH: unwinding could just drop the contained element if it wasn’t moved out already.

Can you elaborate more on that? I seem to have missed things entirely.

Views are constructs like

They are a complex feature all on their own, so bundling them in will make it more difficult to get support for the core elements of this RFC.

This seems like a massive change with very little gain.

IOW, it doesn't compose.

Why can't DerefMove take self by-value? Anything that interacts with memory allocation will need to have unsafe anyway, and at that point, a combination of ptr::read and mem::forget is a fine solution.
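
A minimal sketch of the by-value approach described here, assuming a hypothetical Wrapper<T> type with a Drop impl (so that plain destructuring is not available). The names and the drop counter are illustrative only.

```rust
use std::mem;
use std::ptr;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how many times `Wrapper::drop` has run (illustration only).
static DROPS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical owning wrapper with a destructor.
struct Wrapper<T> {
    value: T,
}

impl<T> Drop for Wrapper<T> {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

impl<T> Wrapper<T> {
    /// Consumes the wrapper and returns the inner value without running `Drop`.
    fn into_inner(self) -> T {
        // SAFETY: `self.value` is initialized, and `self` is forgotten right
        // after the read, so the value is never dropped twice.
        let value = unsafe { ptr::read(&self.value) };
        mem::forget(self);
        value
    }
}

fn into_inner_demo() -> (String, usize) {
    let w = Wrapper { value: String::from("payload") };
    let s = w.into_inner();
    (s, DROPS.load(Ordering::SeqCst))
}
```

This covers moving the whole value out once; it does not give the wrapper a place that can be moved out of and later moved back into.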

Is partial initialization/deinitialization unsound in current Rust? I'm not aware of any such issues.
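
For context, a sketch of what current Rust already accepts for a local binding: a field of a fully initialized struct can be moved out and later re-initialized, after which the whole value is usable again. What it rejects is assigning to fields of a binding that was never initialized as a whole (as in the proposal's `let trg: C; trg.a = ...`). The struct and function names below are hypothetical.

```rust
struct C {
    a: String,
    b: String,
}

fn partial_move_roundtrip() -> String {
    let mut c = C { a: String::from("Hello"), b: String::from(" world") };
    let a = c.a; // partial move: `c.a` is now uninitialized, `c.b` is not
    c.a = a + "!"; // re-initialize the field in place
    let C { a, b } = c; // `c` is whole again and can be consumed
    a + &b
}
```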

Why does that need to be a language construct? Or is it not already? What are the benefits of this? Is there a concrete placement-new proposal this would improve upon, and if so, how?

Because taking self by value doesn't describe the operations that Box allows that would be generalized into DerefMove. DerefMove is meaningfully different from just into_inner(self) -> Inner.

Given b: Box<!Copy>, specifically we have at least

  • {*b} moves the value out of the box, but does not free any memory. Memory is freed when b goes out of scope (implicitly drops).
  • Given a moved-from box, you can assign back into it with *b = value. This moves the value into the allocation that the Box is still holding onto.
  • (*b) works pretty much exactly like a local stack binding. This means you can partially move members in and out of the box. Drop flags will be generated so that only fields actually in the box will be dropped at the end of scope. Additionally, if all fields are known statically to be moved back in, the box is "reconstituted", and you can move the whole box again.
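
The first two points, plus partial moves out of a box, can be sketched as follows. This is a small illustration of the behavior described above, with made-up function names; it says nothing about how allocations are handled internally.

```rust
fn box_move_out_and_back() -> String {
    let mut b: Box<String> = Box::new(String::from("hello"));
    let s: String = *b; // moves the String out of the box
    *b = s + " world"; // moves a value back into the same box
    *b // the box is whole again and can be consumed
}

fn box_partial_moves() -> (String, String) {
    let b: Box<(String, String)> =
        Box::new((String::from("left"), String::from("right")));
    let left = b.0; // partial move out of the boxed tuple
    let right = b.1; // second field moved out independently
    (left, right)
}
```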

The exact specifics of how allocations are kept alive and reused isn't observable without UB (or a logging allocator, which isn't guaranteed to actually be called for allocations anyway), so could be changed. However, the behavior interface is stable and desirable, so no DerefMove functionality is going to be accepted without supporting the exact current behavior of Box.

For better or for worse, "DerefMove" refers to basically all of the magic of Box dereffing to a place the compiler treats identically to how it treats a stack slot. Tbh, DerefMove isn't about making Box less magic, because all of that magic will still be there. It's more about letting other types opt into the same magic, allowing other people to write their own Box.

(I refrain from commenting on the specifics of this proposal, I just want to clarify the needs of DerefMove.)

Wow, I didn’t know any of this. Where is this documented? I only knew of *b for moving a value out of the box, thinking it was fancy syntax for something like an into_inner(self) method.

Just wondering: is that actually implemented in a published crate?

Your API is not sound.

fn main() {
    let _x: bool = *box_in_place(|_| Init(Box::leak(Box::new(false))));
}
   Compiling playground v0.0.1 (/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 0.51s
     Running `/playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/cargo-miri target/x86_64-unknown-linux-gnu/debug/playground`
error: Undefined Behavior: type validation failed: encountered uninitialized bytes, but expected a boolean
  --> src/main.rs:63:20
   |
63 |     let _x: bool = *box_in_place(|_| Init(Box::leak(Box::new(false))));
   |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ type validation failed: encountered uninitialized bytes, but expected a boolean
   |
   = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
   = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
           
   = note: inside `main` at src/main.rs:63:20
   = note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
   = note: inside `std::sys_common::backtrace::__rust_begin_short_backtrace::<fn(), ()>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:125:18
   = note: inside closure at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:66:18
   = note: inside `std::ops::function::impls::<impl std::ops::FnOnce<()> for &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>::call_once` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:259:13
   = note: inside `std::panicking::r#try::do_call::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:379:40
   = note: inside `std::panicking::r#try::<i32, &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:343:19
   = note: inside `std::panic::catch_unwind::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:431:14
   = note: inside `std::rt::lang_start_internal` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:51:25
   = note: inside `std::rt::lang_start::<()>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:65:5

error: aborting due to previous error

Isn't that easily fixed by making Init's field private?

Yeah, that might be enough.

Using move references as out parameters too seems to be conflating different requirements, and may sink this RFC for being too ambitious.

I'll mention two crates of mine which show I have been giving these patterns their fair share of thinking:

  • https://docs.rs/uninit with a whole module / type that has to deal with out parameters;

    These eventually require usage of unsafe except for trivial cases: truly solving this problem with compiler support is equivalent to solving the placement new RFC, which is a big undertaking;

  • https://docs.rs/stackbox features move references / locally allocated "Box"es (that is, a &mut ManuallyDrop<_> reference which cannot be reborrowed and which has a Drop impl that does a drop_in_place).

From now on, I'll thus only be talking about the more reasonable &move reference proposal / language-blessed-StackBox, and not about out parameters.
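
To make the idea concrete, here is a rough sketch of such an owning reference. It is not the stackbox crate's actual API: the OwnRef type, its new_in constructor, and the use of MaybeUninit as the storage slot are assumptions chosen to keep the example short.

```rust
use std::mem::{self, MaybeUninit};
use std::ops::{Deref, DerefMut};
use std::ptr;

/// Hypothetical stand-in for `&'slot move T`: an owning reference into
/// caller-provided storage. Dropping it drops the value in place.
pub struct OwnRef<'slot, T> {
    place: &'slot mut T,
}

impl<'slot, T> OwnRef<'slot, T> {
    /// Writes `value` into `slot` and takes ownership of it.
    pub fn new_in(slot: &'slot mut MaybeUninit<T>, value: T) -> Self {
        OwnRef { place: slot.write(value) }
    }

    /// Moves the value out of the storage, by value.
    pub fn into_inner(self) -> T {
        // SAFETY: `place` points to an initialized `T`, and `self` is
        // forgotten below, so `Drop` cannot drop it a second time.
        let value = unsafe { ptr::read(&*self.place) };
        mem::forget(self);
        value
    }
}

impl<T> Deref for OwnRef<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        &*self.place
    }
}

impl<T> DerefMut for OwnRef<'_, T> {
    fn deref_mut(&mut self) -> &mut T {
        &mut *self.place
    }
}

impl<T> Drop for OwnRef<'_, T> {
    fn drop(&mut self) {
        // SAFETY: we own the value and it has not been moved out.
        unsafe { ptr::drop_in_place(&mut *self.place) }
    }
}

/// Takes ownership through the reference; the String is dropped on return.
fn consume(r: OwnRef<'_, String>) -> usize {
    r.len()
}

fn own_ref_demo() -> (usize, String) {
    let mut slot_a = MaybeUninit::uninit();
    let mut slot_b = MaybeUninit::uninit();
    let a = OwnRef::new_in(&mut slot_a, String::from("dropped in place"));
    let b = OwnRef::new_in(&mut slot_b, String::from("moved out"));
    (consume(a), b.into_inner())
}
```

The ergonomic cost is visible in own_ref_demo: the caller has to declare the storage separately, which is the part language-level support could hide.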


Language level support for StackBox is definitely something that:

  • is up the alley of compiler support: the semantics are clear, it's mostly a question of smoothing the ergonomics (just see the different macro constructors in that crate to see what I am talking about).

  • it is also the easiest way to feature unsized_locals without weird magic, à la you can use an unsized local here but not there, etc.. Having a classic -> &'storage move [T], with a clear &'storage mut Slot<impl Sized> parameter (with impl having existential semantics, not universal one), a parameter which could even be elided for the simple cases, if the RFC wanted to go that far, is something that conveys clear semantics and yet does offer enhanced ergonomics.

    Also applies to &'storage move dyn FnOnce…, etc.

  • In that regard, one of the most painful points of a user / library-defined abstraction over &move references is the lack of magical unsize-coercion in stable Rust; see Add unsize implementation for StackBox by HeroicKatora · Pull Request #3 · danielhenrymantilla/stackbox.rs · GitHub

  • Having &move … references would allow featuring Pin<&move …> references too; that is, a macro-free way to perform stack pinning.

  • Finally, and this has always been one of the motivations for &move references: it would allow hinting the compiler to use by-reference ABIs over by-value ones.

Actually, here’s another unsoundness:

use std::any::Any;
fn main() {
    box_in_place(move |u| {
        fn impl_ify<'b>(i: Init<'b, bool>) -> Init<'b, impl Any + Into<bool> + 'b> {
            i
        }
        let i = impl_ify(u.write(true));
        let b: bool = (*box_in_place(|_| i)).into();
        panic!("{}", b)
    });
}
   Compiling playground v0.0.1 (/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 0.41s
     Running `/playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/cargo-miri target/x86_64-unknown-linux-gnu/debug/playground`
error: Undefined Behavior: type validation failed: encountered uninitialized bytes, but expected a boolean
  --> src/main.rs:74:23
   |
74 |         let b: bool = (*box_in_place(|_| i)).into();
   |                       ^^^^^^^^^^^^^^^^^^^^^^ type validation failed: encountered uninitialized bytes, but expected a boolean
   |
   = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
   = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
           
   = note: inside closure at src/main.rs:74:23
note: inside `box_in_place::<bool, [closure@src/main.rs:69:18: 76:6]>` at src/main.rs:62:9
  --> src/main.rs:62:9
   |
62 |         f(Uninit::from_raw(bx.as_mut_ptr()));
   |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `main` at src/main.rs:69:5
  --> src/main.rs:69:5
   |
69 | /     box_in_place(move |u| {
70 | |         fn impl_ify<'b>(i: Init<'b, bool>) -> Init<'b, impl Any + Into<bool> + 'b> {
71 | |             i
72 | |         }
...  |
75 | |         panic!("{}", b)
76 | |     });
   | |______^
   = note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
   = note: inside `std::sys_common::backtrace::__rust_begin_short_backtrace::<fn(), ()>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:125:18
   = note: inside closure at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:66:18
   = note: inside `std::ops::function::impls::<impl std::ops::FnOnce<()> for &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>::call_once` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:259:13
   = note: inside `std::panicking::r#try::do_call::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:379:40
   = note: inside `std::panicking::r#try::<i32, &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:343:19
   = note: inside `std::panic::catch_unwind::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:431:14
   = note: inside `std::rt::lang_start_internal` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:51:25
   = note: inside `std::rt::lang_start::<()>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:65:5

error: aborting due to previous error

cc @RustyYato @SkiFire13


Presumably fixable by making sure that the lifetime parameter of Uninit and Init is invariant.

Yeah, it doesn't work if Init and Uninit are invariant with respect to their lifetime parameter. I was playing with a version of this concept with invariant lifetimes, and it doesn't allow compiling your snippet.
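
A rough reconstruction of what such an API could look like with both fixes applied (private fields and an invariant lifetime). This is not the code from the playground link: the type names follow the thread, but the field layout, the Invariant alias, and the boxed_answer helper are assumptions made for illustration.

```rust
use std::marker::PhantomData;
use std::mem::MaybeUninit;

/// Marker that makes a type invariant in `'a`.
type Invariant<'a> = PhantomData<fn(&'a ()) -> &'a ()>;

/// Write-only handle to uninitialized storage. Fields are private, so it can
/// only be created by `box_in_place`.
pub struct Uninit<'a, T> {
    ptr: *mut T,
    _brand: Invariant<'a>,
}

/// Proof token that the storage branded with `'a` has been written.
/// Fields are private, so it can only be created by `Uninit::write`.
pub struct Init<'a, T> {
    _brand: Invariant<'a>,
    _ty: PhantomData<*mut T>,
}

impl<'a, T> Uninit<'a, T> {
    /// Initializes the storage and returns the matching proof token.
    pub fn write(self, value: T) -> Init<'a, T> {
        // SAFETY: `ptr` was created by `box_in_place` and is valid for writes.
        unsafe { self.ptr.write(value) };
        Init { _brand: PhantomData, _ty: PhantomData }
    }
}

/// Allocates a box, lets `f` initialize it in place, and returns it.
pub fn box_in_place<T, F>(f: F) -> Box<T>
where
    F: for<'a> FnOnce(Uninit<'a, T>) -> Init<'a, T>,
{
    let mut bx: Box<MaybeUninit<T>> = Box::new(MaybeUninit::uninit());
    let ptr = MaybeUninit::as_mut_ptr(&mut *bx);
    // The returned token can only come from `write` on the handle above.
    let _proof: Init<'_, T> = f(Uninit { ptr, _brand: PhantomData });
    // SAFETY: the proof token says the allocation now holds a valid `T`.
    unsafe { Box::from_raw(Box::into_raw(bx) as *mut T) }
}

fn boxed_answer() -> Box<u64> {
    box_in_place::<u64, _>(|slot| slot.write(42))
}
```

The higher-ranked `for<'a>` bound gives each call a fresh lifetime, and invariance keeps a token from one call from being passed off as the token for another.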

I think this could also allow temporary moving out of a reference, just pass an Init<'a, T> and ask for an Init<'a, T> back with the exact same lifetime 'a. Add some macro magic for field projection and we could even get partial moves.

Sorry for this discussion being a bit off-topic, but I noticed something…

Actually, the Any + in the impl can be removed for this counterexample. However, it is kind of weird that an impl Any + … + 'b type is considered to be “short-lived” (in the sense that every reference to it lives no longer than 'b, which is used in the inner call to box_in_place) and still implements Any even though Any: 'static. And indeed, this seems to be a bug in rustc:

trait StaticDefaultRef: 'static {
    fn default_ref() -> &'static Self;
}

impl StaticDefaultRef for str {
    fn default_ref() -> &'static str {
        ""
    }
}

fn into_impl(x: &str) -> &(impl ?Sized + AsRef<str> + StaticDefaultRef + '_) {
    x
}

fn extend_lifetime<'a>(x: &'a str) -> &'static str {
    let t = into_impl(x);
    helper(|_| t)
}

fn helper<T: ?Sized + AsRef<str> + StaticDefaultRef>(f: impl FnOnce(&T) -> &T) -> &'static str {
    f(StaticDefaultRef::default_ref()).as_ref()
}

fn main() {
    let r;
    {
        let x = String::from("Hello World?");
        r = extend_lifetime(&x);
    }
    println!("{}", r);
}

(playground)

Edit: Now on github

Edit2: The issue with impl Trait + 'lifetime return types being unsound is actually worse than I initially discovered. Perhaps even with the lifetimes still covariant the API won’t be unsound anymore once this compiler/language bug has been fixed. In any case, invariant lifetimes don’t hurt either, so it’s still reasonable to make them invariant, I guess.

Thank you all!

Here's the updated version!