Pre-RFC: Move references

Rather than try to assign punctuation to &move variants, I'd suggest using contextual keywords (&move in T, &move out T, &move inout T) with a note that this is placeholder syntax that may be changed as the feature is experimented with. Assigning punctuation to it is going to have an uphill battle as it is far from immediately clear what each one means.

(Though maybe &move in is also confusing, since that's what you'd use for an "out parameter," as you're moving into the reference...)

I'd also recommend dropping field-granularity in/out specification and DerefMove to future possibilities. &move on its own actually has merit to stand on without these, and that makes it a smaller more incremental stepping stone.

With those pieces torn out and deferred, I think this might actually have standing as an eRFC. Roughly, the Experimental RFC is a proposition of a design that we (as a community) would like to see experimented on in-tree, but without the expectation that it could be stabilized without another RFC round taking the experience learned into account.

But that also said, you haven't solved the panic problem for &move in. &move out is gracefully handled by drop flags, as it's just treated like a stack binding in someone else's scope. &move in is the problem case, though:

fn uhoh() {
    let data: Data;
    catch_unwind(|| initialize_data(&move in data));
    dbg!(data); // 💥 data is believed to be initialized
}

fn initialize_data(_: &move in Data) {
    todo!("I'll get to this later")
}

As far as I can tell, there are only a few options that can make this sound:

  • The solution taken by the take_mut crate: unwinds with &move in on the stack are "double panic" aborts. This works, but is undesirable, and not a great solution to require in the design of the language.
  • Poisoning. Basically, &move in is a reference to Data and some shadow state, likely equivalent to the drop flags for Data. If the unwind is caught, a new one is started up "whenever" it's noticed that Data wasn't actually fully initialized.
    • Poisoning, but less granular. The called function gets a slot to say whether Data was properly initialized, or that it wasn't, no more granularity. The called function is responsible for dropping any partially initialized parts of the Data.
    • Poisoning, but more magic. If a function can be proven to always move into the &move in reference, the compiler tags it as such, and omits the extra shadow state. This is probably just an optimization (though an ABI impacting one), but also one that would rarely apply, due to there being no simple way to prove the lack of panics (most code performs operations, like integer arithmetic and indexing, that won't panic in practice but statically could).
    • Poisoning, but catch_unwind aware. Monomorphize every use of &move in for whether it's transitively in a catch_unwind or not. But then again I think the runtime calls main() in a catch_unwind, and each thread is effectively a catch_unwind, so that's probably moot.
    • Also I just want to note that "whenever" is not an acceptable specification for when the unwind resumes, and that catch_unwind can be hidden behind an arbitrary amount of abstraction, and potentially opaque ones.
  • Make &move in require unsafe somewhere early, before a panic can expose uninitialized data. At that point, though, what's the real advantage over MaybeUninit and writing unsafe pointer code?
  • Make &move in not UnwindSafe, so you can't catch_unwind across &move... except UnwindSafe is just a lint and can't be used to enforce safety.
    • Add a new idea of UnwindSafe that can be relied on for soundness. Probably untenable because API design already relies on the fact that everything is required to be sound even in the face of unwinding.
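
The "less granular poisoning" option in the list above can be approximated in today's Rust by keeping an explicit flag next to the storage. This is only a sketch under assumptions: `Slot`, `fill`, `into_inner` and `try_init` are hypothetical names invented for illustration, and the flag lives in a caller-owned wrapper rather than in the reference itself.

```rust
use std::mem::MaybeUninit;
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical out-slot: storage plus a single "fully initialized" flag.
struct Slot<T> {
    value: MaybeUninit<T>,
    init: bool,
}

impl<T> Slot<T> {
    fn new() -> Self {
        Slot { value: MaybeUninit::uninit(), init: false }
    }

    /// The callee initializes the slot through this method.
    fn fill(&mut self, v: T) {
        assert!(!self.init, "slot filled twice");
        self.value.write(v);
        self.init = true;
    }

    /// The caller only gets a value if the flag was set.
    fn into_inner(mut self) -> Option<T> {
        if self.init {
            self.init = false; // the Drop impl below must not drop it again
            // SAFETY: `init` is only set by `fill`, after the write.
            Some(unsafe { self.value.assume_init_read() })
        } else {
            None
        }
    }
}

impl<T> Drop for Slot<T> {
    fn drop(&mut self) {
        if self.init {
            // SAFETY: the flag says the value was written and not yet taken.
            unsafe { self.value.assume_init_drop() }
        }
    }
}

/// Runs `callee`, catching a panic, and reports what ended up in the slot.
fn try_init(callee: impl FnOnce(&mut Slot<String>)) -> Option<String> {
    let mut slot = Slot::new();
    // The flag, not the UnwindSafe lint, is what guards the later read.
    let _ = panic::catch_unwind(AssertUnwindSafe(|| callee(&mut slot)));
    slot.into_inner()
}

fn main() {
    assert_eq!(try_init(|s| s.fill("ok".to_string())), Some("ok".to_string()));
    assert_eq!(try_init(|_| panic!("I'll get to this later")), None);
}
```

The cost is visible here: every slot carries runtime state, and the caller has to handle the "not initialized" case instead of getting a static guarantee.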

So looking at that... the first eRFC really should only concern itself with &move out, as that's fairly simple comparatively. It should leave space for &move in, but not try to solve those problems yet, as solving placement new is a hard problem.
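
To illustrate why &move out is the simpler case, here is a rough emulation in today's Rust using `&mut Option<T>`, where the `Option` discriminant stands in for the drop flag. The names `Data`, `callee` and `drops_observed` are hypothetical, and this is an analogy rather than the proposed semantics.

```rust
use std::cell::Cell;
use std::panic::{self, AssertUnwindSafe};

/// Counts how many times it has been dropped.
struct Data<'a> {
    drops: &'a Cell<u32>,
}

impl Drop for Data<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Stand-in for `fn callee(d: &move out Data)`: it takes ownership through
/// the slot, unless it panics before getting that far.
fn callee(slot: &mut Option<Data<'_>>, panic_first: bool) {
    if panic_first {
        panic!("callee failed before moving out");
    }
    let owned = slot.take().expect("value already moved out");
    drop(owned); // the callee now owns the value and drops it
}

/// Returns how many times the value was dropped in total.
fn drops_observed(panic_first: bool) -> u32 {
    let drops = Cell::new(0);
    {
        let mut slot = Some(Data { drops: &drops });
        let _ = panic::catch_unwind(AssertUnwindSafe(|| callee(&mut slot, panic_first)));
        // `slot` leaves scope here: it drops the value only if it is still `Some`.
    }
    drops.get()
}

fn main() {
    assert_eq!(drops_observed(false), 1); // dropped by the callee
    assert_eq!(drops_observed(true), 1); // dropped by the caller after the unwind
}
```

Either way the value is dropped exactly once, and nobody can observe uninitialized data, which is the property &move in does not get for free.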

(And in this incremental world, I think we say &move in => &move in, &move out => &move, and &move inout => &mut.)

The problem with catch_unwind could be solved by either forbidding to capture &move in references in closures (but how can you do this when you factor generics in?) or creating a stronger version of UnwindSafe (which would however be a breaking change, unless it becomes another implied trait like Sized).

But how would you handle leaking the &move in? In that case there's almost nothing you can do to detect it other than tracking initialization flags at runtime (but would it really be possible? Do you always have somewhere to store them?).

I guess that would be better: it both addresses the concern with catch_unwind and retains future possibilities. And AFAIK UnwindSafe is required by catch_unwind, so no leakage of &move in/inout should happen.

I want to store them as part of the &move reference itself, or, when the call is inlined, as drop flags.

The use of &move in(/inout) is intended to be tracked statically, not dynamically. The reason is the question "how do we handle a detected failure?".
I guess that the existing branch analysis should be enough to track the usage.

Since you seem to be unfamiliar with the concept that “UnwindSafe is just a lint and can't be used to enforce safety”, you might be interested in reading the documentation of AssertUnwindSafe.
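
A small sketch of what that means in practice: `AssertUnwindSafe` is a safe wrapper, so entirely safe code can carry a `&mut` across `catch_unwind` and look at the half-updated state afterwards. The function name `observe_after_panic` is made up for this example.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Returns the state a caller can observe after a caught panic.
fn observe_after_panic() -> Vec<i32> {
    let mut v = vec![1, 2, 3];
    // A closure capturing `&mut Vec<i32>` is not UnwindSafe, but the wrapper
    // silences that without any `unsafe`.
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        v.push(4); // first half of a two-step update
        panic!("interrupted"); // the second half never runs
    }));
    assert!(result.is_err());
    v
}

fn main() {
    // The partially updated vector is visible after the panic was caught.
    assert_eq!(observe_after_panic(), vec![1, 2, 3, 4]);
}
```

For a `Vec` this is merely a logic hazard, which is why a lint is enough today. For an &move in target the same pattern would expose uninitialized memory.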

As others said, UnwindSafe can't be relied upon for safety. That's why we would need to add a new trait which can be relied upon by unsafe code. Adding it to catch_unwind's bound may however break any generic code that assumes an AssertUnwindSafe<T> can be passed to catch_unwind. The only possible solution that I can see is making that trait implied like Sized, and making &move in T not implement it.

That won't work if the &move in is leaked. You need something that lets the caller track the initialization.

If we could statically prevent leaks then we wouldn't have Box::leak, mem::forget and the whole leakpocalypse.

Well, this is true only if you allow &move in T to be a type like the others. How much do we want to treat &move in T as a type?

  • Can you store &move in T inside other types?
    • What are the semantics of nested move references? Like &move in &move in T, &move out &move in T, &mut &move in T, etc.?
  • Can you instantiate a generic type parameter with &move in T (or a type that contains an &move in T)?
    • When can we assume that a function that takes a value parameter of generic type T and is instantiated with an &move in U will init it?
      • If this is implicit, what about accidental semver violation?
      • If this is explicit, how do we mark it? Another syntax? Auto trait (can we use it for catch_unwind too)?

Well, &move .. refs. are supposed to be types, so they may be used as generic parameters.

They may be stored in compound types - in this case the rules are the same as for &mut references: the reference captures a borrow and uses it; then, when the compound structure is dropped, the "move borrowed" binding is assumed to be initialized/deinitialized/unchanged.

As for the nested case: DerefMove and its coercion(?) work only for &move T!, so in other cases each "layer" should refer to a binding.
This way, things like &move (&move binding*) are forbidden (which binding would the top layer refer to?).

I don't think that the contract of any &move .. kinds is permissive enough to accidentally break SemVer. Though this requires additional investigation.

Then how can you ensure that the function has initialized the ref?

Then that's no good because they can be leaked by Rc/RefCell cycles, in which case they're never dropped but the code returns normally.
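
For concreteness, here is how both kinds of leak look in safe Rust today. `Guard`, `Node` and the two functions are hypothetical names; the shared counter only exists so that the absence of a drop can be checked.

```rust
use std::cell::{Cell, RefCell};
use std::mem;
use std::rc::Rc;

/// Bumps a shared counter when dropped, so we can see whether drop ran.
struct Guard(Rc<Cell<u32>>);

impl Drop for Guard {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

struct Node {
    _guard: Guard,
    next: RefCell<Option<Rc<Node>>>,
}

/// `mem::forget` is safe and simply skips the destructor.
fn drops_after_forget() -> u32 {
    let drops = Rc::new(Cell::new(0));
    mem::forget(Guard(Rc::clone(&drops)));
    drops.get()
}

/// A reference cycle: the function returns normally, yet nothing is dropped.
fn drops_after_cycle() -> u32 {
    let drops = Rc::new(Cell::new(0));
    {
        let a = Rc::new(Node {
            _guard: Guard(Rc::clone(&drops)),
            next: RefCell::new(None),
        });
        let b = Rc::new(Node {
            _guard: Guard(Rc::clone(&drops)),
            next: RefCell::new(Some(Rc::clone(&a))),
        });
        *a.next.borrow_mut() = Some(b); // a -> b -> a
    } // the local handle goes away, but each node is still kept alive by the other
    drops.get()
}

fn main() {
    assert_eq!(drops_after_forget(), 0);
    assert_eq!(drops_after_cycle(), 0);
}
```

Any scheme that finalizes a move reference "when the containing structure is dropped" has to account for both functions returning without a single destructor call.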

What if a function that takes a T and creates an &move T is instantiated with T=&move U? It ends up creating an &move &move U.

That was assuming that whether a generic function can take an &move in or not is implicit, in which case changing the body of the function could make it no longer accept &move in references, resulting in a breaking change.

I guess I see what you mean. I was supposing that &move .. cannot reference rvalues, as they have no guaranteed place in memory, as well as immutable bindings - because move references imply mutations (and immutable bindings can be optimized out).
This way, in generic code you can have

    let mut a = &move b!; //move ref is in place of T
    let move_ref = &move a*;

but not:

    let move_ref = &move (&move b!)*; //you cannot create move reference to an rvalue.

You cannot reference an rvalue with a move reference; this rules out the generics concern at its root. I should have mentioned this in the RFC...

I think we need something like #[must_use], but as a hard error at the language level. I briefly mentioned this in the RFC - it seems I should have described it better.
The idea is to use branch analysis and exhaustiveness checking to check whether a function (that consumes &move T* or &move T!) always initializes them or not - in the latter case an error should be thrown.

However, the concern about cycles, leaks and the presence of generic data structures makes me think that we need a new implied auto trait...

I don't see a way we could ensure the usage of a few kinds of &move .. references in the case where they are stored in a structure. The case of &move T! is clear, as it can't produce an incorrect state, but the cases of &move T* and &move T are impossible to reason about (with known reservations about SMT solving - but this still breaks SemVer).

We need an auto trait that would also be implied for generics. With it we could mark &move T!, but not the other kinds - this way we address both the concern about catch_unwind and the concern about placing move references in data structures.

If &move in T worked just like an uninitialized local let x: T;, it would be an error to capture it in a closure, move it into a struct, or pass it to a function expecting some U by value. The requirement to initialize could mean that any (non-panic) returns from the function count as a use of the initialized value, so the &move in must have either been directly initialized or passed to some other function. After doing either, it could be used normally as a &mut.
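
The rules for an uninitialized local that this comparison leans on already exist and can be seen in stable Rust. A minimal sketch (the function names are made up):

```rust
/// Deferred initialization: every non-panicking path must assign `label`.
fn describe(n: i32) -> String {
    let label: String; // declared, not yet initialized
    if n < 0 {
        label = String::from("negative");
    } else {
        label = String::from("non-negative");
    }
    // Both branches initialized `label`, so this use is accepted.
    // Reading it, or capturing it in a closure, before the `if` would be
    // rejected at compile time.
    label
}

/// A diverging branch does not need to initialize the binding.
fn parse_or_panic(s: &str) -> u32 {
    let value: u32;
    match s.parse::<u32>() {
        Ok(v) => value = v,
        Err(_) => panic!("not a number: {s}"),
    }
    value
}

fn main() {
    assert_eq!(describe(-3), "negative");
    assert_eq!(describe(7), "non-negative");
    assert_eq!(parse_or_panic("42"), 42);
}
```

The second function is the interesting one for this thread: the panicking path is allowed to leave `value` uninitialized precisely because nothing after it can observe the binding.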

For unsafe code you'd need to be able to borrow it as a &mut MaybeUninit<T> or take a raw pointer, and there would have to be something like a function fn assume_init<T: ?Sized>(x: &move in T), which would be found somewhere behind many non-trivial uses.

When unwinding from a function taking some &move in, it should be considered uninitialized. Any other &move in references that are in scope are either initialized or not (known statically or via drop flags), and should be dropped if they are.

impl<T> Box<T> {
    fn new_with(f: impl Fn(&move in T)) -> Box<T> {
        let b = Box::new_uninit();
        let move_in_ref = b.placeholder_into_move_in(); // &mut MaybeUninit<T> -> &move in

        f(move_in_ref);

        // Safety:
        // *move_in_ref is initialized here, meaning that so is the box it came from
        unsafe { b.assume_init() }
    }
}

fn main() {
    // OK
    let x = Box::new_with(|x_ref| { *x_ref = 42; });
    // ERROR: x_ref must be initialized before returning
    let x = Box::new_with(|x_ref| { });
}
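
For comparison, roughly the same shape can be written in today's Rust with `MaybeUninit`, at the price of an `unsafe` contract instead of a compiler check. This is a sketch, not the proposed API: `box_new_with` is a hypothetical free function, and the requirement that the closure initializes the slot is a documented promise that nothing enforces.

```rust
use std::mem::MaybeUninit;

/// Allocates first, then lets `f` initialize the value in place.
///
/// # Safety
/// `f` must fully initialize the slot before returning normally.
unsafe fn box_new_with<T, F>(f: F) -> Box<T>
where
    F: FnOnce(&mut MaybeUninit<T>),
{
    let mut b: Box<MaybeUninit<T>> = Box::new(MaybeUninit::uninit());
    // If `f` panics, `b` is freed during unwinding without dropping a `T`,
    // i.e. the slot is treated as uninitialized.
    f(&mut *b);
    // SAFETY: the caller promised that `f` initialized the slot, and
    // `MaybeUninit<T>` has the same layout as `T`.
    unsafe { Box::from_raw(Box::into_raw(b).cast::<T>()) }
}

fn main() {
    // SAFETY: the closure writes the slot before returning.
    let x: Box<u32> = unsafe {
        box_new_with(|slot: &mut MaybeUninit<u32>| {
            slot.write(42);
        })
    };
    assert_eq!(*x, 42);
}
```

The closure that forgets to write would compile here and be undefined behaviour, which is exactly the case the hypothetical `&move in` version turns into an error.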

At this point it looks more like a different style of parameter than a proper type. I don't know if the limitations leave it sufficient power. For example, a function initializing only some prefix of a slice would want to (and have to!) return the remaining uninitialized part to the caller, but to do that in a usable way it would have to be part of some tuple or Result, which goes outside of what uninitialized locals can currently do.

Maybe &move in references could be moved into structs or enums if those compound values were then infected by the same restrictions. However, not being able to pass to a generic function is not so simple once the reference is hidden in a Result or such. Some form of proper linear types along the lines of Pre-RFC: Leave auto-trait for reliable destruction would seem to be needed.

A more complex example that would require returning tuples of &move in:

// initialize a slice by recursively initializing both halves
fn recursive(v: &move in [String], mut s: String) {
    // this would need some new API since [T]::len takes &[T]
    let len = v.len();
    match v {
        [] => (),
        [x] => {
            if rare_error() { 
                panic!("oops")
            }
            *x = s;
        }
        v => {
            // this can't work with the above limitations,
            // since how would the function create the tuple to return
            let (head,tail) = v.split_at_movein(len/2); 

            let mut s2 = s.clone();
            s.push_str("0");
            // panic from this call would drop only s2
            recursive(head, s);
            s2.push_str("1");
            // panic from this call would drop only head
            recursive(tail, s2);  
        }
    }
}
let x = Box::<[String;4]>::new_with(|v| {
    recursive(v as &move in [String], String::new())
});
assert_eq!(*x, ["00","01","10","11"]);
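
As a point of reference, the recursive example can be written on stable Rust with `&mut [MaybeUninit<String>]`, where `split_at_mut` can return a tuple because the halves are ordinary references. This is a sketch under that substitution; `build` is a hypothetical helper, and the final `assume_init` relies on an unchecked invariant rather than on the type system.

```rust
use std::mem::MaybeUninit;

/// Initialize every slot of `v` by recursively initializing both halves.
fn recursive(v: &mut [MaybeUninit<String>], mut s: String) {
    match v.len() {
        0 => {}
        1 => {
            v[0].write(s);
        }
        len => {
            // Plain `&mut` halves can be returned as a tuple today.
            let (head, tail) = v.split_at_mut(len / 2);
            let mut s2 = s.clone();
            s.push('0');
            recursive(head, s);
            s2.push('1');
            recursive(tail, s2);
        }
    }
}

fn build() -> [String; 4] {
    let mut slots: [MaybeUninit<String>; 4] =
        std::array::from_fn(|_| MaybeUninit::uninit());
    recursive(&mut slots, String::new());
    // SAFETY: `recursive` writes every slot before returning normally.
    // Had it panicked, the strings written so far would leak, not double-drop.
    slots.map(|slot| unsafe { slot.assume_init() })
}

fn main() {
    assert_eq!(build(), ["00", "01", "10", "11"]);
}
```

What is lost compared to the `&move in` version is the panic behaviour described in the comments of the original: nothing here knows which half was already initialized, so a panic leaks instead of dropping `head`.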

I agree that if &move in references exist, that they would follow the same rules as let x; uninitialized bindings w.r.t. non-panicking control flow.

Treating &move in as a true linear type is basically a nonstarter, as now you're relying on another new concept with heavy implementation and understanding burden, and linear types wouldn't even work well tagged on to Rust's existing type system; see Gankra's Linear Rust.

Again, I will encourage y'all to focus on &move out for the time being. &move out has a path forward; &move in has a thousand different issues to solve. (And "placement by return" (i.e. limited GCE, with an explicit option) offers a solution to placement with many fewer major roadblocks to overcome, though it still has a number of them.)

Barring further meaningful development, this'll be my last post on this topic, I've said my say.

After a deep dive into linear-rust, the leak auto trait and hard thinking about the reasoning - I agree that it's better for these references to not be a normal type - this retains all the possibilities and makes the feature cleaner.

Sorry for the nonsense.

I'll update the RFC soon.

Here is an updated RFC: Revision 3.

Through my experience in language design I’ve found that the caveats of a feature speak volumes about its practicality, usability and consistency. Logically, the more caveats a feature has, the more "gotchas" and little edge cases it contains, those little rough edges that cut users and introduce inconsistencies into the language. More edge cases means more cognitive load on the user, which is a bad thing regardless of context. Ideally, a feature should just work, no matter what. From a user’s perspective referencing a value works, why would it matter if the value being referenced is itself a reference? The more exceptions you have to add in order to salvage a feature, the less desirable and the less useful said feature becomes. I think that as-is this eRFC has too many workarounds to fundamental design flaws to be generally applicable to Rust.

Only if it doesn't cause problems in edge cases. And doesn't make it possible to run into UB in safe Rust.

Because this feature is about bindings, that may contain values, not about values.

Fixing many of the actual flaws requires a Leak implied auto-trait and some support for linear types.

Also, I relied upon the "facts" that moves are always byte-level copies and that a mutable binding should have a distinct memory location.

Fixing the issue with observing corrupted state in a panic handler can be done with a new auto-trait: a stronger version of UnwindSafe.

But all these auto-traits have a big cost. Their addition on its own needs an RFC with very good motivation.

So now I'm trying to balance this RFC to reach the best value/cost ratio.

This is just partially initialized types while trying to hide their partial-initialized-type-ness from the programmer.

Can't think of how this would play nice with Drop, tbh. Altho personally, we wouldn't mind using &move as the syntax for PIT mutations, instead of overloading &mut.

PITs also have an actual, statically-determined behaviour on panic: fall-back to the intersection of initialization between pre- and post-call states.

(PS: we do like the idea of "this moves in and out of foo" - as a way to identify that a panic will cause it to deinitialize. it would be useful to also have "this requires foo to be initialized, but will not deinitialize it".)
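
One possible reading of the "intersection" rule, written out as a tiny model. Everything here is an assumption made for illustration: `InitSet` is a hypothetical bitset with one bit per field, and real PITs would track this in the type system rather than at runtime.

```rust
/// One bit per field: bit i set means field i is initialized.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct InitSet(u8);

impl InitSet {
    /// State the caller may assume if the call unwinds: only the fields
    /// that are initialized both before and after the call.
    fn on_panic(pre: InitSet, post: InitSet) -> InitSet {
        InitSet(pre.0 & post.0)
    }
}

fn main() {
    let pre = InitSet(0b011); // fields 0 and 1 initialized before the call
    let post = InitSet(0b110); // signature promises fields 1 and 2 afterwards
    // Only field 1 is in both states, so only it survives a panic.
    assert_eq!(InitSet::on_panic(pre, post), InitSet(0b010));
}
```

Because both inputs come from the function signature, the result is known at compile time, which is what makes the behaviour on panic statically determined.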

Fallback is a really good idea to consider.

But how can I then be sure whether something was initialized or not? Btw, I'm not trying to define the behavior statically - I'm trying to define the outcome of any move-reference-involving operation in the face of panics.

But for a good reason: abstraction. Currently, move references can participate in data structures, but when they are hidden/abstracted behind generics - we don't have any trait in the language (on paper we have!) that could help us with PIT operations.

The pessimistic approach, with any move-referenced binding considered uninitialized in the presence of a panic handler, seems to be the least evil. But tracking captures isn't an easy task, nor does it have clear semantics (like why can this closure capture a move ref. but that one can't?).

Here's the thing, by allowing move references in data structures... how would that work, exactly? Move references are designed to provide guarantees at function entry and function exit (and unwind), those aren't things you can store in a data structure...

By disallowing move references in data structures, you automatically prevent closures from capturing move references. This is a win, because then you don't have to deal with catch_unwind. Not dealing with catch_unwind is always a win, because it really simplifies working out what to do with panics. And it's pretty simple really: all functions already have drop/initialization flags for everything. All you have to do is mark everything touched by the callee as deinitialized on panic, and deinitialize the rest on the caller. Further, the callee has its own drop flags for move references (ideally), so you just drop those things on the callee instead.

Abstractions aren't about hiding functional principles, they're about hiding implementation details. The drop flags are hidden and abstracted away, but they're not functional principles - they are and (hopefully) always will be implementation details. (Incidentally this is one thing we don't like about other DerefMove proposals: they propose exposing the existing drop flags machinery - an implementation detail - to the programmer. Those would kill a lot of optimizations that aren't blocked by true PITs. But anyway.)

Besides, as we already argued before, the observable effects of PITs are already uh, observable. They're not hidden. You can have partial (de)initialization today, you just can't pass those objects around. It's kinda silly, in our opinion, and we'd be happy to see first-class support for things we already have today.
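
A short example of the partial (de)initialization that already exists on stable Rust, to make the point concrete. `Pair`, `consume` and `partial_then_whole` are names made up for this sketch.

```rust
struct Pair {
    a: String,
    b: String,
}

fn consume(p: Pair) -> usize {
    p.a.len() + p.b.len()
}

fn partial_then_whole() -> (String, usize) {
    let mut p = Pair {
        a: String::from("left"),
        b: String::from("right"),
    };
    let taken = p.a; // `p` is now partially deinitialized
    let b_len = p.b.len(); // the untouched field is still usable
    assert_eq!(b_len, 5);
    // Calling `consume(p)` at this point would be rejected: `p` is partially moved.
    p.a = String::from("new"); // re-initialize the moved-out field
    let total = consume(p); // `p` is whole again and can be passed by value
    (taken, total)
}

fn main() {
    assert_eq!(partial_then_whole(), (String::from("left"), 8));
}
```

The compiler tracks the state of `p.a` field by field inside one function body; what cannot be expressed today is handing the partially moved `p` to another function.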

Not really, there are different variants of "move references". Some of them provide guarantees, but others don't. For example move references that are initialized at the start and uninitialized at the end don't really provide any guarantee to the caller, they're just owned references. And storing owning references in structs has practical applications, for example they would allow owning iterators over owning slices, so I wouldn't blindly disallow them.

What we can observe are not partially uninitialized types, but partially uninitialized values. They might look similar, but one concept belongs to the type system and the other to the borrow checker/ownership semantics. Mixing them is gonna increase complexity a lot.

... Well, regardless, returning move references is unsound no matter what you do about it - unless you want to expose the drop flag machinery. Which is not a good thing to want, because that's supposed to be an implementation detail. (This means the proposed DerefMove is bad, also.)

As far as users care that's the difference between first-class support and second-class support. :)

(It also, really, does not increase complexity "a lot". For starters, it's a fairly straightforward extension to the existing drop flags machinery, so at least on that end there should be no issues. Further, it simplifies the "figuring out what to drop" logic, as it can rely on monomorphization instead of exposing the drop flags machinery to the programmer's Drop impl. Ofc there are alternatives like always DerefMove'ing everything out before calling the Drop impl but that just sounds wrong.)