Pre-RFC: Allow partial moves before `forget`


#1
  • Start Date: 2015-02-18
  • RFC PR: (leave this empty)
  • Rust Issue: (leave this empty)

Summary

Allow partial moves out of values whose type implements Drop, provided the value is passed to forget before its drop glue would run.

Motivation

There is a class of APIs that can transform the shell surrounding a piece of data, while leaving the data itself untouched. If the shell implements Drop, then it is impossible to retain the contained data by move while disposing of the container. For example, a generalized String type that supports small string optimizations may be implemented as an enum:

enum GeneralizedString {
    SmallString([char; 24], usize),
    BigString(Vec<char>),
}

impl GeneralizedString {
  pub fn into_vec(self) -> Vec<char> {
    match self {
      SmallString(ref chars, size) => unsafe {
        Vec::from_raw_buf(chars as *const char, size)
      },
      BigString(vec) => vec,
    }
  }
}

Of course, if GeneralizedString implemented Drop, then this implementation would be impossible, since current Rust forbids partial moves out of types that implement Drop. This RFC addresses the issue by changing the restriction from “disallowing partial moves on types that implement Drop” to “disallowing partial moves as the value is moved into the drop glue”. Since forget prevents a value from being consumed by drop glue, a small transformation of the above code would be enough to make compilation succeed, even if Drop were implemented for GeneralizedString:

impl GeneralizedString {
  pub fn into_vec(self) -> Vec<char> {
    let rval = match self {
      SmallString(ref chars, size) => unsafe {
        Vec::from_raw_buf(chars as *const char, size)
      },
      BigString(vec) => vec,
    };
    std::mem::forget(self);
    rval
  }
}
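
For comparison, the effect this proposal asks for can be approximated on current stable Rust with std::mem::ManuallyDrop and std::ptr::read. Neither is part of the proposal above, and since Vec::from_raw_buf no longer exists, the small-string arm copies through a slice instead. A minimal sketch, assuming a Drop impl that only counts how often it runs (the DROPS counter is hypothetical, added for illustration):

```rust
use std::mem::ManuallyDrop;
use std::ptr;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical counter, only here to observe whether the destructor runs.
static DROPS: AtomicUsize = AtomicUsize::new(0);

pub enum GeneralizedString {
    SmallString([char; 24], usize),
    BigString(Vec<char>),
}

impl Drop for GeneralizedString {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

impl GeneralizedString {
    pub fn into_vec(self) -> Vec<char> {
        // Wrapping `self` suppresses its drop glue, which plays the role
        // of the `forget` call in the proposal.
        let this = ManuallyDrop::new(self);
        match &*this {
            GeneralizedString::SmallString(chars, size) => chars[..*size].to_vec(),
            // SAFETY: `this` is never dropped or read again, so the Vec is
            // moved out exactly once and has a single owner afterwards.
            GeneralizedString::BigString(vec) => unsafe { ptr::read(vec) },
        }
    }
}
```

Neither arm runs the Drop impl, so DROPS stays at zero. The cost is the unsafe read in the BigString arm, which the proposal would turn into an ordinary move.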

Detailed design

This design has the following components:

  1. Remove the restriction against moves from types that implement Drop.

  2. Enforce the restriction against partial moves at the point the drop glue is inserted.

  3. Allow std::mem::forget() to be called against partially-moved values.

In practice, the first two points should mean that this error:

error: cannot move out of type `GeneralizedString`, which defines the `Drop` trait

would be replaced with this error:

error: use of partially moved value `self` while calling destructor for `GeneralizedString`.

The last point then lets the user prevent the new error from occurring, by passing the partially-moved value to forget.

Drawbacks

As pointed out by @eddyb and @pnkfelix, this proposal makes forget a special case in the language: it would be the only function that can ever be defined that can receive a partially-moved value. As such, it would be impossible to write routines that forward their implementation to forget, while accepting partially-moved inputs. This could be addressed by reifying the concept of a partially-moved value (perhaps a PartiallyMoved<T>, which can be implicitly coerced from T) and making this the argument type of forget. Perhaps this idea should be folded into the detailed design properly.

I’m having a failure of imagination, and can’t think of other reasons this would be undesirable. It would take some implementation effort, and because it is backwards-compatible, it would not be a priority for a Rust 1.0 release. Are there cases I’m not seeing where, when forget is not called, the new restriction would behave differently from the current one (against partial moves from types that implement Drop)?

Alternatives

@eddyb suggested an Interior<T>, to capture the idea of an “interior” type that provides access to the same data as T, but with insertion of drop glue prevented. I personally find Interior<T> hard to parse: any type written Type<T> reads like a decoration around T, but what actually happens here is that the drop-shell surrounding T is removed, so an implicit decorator is, in essence, being taken away. I also suspect that the approach described here will be easier to implement, and it is certainly less of a “new concept” to understand than Interior<T>. This design also does not preclude the Interior<T> design, though it does, I think, address many of Interior<T>'s use-cases.

Another way of achieving the same ends may be to allow destructuring on a type that implements Drop to prevent drop-glue insertion. I have a hard time understanding how this approach would work when applied to enums that implement Drop, if only some of the variants have associated data.

Unresolved questions

There are implementation details that require knowing how the language intends to allow function-pointers for intrinsics.


#2

Maybe I missed it, but one detail that I think this pre-RFC as written glosses over is that a call to std::mem::forget would become “special”, in a way that a call to a wrapper around std::mem::forget would not.

i.e., this variant of the GeneralizedString example given would still not compile:

impl GeneralizedString {
  pub fn wrap<X>(x: X) { std::mem::forget(x); }
  pub fn into_vec(self) -> Vec<char> {
    let rval = match self {
      SmallString(ref chars, size) => unsafe {
        Vec::from_raw_buf(chars as *const char, size)
      },
      BigString(vec) => vec,
    };
    GeneralizedString::wrap(self);
    rval
  }
}

I don’t think we have other functions in the language that behave that way.

(Maybe this construct should be a macro instead, to make such distinction in behavior clear.)


#3

Thanks, @pnkfelix (and @eddyb) - I’ve updated the text to reflect that concern.


#4

This seems to be related to the desire to have a version of Drop that takes ownership of self. I think I’d prefer to let this sit and see if these two cases can be addressed in a uniform way. I’d also prefer not to tie this to the function forget. It seems like something that merits a more obvious syntax: basically something to decompose a struct into its component pieces without triggering drop on the whole. I’d almost suggest the move keyword, but that might make closures confusing since it would be a distinct use of move.

Overall, I agree this is a case that ought to be addressed at some point, but I don’t feel urgency to implement this extension just now (nor by-value self in drop). I believe both can often be worked around with swap, and I think we need to solidify the type system we have somewhat.
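
The swap work-around mentioned here can be sketched as follows on current Rust. It uses std::mem::take (a swap with the type's default value), which postdates this discussion; the DROPS counter is hypothetical and only shows that the destructor still runs on the emptied shell:

```rust
use std::mem;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical counter, only here to observe the destructor running.
static DROPS: AtomicUsize = AtomicUsize::new(0);

pub enum GeneralizedString {
    SmallString([char; 24], usize),
    BigString(Vec<char>),
}

impl Drop for GeneralizedString {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

impl GeneralizedString {
    pub fn into_vec(mut self) -> Vec<char> {
        match &mut self {
            GeneralizedString::SmallString(chars, size) => chars[..*size].to_vec(),
            // Swap an empty Vec into the shell and take the real one out.
            GeneralizedString::BigString(vec) => mem::take(vec),
        }
        // `self` is dropped here as usual, holding only the empty Vec.
    }
}
```

Unlike the forget-based design, the destructor is not suppressed: it runs once per call, on a shell that no longer owns the data.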


#5

@nikomatsakis This isn’t just about Drop taking self by value; it’s also about improving the ergonomics of some consuming transformation functions. I agree that using Option and swap is a valid work-around. I am not yet comfortable using destructuring as a strategy for avoiding drop-glue for enums, though it probably works.
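
The Option flavour of the work-around might look like this for a struct-shaped shell. Shell, into_inner, and the LEFTOVER counter are hypothetical names used only for this sketch:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical counter: how many shells were dropped with data still inside.
static LEFTOVER: AtomicUsize = AtomicUsize::new(0);

pub struct Shell {
    // `None` means the data has already been moved out.
    data: Option<Vec<char>>,
}

impl Shell {
    pub fn new(data: Vec<char>) -> Shell {
        Shell { data: Some(data) }
    }

    pub fn into_inner(mut self) -> Vec<char> {
        // `take` leaves `None` behind, so the destructor sees an empty shell.
        self.data.take().unwrap_or_default()
    }
}

impl Drop for Shell {
    fn drop(&mut self) {
        if self.data.is_some() {
            LEFTOVER.fetch_add(1, Ordering::SeqCst);
        }
    }
}
```

The price is a runtime discriminant and a branch in every method that touches data, which is the ergonomic cost being weighed here.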

For what it’s worth, I’m also not in a particular hurry. My feeling is that any RFCs that are accepted at this point will probably only be those that address backwards- or forwards-compatibility in the language. On the other hand, I don’t like to design under pressure, so the lack of urgency on this sort of question actually makes me more comfortable discussing it now. I understand those heavily involved in the push to 1.0 won’t feel the same way, but opening the topics at least gives an asynchronous place to describe design considerations “at leisure”. Better to have ideas explored before we need them, right?

I’ll try to think more about this. Thanks


#6

I’ve come up with a scheme that allows forwarding, and stops treating forget as special (in fact, it would almost allow forget to stop being an intrinsic): an unsafe cast from T to Opaque<T>, that can be applied to any value (including partially moved values), and which inherits certain properties of the type. This won’t compile in today’s Rust, but consider this:

// opaque structure, guaranteed to live in exactly the same memory as the
// T type parameter.
struct Opaque<T> {
  // consider as though it's implemented like so:
  // force alignment to agree with type-parameter T
  aligner: [Aligner::<std::mem::min_align_of::<T>()>; 0],
  // takes up same memory as type parameter T
  data: [u8; std::mem::size_of::<T>()],
}
impl<T: Copy> Copy for Opaque<T> {}

There is no access of any kind given to the inside of an Opaque<T> (at least not without going through a transmute, but that would require further thought on details). I think it might even be possible to impl Drop for Opaque<MyType>, once we get rid of the in-value drop-flag, though the utility of doing something like that is non-obvious to me (and allowing Drop on Opaque would be the behavior that requires forget to remain an intrinsic, at least if we kept current forget semantics).
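
On current Rust, a type with these layout properties can be approximated with std::mem::MaybeUninit<T>, which is documented to have the same size and alignment as T and never runs T's destructor. A rough sketch follows; the new constructor is a hypothetical stand-in for the proposed cast, and it cannot accept partially moved values, which is the part that would still need language support:

```rust
use std::mem::MaybeUninit;

// Same size and alignment as T; the contents are not accessible and
// T's destructor is never run for them.
#[repr(transparent)]
pub struct Opaque<T> {
    data: MaybeUninit<T>,
}

impl<T> Opaque<T> {
    // Hypothetical stand-in for the proposed `value as Opaque<T>` cast.
    pub fn new(value: T) -> Opaque<T> {
        Opaque { data: MaybeUninit::new(value) }
    }
}

// Opaque<T> inherits Copy from T, as in the declaration above.
impl<T: Copy> Clone for Opaque<T> {
    fn clone(&self) -> Self {
        *self
    }
}

impl<T: Copy> Copy for Opaque<T> {}
```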

Speaking of a non-intrinsic forget:

unsafe fn forget<T>(x: T) {
  // if Drop were implemented against Opaque<T>, this would
  // result in drop-glue for Opaque<T>, rather than for T.
  let _ = x as Opaque<T>;
}
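
A non-intrinsic forget along these lines can be written on current Rust with std::mem::ManuallyDrop in place of the proposed Opaque<T> cast. This is a sketch under that substitution, not the mechanism proposed here; my_forget, Noisy, and DROPS are hypothetical names:

```rust
use std::mem::ManuallyDrop;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical counter, only here to observe whether the destructor runs.
static DROPS: AtomicUsize = AtomicUsize::new(0);

pub struct Noisy;

impl Drop for Noisy {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

pub fn my_forget<T>(x: T) {
    // Dropping a ManuallyDrop<T> does not run T's destructor.
    let _ = ManuallyDrop::new(x);
}
```

Because my_forget is an ordinary function, wrappers can forward to it freely. Like any ordinary function, though, it only accepts fully initialized values, so it does not by itself cover the partially-moved case.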

In this case, the transforming function serving as a motivator for this RFC would look like this:

impl GeneralizedString {
  pub fn into_vec(self) -> Vec<char> {
    let rval = match self {
      SmallString(ref chars, size) => unsafe {
        Vec::from_raw_buf(chars as *const char, size)
      },
      BigString(vec) => vec,
    };
    unsafe { std::mem::drop(self as Opaque<_>) };
    rval
  }
}

This resolves the issue of making forget's behavior special. (In fact, it makes forget less special, since it would no longer need to be an intrinsic… though, of course, at the expense of adding a new lang-item.)


#7

I’d like to point out here that I’m abandoning this approach, for now. I’ve been convinced that treating the argument to forget specially is a mistake, and while an Opaque<T> would resolve that issue, there doesn’t seem to be another immediate-term use for the type, so it seems like too big a hammer to wield for just this one use.

On the other side, I’ve come to appreciate @eddyb’s Interior<T> design in a new way. Someone can correct me if I get this wrong, but I’d now characterize Interior as an “open newtype” (in other words, a newtype that provides direct access to the base type’s structure), instead of as a way of “peeking into” an existing value by move. I’ve been convinced that this is a better basic approach than the others I’ve investigated.