Pre-RFC: Leave auto-trait for reliable destruction

EDIT: Proposal updated - see the updated proposal further down in this thread.

I'm leaving the rest of this post as it is for posterity, so please read the update as well.

TL;DR

Define a new auto-trait Leave, which means that the type can be implicitly destroyed when it goes out of scope. Users can declare must-destroy types through impl !Leave for Foo {}, which are statically guaranteed to be destroyed before their lifetime's upper bound is reached.

This super-charges the RAII pattern, which opens up static, zero-cost programming patterns that rely on deterministic destruction, such as:

  • Scoped tasks and threads, i.e. concurrency with static borrowing
  • State machines that run to completion, notably async functions & expressions, alleviating the need for async drop

Prior art

History

Today, implementors must assume that any type can be leaked -- in other words, that a value can live past its upper lifetime bound. Here's my take on how we got here:

  • Nobody thought really hard about leaking, so mem::forget was unsafe, just in case.
  • Rc, Arc and a few other std APIs had leaks, in safe Rust, leaving two choices:
    • Prevent some leaks with language features and std API changes.
    • Simply allow leaks generally.
  • As a response, leaks were formally allowed, and mem::forget became safe.
  • It was discovered that scoped threads could be made unsound in safe code by leaking the join guard (JoinGuard).
    • (Note: Scoped threads are powerful because they can borrow locals from their logical parent thread.)
  • As a response, scoped threads were removed from std.
  • Scoped threads were re-implemented outside std (using an extra closure) in crossbeam and other crates.
    • The approach is much more limiting - no join handle is available for the scope.
  • Async was introduced to Rust.
  • The async equivalent of scoped threads, scoped tasks, can't use the closure-strategy without blocking.
  • Async code is forced to work around this by relying on Arc.
  • (Mostly unrelated): "Simply drop a future to cancel it" causes silent logical bugs that are hard to reason about.

Why now?

Leaks were relatively harmless as long as only data was leaked. It just consumes memory - it doesn't kill your dog. However, this view broke down once behavior could be leaked - represented by task and join handles. As a result, we can't statically borrow locals from logical parent tasks or threads. We work around this problem with Arc, which has three problems: it isn't zero-cost, it is noisy, and it can itself cause leaks and non-deterministic destruction. As multi-threaded executors - like Tokio - have become a dominant force, this anti-pattern keeps growing.

It's not just concurrency - we see verbose, defensive patterns (pre-pooping your pants) in other places too, originally discovered in the context of Vec's drain_range. Again, async has exacerbated this problem: futures can always be arbitrarily dropped - and this is worse than it sounds. Even an async fn foo(&mut self) can be silently interrupted, which can leave inconsistent state behind. While "pre-pooping" works in theory, it is verbose, easy to get wrong, and as a result largely unused.
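To make the failure mode concrete, here is a small self-contained sketch (the Account type and the audit_log stub are made up for illustration): if the future returned by transfer is dropped while it is suspended at the .await, the debit has already happened but the credit never runs.

struct Account { balance: u64 }

impl Account {
    fn debit(&mut self, amount: u64) { self.balance -= amount; }
    fn credit(&mut self, amount: u64) { self.balance += amount; }
}

// Stand-in for some I/O that yields to the executor.
async fn audit_log(_amount: u64) {}

// Dropping this future at the `.await` silently skips the credit, leaving the
// two accounts in an inconsistent state - no panic, no error.
async fn transfer(from: &mut Account, to: &mut Account, amount: u64) {
    from.debit(amount);
    audit_log(amount).await;
    to.credit(amount);
}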

Why an auto-trait?

An annotation wouldn't work, as it wouldn't be transitive. Also, this feature is on the hook for safety, so it must be integrated into the type system - as far as I can tell.

"But it's impossible to prevent leaks!"

It's true that we can't prove the absence of memory leaks; scopes may never exit, reference cycles can form, etc. However, there are many cases where it's possible to prove that a value is destroyed before some other piece of code runs. That other piece of code can then be written without having to defensively assume that the first piece of code (the destructor) may not have run. As such, this proposal is about control flow, not leaks.

Proposal

Define a new auto-trait Leave, which means that the type can be implicitly destroyed by the compiler.

  • All types are Leave by default.
  • Leave is a super-trait of Drop
  • A !Leave type is called a must-destroy type. Opt-in using impl !Leave for Foo {}
  • A type which owns a !Leave type is also !Leave, transitively
  • Must-destroy types must be destroyed before going out of scope, at every exit point (return, break, ?, etc)
  • Destroying a must-destroy type is done through destructuring, generally through fn foo(self)
  • Generic code, e.g. containers, can opt in to support must-destroy types using T: ?Leave (see the sketch after this list).
    • Arc, Rc and mem::forget do not support ?Leave
    • Vectors, maps, combinators can support ?Leave, but not all methods (e.g. clear)
  • References cannot be destroyed, so they are always Leave
  • Panicking with a live must-destroy type causes abort
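As a sketch of the container opt-in bullet above - written with the proposed Leave trait and ?Leave bound, so this is illustrative pseudocode rather than compiling Rust - a minimal container might only expose, for must-destroy element types, the methods that hand values back to the caller:

// Proposed syntax, does not compile today.
struct Bucket<T: ?Leave> {
    slot: Option<T>,
}

impl<T: ?Leave> Bucket<T> {
    // OK for must-destroy T: the value is handed back to the caller,
    // who remains responsible for destroying it.
    fn take(&mut self) -> Option<T> {
        self.slot.take()
    }
}

impl<T: Leave> Bucket<T> {
    // Only available when T may be implicitly dropped - the analogue of
    // Vec::clear in the bullet above.
    fn clear(&mut self) {
        self.slot = None;
    }
}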

Implementation cost

Whenever the compiler inserts drop automatically today, it would throw an error if the type is !Leave. When unwinding during a panic, a live !Leave type causes an immediate abort.

Since Leave is implemented by default, this is a backwards compatible change. All existing drop-semantics etc stay the same.

That said, an auto-trait is a huge deal. Libraries, including std and futures, would need to add ?Leave support to several collections, combinators, etc. This is a lot of work, which I believe to be worthwhile. Support can be rolled out incrementally, gated behind feature flags.

Mental model

A must-destroy type can be thought of as a shift of destruction responsibility: our existing (Leave) types can be explicitly or implicitly destroyed and - worst case - not destroyed at all. A must-destroy type, on the other hand, is always explicitly destroyed. Users and library authors should implement !Leave when deterministic destruction is required for safety and/or fundamental invariants.

Examples

Here's a basic must-destroy type:

// Resource could be a handle of sorts, e.g. to a kernel resource
struct Foo(Resource);

// Declare must-destroy. (An ergonomic derive-macro could be added?)
impl !Leave for Foo {}

impl Foo {
  // A simple destructor.
  fn destroy(self) {
    let Self(res) = self;
    // Ensure some invariant on the underlying resource
  }

  // Custom destructor, with bells and whistles!
  fn fancy_destroy(self, b: bool) -> u64 { .. }

  // Async destructor
  async fn async_destroy(self) { .. }
}

Basic control flow:

{
  let foo = Foo::new();
  // Error: foo must be manually destroyed
  let mut foo2 = Foo::new();
  foo2 = Foo::new(); // Error: re-assignment without destroying the old value
}
{
  let mut foo = Foo::new();
  let _ = foo.fancy_destroy(true); // OK
}
{
  // Vec with must-destroy support
  let mut v = vec![Foo::new(), Foo::new()];
  let f = v.pop(); // OK: Moved element out of vec
  v[0] = Foo::new(); // Error: overwriting without destroying existing value
  for e in v { // OK: All values are returned from the iterator
    e.destroy(); // OK
    if roll_dice() { break } // Error: breaking leaves elements in the iterator
  }
}

Some functions:

fn bar() {
  let mut foo = Foo::new();
  if roll_dice() {
    foo.fancy_destroy(true);
    return; // OK: foo destroyed in exit point 1
  }
  foo.destroy(); // OK: foo destroyed in exit point 2
}

fn baz() -> Result<()> {
  let foo = Foo::new();
  can_fail()?; // Error: exit point without destroying foo
  foo.destroy();
  Ok(())
}

We can bring back scoped threads:

fn bar() {
  let v = String::from("world");
  // The handle is must-destroy
  let handle = thread::scoped(|| {
    println!("hello, {}", v); // Borrow from outer scope
  });
  handle.join();
}

If executors and combinators add ?Leave support, we can do neat things in async:

// This future uses a must-destroy type, so it can't be arbitrarily dropped
async fn bar() {
  let foo = Foo::new();
  let a = 32;
  foo.async_destroy().await; // Async destruction
}

struct Bar(bool, bool);
impl !Leave for Bar {}

impl Bar {
  // Future becomes must-destroy, since self is must-destroy
  async fn bar(&mut self) {
    self.0 = true;
    sleep(1).await; // Can't be interrupted here
    self.1 = true;
  }
  fn destroy(self) { .. }
}

// Scoped tasks
async fn baz() {
  let v = String::from("world");
  let task = Task::scoped(|| async {
    println!("hello, {}", v); // Borrow from outer scope
  });
  task.join().await;
}

Issues

  • Multiple exit points, from ?, break, match and other branchings, makes it unergonomic to use must-destroy types. This can be worked around using a wrapper closure. A defer-statement could help with this:
    let mut foo = Foo::new();
    defer foo.destroy(); // Runs before every exit point
    can_fail()?; // OK
    

Unresolved questions

  • Is the Leave trait declared unsafe (even if negative impls are safe to add)?
  • Can must-destroy types recover from a panic? If so, how?
  • Unsafe escape hatch to force-destroy a must-destroy type?

Variations

  • "ImplicitDrop" (does not super-trait Drop): !ImplicitDrop types can implement drop, and it's invoked during unwinding, but never implicitly.
    • Pro: Customizable - opportunity for panic recovery.
    • Con: Overloading - ?ImplicitDrop containers can't specialize their !ImplicitDrop impl.
  • "Leak" (does not super-trait Drop): !Leak can implement drop - implicit drops are OK, but cannot be ref-counted, forgotten, etc.
    • Pro: Least invasive.
    • Con: Prevents the "async destruction" use-case (see above).

Feedback requested

I know this is a large change. To me, it appears worthwhile, but I don't expect everyone to agree. At the moment, I'm focusing less on the "is this truly worth it?" and more on the "let's make the best possible proposal". (Don't worry if you're not fond of this direction, there will be plenty of time to argue against the proposal when it's polished). In particular, I'm looking for:

  • A more flexible panic/unwind story than abort
  • Holes, mistakes or missing considerations & drawbacks in this proposal.
  • Alternative approaches for the problems listed above.
9 Likes

The fact that every function call can potentially unwind (panic!) makes this more difficult. Effectively, if you have a !Leave type, the only thing you can do is 1) destructure it, 2) pass it by value, or 3) defer one of the other options.

3 Likes

Indeed. Added panic/unwind semantics to "feedback requested" section.

FWIW, crossbeam's scoped threads block until all of the scoped threads are terminated - I assume for safety reasons. Such behavior would be possible with the ImplicitDrop variant of this proposal -- perhaps that's a better direction. I'd like to understand the panic recovery use cases better; pointers appreciated.

I like to think about this visually, so it hopefully gives more of an intuition. Consider the control-flow graph of a function, like the following [graph image from the original post not reproduced here; the block bodies are MIR and not important]:

We can see that every function call (like json::encode in that graph) has two possible outgoing edges: one if the function returns, and an unwind edge if it panics. Notably, this is also true of destructors.

In the unwind path, implicit destructors are run, and unwinding continues up the stack.

We would therefore need something like defer which "submits" explicit destructor code to run in both the graceful return and unwind paths, to ensure that the value is always explicitly destroyed.
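For intuition, here is roughly what that would look like expressed with today's tools: a small guard whose Drop runs the cleanup closure on both the normal-return and unwind edges. With a real !Leave payload this guard itself couldn't exist (it relies on implicit drop), so treat it only as an analogy for what a defer statement would provide.

struct Defer<F: FnOnce()>(Option<F>);

impl<F: FnOnce()> Drop for Defer<F> {
    fn drop(&mut self) {
        if let Some(f) = self.0.take() {
            f();
        }
    }
}

fn work() {
    let _cleanup = Defer(Some(|| println!("explicit cleanup ran")));
    // ... code that may return early or panic; the cleanup runs either way ...
}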

5 Likes

One implication being that in this proposal, passing a !Leave value to Vec::push wouldn't work: the allocator could panic, causing push to drop your value on the floor, since it doesn't know how to destructure it.

A Vec::try_push API would work, because it returns a Result that hands you back your value in the Err case. This makes it your responsibility to handle the Err case and destruct the value yourself.

Of course, the compiler must know that no functions called by try_push can panic, or we could create the same situation as before! So there must be an assertion somewhere, either an unsafe one in try_push or on the signature of the functions it calls, that they won't panic.
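For illustration, one possible shape of such an API, written here as a hypothetical free function rather than an actual std method: the value comes back in Err instead of being dropped behind the caller's back.

// Hypothetical sketch; `try_push` is not a std method today.
fn try_push<T>(v: &mut Vec<T>, value: T) -> Result<(), T> {
    // Reserve up front so the later push cannot hit an allocation failure.
    if v.try_reserve(1).is_err() {
        return Err(value); // hand the value back; the caller must destroy it
    }
    v.push(value); // capacity is already reserved, so no allocation here
    Ok(())
}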

2 Likes

This feels like the main foundational sticking point in the approach. It really scares me if using this would mean needing to make a new "can panic" effect that everything would need to be generic over, and it would even mean things like not being able to use addition while one of these types is in scope (as addition can panic in debug mode).

8 Likes

Thank you for this proposal!

About the vec example:

for e in v { // OK: All values are returned from the iterator
    e.destroy(); // OK
    if roll_dice() { break } // Error: breaking leaves elements in the iterator
}

How does the compiler know that all values are returned from the iterator? After all, Iterator::next can call arbitrary code. It is my impression that the iterator would have to be destroyed explicitly, whether or not it contains elements – unless std::slice::Iter implements Leave for must-destroy types and performs a run-time check that the iterator is empty when dropped.
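One possible shape of that run-time check (my assumption, not something from the proposal) would be an iterator wrapper whose Drop asserts that every element was taken out before the iterator itself is dropped:

struct MustDrain<T> {
    inner: std::vec::IntoIter<T>,
}

impl<T> Iterator for MustDrain<T> {
    type Item = T;
    fn next(&mut self) -> Option<T> {
        self.inner.next()
    }
}

impl<T> Drop for MustDrain<T> {
    fn drop(&mut self) {
        // Run-time invariant: all elements must have been consumed.
        assert!(self.inner.len() == 0, "iterator dropped with elements remaining");
    }
}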

The other problem is panics. It seems to me that enforcing that a section of code can't panic is very difficult.

One solution is to always use defer statements, which are run even in the case of a panic.

Another solution is to have a #[no_panic] attribute for functions that mustn't panic and are very limited in functionality (no arithmetic except saturating_*/checked_*/wrapping_* functions, no allocations, ...).

It seems to me that enforcing that a section of code can't panic is very difficult.

Yes, that seems more complicated than necessary. I'm not suggesting that - apologies for the noise. Let me clarify: If a panic unwind occurs, and there is a live must-destroy type on the stack, the panic handler can:

  1. Abort the process (my initial proposal)
  2. Call drop (the ImplicitDrop and Leak variations)
  3. New: Add another default destructor for must-destroy types, such as destroy(&mut self) (sketched after this list)
  4. Some form of deferred expression, although I'm not suggesting this - my defer example was intended for normal (non-unwind) control flow.
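A sketch of option (3), reusing the Foo type from the examples above; destroy_on_unwind is a hypothetical name I'm introducing for illustration, not syntax proposed in this thread:

impl Foo {
  // Best-effort cleanup for the unwind path only. Analogous to Drop::drop,
  // it would only get `&mut self`, not ownership, so it cannot run the full
  // graceful destruction protocol.
  fn destroy_on_unwind(&mut self) {
    // release the underlying resource without upholding the full invariant
  }
}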

How does the compiler know that all values are returned from the iterator? After all, Iterator::next can call arbitrary code.

Great point. Futures will have the same problem when destroyed after returning Poll::Ready(..). They must be safely destroyed using a run-time invariant check, through a destructor. Possibly, such a destructor could be the same as for panics, i.e. (2) or (3) from the paragraph above.

1 Like

Thanks for taking the time to write this up!

As already discussed on Zulip, I agree that especially the async cancellation situation is tricky and might lead to unexpected errors, and I'm looking forward to improving on this. However, I'm not 100% convinced it's worthwhile to generalize this further.

I think for the synchronous function world, this RFC might not have that strong a motivation. The main one I can find in the text is that the extra scope for crossbeam would not be necessary - but I think this is a minor cosmetic issue, and it might not justify adding more complexity to the language. If there are other, stronger motivators, I would recommend adding them. Future-proofing isn't the best motivator, because more often than not we find out that the things which were added for future-proofing aren't the things we need later on.

Questions

  • How does the compiler know which function destroys the object? Apparently all of destroy, fancy_destroy and async_destroy do it. Would there need to be an attribute (like #[destructor]) on the method? If yes, how much is this different from directly calling drop/async_drop, and just requiring this must be used?
  • How do you get a !Leave future out of an async fn which doesn't take self? Would it be based on any parameter?
  • What is the panic behavior? This had more or less been pointed out in other questions, but it came up for me too.

Usage concerns

I'm a bit worried this adds another marker trait to the language. The experience with my team (> 20 developers, non Rust-experts) is that those are hard to follow for people who are not that much invested into the language (or in advanced programming languages in general). Send/Sync make sense to most after a few explanations. But e.g. Unpin and UnwindSafe are something that most users really don't want to be concerned with - not even in error messages. It's currently unclear to me how much forwarding of ?Leave or !Leave bounds this change requires.

In addition to this I think negative trait impls are currently also not stable (but my knowledge might be outdated). So I think this needs to also work with an inversion like Unpin/PhantomPinned - which makes things even trickier.

I also share the concern that this might be hard to use within loops, iterators and other control flow. I fought plenty of battles with the compiler, trying to tell it that I really re-populated variables after I used them in the last loop iteration or doing other things that the borrow-checker didn't really understand.

Overall

Overall I'm feeling this is an attempt to generalize over a couple of challenges and doing more clever static analysis via traits. However I'm feeling like it might lean a bit too much into the "generalization" area, and might therefore be not very accessible to a large set of users. And as others pointed out, this generalization might also require additional generalization work (e.g. the panic "effect" that @scottmcm pointed out), and therefore require a substantial deep-dive to understand how everything fits together. Before doing this, it might be worthwhile to figure out which user-problem this solves which can't be solved another way.

For the "unexpected async cancellation" challenge, there is the proposal of introducing async fns which run to completion, which level-sets them with synchronous functions. For synchronous as well as asynchronous code which must run - even if panics occur - defer blocks might be an interesting solution. While they won't provide the strong guarantees that static analysis via trait checking can provide, they seem to be well understood by most users.

For other challenges - we should identify them :slight_smile:

3 Likes

One example where this could be useful is the guard pattern. An API designer could implement !Leave for an RAII guard, to force the user to destruct the guard manually. This can prevent bugs where a guard is accidentally dropped too late.

It doesn't need to know. A function is a destructor if it takes self by value and doesn't return it. However, there's no compiler magic involved, a destructor just takes ownership of the value and destructures it. This prevents the Drop impl from being called.

I guess that you'd have to hold a value that implements !Leave in the async function across an .await point.

I don't think this is a problem. The standard library can simply provide a ZST that implements !Leave, which can be added to other structs to opt out of Leave.
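A sketch of that ZST idea, analogous to how PhantomPinned opts a type out of Unpin (the names are hypothetical and the negative impl is proposed syntax, not valid today):

// Proposed syntax sketch.
pub struct PhantomMustDestroy;
impl !Leave for PhantomMustDestroy {}

struct Guard {
    // Owning the marker makes Guard !Leave transitively, without Guard
    // needing a negative impl of its own.
    _marker: PhantomMustDestroy,
}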

1 Like

Is this a new mechanism you are proposing? Because it is not currently possible to use destructuring to destroy a Drop type without its Drop impl being invoked.

I wasn't aware of that, my bad. That means that a !Leave type can't implement Drop.

1 Like

Thanks for the detailed feedback @Matthias247.

Details

How does the compiler know which function destroys the object?

Through destructuring the type into its constituent parts (i.e. let Self { c1, c2 } = foo). Most commonly, through a method that takes self. No annotations are needed.

How do you get a !Leave future out of an async fn which doesn't take self ? Would it be based on any parameter?

Either drive it to completion locally or return the future back to the caller. Please give an example if I misunderstood your question.

What is the panic behavior? This had more or less been pointed out in other questions, but it came up for me too.

Explained here. The discussions have shown we need to tweak Leave or break it up.

General

Overall I'm feeling this is an attempt to generalize over a couple of challenges and doing more clever static analysis via traits. However I'm feeling like it might lean a bit too much into the "generalization" area, and might therefore be not very accessible to a large set of users.

There's a big difference between just "a new auto-trait" and "how invasive is the new trait". Unpin turned out to be quite invasive, which is problematic, whereas Sized is less invasive. I'm only happy if we achieve minimal mental overhead, sane defaults, and actionable compiler errors.

The main one from the text I can find is that the extra scope for crossbeam would not be necessary - but I think this is a minor cosmetic issue [...]

The extra closure is needed because the desired logic cannot be safely expressed in Rust's type system (due to leaks being allowed). This is not just a cosmetic limitation but a type-system limitation, which is why scoped tasks (in async) can't be implemented today. Given the trajectory of async, not having static borrowing across tasks is a very real issue, with problematic workarounds: "Arc everywhere" is not zero-cost, causes boilerplate noise, and has correctness pitfalls.

I'm a bit worried this adds another marker trait to the language.

Yes - don't worry, the "is it really worth it" discussion is unavoidable, independent of which proposal gets widespread scrutiny. My goal here is to make this proposal as good as possible before that. Note that nobody, not even me, wants to merge this into Rust at this point. That's why I started this thread.

I also share the concern that this might be hard to use within loops, iterators and other control flow.

Yes, we need to see how it "feels" in real-world code. Fortunately, we do have some precedent - uninitialized variables are the spiritual inverse of this feature.

For the "unexpected async cancellation" challenge, there is the proposal of introducing async fn s which run to completion.

Yes, and here's the link. This proposal deserves its own review. Very briefly, my perspective is:

  • Some futures are state machines.
  • Some state machines should not be dropped prior to reaching a terminal state.
  • In non-panic cases, a state machine that is not driven to its terminal state correctly should be able to cause compiler errors (much like must_use does - but stronger), requiring the attention of the programmer.

This proposal assumes that this is a general problem which deserves a general solution. It is not clear to me that a new future type with an unsafe interface, which needs a parallel ecosystem of combinators, is a less invasive change, but I'm not ruling it out either. I think more prototyping is needed, especially of this proposal.

1 Like

Updated proposal

Due to your feedback, the original proposal needs to be changed. Originally, I stated that Leave is a super-trait of Drop. This meant that no destructor was available for a !Leave type, even in the panic case, which in turn makes unwinding hard to do.

Additionally, I realize I was trying to solve two related, but separate, problems:

  1. Some state machine types (like certain Futures) should not be arbitrarily dropped under normal non-panic control flow. Instead, they should be driven to their terminal state.
  2. Types that cannot leak, i.e. cannot live past the lifetime that rustc expects them to have.

Both of these could be auto-traits, and considered independently. These traits could be merged to a single trait which upholds the guarantees of both, but for the sake of discussion and separation of concerns I present them separately.

ImplicitDrop auto-trait

An !ImplicitDrop instance cannot be dropped regularly by going out of scope or through std::mem::drop(..). This forces the user to destruct the instance gracefully on all code paths, or, optionally, to force-destroy the value.

Purpose: Prevent accidentally breaking invariants of state machine types.

How to destruct an !ImplicitDrop type:

  • Graceful: Destructure the type (typically through a fn foo(self, ..) function).
  • Forceful: Add an "explicit/force drop", e.g. std::mem::destroy<T: ?ImplicitDrop>(t: T) function:
    • destroy invokes drop as usual but suppresses the error.
    • destroy is safe, but should be considered potentially invariant-breaking. It can be invoked by a caller that has driven the instance to its terminal state (e.g. when an iterator is consumed or a future is ready).
    • destroy is invoked on the panic-unwind path, just like a regular drop.

Use cases: Futures, iterators and custom state machines that should be driven to completion before being dropped can force the user to consider all exit points. Generic containers offer ?ImplicitDrop impls if they can uphold this contract transitively.
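A minimal sketch of the two destruction paths under this variant (ImplicitDrop and std::mem::destroy are hypothetical and do not exist today, so this is proposed syntax):

struct Machine { done: bool }
impl !ImplicitDrop for Machine {}

impl Machine {
    fn run_to_completion(&mut self) { self.done = true; }

    // Graceful destruction: destructure through a by-value method.
    fn finish(self) {
        let Machine { done } = self;
        debug_assert!(done);
    }
}

fn example(mut m: Machine) {
    m.run_to_completion();
    m.finish();               // OK: destroyed gracefully on this path
    // std::mem::destroy(m);  // forceful alternative, once the terminal state
    //                        // is known to have been reached
}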

Alternatives: There's some wiggle room for considering a lint instead of an error here. The problem with lints is transitivity in nested types and so on.

UPDATE: I was not aware that destructuring a type currently invokes Drop. (Thanks @mbrubeck for pointing it out). We need to find a better definition of "accidental premature dropping".

Leak auto-trait

Very simple: it is unsafe to leak a !Leak type. As such, it is not implemented by Arc, Rc, and other generic containers that can leak.

Leaking is defined as outliving the input lifetime.

Immediate use case: scoped threads and async tasks.

1 Like

For a similar reason, I think trait object types must have an implicit Leave bound by default, unless explicitly overridden with dyn Trait + ?Leave.

Otherwise, a type like Box<dyn Read> could own a !Leave type, so existing trait object types could no longer be assumed to be Leave, breaking existing code.

1 Like

@mbrubeck Yes, that sounds completely right (although I just updated the proposal and no longer using Leave). I thought this was inferred by the compiler for auto-traits, but maybe I'm wrong?

Could you please give or point to an example of code that does not compile because a certain type involved has got the new !Leak implemented for it?

So far it seems as if embedding PhantomData<&'a u8> into a type would be sufficient to not allow it to bubble up above the stack frame associated with lifetime 'a. This brief definition is probably obvious to people familiar with scoped threads, but on its own it looks quite opaque.

Update: what is an input lifetime?

Sure, for instance:

struct Foo(u64);
impl !Leak for Foo {}

let foo = Foo(42);
let _ = Rc::new(foo); // Error: Rc::new requires T: Leak
mem::forget(foo);     // Error: mem::forget requires T: Leak

As you can see, there's nothing magical - the implementation of Rc, Arc, mem::forget etc can simply not uphold the contract that is required by !Leak, so they won't let you. Note that they rely on unsafe code in order to cause leaks in the first place - it can't happen by accident in safe Rust.

Not so for Rc and Arc: even if you set an input lifetime to 'a (say, until the end of the stack frame), you can still insert the value into an Rc or Arc. Combined with a cycle, you can end up with a reference that points to invalid data, but since it's no longer reachable, this is currently considered safe. (See the safe_forget example.)
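For reference, the cycle construction looks roughly like this (my reconstruction of the safe_forget idea, not code taken from the linked example); it compiles in safe Rust today:

use std::cell::RefCell;
use std::rc::Rc;

// A value borrowing from the enclosing stack frame.
struct Holder<'a> {
    borrowed: &'a String,
    other: RefCell<Option<Rc<Holder<'a>>>>,
}

fn leak_without_unsafe() {
    let s = String::from("stack data");
    let a = Rc::new(Holder { borrowed: &s, other: RefCell::new(None) });
    // The cycle keeps the strong count above zero forever, so the Holder's
    // destructor never runs, even though it borrows `s`.
    *a.other.borrow_mut() = Some(a.clone());
} // `s` is freed here; the leaked Holder still holds &s, but it is
  // unreachable, so safe Rust accepts this.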

When the whole scoped threads debate happened in 2015 one proposal was to restrict Arc and Rc to have T: 'static which would prevent any non-static references. It was rejected because it would constitute a breaking change. I personally think this was an unfortunate decision - it's why we're in this situation today. However, this was in 2015 and it also doesn't matter what I think, so here we are and I'm hoping we can fix it :slight_smile:

Update: .. what is input lifetime?

I may be using this term wrong (defined here) but what I mean is this:

struct Foo<'a>(&'a u64);
struct Bar;

{
  let n = 12; // 'a begins
  let foo = Foo(&n);
  Rc::new(foo); // input lifetime is 'a
  // create some cycle here..
  let bar = Bar;
  Rc::new(bar); // input lifetime is 'static for types that don't have references
} // 'a ends

Note that you could create a safe version of Rc<T: ?Leak> by adding the bound T: 'static.
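In today's terms, that workaround could look like a thin wrapper that simply refuses borrowed payloads (StaticRc is a hypothetical name introduced only for illustration):

use std::rc::Rc;

// Only accepts payloads without borrowed data, so nothing borrowed can be
// kept alive (or left dangling) by a reference cycle.
struct StaticRc<T: 'static>(Rc<T>);

impl<T: 'static> StaticRc<T> {
    fn new(value: T) -> Self {
        StaticRc(Rc::new(value))
    }
}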

Also, Leak is slightly misleading, since we're not interested in preventing memory leaks per se, but rather preventing references that outlive a certain lifetime (even when those references are no longer reachable in single-threaded code).

What worries me is that Leak and ImplicitDrop would have to be automatically added to all trait bounds (like Sized) to not break existing code.

So a lot of APIs would have to be modified to support !Leak or !ImplicitDrop types. For example, struct Box<T: ?Sized> would have to be changed to struct Box<T: ?Sized + ?Leak + ?ImplicitDrop>.

3 Likes

What worries me is that Leak and ImplicitDrop would have to be automatically added to all trait bounds (like Sized ) to not break existing code.

That's the case for auto-traits in general though, like UnwindSafe, Unpin, etc. And these traits can be considered separately – Leak delivers more value in the short term, and its contract is trivial.

Btw, the async-scoped crate illustrates perfectly the situation with scoped tasks in async:

  • The scope_and_block API uses the "crossbeam trick" of blocking the calling thread (i.e. it should not be used in an async context), which provides safety but consumes an entire thread for each scope.
  • The scope_and_collect API does not, but it is exposed as unsafe (see Reddit discussion).

From the docs:

However, the user should ensure that the returned future is not forgotten [i.e. leaked] before being driven to completion.

A Leak trait would instead allow the compiler to enforce this guarantee.

2 Likes