Asynchronous Destructors

Sure, that's similar to what withoutboats' original blog post said. But what problems does this cause? The tradeoff to me looks like it's:

  • Add poll_drop to the Drop trait
  • Potentially, give Box async destructor support

versus the naive version

  • Add a separate AsyncDrop trait (using an associated type, so it can be implemented today)
  • Accept that Box will not have async destructor support (because there is no nice way to implement the async destructor of Box)

To me, this latter option seems worth considering. People can still implement various types of Box-with-async-destructor, for example using a poll_drop supertrait if they want - this decision would just be made by applications rather than being baked into Drop.

The challenge is in making the various forms of async destructor useful, not just easy to write.

First, whatever we do here has to be implemented in the compiler; in turn, that means that we expect it to work in a no_alloc environment, as not all Rust targets have an allocator. This makes AsyncDrop difficult - the drop glue has to allocate somehow, but we've just said it can't expect an allocator. The way round this is to add more magic allocations to every type that might have an async destructor, but that then means that you write pub struct Wrapped<T> { hidden: T }, and Wrapped is no longer the same size as a T at runtime, but some unpredictable amount larger to allow for drop glue.

poll_drop avoids this by saying that if you want an async fn drop_async(self: Pin<&mut Self>); in your type, you have to explicitly allow space for it - e.g. your struct might have to contain an async_drop: future::Fuse<…> field for the destructor, and then poll_drop can be implemented as:

fn poll_drop(self: Pin<&mut Self>, cx: &mut Context) -> Poll<()> {
    self.async_drop.poll(cx)
}

So, that means that poll_drop is strictly more flexible than AsyncDrop - it can do everything that AsyncDrop can; it just requires you to be explicit about storage. It also allows you to write stateless poll_drop code (e.g. code that just forwards to a subfield, for Box, Vec etc).
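To illustrate that forwarding case, here is a hedged sketch: the PollDrop trait, Resource, and MyBox types are all stand-ins invented for this example (the real mechanism would live on Drop and Box), and the no-op waker exists only so the code can be driven without an executor.

```rust
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Hypothetical object-safe trait standing in for the proposed `poll_drop`.
trait PollDrop {
    fn poll_drop(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()>;
}

// A resource whose teardown needs one extra poll before it completes.
struct Resource {
    remaining: u32,
}

impl PollDrop for Resource {
    fn poll_drop(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.remaining == 0 {
            Poll::Ready(())
        } else {
            self.remaining -= 1;
            Poll::Pending
        }
    }
}

// A Box-like owner whose `poll_drop` is stateless: it just forwards to its
// contents, so it needs no extra storage of its own.
struct MyBox<T: PollDrop + Unpin>(Box<T>);

impl<T: PollDrop + Unpin> PollDrop for MyBox<T> {
    fn poll_drop(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        Pin::new(&mut *self.0).poll_drop(cx)
    }
}

// Minimal no-op waker so the example can be polled without an executor.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

fn main() {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut boxed = MyBox(Box::new(Resource { remaining: 1 }));
    // First poll: the inner resource is still winding down.
    assert_eq!(Pin::new(&mut boxed).poll_drop(&mut cx), Poll::Pending);
    // Second poll: teardown complete.
    assert_eq!(Pin::new(&mut boxed).poll_drop(&mut cx), Poll::Ready(()));
}
```

MyBox stores nothing for its destructor - pure forwarding is the property that makes Box, Vec etc cheap under this scheme.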

Second, poll_drop_ready is more useful to the implementor than poll_drop, because the guarantees are simpler to express. Consider the drop glue for both (in pseudo-Rust):

// Using `poll_drop`
// This function is called repeatedly until it returns Ready, as per a normal async function
// cx is Some if in async context, None if not
magic fn drop_glue<T>(drop_me: Pin<&mut T>, cx: Option<&mut Context>) -> Poll<()> {
    let async_drop_res = match cx {
        None => { drop_me.drop(); Poll::Ready(()) }
        Some(cx) => drop_me.poll_drop(cx),
    };
    if let Poll::Ready(()) = async_drop_res {
        recurse_drop_glue_members(drop_me, cx) // Defined as Poll::Ready(()) iff `drop_me` has no members, else runs this function on all members
    } else {
        async_drop_res
    }
}
// Using `poll_drop_ready`
// This function is called repeatedly until it returns Ready, as per a normal async function
// cx is Some if in async context, None if not
magic fn drop_glue<T>(drop_me: Pin<&mut T>, cx: Option<&mut Context>) -> Poll<()> {
    let async_drop_res = match cx {
        None => Poll::Ready(()),
        Some(cx) => drop_me.poll_drop_ready(cx),
    };
    if let Poll::Ready(()) = async_drop_res {
        drop_me.drop();
        recurse_drop_glue_members(drop_me, cx) // Defined as Poll::Ready(()) iff `drop_me` has no members, else runs this function on all members
    } else {
        async_drop_res
    }
}

Yes, the glue code is marginally more complex in the latter case, as it runs two user-provided functions in an async context, not just one. However, the user of poll_drop_ready gets a guarantee that drop will also be called, not just poll_drop_ready. This then means that, as the user, you need only implement the memory-safety-relevant code once - in drop - and not twice - in poll_drop as async code, and in drop as sync code - which reduces errors. poll_drop_ready is a pure optimization: it stops you from having to block waiting for a destructor to finish.
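To make that split concrete, here's a runnable mock - PollDropReady, the Conn type, and the hand-rolled glue are all illustrative stand-ins, not real APIs. All correctness-relevant cleanup lives in drop, which the glue always calls; poll_drop_ready only flushes pending async work first:

```rust
use std::cell::RefCell;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

thread_local! {
    // Records the order teardown steps run in, so the glue's guarantee is visible.
    static LOG: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
}

// Stand-in for the proposed trait method.
trait PollDropReady {
    fn poll_drop_ready(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()>;
}

struct Conn {
    flushes_left: u32,
}

impl PollDropReady for Conn {
    // Pure optimization: push async work to a "done" state so drop won't block.
    fn poll_drop_ready(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.flushes_left == 0 {
            LOG.with(|l| l.borrow_mut().push("ready"));
            Poll::Ready(())
        } else {
            self.flushes_left -= 1;
            Poll::Pending
        }
    }
}

impl Drop for Conn {
    // All correctness-relevant cleanup lives here, exactly once.
    fn drop(&mut self) {
        LOG.with(|l| l.borrow_mut().push("drop"));
    }
}

// Hand-rolled stand-in for the async-context drop glue. Real glue would yield
// to the executor instead of spinning; the point is only the ordering.
fn async_context_drop(mut c: Conn, cx: &mut Context<'_>) {
    while Pin::new(&mut c).poll_drop_ready(cx).is_pending() {}
    drop(c); // sync drop is *always* called, so nothing leaks
}

// Minimal no-op waker so the example can be polled without an executor.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

fn main() {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    async_context_drop(Conn { flushes_left: 2 }, &mut cx);
    // poll_drop_ready completed first, then sync drop ran.
    LOG.with(|l| assert_eq!(*l.borrow(), vec!["ready", "drop"]));
}
```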

This, in turn, means two things to the user of async destructors:

  1. When using poll_drop, the drop method that's called varies according to what context you're in - for code that implements only one of the two drop methods (sync or async), we have to somehow ensure that the other one is called, and we need rules for when to write a poll_drop/AsyncDrop implementation and when to just write drop. In contrast, the rules are simple for poll_drop_ready: drop is the code you write to ensure that nothing is leaked, and poll_drop_ready makes sure that drop never blocks.
  2. Non-trivial types need duplicate code between drop and AsyncDrop/poll_drop, because both destructors need to cleanly release resources you own. poll_drop_ready is just an optimization, so all resources can be cleaned up in drop, and poll_drop_ready just handles pushing async resources to a "done" state.

For example, take a process handling phone calls as part of a cluster of IMS servers - when it shuts down, we want to hand off all the active calls to another server, so that we don't drop calls on a normal restart. This implies that drop already has to hand all calls over, synchronously, to another server, and then drop all in-memory structures that represent those calls. In the poll_drop/AsyncDrop case (since the distinction between the two is just in whether the compiler allocates space for the drop future, or the user does), both chunks of work have to be repeated as part of the poll_drop work. In the poll_drop_ready case, poll_drop_ready has to asynchronously hand over all calls, but does not have to handle dropping the internal state (which need not be handled asynchronously - there's no blocking involved here) once the calls are handed over, because drop will be called anyway.

TL;DR: poll_drop and AsyncDrop are equivalent in power, modulo who allocates storage for the Future state machine. poll_drop wins on that, because it makes the storage for the state machine explicit, rather than a compiler-generated allocation (which can't be done in a no_alloc world). poll_drop_ready is simpler to explain and involves less duplication since the glue code still calls sync drop, and it's clear what belongs to drop (everything), and what belongs to poll_drop_ready (any work that has to be done so that drop is non-blocking in async terms).


The drop glue is implemented by async syntax, which would reserve space in the Future struct that it is building. It's like awaiting a certain async function at the end of your async block. The relevant "destructor state problem" as stated in the original post is "dropping trait objects" - but you don't drop a trait object directly. You might drop a Box<dyn T>, but not in a no_alloc environment! I am suggesting that in the naive AsyncDrop approach it is natural to bite the bullet and say that dropping a Box<dyn T> in an async block would not call the async destructor.

I would also point out that adding an extra virtual function call when you drop a Box<dyn T> in an async block seems significantly non-zero-cost. The original post doesn't discuss whether Box would call async destructors, so I might be attacking a strawman here - but then what is the concern with dropping trait objects?

I don't see how futures could be harder to use - it should just be like writing "my_object.async_drop().await" at the end of your async block (roughly speaking). If recursive async drop is wanted in the future, I don't see why it would be any more complicated than with polling. I imagine something like this:

async fn async_drop_all(mut x: MyStruct) {
    // Pseudo-code: run the struct's own async destructor first,
    // then recurse into its fields
    async_drop(&mut x).await;
    let MyStruct { field1, field2 } = x;
    async_drop_all(field1).await;
    async_drop_all(field2).await;
}

Again, the compiler-generated allocation would just be part of the Future being constructed by async block - not a separate heap allocation. It might be able to save space in general, for example:

async fn f() {
    let a: SmallObjectWithComplicatedAsyncDrop = ...;
    let b: LargeObjectWithNoAsyncDrop = ...;
    a.use_it(b); // `use` is a keyword, so an arbitrary method name
}

Here, the Future only needs space for the biggest of:

  • a + a's async drop future, or
  • a + b

With poll_drop, the space for a's async destructor has to be stored in a (or in a box, but that's adding a heap allocation and requires an allocator). This means the future needs space for a + a's async drop future + b. This might be fine in practice, I don't know.


I want to clarify that I think poll_drop(_ready) makes sense as a first step. I just think the "destructor state problem" needs more nuance - a realistic "AsyncDrop" approach to compare to is:

  • A new generic AsyncDrop trait using an associated type (potentially an async trait fn when that is implemented).
  • std::boxed::Box<dyn T> doesn't call async destructors.
  • If you want an async-destructor-aware box for dyn T, for example to box futures, then you can choose a polling approach, using a new PollDropBox type whose impls for dyn T require T: PollDropReady. Here PollDropReady is an object-safe trait with fn poll_drop_ready.
  • No automatic recursive drop glue, at least at first.

The ergonomic advantage would be using async blocks in destructors without any change to the struct, though with current Rust this would generally require boxing as in async-trait.
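A hedged sketch of what that shape could look like - the AsyncDrop trait and the CallHandle type are hypothetical, and the Pin<Box<dyn Future>> associated type is the async-trait-style boxing mentioned above:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// The naive trait: an associated type carries the destructor future, which
// with today's Rust generally means boxing, as async-trait does.
trait AsyncDrop {
    type Dropper: Future<Output = ()>;
    fn async_drop(self) -> Self::Dropper;
}

struct CallHandle {
    call_id: u32,
}

impl AsyncDrop for CallHandle {
    type Dropper = Pin<Box<dyn Future<Output = ()>>>;
    fn async_drop(self) -> Self::Dropper {
        // An async block in the destructor, with no change to the struct's
        // layout - the state lives in this heap-allocated future instead.
        Box::pin(async move {
            let _ = self.call_id; // asynchronous hand-off would happen here
        })
    }
}

// Minimal no-op waker so the example can be polled without an executor.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

fn main() {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut dropper = CallHandle { call_id: 7 }.async_drop();
    // No awaits inside this particular block, so one poll completes it.
    assert_eq!(dropper.as_mut().poll(&mut cx), Poll::Ready(()));
}
```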

These changes are more or less compatible with the changes suggested in the original post, except that poll_drop(_ready) goes on a new trait rather than Drop.

That then means that if I'm not using async syntax to implement my future, I have to ensure that I manually implement drop glue, which is an extra tax on developers. We can already expect the compiler to generate the right sync drop glue for a type and everything it contains, even if I manually implement drop or don't implement it at all - why should async destruction be any different?

Put differently, in the poll_drop* cases, the drop glue that's generated for any async context correctly handles putting Futures in containers. For an example that is currently allocation free, consider:

enum PossiblyStatic<F, T> where F: Future<Output = T> {
    Fut(F),
    Static(T),
}

Both F and T are stored by value inside the enum, and thus, if I want F's async destructor to run when I drop a value of this type, I need something using this type to correctly handle drop glue. However, because it's not itself a Future (at least, not as described here - you could implement Future on this), it's more ergonomic if the compiler is able to deduce that, if I trigger drop glue for one of these in an async context and F has an async destructor and the enum's current variant is Fut(F), then it needs to run F's async destructor.
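A hand-written version of that deduced glue might look like the following sketch - PollDrop and the Fut2 resource are stand-ins invented for this example, and real glue would handle pinning properly rather than requiring Unpin:

```rust
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Stand-in for the proposed method; not a real std trait.
trait PollDrop {
    fn poll_drop(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()>;
}

enum PossiblyStatic<F, T> {
    Fut(F),
    Static(T),
}

// What the deduced glue would amount to: inspect the live variant, and only
// the Fut arm has async teardown to run.
impl<F: PollDrop + Unpin, T: Unpin> PollDrop for PossiblyStatic<F, T> {
    fn poll_drop(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        match &mut *self {
            PossiblyStatic::Fut(f) => Pin::new(f).poll_drop(cx),
            PossiblyStatic::Static(_) => Poll::Ready(()), // plain value: no async work
        }
    }
}

// A future-like resource whose async destructor needs one extra poll.
struct Fut2 {
    polls_left: u32,
}

impl PollDrop for Fut2 {
    fn poll_drop(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.polls_left == 0 {
            Poll::Ready(())
        } else {
            self.polls_left -= 1;
            Poll::Pending
        }
    }
}

// Minimal no-op waker so the example can be polled without an executor.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

fn main() {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut fut: PossiblyStatic<Fut2, u32> = PossiblyStatic::Fut(Fut2 { polls_left: 1 });
    assert_eq!(Pin::new(&mut fut).poll_drop(&mut cx), Poll::Pending);
    assert_eq!(Pin::new(&mut fut).poll_drop(&mut cx), Poll::Ready(()));
    // The Static variant has nothing async to do.
    let mut st: PossiblyStatic<Fut2, u32> = PossiblyStatic::Static(5);
    assert_eq!(Pin::new(&mut st).poll_drop(&mut cx), Poll::Ready(()));
}
```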

Modulo the pointer-ness, this is true also of Box, Vec, Arc and other container and pointer types. It seems wrong to me that it should be OK for does_drops below to run async destructors on its arguments, while does_sync_drop doesn't. This is the sort of refactor that I would expect to not have a significant impact on blocking, and yet it does - by moving the Futures into a Vec, I've ensured that any blocking in thing2's destructor suddenly blocks the entire execution thread it's using, instead of just the task.

async fn does_drops<F>(thing1: F,
                       thing2: F) -> u32
where F: Future<Output = u32>
{
    thing1.await
}
async fn does_sync_drop<F>(thing1: F,
                           thing2: F) -> u32
where F: Future<Output = u32>
{
    let v = vec![thing1, thing2];
    handle_vec(v).await
}
fn choose_item<T>(v: Vec<T>) -> T {
    // This is a placeholder, and will be more complex in future
    v.into_iter().next().unwrap()
}
async fn handle_vec(v: Vec<impl Future<Output = u32>>) -> u32 {
    choose_item(v).await
}

While I expect does_sync_drop to be marginally slower (it's handling a heap-allocated container, after all), this isn't an unreasonable refactor to do if I'm going from small N to variable N, and having unexpected I/O latencies caused by thing2's destructor is a pain. The alternative is to have async versions of all the common containers and smart pointers, and to accept that I can't use the normal versions in async context, because they introduce surprise long latencies.

This argument still applies even if we make it about futures instead of polls. If I have correctness constraints that require drop, I have to write drop regardless of whether I'm doing something async or not, and it needs to ensure that correctness holds in the case where drop is run from a sync context. The difference then becomes how much duplication needs to exist between async fn drop_ready(self: Pin<&mut Self>) and fn drop(&mut self). In the async fn drop(self: Pin<&mut Self>) case, I need a copy of all parts of drop that apply in the async world in async drop. In an async drop_ready world, I only need consider the parts that could block.

So, using the completion-based I/O idea, you get:

fn drop(&mut self) {
    self.cancel_io();
    self.sync_wait_io_completion();
    // Now do any cleanup that's needed once the I/O is stopped
    // For example, drop reference counts on shared buffers
}

async fn drop_ready(self: Pin<&mut Self>) {
    self.cancel_io();
    self.await_io_completion().await;
    // Leave other clean-up to `drop`
}

async fn drop(self: Pin<&mut Self>) {
    self.cancel_io();
    self.await_io_completion().await;
    // Now do any cleanup that's needed once the I/O is stopped
    // For example, drop reference counts on shared buffers
}

If I refactor the object, and anything in drop changes, I have to remember to check whether those changes need reflecting in async drop; with async drop_ready, I only have to remember to check whether I've affected a blocking call (and hopefully, the fact that I've touched a blocking call will remind me about drop_ready).

My personal guess is that drop_ready implementations will be rare and tiny, especially if there's good support for correctly threading drop glue around things like Box, Vec and other container types - you just won't need it very often. But I'm willing to be proven wrong by time.


I didn't mean to express a preference for async drop versus async drop_ready - I was just using the name AsyncDrop to discuss the "destructor state problem" in the original post.

I agree that making Box<dyn T> (and Rc, Arc) call async destructors automatically is convenient. But for an incremental approach convenience is not necessarily a priority. And there is the tradeoff of convenience vs zero-cost, where usually Rust would go zero-cost. Synchronous Drop is a bit different because it is the status quo; it's just the API that Box etc provide. If it is desirable for Box<dyn T> to call async drop futures, it could be supported, it's just a bit more complicated.

For code that doesn't use Box<dyn T> etc, for example no_alloc code, there's not much to do - the async drop code can just build up a big Future struct. As a size optimization, it could use polling wherever possible recursively. In a typical no_alloc environment, as I understand it, the statically allocated LocalFutureObj would have to either always be run to completion (i.e. not be cancelled somehow by the executor), or have space reserved statically for its async drop future type.

On the other hand, the compiler-generated async destructor of Box<dyn T> (etc) could use the following vtable entry to clean up the contents of the box. The Pending value still allows polling without an allocation. The dyn_async_drop_hook is like poll_drop(_ready) but can return a boxed future instead.

#[cfg(feature = "alloc")]
enum AsyncDropResult {
    Ready,
    Pending,
    BoxedFuture(Pin<Box<dyn Future<Output = ()>>>),
}

// Pseudocode for a compiler-generated vtable hook, similar to sync drop glue
#[cfg(feature = "alloc")]
trait DropHook {
    fn dyn_async_drop_hook(self: Pin<&mut Self>, cx: &mut Context<'_>) -> AsyncDropResult;
}
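As a hedged, runnable mock of how Box<dyn T>'s generated destructor might consume such a hook - the names mirror the sketch above, the escalated future is Pin<Box<dyn Future>> so it can actually be polled, and &mut self stands in for Pin<&mut Self> to keep the example compact:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Mock of the vtable hook; names are from this thread, not real APIs.
enum AsyncDropResult {
    Ready,
    Pending,
    BoxedFuture(Pin<Box<dyn Future<Output = ()>>>),
}

trait DropHook {
    fn dyn_async_drop_hook(&mut self, cx: &mut Context<'_>) -> AsyncDropResult;
}

// Contents that escalate to a boxed future for their teardown.
struct Contents {
    started: bool,
}

impl DropHook for Contents {
    fn dyn_async_drop_hook(&mut self, _cx: &mut Context<'_>) -> AsyncDropResult {
        if self.started {
            AsyncDropResult::Ready
        } else {
            self.started = true;
            AsyncDropResult::BoxedFuture(Box::pin(async {
                // asynchronous hand-off of resources would happen here
            }))
        }
    }
}

// What Box<dyn T>'s generated async destructor might do with the vtable entry.
fn drop_box_contents(contents: &mut dyn DropHook, cx: &mut Context<'_>) -> Poll<()> {
    match contents.dyn_async_drop_hook(cx) {
        AsyncDropResult::Ready => Poll::Ready(()),  // cheap path: an extra branch only
        AsyncDropResult::Pending => Poll::Pending,  // poll again later, no allocation
        AsyncDropResult::BoxedFuture(mut fut) => {
            // Real glue would stash `fut` and re-poll it; this empty future
            // happens to be Ready on the first poll.
            fut.as_mut().poll(cx)
        }
    }
}

// Minimal no-op waker so the example can be polled without an executor.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

fn main() {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut c = Contents { started: false };
    assert_eq!(drop_box_contents(&mut c, &mut cx), Poll::Ready(()));
}
```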

The point of this is to demonstrate that it is possible to support futures and have the convenience of them being run by Box<dyn T>, with only a small cost for code that doesn't use futures: some extra branching, and extra Future space required when dropping a Box<dyn T>.

(Just to emphasize again: I agree with using polling at least as a first approach.)


For ergonomics it might be worthwhile to put forward ideas for ensuring async destructors get run as you'd expect - a basis for "docs and lints". Consider:

  1. It is easy to call functions that drop objects without calling the async destructor. For example calling vec.clear() or iterating over vec.into_iter().filter(...) where vec: Vec<T>.
  2. Some types won't call async destructors on their contents. Even with automatic compiler drop glue, there are existing types with custom layout such as in smallvec-1.0.0.

One solution is to add a new widening bound ?TrivialAsyncDestructor, to be used in generic fn definitions. This is similar to what has been suggested for undroppable/unforgettable types (maybe there is a better Pin-like approach?). In the presence of a widening bound T: ?TrivialAsyncDestructor, any x: T parameter cannot be passed to a function that does not specify ?TrivialAsyncDestructor. The standard function std::mem::drop would not have the widened bound, but std::mem::forget would. Any implicit drop would be treated like a call to std::mem::drop. Violating this rule produces an error, which can be disabled by code like an executor (which is ultimately responsible for "consuming" the ?TrivialAsyncDestructor bound).

The rule ought to ensure that an object with a non-trivial async destructor is never dropped without first being async dropped, except perhaps during unwinding from a panic. I think the semantics are something like this: an object can "own" another object, or can "async-own" another object. The latter is a stronger property; a type with custom layout might own but not async-own objects that have been passed to it. An object with a non-trivial async destructor should always be part of a tree of async-ownership, with the root nodes being the heap or stack.

This rule prevents problem 1 directly, and prevents problem 2 because you wouldn't be allowed to pass the object into SmallVec::push, for example. There might be a lot of extra ?TrivialAsyncDestructor churn if people want to use lots of standard functions with objects with async destructors. But it could be viable to just use it on a few Future implementations like Join - code using async destructors would be limited but reliable.

With a rule like this, it could be quite easy to disallow putting a ?TrivialAsyncDestructor object into a Box<dyn T>, which would make async destructors zero-cost while still being reliable.