Blog post: A formulation for scoped tasks

I just posted A formulation for scoped tasks about scoped async tasks in Rust. From the introduction:

In this post, I attempt to state as precisely as possible the essential features that we might want from scoped spawn APIs and where those run into fundamental language constraints. In particular, by essential here I mean that removing the feature:

  • Makes a meaningful difference in the user experience, but
  • Also makes the problem tractable in Rust today.

My hope is that this creates a useful framework for understanding and reasoning about the problem space.

Let's use this thread for discussion of the post.


(NOT A CONTRIBUTION)

The language changes described as unexplored options here are non-starters for a ton of reasons, but one thing I never see discussed when people suggest the leak decision was a mistake is that futures would need to implement Leak to be spawned. When you spawn a future on a multitasking executor, it stuffs the state of that future into an Arc to form the body of the Waker that will be used to re-enqueue it. To do that safely, that future would need to be leakable, and this whole notion of unleakable types would not even be relevant.

I'm also not sure what "unsafe contract that it will be polled to completion" means in the context of passing the future to a reactor and waiting for that reactor to re-enqueue it. Even if these weren't unworkable for backwards compatibility reasons, I suspect they may not be workable period.


The way I’m imagining it, it would have to go through a couple of control layers that would make it significantly less ergonomic to use than “just capture a reference in your async block”. I’m also not sure yet if it would be sound. But it might be worth tinkering with, if only to see more clearly the limits of what the language can express

I don't think there's a zero-overhead sound solution for borrowing with parallelism; I think in the end the only sound thing to do would be to put them in Arcs, and then they're 'static, so you've just arrived back at spawn. You should explore your ideas, though, because I could be wrong.


I think the best solution is to distinguish between spawns that introduce parallelism by creating a task that can be stolen separately from this one, and spawns that don't introduce parallelism but also don't have to be 'static. In effect, users already make this distinction when they use both spawn and FuturesUnordered. The performance footguns in FuturesUnordered should be investigated, and one should ask whether something like a scoped API would be less footgunny.


Need this be the case? If the future is scoped, then can't its state (and body of the Waker) reside in/be owned by the scope's stack frame, avoiding an Arc?

(NOT A CONTRIBUTION)

Then what do you put into the Waker, so that the task gets re-enqueued? The Waker needs to be 'static. Probably some sort of index to identify which task is getting re-enqueued, but if the waker outlives the scope and is awoken later you're getting a runtime panic, because the waker isn't tied to that scope at all. And you need some kind of reference counting there too so that the index doesn't get reused while the waker is still alive, causing the waker to enqueue a totally different task. No one ever talks about how this aspect of the system would work in the magic world where the soundness issue was allegedly resolved by a type system change.

Similarly, the idea that you could have an unsafe poll method which has a contract "will poll to completion" is way too vague to be viable. First of all, you can't have a soundness invariant that something WILL happen, only that it WON'T happen - in extremis, an asteroid could hit the earth and then the poll will not finish. Of course what this requirement actually wants to state is that some other code will not run unless this future's poll method has already returned Ready. Actually trying to specify what code will not run until the poll returns ready and making that work with the underlying task/wake model is where I expect serious problems would re-emerge.

Maybe this would all be possible with a completely redesigned async system or completely redesigned ownership model, but we just don't know. But then, because the surface-level problem can be resolved by saying that the type system would just work differently, people toss it around as if it's a viable alternative we just didn't pursue out of ignorance or poor design. What people don't seem to understand is that it's not like we have a working alternative model waiting to go if we just broke compatibility and switched to it.


(NOT A CONTRIBUTION)

Sorry to spam a bit but I've thought a bit more about this:

The way I’m imagining it, it would have to go through a couple of control layers that would make it significantly less ergonomic to use than “just capture a reference in your async block”. I’m also not sure yet if it would be sound. But it might be worth tinkering with, if only to see more clearly the limits of what the language can express

It's helpful to relate scoped tasks to io-uring. It's not a coincidence that people think they would both be solved by some sort of linear typing / guaranteed destructors: the soundness issue underlying them is exactly the same. In both cases, you want to share a reference with another process (in a CSP/system model sense) - in io-uring that process is the kernel, in scoped tasks that process is another thread in your program. The fundamental problem is that Rust's static lifetime analysis can't accommodate dynamic process scheduling.

However, io-uring is an instructive example, because there are 3 ways I know of to make that data sharing sound.

The simplest is just passing ownership. You can pass it back when the task is done, too (e.g. by returning it from the future or passing it back through a channel). Of course this means only the parallel task can have access to the data until it's done with it.

Then there's shared ownership. You put the data into an Arc and both tasks hold it. Now they can both access it in parallel. If they need exclusive access you use a runtime coordination primitive like a Mutex.

Finally, there's the trick used to make something like a BufReader work for io-uring. The parent task holds ownership of the data and shares it with the child task by reference, but uses a runtime check to make sure it never accesses it again until the child returns. Note that this means the scope object on the parent task needs to own the data, even though it passes it to the child(ren) by reference. And it would also need to be at least pinned; or maybe that won't work and it must be heap allocated (the BufReader definitely relies on the fact that the buffer is known to be heap allocated to deal with dropping the owner handle).

The child(ren) tasks would give back their lease on the data as their handle to it drops. To get the data back on the parent task, there would be some accessor on the scope that awaits all the children's handles dropping. If a child task's handle leaks, too bad: the data is leaked as well and this accessor will await indefinitely.

I think this is probably basically what tmandry had in mind.

I ran into this issue recently as well and was surprised to see that this is not possible in Rust today. I'm not sure, but it seems there can't be a fundamental problem with scoped async, since it is possible with "normal" threads and the underlying problem is the same.

The reason scoped tasks work in non-async Rust is that one can rely on linear control flow guaranteeing that a function will not simply stop halfway. In async Rust, due to cancellation, this can happen at any await point. One possible language-level solution that I wonder about is some kind of async-compatible defer mechanism: a deferred statement would run at the end of the scope, even if the future is cancelled. This way, there's no need to rely on Drop at all, and the solution is more similar to how it is done in non-async Rust. Not sure if I'm missing something; could this work?

Future cancellation is Drop; there's no difference. The guarantee which is needed for scoped async to be sound is exactly that the future isn't forgotten; cancellation/drop is perfectly acceptable.

The following is adapted from the linked blog post.

Scoped futures are fundamentally unsound so long as they can be forgotten.

The fundamental difference with scoped threads is that a thread scope blocks when you exit it.

let data = ..;
thread::scope(|s| {
    s.spawn(|| worker_thread());
    s.spawn(|| { victim(&data); });
    // here we block until all scoped threads have finished
});

The same API would actually be sound for async:

task::scope(|s| {
    s.spawn(worker_task());
    s.spawn(async { victim(&data).await; });
    // here we block until all scoped tasks have finished
});

but only if task::scope is synchronous; this is, in effect, the equivalent of writing

task::block_on(async {
    join! {
        worker_task(),
        async { victim(&data); },
    }
});

and hey, this is sound and allowed in all implementations of block_on! But blocking obviously isn't what we want; and if we return a future from scope, I can forget it:

let tasks = task::scope(|s| {
    s.spawn(worker_task());
    s.spawn(async { victim(&data).await; });
    // tasks continue running until the scope is awaited
});
// oops, I didn't await the scoped tasks
forget(tasks);
// the scope lifetime is over but the tasks are still running
// and now I can cause all sorts of UAF havoc, like just
drop(data);

Well, alright, make it a macro and include .await in the macro to ensure it gets awaited. Unfortunately, we've only temporarily deferred the issue:

let tasks = Box::pin(async {
    task::scope!(|s| {
        s.spawn(worker_task());
        s.spawn(async { victim(&data).await; });
        // tasks are awaited here
    });
});
// advance `tasks` far enough to spawn the subtasks
runtime::poll_once(tasks);
// oops I forgot to drive the future to completion
forget(tasks);
// subtasks are still running, lifetime over, mayhem time
drop(data);

The only way in which spawning a scoped task can be sound is if the root future is 'static[1] (thus leaking it is also leaking any resources internal scoped tasks borrow), or if you somehow require the futures to be polled to completion and/or dropped.

Of course, this doesn't matter all that much, actually, because the real scoped task concurrency is just join!. If polls are relatively short and don't block, there's not much difference between join!(spawn(a), spawn(b), spawn(c)) and join!(Box::new(a), Box::new(b), Box::new(c)); that's the entire point of async: concurrency without spawning.

I'm somewhat tempted to implement a task::scope-looking interface around FuturesUnordered to prove a point here[3]. Yeah, it's unfortunate that a single task having an unexpectedly long poll prevents making progress on the others in the same cluster, but that's the ticket price for multiplexing many tasks on one system thread.


  1. This is actually an approach I've rarely seen mentioned, and that was in the scope of async Drop rather than scoped spawned tasks. This would be nearly as invasive to adopt as a Leak auto trait (but only for async-related code), but I do believe an unsafe trait UnsafeFuture supertrait to Future with the requirement of being polled to completion or dropped before its lifetime ends, where futures containing scoped spawns implement UnsafeFuture instead of Future, does make scoped task spawns sound, and UnsafeFuture + 'static can complete back to Future. The justification is basically that since they're 'static, I could safely record all of them that have been started and poll them again whenever, justifying the progress made on the subtasks. (This justification doesn't quite cover Pin<&'short mut UnsafeFuture + 'static>, but even if our loan expires, it's still either there or dropped, theoretically able to be polled by someone.) While you're making UnsafeFuture and getting everyone to thread it around, though, might as well also include a guarantee of async Drop[2] as well, so those use cases can also benefit. ↩︎

  2. And just a bit of a tangential thought: async Drop is fine if not doing it leaks, and it's just a resource optimization over sync Drop; the unsound issues are when Drop (whether async or not) getting executed is relied on for soundness; it's all different versions of the same issue of relying on sub-'static leak freedom. My observation here is purely that leaking specifically 'static values cannot cause any soundness problems caused by lack of leak freedom. ↩︎

  3. Oh, and this API is beneficial even for 'static tasks which can actually be spawned, if the spawn function is provided by the async Context, since it provides a poll point to pull it out and hand it to the async world. ↩︎


Future cancellation is Drop; there's no difference. The guarantee which is needed for scoped async to be sound is exactly that the future isn't forgotten; cancellation/drop is perfectly acceptable.

How does that work from inside an async context? I was thinking about this from the perspective of already async code spawning an async task, for example:

pub async fn some_async_function(data: &Data) {
    let shared_guard = SharedGuard::new(); // NOTE: when dropped it will wait (blocking) for the lock
    let task = task::spawn_local(async {
        let _scope_guard = shared_guard.lock();
        victim(data).await;
    });
    task.await;
    shared_guard.lock(); // NOTE: this is blocking
}

In the above example, if the caller forgets the some_async_function future, is that unsound? Either the future some_async_function is cancelled, in which case everything is fine, or it is forgotten, in which case it will keep running and eventually reach shared_guard.lock() and the lifetime constraint is upheld. What am I missing here? :thinking:

Okay, after some thinking I realised that the above code is just as wrong. Even though it is guaranteed that some_async_function will reach the blocking lock on the guard at some point, assuming that it is driven to completion, the caller (or, usually, the runtime) could in theory forget the future without it running to completion at all. The only way to prevent that is to block in place on the task: this effectively locks out the runtime/caller and prevents them from forgetting the future, sidestepping the issue (and defeating the purpose of async tasks).

Of course, this doesn't matter all that much, actually, because the real scoped task concurrency is just join!. If polls are relatively short and don't block, there's not much difference between join!(spawn(a), spawn(b), spawn(c)) and join!(Box::new(a), Box::new(b), Box::new(c)); that's the entire point of async: concurrency without spawning.

I ran into the scoped future issue when building a custom Future that offloads all of its work to a single worker thread (I posted on r/rust about this a couple of weeks ago here: https://www.reddit.com/r/rust/comments/11ivhxk/how_to_deal_with_scoped_closure_in_async_custom/). Since spawning work on the inner worker thread requires a 'static lifetime bound, I'm running into an issue similar to this scoped task problem (but not entirely the same).

I'm surprised there's no way for getting around this issue. Especially since the unsafety arises only in the pathological case. I opted for simply putting a big fat warning on my code (which is not open source for now anyway) not to forget the future.

I'm really hoping that someone will be able to come up with a solution. If only there was a way to just panic and crash the program when the pathological case comes up, that would be enough (but being able to do that seems as hard as solving the issue itself, if I understand correctly).