Continuing the discussion from Async/Await series:
I’ve been thinking about the questions posed in @withoutboats’ conclusion, of how stable async functions will work. More specifically, how they might handle resumption into nested call stacks. `Pin` offers an interesting change we could make there that, I think, would give us a more solid conceptual framework (continuations) and a faster and more straightforward implementation (directly resuming nested futures).
Here’s a simple example using strawman syntax, which I will refer back to:
```rust
// spawn a task running f
tokio::run(async || {
    println!("{:?}", f("a"));
});

// await three callee async fns
async fn f(s: &str) -> io::Result<T> {
    let a = fetch(&format!("https://example.com/{}.txt", s))?;
    let b = fetch(&format!("https://example.com/{}.txt", a))?;
    let c = fetch(&format!("https://example.com/{}.txt", b))?;
    let r = process(a, b, c)?;
    Ok(r)
}

// some async fn that presumably suspends several times
async fn fetch(url: &str) -> io::Result<String> { /* ... */ }
```
The current design, as I understand it, has the executor hold a single allocation for the outer-most generator (the closure passed to `run`), which contains the inner generators. Each call to `poll` walks into the inner-most generator and then back out. For example, on the first invocation:
- The closure calls `f`, `f` calls `fetch`.
- `fetch` starts the first request and gives the `Waker` to the event loop.
- `fetch` saves its state and returns; `f` saves its state as “awaiting `fetch`” and returns; the closure saves its state as “awaiting `f`” and returns.
Starting at the outer-most future every time a task resumes may be useful in some cases, but it is unfortunate for these typical straight-line `async fn`s:
- The entire call stack is (de)serialized for every call to `poll`, including frames that won’t be used that time.
- Every suspension point is a loop in the generator, making the resulting binary larger.
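To make those two bullets concrete, this is roughly the shape each await point takes inside the generated state machine today. It’s illustrative generator pseudocode, not the compiler’s actual expansion:

```rust
// Sketch of one await point inside the generator produced for `f`
// (illustrative only; names and details are not the real desugaring).
let mut fetch_future = fetch(&url);
let a = loop {
    // Every poll of the task re-enters this loop from the outside.
    match unsafe { Pin::new_unchecked(&mut fetch_future) }.poll(cx) {
        Poll::Ready(result) => break result?, // fall through to the next await
        Poll::Pending => yield,               // suspend the entire nested call stack
    }
};
```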
One solution comes from framing this in terms of “continuations.” A continuation represents “the rest of” some suspended program. You invoke it and the program resumes directly from the point it was suspended. A program can be transformed into continuation-passing style (CPS), usually by a compiler, so that functions never return, and instead invoke their caller’s continuation explicitly.
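As a tiny illustration of what CPS means here (nothing Rust-async-specific; the functions are made up for the example), a direct-style function returns its result, while the CPS version is handed “what to do next” and invokes it instead of returning:

```rust
// Direct style: the caller gets the result back and continues on its own.
fn add_direct(a: i32, b: i32) -> i32 {
    a + b
}

// Continuation-passing style: the caller hands over "the rest of the program"
// as a closure, and `add_cps` invokes it instead of returning a value.
fn add_cps(a: i32, b: i32, k: impl FnOnce(i32)) {
    k(a + b)
}

fn main() {
    let sum = add_direct(1, 2);
    println!("{}", sum);

    add_cps(1, 2, |sum| println!("{}", sum));
}
```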
Using continuations, instead of resuming generators from the outside in, the task resumes directly into the inner-most generator, because that’s the continuation from the last suspension. When that generator suspends again, it is the only one that is touched. When it completes, it resumes its caller (via a tail call) because that’s its continuation.
I encountered this fusion of generators and continuation-passing style in Kotlin, where it’s enabled by a slightly different model for “coroutines,” basically a generalization of Rust “generators.” Using those terms:
- Like a generator, a coroutine is compiled into an anonymous type’s `resume` method, which dispatches on its current state and returns at suspension points.
- Unlike a generator, a coroutine is initialized with a reference to its caller coroutine, which it saves until it completes, when it resumes it.
- Unlike a generator, suspending a coroutine is done with a call/cc-like intrinsic. Call/cc takes a block, runs it with the current continuation as an argument, and then suspends the coroutine, yielding the value of the block. On resume, call/cc appears to return the passed-in value. This has two uses:
  - To yield, the block saves the continuation and returns the yielded value.
  - To await another coroutine, the block initializes the callee with the continuation and resumes it via a tail call.
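In Rust terms, the interface described above might look roughly like this. To be clear, these traits and names are invented for illustration; nothing here corresponds to an existing std or nightly API:

```rust
use std::pin::Pin;

/// Hypothetical: "the rest of" a suspended coroutine, resumable with the
/// value it was waiting on.
trait Continuation<T> {
    fn resume(self: Pin<&mut Self>, value: T);
}

/// Hypothetical: a coroutine is handed its caller's continuation when it is
/// started, keeps it across its own suspensions, and invokes it exactly once,
/// when it completes, instead of returning a value up a chain of `poll`s.
trait Coroutine {
    type Output;
    fn start(self: Pin<&mut Self>, caller: *mut dyn Continuation<Self::Output>);
}
```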
To make this work in Rust, we need a few things:
One is a way to tell the executor where to resume, because now all the resume points are after the calls to the awaited functions, which are no longer resumed in loops. This could perhaps be done by including that information with the `Waker` on suspend; or else by storing it in the outer-most generator somehow.
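One way to picture the second option, reusing the hypothetical `Continuation` trait from the sketch above (again, every name here is invented for illustration):

```rust
use std::pin::Pin;

// Purely illustrative: a task owns the whole nested generator tree in one
// allocation and additionally remembers which suspended generator to resume.
struct Task<G> {
    /// The outer-most generator, which contains all of its callees inline.
    root: Pin<Box<G>>,
    /// Pointer into `root` at the inner-most suspended generator; written on
    /// each suspend, read by the executor when the task is woken.
    resume_at: *mut dyn Continuation<()>,
}
```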
Another is a way for the outer-most generator to complete, since it will have no generator to resume. This might be done with a stub implementation of the continuation interface that bridges the two worlds by immediately returning to its resumer. There may be a better way, or perhaps some optimizations that can be applied in some situations.
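For instance (still using the invented `Continuation` trait from earlier), the stub might be a trivial continuation whose `resume` just records the final value and falls off the end, handing control straight back to whoever resumed the task:

```rust
// Illustrative stub for the outer-most generator's "caller": completing the
// root generator resumes this, which stashes the result and simply returns,
// popping back out to the executor.
struct RootContinuation<T> {
    result: Option<T>,
}

impl<T> Continuation<T> for RootContinuation<T> {
    fn resume(self: Pin<&mut Self>, value: T) {
        // Safe because nothing is moved out of the pinned value here.
        unsafe { self.get_unchecked_mut() }.result = Some(value);
        // Falling off the end returns to the resumer, i.e. the executor.
    }
}
```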
The last thing is a way for callee generators to store their caller’s continuations, as a replacement for the current (de)serialization between a) the caller generators’ state in the loop that resumes the callee and b) the return address and `self: Pin<Self>` saved on the call stack. This is where `Pin` comes in! The callee generator can now store a pointer to its caller’s state block, for use when it completes and tail-calls into its caller.
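A rough picture of what the generated state type for `fetch` might then hold, continuing with the hypothetical `Continuation` trait sketched earlier (this is a sketch of the layout, not proposed compiler output):

```rust
use std::io;
use std::pin::Pin;

// Illustrative state machine for a callee like `fetch`. Because the whole
// task is pinned, it can safely keep a raw pointer back to its caller.
struct FetchGenerator {
    state: FetchState,
    /// Caller's continuation, stored at initialization; `Pin` guarantees the
    /// caller's state block won't move while this pointer is live.
    caller: *mut dyn Continuation<io::Result<String>>,
}

enum FetchState {
    Start { url: String },
    AwaitingResponse { /* connection, buffers, ... */ },
    Done,
}

impl FetchGenerator {
    /// Called when the body finishes: resume the caller directly (conceptually
    /// a tail call) instead of returning `Ready` up through nested `poll`s.
    fn complete(self: Pin<&mut Self>, value: io::Result<String>) {
        let this = unsafe { self.get_unchecked_mut() };
        this.state = FetchState::Done;
        unsafe { Pin::new_unchecked(&mut *this.caller).resume(value) };
    }
}
```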
This approach does have a few downsides. Of course, it’s still relatively vague as far as actual types and signatures go, so some problems may turn up there. Here is what I’ve thought about so far:
One issue is that generator completion involves dynamic dispatch. This is not actually a performance problem: it’s the same as returning from a function (minus any call/ret prediction the CPU normally does), and with MIR inlining before state machine generation it can be optimized the same way. It is perhaps a type problem, unless it can all be kept post-type-checking somehow.
A bigger issue is that it makes all generators immovable, as opposed to just the ones that keep references to local variables across suspension points. First, because callees have pointers to their callers, which are in the same allocation. Second, because `Waker`s are accompanied by references to the generators. This is unfortunate because AIUI avoiding that was part of the reason for that design to begin with.