That's unfortunate, because there are some serious implementation challenges (LLVM…) and some serious inherent problems (calling conventions) with the general form of `become`. Maybe the latter are not relevant here, but if we'll never get the general source-level feature because of them, solving all the implementation challenges just for generators is still a tough pill to swallow.
One angle is that, if it turns out not to be a big enough perf win to bother fighting LLVM over it, at least we’ll be compatible with a CPS implementation such that another backend or compiler could support it. That would involve keeping the calling convention for await unstable semi-permanently, but I’m not sure that’s an issue.
I think I found a mostly equivalent formulation of this proposal that might be simpler to understand and implement.
The idea is to add a variant of `Future::poll` to `Future` with an additional `Async::Nested` return value that means "run this other `Future`, then resume this one with its result": it would return a reference to the nested `Future` along with a pointer to a resumption function.

When this is returned, the executor would push the reference to the `Future` and the resume function onto a stack associated with the future's task, and on subsequent polls it would start directly with the top of this stack.

Once a `Future` from the stack returns `Ready`, its entry is popped from the stack and the returned value is passed to the resume function.
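As a toy model of this protocol (all names here are hypothetical; the nested future is heap-allocated for simplicity where the proposal would hand out a borrowed reference, and the type-erasure question is sidestepped by fixing every result to `i32`):

```rust
// Toy model of the stack-driven protocol described above. All names are
// hypothetical; the nested task is heap-allocated for simplicity (the
// proposal would hand out a borrowed reference instead), and every task
// produces an i32 to sidestep the type-erasure question.
enum Step {
    // This task is finished with this value.
    Ready(i32),
    // "Run this child task first, then resume me with its result."
    Nested(Box<dyn Task>),
}

trait Task {
    fn poll(&mut self) -> Step;
    fn resume(&mut self, child_result: i32);
}

// A leaf task that completes immediately.
struct Leaf(i32);
impl Task for Leaf {
    fn poll(&mut self) -> Step { Step::Ready(self.0) }
    fn resume(&mut self, _: i32) {}
}

// A task that first defers to a child, then finishes with child + 1.
struct Outer { child_result: Option<i32> }
impl Task for Outer {
    fn poll(&mut self) -> Step {
        match self.child_result {
            None => Step::Nested(Box::new(Leaf(41))), // first poll: nest
            Some(r) => Step::Ready(r + 1),            // resumed: finish
        }
    }
    fn resume(&mut self, r: i32) { self.child_result = Some(r); }
}

// The executor keeps a stack per task and always polls the top entry;
// when an entry completes, it is popped and its parent is resumed.
fn run(root: Box<dyn Task>) -> i32 {
    let mut stack: Vec<Box<dyn Task>> = vec![root];
    loop {
        match stack.last_mut().unwrap().poll() {
            Step::Nested(child) => stack.push(child),
            Step::Ready(v) => {
                stack.pop();
                match stack.last_mut() {
                    Some(parent) => parent.resume(v),
                    None => return v,
                }
            }
        }
    }
}

fn main() {
    assert_eq!(run(Box::new(Outer { child_result: None })), 42);
    println!("ok");
}
```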
Note that there's a complication: `Future` is obviously generic in its output type `T`, and nested futures can have any type. To address this, the size of `T` needs to be returned along with the nested future reference, and the future's return value needs to be passed with an ABI that depends only on the size of `T` (e.g. instead of using the function return value, pass an `&mut [u8]` that is then transmuted to `&mut RawOption` (a version of `Option` with layout optimizations disabled) on a "GenFuture" trait that `Future` would depend on).
There's also the complication that the reference to the inner `Future` would be an `&'a mut` with the same `'a` lifetime as the `self` argument to the `poll()` variant, which would not be safely storable. However, the executor's future task could itself be implemented with an immovable generator, which would allow those references to be stored on the generator stack in safe code. In fact, this code does not need to be tied to the executor: it can essentially be a "Future combinator" that lets executors that only support the current `Future` interface take advantage of future stacks supporting the new interface, without the two being directly coupled.
If a `Future` does not implement this new poll variant, a default implementation would just call the current `poll()` and never return `Async::Nested`, giving the current behavior with full compatibility.
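A minimal sketch of how such a defaulted method could preserve compatibility, using simplified stand-in types rather than the real `Future`/`Async` definitions (`SimpleFuture`, `NestedPoll`, and `poll_nested` are hypothetical names):

```rust
// Sketch of the compatibility story with simplified stand-in types;
// `SimpleFuture`, `NestedPoll`, and `poll_nested` are hypothetical names,
// not the real futures API.
enum NestedPoll<T> {
    Ready(T),
    Pending,
    // The real proposal would carry the nested future and a resume fn here.
    #[allow(dead_code)]
    Nested,
}

trait SimpleFuture {
    type Output;
    // Stand-in for today's poll(): Some = Ready, None = NotReady.
    fn poll(&mut self) -> Option<Self::Output>;

    // The new variant. Existing futures get this default for free and
    // never return Nested, preserving current behavior.
    fn poll_nested(&mut self) -> NestedPoll<Self::Output> {
        match self.poll() {
            Some(v) => NestedPoll::Ready(v),
            None => NestedPoll::Pending,
        }
    }
}

// An "old-style" future that only implements poll().
struct Now(i32);
impl SimpleFuture for Now {
    type Output = i32;
    fn poll(&mut self) -> Option<i32> { Some(self.0) }
}

fn main() {
    let mut f = Now(7);
    // The defaulted poll_nested() works without Now opting in.
    assert!(matches!(f.poll_nested(), NestedPoll::Ready(7)));
    println!("ok");
}
```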
To support async/await, a new kind of generator would be introduced that can take a value when resumed, and can also provide a function to call directly on resumption (using a CPS transform); the async/await macros would start using it. There's no need for tail call optimization, since the "tail call" is essentially performed by returning the address of the function to be called and letting the executor run it.
This is slightly inefficient, so it could be addressed by passing a reference to the executor stack to the `poll()` variant and letting the `Future` itself push entries onto the executor stack and then directly call the child future on the first call (this would ideally be a tail call, but a non-tail call wouldn't hurt much either).
Furthermore, the futures themselves could potentially drive the stack, which might be slightly more efficient. If the stack were implemented as a linked list of slots in each future rather than as a central stack, this should result in the same full CPS-based system described by @rpjohnst (however, this would be a less "idiomatic Rust" approach and would put the stack slots in the layout of unexecuted futures; on the other hand, it would save the memory allocation for the central stack).
Again, this is essentially equivalent to @rpjohnst's idea, but I think this formulation might be interesting, and it also makes it clearer that this should be a mostly or fully compatible change in the future if desired.
That said, I'm not sure whether it's really a good idea: it's probably going to reduce performance in simple cases with little or no nesting in exchange for better performance in complex cases with deep nesting, and it would take quite some effort to implement, so I'm not sure it's worth it.
Returning continuations is easy when you can allocate them on the heap, which is why managed languages usually implement generators this way.
However, when you try to do it without using the heap, as Rust's zero-overhead philosophy requires, you quickly get into the weeds with dynamically-sized return values, tail calls, and whatnot, as you demonstrated above. I'm not even sure such machinery can be supported on all platforms (wasm comes to mind).
If performance of deeply nested futures proves to be a problem, IMO, it will be easier to add an optimization pass that consolidates them into a single state machine.
There are no "dynamically-sized return values" here, unless you're referring to `*mut dyn Generator`, which is nothing but a typical trait object. That is hardly "in the weeds"!
As far as tail calls go, while LLVM isn't great at providing them, that's no reason not to go with a design that is forward-compatible with them: the actual hardware does support them, LLVM does provide them in controlled circumstances, WebAssembly has a proposal for them, etc.
But more importantly, implementing this transform without them is still a net positive. The maximum amount of stack growth in that case remains the same as today: it doesn't shrink, but it doesn't grow either. At the same time, the typical amount of stack growth shrinks, and the number of calls and returns is reduced. Notably, the JVM doesn't do reliable tail calls either, but Kotlin uses this transform anyway.
It would be deeply unfortunate to close off the possibility of this optimization to the language, solely because it’s hard to implement in full on every target using our current backend. And on top of that, the forward-compatible design has other benefits of its own:
- No need to go through `Future::poll` for every await, leaving `futures` squarely in the library space.
- This also makes debugging nicer: less library code to step through, and a reified generator-side call stack that debuggers could display.
- No confusion about the meaning of calling an async function from a sync function: the transition is clearly marked.
- This resolves the concern, which has come up before, about whether async functions should have `Future` in their signature, in favor of leaving it out.
Is there any possibility of implementing

```rust
async fn join<A, B>(a: A, b: B) -> Result<(A::Item, B::Item), A::Error>
    where A: Awaitable, B: Awaitable<Error = A::Error>
```

and

```rust
async fn select<A, B>(a: A, b: B) -> Result<something far too complex for this example>
    where A: Awaitable, B: Awaitable
```

(equivalents of `FutureExt::join` or `FutureExt::select`)?
I’ve been trying to think of how they would work, but I think that they would require the ability to pass multiple continuations back that then need to run in parallel.
It seems to me that if these operations are still going to have to go through the continuation <-> futures bridge then there’s likely to be a relatively high number of these transitions in most use cases.
The key is not so much the futures bridge, but the ability to resume `join` or `select` while `a` and/or `b` are still in progress. You could implement them without futures just by re-using the standard library's generator-driving API in their implementation.
The bridge shouldn't be any more expensive than today's `await!` macro, though, so it may be unnecessary unless you're trying to avoid the `Future` trait entirely for some reason.
I am still having difficulty picturing in my mind how your proposal would work "at the assembly level". I think you might have an easier time convincing people around here if you could provide more code to go with your explanations.
Does this mean that all futures become immovable? What about iterators? IMO, this would be rather unfortunate, especially the latter.
Well, Rust is a systems language, and in the past we did get rid of goroutine-style IO with its segmented stacks, in large part because of the implementation difficulties it created for using FFI, being embeddable, running on more exotic platforms, etc.
I think this change can be made compatibly later if desired, without any need for changes now, and async/await code would be able to take advantage of it automatically, without any changes.
In particular, having async functions return a `Future` doesn't seem to be a problem, since you can change `Future` to add a new method that supports the CPS style (either in the direct mode proposed originally or in the externally driven mode I outlined in my comment), with a default implementation that calls `poll()`.
Fair enough! Do you have any specific questions? I attempted some example output here for the full CPS version; here's some more detail on how I see the future-compatible version working:
- As today, generators are anonymous types implementing the `Generator` trait with a method `unsafe fn resume(&mut self) -> GeneratorState`. Also like today, `let x = f(); yield y; g(x)` is compiled down to something like this in that `resume` method:

  ```rust
  match *self {
      State0 { y } => {
          let x = f();
          *self = State1 { x };
          GeneratorState::Yielded(y)
      }
      State1 { x } => {
          let r = g(x);
          *self = Done;
          GeneratorState::Complete(r)
      }
      Done => panic!("resuming completed generator"),
  }
  ```
  (Of course `*self` doesn't have to be laid out like an enum, but the basic idea is the same.)
- Unlike today, generators support an additional operator: `await`. Unlike today's `await!` macro, which works on futures, this operator works on calls to other generators, for example `await g(x, y, z)`.

  To stick as closely as possible to today's implementation while remaining forwards-compatible, it might be equivalent to this (in the generator source):

  ```rust
  let g = G::State0 { x, y, z };
  loop {
      match g.resume() {
          GeneratorState::Yielded(x) => yield x,
          GeneratorState::Complete(x) => break x,
      }
  }
  ```
  Which would compile down to something like this in the `resume` method:

  ```rust
  match *self {
      // ... previous states ...
      BeforeG { x, y, z } => {
          // ... previous code ...
          let g = G::State0 { x, y, z };
          *self = AwaitingG { g };
          /* !!! FALLTHROUGH to AwaitingG !!! */
      }
      AwaitingG { ref mut g } => {
          match g.resume() {
              GeneratorState::Yielded(x) => return GeneratorState::Yielded(x),
              GeneratorState::Complete(x) => {
                  // ... next code ...
              }
          }
      }
      // ... next states ...
  }
  ```
  The key here is that it's a particular call to the generator `g` that gets `await`ed directly, not a future. This way the calling convention for `await` can be changed later to use CPS:

  ```rust
  match *self {
      // ... previous states ...
      BeforeG { caller, x, y, z } => {
          // ... previous code ...
          let g = G::State0 { caller: self as *mut dyn Generator, x, y, z };
          *self = AfterG { caller, g };
          become (self as AfterG).g.resume();
      }
      AfterG { caller, g } => {
          /* get g's return value, ideally by taking it as an argument to `resume` */
          // ... next code ...
      }
      // ... next states ...
  }
  ```
  This version makes the generator state `f` look like this:

  ```rust
  f: F::AfterG {
      g: G::State0 {
          caller: TraitObject(&f, &F::resume),
          x, y, z,
      }
  }
  ```
- Also unlike today, generators cannot be constructed directly. In the final version from the above point, they always have a `caller` member: a trait object whose data pointer points to its awaiter's state and whose vptr points to its awaiter's `resume` method.

  To accommodate this, a CPS-based top-level generator needs to be constructed with a fake awaiter, whose `resume` method just does this:

  ```rust
  let r = /* get the generator's return value as above */;
  return GeneratorState::Complete(r);
  ```
  To be forward-compatible with this change, generators need to be constructed and driven via a standard library API whose internals are not stabilized. Today, it might look like this:

  ```rust
  struct RunningGenerator<G: Generator>(G);

  impl<G: Generator> RunningGenerator<G> {
      fn new() -> Self { /* construct a dummy generator that will panic on resume */ }
      fn init(self: Pin<Self>) { /* construct a G using a private static method on `Generator` */ }
  }

  impl<G: Generator> RunningGenerator<G> {
      type Yield = G::Yield;
      type Resume = G::Resume;
      type Return = G::Return;

      fn resume(self: Pin<Self>, value: G::Resume) -> GeneratorState {
          self.0.resume(value)
      }
  }
  ```
  Then in the future, it might be changed to look like this:

  ```rust
  struct RunningGenerator<G: Generator> {
      generator: G,
      next: *mut dyn Generator, // self-referential
  }

  struct FinalContinuation;

  impl Generator for FinalContinuation<Return> {
      fn resume(&mut self, value: Return) -> GeneratorState {
          GeneratorState::Complete(value)
      }
  }

  impl<G: Generator> RunningGenerator<G> {
      fn new() -> Self { /* same as above */ }
      fn init(self: Pin<Self>) {
          /* construct a G using a &FinalContinuation<G::Return> */
          /* point `self.next` to `self.generator` */
      }
      fn resume(&mut self, value: G::Resume) -> GeneratorState {
          self.generator.resume(value);
          /* tweak the actual generator ABI to yield the inner-most continuation; save it in self.next */
      }
  }
  ```
(This is where we hit the typing problem I mentioned in the first downside I listed in the initial proposal; it doesn't come up in the forward-compatible version, only in the CPS version. It's the type of `RunningGenerator`: as written here, I'm not totally confident of the types of `next` and `resume`, and I've left them only partially specified.)
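The forwards-compatible expansion of `await` sketched above can also be written out as runnable code. This is a toy model, not actual compiler output: the inner generator `G` and outer generator `F` are hand-written state machines with payloads chosen purely for illustration (`G` completes with `x + 1`, `F` doubles `G`'s result), and the state fallthrough is modeled with a `loop`.

```rust
// Runnable toy version of the await-by-delegation expansion: the outer
// state machine F forwards the inner generator G's yields and captures
// its return value. Payloads are chosen purely for illustration.
enum GeneratorState<Y, R> {
    Yielded(Y),
    Complete(R),
}

// Inner generator: yields once, then completes with x + 1.
enum G { State0 { x: i32 }, State1 { x: i32 }, Done }

impl G {
    fn resume(&mut self) -> GeneratorState<i32, i32> {
        match *self {
            G::State0 { x } => { *self = G::State1 { x }; GeneratorState::Yielded(0) }
            G::State1 { x } => { *self = G::Done; GeneratorState::Complete(x + 1) }
            G::Done => panic!("resuming completed generator"),
        }
    }
}

// Outer generator: roughly `let r = await g(x); r * 2`.
enum F { BeforeG { x: i32 }, AwaitingG { g: G }, Done }

impl F {
    fn resume(&mut self) -> GeneratorState<i32, i32> {
        loop {
            match self {
                // Construct the awaited generator, then fall through.
                F::BeforeG { x } => {
                    let x = *x;
                    *self = F::AwaitingG { g: G::State0 { x } };
                }
                // Forward yields; on completion, run the "next code".
                F::AwaitingG { g } => match g.resume() {
                    GeneratorState::Yielded(y) => return GeneratorState::Yielded(y),
                    GeneratorState::Complete(r) => {
                        *self = F::Done;
                        return GeneratorState::Complete(r * 2);
                    }
                },
                F::Done => panic!("resuming completed generator"),
            }
        }
    }
}

fn main() {
    let mut f = F::BeforeG { x: 20 };
    assert!(matches!(f.resume(), GeneratorState::Yielded(0)));
    assert!(matches!(f.resume(), GeneratorState::Complete(42)));
    println!("ok");
}
```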
Yes, and agreed. This is the second downside I listed in the initial proposal. Fortunately, they only become immovable after they're first resumed (or rather, in my strawman API above, after they're `init`ed). It is significantly less common to move iterators after the chain has been constructed.
That's... kind of going in the opposite direction from what I'm proposing, removing requirements from the runtime environment and context. CPS does (sort of) require tail call support, but it doesn't affect FFI or the runtime any more than generators and futures already do.
Since you mention iterators, another benefit of an `await` operator that works directly on generator invocations is that it can be used in iterators just as easily as futures, because the trait (`Iterator` or `Future`) is only involved at the boundary.
Reminds me of Python's `yield from`, somewhat.