Pre-RFC: CPS transform for generators

I think I found a mostly equivalent formulation of this proposal that might be simpler to understand and implement.

The idea is to add a variant of Future::poll to Future that has an additional Async::Nested return value that means “runs this other Future and then resume this one with the results”: this would return a reference to a nested Future and a pointer to a function for resumption.

When this is returned, the executor would push the reference to the Future and to the resume function on a Stack associated with the Future task, and on the next polls, it would start directly with the top of this stack.

Once a Future from the stack returns Ready, the entry is popped from the stack and the returned value is passed to the resume function.

Note that there’s the complication that Future is obviously generic in T, and nested futures can have any type: to address this, the size of T needs to be returned along with the nested future reference, and the future return value needs to be passed with an ABI only dependent on the size of T (e.g. instead of using the function return value, pass an &mut [u8] that is then transmuted to &mut RawOption (a version of Option with layout optimizations disabled) on a “GenFuture” trait that Future would depend on).

There’s also the complication that the reference to the inner Future would be an &'a mut with the same 'a lifetime of the self argument to the poll() variant, which would not be safely storable. However, the executor future task could itself be implemented with an immovable generator, which would allow those references to be stored on the generator stack with safe code. In fact, this code does not need to be tied to the executor, and can be essentially be a “Future combinator” that allows executors that only support the current Future interface to take advantage of Future stacks supporting the new interface without them being directly coupled.

If a Future does not implement this new poll variant, then there would be a default implementation that just calls the current poll() and never returns Async::Nested, giving the current behavior with full compatibility.

To support async/await, a new type of Generators that would be able to take a value when resuming, and also able to provide a function to directly call at resumption (using a CPS transform) would be introduced and the async/await macro would start using those. There’s no need for tail call optimization, since the “tail call” is essentially performed by returning the address of the function to be called and letting the executor run it.

This is slightly inefficient, so this could be addressed by passing a reference to the executor stack to the poll() variant and letting the Future itself push entries to the executor stack and then call directly the child future on the first call (this would ideally be a tail call, but a non-tail call wouldn’t hurt much either).

Furthermore, the Future themselves could potentially drive the stack themselves, which might be slightly more efficient. If the stack were to be implemented as a linked list of slots in each future rather than a central stack, then this should result in the same full CPS-based system described by @rpjohnst (however, this would be a less “idiomatic Rust” approach and would result in the stack slots being present in the layout of unexecuted Futures - on the other hand, this would save the memory allocation for the central stack).

Again, this is essentially equivalent to @rpjohnst 's idea, but I think this formulation might be interesting and also make it clearer that it should be a mostly or fully compatible change in the future if desired.

That said, not sure whether it’s really a good idea, since it’s probably going to reduce performance in simple cases with little or no nesting in exchange for gaining performance in complex cases with a lot of nesting, as well as taking up quite some effort to implement, and I’m not sure if that’s worth it.

2 Likes