An opportunity to improve performance of generators and futures

newpavlov · May 31, 2020, 3:46am

It could be possible to improve performance of async code by reducing amount of indirections associated with boxed futures. Description of the idea and toy benchmarks can be found here:

steffahn · May 31, 2020, 8:46am

Could this also be a more general optimization of the layout of non-public enums that have only a single match statement globally that processes them? Also, you don’t mention generic Futures, but I guess that every monomorphic instance of those would get its own set of function pointers. The condition of “only a single match statement globally” isn’t true anymore in the presence of destructors. Which makes me wonder: How do you envision destructors to work with your implementation?

PoignardAzur · May 31, 2020, 9:59am

In theory it could be done by special-casing Box<dyn Future> and Box<dyn Generator> on compiler level and adjusting code generation accordingly.

That would be a non-trivial change, though. The problem is, Box<dyn Future> has a layout roughly equivalent to:

struct Box<...> {
    data: &FutureData,
    vtable: &__vtable_Future,
}

struct __vtable_Future {
    poll: fn(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output>
}

The problem is, poll in the vtable takes a (pinned) type-erased reference to the future data; it does not take a reference to the entire Box, which means it has no way to modify the vtable data when stopping at a different yield point.

I suspect finding a way to overcome this would require substantial changes.

newpavlov · May 31, 2020, 6:59pm

Yes, essentially FSM transition function takes pointer to a state (it could be on the heap or on the stack) and assumes that state behind the pointer is valid for it. Compiler just has to track the maximum size of the state (and alignment requirements), otherwise it can view states and transition functions from libraries as completely opaque objects. I guess it could significantly reduce amount of analysis which has to be done by compiler, since it does not have to explicitly construct the huge state enum anymore.

Yes, you are correct.

Running destructor is essentially just another state transition dependent on the input data. We could modify the function signature to:

fn drive(f: *mut StateTransFn, state: &mut T, drop_flag: bool);

By calling the state transition function with drop_flag = true, it will handle cleaning objects inside the state (since only transition function can properly interpret data inside the state), while the caller would deallocate the state memory from the heap if needed.

Yes, I understand. And I wrote that I don't have enough knowledge to estimate how complex such change will be in practice. Although note that "special-casing" here means that we will bypass the usual memory layout for boxed traits entirely and in a certain sense will use a specialized calling convention.

system · August 29, 2020, 6:59pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Pre-RFC: CPS transform for generators language design	31	5448	March 25, 2019
Recursion for async/.await language design	78	5935	August 3, 2020
Blog series: Dyn async in traits (continues) language design	69	5669	December 24, 2022
A hack to enforce typestate for futures	9	1167	April 27, 2020
When can we have async fn in trait? compiler	11	1024	July 17, 2023

An opportunity to improve performance of generators and futures

Related topics