Idea: async fn in trait objects

Every async fn (and proposed gen fn) in a trait involves an anonymous associated return type. However, when the generated type is Sized (so its state cannot hold unsized locals), we can in fact store it at the beginning of the method's stack frame, before the sized local variables.

When such an async trait method returns, the future can stay on the stack and be handed to the caller. This way we can support most async fns in trait objects.

Downsides:

  • the future type of an async fn must be sized;
  • it involves creating an unsized local at a function boundary.

One additional issue is that it's an unsized return value, which is far more difficult to implement than an unsized parameter or unsized local. Many ABIs implement large return values (typically including return values that exceed the register width) by having the caller pass a pointer to storage in which to place the value. The caller has no way to know how much storage to allocate for an unsized return value, so unsized returns can't be implemented in this manner.
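To make the out-pointer convention concrete, here is a rough source-level sketch of what such a lowering amounts to. The names `Big`, `make_big_lowered` and `call_lowered` are made up for illustration, and real ABIs do this in the backend rather than in source; the point is only that the caller has to reserve the slot, and therefore know its size, before the call.

```rust
use std::mem::MaybeUninit;

// Hypothetical "large" return type: 64 bytes, too big for return registers.
struct Big {
    data: [u64; 8],
}

// What the programmer writes.
fn make_big() -> Big {
    Big { data: [7; 8] }
}

// Roughly what the out-pointer convention turns it into: the caller passes
// a pointer to storage it has already reserved, and the callee fills it in.
fn make_big_lowered(out: &mut MaybeUninit<Big>) {
    out.write(Big { data: [7; 8] });
}

fn call_lowered() -> Big {
    // The caller must know size_of::<Big>() here, at compile time, in order
    // to reserve the slot. For an unsized return value there is nothing to
    // put in this line.
    let mut slot = MaybeUninit::<Big>::uninit();
    make_big_lowered(&mut slot);
    // Sound because make_big_lowered always initializes the slot.
    unsafe { slot.assume_init() }
}
```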

The core idea is this: if the called method knows how much space such an "unsized" return value would take, it can pre-allocate that space at the beginning of its stack frame; after the work is done, it just leaves the unsized value in place.

The caller then faces this layout: |caller's stack frame|"unsized" future|unallocated stack|. Indeed, the caller doesn't know before the call how much space the future will take, but after the call it knows the size from the change in the RSP register, and it knows the future is at the top of the stack.

That doesn't work without considerable extra work, at least on targets that don't have a red zone (and on targets that do, whenever the future exceeds the red zone's size). The return address and (optionally) the frame base address will be at the head of the stack frame. It also means that either the ABI of the trait object function won't match the ABI of the concrete impl's function, or the concrete impl would need a whole separate ABI that applies only to such cases.

I don't see any problem with it:

  • the Rust ABI itself is not specified;
  • returning an unsized value from anywhere is not implemented and has no stable ABI;
  • so the difference between a concrete impl's ABI and the trait object ABI shouldn't be a concern.

I understand that this doesn't give enough control, but the alternative way to provide the feature is to allocate futures (expensive, implicit, and not always clear where to allocate).

If somebody wants boxed futures, they can easily name them in their traits and, once the allocator API lands, even choose where to allocate.

Also, OOM is generally hard to recover from, so I think a convenient and efficient default is the way to go.
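For reference, spelling out a boxed future in a trait looks roughly like the sketch below. The trait `Fetch`, the type `Constant` and the helper `poll_once` are invented for this example; only the `Pin<Box<dyn Future<...>>>` return type is the actual pattern being discussed.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical object-safe trait that names the boxed future explicitly.
trait Fetch {
    fn fetch(&self) -> Pin<Box<dyn Future<Output = u32> + '_>>;
}

struct Constant(u32);

impl Fetch for Constant {
    fn fetch(&self) -> Pin<Box<dyn Future<Output = u32> + '_>> {
        // The heap allocation is visible right here in the impl.
        Box::pin(async move { self.0 })
    }
}

// Minimal waker that does nothing, just enough to poll without an executor.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Polls the boxed future once through the trait object.
fn poll_once(obj: &dyn Fetch) -> Poll<u32> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = obj.fetch();
    let result = fut.as_mut().poll(&mut cx);
    result
}
```

Calling `poll_once(&Constant(7))` goes through `dyn Fetch` and gets `Poll::Ready(7)` back, since the future here never suspends.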

The problem with a different ABI is subtler than that. Either the ABI of calling the method directly differs from calling it through dyn Trait (in which case the dyn Trait version has to codegen a copy of the method with a different ABI), or the method always uses the return-unsized ABI, in which case the extern "Rust" function ABI differs based on whether this is a trait impl that returns a non-concrete type (which has nothing to do with the actual signature of the function).

Both of these are possible solutions. One of these has to be the case, in fact, since the trait impl returns a type with runtime size, and the concrete impl returns a type with static size.

Of course, there's the extra wrinkle of whether LLVM supports function calls that change the stack pointer at all. It does support allocas with runtime size, but I'm fairly certain it doesn't (currently) support a function call ABI that returns a value in a new alloca to the calling stack frame.

More generally, I think placement can solve the problem more properly. In short:

  • Allow -> impl Trait in object-safe traits,
  • Put the size/align layout of the return object (which is static per trait impl) in the vtable,
  • Have the caller allocate the space per the vtable, and then
  • Call the function with the newly allocated return slot.

This doesn't try to solve the general case of returning truly runtime sized values, and just handles dynamic dispatch to a statically sized return value.
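A hand-rolled sketch of those steps, to show the shape of it. Everything here (`ReturnSlotVtable`, `Payload`, `call_through_vtable`) is made up for illustration and is not how rustc lays out vtables. Stable Rust has no alloca, so the caller-side allocation uses the heap as a stand-in for a stack slot.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

// Hypothetical trait whose implementors are the statically sized return types.
trait Payload {
    fn make() -> Self;
    fn total(&self) -> u64;
}

impl Payload for [u8; 4] {
    fn make() -> Self { [1, 2, 3, 4] }
    fn total(&self) -> u64 { self.iter().map(|&b| u64::from(b)).sum() }
}

impl Payload for [u64; 16] {
    fn make() -> Self { [2; 16] }
    fn total(&self) -> u64 { self.iter().sum() }
}

// Hypothetical vtable entry: layout of the return value plus type-erased
// functions that write it into a slot and use it afterwards.
struct ReturnSlotVtable {
    layout: Layout,
    write_into: unsafe fn(*mut u8),
    total: unsafe fn(*const u8) -> u64,
}

// Safety: `slot` must be valid for writes and match Layout::new::<T>().
unsafe fn write_into_slot<T: Payload>(slot: *mut u8) {
    unsafe { slot.cast::<T>().write(T::make()) }
}

// Safety: `slot` must point to an initialized T.
unsafe fn total_of_slot<T: Payload>(slot: *const u8) -> u64 {
    unsafe { (*slot.cast::<T>()).total() }
}

fn vtable_for<T: Payload>() -> ReturnSlotVtable {
    ReturnSlotVtable {
        layout: Layout::new::<T>(),
        write_into: write_into_slot::<T>,
        total: total_of_slot::<T>,
    }
}

// The caller only sees the vtable: it reads the layout, allocates the
// return slot, and calls the method with that slot.
fn call_through_vtable(vt: &ReturnSlotVtable) -> u64 {
    assert!(vt.layout.size() > 0, "alloc requires a non-zero size");
    unsafe {
        let slot = alloc(vt.layout); // stand-in for a dynamic stack allocation
        if slot.is_null() {
            handle_alloc_error(vt.layout);
        }
        (vt.write_into)(slot);
        let result = (vt.total)(slot);
        // The payloads in this sketch are plain arrays, so there is no drop
        // glue to run; a real vtable would carry a drop entry as well.
        dealloc(slot, vt.layout);
        result
    }
}
```

With this, `call_through_vtable(&vtable_for::<[u8; 4]>())` and `call_through_vtable(&vtable_for::<[u64; 16]>())` run the same caller code against return slots of 4 and 128 bytes respectively.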

BUT, this does raise another (more frustrating) problem: you can't do this (with a stack held return value) in async fn. Why? Because the witness object (the impl Future) itself needs to have a statically known size. While you can technically do funky things to get dynamically sized stack objects, the "stack" that an async fn has access to that persists over .await points is not actually stack, but instead memory slots in the impl Future. We have "stackless (semi)coroutines", not "green threads" / "stackful coroutines".

So while you could call an async fn in trait that returns its future on the stack, you couldn't .await it, because that would involve putting that dynamically sized future in your own statically sized future, so you'd have to box it to await it anyway.

(And propagating unsizedness makes your own size impossible to determine until you request it at runtime: nothing stops you from using several of these semi-unsized-returning functions whose results depend on each other, so there's no way to pre-allocate space for your future at all.)
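A small sketch of why awaiting forces the size question, using only `std::mem::size_of_val`. The function names are made up; the exact byte counts depend on the compiler, so the comments only state lower and upper bounds.

```rust
use std::mem::size_of_val;

// `buf` is used after the suspension point, so it has to live inside the
// future itself: this future is at least 1024 bytes.
async fn inner(seed: u8) -> u8 {
    let buf = [seed; 1024];
    std::future::ready(()).await;
    buf[usize::from(seed)]
}

// Awaiting `inner` directly embeds its future in ours, so our size is
// statically at least as large as the size of `inner`'s future.
async fn outer(seed: u8) -> u8 {
    inner(seed).await
}

// Boxing moves the inner future to the heap; only the pointer is kept
// across the suspension point, at the cost of an allocation.
async fn outer_boxed(seed: u8) -> u8 {
    Box::pin(inner(seed)).await
}

// Returns the sizes of the three futures without polling any of them.
fn future_sizes() -> (usize, usize, usize) {
    (
        size_of_val(&inner(0)),
        size_of_val(&outer(0)),
        size_of_val(&outer_boxed(0)),
    )
}
```

All three sizes are compile-time constants. A future whose size is only known from a vtable at runtime has no place in `outer`'s layout, which is why it ends up looking like `outer_boxed`.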
