I will start this post with a shoutout to @dtolnay! async-trait is a great crate, which removes a lot of friction when trying to work with "Rust async"!
While async functions alone are nice, we still need some kind of interfaces
to make our software more flexible and to test it. Traits for async Rust are
kind of a tricky topic. There exist poll_() methods which are object-safe,
and there is the hope that GATs might some day provide true zero-cost async
interfaces. All of those are however harder to understand and use than async-trait,
which provides traits in a form that resembles their synchronous counterparts.
However, async-trait comes with 2 downsides compared to an ideal version of async traits:
- Dynamic dispatch requires an indirect jump and prevents inlining and some further optimizations
- It requires a dynamic allocation for each invocation of an async method on a trait.
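To make the second point concrete, here is a hand-written sketch of roughly the shape that a boxed async trait method takes. It is not the macro's actual output; the Counter trait, the Simple type and the tiny block_on helper are hypothetical names made up for this illustration.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical trait: the method returns a boxed, type-erased future,
// similar in shape to what a boxed async trait method looks like.
trait Counter {
    fn next(&mut self) -> Pin<Box<dyn Future<Output = u32> + Send + '_>>;
}

struct Simple {
    value: u32,
}

impl Counter for Simple {
    fn next(&mut self) -> Pin<Box<dyn Future<Output = u32> + Send + '_>> {
        // Box::pin performs a fresh heap allocation on every call.
        Box::pin(async move {
            self.value += 1;
            self.value
        })
    }
}

// Minimal busy-polling executor, only good enough for futures that never park.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

// Two calls through the trait object mean two separate allocations.
fn call_twice(counter: &mut dyn Counter) -> (u32, u32) {
    let a = block_on(counter.next());
    let b = block_on(counter.next());
    (a, b)
}
```

Each call to next() goes through the vtable (the indirect jump) and allocates a new box that is freed again as soon as the future has been awaited.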
As the title of this post hints, there exist some ideas around how to reduce the overhead of the 2nd part:
While thinking about the problem I recalled some optimizations and findings around async methods in other ecosystems - here most notably the .NET ecosystem.
In .NET the return values of async methods initially all had to be heap allocated
(in the form of a Task<T> object). C#'s async methods are not "lazy" like
Rust's, and therefore the .NET team could make an optimization early on
which reduced the number of required allocations: if a method was likely to finish
without suspension, it could return a ValueTask<T> object,
which avoids the heap allocation for the state machine in that case. This makes sense
in a lot of cases, e.g. when writing to sockets, which are typically "ready" and do not "block".
The ValueTask optimization for async operations which do not actually suspend
unfortunately does not carry over to Rust: due to the lazy nature of Rust's Futures
we have to allocate the Future before the first check for completion occurs.
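The laziness mentioned here can be observed with a few lines of code. This is a minimal sketch; lazy_demo and the small block_on helper are hypothetical names used only for this illustration.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Minimal busy-polling executor, only good enough for futures that never park.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

// Returns whether the async body had run (before polling, after polling).
fn lazy_demo() -> (bool, bool) {
    let started = Cell::new(false);
    // Creating the future does not execute any of its body.
    let fut = async {
        started.set(true);
    };
    let before = started.get();
    // Only polling drives the body forward.
    block_on(fut);
    let after = started.get();
    (before, after)
}
```

Since nothing runs before the first poll, the caller cannot know whether the operation would complete immediately, and the Future has to be stored somewhere (for a trait object: on the heap) before that first poll happens.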
However later on the .NET team made another realization:
- An async method on an object is not called concurrently in most cases. Users
  call the method and await the returned Tasks.
- This means that for each async method on an object there typically exists only one
  Task<T> object in memory.
- This object is a good candidate for pooling and reuse!
After investing some more thoughts into this, I figured out that this approach actually can carry over into Rust:
- async methods on Rust's trait objects are typically also only called one at a
  time, and the returned Future is immediately .awaited.
- If the method takes &mut self, this would even be guaranteed!
- The implementation behind the async trait always returns the same concrete
  Future type, which needs to get boxed, so any allocation which can hold the state
  of the initial Future can also hold the state of additional Futures.
==> Based on this I thought it should be possible to reuse allocations for Futures
returned from async trait objects in Rust, in case those methods are called more than
once.
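The core idea can be sketched in safe Rust as long as the concrete Future type stays visible. The sketch below is not the code from the proof of concept: ReusableSlot, put, addr and reuse_demo are hypothetical names, and unlike a real solution for trait objects it does not type-erase the future. It relies on Pin::set, which drops the old value and writes the new one into the same allocation.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical helper: one heap slot that is reused for futures of type F.
struct ReusableSlot<F: Future> {
    slot: Option<Pin<Box<F>>>,
}

impl<F: Future> ReusableSlot<F> {
    fn new() -> Self {
        Self { slot: None }
    }

    // Stores `fut` in the slot; only the first call allocates.
    fn put(&mut self, fut: F) -> Pin<&mut F> {
        match self.slot.as_mut() {
            // Drops the previous future in place and reuses its allocation.
            Some(boxed) => boxed.set(fut),
            None => self.slot = Some(Box::pin(fut)),
        }
        self.slot.as_mut().expect("slot was just filled").as_mut()
    }

    // Address of the heap slot, used here only to show that it is stable.
    fn addr(&self) -> Option<usize> {
        self.slot.as_ref().map(|boxed| &**boxed as *const F as usize)
    }
}

// Minimal busy-polling executor, only good enough for futures that never park.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

// The same async fn always yields the same concrete future type.
async fn add_one(x: u32) -> u32 {
    x + 1
}

fn reuse_demo() -> (u32, u32, bool) {
    let mut slot = ReusableSlot::new();
    let a = block_on(slot.put(add_one(1)));
    let first_addr = slot.addr();
    let b = block_on(slot.put(add_one(a)));
    let same_allocation = first_addr == slot.addr();
    (a, b, same_allocation)
}
```

The &mut self on put() mirrors the guarantee from the list above: the previous future can no longer be borrowed when the slot is refilled. Hiding F behind a type-erased handle, as a trait object would need, requires additional (unsafe) machinery that this sketch leaves out.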
I implemented a proof of concept of this approach here.
The results are rather promising, so I wanted to share them here. E.g. on Windows I could observe an up to 5x performance improvement for repeated calls to async functions on trait objects. On Linux the performance improvement was far lower and very different between memory allocators. More details are in the repository's README.
The whole approach definitely needs a bit more evaluation on real applications, but I'm still
rather excited about it. For some applications it might open up the opportunity
to use an easy-to-implement async fn based Stream implementation instead of
a manual poll_fn based version. It could also allow us to rethink what the
best way is to represent async IO traits (which are typically called more than once).
The linked repository contains a bit more description. The code is a bit beyond proof of concept state, but it might still need some fixes here and there. If you notice anything broken or any undefined behavior, feel free to open an issue on GitHub and/or propose a fix.
Back to async-trait itself: I think if this is interesting, then a
future version of async-trait could integrate support for it which is invisible
from an API point of view: the return type of async methods would need to be
changed from Pin<Box<dyn Future>> to DynamicFuture, as shown in the linked
repository. In this case implementations of the traits could decide whether they
want to return unique heap-allocated Futures for each call, or whether they
would rather want to reuse Futures between calls. This could easily be
configured by an attribute on methods. Since DynamicFuture is fully type erased,
the behavior would not impact consumers.