While async functions alone are nice, we still need some kind of interface to make our software more flexible and testable.
Traits for async Rust are a tricky topic. There exist poll_() based methods, which are object-safe, and there is hope that GATs might some day provide true zero-cost async interfaces. All of those are however harder to understand and use than async-trait, which provides traits in a form that resembles their synchronous counterparts. However, async-trait comes with 2 downsides compared to an ideal version of async traits:
- Dynamic dispatch requires an indirect jump and prevents inlining and some further optimizations
- It requires a dynamic allocation for each invocation of an async method on a trait object
As the title of this post hints, there exist some ideas on how to reduce the overhead of the second point:
While thinking about the problem I recalled some optimizations and findings around async methods in other ecosystems - most notably the .NET ecosystem.
In .NET the return values of async methods initially all had to be heap-allocated (in the form of a Task<T> object). C#'s async methods are not "lazy" like Rust's, and therefore the .NET team could make an optimization early on which reduced the number of required allocations: If a method has a good chance of finishing without suspension, it can return a ValueTask<T>, which avoids the heap allocation for the state machine. This makes sense in a lot of cases - e.g. when writing to sockets, which are typically "ready" and do not "block".
This ValueTask optimization for async operations which do not actually suspend unfortunately does not carry over to Rust: Due to the lazy nature of Rust's Futures, we have to allocate the Future before the first check for completion occurs.
- An async method on an object is not called concurrently in most cases. Users call the method and await the returned Task<T> before calling the method again.
- This means for each async method on an object there typically exists only one Task<T> object in memory.
- This object is a good candidate for pooling and reuse!
After investing some more thought into this, I figured out that this approach actually can carry over to Rust:
- Async methods on Rust's trait objects are typically also only called one at a time, and the returned Future is awaited before the method is invoked again.
- If the method takes &mut self, this would even be guaranteed!
- The implementation behind the async trait always returns the same concrete Future, which needs to get boxed - so any allocation which can hold the state of the initial Future can also hold the state of any Future returned by later calls.
==> Based on this I thought it should be possible to reuse allocations for Futures returned from async trait objects in Rust, in case those methods are called more than once.
The results are rather promising, so I wanted to share them here. E.g. on Windows I could observe an up to 5x performance improvement for repeated calls to async functions on trait objects. On Linux the performance improvement was far lower and varied a lot between memory allocators. More details are in the repository's readme.
The whole approach definitely needs a bit more evaluation on real applications, but I'm still rather excited about it. For some applications it might open up the opportunity to use an easy-to-implement async fn based Stream implementation instead of a poll_fn based version. It could also allow us to rethink the best way to represent async IO traits (which are typically called more than once).
The linked repository contains a bit more description. The code is a bit beyond proof-of-concept state and might still need some fixes here and there. If you notice anything broken or any undefined behavior, feel free to open an issue on GitHub and/or propose a fix.
Regarding async-trait itself: I think if this turns out to be interesting, a future version of async-trait could integrate support for it in a way that is invisible from an API point of view: The return type of async methods would need to change from Pin<Box<dyn Future>> to DynamicFuture - as shown in the linked repository. In this case implementations of the traits could decide whether they want to return a unique heap-allocated Future for each call - or whether they would rather reuse Futures between calls. This could easily be configured by an attribute on methods. Since DynamicFuture is fully type-erased, the behavior would not impact consumers.