Blog series: Dyn async in traits (continues)

On the normal stack, if one function uses a large amount of stack space A and another uses a small amount B, the total stack space used is at most A + B, which is not much bigger than A. If an async executor decides to give every dynamically sized future enough space for A, then when one dynamically sized future needs A and another needs B, the total space used is 2A. Either the executor wastes a lot of space by giving every future enough for the maximum, or you are likely to run out of memory in some future: unlike normal stack space, which is shared between all functions, each dynamically sized future gets its own "stack" and can't share the "extra" with needier futures. Because of this, it's almost certainly better to just box your dynamically sized values in futures.

Is there a fundamental reason "allocate maximal space" couldn't work across crates? Could the ABI be modified so that crates provide a table of maximal space required per trait method? That seems like it mightn't require much analysis.

Edit: Somewhat answering my question, it wouldn't really work in the (currently uncommon) case of dynamically loaded Rust libraries.

So now it seems a lot more appealing to me, and I’m grateful to Olivier Faure for bringing it up again.

Recognized!

There is one complication. Today in Rust, every dyn Trait type also implements Trait. But can dyn AsyncIterator implement AsyncIterator? In fact, it cannot! The problem is that the AsyncIterator trait defines next as returning impl Future<..>, which is actually shorthand for impl Future<..> + Sized, but we said that next would return dyn Future<..>, which is ?Sized. So the dyn AsyncIterator type doesn’t meet the bounds the trait requires. Hmm.
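To make the bound mismatch concrete, here is a small sketch on stable Rust. The helper names `needs_sized` and `accepts_unsized` are made up for illustration. The point is only that a generic parameter carries an implicit `Sized` bound, much like the hidden type behind `impl Future`, and that `dyn Future` does not satisfy it unless the bound is relaxed with `?Sized`.

```rust
use std::future::Future;
use std::mem::size_of_val;

// Generic parameters are implicitly `Sized` (hypothetical helper).
fn needs_sized<F: Future<Output = u8>>(fut: &F) -> usize {
    size_of_val(fut)
}

// Opting out with `?Sized` lets the same body accept `dyn Future`
// (hypothetical helper).
fn accepts_unsized<F: Future<Output = u8> + ?Sized>(fut: &F) -> usize {
    size_of_val(fut)
}

fn main() {
    let concrete = async { 7u8 };
    let boxed: Box<dyn Future<Output = u8>> = Box::new(async { 7u8 });

    println!("concrete: {} bytes", needs_sized(&concrete));
    println!("dyn:      {} bytes", accepts_unsized(&*boxed));

    // Uncommenting the next line fails to compile, because
    // `dyn Future<Output = u8>` is not `Sized`:
    // needs_sized(&*boxed);
}
```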

Is the solution you mentioned in this thread, about adding a special Sized if Self is Sized bound to RPITs, not on the table?

Because if it is, then dyn MyTrait would still implement MyTrait.

One as-yet-unmentioned possibility is to have some kind of Storage or similar that has a specialized behavior for Sized values. So Box<T, SizedOnStack>::new would place its contents on the stack iff T: Sized. That way, you can write code that is generic over both impl AsyncTrait and dyn AsyncTrait with no performance penalty for sized cases.

That's true, but that's a downside of the current async model rather than of my solution. You can get the same with the currently existing futures: hold A bytes over the first await point with a future call of size B, hold B bytes at the second await point with a future call of size A. Boom, your memory usage is 2A rather than A + B.

In fact, there is not even a lint which would hint at the wasted memory.

We can verify that on the playground. You can see that in "two async fn calls" the size of a future is the sum of the large array created within and of the called heavy future, even though the large array is dropped before awaiting the large future (there is also a byte per level of nesting, to store the state).
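For readers who want to reproduce this kind of measurement, here is a minimal sketch. It is not the original playground code; the names `heavy`, `two_calls`, `size_of_future` and the constant `BIG` are assumptions made for illustration. The printed sizes depend on the compiler version, so no expected output is given.

```rust
use std::future::{ready, Future};
use std::mem::size_of_val;

const BIG: usize = 1024;

// Keeps a BIG-byte buffer alive across an await point, so the buffer
// has to be stored inside the future.
async fn heavy(seed: u8) -> u8 {
    let buf = [seed; BIG];
    ready(()).await;
    buf[usize::from(seed) % BIG]
}

// Holds its own large buffer across a first await point, lets it go
// out of scope, and only then awaits the heavy future.
async fn two_calls(seed: u8) -> u8 {
    let first = {
        let scratch = [seed; BIG];
        ready(()).await;
        scratch[0]
    };
    first.wrapping_add(heavy(seed).await)
}

// Futures are lazy: constructing one does not run it, so measuring
// its size needs no executor.
fn size_of_future<F: Future>(fut: F) -> usize {
    size_of_val(&fut)
}

fn main() {
    println!("heavy:     {} bytes", size_of_future(heavy(1)));
    println!("two_calls: {} bytes", size_of_future(two_calls(1)));
}
```

Comparing the two printed numbers shows whether the compiler in use lets the scratch buffer and the awaited future share storage or lays them out side by side.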

So we get the same behaviour as with dyn futures, except for the dynamic size, of course. But the dynamic size is opt-in both at the level of async functions (you can simply not use dyn futures) and at the level of the executor/toplevel (you can require that all spawned futures are Sized). If you must use only sized futures but need dyn dispatch anyway, you always have the option of explicitly boxing the dyn futures within the async fns, just like in Niko's last post (if you can allow yourself arbitrary heap allocations, which is the default assumption in this discussion anyway).
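A minimal sketch of that explicit-boxing option on stable Rust follows. The trait shape, the `Small` and `Large` implementors, and the toy `block_on` are all assumptions made for illustration, not anything proposed in the blog series. Because the method returns a boxed future, the future of `call_dyn` has one fixed size no matter which implementor is behind the `&dyn AsyncTrait`.

```rust
use std::future::{ready, Future};
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical trait: the boxed return type keeps callers `Sized`.
trait AsyncTrait {
    fn some_async_method(&self) -> Pin<Box<dyn Future<Output = u32> + '_>>;
}

struct Small;
struct Large;

impl AsyncTrait for Small {
    fn some_async_method(&self) -> Pin<Box<dyn Future<Output = u32> + '_>> {
        Box::pin(async { 1u32 })
    }
}

impl AsyncTrait for Large {
    fn some_async_method(&self) -> Pin<Box<dyn Future<Output = u32> + '_>> {
        Box::pin(async {
            // A larger state, held across an await point.
            let buf = [2u32; 256];
            ready(()).await;
            buf[0]
        })
    }
}

// The returned future stores only a pointer-sized box, not the callee's state.
async fn call_dyn(foo: &dyn AsyncTrait) -> u32 {
    foo.some_async_method().await
}

// Toy executor: polls in a loop with a waker that does nothing.
// Only suitable for futures that complete without real wakeups.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn main() {
    println!("small -> {}", block_on(call_dyn(&Small)));
    println!("large -> {}", block_on(call_dyn(&Large)));
}
```

The cost is one heap allocation per call, which is exactly the trade-off being debated in this thread.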

NB: the behaviour of future size in the example above is actually really weird. Look at the last two sizes (with the same future moved or awaited). If we just move a future into an async block, the size increases by the size of that future, as expected. But if we await the future in an async block, the size increase is suddenly double the size of the future.

Looks like a bug to me, I just can't see why that would happen.

The size of future objects is also a significant footgun, by the way. It is the naive size of the future, without any size or code optimizations. For example, the large array in the examples isn't used in any way, so one would expect it to be removed via dead code elimination, but it isn't! Literally no optimization of future size is applied. I guess this happens because the state machine object is constructed separately from the code optimizations, so it just doesn't have access to any liveness properties. Still, it means that it is easy to write a series of async functions which compiles to a much larger future than expected.

This is even worse with generators, which are fortunately unstable. With async functions, at least we can say "don't do anything heavy in async fn, spawn a thread for any complex computation or large data manipulation". With generators, memory- and computationally-heavy code is definitely in scope, so that's a size bomb waiting to go off.

This is actually a weaker restriction than "you can't have dynamically sized values in futures at all"!

```rust
async fn dynamically_sized(foo: &dyn AsyncTrait) {
    // Future returned by `some_async_method` is dynamically sized!
    foo.some_async_method().await;
}
```

The above function could be made to work by returning a dynamically sized future. When the async fn is called, it can inspect the vtable of the trait object it received as an argument, determine how much space it will need from that vtable, and then return a future of that size. The choice of allocation strategy is thereby bumped up to the caller.

Rust would need some way of expressing "to call this function I need the full set of arguments, but I only need a subset to determine the size of the returned value."
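The idea of reading the future's size out of the vtable before choosing an allocation strategy can be approximated by hand today. In the sketch below, `some_async_method_layout` is a hand-written stand-in for a vtable entry the compiler would have to generate; it, the `SmallFuture`/`LargeFuture` placeholder types, and `choose_strategy` are all hypothetical. The sketch only compares sizes and ignores alignment.

```rust
use std::alloc::Layout;
use std::mem::size_of_val;

// Placeholders for the compiler-generated future types (hypothetical).
#[allow(dead_code)]
struct SmallFuture(u8);
#[allow(dead_code)]
struct LargeFuture([u8; 4096]);

trait AsyncTrait {
    // Hypothetical stand-in for a per-method "future layout" vtable entry.
    fn some_async_method_layout(&self) -> Layout;
}

struct Small;
struct Large;

impl AsyncTrait for Small {
    fn some_async_method_layout(&self) -> Layout {
        Layout::new::<SmallFuture>()
    }
}

impl AsyncTrait for Large {
    fn some_async_method_layout(&self) -> Layout {
        Layout::new::<LargeFuture>()
    }
}

#[derive(Debug, PartialEq)]
enum Strategy {
    Inline, // fits into a caller-provided buffer
    Heap,   // too large, fall back to an allocation
}

// The caller decides where the future goes, using only the trait object.
fn choose_strategy(foo: &dyn AsyncTrait, inline_capacity: usize) -> Strategy {
    if foo.some_async_method_layout().size() <= inline_capacity {
        Strategy::Inline
    } else {
        Strategy::Heap
    }
}

fn main() {
    let impls: [&dyn AsyncTrait; 2] = [&Small, &Large];
    for foo in impls {
        // `size_of_val` on a trait object already reads the implementor's
        // size from the vtable; the future's size would need a new entry.
        println!(
            "self: {} bytes, future: {} bytes, strategy: {:?}",
            size_of_val(foo),
            foo.some_async_method_layout().size(),
            choose_strategy(foo, 64)
        );
    }
}
```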

The bug with async block size is likely related to this old known issue

An idea to preserve ergonomics for std users with the unsized proposal:

Why not let await automatically use the global allocator when it is called on a dyn T? Or use the best strategy for the given context, like using the stack if the future can be devirtualized and is known to be small.

In a no_std environment the compiler can produce an error saying that `dyn Future`s must be allocated first.

That doesn't make things worse for no_std, but it saves us from littering .box everywhere in std code, and could make some optimisations easier.

Niko addresses this in part 8's "The soul of Rust" section. The main struggle is that heap allocations in Rust have so far been explicit, never inserted automatically by the compiler. Adding automatic allocation crosses a line that not everyone is comfortable with. Niko analyzed the trade-offs and the struggle in more detail in his "What I meant by the soul of Rust" post.
