What would be the tradeoffs of using the placement by return RFC for this? Potential advantages:
If Rust alloca ever happens, returning unsized futures can "just work" in non-async contexts.
In contexts where alloca doesn't work, boxing the future is "just" Box::new_with(|| ..); no need for dedicated adapter types.
This also means that, if the storages proposal happens, all the different storage types will automatically be available to store those unsized futures. Box::new_with(|| ..) will support them all: a one-stop shop that error messages can point to.
You can choose allocation strategy at the point you call the async method. This is more flexible (but potentially less ergonomic) than choosing allocation strategy for all callers when you create the dyn adapter.
This also means you can choose a different allocation strategy for different methods on the same dyn AsyncTrait. For example, one async trait method might return a future that is probably small and fits on the stack, while another might be more likely to return a large future that belongs on the heap.
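For comparison, here is a rough, runnable sketch of what per-call-site strategy choice already looks like today with a generic (non-dyn) bound; the trait and function names are made up for illustration, and the point of this thread is getting the same flexibility through dyn dispatch:
use std::future::Future;
use std::pin::Pin;

trait AsyncGet {
    // Stable async-fn-in-trait style return (works with generics, not with dyn).
    fn get(&self) -> impl Future<Output = u32>;
}

async fn use_it<T: AsyncGet>(x: &T) {
    // In place: the returned future is stored inside this async fn's own state.
    let a = x.get().await;
    // Boxed at the call site: the returned future lives on the heap instead.
    let b: Pin<Box<dyn Future<Output = u32> + '_>> = Box::pin(x.get());
    let b = b.await;
    println!("{a} {b}");
}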
I am not sure about caller-site box caching; can it handle the first function?
async fn print_interleave<'a, T: Debug>(
    mut iter: &'a mut dyn AsyncIterator<Item = T>,
    mut other: &'a mut dyn AsyncIterator<Item = T>,
) {
    while let Some(next) = iter.next().await {
        // Alternate between the two iterators on every item.
        core::mem::swap(&mut iter, &mut other);
        println!("{next:?}");
    }
    while let Some(next) = other.next().await {
        println!("{next:?}");
    }
}
async fn print_interleave2<T: Debug>(
    iters: &mut VecDeque<&mut dyn AsyncIterator<Item = T>>,
) {
    while let Some(mut iter) = iters.pop_front() {
        while let Some(next) = iter.next().await {
            // Rotate the current iterator to the back of the queue and
            // continue with whichever one is now at the front.
            iters.push_back(iter);
            println!("{next:?}");
            iter = iters
                .pop_front()
                .expect("we just push_back'ed an iter, there has to be one!");
        }
    }
}
I am pretty sure that no solution will be able to caller-site cache the second function.
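As a made-up illustration of why: the bookkeeping a caller-site cache needs boils down to "does the next returned future still fit in the allocation I already have". In print_interleave the same call site alternates between two concrete iterator types because of the swap, and in print_interleave2 the set of concrete types behind the VecDeque is unbounded, so no reservation made up front is guaranteed to keep fitting:
use std::alloc::Layout;

// Hypothetical sketch (CallSiteCache is a made-up name): the cached
// allocation can only be reused when the layout of the next returned
// future still fits; otherwise the cache has to reallocate, which is
// exactly what it was trying to avoid.
struct CallSiteCache {
    capacity: usize, // size of the heap allocation cached at this call site
}

impl CallSiteCache {
    fn can_reuse(&self, next_future: Layout) -> bool {
        // A real cache would also have to check alignment.
        next_future.size() <= self.capacity
    }
}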
Box it as a default
I would like to give the Rust for Linux project as an example. Kernel developers would absolutely not like implicit allocations. async is already being used in some experimental drivers.
General thoughts
I think it would be better to select the return strategy at the call site. So the ABI for dynamically dispatched async functions should include a strategy-selection stub, or multiple functions should be generated, one per strategy.
Most often the caller will have a better idea of the constraints than the callee. It will also prevent the following scenario: What if a dependency suddenly changed to use Boxing instead of InlineAsyncIterator? Implementation details should not leak into my crate!
I would really like to see an attempt at a solution with the placement by return RFC that @Jules-Bertholet already mentioned.
In my opinion, the boxing adapter will just be yet another annoying, frustrating, and undiscoverable thing that users in the async ecosystem will have to deal with. This will continue to harm async Rust's reputation unnecessarily.
Pragmatically, the anathema on allocation has never been sensible in my opinion. Rust will happily let you memcpy megabytes with no transparency, but heaven forbid you increment an Rc's refcount silently. These decisions were arbitrary and don't represent a philosophy that actually benefits users, from my perspective. As async is actually used in production, any time you're awaiting you're probably performing network IO of some kind and the allocation is literally orders of magnitude cheaper.
However, I think the fact that this hasn't been decided, nearly 3 years after shipping the MVP, does a lot more harm to async Rust's reputation than requiring some boxing adapter will do. The Rust project (and especially the language team) have an attitude toward discussions and the consensus process that in my opinion is toxic and doing a lot of harm to Rust the product (and I include my past conduct on await syntax in this assessment). I will be happy to see any solution to async trait methods shipped in stable Rust.
The placement by return RFC would make such dangerous memcpys less common and easier to avoid, in addition to allowing total control over heap allocation and unsized returns.
To elaborate, here is an example use case that is not no_std specific (can happen in regular application code) and that only placement-by-return can address:
trait AsyncFoo {
    async fn do_lots_of_work_with_a_big_future(&self);
    async fn very_simple_function_small_future(&self);
}
async fn do_the_work(foo: &dyn AsyncFoo) {
    // We want to box this future because it's really big.
    // We don't care if this is slow.
    Box::new_with(|| foo.do_lots_of_work_with_a_big_future()).await;

    // Tight loop! Business critical!
    // Every millisecond counts!
    loop {
        // Heap allocation would be too slow here.
        StackBox::new_with(|| foo.very_simple_function_small_future()).await;
    }
}
Essentially the alloca method, but just requiring boxing (or some other indirection) at await points. I think that's the placement-new method others above have mentioned.
Specifically I think it requires:
The AsyncIterator trait object to contain the vtable for the returned future type, so that the caller knows its size, alignment, and poll method.
Something like unsized locals and placement new.
This makes it possible to do most (all?) of the patterns described, and to me it is simple, transparent, easy to make performant, easy to produce understandable errors for, and very similar to Boxing in terms of productivity. It also has the benefit of not using any particularly weird features. I'm a bit worried that Boxing would be quite magic, and that people would want to write similar but subtly different versions and not be able to.
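To make the first requirement above concrete, the extra metadata might look roughly like this (a sketch only; the names are made up and this is not a proposed API):
use std::task::{Context, Poll};

// Hypothetical sketch: what a dyn AsyncIterator<Item = T> vtable would
// additionally have to say about the future returned by `next`, so the
// caller can reserve space for it and then drive it to completion.
struct ReturnedFutureVtable<T> {
    size: usize,
    align: usize,
    poll: unsafe fn(*mut (), &mut Context<'_>) -> Poll<Option<T>>,
    drop_in_place: unsafe fn(*mut ()),
}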
Placement by return is basically this, with one complication. Function arguments are passed on the stack before a function is called, but you can't put unsized values on an async fn's stack. So instead, you pass in a closure to Box::new_with. new_with calls the closure and provides a place on the heap for the closure to return the unsized value (hence "placement by return"). The result looks like:
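(A minimal sketch, reusing the AsyncFoo example from earlier in the thread; Box::new_with is the hypothetical placement-by-return constructor, not a current std API.)
// The closure's unsized return value (the future) is written straight into
// the heap place that new_with provides, so it never has to sit on the
// async fn's pre-allocated stack.
Box::new_with(|| foo.do_lots_of_work_with_a_big_future()).await;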
Turns out this kinda already works (after enabling quite a lot of features); the only thing that's really missing is for dyn async traits to return dyn Futures. It would be quite cool for trait objects to return trait objects instead of associated types where possible, making a bunch of traits object safe.
So I found this document created in the Zulip discussion posted above. It states:
async contexts, and generators in general, do not support unsized allocation on the stack (also known as alloca). This is because their stack values that exist across await points are pre-allocated. Futures being awaited always exist across await points, so this approach would not support stack allocation.
This is because the generator has two stacks: one normal function stack and one generator stack that is preserved across yield/await. The function stack is recreated on every resumption.
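A small, runnable illustration of that split (ordinary async/await, nothing hypothetical here): only locals that are live across an .await end up in the pre-allocated generator state; everything else lives on the normal call stack of each resumption.
use std::future::ready;

async fn example() {
    let kept = [0u8; 32];    // live across the .await -> stored in the future's state
    let scratch = [0u8; 64]; // dead before the .await -> ordinary stack slot, recreated each resume
    let sum: u32 = scratch.iter().map(|&b| u32::from(b)).sum();
    ready(sum).await;
    println!("{}", kept.len()); // this use keeps `kept` alive across the .await
}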
I think that the limitation then would just be "before the next yield/await, figure out where to store this". There would need to be some support for storing dyn Trait in fixed-size fields, for example:
async fn print_all(iter: &mut dyn AsyncIterator<Item = String>) {
    while let Some(next: dyn Future<Output = String>) = iter.next() {
        if let next: { dyn Future<Output = String>; 24 } = next {
            println!("{}", next.await);
        } else {
            let string = Box::pin(next).await;
            println!("{string}");
        }
    }
}
Here { dyn Trait; $size } stands for a size-capped dyn trait object. Using pattern matching, one can assign a dyn Trait to a { dyn Trait; $size }. In code that does not care about this level of control, one can still use the Boxing wrapper:
async fn print_all(iter: &mut dyn AsyncIterator<Item = String>) {
    let mut iter = Boxing::new(iter);
    while let Some(next: Pin<Box<dyn Future<Output = String>>>) = iter.next() {
        println!("{}", next.await);
    }
}
We already have multiple implementations of size-capped dyn trait objects, e.g. stack_dst::Value for just a fixed-size allocation, or smallbox for a version that automatically promotes to the heap when the size is exceeded. It seems plausible for these to support unsized fn params as a way to pass a bare dyn value in. Or it would be pretty trivial to write a SmallBoxing::<S16> adaptor similar to Boxing, if we look at a TAIT + GAT based approach rather than a dyn* one.