Blog series: Dyn async in traits (continues)

nikomatsakis · September 18, 2022, 5:55pm

Continuing the discussion from Blog series: Dyn async in traits -- I've posted some more blog posts in this series:

Part 6 -- Oct 15, 2021
Part 7: A design emerges? -- Jan 7, 2022
Part 8: the soul of Rust? -- Sep 18, 2022
What I meant by the soul of Rust
Part 9: call-site selection

As always, I'd love to hear what people think.

Jules-Bertholet · September 18, 2022, 8:18pm

What would be the tradeoffs of using the placement by return RFC for this? Potential advantages:

- If Rust alloca ever happens, returning unsized futures can "just work" in non-async contexts.
- In contexts where alloca doesn't work, boxing the future is "just" Box::new_with(|| ..), with no need for dedicated adapter types.
  - This also means that, if the storages proposal happens, all the different storage types will automatically be available to store those unsized futures. Box::new_with(|| ..) will support them all: a one-stop shop that error messages can point to.
- You can choose the allocation strategy at the point where you call the async method. This is more flexible (but potentially less ergonomic) than choosing an allocation strategy for all callers when you create the dyn adapter.
  - This also means you can choose a different allocation strategy for different methods on the same dyn AsyncTrait. For example, one async trait method might return a future that is probably small and fits on the stack, while another might be more likely to return a large future that belongs on the heap.

y86-dev · September 18, 2022, 8:30pm

Caller site `Box` caching

I am not sure about caller-site box caching: can it handle the first function below?

use std::collections::VecDeque;
use std::fmt::Debug;

async fn print_interleave<'a, T: Debug>(
    mut iter: &'a mut dyn AsyncIterator<Item = T>,
    mut other: &'a mut dyn AsyncIterator<Item = T>,
) {
    while let Some(next) = iter.next().await {
        core::mem::swap(&mut iter, &mut other);
        println!("{next:?}");
    }
    while let Some(next) = other.next().await {
        println!("{next:?}");
    }
}

async fn print_interleave2<T: Debug>(
    iters: &mut VecDeque<&mut dyn AsyncIterator<Item = T>>,
) {
    while let Some(mut iter) = iters.pop_front() {
        while let Some(next) = iter.next().await {
            iters.push_back(iter);
            println!("{next:?}");
            iter = iters
                .pop_front()
                .expect("we just push_back'ed an iter, there has to be one!");
        }
    }
}

I am pretty sure that no solution will be able to caller-site cache the second function.
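For comparison, here is a sketch of the second function that compiles on stable Rust today, assuming the trait object simply returns a freshly boxed future from every `next()` call (roughly the shape `#[async_trait]` produces, minus its default `Send` bound). `DynAsyncIterator`, `Counter`, `interleave` and `block_on` are names made up for this sketch. Nothing is cached: every call allocates, which is the cost caller-site caching is meant to remove.

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Object-safe stand-in for `dyn AsyncIterator`: every call returns a freshly
// boxed future, so each `next()` costs one heap allocation.
trait DynAsyncIterator {
    type Item;
    fn next(&mut self) -> Pin<Box<dyn Future<Output = Option<Self::Item>> + '_>>;
}

// Counts up from `n + 1` to `end`, then returns `None`.
struct Counter {
    n: u32,
    end: u32,
}

impl DynAsyncIterator for Counter {
    type Item = u32;
    fn next(&mut self) -> Pin<Box<dyn Future<Output = Option<u32>> + '_>> {
        Box::pin(async move {
            if self.n < self.end {
                self.n += 1;
                Some(self.n)
            } else {
                None
            }
        })
    }
}

// Same control flow as `print_interleave2`, but collecting instead of printing.
async fn interleave<T>(iters: &mut VecDeque<&mut dyn DynAsyncIterator<Item = T>>) -> Vec<T> {
    let mut out = Vec::new();
    while let Some(mut iter) = iters.pop_front() {
        while let Some(next) = iter.next().await {
            iters.push_back(iter);
            out.push(next);
            iter = iters
                .pop_front()
                .expect("we just push_back'ed an iter, there has to be one!");
        }
    }
    out
}

// Minimal executor; busy-polling is fine because nothing here returns `Pending`.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = std::pin::pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```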

`Box` it as a default

As an example of where this might matter, it might be that you are writing some sensitive systems code where allocation is something you always do with great care. It doesn’t mean the code is no-std, it may have access to an allocator, but you still would like to know exactly where you will be doing allocations. Today, you can audit the code by hand, scanning for “obvious” allocation points like Box::new or vec![]. Under this proposal, while it would still be possible, the presence of an allocation in the code is much less obvious. The allocation is “injected” as part of the vtable construction process. To figure out that this will happen, you have to know Rust’s rules quite well, and you also have to know the signature of the callee (because in this case, the vtable is built as part of an implicit coercion). In short, scanning for allocation went from being relatively obvious to requiring a PhD in Rustology. Hmm.

I would like to give the Rust for Linux project as an example. Kernel developers would absolutely not like implicit allocations, and async is already being used in some experimental drivers.
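Outside the kernel, one runtime safety net while auditing (a common technique, not something the post proposes) is a counting `#[global_allocator]` in tests, so that an allocation introduced by an implicit coercion shows up as a failed assertion. A minimal sketch; `CountingAlloc`, `allocations_during` and `boxed_future` are illustrative names. It only catches allocations on the code paths the tests actually exercise, so it complements reading the code rather than replacing it.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};

// Wraps the system allocator and counts every allocation request.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

// Runs `f` and reports how many allocations happened meanwhile (on any thread).
fn allocations_during<R>(f: impl FnOnce() -> R) -> (R, usize) {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    let result = f();
    let after = ALLOCATIONS.load(Ordering::Relaxed);
    (result, after - before)
}

// Type-erasing a future behind a Box: this is where the allocation happens.
fn boxed_future(x: u64) -> Pin<Box<dyn Future<Output = u64>>> {
    Box::pin(async move { x })
}
```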

General thoughts

I think it would be better to select the return strategy at the call site. That means the ABI for dynamically dispatched async functions should either include a strategy-selection stub, or generate one function per strategy.

Most often the caller will have a better idea of the constraints than the callee. It will also prevent the following scenario: What if a dependency suddenly changed to use Boxing instead of InlineAsyncIterator? Implementation details should not leak into my crate!

I would really like to see an attempt at a solution with the placement by return RFC that @Jules-Bertholet already mentioned.

kornel · September 18, 2022, 9:06pm

I do value the transparency and control aspects. I'd be fine with an explicit Boxing::new() adapter.
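A rough sketch of what such an explicit adapter could look like with what is stable today (`async fn` in traits since Rust 1.75, though not callable through `dyn`). The name mirrors the blog post's `Boxing`, but the two traits, `Countdown`, `sum_all` and `block_on` here are illustrative assumptions, not the proposed API. The point is that the allocation stays visible: a reviewer searching for `Boxing` or `Box::pin` finds it.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Statically dispatched trait; not object safe, because each impl has its own
// anonymous future type.
trait AsyncIterator {
    type Item;
    async fn next(&mut self) -> Option<Self::Item>;
}

// Object-safe counterpart whose futures are type-erased behind a Box.
trait DynAsyncIterator {
    type Item;
    fn next(&mut self) -> Pin<Box<dyn Future<Output = Option<Self::Item>> + '_>>;
}

// The explicit adapter: wrapping a value in `Boxing` is the visible opt-in to
// one heap allocation per `next()` call.
struct Boxing<I>(I);

impl<I: AsyncIterator> DynAsyncIterator for Boxing<I> {
    type Item = I::Item;
    fn next(&mut self) -> Pin<Box<dyn Future<Output = Option<I::Item>> + '_>> {
        Box::pin(self.0.next())
    }
}

// `Countdown(3)` yields 3, 2, 1.
struct Countdown(u32);

impl AsyncIterator for Countdown {
    type Item = u32;
    async fn next(&mut self) -> Option<u32> {
        if self.0 == 0 {
            None
        } else {
            self.0 -= 1;
            Some(self.0 + 1)
        }
    }
}

// Only sees the object-safe trait.
async fn sum_all(iter: &mut dyn DynAsyncIterator<Item = u32>) -> u32 {
    let mut total = 0;
    while let Some(x) = iter.next().await {
        total += x;
    }
    total
}

// Minimal executor; busy-polling is fine because nothing here returns `Pending`.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = std::pin::pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

A caller writes `sum_all(&mut Boxing(Countdown(3)))`; without the wrapper, `&mut Countdown` does not coerce to `&mut dyn DynAsyncIterator`, so the allocation cannot sneak in.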

PoignardAzur · September 18, 2022, 10:25pm

It feels like every single time Niko writes one of those posts, someone will mention placement return, and every single time Niko ignores it.

We even had a pretty big discussion on the subject on Zulip a few months back, where various trade-offs were mentioned.

It's pretty disappointing that none of that was mentioned in Niko's post. At this point it feels like arguing in circles.

withoutboats · September 18, 2022, 10:29pm

(NOT A CONTRIBUTION)

In my opinion, the boxing adapter will just be yet another annoying, frustrating, and undiscoverable thing that users in the async ecosystem will have to deal with. This will continue to harm async Rust's reputation unnecessarily.

Pragmatically, the anathema on allocation has never been sensible in my opinion. Rust will happily let you memcpy megabytes with no transparency, but heaven forbid you increment an Rc's refcount silently. These decisions were arbitrary and don't represent a philosophy that actually benefits users, from my perspective. As async is actually used in production, any time you're awaiting you're probably performing network IO of some kind and the allocation is literally orders of magnitude cheaper.

However, I think the fact that this hasn't been decided, nearly 3 years after shipping the MVP, does a lot more harm to async Rust's reputation than requiring some boxing adapter will do. The Rust project (and especially the language team) have an attitude toward discussions and the consensus process that in my opinion is toxic and doing a lot of harm to Rust the product (and I include my past conduct on await syntax in this assessment). I will be happy to see any solution to async trait methods shipped in stable Rust.

Jules-Bertholet · September 18, 2022, 10:44pm

The placement by return RFC would make such dangerous memcpys less common and easier to avoid, in addition to allowing total control over heap allocation and unsized returns.

Jules-Bertholet · September 18, 2022, 11:16pm

To elaborate, here is an example use case that is not no_std specific (can happen in regular application code) and that only placement-by-return can address:

trait AsyncFoo {
    async fn do_lots_of_work_with_a_big_future(&self);

    async fn very_simple_function_small_future(&self);
}

async fn do_the_work(foo: &dyn AsyncFoo) {
    // We want to box this future because it's really big.
    // We don't care if this is slow.
    Box::new_with(|| foo.do_lots_of_work_with_a_big_future()).await;

    // Tight loop! Business critical!
    // Every millisecond counts!
    loop {
        // Heap allocation would be too slow here.
        StackBox::new_with(|| foo.very_simple_function_small_future()).await;
    }
}
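For futures whose concrete type is known, this per-call choice is already possible today: `Box::pin` moves one future to the heap, while awaiting directly keeps the other inline in the caller's state (roughly the role `StackBox` plays above). What placement-by-return would add is the same choice through `dyn`. A sketch with made-up functions that makes the size difference measurable; exact sizes are compiler-dependent, so only rough bounds are checked.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Keeps a 4 KiB buffer alive across an await point, so the buffer has to be
// stored inside the future's state.
async fn big_future() -> usize {
    let buf = [0u8; 4096];
    std::future::ready(()).await;
    std::hint::black_box(&buf).len()
}

// Holds nothing across an await point, so its state is tiny.
async fn small_future() -> usize {
    1
}

async fn do_the_work() -> usize {
    // Heap-allocate the big future: across this await the outer future only
    // stores a `Pin<Box<_>>` pointer.
    let a = Box::pin(big_future()).await;
    // Await the small future directly: it lives inline in the outer future's state.
    let b = small_future().await;
    a + b
}

// Minimal executor; busy-polling is fine because nothing here returns `Pending`.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = std::pin::pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```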

Skepfyr · September 18, 2022, 11:42pm

Edit: Having read the placement-by-return RFC, I think this is the same suggestion as the comments above.

The way I would naturally expect this to work is like this:

async fn use_dyn<T>(iter: &mut dyn AsyncIterator<Item = T>) -> Option<T> {
    Box::new(iter.next()).await
}

Essentially, it's the alloca method, but requiring boxing (or some other indirection) at await points. I think that's the placement-new method others above have mentioned.

Specifically, I think it requires:

- The AsyncIterator trait object to contain the vtable for the returned future type, so that the caller knows its size, alignment, and poll method.
- Something like unsized locals and placement new.

This allows most (all?) of the patterns described, and to me it is simple, transparent, easy to make performant, easy to produce understandable errors for, and very similar to Boxing in terms of productivity. It also has the benefit of not relying on any particularly weird features. I'm a bit worried that Boxing would be quite magic, and that people would want to write similar but subtly different versions but not be able to.
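Part of this already exists: a trait object's vtable records the concrete type's size and alignment, which is what a caller would consult before reserving space for an unsized returned future. A small stable-Rust sketch reading that metadata from boxed `dyn Future`s; `DynFuture`, `erase` and `describe` are illustrative names, and boxing stands in for the placement mechanism that doesn't exist yet.

```rust
use std::future::Future;
use std::mem::{align_of_val, size_of_val};
use std::pin::Pin;

type DynFuture = Pin<Box<dyn Future<Output = u32>>>;

// Erase the concrete future type; its size and alignment now live in the vtable.
fn erase(fut: impl Future<Output = u32> + 'static) -> DynFuture {
    Box::pin(fut)
}

// `&**fut` is a `&dyn Future`, so size and alignment are read from its vtable
// at runtime rather than known at compile time.
fn describe(fut: &DynFuture) -> (usize, usize) {
    (size_of_val(&**fut), align_of_val(&**fut))
}
```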

Jules-Bertholet · September 19, 2022, 12:45am

Placement by return is basically this, with one complication. Function arguments are passed on the stack before a function is called, but you can't put unsized values on an async fn's stack. So instead, you pass in a closure to Box::new_with. new_with calls the closure and provides a place on the heap for the closure to return the unsized value (hence "placement by return"). The result looks like:

Box::new_with(|| iter.next()).await

JoJoJet · September 19, 2022, 1:52am

I feel this could be made more elegant by having Box::new lazily evaluate its argument, something like:

impl<T: ?Sized> Box<T> {
    pub fn new(lazy val: T) -> Self { ... }
}

Then it really could just be

Box::new(iter.next()).await

Jules-Bertholet · September 19, 2022, 2:03am

Actually, I suppose you could; they just can't be held across an await. Hmmm...

y86-dev · September 19, 2022, 9:38am

I have only been able to find this. Was that the whole discussion?

I would like to understand why placement by return cannot be used here.

What follows is, I think, an exhaustive list of the various ways one might handle the situation.

I think it should be listed as an option.

PoignardAzur · September 19, 2022, 9:41am

Yeah, it was.

(I was going to go dig for it, thanks for saving me the time)

Skepfyr · September 19, 2022, 9:59am

Turns out this kinda already works (after enabling quite a lot of features). The only thing that's really missing is for dyn async traits to return dyn Futures. It would be quite cool for trait objects to return trait objects instead of associated types where possible, making a bunch of traits object safe.

y86-dev · September 19, 2022, 10:34am

So I found this document created in the Zulip discussion posted above. It states:

async contexts, and generators in general, do not support unsized allocation on the stack (also known as alloca). This is because their stack values that exist across await points are pre-allocated. Futures being awaited always exist across await points, so this approach would not support stack allocation.

As @Jules-Bertholet already pointed out:

This is because the generator has two stacks: a normal function stack, and a generator stack that is preserved across yield/await. The function stack is renewed every time. I think the limitation would then just be "before the next yield/await, figure out where to store this". There would need to be some support for storing dyn Trait in fixed-size fields, for example:

async fn print_all(iter: &mut dyn AsyncIterator<Item = String>) {
    while let Some(next: dyn Future<Output = String>) = iter.next() {
        if let next: { dyn Future<Output = String>; 24 } = next {
            println!("{}", next.await);
        } else {
            let string = Box::pin(next).await;
            println!("{string}");
        }
    }
}

Here { dyn Trait; $size } stands for a size-capped dyn trait object. Using pattern matching, one can assign dyn Trait to { dyn Trait; $size }. In code that does not care about this level of control, one can still use the Boxing wrapper:

async fn print_all(iter: &mut dyn AsyncIterator<Item = String>) {
    let mut iter = Boxing::new(iter);
    while let Some(next: Pin<Box<dyn Future<Output = String>>>) = iter.next() {
        println!("{}", next.await);
    }
}

Nemo157 · September 19, 2022, 10:47am

We already have multiple implementations of size-capped dyn trait objects, e.g. stack_dst::Value for just a fixed-size allocation, or smallbox for a version that automatically promotes to the heap when the size is exceeded. It seems plausible for these to support unsized-fn-params as a way to pass a bare dyn value in. (Or it would be pretty trivial to write a SmallBoxing::<S16> adaptor similar to Boxing, if we look at a TAIT + GAT-based approach rather than a dyn* one.)

PoignardAzur · September 19, 2022, 10:47am

Wait, no, I continued the discussion in another thread.

programmerjake · September 19, 2022, 10:52am

you ended up with a 9 instead of (

programmerjake · September 19, 2022, 11:02am

also the svg diagram is busted: Baby Steps
