Blog series: Dyn async in traits

I’ve been posting a bunch of blog posts on dyn async in traits. I just realized I never made an internals thread to discuss them! Fixed.

I really like the direction taken here.

Firstly, these core capabilities would complement Rust's introspection capabilities, especially if these APIs are const. For example, some dependency-injection (DI) crates that aim for compile-time injection use a dyn Trait type as a stand-in for the trait, since the trait itself isn't a type.

Secondly (a minor point): instead of hard-coding Box, which was used here for simplicity's sake, we should be able to generalise to any smart pointer, since we know its type at the point of creating the dyn Trait instance. For example:

let object: Box<dyn Trait> = foo as Box<dyn Trait>; // using Box here

I'd imagine that this `as` cast should conceptually desugar to the From impl Niko described, with an additional higher-kinded type parameter standing in for "Box".
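On stable Rust the unsized coercion itself already works for the standard smart pointers, because the concrete type (and therefore its vtable) is known at the coercion site. A minimal sketch, with `Trait` and `Foo` as hypothetical stand-ins; note that writing your own code generically over "any smart pointer" still needs the unstable `CoerceUnsized` trait.

```rust
use std::rc::Rc;
use std::sync::Arc;

// Hypothetical trait and implementor, for illustration only.
trait Trait {
    fn name(&self) -> String;
}

struct Foo;

impl Trait for Foo {
    fn name(&self) -> String {
        "Foo".to_string()
    }
}

/// Coerces the same concrete type into three different smart pointers.
/// In each case the compiler attaches the `Foo: Trait` vtable; only the
/// pointer (and its allocation strategy) differs.
fn names_via_all_pointers() -> Vec<String> {
    let boxed: Box<dyn Trait> = Box::new(Foo) as Box<dyn Trait>;
    let rc: Rc<dyn Trait> = Rc::new(Foo) as Rc<dyn Trait>;
    let arc: Arc<dyn Trait> = Arc::new(Foo) as Arc<dyn Trait>;
    vec![boxed.name(), rc.name(), arc.name()]
}
```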

I think (obviously with some bias) that the series misses something by not mentioning the placement-by-return RFC.

I'm less confident than I was in the proposed implementation in the RFC (running arbitrary code to compute the size before allocating it), but the "return impl Trait in trait method" use case is actually simpler.

Say you have this:

trait Foo {
    fn get_bar(&self) -> impl Bar;
}

Any type implementing Foo could, in principle, store the size of its version of impl Bar in its vtable. With that information, any code calling get_bar() on a dyn Foo could look up the size of the return value in the vtable, allocate that much space on the stack or in a box, and then call get_bar().

No need for imposed boxing.
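The caller-allocates idea can be emulated on stable Rust with hand-written vtables. This is a minimal sketch, not the RFC's design: `BarVtable`, `FooVtable`, `Doubler`, `Doubled`, and the shim functions are all hypothetical names. The `Foo` vtable stores a function that writes its `impl Bar` into a caller-provided slot, plus the layout and vtable of that return type, so the caller can allocate the slot without any boxing imposed by the callee.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

trait Bar {
    fn value(&self) -> u32;
}

/// Hand-written vtable for an erased `impl Bar` return value.
struct BarVtable {
    layout: Layout,
    value: unsafe fn(*const u8) -> u32,
    drop_in_place: unsafe fn(*mut u8),
}

/// Hand-written vtable for `dyn Foo`: `get_bar` writes its return value into a
/// caller-provided slot, and `bar` describes that return type.
struct FooVtable {
    get_bar: unsafe fn(*const u8, *mut u8),
    bar: BarVtable,
}

unsafe fn value_shim<B: Bar>(p: *const u8) -> u32 {
    unsafe { (*(p as *const B)).value() }
}

unsafe fn drop_shim<B>(p: *mut u8) {
    unsafe { std::ptr::drop_in_place(p as *mut B) }
}

fn bar_vtable_for<B: Bar>() -> BarVtable {
    BarVtable { layout: Layout::new::<B>(), value: value_shim::<B>, drop_in_place: drop_shim::<B> }
}

// One implementor: `Doubler`'s `get_bar` returns a `Doubled`.
struct Doubler(u32);
struct Doubled(u32);

impl Bar for Doubled {
    fn value(&self) -> u32 {
        self.0
    }
}

unsafe fn doubler_get_bar(this: *const u8, ret: *mut u8) {
    unsafe {
        let this = &*(this as *const Doubler);
        (ret as *mut Doubled).write(Doubled(this.0 * 2));
    }
}

fn doubler_vtable() -> FooVtable {
    FooVtable { get_bar: doubler_get_bar, bar: bar_vtable_for::<Doubled>() }
}

/// Caller side of `dyn_foo.get_bar().value()`: read the return layout from the
/// vtable, allocate a slot (a suitably aligned stack buffer would also do),
/// call through the vtable, then use, drop, and free the value.
unsafe fn call_get_bar_value(data: *const u8, vt: &FooVtable) -> u32 {
    let layout = vt.bar.layout;
    assert!(layout.size() != 0, "sketch does not handle zero-sized returns");
    unsafe {
        let slot = alloc(layout);
        if slot.is_null() {
            handle_alloc_error(layout)
        }
        (vt.get_bar)(data, slot);
        let v = (vt.bar.value)(slot);
        (vt.bar.drop_in_place)(slot);
        dealloc(slot, layout);
        v
    }
}
```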

I actually had the same thought about return placement. It also seems that, in the async iterator case, the size must be the same for each call to the async next (every call returns the same Next type), so the space can probably be reused.
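A minimal sketch of that reuse, assuming the caller drops the previous future before asking for the next slot: a hypothetical `ReturnSlot` keeps its allocation between calls and only reallocates when the requested layout changes, which for repeated `next` calls on one iterator should never happen.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

/// Hypothetical helper: a return slot that keeps its allocation between calls.
/// The caller must drop the value in the slot before requesting it again.
struct ReturnSlot {
    ptr: *mut u8,
    layout: Option<Layout>,
    allocations: usize, // counts real allocations, for illustration
}

impl ReturnSlot {
    fn new() -> Self {
        ReturnSlot { ptr: std::ptr::null_mut(), layout: None, allocations: 0 }
    }

    /// Returns a slot for `layout` (non-zero size), reusing the previous
    /// allocation if the layout is unchanged.
    fn get(&mut self, layout: Layout) -> *mut u8 {
        assert!(layout.size() != 0, "sketch does not handle zero-sized layouts");
        if self.layout == Some(layout) {
            return self.ptr;
        }
        self.release();
        let ptr = unsafe { alloc(layout) };
        if ptr.is_null() {
            handle_alloc_error(layout)
        }
        self.ptr = ptr;
        self.layout = Some(layout);
        self.allocations += 1;
        ptr
    }

    fn release(&mut self) {
        if let Some(layout) = self.layout.take() {
            unsafe { dealloc(self.ptr, layout) };
            self.ptr = std::ptr::null_mut();
        }
    }
}

impl Drop for ReturnSlot {
    fn drop(&mut self) {
        self.release();
    }
}
```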

On the note of

Because of the rules introduced by RFC 1214, the &'static DynAsyncIterVtable<T::Item> type requires that T::Item: 'static, which may not be true here. This condition perhaps shouldn't be necessary, but the compiler currently enforces it.

You can work around this with a less type-safe RawVTable (akin to RawWaker). See an example here of how to implement it for Iterator on stable Rust.
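The linked example isn't reproduced here, but the idea can be sketched independently: give the vtable struct no type parameters, so `&'static RawIterVTable` imposes no `'static` bound on the item type, and do the type erasure through raw pointers in shims. `RawIterVTable`, `VTableFor`, and `DynIter` are hypothetical names; the `&Self::CONST` promotion trick is the same one commonly used for `RawWakerVTable`.

```rust
use std::marker::PhantomData;
use std::mem::MaybeUninit;

/// Type-erased vtable: no type parameters, so `&'static RawIterVTable`
/// places no `'static` requirement on the item type.
struct RawIterVTable {
    /// Writes an `Option<Item>` through the second pointer.
    next: unsafe fn(*mut (), *mut ()),
    /// Drops and frees the boxed iterator.
    drop: unsafe fn(*mut ()),
}

unsafe fn next_shim<I: Iterator>(data: *mut (), out: *mut ()) {
    unsafe {
        let iter = &mut *(data as *mut I);
        (out as *mut Option<I::Item>).write(iter.next());
    }
}

unsafe fn drop_shim<I>(data: *mut ()) {
    unsafe { drop(Box::from_raw(data as *mut I)) }
}

struct VTableFor<I>(PhantomData<I>);

impl<I: Iterator> VTableFor<I> {
    const VTABLE: RawIterVTable = RawIterVTable { next: next_shim::<I>, drop: drop_shim::<I> };
}

/// An owned, type-erased iterator, roughly `Box<dyn Iterator<Item = T> + 'a>`.
struct DynIter<'a, T> {
    data: *mut (),
    vtable: &'static RawIterVTable,
    _marker: PhantomData<(&'a (), T)>,
}

impl<'a, T> DynIter<'a, T> {
    fn new<I: Iterator<Item = T> + 'a>(iter: I) -> Self {
        DynIter {
            data: Box::into_raw(Box::new(iter)) as *mut (),
            // Promoted to a `'static` constant even though `I` may borrow.
            vtable: &VTableFor::<I>::VTABLE,
            _marker: PhantomData,
        }
    }
}

impl<T> Iterator for DynIter<'_, T> {
    type Item = T;
    fn next(&mut self) -> Option<T> {
        let mut out = MaybeUninit::<Option<T>>::uninit();
        unsafe {
            // Sound because `new` paired `data` with the vtable of its own
            // concrete type, whose `Item` is `T`.
            (self.vtable.next)(self.data, out.as_mut_ptr() as *mut ());
            out.assume_init()
        }
    }
}

impl<T> Drop for DynIter<'_, T> {
    fn drop(&mut self) {
        unsafe { (self.vtable.drop)(self.data) }
    }
}
```

The first use in the test below has items of type `&String` borrowed from a local, which a `&'static DynAsyncIterVtable<T::Item>`-style design would reject.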

Note that the return slot also needs to be correctly aligned, not just correctly sized. This is possible with a Box, but LLVM doesn't seem to support dynamic alignment for allocas.

And anyway the unsized-rvalues RFC seems more relevant here.

It's possible to implement part 5 manually without unsafe today, except that, AFAIK, you can't yet specify the type of a GAT, so dropping the GATs gives you this playground. The important step is inserting a newtype that shims the returned future into a Box for you. (This can be nested to convert any number of levels of associated types into dynamic types.)
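The linked playground isn't reproduced here; the following is an independent sketch of the newtype approach, written with GATs since they are usable on stable today. `DynAsyncIter`, `Boxing`, `Counter`, and the helper functions are hypothetical names. The newtype implements a dyn-compatible trait by boxing whatever future the underlying `next` returns, and a tiny no-op waker polls futures that are ready immediately, so no executor is needed.

```rust
use std::future::{ready, Future, Ready};
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

trait AsyncIter {
    type Item;
    type Next<'me>: Future<Output = Option<Self::Item>> + 'me
    where
        Self: 'me;
    fn next(&mut self) -> Self::Next<'_>;
}

/// Dyn-compatible counterpart: the future is boxed, so no GAT appears.
trait DynAsyncIter {
    type Item;
    fn next_boxed(&mut self) -> Pin<Box<dyn Future<Output = Option<Self::Item>> + '_>>;
}

/// The newtype that performs the boxing shim.
struct Boxing<T>(T);

impl<T: AsyncIter> DynAsyncIter for Boxing<T> {
    type Item = T::Item;
    fn next_boxed(&mut self) -> Pin<Box<dyn Future<Output = Option<T::Item>> + '_>> {
        Box::pin(self.0.next())
    }
}

/// Example implementor yielding 1..=max; its futures are always ready.
struct Counter {
    n: u32,
    max: u32,
}

impl AsyncIter for Counter {
    type Item = u32;
    type Next<'me> = Ready<Option<u32>> where Self: 'me;
    fn next(&mut self) -> Self::Next<'_> {
        if self.n < self.max {
            self.n += 1;
            ready(Some(self.n))
        } else {
            ready(None)
        }
    }
}

fn noop_waker() -> Waker {
    unsafe fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    unsafe fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

/// Polls a future once; `None` means it was still pending.
fn poll_once<F: Future + ?Sized>(fut: Pin<&mut F>) -> Option<F::Output> {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    match fut.poll(&mut cx) {
        Poll::Ready(v) => Some(v),
        Poll::Pending => None,
    }
}

/// Consumes the iterator purely through `dyn DynAsyncIter`.
fn sum_all(iter: &mut dyn DynAsyncIter<Item = u32>) -> u32 {
    let mut total = 0;
    loop {
        let mut fut = iter.next_boxed();
        match poll_once(fut.as_mut()).expect("sketch only handles ready futures") {
            Some(x) => total += x,
            None => break,
        }
    }
    total
}
```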

In part 4 you say: "Constructing the vtable: Async functions need a shim to return a Box". I don't think this shim is actually necessary. The vtable could store a vtable representing the return type. If you then force the function to return its value through a return pointer, it becomes possible for the impl Trait for dyn Trait to allocate a correctly sized box using the vtable of the return type, and then pass a pointer to the box's contents as the return pointer. This approach should also work for other smart pointers like Rc or Arc.

Example

Given

trait AsyncIter {
    type Item;

    type Next<'me>: Future<Output = Option<Self::Item>> + 'me
    where
        Self: 'me;
    fn next(&mut self) -> Self::Next<'_>;
}

force it to be equivalent to

trait AsyncIter {
    type Item;

    type Next<'me>: Future<Output = Option<Self::Item>> + 'me
    where
        Self: 'me;
    // `&out` is hypothetical syntax for a write-only return pointer
    fn next(ret_ptr: &out Self::Next<'_>, &mut self);
}

as would already be the case if Next is too big to pass in two registers (this may need a tiny shim if this ABI wasn't chosen anyway). Then the impl AsyncIter for dyn AsyncIter would be possible as

impl<I> AsyncIter for dyn AsyncIter<Item = I> {
    type Item = I;

    type Next<'me> = Box<dyn Future<Output = Option<I>> + 'me> where Self: 'me;
    fn next(&mut self) -> Self::Next<'_> {
        // Stand-ins for the erased types, which are only known at runtime.
        type RuntimeType = ();
        type NextType = ();
        let raw = self as *mut Self;
        let data_pointer = raw as *mut RuntimeType;
        let vtable = ptr::metadata(raw);
        // hypothetical accessor: the DynMetadata<dyn Future<..>> of the concrete Next type
        let next_vtable = vtable.get_next_vtable();
        unsafe {
            let ret_val = std::alloc::alloc(next_vtable.layout()) as *mut NextType;
            // omitted OOM handling
            // hypothetical intrinsic: the `next` entry of this vtable, using the return-pointer ABI
            let fn_pointer: unsafe fn(*mut NextType, *mut RuntimeType) =
                associated_fn::<AsyncIter::next>(vtable);
            fn_pointer(ret_val, data_pointer);
            Box::from_raw(ptr::from_raw_parts_mut(ret_val, next_vtable))
        }
    }
}

There's an additional part 6 published on Niko's blog, not mentioned in the list above, where Niko presents another solution without unsafe, based on the approach of the erased-serde crate.

https://smallcultfollowing.com/babysteps//blog/2021/10/15/dyn-async-traits-part-6/

Putting async aside for a moment, I'd like to consider the more general problem of something like

trait Foo {
    fn bar(&self) -> impl SomeTrait;
}

This is not supported today, but shouldn't it be rather easy to implement for static dispatch? (It is not really different from an associated type, is it?)
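For static dispatch, a return-position impl Trait in a trait can be thought of as an associated type that each impl fills in with one hidden concrete type (this form was later stabilized in Rust 1.75). A minimal sketch of that desugaring, using the names from the example below where possible; `BarRet`, `describe`, and `describe_bar` are hypothetical, and `Blah` is moved to module scope so the associated type can name it.

```rust
trait SomeTrait {
    fn describe(&self) -> String;
}

// Roughly what `fn bar(&self) -> impl SomeTrait` means under static dispatch:
// each impl chooses one concrete return type.
trait Foo {
    type BarRet: SomeTrait;
    fn bar(&self) -> Self::BarRet;
}

struct MyStruct;
struct Blah;

impl SomeTrait for Blah {
    fn describe(&self) -> String {
        "Blah".to_string()
    }
}

impl Foo for MyStruct {
    type BarRet = Blah;
    fn bar(&self) -> Blah {
        Blah
    }
}

// Statically dispatched caller: the return type is known for each `F`,
// so its size is known and no vtable lookup is needed.
fn describe_bar<F: Foo>(f: &F) -> String {
    f.bar().describe()
}
```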

And for the case of dyn Foo, why can't the vtable contain two entries (slots) for such a function: the function pointer, and a pointer to the vtable of the returned trait (the SomeTrait vtable)?

impl Foo for MyStruct {
   fn bar(&self) -> impl SomeTrait {
       struct Blah;
       impl SomeTrait for Blah { ... }
       return Blah;
   }
}

Then the Foo vtable for MyStruct would have two entries for the bar function: the MyStruct::bar function pointer, and a pointer to the SomeTrait vtable for Blah.

Since that SomeTrait vtable contains the layout of the returned object, this is enough information for the compiler to generate code that allocates a return slot of the right layout with the method of its choice (alloca or heap allocation).

For example, code like this

fn bar(my_foo : &dyn Foo) {
   let xxx = my_foo.bar();
   xxx.some_trait_fn();
}

would be desugared to something like this (pseudo-code):

fn bar(my_foo: &dyn Foo) {
  let my_foo_data = my_foo as *const dyn Foo as *const ();
  let my_foo_vtable = ptr::metadata(my_foo);
  // hypothetical intrinsics reading the two vtable slots for `bar`
  let bar_fn_pointer: unsafe fn(*const (), *mut ()) = __get_bar_fn_pointer__(my_foo_vtable);
  let bar_vtable: DynMetadata<dyn SomeTrait> = __get_bar_vtable__(my_foo_vtable);
  let xxx_ptr: *mut () = SomeAllocator::allocate(bar_vtable.layout());
  unsafe {
    bar_fn_pointer(my_foo_data, xxx_ptr);  // xxx_ptr is the address of the return slot
    let xxx: *mut dyn SomeTrait = ptr::from_raw_parts_mut(xxx_ptr, bar_vtable);

    // this desugaring is left as an exercise to the reader
    (*xxx).some_trait_fn();

    ptr::drop_in_place(xxx);
  }
  SomeAllocator::free(xxx_ptr, bar_vtable.layout())
}

Did I miss something?

How is this implemented? Is it a hidden implicit heap allocation? What about no_std support? Or is it a dynamic stack allocation? That would mean it is blocked by the unsized_locals feature, which in turn seems to be blocked on many technical issues.

SomeAllocator::allocate(bar_vtable.layout())

How is this implemented?

Eventually, it should use unsized_locals. In the meantime, it could use some runtime function like the global allocator, or a simpler, specific allocator registered by the user (something like a bump allocator could be used, since the allocations are always stacked, i.e. freed in LIFO order).
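Because return slots are freed in the reverse order they are allocated, a stack-discipline bump allocator over a fixed buffer is enough. A minimal sketch, assuming the caller holds a borrowed byte buffer; `BumpStack` and `Mark` are hypothetical names, not an existing API. Note that the alignment step is computed at runtime from the buffer's address and the requested layout.

```rust
use std::alloc::Layout;

/// Hypothetical stack-discipline bump allocator over a fixed buffer.
struct BumpStack<'b> {
    buf: &'b mut [u8],
    top: usize, // offset of the first free byte
}

/// Returned with each allocation; hand it back to `free` to pop it.
#[derive(Debug, Clone, Copy)]
struct Mark(usize);

impl<'b> BumpStack<'b> {
    fn new(buf: &'b mut [u8]) -> Self {
        BumpStack { buf, top: 0 }
    }

    /// Bumps `top` past a correctly aligned region, or returns `None` if full.
    fn alloc(&mut self, layout: Layout) -> Option<(*mut u8, Mark)> {
        let base = self.buf.as_mut_ptr() as usize;
        let cur = base.checked_add(self.top)?;
        // Round the current address up to the (runtime) alignment.
        let aligned = cur.checked_add(layout.align() - 1)? & !(layout.align() - 1);
        let start = aligned - base;
        let end = start.checked_add(layout.size())?;
        if end > self.buf.len() {
            return None;
        }
        let mark = Mark(self.top);
        self.top = end;
        Some((unsafe { self.buf.as_mut_ptr().add(start) }, mark))
    }

    /// Pops an allocation; frees must happen in LIFO order.
    fn free(&mut self, mark: Mark) {
        assert!(mark.0 <= self.top, "frees must happen in LIFO order");
        self.top = mark.0;
    }
}
```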

I realize that without unsized_locals it is hard to move the object or do much with it, apart from taking references to it.

unsized_locals can't handle returning unsized values. There is simply no ABI that allows it.

Implicit allocations are a bad thing IMO. In addition, that doesn't solve the issue for #![no_std], where there may be no global allocator at all.

unsized_locals can't handle returning unsized values. There is simply no ABI that allows it.

In my pseudo-code, I added a pointer to the return slot to the function signature.

Implicit allocations are a bad thing IMO.

Yeah, it is not ideal.

Another thing is that unsized_locals unfortunately don't work across .await, and supporting async functions was the whole point. This makes it extra hard to avoid allocations.

In addition that doesn't solve the issue for #![no_std] where there may be no global allocator at all.

This must either be disabled for no_std or require the user to register an allocator (just as no_std programs also need to register a panic handler). We can enable it for no_std later, just as async did not work with no_std at the beginning because of its use of TLS.

The placement-by-return RFC seems very relevant to the current discussion. IIRC, it proposed a state-machine transform where the first part of the call returns the desired Layout and the second part does the emplacement, allowing for unsized return values with, e.g., Box::emplace(make_thing).
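The two-phase shape (report the layout first, then write into caller-provided memory) can be sketched on stable Rust for one kind of unsized value, a slice whose length is only known at runtime. This is not the RFC's actual mechanism: `SliceEmplacer`, `box_emplace`, and `Squares` are hypothetical names, and the sketch leaks the allocation if `emplace` panics.

```rust
use std::alloc::{alloc, handle_alloc_error, Layout};
use std::ptr;

/// Hypothetical two-phase protocol for producing a `[T]` of runtime length.
trait SliceEmplacer<T> {
    /// Phase 1: report how many elements will be written.
    fn len(&self) -> usize;
    /// Phase 2: write exactly `len()` elements.
    /// Safety: `slot` must be valid for writes of `len()` elements of `T`.
    unsafe fn emplace(self, slot: *mut T);
}

/// The caller allocates using the reported layout, then lets the callee fill it.
fn box_emplace<T, E: SliceEmplacer<T>>(e: E) -> Box<[T]> {
    let len = e.len();
    let layout = Layout::array::<T>(len).expect("size overflow");
    let slot: *mut T = if layout.size() == 0 {
        ptr::NonNull::<T>::dangling().as_ptr() // no allocation for zero-sized layouts
    } else {
        let p = unsafe { alloc(layout) } as *mut T;
        if p.is_null() {
            handle_alloc_error(layout)
        }
        p
    };
    unsafe {
        e.emplace(slot);
        // The allocation's layout matches what `Box<[T]>` will deallocate with.
        Box::from_raw(ptr::slice_from_raw_parts_mut(slot, len))
    }
}

/// Example: the first `n` squares.
struct Squares(usize);

impl SliceEmplacer<u32> for Squares {
    fn len(&self) -> usize {
        self.0
    }
    unsafe fn emplace(self, slot: *mut u32) {
        for i in 0..self.0 {
            unsafe { slot.add(i).write((i * i) as u32) };
        }
    }
}
```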

This can also work in #![no_std] if there is an allocator (e.g. a bump allocator). It cannot work without a dynamic allocator (alloca is a dynamic allocator).

It's worth noting that there are certain industrial standards that forbid dynamic allocation altogether, which definitely includes alloca.

It would be nice for the design here not to rule these sorts of use cases out, given that Rust currently otherwise supports them very well.

I don't think any -> T where T: ?Sized can work without dynamic allocation of some kind.

But note that I consider reserving a statically sized amount of stack space and then, e.g., bump-allocating a single object in that space to be dynamic allocation, because you still have to dynamically align your value within the reserved space.

If you set both a maximum size and a maximum alignment, you could reserve that much space at that alignment on the stack ("Box<?, ArrayStorage<#[repr(align(M))] [u8; N]>>") and emplace into that, and there'd be no dynamic memory management involved. I doubt that Rust would want to put a maximum size/align on anything beyond the platform/isize::MAX limit that already exists; it'd have to be checked dynamically by the Box::[try_]emplace operation's "allocation" failing.

(And yes I'm heavily in favor of making Box<_, ArrayStorage<Size, Align>> work someday for inline statically-allocated boxes.)
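With a fixed maximum size and alignment, the storage can be declared at that alignment, so placing any value that fits at offset 0 needs no runtime alignment arithmetic; the only dynamic step is the fits-or-fails check. A minimal sketch in the spirit of `Box<_, ArrayStorage<Size, Align>>`, not that (unimplemented) API: `InlineStorage`, `Emplaced`, and `MAX_ALIGN` are hypothetical, and the alignment is fixed at 16 because `repr(align)` cannot take a const generic.

```rust
use std::mem::{align_of, size_of, MaybeUninit};
use std::ops::Deref;
use std::ptr;

/// Upper bound on the alignment this storage supports; matches `align(16)` below.
const MAX_ALIGN: usize = 16;

/// Hypothetical inline storage of `N` bytes at a fixed alignment of 16.
#[repr(C, align(16))]
struct InlineStorage<const N: usize> {
    bytes: [MaybeUninit<u8>; N],
}

impl<const N: usize> InlineStorage<N> {
    fn new() -> Self {
        InlineStorage { bytes: [MaybeUninit::uninit(); N] }
    }

    /// Moves `value` in if it fits within `N` bytes and `MAX_ALIGN`;
    /// otherwise the "allocation" fails and the value is handed back.
    fn try_emplace<T>(&mut self, value: T) -> Result<Emplaced<'_, T>, T> {
        if size_of::<T>() > N || align_of::<T>() > MAX_ALIGN {
            return Err(value);
        }
        // Offset 0 of a 16-aligned buffer is aligned for any T with align <= 16.
        let slot = self.bytes.as_mut_ptr() as *mut T;
        unsafe {
            slot.write(value);
            Ok(Emplaced { value: &mut *slot })
        }
    }
}

/// Owns a value living inside an `InlineStorage`; drops it in place.
struct Emplaced<'s, T: ?Sized> {
    value: &'s mut T,
}

impl<T: ?Sized> Deref for Emplaced<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        &*self.value
    }
}

impl<T: ?Sized> Drop for Emplaced<'_, T> {
    fn drop(&mut self) {
        let p: *mut T = &mut *self.value;
        unsafe { ptr::drop_in_place(p) }
    }
}
```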

I suppose this doesn't count as dynamic allocation, even by my strict standards where userspace alignment is dynamic memory allocation.