Problems with "dyn* Trait"

Isn't erasing the information of whether something is a pointer bad? And what does it have to do with pinning? For example, a Pin<dyn* Future> constructed from a SmallBox is only correct if the future is big enough that it gets stored out of line.

Given that, the core idea is cool, but the topics of small-box optimization and trait views are really separate and deserve their own feature proposals.

This goal can even be achieved by a library abstraction: Box<dyn Trait, &dyn Allocator> gives you everything: deref_move + ownership + generality over storage types.

What we really want here is a nice syntax for the above.

The conversion would instead be Pin<SmallBox<dyn Future>> -> Pin<dyn* Future>. SmallBox::pin would only succeed if T: Unpin (an inline pin is fine) or if T is outlined.

With a full dyn* with vtable wrapping, this can even be handled automatically; when creating Pin<dyn* Future> from T: Future, T is only a candidate for inlining if T: Unpin holds, otherwise it's always boxed.
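A rough illustration of that rule, with a made-up InlineHandle type standing in for an inline-stored dyn*:

use std::pin::Pin;

// If the erased value lives *inside* the handle, moving the handle moves the
// value too, which would break the pinning guarantee for !Unpin types.
struct InlineHandle<T> {
    value: T, // stored inline
}

impl<T: Unpin> InlineHandle<T> {
    // Only sound to hand out a Pin for inline storage when T: Unpin;
    // a !Unpin value would instead have to be outlined (boxed).
    fn pinned(&mut self) -> Pin<&mut T> {
        Pin::new(&mut self.value)
    }
}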

The problem is that that's a whole 4×usize in size ((*mut (), <dyn Trait as Pointee>::Metadata, *mut (), <dyn Allocator as Pointee>::Metadata)). (The Allocator API also cannot support small-box optimization.) The advantage of dyn* is that it can take advantage of restrictions (that the pointee type is dyn Trait and the allocator is a ZST) to provide a storage-erased pointer in just 2×usize.
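To make the size comparison concrete, here is a rough sketch of the two layouts; the field types are illustrative, not the actual std/compiler representation:

use std::ptr::NonNull;

// Box<dyn Trait, &dyn Allocator>: two wide pointers, 4×usize total.
struct BoxedDynWithDynAlloc {
    data: NonNull<()>,         // pointer to the value
    trait_vtable: NonNull<()>, // <dyn Trait as Pointee>::Metadata
    alloc: NonNull<()>,        // pointer to the allocator
    alloc_vtable: NonNull<()>, // <dyn Allocator as Pointee>::Metadata
}

// dyn* Trait: 2×usize, by restricting the pointee to dyn Trait and the allocator to a ZST.
struct DynStarSketch {
    data: *mut (),             // pointer to the value (or the value itself, if it fits)
    trait_vtable: NonNull<()>, // vtable, with deallocation baked in
}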

My library prototype using the storage API proposal is now complete enough to provide a prototype dyn* (with some limitations due to lack of compiler support). I recommend reading the implementation's header comment if you're interested in dyn*-like functionality; it assumes you know what the Storage trait is, but otherwise covers exactly what is done to provide a storage-erased pointer in just 2×usize (and discusses larger solutions as well, along the way).

4 Likes

Hmm, so I don't think that I'm focusing on returning trait objects, but I am focusing on something -- specifically, I'm trying to push for a design where it is very easy to convert uses of impl Trait into dynamic dispatch via dyn* Trait. So for example I would like to be able to take code like

fn with_callback(x: impl FnOnce(u32))

and convert it into dyn* FnOnce(u32) (which would work just fine, both inside and outside of a trait).

I think what you are hitting on @withoutboats is that Rust currently has two active conventions:

  • Using pointer types that give restricted access, e.g., &T or Rc<T>, to capture properties like "a shared view on this value".
  • Passing things by value and using trait bounds to abstract over things.

There is definitely tension, both ergonomic and otherwise, between these two patterns. A key example would be hashmap.get(&22). In the design of hashmap, we lean on & to say "shared view of the key". But that is tied to introducing a pointer, which isn't necessarily what you want (it's not particularly useful when the key is a u32).

The intent of the dyn syntax today was to fit well with that first pattern, but my observation has been that it doesn't work that well in practice. This is for a few reasons, but perhaps the biggest is that people will rarely write Box<T> to express ownership, but instead prefer just T; often this is in the form of a function that takes an argument like f: impl Fn() or whatever. In these instances, converting to f: &dyn Fn() is an incompatible change that introduces an ergonomic hurdle for callers to boot (you can't write foo(|| ...), you have to write foo(&|| ...)).
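A minimal illustration of that call-site difference (the function names are just for the example):

fn takes_impl(f: impl Fn()) {
    f();
}

fn takes_dyn_ref(f: &dyn Fn()) {
    f();
}

fn main() {
    takes_impl(|| println!("passed by value"));   // closure passed directly
    takes_dyn_ref(&|| println!("passed by ref")); // caller has to add the `&`
}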

To put it another way, in my head the "classic 3 modes" of Rust are T, &T, and &mut T, but if T is a dyn type, it doesn't fit that model -- it requires a fourth mode, Box<T>.

The Fn traits are an example of the second pattern, actually, in that we don't have one trait and use & and &mut to "select" from it, but instead we have a few traits. This is because many closures can only implement some subset of the functionality. It's interesting to note that if we had the trait views feature that @tmandry and I were discussing, it's possible we could have just one Function trait with three methods (self, &mut self, and &self) and instead have some closures that only implement &Function.

I definitely use both modalities, but over time I have found myself moving more and more towards taking a value of type T with suitable bounds rather than taking values like &T. This tends to be more ergonomic for my callers, for example, and it's also more flexible: if I have an impl Clone + Deref<Target = T>, that can be an Rc, an Arc, or an &T, which is pretty nice. (I am not saying I write that particular combination a lot; it comes up, but rarely, I guess.)
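For instance, a quick sketch of a function that accepts any of those (takes_shared is a made-up name):

use std::fmt::Debug;
use std::ops::Deref;
use std::rc::Rc;
use std::sync::Arc;

// Works equally with Rc<T>, Arc<T>, and &T.
fn takes_shared<T: Debug>(v: impl Clone + Deref<Target = T>) {
    let copy = v.clone(); // cheap: bumps a refcount or copies a reference
    println!("{:?}", *copy);
}

fn main() {
    takes_shared(Rc::new(1));
    takes_shared(Arc::new(2));
    takes_shared(&3);
}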

I think what might be helpful is to categorize and look at the ways that dyn Trait is used and see how well they fit each form and how they would be handled. I'll take a stab at that later today, perhaps, but off the top of my head here are a few obvious patterns:

  • As a parameter that doesn't escape a fn (e.g., the dynamic equivalent of fn foo(x: impl Foo)), possibly with multiple inputs[1].
  • As a return value from a function (e.g., the dynamic equivalent of -> impl Foo), which is of course the async fn use case -- but it's also very commonly desired in other traits, like Iterator adapters.[2]
  • As a "context object" that is threaded around all over the place -- this is basically a way to achieve dependency injection. For example, a lot of code within AWS uses trait objects to control the access to the network, so that they can inject a faulty network connection during testing that causes random failures all over the place.
    • I'd have to go look, but I suspect that most of the code that does this uses Arc<dyn Network + Send> today, because you want to be able to clone the data -- but I suspect that doing trait Network: Clone + Send and dyn* Network would be just as good, if not better.
    • That said, I can imagine code wanting to start out with unique access to the dyn Network that is later shared. That's not particularly easy to do in today's system, though! You could take a Box<dyn Network>, but then later you have to pass around an Arc<Box<dyn Network>> -- not great. dyn* would make that solution more ergonomic, but what you really want is a type like rc-box, where you distinguish "known to be 1 ref count" and allow that to be converted to "maybe N". I think that would be totally doable with dyn* (rough sketch below).
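
Here is a rough sketch of that "unique first, shared later" pattern expressible in today's Rust; UniqueArc, Network, and MockNetwork are made-up names for illustration (the rc-box crate provides something similar):

use std::sync::Arc;

trait Network: Send + Sync {
    fn send(&self, bytes: &[u8]);
}

struct MockNetwork;
impl Network for MockNetwork {
    fn send(&self, _bytes: &[u8]) {}
}

// A "known to be 1 ref count" wrapper around Arc.
struct UniqueArc<T: ?Sized>(Arc<T>);

impl<T: ?Sized> UniqueArc<T> {
    fn new(inner: Arc<T>) -> Option<Self> {
        // Only accept an Arc we can prove is unique.
        (Arc::strong_count(&inner) == 1 && Arc::weak_count(&inner) == 0)
            .then_some(UniqueArc(inner))
    }

    // Unique access, so handing out &mut is fine.
    fn get_mut(&mut self) -> &mut T {
        Arc::get_mut(&mut self.0).expect("refcount is 1 by construction")
    }

    // Give up uniqueness: "maybe N" from here on.
    fn into_shared(self) -> Arc<T> {
        self.0
    }
}

fn main() {
    let base: Arc<dyn Network> = Arc::new(MockNetwork);
    let mut unique = UniqueArc::new(base).expect("freshly created, so unique");
    let _exclusive: &mut dyn Network = unique.get_mut();
    let shared: Arc<dyn Network> = unique.into_shared();
    shared.send(b"hello");
}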

So, I guess the question is, what are the other use cases, where dyn* would not work nicely? I don't doubt they exist, but I don't know what they are off the top of my head.

Potential tangent ahead: The other trend I've found is that if I am going to be parameterizing a lot of things by some borrowed data, I will generally avoid using a large lifetime (like the compiler's 'tcx) and instead prefer to thread a generic type T around. I find it easier to think about as a code author, and it's of course more flexible too, since I can add associated types and functions. The primary downside is that it results in monomorphization. I'd like to see us have some kind of erased T type parameters for this, but I think maybe I mentioned that already?


  1. It's interesting to note that in fn foo(x: impl Debug), it is totally possible for x to include borrowed data (this is not the case for x: Box<dyn Debug>, even though that always takes "by value"). I have thoughts on this but they're for another day. ↩︎

  2. Interesting to note here that -> impl Foo is (a) explicit about what lifetimes it captures, which is good for dyn but also (b) "leaks" things like Send, which dyn cannot (so easily) do. I think we could use some of the capabilities that a-mir-formality offers to address this in interesting ways, but that will have to wait for another post too... ↩︎

3 Likes

This bit here could be received as an accusation. It would be more amiable to use a softer qualifier, e.g. "my sense is that you may have undervalued how the churn cost [...]". It's always risky to make assertions about others' internal processes, so it's best to give as much benefit of the doubt as possible in your phrasing.

Couldn't you satisfy that use case with unsized fn params?

dyn* in argument position is kind of like unsized function parameters, yes.

At least AIUI, unsized function parameters (without full unsized locals) effectively introduce an implicit &move to the parameter, as the parameter is always passed by reference.

I think the only difference between dyn* in arguments and unsized dyn in arguments is with things like Box<T>:

Where Box<T>: Trait, f(boxed) with dyn* would pass the box pointer, whereas dyn would pass a reference to the box.

Where T: Trait but Box<T>: ?Trait, f(boxed) with dyn* would just work, whereas dyn would require f(*boxed).

This applies equally to other reference types, so long as no special autoderef rules are introduced.

2 Likes

I think I agree with most of what your post is saying, but this gets the trait relation backwards. In trait views, each "view" is effectively a supertrait of the original trait because it provides a subset of the trait's functionality. But FnOnce and FnMut are the supertraits in our hierarchy. &Function (which corresponds to Fn) can't be a supertrait.

In other words, in your hypothetical Function trait every type implementing it would have to implement it in terms of &self, which makes it useless for representing FnOnce and FnMut.
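For reference, a simplified sketch of std's actual hierarchy (the real definitions use tuple argument lists and extern "rust-call"); the by-value trait sits at the top, and the &self trait is the most derived one:

// Simplified stand-ins for the real traits.
trait MyFnOnce<Arg> {
    type Output;
    fn call_once(self, arg: Arg) -> Self::Output;
}

trait MyFnMut<Arg>: MyFnOnce<Arg> {
    fn call_mut(&mut self, arg: Arg) -> Self::Output;
}

trait MyFn<Arg>: MyFnMut<Arg> {
    fn call(&self, arg: Arg) -> Self::Output;
}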

I suppose it might be interesting to have trait views that support only self or only self and &mut self methods, but I don't know how to spell that.

1 Like

Unsized values run into a compatibility hazard. You cannot freely convert between dyn Foo and impl Foo because of the implicit Sized bound on impl Foo.

For example, let's say you do the conversion Niko was talking about using an unsized param and now your function signature looks like this:

fn with_callback(x: dyn FnOnce(u32))

Now what if you have to pass that to another function that takes impl FnOnce(u32)? You'd have to change that function signature too, either to take unsized dyn or impl FnOnce(u32) + ?Sized. This is obviously a problem if you don't control the function signature.
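A minimal sketch of that hazard, assuming the unstable unsized_fn_params feature; the forwarding call is the part that fails to compile:

#![feature(unsized_fn_params)]

// A function we don't control, with the implicit `Sized` bound on `impl Trait`.
fn takes_impl(_f: impl FnOnce(u32)) {}

fn with_callback(x: dyn FnOnce(u32)) {
    takes_impl(x); // ERROR: `dyn FnOnce(u32)` doesn't satisfy the implicit `Sized` bound
}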

If the callback gets stuck in a struct you're also in trouble; you either have to make it the last field or box it at that point.

Of course, I think it's incumbent on the dyn* design to show that it can interoperate cleanly with code that was written using dyn if it's going to effectively replace it. It isn't clear to me that this is the case.

Not necessarily; you could pass along &move dyn FnOnce(u32) (presuming we have surface level &move as well). There could even be an implicit coercion from stack dyn Trait to &move dyn Trait if desired (since the two are basically the same (without unsized locals)).

The point stands for placing the value in a struct, though, as using &move dyn Trait introduces a lifetime bound that isn't there when dealing in impl Trait.

1 Like

I have no idea what the code in AWS is like, but just from reading the above:

  1. Rust isn't Java. In Java, DI and programming to (dynamic) interfaces is the industry standard, partly because it is built into the mechanics of the language, which hard-codes certain trade-offs such as a global GC. Rust leaves these kinds of trade-offs explicit for the user.
  2. Potential performance cliffs - the user may want to disallow a Box -> Arc conversion in scenarios where allocating/dropping a Box is costly.

Putting aside the specifics of dyn Trait, Rust is a systems PL with a guiding principle of giving the user fine-grained control over trade-offs involving performance and lower-level details such as memory allocation. Ergonomics should serve readability, not save on keystrokes.

Rust is more complex than, say, Java, and that's okay given it's essential complexity needed to fulfil systems-level requirements. E.g. we already have both references and raw pointers in the language: &T, &mut T, *mut T, *const T, etc. (with lifetimes).

Therefore the discussion about ergonomics and changing the semantics of dyn T makes no sense to me. For one, as @withoutboats said, this has a huge churn cost for the language, not to mention it violates the stability guarantee. We just need to fill a gap in the surface language to make dynamic/erased types consistent with the above.

1 Like

Won't such an all-allowing owning pointer be expected to be larger? Also:

  • we can get rid of the Allocator vtable pointer and replace it with a direct pointer to its dealloc function, since that's the only thing we do with the allocator here. => -1 level of indirection
  • also, if we introduce a dyn* kind without replacing the current dyn, we can go for the C++ solution where the vtable is placed right before the data. => -1 pointer here. The downside is that coercion to dyn* becomes expensive then, but what if we only do that for the allocator pointer + dealloc routine?

My feeling is that a 4-pointer-wide data structure isn't unacceptably expensive.

It's not required for dyn Trait pointees; 2×usize is possible, and the only cost is statically providing a vtable with dealloc appended.
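A rough sketch of what "a vtable with dealloc appended" could look like; this is not the prototype's actual code, just the general shape:

use std::alloc::{dealloc, Layout};
use std::ptr;

// An erased owning pointer then only needs (data, &'static WrappedVtable) = 2×usize.
#[repr(C)]
struct WrappedVtable {
    // Drops the pointee and releases its storage (for the ZST-allocator case).
    drop_and_dealloc: unsafe fn(*mut ()),
    // The underlying trait's vtable pointer.
    trait_vtable: *const (),
}

// One such entry would be generated statically per (type, trait) pair, e.g.:
unsafe fn drop_and_dealloc_global<T>(data: *mut ()) {
    // Assumes the value was outlined into a Global allocation.
    unsafe {
        ptr::drop_in_place(data as *mut T);
        dealloc(data as *mut u8, Layout::new::<T>());
    }
}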

The linked PR gets 90% of the way there in purely library code. Again, I'll heavily recommend reading the write-up comment on how it functions.

This is probably true, but it isn't zero cost (I can and did write it better by hand).

It won't allow this:

fn fun(a: &dyn Allocator) -> dyn* Future<Output=usize> {
   Box::new_in(...,a)
}

Which is beneficial for a #[no_std] environment: it allows async in traits when no global allocator exists. IOW, as you have noted, it works only with ZST allocators.

The only thing that can make dyn zero-cost is devirtualization =\.


Can we reify the inline storage optimization as a mechanism for a caller to also provide inline storage to trait methods, so that their impls can decide how they return?

  • this is a form of small-size optimization
  • this allows the compiler to try to return an inline, stack-allocated value.

But this requires the caller to provide a return slot (and its size):

  • what syntax for this?
  • a heuristic for slot-size determination? a precise algorithm?
  • maybe an attribute on the binding, like "this takes up to 64 bytes on the stack"?

Finally, we can take the dyn* extreme from the end of the post and go for dyn*<const STACK_STORAGE_SIZE: usize>, with the minimum set to 0 (words) and the default to, say, 2?
Then the traits themselves can allow threading this size from the caller (would object safety require GADTs in this case?).

My prototype also does so, just with &mut [MaybeUninit<u8>] as the backing storage, not &dyn Allocator.

99% of the time, when I say "zero cost", I mean "zero overhead." Obviously, dyn opts into dynamic dispatch. A zero cost dyn* adds no overhead over what's strictly required to support storage-erased pointers to trait objects. For the case of ZST allocators, zero overhead is vtable wrapping to add a dealloc to the vtable.


So the crux here is whether ZST allocators (or more specifically, ZST deallocators) are "enough" for dyn*. The original vision Niko drafted had this as a restriction, and I copied it. There is a case to be made for supporting a full &dyn Allocator, though; this requires no further compiler magic, and trivially allows a 3×usize small-box optimization. It also makes the #[no_std] async fn in traits story simpler, as implementations just need to take a &dyn Allocator parameter and forward it through... though doing so precludes the allocation being trivially devirtualized to use Global (deallocation is always virtual).

Note again that my prototype, while it doesn't support &dyn Allocator, does support #[no_std] async fn in traits that provide Dyn<'a, Future> by the caller allocating stack space which is then borrowed.

So I think the real lede here is: what do we actually want #[no_std] (and yes, std) async fn in traits to look like?

What we want from that obviously then dictates what dyn* needs to do, since dyn*'s primary purpose is to enable async fn in traits.

Also, a super simple thing I can't believe I've overlooked so far:

If dyn* uses the 4×usize layout and &dyn Allocator... it's literally just Box<dyn Trait, &dyn Allocator>. It even works today!
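A minimal sketch of that on nightly, assuming the allocator_api feature in its current shape:

#![feature(allocator_api)]

use std::alloc::{Allocator, Global};
use std::fmt::Debug;

// A 4×usize erased owning pointer, built from parts that exist today (on nightly).
fn erase<'a>(alloc: &'a dyn Allocator) -> Box<dyn Debug + 'a, &'a dyn Allocator> {
    Box::new_in(42u32, alloc)
}

fn main() {
    let global = Global;
    let erased = erase(&global);
    println!("{erased:?}");
}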

3 Likes

One last thing I want to log while I'm thinking about it: ideally, whatever dyn* and async fn in traits ends up looking like, it should support stack pinning. I should be able to give it N bytes and if the future fits in there, it should go in there and not require outlining. (What happens when it doesn't fit? I don't care for this hypothetical.)

All proposals thus need to be careful to guarantee that the owner can't be leaked. I've been focused on the storage/allocator side, as that's where I have prior experience, but supporting stack pinning is crucial, and can't be provided by &(dyn Allocator + '_).

1 Like

Outlining.

If we follow the logic that dyn* is an owning pointer, then it should have the same relationship with pinning that other pointers have: pinned => not movable; not pinned => movable; safety is ensured by the contract of Pin::new_unchecked.

My point was that introducing 2 additional pointers, instead of modifying vtables and not supporting non-ZST allocators, is a good tradeoff.


So, the const generic parameter is about the small-storage size in words, but making it count bytes would be better.

In theory, traits using this can be generic over the size of small storage and still be object safe:

trait ObjSafe {
    fn method<const SIZE: usize>(&self) -> dyn*<SIZE> Trait; //note the form
}

As written, ObjSafe::method can only be called with statically known SIZE => sufficient stack space can be prepared for any possible call.

Any method using this receives a pointer to the return slot and the concrete value of SIZE as additional arguments, so that the caller can benefit from the small-storage optimization whenever possible.

Edit: What about the UI of the called method?
How should unsizing to a dyn*<_> Trait look if the target data is small enough?
Like, the following:

fn demo() -> dyn*<8> Index<usize> {
    Box::new([0;5])
}

is not going to actually allocate...

Problem: coercions of dyn*<N> to dyn*<M> where M < N will cause outlining: where do we allocate memory for this?

  • In the same allocator? - Yes, but this disallows elegantly relocating objects. Consider this case:
fn ret_dyn<'a>(alloc: &dyn Allocator) -> dyn*<8> Trait {...}

fn proxy() -> dyn*<4> Trait {
   let arena = ArenaAlloc::new(256);
   ret_dyn(&arena) // here, outlining should happen -- but which allocator should be used?
}
  • In another one? - How would it be supplied?
  • In the global one? - Unfortunate, and not always available.

Extra question: what about alloca?

We can add associated constructors:

impl<trait Tr,const SIZE: usize> dyn*<SIZE> Tr {
     /// This function performs relocation of an object into a given allocator, with a given size.
     fn relocate<const OLD_SIZE: usize>(obj: dyn*<OLD_SIZE> Tr, place: &dyn Allocator) -> dyn*<SIZE> Tr;
          // where OLD_SIZE > SIZE -- we could add this bound, because if the condition doesn't hold the operation is strictly unnecessary
     /// Explicit way of construction, see below.
     fn new<T: Tr, B: CoerceDynStar<SIZE, Tr>>(val: T, f: impl FnOnce(T) -> B) -> dyn*<SIZE> Tr;
}

These solve the relocation case I described above and add a stable, explicit way of creating such values.

The implicit creation mechanism should be similar to the current one, so a coercion will be good enough.

The coercion I imagine should be zero cost:

the trait:

trait CoerceDynStar<const SIZE: usize,trait Tr>: Tr {
     /// Transform the data, or its container, into the dyn star type.
     fn coerce(self) -> dyn*<SIZE> Tr;
     /// get a reference to an allocator*
     fn allocator(&self) -> Option<&dyn Allocator>;
}

* the allocator function returns only a two-word fat pointer; it's None when the data pointer is null.

An impl for all sized types:

/// This one is derived by the compiler
/// `allocator` returns `None`
impl<T, trait Tr, const SIZE: usize> CoerceDynStar<SIZE, Tr> for T
     where T: Tr, { size_of::<T>() <= SIZE } {...}

And impls for other types that can be coerced into a dyn star, like Box, are provided separately.

The purpose of the dyn*::new constructor is to rescue us from problems like "the coercion stopped working when my future exceeded the small storage of N bytes in the dyn*<N> Future type": one can simply use the constructor, indicating how outlining can happen.
