Auto-generate crate-agnostic anonymous sub-traits for `dyn Foo + ...`

It is not possible to have a dynamically dispatched type implement multiple traits (except for marker traits), because a trait object is a fat pointer of (*data, *vtable). The workaround suggested by the compiler is to create a super-trait that combines all the traits the trait object ought to implement, but I've always wondered why the compiler is not able to do this by itself. It seems trivial to me to turn something like:

struct Baz;

trait Foo {}
trait Bar {}

impl Foo for Baz {}
impl Bar for Baz {}

fn qux() -> Box<dyn Foo + Bar> {
    Box::new(Baz)
}

into:

struct Baz;

trait Foo {}
trait Bar {}

impl Foo for Baz {}
impl Bar for Baz {}

trait FooBarWithCompilerMagicOnTop: Foo + Bar {}

impl<T> FooBarWithCompilerMagicOnTop for T where T: Foo + Bar {}

fn qux() -> Box<dyn FooBarWithCompilerMagicOnTop> {
    Box::new(Baz)
}

When dealing with public traits in foreign crates, there will be name conflicts to resolve, sure; but isn't there a way? That's why FooBar has CompilerMagicOnTop. If two crates make the same dynamic trait "combination", we just transparently deal with it to make them interoperable (compiler magic), but when the trait bounds are all local (i.e., from the same crate) it is very simple to codegen.

3 Likes

That's the hard bit, and why it's taking so long to get a performant design for this. Check out past discussions for details. If you have a new idea, it would be great to see it.

2 Likes

I will. I guess I should mainly search the Rust GitHub repo, right? Edit: Going through Zulip is probably a good idea too.

1 Like

I think I got it. This is not easy, but we could solve it basically overnight with something like Higher-Kinded Traits. We just need to define a blanket impl generic over the trait, not just the type; something semantically equivalent to "FooBarWithCompilerMagicOnTop is implemented for any type that implements a trait that, in turn, implements Foo + Bar".

The compiler would then codegen different traits inside every crate boundary, but they would all be compatible with each other thanks to that blanket impl. Then, in the link or pre-link stage, we just clean up the mess of traits we left behind; that is, if we want to go with the easy solution for getting rid of the repeated traits (it might take more compile time than cleaning it all up during MIR normalization).

Syntax bikeshed #1

impl<T> FooBarWithCompilerMagicOnTop for T
where
    for<A: Foo + Bar> T: A
{}

Syntax bikeshed #2

impl<T> FooBarWithCompilerMagicOnTop for T
where
    for<trait A: Foo + Bar> T: A
{}

Syntax bikeshed #3

impl<trait T> FooBarWithCompilerMagicOnTop for T where T: Foo + Bar {}

The syntax bikesheds are ordered by my preference, from (IMO) better to worse.

Edit: I just realized that if this ever gets implemented, we basically get the last thing we need to turn Rust into a fully capable language of Functional Programming, having it be a Haskell that doesn't require a PhD in applied mathematics and allows mutation. I. WANT. THIS. NOW.

How would your scheme handle function pointers? Note that each function may come from a different crate. For example:

// crate A
fn get_stuff() -> fn() -> &(dyn Foo + Bar) {
    ...
}

// crate B
fn get_other_stuff() -> fn() -> &(dyn Foo + Bar) {
    ...
}

// crate C
fn use_stuff(b: bool) -> fn() -> &(dyn Foo + Bar) {
    if b { crate_a::get_stuff() }
    else { crate_b::get_other_stuff() }
}

The tricky bit here is that we can't really inject any code in crate C to reconcile the two potentially different vtable layouts produced by the other two crates.

We'd need to make vtable generation deterministic between compilation runs: when the function pointers are the same and come from exactly the same traits, the resulting vtable must be the same. This is not currently guaranteed, but it is not that difficult to build some sort of deterministic, perfect function-pointer sorter that would make every vtable identical whenever the function pointers it contains are the same. The Rust ABI isn't stable, but we can still get away with this: if we were to link code generated by different compiler versions or runs, it would still produce UB, if it linked in the first place. I'm not talking about ABI stability; this is deterministic code generation.

To illustrate what I'm saying:

A vtable of functions

  • clone()
  • drop()
  • foo()
  • bar()

that comes from exactly the traits (no less, no more) Clone + Drop + Foo + Bar will produce the same vtable across all crates, given a deterministic, perfect function-pointer sorter.

Is this viable, or am I being stupid?

Not just across multiple compiler runs; even within one run. Let me clarify the example: crate C depends on crates A and B, but A and B don't have any connection. Both A and B will try to generate a vtable, but each may pick a different layout, and crate C cannot in general reconcile this difference. So how would you ensure that the layout of &(dyn Foo + Bar) (including the vtable) is consistent across all crates and dependencies for a single cargo build?

Thanks for the clarification; I actually had some trouble understanding your example. What I'm trying to say is that when the compiler generates the vtable for the invisible sub-trait of Foo + Bar, we have to force it to always generate the same vtable (within a compiler run), with no changes to the layout or the function-pointer order; i.e., the vtable must be deterministic. However, if we add another trait bound to the dyn, it becomes incompatible, because the layout is no longer guaranteed to be the same (the same would happen with a manual implementation). In other words, we can't downcast a dyn Foo + Bar into a dyn Foo or a dyn Bar, at least not without magic. Maybe we can swap the vtable pointer, but I don't know to what extent that's viable.

For named super-traits this is an unstable feature now:

I would assume it should Just Work™ for anonymous supertraits too (EDIT: actually, re-reading the description, I guess these are anonymous subtraits; Foo + Bar has the supertraits Foo and Bar).

2 Likes

I believe a better approach is to use "fatter" pointers. dyn Foo + Bar would be represented as (data_ptr, foo_vtable_ptr, bar_vtable_ptr) (with a separate optimization possible for marker traits). It immediately resolves all the questions around deterministic compilation, potential bloat issues, and casting between dyn A + B + C and dyn B, and it makes the associated overhead easy to see when reading source code. If you do not want to pay for the additional pointer, you can always create a super-trait.

That doesn't really scale at all. Perhaps I didn't express myself clearly enough at the start, but one of the major points of this feature, which I'm now renaming to anonymous sub-traits, is that it allows code across crate boundaries to work flawlessly without worrying about implementing all the different sub-traits that mean the same thing under the hood. Fatter pointers are a sub-par solution that would force this feature to be turned back into a manual sub-trait when the requirements of a project eventually and inevitably increase, creating an API change that is also inevitably a breaking one. It'd be a ticking bomb waiting to explode; doing it this way would be setting up a technical-debt trap disguised as nicer syntax.

That's great, actually; it seems the only real blocker is implementing Higher-Kinded Traits (a.k.a. Ruskell), which is a beast of its own. I will try to get this onto the next T-lang meeting's agenda; I'll see if I can get there through Zulip.

In my practice I very rarely need more than 2-3 combined traits used with dynamic dispatch. If your project's needs "increase", I highly doubt you will work with abominations like dyn Foo + Bar + Baz + Zoo + Zaa in your code; you will introduce a wrapper around the combination either way. And note that I am not saying you have to implement super-trait wrappers manually; hopefully we will get trait aliases eventually, which would allow you to write something like this:

trait MySuperTrait = Foo + Bar + Baz + Zoo + Zaa;

// `dyn MySuperTrait` will use 2 pointers under the hood, not 6
fn f(a: &dyn MySuperTrait) { ... }

You are being overly dramatic. Syntax-wise there is absolutely no difference; the only difference is how the feature is implemented under the hood and the associated technical trade-offs.

With nicer syntax, I meant eliding the manual sub-trait, but you're on point that this is all an implementation detail. The problem is that ideally we'd want both to behave the same (best) way, so that a manual sub-trait is never necessary; ideally we wouldn't make any trade-offs. This feature is as much about crate interoperability as it is about saving a few lines of code, and I don't think we should throw anyone under the bus on the one aspect that's not just cosmetic because their requirements are odd. If you look at Tokio, for example, some of their dynamically dispatched traits are massive. I would still consider your approach because it avoids Higher-Kinded Traits, though.

This is still a great idea and on the table, but it still doesn't scale, and I'd see it as an antipattern in disguise; the price you pay for every trait you add is enormous, because someday someone will add another trait thinking that the cost grows linearly (which, in memory, it does), but suddenly your dyn pointers will stop fitting into registers, and that's not a linear cost to pay when it comes to performance. Dyn is slow enough already, and I wouldn't want to add something like this when we can do better.

"The best way" implemented implicitly has very non-trivial consequences for the compiler. The most basic one: is dyn A+B used in one crate the same as dyn A+B from another crate? What about dyn A+B and dyn B+A? Think carefully. For a naive implementation of your proposal the answer is no, since both crates can be compiled in parallel, so each will create its own separate anonymous wrapper trait.

How is it different from adding a field to a struct passed by value? Everyone who cares about stack spilling will know about the associated costs. Again, I highly doubt that in practice people will use more than 2-3 traits in one dyn combination, simply because it makes signatures unwieldy.

I will link you both of my responses to that, as was said earlier, but TL;DR: we only have to make it deterministic within a single compiler run, not between runs, because our ABI isn't stable anyway.

It isn't, but that is not my point. My point is that we can make these extra trait bounds codegen to something as fast as a manual implementation, which is what Rust has always done (Zero-Cost Abstractions), or we can take the easy route. I don't like the precedent this approach would set, because it wouldn't be a ZCA over manual sub-traits (I know we wouldn't be creating them in the first place; they would be different things).

It's not about ABI, it's about the addresses of generated vtables. dyn A+B from different crates will point to different vtables with the same function pointers inside. De-duplicating those vtables will require either non-trivial changes to linking or undermining parallel compilation of crates. And I haven't even touched the issues with downcasting dyn A+B to dyn A or dyn B.

It's a common misunderstanding of the ZCA principle. Under your interpretation, your proposal is not zero-cost either; at the very least, you generate additional vtables which bloat the resulting binary. I can name a number of Rust features which are not "zero-cost" under such an interpretation (e.g. there is triple indirection with boxed futures). ZCA is about not paying for features you do not use. If you don't explicitly use dynamic trait dispatch combinations, you will not pay for the fatter pointers.

The combination of fatter pointers and trait aliases gives programmers an explicit choice. Arguably, it is more in the spirit of Rust as a systems programming language.

I said at the start of the discussion that we can deal with that while linking, or even during MIR normalization, and I would like more insight on the matter. Would the changes be so dramatic that they're a no-go from the beginning?

I already said that it's problematic, and IMO we really should get someone from T-compiler to explain our options here, if any. I would like to hear your take, though.

Touché, I will give you that.

1 Like

This is true with named traits today: you can trivially get duplicate vtables for the exact same `&() as &dyn Debug` expression compiled in different crates.

Shouldn't this already be supported by linkers? After all, they need to merge definitions for C++ to comply with the ODR. How hard would it be for Rust to use the same mechanism?