Add trait objects with static dispatch

While working on my Rust project, I ran into the need both to store objects that implement the same trait in one collection and to return one of several different iterators from a function, depending on some condition.

Both problems are solved trivially using Box<dyn Trait>. However, that kind of dispatch has its drawbacks, the main one being that every call goes through a vtable, which gets in the way of inlining and can significantly slow the program down compared to static dispatch.

Unfortunately, at the moment static dispatch is achieved by explicitly creating enum types, either by pasting boilerplate to implement the necessary traits, or by using procedural macros, which always have limited flexibility (e.g. the enum_dispatch crate).

However, this approach cannot be considered universal: explicitly writing such enums is awkward for iterator types, whose concrete types are tedious to spell out, and outright impossible for unnameable closure types.

With this topic I propose discussing the addition of anonymous trait objects with static dispatch.

Possible notation

// As a parameter
fn foo(_: static Trait) {
    /* Body omitted */
}

// As a return type
fn bar() -> static Trait {
    /* Body omitted */
}

// As a generic type parameter
fn baz() {
    let vec = Vec::<static Trait>::new();
}

Examples

The following code snippets show possible examples of the notation and its use.

Example 1

struct A { /* fields omitted */ }
struct B { /* fields omitted */ }

trait Trait { /* body omitted */ }

impl Trait for A { /* impls omitted */ }
impl Trait for B { /* impls omitted */ }

fn main() {
    let mut vec = Vec::<static Trait>::new();
    vec.push(A::new());
    vec.push(B::new())
}
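
For comparison, here is a minimal sketch of how the same heterogeneous collection has to be written today with Box<dyn Trait> (the types and trait body are placeholders, as above):

trait Trait { /* body omitted */ }

struct A;
struct B;

impl Trait for A { /* impls omitted */ }
impl Trait for B { /* impls omitted */ }

fn main() {
    // Today's workaround: box every element and dispatch through a vtable.
    let mut vec: Vec<Box<dyn Trait>> = Vec::new();
    vec.push(Box::new(A));
    vec.push(Box::new(B));
}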

Example 2

As of today, the following code won't compile without Box<dyn Iterator<Item=u64>>: the two branches produce different concrete iterator types, so they cannot be unified under a single impl Trait return type.

fn different_iterators(cond: bool) -> static Iterator<Item=u64> {
    if cond {
        [2].into_iter()
    } else {
        [3, 4].into_iter()
    }
}
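
For reference, the version that does compile today looks roughly like this, at the cost of a heap allocation and vtable dispatch:

fn different_iterators(cond: bool) -> Box<dyn Iterator<Item = u64>> {
    if cond {
        Box::new([2u64].into_iter())
    } else {
        Box::new([3u64, 4].into_iter())
    }
}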

Possible compiler representation

As a possible representation of statically dispatched trait objects, I suggest an enum automatically generated by the compiler, where each variant stores one of the concrete types determined from the context in which the static Trait is used. The compiler would also generate an implementation of Trait for this enum.
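
For illustration, here is a hand-written sketch of roughly what such a generated enum could look like for Example 1 (the names StaticTrait and describe are invented; a real implementation would be anonymous and compiler-generated):

trait Trait {
    fn describe(&self) -> &'static str;
}

struct A;
struct B;

impl Trait for A {
    fn describe(&self) -> &'static str { "A" }
}

impl Trait for B {
    fn describe(&self) -> &'static str { "B" }
}

// Hypothetical stand-in for the enum the compiler would generate
// for a `static Trait` used with A and B.
enum StaticTrait {
    A(A),
    B(B),
}

// The generated impl forwards every method through a match, which the
// optimizer can inline, instead of making an indirect vtable call.
impl Trait for StaticTrait {
    fn describe(&self) -> &'static str {
        match self {
            StaticTrait::A(a) => a.describe(),
            StaticTrait::B(b) => b.describe(),
        }
    }
}

fn main() {
    let vec: Vec<StaticTrait> = vec![StaticTrait::A(A), StaticTrait::B(B)];
    for item in &vec {
        println!("{}", item.describe());
    }
}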

Pros

Thanks to inlining, calling the Trait's methods by matching on such an enum is usually faster than an indirect call through a vtable.

Cons

Such an enum could consume much more memory than a dynamically dispatched trait object behind a pointer, since its size is the size of the largest type from the context plus the extra bytes of the discriminant that records which concrete type is currently stored.
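
To make the trade-off concrete, a small sketch (with invented payload types) comparing the footprint of a hand-written stand-in enum with a boxed trait object on a 64-bit target:

use std::mem::size_of;

trait Kind {}

// Two hypothetical implementers of very different sizes.
struct Big([u8; 128]);
struct Small(u8);

impl Kind for Big {}
impl Kind for Small {}

// Hand-written stand-in for a generated `static Kind` enum.
enum StaticKind {
    Big(Big),
    Small(Small),
}

fn main() {
    // The enum is at least as large as its largest variant, plus the discriminant.
    assert!(size_of::<StaticKind>() >= size_of::<Big>());
    // A boxed trait object is only a fat pointer (data pointer + vtable pointer);
    // its payload lives on the heap.
    assert_eq!(size_of::<Box<dyn Kind>>(), 2 * size_of::<usize>());
    println!(
        "enum: {} bytes, Box<dyn Kind>: {} bytes",
        size_of::<StaticKind>(),
        size_of::<Box<dyn Kind>>()
    );
}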

4 Likes

How would that work in libraries? The compiler cannot automatically generate an enum if it doesn't know all trait implementers, so that enum generation would have to be delayed to the very last stages when the final binary is being emitted.

1 Like

Also, just adding a cargo dependency would potentially change the size of the generated enum.

My proposal is not to generate one enum per trait for the whole project, but to generate a separate enum per usage site, with the number of variants determined by the context of use.

Suppose we have three different structs that implement some trait:

struct A { /* fields omitted */ }
struct B { /* fields omitted */ }
struct C { /* fields omitted */ }

trait Trait { /* body omitted */ }

impl Trait for A { /* impls omitted */ }
impl Trait for B { /* impls omitted */ }
impl Trait for C { /* impls omitted */ }

fn main() {
    let mut foo = Vec::<static Trait>::new();
    // static Trait is being translated into enum with 2 variants (A, B)
    foo.push(A::new());
    foo.push(B::new());

    let mut bar = Vec::<static Trait>::new();
    // static Trait is being translated into enum with 3 variants (A, B, C)
    bar.push(A::new());
    bar.push(B::new());
    bar.push(C::new())
}

Here, foo's element type would be an enum with two variants (A and B), while bar's would have three (A, B and C), purely because of the context.

So the types of foo and bar just look the same to the programmer but are fundamentally different, kind of like existential impl Trait? Or would adding a call like std::mem::swap(&mut foo, &mut bar) cause the types to unify and make the underlying type of foo have three variants (A, B, C)?

Either way, this seems very complicated to me.

2 Likes

Yes, it is something like impl Trait.

Or would adding a call like std::mem::swap(&mut foo, &mut bar) cause the types to unify and make the underlying type of foo have three variants (A, B, C)?

No, it should not compile since vectors have different types.

If that doesn't compile then how does foo.push(A::new()) compile? The call to Vec::push introduces a new variant to the enum, why can't std::mem::swap do that? Both take the vec by &mut.

1 Like

Hm.. You are right. This should lead to a type expansion.

At a quick glance, I see two possible solutions:

  1. Prohibit (or, conversely, allow) expansion of a static trait with other static traits. In the latter case this can lead to code bloat; on the other hand, a similar mechanism occurs in error handling.

  2. Add a concrete type to a static trait's set of variants if and only if there is an explicit cast to the static trait. Something like this:

fn main() {
    let mut foo = Vec::<static Trait>::new();
    // static Trait is being translated into enum with 2 variants (A, B)
    foo.push(A::new() as static);
    foo.push(B::new() as static);

    let mut bar = Vec::<static Trait>::new();
    // static Trait is being translated into enum with 3 variants (A, B, C)
    bar.push(A::new() as static);
    bar.push(B::new() as static);
    bar.push(C::new() as static)
}

Be that as it may, I do not claim that my examples are complete.

If we assume a compiler that is able to automatically determine all possible concrete implementations of a trait at a particular call site, could that compiler use the information to automatically optimize a dyn Trait dispatch into something equivalent to a match (over the type id) and thus achieve a similar benefit without any new syntax?

To me, that analysis (what is the exhaustive set of types possible here) seems like the lynchpin of this idea, but also the part hardest to specify. If it were instead conceived as an optimization done automatically by the compiler then that analysis could potentially be gradually improved over time without any effect on the language specification, rather than having to nail down exactly what combinations are possible right up front.

(I haven't thought exhaustively about whether such an analysis would be possible and whether the optimization would be productive if so, but I'm assuming both to be true for the purpose of this question because it seems like these assumptions are consistent with those of your proposal.)

2 Likes

Devirtualization is in fact an optimization that compilers apply today. I believe these passes typically only apply when devirtualizing a single virtualized implementation, and not a set of possible implementations, but if optimizers don't do the latter, they can theoretically be taught how and when to do so.

This feature has been discussed before, typically as enum impl Trait. I believe it's only been proposed in return position before, but it would theoretically be usable anywhere that impl Trait is (and rather than erroring with conflicting defining uses, those defining uses would define the variants).

The interesting note is that operationally, enum impl Trait is a stack form of dyn Trait. You can even imagine implementing it such that the discriminant is the vtable pointer. With a good devirtualization optimizer[1], the main functional difference now is that enum impl Trait has a statically known and automatically inferred size/align, whereas dyn Trait is unsized.

You can close the gap a little more with e.g. Box<dyn Trait, InlineStorage<[usize; 7]>>, but this still requires the programmer to specify a size/align rather than have it inferred.

Personally, I'd rather see inline dyn Trait and improved devirtualization before enum impl Trait/static Trait.


  1. Obviously there'd be performance implications and reliability of optimizations and all that; I'm concerning myself more with the operational language design angle than the optimal codegen angle, though both are worth effort. Also, the ability to niche enum impl Trait's discriminant to less than a full usize can be valuable.

5 Likes

One thing I look forward to seeing, once we have RFC 2580 (ptr-meta), is a "set of dyn Trait" container that groups by the metadata, storing the vtable only once per type that's actually used. Iterating that would then see runs of the same dynamic type, and thus branch prediction would work almost perfectly, removing most of the runtime overhead from the indirect calls.
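
A very rough sketch of what such a container might look like, assuming the unstable ptr_metadata feature on nightly; the GroupedDyn type, the Shape trait and its implementers are all invented for the example:

#![feature(ptr_metadata)]

use std::collections::HashMap;
use std::ptr::{metadata, DynMetadata};

trait Shape {
    fn area(&self) -> f64;
}

// Simplified sketch: group boxed trait objects by their vtable metadata so
// that iteration sees runs of the same dynamic type. (The full idea would
// also keep only thin data pointers per group, storing the vtable once per
// type; that needs unsafe pointer reconstruction and is omitted here.)
struct GroupedDyn {
    groups: HashMap<DynMetadata<dyn Shape>, Vec<Box<dyn Shape>>>,
}

impl GroupedDyn {
    fn new() -> Self {
        Self { groups: HashMap::new() }
    }

    fn push(&mut self, item: Box<dyn Shape>) {
        let meta = metadata(&*item as *const dyn Shape);
        self.groups.entry(meta).or_default().push(item);
    }

    fn iter<'a>(&'a self) -> impl Iterator<Item = &'a dyn Shape> + 'a {
        // Within a group every element shares one vtable, so the indirect
        // calls are highly predictable.
        self.groups.values().flat_map(|g| g.iter().map(|b| &**b))
    }
}

struct Circle { r: f64 }
struct Square { s: f64 }

impl Shape for Circle {
    fn area(&self) -> f64 { std::f64::consts::PI * self.r * self.r }
}

impl Shape for Square {
    fn area(&self) -> f64 { self.s * self.s }
}

fn main() {
    let mut shapes = GroupedDyn::new();
    shapes.push(Box::new(Circle { r: 1.0 }));
    shapes.push(Box::new(Square { s: 2.0 }));
    shapes.push(Box::new(Circle { r: 3.0 }));
    let total: f64 = shapes.iter().map(|s| s.area()).sum();
    println!("total area: {total}");
}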

Looks like this goes back to at least 2017 (the "pre-RFC: anonymous enums" thread, #4 by Ixrec), if not earlier.

I do think it'd be nice, since it could also offer the obvious solution to things like if A { my_vec.into_iter() } else { a..b }, rather than manually needing to reach for the either crate's Either.
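
For concreteness, that manual workaround looks roughly like this with the either crate (the function and its parameters are invented for the example; Either implements Iterator when both sides yield the same item type):

use either::Either;

fn vec_or_range(cond: bool, my_vec: Vec<u64>, a: u64, b: u64) -> impl Iterator<Item = u64> {
    if cond {
        // The two branches are different concrete iterator types...
        Either::Left(my_vec.into_iter())
    } else {
        // ...but both yield u64 items, so Either unifies them without boxing.
        Either::Right(a..b)
    }
}

fn main() {
    let sum: u64 = vec_or_range(true, vec![1, 2, 3], 0, 10).sum();
    println!("{sum}");
}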

4 Likes

Given that we would have to do some case-based branching anyhow, I am wondering if this approach would actually be faster than the current dynamic dispatch. Did you compare the speed of using trait objects to a manual enum type? Also, does this actually need new syntax, or could it be implemented as an internal optimization of the dynamic dispatch case?

No, it can't (beyond devirtualization when the concrete type is knowable) in the general case.

The reason is that dyn Trait is open-world. When you compile the code using dyn Trait, you don't know the set of types which impl Trait, and it has to work for any and all such types, even downstream ones.

What makes enum impl Trait workable is that the set of types allowed is some (hopefully small) number of types which are known by construction. Knowing all types erased into dyn Trait requires program-global knowledge, and isn't even a well-formed question in the face of dynamic linking.

Devirtualization is a really impressive optimization which can do some of this (effectively: if the vtable pointer is a known one, use inlined code, otherwise fall back to the vtable), so I guess the correct answer is "yes, but." It's just that the caveats are quite large, and devirtualization beyond the simple case where only a single concrete type is ever involved is quite difficult.

3 Likes
