Allowing multiple disparate return types in impl Trait using unions


#1

This is a simple idea that I’m sure a lot of ya’ll have had that I wanted to present here just to have it out in the open.

Return impl Trait can be optimized to improve spatial locality (i.e. remove the cache miss associated with dynamic dispatch) while being able to return multiple disparate types. This works by always returning an object of the largest size of the disparate return types. If the disparate types are similarly sized, then this is zero overhead. Otherwise memory can be wasted.

I will motivate this with an example. Say we have a trait with methods that only take a reference to self as a receiver (that is to say, none of them take ownership of self. Otherwise they may take any number of arguments):

trait Simple {
    fn method1(&self) -> u32;
    fn method2(&self, arg1: u32) -> u32;
    fn method3(&self, arg1: u32, arg2: u32) -> u32;
}

Assume that we have three types, T1, T2, and T3 that all implement Simple. Consider the following function that we would like to return T1, T2, or T3 from while removing dynamic dispatch cache miss:

fn retSimple(v: u32) -> impl Simple {
    match v {
        1 => T1{ ... },
        2 => T2{ ... },
        3 => T3{ ... },
    }
}

If is possible to implement this with no overhead thanks to unions. To do this, we make an anonymous return type for retSimple as follows:

union DataUnion {
    t1: T1,
    t2: T2,
    t3: T3,
}

struct retSimpleRetType {
    data: DataUnion,
    simpleMethod1: fn (&DataUnion) -> u32,
    simpleMethod2: fn (&DataUnion) -> u32,
    simpleMethod3: fn (&DataUnion) -> u32,
}

retSimple would then be implemented as follows:

fn retSimple(v: u32) -> retSimpleRetType {
    match v {
        1 => retSimpleRetType {
            data: DataUnion {
                t1: T1{ ... },
            },
            simpleMethod1: <T1 as Simple>::method1, // (with necessary transmutes/casts)
            // etc ...
        },
        2 => retSimpleRetType{ ... },
        3 => retSimpleRetType{ ... },
    }
}

If (and this is a big if) the number of methods in Simple is few, and the sizes of the different return types are relatively equal, this method removes the cache miss required by dynamic dispatch for multiple disparate return types in impl Trait with little overhead.

In fact, if the disparate traits all have size equal to that of a pointer, and the trait in question only contains one method, this method is entirely equal to returning a boxed trait object.

Just a thought


#2

Yep, this idea has been floating around for a while and everyone seems to like it. I’ve been calling it enum impl Trait though everyone seems to have their own preferred name for it. I believe the most recent chatter on the subject is happening at https://github.com/rust-lang/rfcs/issues/2414, which includes my most recent thoughts so I won’t bother repeating them here.


#3

Thanks for pointing me to that RFC! Rust internal discussions are a little difficult to navigate around :stuck_out_tongue: I think the fundamental difference between this optimizations and automatic sum types is that we don’t need to carry the discriminant of the sum type in order to determine the desired method each time one is called. The disadvantage is that we need to carry function pointers for the traits we need.


#4

I’m not sure how it “removes dynamic dispatch” then. If we do need indirect calls through function pointers, how is that not dynamic dispatch?


#5

I guess “removing dynamic dispatch” is a result of my tired brain. It substantially reduced overhead. Consider a fat pointer:

struct VTable {
    method1: fn(*const u8) -> u32, // example 
}

struct FPointer {
    data: *const u8,
    vtable: &VTable,
}

Using this requires dereferencing a pointer (VTable) and using that value as a function pointer. VTable may not be in cache, causing a cache miss.

This operation guarantees that the function pointers will be spatially local (i.e. written onto the stack, and thus will not cause a cache miss).


#6

Okay, so it didn’t avoid dynamic dispatch, but it removed a level of indirection. I see, thanks.


#7

Right, it doesn’t remove the costs associated with branch misprediction. Think of it as inlining a vtable.

I edited the proposal to reflect this mistake on my part.


#8

I created a repository with benchmarks to analyse this optimization. In best case scenario that I highlighted, it seems this is twice as fast as Boxing a trait. https://github.com/DataAnalysisCosby/impl-trait-opt