I’ve been looking into different approaches for this and it seems to me that in the general case, especially if you want to support Box<A+B> type pointers, there are two extremes which I’d like to eventually look for a reasonable middle ground. These are:
- Duplicate every super trait’s vtable everywhere a cast might be needed. The size of this can easily become absolutely unreasonable if we implement
A+B pointers this way (imagine listing vtables for A+B, A+C, B+C, etc all with known offsets between them. ouch.) On the other hand, casting is always just an add/sub.
- Pointers to pointers to vtables. This approach is more complex to generate - the order of trait bounds can be rearranged to overlap pointer tables and that algorithm needs to consider every type and trait combination with potential casting paths between them, but there’s minimal size overhead if done correctly, even for multi-trait pointers. Every cast is two dereferences, which is better than
dynamic_cast, but not ideal.
I’ve been looking into some hybrid approaches which I think could be promising. If we take the pointer table approach as the general case, we have vtables, tables of vtable pointers (these are the core casting mechanism), and pointers to vtable pointer tables (to avoid repeating the cast table pointers everywhere). I think in a lot of cases though, we can eliminate some of these pointers. For traits with few requirements, it doesn’t make sense to add an extra level of indirection when we’d just be duplicating a couple pointers everywhere its vtable appears to support casting. And similarly, for traits with few methods, it doesn’t make sense to add any indirection at all when its whole vtable is only a couple of pointers.
So the question for me then becomes, casting can have variable overhead. Should this be decided just based on a size optimization? Should we try to optimize the most common casts in a program? Look for casts inside a loop? Let the user hint at which casts should be optimized for speed?
I’d also like to collect some more data on this during the impl period so that the eventual RFC for this will be well-informed. Looking for the most common and most extreme patterns on crates.io should help clarify a good approach to take, but at the same time I can’t entirely know how adding this will change the way people write code. In the meantime, are there thoughts on different approaches, comments, something I’ve overlooked?