This post is not directly related to Rust, but this is probably the best technical community I know of to discuss this subject. Please correct me if my understanding is wrong anywhere.
As far as I know, there are two main ways to manipulate a reference to a dynamically polymorphic type: thin and fat pointers.
Thin pointers are what C++ uses. The reference is no bigger than a machine word. The first field of the pointed-to structure is itself a pointer to the vtable of the type. The vtable is shared by all instances of that type. This means that when you need to resolve a polymorphic function, you have two indirections: the first to access the pointed-to object, and the second to access the vtable. If multiple references point to the same object, this is memory efficient, since each reference is a single machine word and there is only one pointer to the vtable (at the beginning of the layout of the shared object). In practice, some ABIs make the size of a pointer smaller than a machine word by distinguishing between regular pointers, which can only access the current memory segment, and far pointers, which can access any memory address, but the idea is the same.
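To make the layout concrete, here is a minimal sketch of that thin-pointer scheme, written by hand in Rust for illustration (all the names here are mine; a C++ compiler generates this machinery for you):

use std::mem::size_of;

// The vtable: one function pointer per virtual method.
struct Vtable {
    foo: fn(*const Object),
}

// The object embeds the vtable pointer as its first field, so a plain
// one-word pointer to the object is enough to reach the vtable.
// repr(C) keeps the vtable pointer first in the layout.
#[repr(C)]
struct Object {
    vtable: &'static Vtable,
    data: u32, // the object's own fields follow
}

fn object_foo(obj: *const Object) {
    println!("foo called, data = {}", unsafe { (*obj).data });
}

static OBJECT_VTABLE: Vtable = Vtable { foo: object_foo };

fn call_foo(obj: *const Object) {
    // Two indirections: load the object to find its vtable pointer,
    // then load the function pointer out of the vtable.
    unsafe { ((*obj).vtable.foo)(obj) }
}

fn main() {
    let o = Object { vtable: &OBJECT_VTABLE, data: 42 };
    assert_eq!(size_of::<*const Object>(), size_of::<usize>()); // one word
    call_foo(&o);
}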
Fat pointers are what Rust uses for trait objects (to be precise, it's the pointee that decides, not the kind of pointer: references and raw pointers to a sized type are thin, while references and raw pointers to a dyn Trait are both fat). The size of such a reference is two machine words: a pointer to the object and a pointer to the vtable. As in C++, the vtable itself is shared by all instances of that type. When resolving a call to a polymorphic function, you only need one indirection, since the vtable can be accessed directly. However, if you have multiple references to the same object, memory consumption is less efficient, since every reference carries its own copy of the vtable pointer.
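This is easy to check (the trait and type names below are just for the demonstration):

use std::mem::size_of;

trait Speak {
    fn speak(&self);
}

struct Dog;
impl Speak for Dog {
    fn speak(&self) {
        println!("woof");
    }
}

fn main() {
    // A reference to a concrete, sized type is one machine word...
    assert_eq!(size_of::<&Dog>(), size_of::<usize>());
    // ...while pointers to a trait object are (data ptr, vtable ptr):
    assert_eq!(size_of::<&dyn Speak>(), 2 * size_of::<usize>());
    assert_eq!(size_of::<*const dyn Speak>(), 2 * size_of::<usize>());

    let dog = Dog;
    let fat: &dyn Speak = &dog;
    fat.speak(); // one indirection through the vtable pointer in `fat`
}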
In both cases a polymorphic function is called through a function pointer stored in a vtable. A vtable is nothing more than an array of function pointers. When resolving a polymorphic call, the offset of the function within the vtable is known statically, but the address of the vtable isn't; that's why in both cases (thin and fat pointers) the address of the vtable must be stored somewhere. The main advantage of using a vtable is that the set of polymorphic types is open. If you create a new type that implements the same interface (a Rust trait, or a base class in C++), this has no impact on the code generated for a function that manipulates a reference and makes a polymorphic call. This is especially useful with dynamic libraries: loading a library can add new types implementing a public trait (or deriving from a base class in C++) to the set of types implementing that trait.
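That open-world property is the key point. In the snippet below (names are illustrative), the machine code of run never changes no matter how many implementations exist; it just loads a function pointer from a fixed offset in whatever vtable the fat reference carries:

trait Greet {
    fn greet(&self);
}

// Compiled once against the trait; unaffected by implementations
// added later, even ones loaded from a dynamic library.
fn run(g: &dyn Greet) {
    g.greet();
}

// A type added "later" slots in with no impact on `run`.
struct English;
impl Greet for English {
    fn greet(&self) {
        println!("hello");
    }
}

fn main() {
    run(&English);
}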
I recently had an idea for an alternative design, and I'd like to get some feedback. I don't plan to use this idea in any project so far, so this discussion is purely theoretical.
Instead of having a vtable, unique per type, I would use an identifier, unique per type. I think it's easier to understand with some code:
// some boilerplate, implementing Trait for A and B, you can skip it ;)
trait Trait {
    fn foo(msg: &str);
}

struct A;
impl Trait for A {
    fn foo(msg: &str) {
        println!("calling from A: {}", msg);
    }
}

struct B;
impl Trait for B {
    fn foo(msg: &str) {
        println!("calling from B: {}", msg);
    }
}

//////////////////////////////

// hypothetical syntax: `foo` has no receiver, so `dyn Trait` isn't
// allowed today; assume the reference carries a type id instead of a
// vtable pointer
fn caller(my_ref: &dyn Trait) {
    // this line:
    my_ref.foo("polymorphic call");
    // would be desugared into:
    polymorphic_foo(my_ref.id, "polymorphic call");
}

// this function is generated by the compiler
fn polymorphic_foo(id: u64, msg: &str) {
    // exhaustive match over all the types implementing the trait Trait
    match id {
        typeid(A) => A::foo(msg),
        typeid(B) => B::foo(msg),
    }
}
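For what it's worth, here is an approximation of that generated code that compiles in today's Rust, with the id passed explicitly as a hand-written enum (the enum and function names are mine):

trait Trait {
    fn foo(msg: &str);
}

struct A;
impl Trait for A {
    fn foo(msg: &str) { println!("calling from A: {}", msg); }
}

struct B;
impl Trait for B {
    fn foo(msg: &str) { println!("calling from B: {}", msg); }
}

// The per-type identifier the compiler would generate; an enum gives
// us the exhaustive match (and an error on a missing arm) for free.
#[derive(Clone, Copy)]
enum TraitId {
    A,
    B,
}

// What the generated `polymorphic_foo` would look like:
fn polymorphic_foo(id: TraitId, msg: &str) {
    match id {
        TraitId::A => A::foo(msg),
        TraitId::B => B::foo(msg),
    }
}

fn main() {
    // the id is a plain value, so this is an ordinary, inlinable match
    for id in [TraitId::A, TraitId::B] {
        polymorphic_foo(id, "polymorphic call");
    }
}

Note that Trait here is not object-safe (foo has no receiver), yet the dispatch still works.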
The function polymorphic_foo would be generated. If we don't use any dynamic library, all the types that implement the trait Trait are known at compile time, so generating polymorphic_foo is doable at compile time. This function can be inlined and all the other standard optimizations can be applied, and we get devirtualization nearly for free!
If we do use dynamic libraries, the list of types that implement the trait Trait can be known at compile time only if the trait is not exported. If it isn't, we can be sure that it cannot be implemented by any type in any dynamic library. In that case, we can also generate this list at compile time and do exactly the same thing as explained above.
If Trait is public and we use dynamic libraries, the set of known types would change each time a library is loaded. I can see many possible implementations (such as re-generating the code of polymorphic_foo, falling back to a vtable, …). Calling a polymorphic function of an exported trait would be more expensive (at a minimum, since we can't inline it, we lose the follow-up optimizations), but still possible.
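One hedged sketch of such a hybrid (everything below is hypothetical): statically known types get direct, inlinable match arms, while ids registered by libraries at load time fall back to a runtime table of function pointers, which behaves much like a vtable:

use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

type FooFn = fn(&str);

// Table populated by dynamic libraries when they are loaded.
fn registry() -> &'static Mutex<HashMap<u64, FooFn>> {
    static REGISTRY: OnceLock<Mutex<HashMap<u64, FooFn>>> = OnceLock::new();
    REGISTRY.get_or_init(|| Mutex::new(HashMap::new()))
}

const A_ID: u64 = 0;
const B_ID: u64 = 1;

fn polymorphic_foo(id: u64, msg: &str) {
    match id {
        // arms for types known at compile time: direct, inlinable calls
        A_ID => println!("calling from A: {}", msg),
        B_ID => println!("calling from B: {}", msg),
        // ids registered by dynamically loaded libraries: one table
        // lookup plus an indirect call, comparable to a vtable call
        other => {
            let f = registry()
                .lock()
                .unwrap()
                .get(&other)
                .copied()
                .expect("type id not registered by any loaded library");
            f(msg);
        }
    }
}

fn main() {
    // a loaded library would register its implementation like this:
    registry()
        .lock()
        .unwrap()
        .insert(2, |msg| println!("calling from C: {}", msg));

    polymorphic_foo(A_ID, "static call");
    polymorphic_foo(2, "dynamic call");
}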
Why I think that using an id instead of a vtable would be a good idea:
- Object safety is no longer a needed concept (as you may have seen in the example above, the function foo wasn't a method, just a trait function, and it could still be called polymorphically).
- Devirtualization is nearly free to implement.
- The code should be (very slightly) faster, since we don't need to load a vtable if the trait is not exported or if dynamic libraries aren't used.
- Most operations (like sizeof) that today can only be done after monomorphization could also be done polymorphically: sizeof would just match on the id of the object (see the sketch after this list). I haven't explored this idea enough, but I think it could remove most of the need for monomorphization.
- The size of the id could be smaller than the size of a pointer to the vtable. Because of alignment, this could end up having no impact. A naive implementation could use a u16 as the discriminant if there are fewer than 2¹⁶ types. A more complex implementation could look at the size of the biggest set of types implementing (and using) a given trait (probably Debug) and use log2(size_of_the_set) bits.
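Here is a minimal sketch of that sizeof point in today's Rust, again with a hand-written enum standing in for the generated id (the names are mine):

use std::mem::size_of;

struct A(u8);
struct B([u64; 4]);

#[derive(Clone, Copy)]
enum TraitId {
    A,
    B,
}

// An operation that today requires monomorphization, answered
// dynamically by matching on the id.
fn dynamic_size_of(id: TraitId) -> usize {
    match id {
        TraitId::A => size_of::<A>(),
        TraitId::B => size_of::<B>(),
    }
}

fn main() {
    assert_eq!(dynamic_size_of(TraitId::A), 1);
    assert_eq!(dynamic_size_of(TraitId::B), 32);
    // Two types need only one byte of discriminant, versus a full
    // machine word for a vtable pointer:
    assert_eq!(size_of::<TraitId>(), 1);
}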
Given all of those advantages, I'm surprised that I've never heard of such a technique. Is it implemented in some languages, and does it have disadvantages that I don't see?