Dynamic polymorphism – alternative designs discussion

This post is not directly related to Rust, but this forum is probably the best technical venue I know of for discussing the subject. Please correct me if my understanding is wrong anywhere.

As far as I know, there are two main ways to manipulate a reference to a dynamically polymorphic type: thin and fat pointers.

Thin pointers are what C++ uses. The reference is no bigger than a machine word. The first field of the pointed-to structure is itself a pointer to the vtable of the type; the vtable is shared by all instances of that type. This means that resolving a polymorphic call takes two indirections: the first to reach the pointed-to object, and the second to reach the vtable. If multiple references point to the same object, this is memory efficient, since each reference is a single machine word and there is only one pointer to the vtable (at the beginning of the layout of the shared object). In practice, some ABIs make a pointer smaller than a machine word by distinguishing regular pointers, which can only access the same segment of memory, from far pointers, which can access any memory address, but the idea is the same.

Fat pointers are what Rust uses (I’m not 100% sure about this statement; AFAIK both Rust references and raw pointers to trait objects are fat, while pointers to sized types are thin). The size of a reference is two machine words: a pointer to the object and a pointer to the vtable. As in C++, the vtable itself is shared by all instances of that type. When resolving a call to a polymorphic function, you only need one indirection, since the vtable can be accessed directly. However, if you have multiple references to the same object, memory consumption is less efficient, since every reference carries its own pointer to the vtable.
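The two-word claim is easy to check on stable Rust; `Animal` and `Cat` below are my own illustrative names:

```rust
use std::mem::size_of;

trait Animal {
    fn name(&self) -> &'static str;
}

struct Cat;
impl Animal for Cat {
    fn name(&self) -> &'static str { "cat" }
}

fn main() {
    // A reference to a sized type is one machine word...
    assert_eq!(size_of::<&Cat>(), size_of::<usize>());
    // ...while a trait-object reference also carries the vtable pointer.
    assert_eq!(size_of::<&dyn Animal>(), 2 * size_of::<usize>());

    let cat = Cat;
    let animal: &dyn Animal = &cat; // unsizing coercion builds the fat pointer
    println!("{}", animal.name()); // prints "cat"
}
```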

In both cases, a polymorphic function is called through a function pointer stored in a vtable. A vtable is nothing more than an array of function pointers. When resolving a polymorphic call, the offset of the function within the vtable is known statically, but the address of the vtable isn’t; that is why, in both cases (thin and fat pointers), the address of the vtable must be stored somewhere. The main advantage of using a vtable is that the set of polymorphic types is open: if you create a new type that implements the same interface (a Rust trait, or a base class in C++), this has no impact on the code generated for a function that manipulates a reference and makes a polymorphic call. This is especially useful with dynamic libraries: loading a library can add new types to the set of types implementing a public trait (or deriving from a base class in C++).


I recently had an idea of an alternative design, and I’d like to get some feedback. I don’t plan to use this idea in any project so far, so this discussion is purely theoretical.

Instead of a vtable unique to each type, I would use an identifier unique to each type. I think it’s easier to understand with some code:

// some boilerplate, implementing Trait for A and B, you can skip it ;)

trait Trait {
    fn foo(msg: &str);
}

struct A;
impl Trait for A {
    fn foo(msg: &str) {
        println!("calling from A: {}", msg);
    }
}
struct B;
impl Trait for B {
    fn foo(msg: &str) {
        println!("calling from B: {}", msg);
    }
}

//////////////////////////////

fn caller(my_ref: dyn Trait) {
    // this line:
    my_ref.foo("polymorphic call");
    // would be desugared into:
    polymorphic_foo(my_ref.id, "polymorphic call");
}

// this function is generated
fn polymorphic_foo(id: u64, msg: &str) {
     // exhaustive match over all the types implementing the trait Trait
     match id {
        typeid(A) => A::foo(msg),
        typeid(B) => B::foo(msg),
    }
}

The function polymorphic_foo would be generated by the compiler. If we don’t use any dynamic libraries, all the types that implement the trait Trait are known at compile time, so generating polymorphic_foo at compile time is doable. This function can be inlined, all other standard optimizations can be applied, and we get devirtualization nearly for free!
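To make this concrete, here is a hand-written stand-in for what the generated code could look like, which compiles today (all names are illustrative, and `foo` returns a `String` instead of printing so the result is easy to check):

```rust
trait Trait {
    fn foo(msg: &str) -> String;
}

struct A;
impl Trait for A {
    fn foo(msg: &str) -> String { format!("calling from A: {}", msg) }
}

struct B;
impl Trait for B {
    fn foo(msg: &str) -> String { format!("calling from B: {}", msg) }
}

// Stand-in for the compiler-generated per-type id.
#[derive(Copy, Clone)]
enum Id { A, B }

// Stand-in for the generated dispatch function: an exhaustive match over
// every implementor. When the id is known at the call site, the optimizer
// can collapse the match — devirtualization for free.
fn polymorphic_foo(id: Id, msg: &str) -> String {
    match id {
        Id::A => A::foo(msg),
        Id::B => B::foo(msg),
    }
}

fn main() {
    assert_eq!(polymorphic_foo(Id::A, "hi"), "calling from A: hi");
    assert_eq!(polymorphic_foo(Id::B, "hi"), "calling from B: hi");
}
```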

If we use dynamic libraries, the list of types that implement the trait Trait can be known at compile time only if the trait is not re-exported. If it’s not re-exported, we can be sure that it cannot be implemented by any type in any dynamic library. In that case, we can also generate this list at compile time and do exactly the same thing as explained above.

If Trait is public and we use dynamic libraries, the set of types known at compile time would change each time a library is loaded. I can see many possible implementations (re-generating the code of polymorphic_foo, falling back to a vtable, …). Calling a polymorphic function of an exported trait would be more expensive (at a minimum, since we can’t inline it, we lose the follow-up optimizations), but still possible.


Why I think that using an id instead of a vtable would be a good idea:

  • Object safety is no longer a needed concept (as you may have seen in the example above, foo wasn’t a method but just an associated function, and could still be called polymorphically).
  • Devirtualization is nearly free to implement.
  • The code should be (very slightly) faster, since we don’t need to load a vtable when the trait is not exported or dynamic libraries aren’t used.
  • Most operations that today can be done only after monomorphisation (like sizeof) could also be done polymorphically: sizeof would just match on the id of the object. I have not explored this idea enough, but I think it could remove most of the need for monomorphisation.
  • The id could be smaller than a pointer to a vtable. Because of alignment, this may end up having no impact. A naive implementation could use a u16 as the discriminant if there are fewer than 2¹⁶ types. A more complex implementation could look at the size of the biggest set of types implementing (and using) a given trait (probably Debug) and use log2(size_of_the_set) bits.

Given all of these advantages, I’m surprised that I’ve never heard of such a technique. Is it implemented in some language, or does it have disadvantages that I don’t see?


About reference/pointer sizes:


The way you use an ID and a compile time exhaustive list of all impls reminds me of the ideas about an “enum impl TRAIT” feature I’ve seen. I don’t remember the details but I would imagine it working like:

// implementing Trait for A and B

trait Trait {
    fn foo(self, msg: &str);
}

#[derive(Copy, Clone)]
struct A;
impl Trait for A {
    fn foo(self, msg: &str) {
        println!("calling from A: {}", msg);
    }
}

#[derive(Copy, Clone)]
struct B;
impl Trait for B {
    fn foo(self, msg: &str) {
        println!("calling from B: {}", msg);
    }
}

//////////////////////////////

fn caller(my_id: enum impl Trait) {
    // this line (statically) calls the trait impl below
    // (see the desugaring of `enum impl Trait`)
    my_id.foo("polymorphic call");
}

// this enum is generated
enum TraitEnum {
    TypeA(A),
    TypeB(B),
}
// desugaring: enum impl Trait == TraitEnum

// and implementing `Trait` itself:
impl Trait for TraitEnum {
    fn foo(self, msg: &str) {
        match self {
            TraitEnum::TypeA(a) => a.foo(msg),
            TraitEnum::TypeB(b) => b.foo(msg),
        }
    }
}

Where in this example, the enum impl Trait value would just be u8-sized, since all its variants contain zero-sized types. Also, A and B would coerce to enum impl Trait similar to how a &Foo coerces to &dyn SomeTrait.
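The size claim is easy to verify; writing the generated enum out by hand, the whole value is just the one-byte discriminant:

```rust
#[derive(Copy, Clone)]
struct A;

#[derive(Copy, Clone)]
struct B;

// Hand-written version of the generated `enum impl Trait` desugaring.
enum TraitEnum {
    TypeA(A),
    TypeB(B),
}

fn main() {
    // Both variants hold zero-sized types, so only the discriminant remains.
    assert_eq!(std::mem::size_of::<A>(), 0);
    assert_eq!(std::mem::size_of::<TraitEnum>(), 1);
}
```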


@phlopsi Thanks for the reference.

@steffahn I remember this discussion, but totally forgot it when writing this. Good parallel.

Linear search is unacceptable performance-wise. An obvious fix would be to turn it into a lookup table indexed by ID, but tables add an extra layer of indirection, which again is undesirable performance-wise. That, in turn, could be fixed by making the table unnecessary: store the lookup result in the ID, turning the ID into a pointer itself. And we're back to C++'s solution.
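The lookup-table shape can be sketched in safe Rust (all names illustrative); dispatch becomes a bounds check plus one load before the indirect call:

```rust
trait Trait {
    fn foo(msg: &str) -> String;
}

struct A;
impl Trait for A {
    fn foo(msg: &str) -> String { format!("A: {}", msg) }
}

struct B;
impl Trait for B {
    fn foo(msg: &str) -> String { format!("B: {}", msg) }
}

// One table per (trait, method), indexed by type id.
static FOO_TABLE: [fn(&str) -> String; 2] = [A::foo, B::foo];

fn polymorphic_foo(type_id: usize, msg: &str) -> String {
    FOO_TABLE[type_id](msg)
}

fn main() {
    assert_eq!(polymorphic_foo(0, "via table"), "A: via table");
    assert_eq!(polymorphic_foo(1, "via table"), "B: via table");
}
```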

You can have C++-like polymorphism in Rust today, with some unsafe hacks: put the vtable pointer in the allocation itself (the same way Arc stores the refcount) and implement Deref to a dyn fat pointer.
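A minimal sketch of that approach (my own illustration, not a production implementation: it leaks the allocation, and it stores a single fat-pointer reconstructor instead of Arc-style machinery or a real Deref impl):

```rust
trait Speak {
    fn speak(&self) -> String;
}

struct Dog;
impl Speak for Dog {
    fn speak(&self) -> String { "woof".to_string() }
}

// The allocation: a header that can rebuild a fat pointer, followed by
// the value itself. repr(C) pins the header at offset 0 for every T.
#[repr(C)]
struct WithVtable<T> {
    as_dyn: unsafe fn(*const ()) -> *const dyn Speak,
    value: T,
}

// The handle is a single machine word, like a C++ pointer-to-base.
struct ThinDyn {
    ptr: *const (),
}

impl ThinDyn {
    fn new<T: Speak + 'static>(value: T) -> Self {
        // Monomorphized per T: recovers the concrete type, then erases
        // it again as a fat `*const dyn Speak`.
        unsafe fn make<T: Speak + 'static>(p: *const ()) -> *const dyn Speak {
            let inner = p as *const WithVtable<T>;
            unsafe { &(*inner).value as *const T as *const dyn Speak }
        }
        let boxed = Box::new(WithVtable { as_dyn: make::<T>, value });
        // Leaked for brevity; a real version needs a Drop impl.
        ThinDyn { ptr: Box::into_raw(boxed) as *const () }
    }

    fn speak(&self) -> String {
        unsafe {
            // Read the header; its offset is the same for every T.
            let as_dyn = (*(self.ptr as *const WithVtable<()>)).as_dyn;
            (*as_dyn(self.ptr)).speak()
        }
    }
}

fn main() {
    assert_eq!(std::mem::size_of::<ThinDyn>(), std::mem::size_of::<usize>());
    let d = ThinDyn::new(Dog);
    assert_eq!(d.speak(), "woof");
}
```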


I think this is the sticking point. Not only dynamic libraries, but any form of separate compilation. So you'd have to do this in the linker somehow, which seems impractical. (This wouldn't just be LTO, but full link-time monomorphization as well.)

There are managed languages that monomorphize+JIT their generics at runtime (like C# with structs), but I think all the infrastructure needed to do that means at least a somewhat different language space than the one Rust is in.

If you're looking for a place to discuss programming languages, consider


I know of two libraries that do this with proc-macros (if I understand it correctly):

I've actually wanted sealed traits to be a language-level feature for exactly this reason. If a trait is #[sealed] (that is, no one but the declaring crate may write implementations for it), and all of its blanket impls are "sufficiently bound" by sealed traits, it is possible for the compiler to enumerate all types which implement the trait.

I feel like this should be an optimization rather than a relaxation of the object-safety rules, though. I believe there has been a separate discussion of whether or not it should be possible to form dyn Trait for all traits, and only have dyn Trait: Trait when Trait is object-safe. This "partial trait object" probably wants a separate syntax like dyn? Trait.

You can basically already do this with Sized bounds, but only for functions at the moment, and the trait author needs to opt-in.
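This is what the `where Self: Sized` opt-out looks like today (names illustrative): the bounded method is simply left out of the vtable, and the rest of the trait stays object-safe.

```rust
trait Partial {
    fn dyn_ok(&self) -> u32;

    // Opted out of the vtable: callable only where the concrete type is known.
    fn by_value(self) -> u32
    where
        Self: Sized;
}

struct X;
impl Partial for X {
    fn dyn_ok(&self) -> u32 { 1 }
    fn by_value(self) -> u32 { 2 }
}

fn main() {
    let obj: &dyn Partial = &X;
    assert_eq!(obj.dyn_ok(), 1);
    // obj.by_value() would not compile: `dyn Partial` is not Sized.
    assert_eq!(X.by_value(), 2);
}
```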

Hello

It's possible I'm not getting the gist of your idea, but from what I understand, you cannot get rid of the concept of object safety. What object safety prohibits is dynamic dispatch of a trait like this:

trait CallStuff {
    type Parameter;
    fn call_stuff(&self, p: Self::Parameter) -> Result<(), Error>;
}

The problem is, if you have an object of this trait and want to call that method (or free-standing function with the same parameter), you have no clue what parameter to pass to it, because each variant of that enum needs a different one. Obviously, in a strong type system, you want to be sure you call it with the right one at compile time and if you wanted to check at runtime, you'd have to deal with what happens if it is the wrong one.
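For completeness: if the associated type is pinned at the use site, the parameter type is known again and the trait can be used as a trait object (I've swapped `Error` for `String` here to keep the sketch self-contained):

```rust
trait CallStuff {
    type Parameter;
    fn call_stuff(&self, p: Self::Parameter) -> Result<(), String>;
}

struct Printer;
impl CallStuff for Printer {
    type Parameter = u32;
    fn call_stuff(&self, p: u32) -> Result<(), String> {
        println!("got {}", p);
        Ok(())
    }
}

fn main() {
    // Specifying `Parameter` makes the trait object type well-formed,
    // because every call site now agrees on the argument type.
    let obj: &dyn CallStuff<Parameter = u32> = &Printer;
    assert!(obj.call_stuff(42).is_ok());
}
```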

This could help solve the problems of passing Self (owned) in or out, so it could help make some more things object safe. But that might also be approached from being able to pass dynamically sized things on the stack (I'm not sure if this is actually planned or out of scope, but you could e.g. alloca the space for the return value and prepend it with the size, so the caller knows how much to pop…)

Dynamic size queries are already possible: https://doc.rust-lang.org/std/mem/fn.size_of_val.html
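For example, the size of the pointee is read out of the vtable at runtime:

```rust
use std::fmt::Debug;
use std::mem::size_of_val;

fn main() {
    let array = [1u64, 2, 3];
    let obj: &dyn Debug = &array;
    // The vtable stores size and alignment, so no monomorphization
    // of the caller is needed to answer this.
    assert_eq!(size_of_val(obj), 3 * std::mem::size_of::<u64>());
}
```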

That being said, it sounds like an interesting alternative approach. I believe the compiler may try to optimize dispatch into something like this on occasion, but having a way to control it (without manually doing the whole enum exercise) might not be a bad idea. I'm not sure if one should do it at the language level (added complexity and stuff) or if crates are „good enough“.


I think you are inventing multi-methods. I haven't heard of multi-methods with a closed-world assumption, but for an awesome open-world implementation, take a look at Julia.


I don't see how, as we would still not know the size of a dyn Trait object statically, which is needed to stack allocate it.

But overall, @scottmcm and @vorner's comments are spot on. This is essentially constructing a vtable per trait, in which each type does a lookup to find its method, instead of a vtable per type × trait. That is strictly worse than the current method: it does not allow any separate compilation at all, it adds an extra indirection when calling a virtual method, and it is not semantically different.


EDIT: I just noticed the examples talk about associated functions, instead of methods. This also does not work.

Your example is not correct. You show foo being called as a method on dyn Trait, but foo is not a method. Associated functions are not object safe because they would actually be called like this:

fn caller() {
     <dyn Trait>::foo("polymorphic call");
}

There's nowhere to get a type id, or a vtable pointer, or anything else, because no dynamic trait object is passed into this function.


As far as I can tell, this is somewhat similar to using vtables assuming that the compiler will turn the match statement into a jumptable/calltable.

With a vtable, you have code like this on x86:

call [vtable_register + TRAIT_METHOD_OFFSET_IN_VTABLE]

With ids, you have code like this on x86:

call [TRAIT_METHOD_TABLE + type_id_register * 8]

Or if you are ok with having ids that are multiples of 8:

call [TRAIT_METHOD_TABLE + type_id_register]

Either way it's a call instruction with a constant and a register.

Here are the differences:

  • With vtables, the fat pointer metadata is 32/64-bit, while with object ids, it can be 8/16-bit (unless you have more than 8K types implementing a trait, then it's 32-bit)
  • With vtables, the immediate is 8/16-bit (the method offset in the vtable), but with type ids the immediate is 32/64-bit (the global address of the call jump table). On RISC architectures (and even x86-64 since there are no 64-bit immediates, unless you force vtables in the low 4G address space), the vtable call is probably simpler.
  • Vtables are most cache-efficient if you call many virtual methods on the same object type, while type ids are most cache-efficient if you call the same methods on several object types
  • Vtables work trivially with static and dynamic linking; object ids require extra work for static linking (you need to create a section for each trait method, and make sure the linker orders them correctly), and significant extra work to support dynamic linking (you need to dynamically allocate type ids, and either need to assign a lot of virtual address space to the tables, or need a way to be able to reallocate the tables and patch the code and stack).

In either case, the compiler can devirtualize and inline.

It would be nice to have the option to compile a dyn trait with type ids, but I'm not sure how important it is.
