Another OOP-like inheritance scheme: explicit vtable bindings


#1

I’m writing this as a sort of pre-RFC, to gauge interest in a possibly novel OOP-like inheritance scheme. I know this is well-trod ground, but I think I have a different approach than most, and one that might feel more “rusty” than the others I’ve read.

Goal

The fundamental aim is to make how the OOP machinery works obvious to both readers and writers of OOP code. It is explicitly not to hide the mechanisms of dynamic dispatch. In my opinion, users of an API in a systems language like Rust will often want to know whether a method call uses static or dynamic dispatch, because the two can differ significantly in performance, and a difference that large is often design-relevant. So I want to expose some of the plumbing: maintainers should have to confront the form of dispatch as they work with an API, and readers should be able to see which form is used wherever a function is called.

Approach

  1. Provide a direct means of defining vtable structures, and of binding them to structs.
  2. Define a constructor syntax that allows some separation of concerns between allocating storage and initializing data.

Detailed Design

Define an unbound attribute that can be applied to trait definitions.

When a trait is marked as unbound, trait references will not be fat pointers, but rather thin pointers to an invasive vtable reference in the struct for which the trait is implemented.

    #[unbound]
    trait t1 {
        fn get_t1(&self) -> uint;
    }
    #[unbound]
    trait t2 {
        fn get_t2(&self);
    }
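For comparison, here is how the fat/thin distinction looks in present-day Rust, where `uint` has since become `usize`. This is only a sketch of existing behaviour, not of the proposed `#[unbound]` attribute; the names `T1`, `A`, `thin_ref_words` and `fat_ref_words` are illustrative.

```rust
use std::mem::size_of;

trait T1 {
    fn get_t1(&self) -> usize;
}

struct A {
    f: usize,
}

impl T1 for A {
    fn get_t1(&self) -> usize {
        self.f
    }
}

/// Size, in machine words, of a reference to the concrete struct (thin).
fn thin_ref_words() -> usize {
    size_of::<&A>() / size_of::<usize>()
}

/// Size, in machine words, of a trait-object reference: it carries a data
/// pointer and a vtable pointer side by side (fat).
fn fat_ref_words() -> usize {
    size_of::<&dyn T1>() / size_of::<usize>()
}
```

An unbound trait would move the vtable pointer out of the reference and into the struct, so a reference to it would be one word instead of two.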

The invasive vtable reference will consist of two words: the first word is an offset to apply to the thin pointer to obtain the self that will be used on a virtual method invocation; the second is a pointer to the virtual function table itself. (TBD: it may be that the offset does not need to live here at all, but could be calculated at the invocation site instead. Also, embedding the virtual function table directly will often perform better than chasing a pointer to find it.)
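The two-word layout can be approximated in present-day Rust with raw pointers. The sketch below is an assumption about how the mechanism might be lowered, not part of the proposal; `T1Vtable`, `T1Ref`, `call_get_t1` and `t1_ptr` are hypothetical names, and `uint` is written as `usize`.

```rust
use std::mem::offset_of;
use std::ptr::addr_of;

/// Hypothetical function table for `t1`; entries take a type-erased `self`.
struct T1Vtable {
    get_t1: unsafe fn(*const ()) -> usize,
}

/// The two-word reference embedded in the struct.
#[repr(C)]
struct T1Ref {
    /// Bytes from the start of the containing struct to this field.
    offset: usize,
    /// Pointer to the function table.
    vtable: &'static T1Vtable,
}

#[repr(C)]
struct A {
    f: usize,
    a_t1: T1Ref,
}

unsafe fn a_get_t1(this: *const ()) -> usize {
    // SAFETY: the caller guarantees `this` points at a live `A`.
    unsafe { (*(this as *const A)).f }
}

static A_T1: T1Vtable = T1Vtable { get_t1: a_get_t1 };

impl A {
    fn new(f: usize) -> A {
        A { f, a_t1: T1Ref { offset: offset_of!(A, a_t1), vtable: &A_T1 } }
    }

    /// Thin (one-word) pointer to the embedded reference. It is derived from
    /// a pointer to the whole struct so it can be walked back to `self`.
    fn t1_ptr(&self) -> *const T1Ref {
        let whole: *const A = self;
        unsafe { addr_of!((*whole).a_t1) }
    }
}

/// Virtual call through the thin pointer: subtract the stored offset to
/// recover `self`, then jump through the table.
///
/// Safety: `thin` must come from `t1_ptr` on a struct that is still alive.
unsafe fn call_get_t1(thin: *const T1Ref) -> usize {
    unsafe {
        let offset = (*thin).offset;
        let this = (thin as *const u8).sub(offset) as *const ();
        ((*thin).vtable.get_t1)(this)
    }
}
```

Calling `call_get_t1(a.t1_ptr())` on `A::new(7)` goes through the table and reads `f`. The caller only ever holds a one-word pointer; the second word of a conventional trait object lives in the struct instead.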

Structs explicitly declare the vtable reference as a member.

    struct a {
        // says this structure uses a vtable of type `t1`.
        pub a_t1: t1,
        // possible to define multiple vtables in a given structure.
        pub a_t2: t2,

        // field, for demonstrating "cheap field access"
        pub f: uint,
    }

As described above, each embedded vtable reference is “fat”: it takes two words in the containing structure. One of those words is the field’s offset from the beginning of the structure, so that a method invoked through the vtable receives the correct self pointer.
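To make the per-object cost concrete, here is a rough size accounting in present-day Rust. `VtableRef`, `T1Vtable`, `T2Vtable` and `words` are hypothetical stand-ins for whatever the compiler would generate, and `uint` is written as `usize`.

```rust
use std::mem::size_of;

// Hypothetical function tables for `t1` and `t2`.
struct T1Vtable {
    get_t1: fn(&A) -> usize,
}
struct T2Vtable {
    get_t2: fn(&A),
}

/// Two-word embedded reference: an offset plus a table pointer.
struct VtableRef<V: 'static> {
    offset: usize,
    vtable: &'static V,
}

struct A {
    a_t1: VtableRef<T1Vtable>,
    a_t2: VtableRef<T2Vtable>,
    f: usize,
}

/// Size of `T` in machine words.
fn words<T>() -> usize {
    size_of::<T>() / size_of::<usize>()
}
```

Under these assumptions `words::<A>()` is 5: two words for each of the two vtable references, plus one word for `f`. Dropping the offset word, as the TBD above suggests, would bring that down to 3.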

Define an instance of the unbound vtable.

For this, I’ve modified the impl statement so that it becomes an expression, where the expression returns a vtable suitable for binding to a defined field in a structure.

    // defines a vtable named `a_t1` for trait `t1`, targeting the `a_t1` field of struct `a`.
    const a_t1 = impl t1 for a.a_t1 {
        pub fn get_t1(&self) -> uint {
            self.f
        }
    }
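The closest present-day equivalent of an `impl` expression that yields a vtable is probably a `static` struct of function pointers. The sketch below assumes that lowering; `T1Vtable`, `A_T1` and `a_get_t1` are hypothetical names, and the offset word is left out to keep the example in safe code.

```rust
/// Hypothetical function table for `t1`, specialised to struct `A`.
struct T1Vtable {
    get_t1: fn(&A) -> usize,
}

struct A {
    /// Embedded table pointer (offset word omitted in this sketch).
    a_t1: &'static T1Vtable,
    f: usize,
}

/// Body of `get_t1` from the `impl` expression.
fn a_get_t1(this: &A) -> usize {
    this.f
}

/// The vtable as an ordinary named value, like `const a_t1 = impl ...`.
static A_T1: T1Vtable = T1Vtable { get_t1: a_get_t1 };

impl A {
    /// Dynamic dispatch, spelled out: load the table, call through it.
    fn get_t1(&self) -> usize {
        (self.a_t1.get_t1)(self)
    }
}
```

Because the table is a plain value, it can be named, stored in a field, or replaced later, which is what the binding and overriding steps rely on.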

Bind the vtable to the struct.

The code here relies on a constructor syntax that I haven’t called out separately, since I don’t yet have a clear idea of how it should work. Consider the syntax a sort of let rec against the structure’s fields.

    impl a {
        fn constructor(f: uint) -> self {
            // `a_t2` is assumed to be defined the same way as `a_t1` above.
            self { a_t1.bind(self), a_t2.bind(self), f: f }
        }
    }
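A minimal sketch of the binding step in present-day Rust, assuming `bind` simply stores a pointer to the table in the field. `Bound`, `bind`, `A_T2` and the return type of `get_t2` are assumptions made for illustration; the proposal's `bind(self)` would also record the field offset, which is omitted here.

```rust
// Hypothetical function tables for `t1` and `t2`.
struct T1Vtable {
    get_t1: fn(&A) -> usize,
}
struct T2Vtable {
    // Assumed to return a value so the call is observable.
    get_t2: fn(&A) -> usize,
}

/// A table reference after it has been bound into a struct field.
struct Bound<V: 'static> {
    vtable: &'static V,
}

/// Stand-in for the proposal's `bind`: wrap the table for storage in a field.
fn bind<V: 'static>(vtable: &'static V) -> Bound<V> {
    Bound { vtable }
}

struct A {
    a_t1: Bound<T1Vtable>,
    a_t2: Bound<T2Vtable>,
    f: usize,
}

fn a_get_t1(this: &A) -> usize {
    this.f
}
fn a_get_t2(this: &A) -> usize {
    this.f * 2
}

static A_T1: T1Vtable = T1Vtable { get_t1: a_get_t1 };
static A_T2: T2Vtable = T2Vtable { get_t2: a_get_t2 };

impl A {
    /// Every table is bound explicitly, once per constructed object.
    fn constructor(f: usize) -> A {
        A { a_t1: bind(&A_T1), a_t2: bind(&A_T2), f }
    }

    fn get_t1(&self) -> usize {
        (self.a_t1.vtable.get_t1)(self)
    }
    fn get_t2(&self) -> usize {
        (self.a_t2.vtable.get_t2)(self)
    }
}
```

With these definitions, `A::constructor(21)` yields an object whose `get_t1()` returns 21 and whose `get_t2()` returns 42, each call going through its own table.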

Allow the vtable to be overridden in sub-classes.

    struct b {
        pub base: a;
    }

    impl b {
        fn constructor(f: uint) -> self {
            self {
                base(f),
                base.a_t1.rebind(b_t1)
            }
        }
    }

    const b_t1 = impl t1 for b.base.a_t1 {
        pub fn get_t1(&self) -> uint {
            b_t1.prev(self).get_t1() * 3
        }
    }
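The override can be sketched in present-day Rust by giving each table a link to the one it replaces. This is an assumed lowering of `rebind` and `prev`, not something the proposal specifies; `T1Vtable`, `A_T1`, `B_T1` and `describe` are hypothetical names. Because `b` adds no fields of its own, the table functions take `&A` directly and the offset word is omitted.

```rust
struct T1Vtable {
    /// The table this one overrides, if any (the proposal's `prev`).
    prev: Option<&'static T1Vtable>,
    get_t1: fn(&A) -> usize,
}

struct A {
    a_t1: &'static T1Vtable,
    f: usize,
}

struct B {
    base: A,
}

fn a_get_t1(this: &A) -> usize {
    this.f
}

/// The override: call the previous implementation, then triple the result.
fn b_get_t1(base: &A) -> usize {
    let prev = B_T1.prev.expect("b_t1 overrides a_t1");
    (prev.get_t1)(base) * 3
}

static A_T1: T1Vtable = T1Vtable { prev: None, get_t1: a_get_t1 };
static B_T1: T1Vtable = T1Vtable { prev: Some(&A_T1), get_t1: b_get_t1 };

impl A {
    fn constructor(f: usize) -> A {
        A { a_t1: &A_T1, f }
    }

    /// Dispatches through whichever table is currently bound.
    fn get_t1(&self) -> usize {
        (self.a_t1.get_t1)(self)
    }
}

impl B {
    fn constructor(f: usize) -> B {
        let mut base = A::constructor(f);
        // Equivalent of `base.a_t1.rebind(b_t1)`.
        base.a_t1 = &B_T1;
        B { base }
    }
}

/// Code that only knows about `A` still reaches the override.
fn describe(a: &A) -> usize {
    a.get_t1()
}
```

Here `describe(&A::constructor(5))` returns 5, while `describe(&B::constructor(5).base)` returns 15, since the rebound table routes the call to `b_get_t1`.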

Notes

I’ve avoided the typical OO pattern of making fields defined in parent classes look as though they’re immediately accessible in child classes. If the b_t1 implementation of get_t1 shown above were to access the f field in the base object, it would have to do so as self.base.f, as opposed to the more traditional self.f. This is deliberate: part of my goal is to keep the code as traceable as it was before this facility was added, though I acknowledge that it increases the verbosity.

I’m not very comfortable with how I’ve got vtables defined and manipulated… It feels as though I shouldn’t need an explicit bind or rebind operation to be invoked on each object construction. On the other hand, the explicit operation makes the code, again, more traceable, and the explicitness also enables multiple vtables per object (which I can imagine being useful, e.g. for interface segregation).

I don’t believe this will address all the requirements so far described (though I do think it presents a framework in which those requirements can be addressed). As I said, right now I’m interested in gauging receptiveness to the basic idea, and I wanted to put it out while this part of the language design is still being considered. (I’m coming late to the community, and it feels like I’ve nearly missed the boat on contributing to this important discussion, so I’m rushing the presentation a little bit.) If there is interest, I can try to flesh this out further, or work with someone to develop this idea more fully…

Thanks in advance for any comments.


Meta RFC - Fat Objects and Intrusive Data Structures
#2

+1 to the overall approach no matter what the finalized design would be. Rust excels at providing modern zero-cost abstractions to systems and embedded programmers everywhere. The other side of this coin should be that, when an abstraction is not zero-cost, it should be explicit and pluggable.