Meta RFC - Fat Objects and Intrusive Data Structures

This post is intended to motivate the use of a kind of “fat objects” style virtual-structs/inheritance model, but with some tweaks to support a locally addressable memory layout. The goal of intrusive data structures in general is to improve performance through aggregation. It is the author’s goal to streamline the use of this style of aggregation. It turns out that the proposed Fat Objects design can achieve this if some minor constraints are met.

The goal of this is not to define a complete specification for an inheritance model, but to supply additional requirements for an inheritance model from the point of view of a systems programmer. There is an opportunity here to meet the needs of both high level API designers and embedded/realtime programmers with the same enhancement to Rust.

I have read through several of the latest RFCs on the subject, and they address logical layout, vtable, and typing concerns (as programming language design discussions should) I have not seen any discussion from this angle. Apologies if this is redundant.

First, a brief description of intrusive data structures and why they are necessary.

##1. Intrusive Data Structures

Intrusive data structures is a style of embedding the bookkeeping components of data structures, such as the prev/next ptrs in a list, or the child and parent nodes in a tree, into an object which is being stored in said data structures. e.g, an object stored in a priority queue of sorts, which uses a linked list of nodes at every rb tree node, might look like:

struct MyObject {
    biz_item1 : isize,
    biz_item2 : [u8; 5],
    list_node : LinkedListNode,
    tree_node:  RBTreeNode
}

struct LinkedListNode {
    next : *const LinkedListNode,
    prev : *const LinkedListNode
 }

struct RBTreeNode {
    children: [*const  RBTreeNode;2],
    parent: *const RBTreeNode
}

The data structures then directly reference the addresses of their respective node:

let myobj = MyObject::new(); 

let mylist = LinkedList::new(); 

mylist.append(&myobj.list_node); 

So the List has no idea that it is holding on to a MyObject, it is just holding on to a ptr to the LinkedListNode. When it is time to retrieve the object, we call something like:

let myref : &LinkedListNode = mylist.head(); 
let myobj : &MyObject = get_offset_of!(myref, MyObject, list_node); 

where get_offset_of just calculates the address of the head of the MyObject node based on the offset of the member list_node within the struct and returns a pointer.

###Why?

There is often a need to keep data structures in contiguous chunks of addressable memory. The reasons are usually two-fold:

One is that the region of memory is shared between multiple processes, so all “pointers” to objects must be accounted for since they will represent different actual offsets in the address space of each process which maps the shared memory. This is for systems such as shmem IPC, as well as schemes that manage memory on disk, such as b-tree indices. The goal, then is to references as few offsets as possible, as bookkeeping becomes arduous. This means the goal should be to create as few objects in this region as possible. (It is helpful in these cases to use an allocator which serves memory from mmap’d regions, which, itself would live in shared mem and likely be built with intrusive data structures )

The second reason for the use of intrusive data structures is performance improvements due to cache locality. When the cost of main memory access matters. You want to retrieve as much of an object at one time from memory and place it into (preferably) L1 cache. When objects are structured in this way, you can typically retrieve the entire object from memory in a single load, regardless of whether you’re fetching it from the LinkedList or from an RBTree.

2. Intrusive Data Structures and Polymorphism

If it isn’t already obvious, intrusive data structures, achieve a sort of polymorphism through the use of a statically sized struct and known offsets. In this example, get_offset_of! uses knowledge of the layout of the struct to inform object access via a “smart function” approach rather than a vtable approach.

In the above example, we hand out pointers directly to members of a struct, and we convert back using static knowledge of that struct.

This scheme could benefit from a vtable. This would allow a developer to pass &MyObject to LinkList::append or RBTree::insert while achieving the same semantics. But there is a catch.

###VTable Layout### This approach comes with one very large caveat, and that is, in order to support shared memory schemes, the vtable can only reference relative offsets, and not absolute pointers. This is because the absolute pointers will change for every process which uses the shared memory region. This also means that there cannot be a pointer to a vtable to this object (unless we can exactly dictate the location of this vtable object (e.g. relative offset, not absolute). The easiest implementation, then, is for this object to carry its own table, and not reference an external table. For this example, this should be fine, however for other intended uses, this might be difficult. I haven’t thought through all of the cases here.

Obviously having a physical layout that looks like C is preferred, but there is something about cake and eating… As long as the vtable could be declared and used as a formal member of a struct (perhaps via a FatObjectRaw transmute, so that the size and location of all items could be accounted for, that’s sufficient for me.

In summary, I guess what I’m asking for is that when we are implementing this, or some other preferred approach, that we keep these ideas in mind: physical (and possibly shared) memory layout, as well as an interface which is oriented as much towards aggregation as it is inheritance, these are certainly not mutually exclusive.

Note that this model doesn’t play well with Trait objects or dynamically sized types, because the per-object information cannot point directly back to function pointers that are the object/trait methods. It would lend itself better to providing reflection information that would allow itself to be cast back into a concrete object. (which with the local information, it could then be used as a trait object)

I wouldn't approach this in precisely the same way, today, but I wrote something several months ago that seems relevant to this discussion:

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.