For quite some time I've been working, in my free time and for fun, on a proof-of-concept of a memory-management abstraction for `Vec` and other collections that allows SoA and inline storage. After many iterations I settled on this design. I think it may be useful, as it takes a different approach from the implementations I've seen.
When I was helping with soa_derive I noticed that we were duplicating the entire `Vec` functionality for each struct, which should be avoidable. We were also managing multiple `len` fields, which shouldn't be needed because the length is the same for every array. I also think it would be useful to be able to have a `Vec` on the stack.
After trying different solutions I realized that the main concern is where the values will reside once stored, which raises the question: could we build an abstraction for that? This is similar to the effort in another post titled "Is custom allocators the right abstraction?", which I didn't know existed until recently (I started before it was posted). I settled on a different abstraction, which I called `Buffer` (ideas for better names are welcome). You can find it in the proof-of-concept repo.
A buffer is where the data is saved. Its positions are numbered sequentially (from 0 to `capacity - 1`). This excludes collections where the underlying memory is held together by pointers (like a linked list). The collection is the one keeping track of which positions are full, since that depends on the collection's behaviour: for instance, `Vec` only needs a `len`, but `VecDeque` also needs a `head`. The buffer fulfills `RawVec`'s job, making `RawVec` unnecessary.
The basic trait has the following methods:
- `capacity`: how many values the buffer can hold.
- `take`: extracts the value from the selected position (emptying it).
- `put`: saves a value in the selected position (filling it).
- `try_grow`/`try_shrink`: try to change the capacity.
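To make this concrete, here is a minimal sketch of what such a trait could look like, together with a toy inline implementation. The method names follow the post; the exact signatures, the `unsafe` contract, and this `InlineBuffer` are my assumptions, and the resizing methods are omitted:

```rust
use std::mem::MaybeUninit;

/// A buffer: sequentially numbered slots from 0 to capacity - 1.
/// The collection built on top is responsible for tracking which slots are full.
trait Buffer<T> {
    /// How many values the buffer can hold.
    fn capacity(&self) -> usize;

    /// Extracts the value at `index`, leaving the slot empty.
    /// Contract (upheld by the collection): the slot must be full.
    unsafe fn take(&mut self, index: usize) -> T;

    /// Saves `value` at `index`, filling the slot.
    /// Contract: the slot must be empty.
    unsafe fn put(&mut self, index: usize, value: T);
}

/// A minimal fixed-capacity inline buffer to show the contract.
struct InlineBuffer<T, const N: usize> {
    slots: [MaybeUninit<T>; N],
}

impl<T, const N: usize> InlineBuffer<T, N> {
    fn new() -> Self {
        Self { slots: std::array::from_fn(|_| MaybeUninit::uninit()) }
    }
}

impl<T, const N: usize> Buffer<T> for InlineBuffer<T, N> {
    fn capacity(&self) -> usize {
        N
    }

    unsafe fn take(&mut self, index: usize) -> T {
        // Move the value out; the slot is conceptually empty afterwards.
        self.slots[index].assume_init_read()
    }

    unsafe fn put(&mut self, index: usize, value: T) {
        self.slots[index].write(value);
    }
}
```

Note how the buffer itself stores no `len`: deciding which slots are initialized is entirely the collection's job, which is what lets one buffer back `Vec`, `VecDeque`, and others.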
Other methods, like getting a reference, live in other traits, because a buffer may not implement them, or may not return the same types (e.g. `&T` isn't correct when using SoA). Another trait is `ContiguousMemoryBuffer`, which tags buffers whose underlying data is a single well-aligned contiguous memory slice.
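As an illustration of the SoA point (the types and names here are mine, not the PoC's): once a struct is stored as one array per field, no complete value of the struct exists anywhere in memory, so a getter cannot hand out `&T` and must return something else, such as a tuple of per-field references:

```rust
#[allow(dead_code)]
struct Point {
    x: f32,
    y: f32,
}

// Struct-of-arrays storage: one array per field instead of an array of Points.
struct SoaPointBuffer {
    xs: Vec<f32>,
    ys: Vec<f32>,
}

impl SoaPointBuffer {
    // There is no `Point` in memory to borrow, so `get` cannot return
    // `&Point`; it returns a reference per field instead.
    fn get(&self, index: usize) -> (&f32, &f32) {
        (&self.xs[index], &self.ys[index])
    }
}
```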
This interface turned out to be really flexible and allows many different implementations: inline, on the heap, specialized for ZSTs, using an allocator, etc. More exotic buffers, like a buffer on the GPU or in a file, should be possible, but I haven't tried to implement them yet.
It's possible to make a buffer that depends on another one, following the composite pattern. Using this fact, I implemented the optimizations and other behaviour changes as different composite buffers, which lets the user build a stack of the desired behaviours. Zero-sized-type optimization, small vector optimization, growth policy (e.g. exponential), and structure of arrays are examples of such buffers. Here is what it looks like in my PoC (I called the collection `Vector` to avoid collisions with `Vec`):

```rust
type ExampleBuffer<T> = ZstoBuffer<  // Optimization for when T is a Zero-Sized Type
    ExponentialGrowthBuffer<         // Make the buffer grow exponentially
        SvoBuffer<                   // Add Small Vector Optimization
            128,
            HeapBuffer<T>,           // Save the values on the heap (base buffer)
        >,
    >,
>;

let example_vector: Vector<u32, ExampleBuffer<_>> = Vector::new();
```
Anything that implements `DerefMut` to a buffer is also a buffer, forwarding everything to the dereferenced one. So `Box<InlineBuffer<T, 100>>` would be a fixed-size buffer on the heap. This also lets mutable references to buffers be used as temporary buffers, and even be reused.
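A compilable sketch of the forwarding idea, assuming a trimmed-down `Buffer` trait with only `capacity`. The PoC apparently does this generically through `DerefMut`; to keep this snippet self-contained and free of impl-overlap issues, the forwarding impls here are written concretely for `Box` and `&mut`:

```rust
trait Buffer {
    fn capacity(&self) -> usize;
}

// A fixed-capacity buffer living inline (no heap allocation of its own).
struct InlineBuffer<const N: usize>;

impl<const N: usize> Buffer for InlineBuffer<N> {
    fn capacity(&self) -> usize {
        N
    }
}

// A boxed buffer is itself a buffer: `Box<InlineBuffer<100>>` becomes a
// fixed-size buffer stored on the heap.
impl<B: Buffer + ?Sized> Buffer for Box<B> {
    fn capacity(&self) -> usize {
        (**self).capacity()
    }
}

// A mutable reference to a buffer is also a buffer, which allows borrowing
// one as temporary storage and reusing it afterwards.
impl<B: Buffer + ?Sized> Buffer for &mut B {
    fn capacity(&self) -> usize {
        (**self).capacity()
    }
}
```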
The collection may use `try_grow`/`try_shrink` to resize the buffer. They return a `Result` whose error type is an enum, `ResizeError`. The possible errors are:

- `UnsupportedOperation`: for example, trying to resize an inline buffer.
- `TheoreticalLimitSurpassed`: for example, trying to grow past `isize::MAX` on the heap.
- `UndistinguishableError`: the buffer has no information about why it failed (e.g. allocators don't report why they failed).
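A sketch of the enum as described (the variant names come from the post; the exact shape and the helper function are my guesses):

```rust
#[derive(Debug, PartialEq)]
enum ResizeError {
    /// The buffer cannot change its capacity at all (e.g. an inline buffer).
    UnsupportedOperation,
    /// The request exceeds a hard limit (e.g. growing past `isize::MAX` bytes
    /// on the heap).
    TheoreticalLimitSurpassed,
    /// The buffer has no information about why it failed (e.g. an allocator
    /// error carries no cause).
    UndistinguishableError,
}

// Hypothetical helper showing the kind of theoretical-limit check a heap
// buffer might perform before allocating.
fn check_heap_limit<T>(target_capacity: usize) -> Result<(), ResizeError> {
    match target_capacity.checked_mul(std::mem::size_of::<T>()) {
        Some(bytes) if bytes <= isize::MAX as usize => Ok(()),
        _ => Err(ResizeError::TheoreticalLimitSurpassed),
    }
}
```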
The PoC is unoptimized. I tried, but I was struggling to measure performance properly; I also saw layout shifts seemingly overpowering any changes I made. I expect a lot of inlining will need to be added (since there are a lot of indirections), and maybe even a composite whose only job is to mark the resizing functions cold. But for this I need reliable measurements. Layout randomization seems like a possible path, but that means building a new tool, and that's a lot of work.
That being said, it allows the final user to tune collections exactly as needed and enables non-contiguous memory layouts (like SoA), so my guess is that it should enable performance gains.
Due to the number of types the PoC has, I would expect compile times to be longer, but I don't even know how to benchmark that.