For quite some time I've been working on a proof-of-concept of an abstraction for memory management for Vec
and other collections to allow SoA and inline memory for fun on my free time. After many iterations I settled on this one. I think it may be useful as it is a different implementation of the ones I've seen.
Background
When I was helping with soa_derive I noticed that we were duplicating the entire Vec
functionality for each struct which it should be possible to avoid. We were also managing multiple len
, which shouldn't be needed because it should be the same for each array. I also think that it would be useful to be able to have a Vec
on the stack.
Mental model and basic trait
After trying different solutions I realized that the main concern is where the values will reside once saved. Which begs the question: could we do an abstraction for it? This is similar to the effords in another post titled "Is custom allocators the right abstraction?" which I didn't know existed until recently (I started before it was posted). I settled with a different abstraction which I called Buffer
(ideas for better names are welcomed). You can find it in the proof-of-concept repo.
A buffer is where the data will be saved. Its positions are numbered sequentially (from 0 to capacity - 1
). This means that it excludes collections where the underlying memory is held by pointers (like a linked list). The collection is the one keeping track which positions are full since it depends on the collection behaviour. For isntance Vec
only needs a len
, but a VecDeque
also needs a head
. The buffer fulfills the RawVec
's job, making it not necessary.
The basic trait has the following methods:
capacity
: how many values can the buffer hold.take
: extracts a value from the selected position (emptying it).put
: saves the value in the selected position (filling it).try_grow
andtry_shrink
: tries to change the capacity.
Other methods –like getting a reference– are in other traits, but it is possible that a buffer does not implement them or not returning the same types (eg. &T
isn't correct when using a SoA). Another trait is ContiguousMemoryBuffer
, it tags buffers where the underlying data is in a single well-aligned contiguous memory slice.
This interface turned out to be really flexible and allows for a lot of different implementations: inline, on the heap, specifically for ZSTs, using an allocator, etc. More exotic buffers like a buffer on the GPU or on a file should be possible but haven't tried to implement them yet.
Composability
It's possible to make a buffer that depends on another one. This makes it follow the composite pattern. Using this fact I made the optimizations and other behaviour changes in different composite buffers, which allows the user to build a stack of desired behaviours. Zero-sized type optimization, small vector optimization, how it grows (eg. exponentially), and structure of arrays are examples of such buffers. Here is an example on what it looks like in my PoC (I called Vec
Vector
to avoid collisions with std
):
type ExampleBuffer<T> = ZstoBuffer< // Optimization for types where T is a Zero-Sized Type
ExponentialGrowthBuffer< // Make the buffer to grow exponentially
SvoBuffer< // Add Small Vector Optimization
128,
HeapBuffer<T>, // Save the values on the heap (base buffer)
>
>
>;
let example_vector: Vector<u32, ExampleBuffer<_>> = Vector::new();
Any DerefMut
also is a buffer that forwards everything to the derefed one. So Box<InlineBuffer<T, 100>>
would be an fixed-sized buffer on the heap. It also enables using mutable references to buffers to be used as temporary buffers and even to reuse them.
Resizing
The collection may use try_grow
and try_shrink
to try to resize the buffer. They return a result type with an enum –ResizeError
– for the error type. The possible errors are:
UnsupportedOperation
: for example trying to resizeInlineBuffer
orZstBuffer
TheoreticalLimitSurpassed
: for example trying to grow more thanisize::MAX
on the heap.OutOfMemory
UndistinguishableError
: buffer has no information on why it failed (eg. allocators have no information on why they failed).
Performance
The PoC is unnoptimized. I tried but I was struggling to properly measure performance. I also saw some layout shifts seemingly overpowering any changes I made. I expect a lot of inlining should be added (since there are a lot of indirections) and maybe even make a composite that its only job is to make the resizing functions cold. But for this I need reliable measurements. Layout randomization seems a possible path, but that means making a new tool and that's a lot of work.
That being said, it allows the final user to tune colelctions exactly as needed and enables non-contiguous memory layouts (like SoA), so my guess is that it should be able to enable performance gains.
Due to the ammount of types the PoC has, I would expect the compilation times to be larger, but I don't even know how to benchmark it.