Are custom allocators the right abstraction?

As someone working on high-performance systems, I am all too keen on the ability to avoid heap allocations wherever possible, whether for cache locality or lower overhead.

The news that custom allocators were being rolled out in nightly was therefore of great interest. See Box and Vec.

Unfortunately, the design of custom allocators prevents inline storage -- that is, a Vec<T, ???> cannot be self-contained (pointing only within itself), because if you move it, its internal pointer still points to the previous location.

This is quite a stark limitation.


As mentioned, I work on high-performance systems, in C++. The custom allocators in C++ suffer from the same fundamental issue, and therefore over the last few years I have implemented a variety of Vec-like structures:

  • InlineVec<T, N>: a Vec-like structure with inline storage and a constant capacity of N.
  • SmallVec<T, N>: a Vec-like structure with either inline storage (up to N elements) or heap-allocated storage.
  • ScratchVec<T>: a Vec-like structure receiving a slice of uninitialized memory as storage.

And of course I also use std::vector<T> as provided by the standard library.

Up until recently, this was seen as an unfortunate, though inevitable, situation. After all, custom allocators do not allow inline storage.

Recently, however, a colleague complained that one of our Vec-like structures did not support the full breadth of functionality that ScratchVec did. Rather than duplicate the functionality, we decided to take a step back.

Out of our brainstorming, RawVec<T, Storage> was born, where Storage defines the raw memory storage backing RawVec, and our three previous implementations were entirely redefined in terms of RawVec:

  • InlineVec<T, N> is essentially RawVec<T, [MaybeUninit<T>; N]>.
  • SmallVec<T, N> is essentially RawVec<T, union { [MaybeUninit<T>; N], Box<[MaybeUninit<T>]> }>.
  • ScratchVec<T> is essentially RawVec<T, &mut [MaybeUninit<T>]>.
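A minimal sketch of the idea in Rust, assuming a hypothetical Storage trait -- the names Storage and Inline, and the method set, are made up for illustration; the real implementation is considerably more involved (growth, drop handling, etc.):

```rust
use core::marker::PhantomData;
use core::mem::MaybeUninit;

// Hypothetical Storage trait: all RawVec asks of its backing memory is a
// buffer of possibly-uninitialized slots and a capacity.
trait Storage<T> {
    fn buffer(&mut self) -> &mut [MaybeUninit<T>];
    fn capacity(&self) -> usize;
}

// Inline storage: a fixed-size array living inside the container itself.
struct Inline<T, const N: usize>([MaybeUninit<T>; N]);

impl<T, const N: usize> Inline<T, N> {
    fn new() -> Self {
        // An array of MaybeUninit may validly be left uninitialized.
        Self(unsafe { MaybeUninit::uninit().assume_init() })
    }
}

impl<T, const N: usize> Storage<T> for Inline<T, N> {
    fn buffer(&mut self) -> &mut [MaybeUninit<T>] { &mut self.0 }
    fn capacity(&self) -> usize { N }
}

// RawVec is written once, against Storage, and reused for every backing
// kind. (Dropping the initialized elements is elided to keep this short.)
struct RawVec<T, S: Storage<T>> {
    len: usize,
    storage: S,
    _marker: PhantomData<T>,
}

impl<T, S: Storage<T>> RawVec<T, S> {
    fn new(storage: S) -> Self {
        RawVec { len: 0, storage, _marker: PhantomData }
    }
    // On a full storage, hand the element back instead of reallocating.
    fn push(&mut self, value: T) -> Result<(), T> {
        if self.len == self.storage.capacity() {
            return Err(value);
        }
        self.storage.buffer()[self.len].write(value);
        self.len += 1;
        Ok(())
    }
    fn len(&self) -> usize { self.len }
    fn get(&mut self, index: usize) -> Option<&T> {
        if index >= self.len { return None; }
        // Safety: slots 0..len were initialized by push.
        Some(unsafe { self.storage.buffer()[index].assume_init_ref() })
    }
}

// The inline flavor is then just a type alias.
type InlineVec<T, const N: usize> = RawVec<T, Inline<T, N>>;
```

SmallVec and ScratchVec then differ only in the Storage implementation they plug in; the RawVec logic itself is untouched.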

And there was much rejoicing.

This massively cut down on duplication, and the code for the Storage parameter is relatively simple -- though there's still trimming to do. In fact, it's so good that as we speak I am in the process of applying the same pattern to a RawVecDeque implementation.


And so, as I looked at the roll-out of custom allocators in Box and Vec and realized that they would suffer from the same hamstringing limitation as in C++, it struck me that maybe Rust could avoid making the same mistake as C++ and go for a more flexible abstraction instead.

You're welcome to have a look at the PoC, at @RustyYato's similar crate generic-vec, or at the latest release of coca (0.2), which directly uses a ContiguousStorage trait for its various containers, and at the explorations of its author.

I do mean potentially more flexible. I believe it would work well for any collection with contiguous and homogeneous storage (Box, Vec, VecDeque), and probably for any collection with contiguous but heterogeneous storage (HashMap?).

It may possibly be adapted for BTreeMap (non-contiguous) by handing out custom handles that can be converted to pointers (Storage::ref(&self, handle) -> &T?); this is necessary for the nodes to be able to store handles without them being invalidated on moves. The Global allocator -- or any heap allocator -- could simply use *mut T as its handle and suffer no conversion overhead.
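A rough sketch of what such handle-based storage could look like -- NodeStorage, create, and resolve are hypothetical names invented here, not part of any proposed API:

```rust
// Hypothetical node storage for non-contiguous collections: allocation
// hands out an opaque Handle, and `resolve` turns a handle into a
// reference on demand, so nodes can store handles that stay valid even
// when the storage itself is moved.
trait NodeStorage<T> {
    type Handle: Copy;
    fn create(&mut self, value: T) -> Self::Handle;
    fn resolve(&self, handle: Self::Handle) -> &T;
}

// A heap-backed storage can use the pointer itself as the handle, so
// resolution is free -- matching the Global-allocator case above.
struct HeapNodes;

impl<T> NodeStorage<T> for HeapNodes {
    type Handle = *mut T;
    fn create(&mut self, value: T) -> *mut T {
        // A real API would also need a `destroy`; elided here, so this leaks.
        Box::into_raw(Box::new(value))
    }
    fn resolve(&self, handle: *mut T) -> &T {
        // Safety: the handle came from `create` and was never destroyed.
        unsafe { &*handle }
    }
}

// An inline/arena storage can instead use indices into its buffer as
// handles; indices survive moves of the whole storage.
struct SlabNodes<T> {
    nodes: Vec<T>,
}

impl<T> NodeStorage<T> for SlabNodes<T> {
    type Handle = usize;
    fn create(&mut self, value: T) -> usize {
        self.nodes.push(value);
        self.nodes.len() - 1
    }
    fn resolve(&self, handle: usize) -> &T {
        &self.nodes[handle]
    }
}
```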


@TimDiekmann: Triggered by your comment on Reddit about the impossibility of inlining the allocator within Vec due to pointer invalidation on moves.

35 Likes

Wow, I was just working on a similar abstraction. One limitation I found: it's difficult to encode SmallVec<T, N> using this method because smallvec needs access to the length, but otherwise it seems like a nice abstraction. It should be possible to retrofit this onto std.

edit: this doesn't seem as useful to Box because if you are using Box, you want to be able to move the Box without moving the contents, but for other collections like Vec, VecDeque, HashMap, and HashSet this could be a viable way to generically swap out the buffer. This would also allow you to use all of these collections on no_std, by using either [MaybeUninit<T>; N] or &mut [MaybeUninit<T>] as the raw storage, which would be amazing.

I think this doesn't add much over an allocator parameter and dedicated structs for the small/inline/etc. use cases.

The main advantage of an allocator parameter is that it can be changed by the caller for a whole tree of data structures, since its choice doesn't necessarily depend on the particular use of it (e.g. you may want to allocate in shared memory, or on a specific NUMA node, etc.).

On the other hand whether to use Vec, SmallVec or ArrayVec very much depends on the expected size distribution of that specific vec and so it's not really useful to decide that outside the data structure.

There could be some use in being generic over those classes, but this could also be accomplished by a trait encapsulating Vec's interface.

So while this approach seems good for code reuse in a crate implementing those data structures, it doesn't seem clear that it should be added to std and used in std's Vec implementation.

4 Likes

It may possibly be adapted for BTreeMap (non-contiguous) by handing out custom handles that could be converted to pointers ( Storage::ref(&self, handle) -> &T ?)

I have been thinking about something similar, though my use case was slightly different. Right now, allocators deal with [u8]; however, let's say I want to write a GPU vec allocator. The main problem with the current allocator API is that GPU allocator APIs don't return a pointer, but rather some object that can be converted to a pointer.

Ideally, the allocator trait would have an associated type that can be converted to a [u8] rather than the [u8] itself.
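To make that concrete, here is one possible shape for such a trait -- HandleAlloc, Allocation, and as_ptr are invented names for illustration, not the actual AllocRef API:

```rust
use core::alloc::Layout;
use core::ptr::NonNull;

// Hypothetical variant of an allocator trait where `allocate` returns an
// opaque Allocation object, and a separate call converts it to a usable
// pointer. For a GPU allocator, the Allocation could be a device memory
// object and the conversion a mapping operation.
trait HandleAlloc {
    type Allocation;
    fn allocate(&self, layout: Layout) -> Result<Self::Allocation, ()>;
    fn as_ptr(&self, allocation: &Self::Allocation) -> NonNull<u8>;
}

// The CPU case degenerates gracefully: the allocation *is* the pointer,
// so the conversion is free.
struct SystemAlloc;

impl HandleAlloc for SystemAlloc {
    type Allocation = NonNull<u8>;
    fn allocate(&self, layout: Layout) -> Result<NonNull<u8>, ()> {
        // Safety: callers must pass a non-zero-sized layout.
        NonNull::new(unsafe { std::alloc::alloc(layout) }).ok_or(())
    }
    fn as_ptr(&self, allocation: &NonNull<u8>) -> NonNull<u8> {
        *allocation
    }
}
```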

3 Likes

I disagree that not moving the content is the primary motivation; I mainly see Box as going from unsized (trait / slice) to sized.

With that definition, an InlineBox<Trait, 64> is a Box that holds up to 64 bytes of data (on top of the trait pointer), and a SmallBox<Trait, 64> holds up to 64 bytes of data without a memory allocation, or otherwise allocates.

I see both as useful as their InlineVec and SmallVec counterparts.

I think this doesn't add much over an allocator parameter and dedicated structs for the small/inline/etc. use cases.

The problem of dedicated structs is that you duplicate the code, again and again.

Vec is admittedly simple, and yet given all the functions it's already pretty massive. VecDeque is already slightly more complicated, HashMap is a beast.

The main advantage of an allocator parameter is that it can be changed by the caller for a whole tree of data structures, since its choice doesn't necessarily depend on the particular use of it (e.g. you may want to allocate in shared memory, or on a specific NUMA node, etc.).

This is not an either / or. There's no reason not to have a generic AllocatorStorage<A> parameterized by an Allocator.

Storages are a higher level of abstraction than Allocators, and should be built on top of Allocators.

On the other hand whether to use Vec, SmallVec or ArrayVec very much depends on the expected size distribution of that specific vec and so it's not really useful to decide that outside the data structure.

I disagree.

I regularly deal with network protocols that have tight specifications on the maximum length of their fields -- for example, InlineString<24> (i.e., at most 24 bytes).

It's very useful to be able to create a HashMap<InlineString<24>, MyData> without extraneous heap allocations for each and every key.
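For illustration, a minimal InlineString<N> could look like this -- a simplified sketch, not the production implementation; it assumes N <= 255:

```rust
use std::collections::HashMap;

// An owned string with inline storage and a fixed capacity N, so using it
// as a HashMap key incurs no per-key heap allocation.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct InlineString<const N: usize> {
    len: u8, // assumes N <= 255
    bytes: [u8; N],
}

impl<const N: usize> InlineString<N> {
    fn new(s: &str) -> Option<Self> {
        if s.len() > N {
            return None; // exceeds the protocol's maximum field length
        }
        // Zeroing the tail keeps the derived Eq/Hash consistent:
        // equal strings always produce identical byte arrays.
        let mut bytes = [0u8; N];
        bytes[..s.len()].copy_from_slice(s.as_bytes());
        Some(InlineString { len: s.len() as u8, bytes })
    }

    fn as_str(&self) -> &str {
        // The stored bytes are a copy of a valid &str, hence valid UTF-8.
        core::str::from_utf8(&self.bytes[..self.len as usize]).unwrap()
    }
}
```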

There could be some use in being generic over those classes, but this could also be accomplished by a trait encapsulating Vec's interface.

I see genericity as a completely different requirement, and indeed one for which traits are a better approach.

Here, I am for code reuse. The code in Vec is for the most part completely agnostic to the storage of the elements -- it needs some storage, and that's it.

Hence the idea of abstracting the Storage, not the collection, so that the massive amount of code in Vec can be reused "instantly" in InlineVec, SmallVec, ScratchVec, ... and whatever other storage you have need of.

So while this approach seems good for code reuse in a crate implementing those data structures, it doesn't seem clear that it should be added to std and used in std's Vec implementation.

I think it's good for code reuse even outside the crate.

The question of whether to use in std or not, I'll leave to others to decide.

I first want to raise awareness that the current path taken by std (parameterizing with an Allocator) does not lead to as much flexibility as it could. Maybe, before committing ourselves to it further, we should examine whether, as long as we're parameterizing all collections, we might as well extract as much flexibility as we can.

In my experience, working in high-performance systems, and I suspect in the experience of embedded engineers, the extra flexibility of Storage is definitely worth it.

13 Likes

(drive-by comment by someone who is touching allocators from a distance)

I think the most powerful API for allocators would be to pass them in at the call site, similar to how OnceCell is more powerful than Lazy, and how hashbrown's raw_entry API is more powerful than the K: Hash + Eq API (we really need a name for this pattern/idea... provisional name: Call-Site Dependency Injection).

That is, the API like

impl<T, A: Allocator> Vec<T, A> {
    fn alloc_push(&mut self, alloc: &mut A, elem: T) -> Result<T, A::Err>;
}

This will allow, for example, using the same non-ZST allocator for a bunch of different Vecs without storing a redundant pointer in every Vec.

An interesting case study here would be Zig. Its standard library provides, e.g., Vec in two flavors: one stores the allocator as a field, the other takes an allocator argument on each method call.

7 Likes

I don't see how that would work. What if the next push uses a different allocator instance? Do you then store an allocator instance pointer per element? Do you only use the new allocator if a capacity increase is necessary? If so, you still need the old allocator instance to free the previously allocated elements.

3 Likes

That would be a programmer's error (similarly to how supplying the wrong equals/hashcode in hashbrown's raw API would be a programmer's error). You'll need to store an allocator somewhere, but you can reuse the same allocator instance with multiple containers:

struct Stuff<'a, A: Allocator> {
  alloc: &'a mut A,
  vec: Vec<i32>,
  map: HashMap<i32, f32>,
}

as opposed to

struct Stuff<A: Allocator> {
  vec: Vec<i32, A>, 
  map: HashMap<i32, f32, A>,
}

where both vec and map need to store an instance of A, and can't use a &mut for that.
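The space trade-off can be made concrete with illustrative layouts -- FatAlloc and the container structs below are stand-ins invented here, not std's actual types:

```rust
// FatAlloc stands in for any non-zero-sized allocator handle, e.g. a
// (base pointer, length) pair describing an arena.
struct FatAlloc {
    _arena_base: *mut u8,
    _arena_len: usize,
}

// A container that embeds its allocator handle...
struct VecWithAlloc<T> {
    _ptr: *mut T,
    _len: usize,
    _cap: usize,
    _alloc: FatAlloc,
}

// ...versus one that expects the allocator as a method argument
// and stores nothing.
struct VecNoAlloc<T> {
    _ptr: *mut T,
    _len: usize,
    _cap: usize,
}

// Call-site style: one shared reference serves every container.
struct StuffShared<'a> {
    _alloc: &'a FatAlloc,
    _vec: VecNoAlloc<i32>,
    _map: VecNoAlloc<(i32, f32)>, // stand-in for a map
}

// Field style: each container carries its own copy of the handle.
struct StuffEmbedded {
    _vec: VecWithAlloc<i32>,
    _map: VecWithAlloc<(i32, f32)>,
}
```

The redundancy disappears when the allocator handle is zero-sized (as Global is), which is why the cost only shows up for "fat" allocators like the arena case above.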

1 Like

First of all: thanks for the effort, I really like the idea! It seems very flexible, and with something like AllocatorStorage<A: AllocRef> we'd have the best of both worlds.

I'd like to point out that the AllocRef trait is by no means set in stone, and every method and even generic parameter is gated behind #[feature(allocator_api)]. This means that, if this idea is mature enough, it would be possible to replace AllocRef for collections. However, we'd need AllocRef for #[global_allocator] (GlobalAlloc will be deprecated and the current AllocRef trait will replace it in a backwards-compatible way; the AllocRef trait simply has more possibilities than GlobalAlloc), so it would make sense to provide AllocatorStorage in std (if we decide to go this path).


I don't really see how InlineVec solves this problem. Using your provided implementation, it could look like this:

struct Vec<T, S> { length: usize, storage: S }
type RawInlineVec<T, const N: usize> = RawVec<T, [MaybeUninit<T>; N]>;
type InlineVec<T, const N: usize> = Vec<T, RawInlineVec<T, N>>;

When retrieving a pointer to the data of InlineVec, it would also be invalidated after moving the vector. Would you simply say that dereferencing a pointer is unsafe, so the user has to take care?


I'd like to invert your statement: what is not possible with Storage, but is possible with AllocRef, when used in collections and smart pointers? (I think nothing, as AllocatorStorage could be a wrapper around AllocRef.)

2 Likes

AllocRef now uses &self instead of &mut self: #76993.

However, you would save some space.

1 Like

dealloc requires the correct allocator to be used, so the API would have to be unsafe. This would also preclude the vector from being dropped -- I guess it'd have to panic? I'm sure there's a place for such an API, but it seems too niche for std.

3 Likes

This kind of API is well suited as a backend implementation for the collections (like the feature-gated alloc::raw_vec::RawVec for Vec), as it's the same logic. I don't see why this shouldn't be exposed -- maybe similarly to hashbrown, which exposes the raw API in a raw module.

In case it wasn't clear, I was talking about the Vec::alloc_push API :slight_smile:

I'd love to see the Storage trait backing std collections, and think it'd be especially great for "no_alloc" compatible crates, which would simply error out if they ran out of space (Of course, the collections would need to be moved to core). I don't have much to add to the proposal :smiley:

Me too :sweat_smile:

I see! I don't think RawVec is generic enough to be public API -- IME, it needs to be tweaked a little for most custom collections. Maybe there are more crates that would benefit from it than I realise :woman_shrugging:

1 Like

Allocators are 112% the right abstraction but we need to learn from C++ here.

Namely, allocators should really only be handles to a "heap".

I'm not sure what "inline storage" means but with C++'s allocator model, it is possible to have an allocator that pulls all of its storage from a stack-allocated buffer.

We'd also need something like C++'s std::pointer_traits as well.

For std use, it could possibly be unsafely impl'ed for ManuallyDrop<Vec>, along with a corresponding dealloc method.

That would be a programmer's error (similarly to how supplying wrong equals/hashcode in hashbrown's raw API would be programmer's error).

Right. But you still need to store the allocator, in order to free in the destructor, and also to verify that a newly passed allocator is still the same. Since that allocator reference needs to be retained, I'm not sure there is really a huge benefit in passing it as an argument too.

With Zig things are a bit different, because it doesn't have destructors and can require the user to pass the same allocator to the deinit method as well.

I also don't think the comparison with hashing fully holds up: while both would be bugs, supplying a different hasher would mainly mess up the contents of the HashMap; it would not lead to a memory-safety issue. Freeing or reallocating data with the wrong allocator is definitely a memory-safety concern.

4 Likes

It's an interesting take indeed to treat the allocation and the allocator reference as an inseparable type. That would absolutely solve a few problems in the requirements for the current alloc trait. However, on the topic of learning from C++, I find it absolutely bewildering that the approach would publicly denote the type which it will internally allocate.

For Vec this is straightforward and okay, as there are no details added by the data structure, but for HashMap it's far less clear to me what the post actually proposes. Would this require leaking the bucket type, since it allocates contiguous buckets? That would obviously not be workable. What about structures that perform multiple allocations? The rebind mechanism that C++ uses, which allows getting a differently-typed allocator via an associated type, is a massive pain and seems opposed to the idea that memory itself is untyped. And I don't want to have to realloc when (un)wrapping a type via a transparent wrapper, e.g. Box::assume_init.

4 Likes

It may make sense to link this topic in the allocator WG's issue tracker. Do you want to add the full topic there, or do you want me to open an issue with a short description and a link? I think IRLO has a greater reach, but the idea should be tracked somewhere.