Is custom allocators the right abstraction?

The storage have to return a pointer or a reference to the current value. The trait is looking similar to this:

pub trait Storage<T: ?Sized> {
    fn as_ptr(&self) -> *const T;
    fn as_mut_ptr(&mut self) -> *mut T;
}

For clarification, I'll use AllocatorStorage<T: ?Sized, A: AllocRef>. Storage<T> is implemented for AllocatorStorage<T>, so it returns a poitner to T. Box's storage parameter defaults to AllocatorStorage<T, Global>, so calling as_ptr on the storage returns T. When creating a Box with new_uninit, the returned type is Box<MaybeUninit<T>, AllocatorStorage<MaybeUninit<T>, A>> so calling Box::assume_init requires to change the storage type. For AllocatorStorage this is basically casting the underlying pointer, but for other storages like [MaybeUninit<T>; N] a mem::transmute (actually a workaround is required) is needed. Generally, this requires memcpy: https://godbolt.org/z/47sxre

I don't know what's the plan on this issue, maybe this case can be optimized out somehow. I'd go that route for now, but I'm not super happy with it.

This may be a solution. Currently I'm using associative types instead of generics anyway, so Storage may not need a type at all. For "real" collections, we could use sth. like ContiguousStorage<Value=T>.

Does the new_uninit apply to anything else than Box, Rc and Arc? In other words: does it occur in any place, where contiguous storage is a thing?

:laughing:

1 Like

Is there a reason that [MaybeUninit<T>; N] can't just be used as the storage for T? In general, my understanding is that storages would deal in MaybeUninit anyway, and the container would be in charge of tracking initialization. This makes working with storages unsafe, but that's not that big of a deal, since they replace working with allocation, which is always unsafe.

This is working, you simply pass [MaybeUninit<T>; N] to the box, and it returns Box<T, [MaybeUninit<T>; N]>. However, how the new_uninit API would look like? Let's assume, we call it

fn new_uninit_in(storage: S) -> Box<MaybeUninit<T>, S>

What do you pass?

  • [MaybeUninit<T>; N] derefs to T, so it can't be used
  • [MaybeUninit<MaybeUninit<T>>; N] (urgh) derefs to MaybeUninit<T> (fine), but when calling Box::assume_init it requires the conversion from
    Box<MaybeUninit<T>, [MaybeUninit<MaybeUninit<T>>; N]>
    
    to
    Box<T, [MaybeUninit<T>; N]>
    
    which requires the storage to be changed.

Regardless, what is passed to new_uninit, it derefs to one specified type. Changing the type of the Box also requires the Storage to be changed to deref to the new type.

I don't know if the storage-API will work for Box (and similar) in general. As you noticed, those types can be used to store dynamically sized types. Since impl Trait in return position, it's probably not strictly needed anymore, but it's still have to be supported. However implementing CoerceUnsized for Box fails, as

The trait CoerceUnsized may only be implemented for a coercion between structures with one field being coerced

but Box<T, S> don't store a field, which needs coercion, the data lives in S. Maybe I get something wrong here, I'm not a compiler expert, but it appears, that this won't work as of now.

Not necessarily, if you look on the main branch for generic-vec, you'll see that impl<T, U, const N: usize> Storage<U> for UninitArray<T, N> { ... } (here) where T and U could have very different sizes, but as long as the alignment of UninitArray<T, N> is larger than U it should be fine. For Box you'll also want the restriction that that the storage size is at least the same as size_of::<U>().

(note: currently I have a bug where I don't check the alignments, but that will soon be fixed) now fixed

I'll look into your implementation tomorrow. I think I have a solution to this, too but let's see...

Hopefully I'll find enough spare time to push this forward, it's by far the most promising proposal for collections I have seen!


I filed an issue in the WG repository.


Update:

I updated my crate to v0.1.0. It's may worth a look at RawBox and Box. Main features:

  • typed buffers with only two traits (Buffer and UnmanagedBuffer)
  • AllocatedBuffer provided
  • native arrays are supported
  • Two APIs: raw API with call-side-dependency-injection and non-raw wrapper API, which behaves like the current api
  • Coercion is working
  • Unsized types are working

@TimDiekmann looking at the Storage trait, maybe we could consider adding AsPtr and AsMutPtr traits to the std. It feels cleaner than the Storage trait.

I am not sure if it is worth it. Considering that pointer dereferencing is generally unsafe and Storage (or Buffer as it is called in my crate) is just a backend that the usual user will rarely interact with, I think a Buffer trait is more flexible. Of course, an AsPtr-trait could make sense in other places too, but I don't know how well it is backwards compatible.

I would stick with the Buffer-trait for now.

How does this decide whether to use internal storage or the heap storage?

I do fee likel @matklad's idea is a reason to switch back to &mut self or even self for allocators. With the allocators passed in at the last moment rather than stored, there is no longer the issue of storing allocators for multiple collections in a single threaded context. The dreaded AllocOnce AllocMut Alloc hierarchy can also come up based on whether it's a flat or non-flat data structure.

The main issue is it is also nice to pass in at deallocation time, but how to we prevent dropping without the allocator being passed in? This was my #1 reason for wanting linear types, years ago when I clamored about these things.

3 Likes

This is massively invasive, requiring to change every single API, and requiring users to follow a very strict discipline since the same allocator must be passed to every single call -- turning every single function into an unsafe function OR mandating that allocators be able to recognize their own allocations. Honestly, it seems completely impractical.

Note that even in Zig, where there is no global allocator, collections tend to capture the allocator in their constructor and use it throughout, including for destruction.

If you want to start exploring the possibility, I would kindly ask that you open a separate topic as your proposal is at the moonshot stage (not even a mini-draft/prototype, no experience that I can see) which means it'll require significant brainstorming to go anywhere.

2 Likes

Another interesting point here is that in Zig it's easier to handle "bring your own allocator"-style collections, because it uses value-based defer, rather than type-based destructors. So you, eg, can close over a single allocator local var to drop to local vars holding vectors.

@TimDiekmann, have you been able to make headway with your reference implementation? It seems like an interesting concept to explore.

Thanks to @matthieum for raising this before we finalised the allocator API. It's always valuable to explore new avenues, at any point in the development process :grinning_face_with_smiling_eyes:

1 Like

I wasn't able to spend much more time recently so the repository is at the latest state. I'll investigate this further as soon as I find some spare time. :slightly_smiling_face:

1 Like

I've been battling with trait representations.

It's easy enough to get the size and pointer of a slice, store the size and content separately, then reassemble them on the fly, however I could not find a (good) API for traits -- using transmute_copy on the unstable TraitObject representation is rather error-prone -- and more importantly I could not find a generic way to take a ?Sized object, break it down into meta-data + data, and reassemble it.

The closest I've found is the thin crate, and by default it only handles a couple of traits (by implementing a custom trait for them).

I'm thinking we may be lacking a more fundamental API on which RawBox<dyn Trait, 48> can be built.

1 Like

There isn't one! That's what all these open RFCs are trying to achieve

Even some closed ones

5 Likes

I left a comment on RFC 2984 which I think is the most minimal.

A straightforward extension to the RFC is all it takes to be able to break down a pointer into its raw parts (meta-data and pointer to data) and stitch them back together afterwards.

1 Like

I'd like to point to you Coca's author experiments.

Of particular interest is the use of the set_ptr_value feature to reset the data-pointer of a fat-pointer. By using it, @teryror manages to handle Sized, slices, and traits in a generic manner (preserving their meta-data).

It's not ideal as the original pointer is dead-weight, adding up an extra 8 bytes, however it's working right now which is nice to further explore the API design.

1 Like

Oh....shit. I spent ages haranguing people about how we would need linear types for this issue. But linear types that simply require all bindings be defer-ed rather than only working with out unwinding would probably get much less push-back, and are exactly what this needs.

(To be precise, it's the unwinding that requires the defer, not the linear-typed values themselves, but people were and presumably are uncomfortable with no-unwinding changing the language that deeply, so I won't go there.)

I most certainly do not have the bandwidth to write this RFC, but someone else probably should. It will make custom allocator stuff 100x better.

1 Like