Allocation-free conversion from String/Vec to Arc<str>/Arc<[T]>

nmilo · September 23, 2023, 7:19pm

I was reading a blog post complaining about conversion from String to Arc<str> isn't free and was like, why isn't it? We can just use the excess capacity to build the control block in-line!

Pseudo-code for Vec::into_arc_slice: (String can work the same way)

fn into_arc_slice(mut self) -> Arc<[T]> {
    shrink allocation to hold 2 * sizeof(AtomicUsize) + self.len() * sizeof(T) bytes
    shift vector data by 2 * sizeof(AtomicUsize)
    transmute to pointer to ArcInner<[T]>
    return Arc::from_raw(that pointer)
}

(Definitions of Arc and ArcInner here:)

struct Arc<T: ?Sized> {
    ptr: NonNull<ArcInner<T>>>
}

struct ArcInner<T: ?Sized> {
    strong: AtomicUsize,
    weak: AtomicUsize,
    data: T
}

But what about alignment?

Yes, that is a bit of a concern; the pointer to ArcInner needs to be 8-byte aligned, while a str/[T] is not necessarily. BUT, the global allocator uses malloc/HeapAlloc, which returns 8-byte aligned allocations anyways, so this is not technically a concern. We can check the Vec's pointer to see if it's aligned, and if not, just fall back to Arc::new. Then when Arc<T>'s allocation gets dropped we pass T's alignment to the allocator instead of the ArcInner alignment.

kpreid · September 23, 2023, 7:50pm

This would help with small strings, and so might be worth doing, but "shift vector data by 2 * sizeof(AtomicUsize)" certainly isn't “free”; for large strings it will be just as expensive as copying to a new allocation.

pitaj · September 23, 2023, 7:50pm

Yeah, I wouldn't call needing to shift the entire contents of the string "free"

pitaj · September 23, 2023, 8:06pm

An interesting approach that would be free:

impl Arc<str> {
    /// Size of the metadata information that is placed
    /// at the beginning of the `Arc` allocation, in bytes.
    const METADATA_SIZE: usize = 2*size_of::<usize>();

    /// Converts the given `String` into an `Arc<str>`,
    /// reusing the same allocation.
    /// Overwrites the beginning of the string contents 
    /// with the `Arc` metadata
    fn from_string_overwrite(s: String) -> Self
}

Example usage:

let mut s = " ".repeat(Arc::METADATA_SIZE);
s.push_str("Hello, world!");
let a = Arc::from_string_overwrite(s);

Nemo157 · September 23, 2023, 8:25pm

This is not known to be true here (and isn't true if the global allocator was changed to one that doesn't give such a minimum alignment guarantee, like jemalloc). It might be possible to do a realloc that just changes the alignment not the size, which could be a noop if the allocation was already aligned enough.

the8472 · September 23, 2023, 8:28pm

Arc<[T]> has no spare capacity while Vec<T> likely does. So it's unlikely that the allocation will fit, which is necessary to be able to deallocate it correctly later since Allocator requires the correct layout to be passed during deallocation.

Hypothetically, if it were to fit, then a better approach to avoid the shifting would be to specialize the arc layout for those types and put the metadata at the end and then use negative offsets to to access the payload. Though that'd probably cost an additional instruction when derefing the Arc.

nmilo · September 23, 2023, 9:56pm

But practically, it is true. We could check if it is with ptr::is_aligned_to and fall-back to Arc::new in the rare case that it isn't.

nmilo · September 23, 2023, 10:04pm

Arc<[T]> has no spare capacity while Vec<T> likely does. So it's unlikely that the allocation will fit

That's why we shrink_to_fit, so the Arc can safely deallocate later.

Specializing Arc to avoid shifting is a better solution, actually. We could also set the pointer to the start of the array and jump to the end to update the refcount, this would make deref cheaper since it's probably more common than updating the count.

nmilo · September 23, 2023, 10:06pm

I meant "free" as in avoiding an allocation for free. The bytes will need to be copied either way. (Though with @the8472's suggestion that too can be avoided)

Nemo157 · September 24, 2023, 6:56am

No, you can’t, the API guarantees of GlobalAlloc do not allow that, it would be unsound.

SkiFire13 · September 24, 2023, 7:42am

The allocator contract also requires to pass the original alignment with which the allocation was requested to dealloc, and Arc has no way to know that.

system · December 23, 2023, 7:42am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Mini RFC: allow transmute Arc<T> => Arc<ManuallyDrop<T>> libs	9	1192	July 25, 2020
Alloc: Vec::as_slice, String::as_str -> const libs	8	381	September 5, 2024
Byte-aligned Vec language design	13	2087	March 25, 2019
Small string optimization: remove as_mut_vec libs	71	13890	March 25, 2019
Expose a way to create an Arc fallibly libs	5	696	March 11, 2021

Allocation-free conversion from String/Vec to Arc<str>/Arc<[T]>

Related topics