Why doesn't Vec use extra capacity given by the allocator?

While testing my allocator, I was quite surprised that this failed:

#[test]
fn test_vec_capacity() {
	// Only ever allocates a single block of 1024 bytes
	let alloc = Stalloc::<1, 1024>::new();

	let v: Vec<u8, _> = Vec::with_capacity_in(1, &alloc);
	assert!(v.capacity() == 1024);
}

I went digging through the standard library to figure out how Vec allocates, and I found this:

// Allocators currently return a `NonNull<[u8]>` whose length
// matches the size requested. If that ever changes, the capacity
// here should change to `ptr.len() / mem::size_of::<T>()`.
Ok(Self { ptr: Unique::from(ptr.cast()), cap: unsafe { Cap(capacity) }, alloc })

In this code, capacity is the capacity chosen by the user, not the capacity actually returned by the allocator. This seems quite inefficient, and actually causes an OOM here:

// Only ever allocates a single block of 1024 bytes
let alloc = Stalloc::<1, 1024>::new();

let mut v: Vec<u8, _> = Vec::with_capacity_in(9, &alloc);
for _ in 0..1024 {
	v.push(7);
}

Because the Vec doubles its capacity each time, it eventually tries to allocate 1152 bytes, exceeding the capacity of the allocator. This wouldn't have happened if the Vec had realized that its capacity was exactly 1024 bytes all along. But what I'm really curious about is the comment, which references the behaviour of certain unnamed "allocators" but suggests that their behaviour might change at some point. To my knowledge, Rust doesn't have any official allocators and simply uses whatever allocator the system or the user provides.

This comment is also hard to square with the documentation for Allocator::allocate, which specifically states: "The returned block may have a larger size than specified by layout.size()". That seems to suggest that over-allocation is a supported use case.

The comment seems to also be contradicted by the documentation for Vec::reserve_exact: "Note that the allocator may give the collection more space than it requests. Therefore, capacity can not be relied upon to be precisely minimal." The documentation neglects to mention that extra capacity offered by the allocator is entirely ignored.

I would recommend that Vec be changed to use the capacity returned by the allocator rather than the user-provided capacity. This would apply to both the growing and shrinking methods. I don't think this would have any significant performance impact for typical applications, but this would enable improved performance and memory efficiency when using a custom allocator that supports over-allocation.

6 Likes

It might be because Allocator is unstable?

Also: Does dealloc need to be called with the layout used to allocate, or should it match the allocation returned? The docs state that:

unsafe fn deallocate(&self, ptr: NonNull<u8>, layout: Layout)

Deallocates the memory referenced by ptr.

Safety
  • ptr must denote a block of memory [currently allocated](core::alloc - Rust> trait.Allocator.html#currently-allocated-memory) via this allocator, and
  • layout must fit that block of memory.

also does the size of the returned allocation need to be a multiple of the align used?

for example if Vec<u16> asks for 4 * 16 = 64 bytes with alignment of 2 is it valid to return layout of size 65? since Vec doesn't store size in bytes but in cap * size_of::<T>() its impossible to recover the original layout.

2 Likes

From what I can see, link in your quote (fit) is accepting Vec's behavior:

  • The provided layout.size() must fall in the range min ..= max, where:
    • min is the size of the layout most recently used to allocate the block, and
    • max is the latest actual size returned from allocate, grow, or shrink.
3 Likes

The API specs are designed to allow Vec to use that capacity, but iirc the last time it was tried it didn't optimize well (I don't have a link at hand though).

5 Likes

One intuition for why this isn't quite as inefficient as it seems is that vec is using realloc. So the extra allocated capacity is maybe hopefully being mostly used as we fill the vec, although via several layers of indirect bookkeeping. This could be slightly better, as you note, but it isn't bad now.

The other aspect is that most of the time in performance critical sections Vec::with_capacity/Vec::reserve is being called at the start, either by some sort of iterator/constructor or by, well, me.

1 Like

The biggest reason is that 99.9% of Vec use the global allocator, and the GlobalAlloc cannot report any surplus allocated space. Because of the massive number of times the Vec allocation code is monomorphized, reusing the already calculated needed capacity instead of grabbing the (unknown) allocated byte length and transforming it back into capacity makes a significant impact to compile time and effectively no impact to performance.

6 Likes