`VecDeque`: allow `len == cap` to make full usage of memory in the `RawVec`

In the current implementation of VecDeque, the length is always less than actual capacity because we cannot distinguish an empty deque with a full one when head == tail. It wastes double space when the length is exact a power of 2. And it also causes unnecessary allocation of VecDeque::new() (see #rust/68990).

I'd like to make a change to introduce a start: usize and a len: usize field instead of the old tail: usize and head: usize, which do not change the memory layout and can easily distinguish empty from full with len == 0 and len == cap.


BTW, I wonder why tail corresponds to the front end and head corresponds to the back end? It is kind of counterintuitive. I used to always use head corresponds to front and tail corresponds to back when talking about queues. I also found a similar convention in the wikipedia:

In computer science, a double-ended queue (abbreviated to deque , pronounced deck , like "cheque"[1]) is an abstract data type that generalizes a queue, for which elements can be added to or removed from either the front (head) or back (tail).

To avoid such ambiguity, I'd prefer to using start to represent the start index of the deque instead of head or tail.

1 Like

One reason to have them separate is so that the pops only need to update one field, not both. That's also why the slice iterators are two pointers instead of just pointer + length the way normal slices are.

I wouldn't couple those; supporting cap == 0 should be completely doable, just would need a good test suite and careful analysis to ensure every corner is covered.


What I really wish we had, though, was a representative set of VecDeque benchmarks. Because it's really not obvious to me that the power-of-two restriction is even all that important -- sure, it make modular indexing faster, but we don't actually do full modular indexing. It's only ever - if i >= cap { cap } else { 0 }.

And opening up "yes, you can always convert a Vec into a VecDeque without it needing to allocate" would be a way bigger ecosystem boost, in my estimation, than saving the occasional cmov.

3 Likes

I've found these benchs in VecDeque.

And I also found a benchmark among std::deque, std::vector, std::list in C++:

We can build a Rust version to bench our VecDeque, and the boost::circular_buffer in C++ could also be a reference to compare since it has a similar implemention to VecDeque.

Related classical post: I've been writing ring buffers wrong all these years

11 Likes

Is that such a big performance boost, though?

Updating two fields is just a second read+write on an address that's probably in L1 cache anyway. I guess it would increase pressure on the register allocator.

On the other hand, I'd expect wasted allocations to be a lot more expensive.

(Obviously that's idle wondering, and the right answer is "we'd need a benchmark to check")

...huh. Thinking about these 2 things in close proximity to each other, I'm not sure the blog post's "third solution" even needs to the power-of-2 based either! You might be able to get away with making the indices be wrapped into the range 0..len()*2, so you don't need %, just trivial if-else blocks

This would add a constraint that a VecDequeue can't have a capacity larger than usize::MAX / 2.

Obviously not a problem on 64-bits targets, probably fine on 32-bits targets, but it might be a breaking change on 16-bits targets? I dunno.

It's already forbidden to allocate more than isize::MAX (aka usize::MAX / 2) bytes anyway, so that would only affect ZSTs.

So that's either likely-unimportant, or could we dealt with by using a completely different implementation strategy for ZSTs, like slice iterators do.

6 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.