In the current implementation of VecDeque, the length is always less than actual capacity because we cannot distinguish an empty deque with a full one when head == tail. It wastes double space when the length is exact a power of 2. And it also causes unnecessary allocation of VecDeque::new() (see #rust/68990).
I'd like to make a change to introduce a start: usize and a len: usize field instead of the old tail: usize and head: usize, which do not change the memory layout and can easily distinguish empty from full with len == 0 and len == cap.
BTW, I wonder why tail corresponds to the front end and head corresponds to the back end? It is kind of counterintuitive. I used to always use head corresponds to front and tail corresponds to back when talking about queues. I also found a similar convention in the wikipedia:
In computer science, a double-ended queue (abbreviated to deque , pronounced deck , like "cheque"[1]) is an abstract data type that generalizes a queue, for which elements can be added to or removed from either the front (head) or back (tail).
To avoid such ambiguity, I'd prefer to using start to represent the start index of the deque instead of head or tail.
One reason to have them separate is so that the pops only need to update one field, not both. That's also why the slice iterators are two pointers instead of just pointer + length the way normal slices are.
I wouldn't couple those; supporting cap == 0 should be completely doable, just would need a good test suite and careful analysis to ensure every corner is covered.
What I really wish we had, though, was a representative set of VecDeque benchmarks. Because it's really not obvious to me that the power-of-two restriction is even all that important -- sure, it make modular indexing faster, but we don't actually do full modular indexing. It's only ever - if i >= cap { cap } else { 0 }.
And opening up "yes, you can always convert a Vec into a VecDeque without it needing to allocate" would be a way bigger ecosystem boost, in my estimation, than saving the occasional cmov.
We can build a Rust version to bench our VecDeque, and the boost::circular_buffer in C++ could also be a reference to compare since it has a similar implemention to VecDeque.
Updating two fields is just a second read+write on an address that's probably in L1 cache anyway. I guess it would increase pressure on the register allocator.
On the other hand, I'd expect wasted allocations to be a lot more expensive.
(Obviously that's idle wondering, and the right answer is "we'd need a benchmark to check")
...huh. Thinking about these 2 things in close proximity to each other, I'm not sure the blog post's "third solution" even needs to the power-of-2 based either! You might be able to get away with making the indices be wrapped into the range 0..len()*2, so you don't need %, just trivial if-else blocks