`VecDeque`: allow `len == cap` to make full usage of memory in the `RawVec`

frank-king · March 23, 2022, 2:46am

In the current implementation of VecDeque, the length is always less than actual capacity because we cannot distinguish an empty deque with a full one when head == tail. It wastes double space when the length is exact a power of 2. And it also causes unnecessary allocation of VecDeque::new() (see #rust/68990).

I'd like to make a change to introduce a start: usize and a len: usize field instead of the old tail: usize and head: usize, which do not change the memory layout and can easily distinguish empty from full with len == 0 and len == cap.

BTW, I wonder why tail corresponds to the front end and head corresponds to the back end? It is kind of counterintuitive. I used to always use head corresponds to front and tail corresponds to back when talking about queues. I also found a similar convention in the wikipedia:

In computer science, a double-ended queue (abbreviated to deque , pronounced deck , like "cheque"[1]) is an abstract data type that generalizes a queue, for which elements can be added to or removed from either the front (head) or back (tail).

To avoid such ambiguity, I'd prefer to using start to represent the start index of the deque instead of head or tail.

scottmcm · March 23, 2022, 3:12am

One reason to have them separate is so that the pops only need to update one field, not both. That's also why the slice iterators are two pointers instead of just pointer + length the way normal slices are.

I wouldn't couple those; supporting cap == 0 should be completely doable, just would need a good test suite and careful analysis to ensure every corner is covered.

What I really wish we had, though, was a representative set of VecDeque benchmarks. Because it's really not obvious to me that the power-of-two restriction is even all that important -- sure, it make modular indexing faster, but we don't actually do full modular indexing. It's only ever - if i >= cap { cap } else { 0 }.

And opening up "yes, you can always convert a Vec into a VecDeque without it needing to allocate" would be a way bigger ecosystem boost, in my estimation, than saving the occasional cmov.

frank-king · March 23, 2022, 6:59am

I've found these benchs in VecDeque.

github.com

rust-lang/rust/blob/c3cff0a75412f18971a325640a5dd7bd67b28972/library/alloc/src/collections/vec_deque/tests.rs#L3-L88

      
        
            #[bench]
            #[cfg_attr(miri, ignore)] // isolated Miri does not support benchmarks
            fn bench_push_back_100(b: &mut test::Bencher) {
                let mut deq = VecDeque::with_capacity(101);
                b.iter(|| {
                    for i in 0..100 {
                        deq.push_back(i);
                    }
                    deq.head = 0;
                    deq.tail = 0;
                })
            }
            
            
#[bench]
            #[cfg_attr(miri, ignore)] // isolated Miri does not support benchmarks
            fn bench_push_front_100(b: &mut test::Bencher) {
                let mut deq = VecDeque::with_capacity(101);
                b.iter(|| {
                    for i in 0..100 {
                        deq.push_front(i);

This file has been truncated. show original

And I also found a benchmark among std::deque, std::vector, std::list in C++:

github.com

wichtounet/articles/blob/master/src/vector_list/bench.cpp

//=======================================================================
// Copyright (c) 2014 Baptiste Wicht
// Distributed under the terms of the MIT License.
// (See accompanying file LICENSE or copy at
//  http://opensource.org/licenses/MIT)
//=======================================================================

#include <random>
#include <array>
#include <vector>
#include <list>
#include <algorithm>
#include <deque>
#include <thread>
#include <iostream>
#include <cstdint>
#include <typeinfo>
#include <memory>

#include "plf_timsort.h"

This file has been truncated. show original

We can build a Rust version to bench our VecDeque, and the boost::circular_buffer in C++ could also be a reference to compare since it has a similar implemention to VecDeque.

matklad · March 23, 2022, 8:50am

Related classical post: I've been writing ring buffers wrong all these years

PoignardAzur · March 23, 2022, 10:20am

Is that such a big performance boost, though?

Updating two fields is just a second read+write on an address that's probably in L1 cache anyway. I guess it would increase pressure on the register allocator.

On the other hand, I'd expect wasted allocations to be a lot more expensive.

(Obviously that's idle wondering, and the right answer is "we'd need a benchmark to check")

elidupree · March 23, 2022, 1:50pm

...huh. Thinking about these 2 things in close proximity to each other, I'm not sure the blog post's "third solution" even needs to the power-of-2 based either! You might be able to get away with making the indices be wrapped into the range 0..len()*2, so you don't need %, just trivial if-else blocks

PoignardAzur · March 23, 2022, 10:46pm

This would add a constraint that a VecDequeue can't have a capacity larger than usize::MAX / 2.

Obviously not a problem on 64-bits targets, probably fine on 32-bits targets, but it might be a breaking change on 16-bits targets? I dunno.

scottmcm · March 23, 2022, 11:08pm

It's already forbidden to allocate more than isize::MAX (aka usize::MAX / 2) bytes anyway, so that would only affect ZSTs.

So that's either likely-unimportant, or could we dealt with by using a completely different implementation strategy for ZSTs, like slice iterators do.

system · June 21, 2022, 11:09pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
`VecDeque`: `pop_front+push_back`	22	1846	July 7, 2024
VecDeque: binary search fns	3	817	December 25, 2020
[Pre-RFC] Fixed-capacity view of Vec libs	51	10547	April 27, 2019
Pre-RFC: add `contains` to `VecDeque` libs	3	1558	March 25, 2019
Make Vec::set_len enforce the len <= cap invariant	25	2018	April 14, 2019

`VecDeque`: allow `len == cap` to make full usage of memory in the `RawVec`

Related topics