[Pre-RFC] Fixed-capacity view of Vec

As far as I understand, it gets the lower bound on the number of elements yielded by the iterator and reserves space for lower_bound + 1 elements. I may be reading that wrong, though.

Either way, these are just details of the current implementation and are explicitly not guaranteed to remain that way. So an implementation of a fixed-capacity vector cannot use any Vec methods that insert more than one element at a time.

Which is a shame, because when implemented as a crate it’ll be missing out on specialization: if we want e.g. extend_from_slice() to be optimized into a memcpy(), we can’t do that and will have to use a separate function until specialization is stabilized.

I was figuring that if someone did want an arbitrary part to be appended and repeated, they could append it (which I guess would need another method) and then repeat it. The repeating code takes advantage of the tail being the tail when it does its doubling.

It’s already read one element by the time it asks for the size_hint, so it needs to add one for the element it already has. Or, more directly, it’s about to ptr::write to get_unchecked(len), which is only safe if it has reserved space for at least one element, and size_hint could have returned zero (as many iterators always do).
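
Concretely, the loop being described looks roughly like this (a sketch modeled on how Vec’s Extend implementation worked at the time; the function name is mine, and the details aren’t guaranteed to match the current stdlib):

use std::ptr;

fn extend_one_at_a_time<T, I: Iterator<Item = T>>(vec: &mut Vec<T>, mut iterator: I) {
    while let Some(element) = iterator.next() {
        let len = vec.len();
        if len == vec.capacity() {
            let (lower, _) = iterator.size_hint();
            // +1 accounts for `element`, which was already pulled out of the
            // iterator and so isn't counted by size_hint's lower bound
            vec.reserve(lower.saturating_add(1));
        }
        unsafe {
            // Only safe because we just reserved room for at least one
            // element, even if size_hint returned (0, _)
            ptr::write(vec.as_mut_ptr().add(len), element);
            // Bump len only after the write, so a panic can't expose
            // uninitialized memory
            vec.set_len(len + 1);
        }
    }
}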

Thanks for pointing out the possible invalidation. I pushed some changes to the crate that fix this, at the cost of .extend_from_slice() only being available for Copy types. That’s most likely not prohibitive for multimedia uses, but still not ideal.

I also added some benchmarks based on the above decode_rle code samples.

running 4 tests
test bench_decode_rle_lib_naive ... bench:       7,465 ns/iter (+/- 739)
test bench_decode_rle_lib_opt   ... bench:       1,298 ns/iter (+/- 602)
test bench_decode_rle_naive     ... bench:       5,822 ns/iter (+/- 223)
test bench_decode_rle_vuln      ... bench:       3,160 ns/iter (+/- 284)

The other two are taken more or less verbatim from the initial post.

I’ve got even bigger gains on my machine for a repeating fragment of size 256, as much as 3x:

running 4 tests
test bench_decode_rle_lib_naive ... bench:       9,606 ns/iter (+/- 167)
test bench_decode_rle_lib_opt   ... bench:       1,617 ns/iter (+/- 634)
test bench_decode_rle_naive     ... bench:       9,128 ns/iter (+/- 196)
test bench_decode_rle_vuln      ... bench:       5,361 ns/iter (+/- 106)

I’ve fiddled a bit with the benchmarking harness and tried changing the consts into black-boxed variables to see if that affected performance (it didn’t).

On the other hand, bench_decode_rle_lib_opt is 2x slower than vuln for size 2. It only reaches performance parity at size 4 for me.*

I’ll try to add capacity(), fill() and fill_with() methods shortly and see how that works.

Also, I’m still wondering if such a structure would be useful for replacing slices in multimedia. So far the results with push() vs unsafe + copying by index are not encouraging, with push() being 2x slower across the board. I’ll toy with resize_with() and see if I can get better results with it.

*after I annotated the functions with #[inline]; before that it reached performance parity only at sizes 5 and up. I’ve opened a PR to add the inlining and a unit test.

fn decode_rle_lib_optim(
    buffer: &mut Vec<u8>,
    repeating_fragment_len: usize,
    num_bytes_to_fill: usize,
) {
    // Initial copy so that vec looks like
    // o o o o o [ slice ] # # # # #
    let mut left_to_fill = num_bytes_to_fill;
    let mut fill_size = std::cmp::min(left_to_fill, repeating_fragment_len);
    let copied_data_start = buffer.len();
    {
        let (src, mut append) = buffer.with_fixed_capacity(left_to_fill);
        let slice_to_repeat = &src[(src.len() - fill_size)..]; // figure out what to repeat
        append.extend_from_slice(slice_to_repeat);
    }
    left_to_fill -= fill_size;
    
    // Now we can double the items we copy each time we call extend_from_slice
    // #1: o o o o o [ slice ] # # # # # # # # # # # # # # # # # # ...
    // #2: o o o o o [ slice ] [ slice ] # # # # # # # # # # # # # ...
    // #3: o o o o o [ slice ] [ slice ] [ slice ] [ slice ] # # # ...
    // #4: ....
    while left_to_fill > 0 {
        fill_size = std::cmp::min(left_to_fill, fill_size);
        let (src, mut append) = buffer.with_fixed_capacity(left_to_fill);
        let slice_to_repeat = &src[copied_data_start..(copied_data_start + fill_size)]; // figure out what to repeat
        append.extend_from_slice(slice_to_repeat);
        left_to_fill -= fill_size;
        fill_size *= 2;
    }
}

With this I get consistent results for all buffer sizes:

repeating_fragment_len == 1

test bench_decode_rle_lib_naive ... bench:       7,336 ns/iter (+/- 1,211)
test bench_decode_rle_lib_opt   ... bench:       1,426 ns/iter (+/- 642)
test bench_decode_rle_naive     ... bench:       6,291 ns/iter (+/- 636)
test bench_decode_rle_vuln      ... bench:       4,221 ns/iter (+/- 631)

repeating_fragment_len == 512

test bench_decode_rle_lib_naive ... bench:       7,234 ns/iter (+/- 951)
test bench_decode_rle_lib_opt   ... bench:       1,518 ns/iter (+/- 557)
test bench_decode_rle_naive     ... bench:       6,642 ns/iter (+/- 433)
test bench_decode_rle_vuln      ... bench:       3,543 ns/iter (+/- 306)

So, who else read that source code snippet carefully enough to spot the unsoundness? :wink:

I forgot to link to it, but yes, that was the main inspiration :wink:

Oh. Ooooooh.

“Good catch” doesn’t even begin to describe it. Kudos!

This makes me want a QuickCheck test suite for the entire Rust stdlib even more than I did before. Work on it has been started, but it sure could use more attention!

I had my doubts about the performance of copies of exponentially increasing size, but this is excellent news!

The only downside is that I really don’t want to reimplement that in every crate, especially in light of the Rust PR linked above. So including something like repeat(slice_to_repeat: &[T], times: usize) in FixedCapacityVec would help a great deal.

@scottmcm while we’re at it, I don’t think repeat_tail() is a good idea from an API standpoint: to repeat an arbitrary subslice, it requires the user to call extend_from_slice() once and then repeat_tail() with times - 1 as a parameter, which forces the user to deal with the underflow (and to notice that it’s possible in the first place). Especially in light of the PR linked above.
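
To illustrate at the call site (both methods are hypothetical here; neither exists yet):

// Goal: append `fragment`, repeated `times` times in total.

// With repeat_tail(), the user appends once, computes times - 1 themselves,
// and has to notice that the subtraction can underflow:
buffer.extend_from_slice(fragment);
buffer.repeat_tail(times.checked_sub(1).expect("times must be at least 1"));

// With repeat(), the off-by-one hazard disappears:
buffer.repeat(fragment, times);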

On the subject of property-based testing, check out https://github.com/altsysrq/proptest.

Now that the vulnerability discovered by @scottmcm has been announced, I can talk about it in less vague terms :smile:

Fascinatingly, this vulnerability in the standard library existed because… drum roll… somebody wanted an efficient way to append the contents of a vector to the same vector, and had to use unsafe to make that possible. And they messed up the math, because unsafe is really, really hard.

If they had had a way to borrow the existing vector contents and append to the vector at the same time, safely, it would not have happened.

So now we have yet another precedent for the need for this kind of abstraction, and it comes from stdlib itself.

Note, of course, that this has also demonstrated that having it in std isn’t necessarily any safer. It could live in a carefully-written crate just as well.

One interpretation of a security advisory for std: well, std can have soundness bugs too.

Another interpretation: maybe std gets more scrutiny, so the bugs get found.

(And it’s not an either-or; both can be true, and likely are.)

Awesome stuff. I love how we are gathering a group of people that cares about safe abstractions and comes up with new ones when the existing ones don’t suffice :heart:. I’m sure y’all will come up with enough of them that I’ll never run out of projects whose safety I can formally verify :wink:.

That issue mentions that using xargo to compile your own libstd works. Have you tried that? xargo isn’t hard to use; feel free to ping me if you need assistance.

I suggested on Zulip making this a VecViewMut<'a, T>(*mut T, usize, usize, PhantomData<&'a mut T>) or similar instead, and allowing it to be constructed not only from &mut Vec<T> but also from &mut ArrayVec<[T; N]>, &mut SmallVec<[T; N]>, etc. via a VectorLike trait. That might make this type more useful, since you can then take “growable slices” of all these vector types without having to write a generic function (e.g. before, one might have had to write fn foo<V: VecLike<Item = u8>>(v: V) {}, but with this one can write fn foo<'a>(v: VecViewMut<'a, u8>) {}). This type of view screams “custom DSTs”, though.
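
A minimal self-contained sketch of that suggestion (all names, including VectorLike and as_view_mut, are hypothetical):

use std::marker::PhantomData;

// The suggested view: raw pointer, length, capacity, plus a marker tying
// the view to the mutable borrow of the underlying vector.
struct VecViewMut<'a, T> {
    ptr: *mut T,
    len: usize,
    cap: usize,
    _marker: PhantomData<&'a mut T>,
}

// Anything vector-like can hand out such a view.
trait VectorLike {
    type Item;
    fn as_view_mut(&mut self) -> VecViewMut<'_, Self::Item>;
}

impl<T> VectorLike for Vec<T> {
    type Item = T;
    fn as_view_mut(&mut self) -> VecViewMut<'_, T> {
        VecViewMut {
            ptr: self.as_mut_ptr(),
            len: self.len(),
            cap: self.capacity(),
            _marker: PhantomData,
        }
    }
}

// Consumers can now take the concrete view instead of being generic:
fn foo(v: VecViewMut<'_, u8>) { /* ... */ }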

An interesting consequence of a fixed-capacity view of a vector is that, because the vector cannot grow through the view, inserting elements through it never invalidates references to elements in the vector. That is, one could iterate over the elements of such a view while inserting elements through it, something that cannot be done with a plain Vec.
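
For example (with_fixed_capacity and the view type are the hypothetical API from this thread):

// Duplicate a vector in place: `src` borrows the existing elements, while
// `view` writes only into spare capacity and can never reallocate, so
// `src` stays valid the whole time.
let n = buffer.len();
let (src, mut view) = buffer.with_fixed_capacity(n);
for &byte in src {
    view.push(byte); // iterating over old elements while inserting new ones
}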

This could also be expanded to a fixed-capacity view of a String.

Additionally, there could be two variants, an owned one and a borrowed one: FixedVec/FixedString and FixedVecView/FixedStringView. The owned variant is constructed with a fixed capacity, while the borrowed one is a view into another, non-fixed one.

I wrote a similar abstraction in the buffer crate. It allows you to pass uninitialized buffers of bytes around, enabling things like

let mut vec = Vec::with_capacity(1024);
if reader.read_buffer(&mut vec)?.len() != 0 {
    if vec[0] == 0 {
        // ...
    }
}

in safe Rust. It actually does have one reverse dependency now. :open_mouth: But I use it extensively in crates not published on crates.io; it makes for a very good abstraction for 1) networking, 2) compression/decompression, and 3) building protocol packets, all with no or minimal allocations. (You can use an uninitialized ArrayVec as well as a Vec with spare capacity.)

This all works without (or at least with little) monomorphisation cost, because each function taking a B: Buffer can immediately convert it into a BufferRef, so the function won’t be instantiated for each possible buffer type. The conversion makes sure that the underlying buffer gets resized before the function returns.
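
A self-contained sketch of that pattern, with placeholder types standing in for the crate’s actual API:

// Minimal stand-ins for the crate's Buffer/BufferRef types.
struct BufferRef<'a> {
    data: &'a mut Vec<u8>,
}

trait Buffer {
    fn to_ref(&mut self) -> BufferRef<'_>;
}

impl Buffer for Vec<u8> {
    fn to_ref(&mut self) -> BufferRef<'_> {
        BufferRef { data: self }
    }
}

// Thin generic shim: instantiated once per buffer type, but trivially small.
fn fill<B: Buffer>(buf: &mut B) {
    fill_inner(buf.to_ref());
}

// The real work: compiled exactly once, whatever the caller's buffer type.
fn fill_inner(buf: BufferRef<'_>) {
    buf.data.extend_from_slice(b"hello");
}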

What do you think about this?

Can’t the OP’s problem be solved by something like VecCapView? It would provide an opaque view into the capacity space and allow pushing data into it. Something like this:

let mut buf = Vec::with_capacity(12);
buf.extend_from_slice(b"hi");
// `cap_view` is somewhat similar to `split_at`, we also can add `cap_view_mut`,
// which will return `&mut [T]` instead of `&[T]`.
// Under the hood `VecCapView` contains mutable reference to the parent `Vec`
let (data, mut cap_view) = buf.cap_view(); // (&[u8], VecCapView)
// values are written into memory, but not committed yet (i.e. the `len` field
// is left unchanged, so if we forget the view, the vector will not change);
// each method checks whether there is enough capacity left in the view,
// returning an error otherwise
cap_view.push(b':')?;
cap_view.extend([b'f', b'o', b'o', b'='].iter().cloned())?;
cap_view.extend_from_slice(data)?;
// now it's time to "commit" data, this method consumes the view
// and simply bumps `len` field of the parent vector
cap_view.commit();

// assuming `data` is out of scope
assert_eq!(&buf[..], b"hi:foo=hi");

I like the simplicity of the VecCapView idea. Why not automatically commit the length when the view is dropped, though?
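
A minimal sketch of what commit-on-drop could look like, assuming the view keeps a reference to its parent vector and a count of elements written (all names hypothetical):

impl<'a, T> Drop for VecCapView<'a, T> {
    fn drop(&mut self) {
        // The elements were already written into the parent's spare
        // capacity; committing just bumps the parent's length.
        unsafe {
            let new_len = self.parent.len() + self.written;
            self.parent.set_len(new_len);
        }
    }
}

One reason to keep an explicit commit(): with commit-on-drop, bailing out halfway (e.g. on an error) would commit partial data unless a separate abandon()-style method were added, whereas the explicit-commit proposal above lets you simply drop the view to leave the vector unchanged.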