[Pre-RFC] Fixed-capacity view of Vec


#21

Could you elaborate how a fixed-capacity view of vector would help with that?

I don’t think out-of-order writes can be made safe without incurring significant runtime overhead.

Edit: ah, I think you meant you’ve run into the “copy from earlier in the slice” problem too. It’s good to know that I’m not wrong about this being a common pattern in multimedia decoding.


#22

I’ve just started implementing a prototype crate that extends Vec with a with_fixed_capacity(usize) method.

It behaves a bit like slice::split_at_mut, but instead of two slices, you get one slice (“the already existing content”) and a struct that acts as a safe proxy to the Vec. It should ensure that there is no way for the Vec to reallocate.

@Shnatsel Is this what you imagined?


#23

Wow! Yes, that’s pretty much it!

I’ve started pretty much the same thing myself, but the following line from Vec documentation stopped me in my tracks:

Bulk insertion methods may reallocate, even when not necessary.

So according to the documentation, this works:

    pub fn push(&mut self, item: T) {
        assert!(self.additional_cap() > 0);
        self.buffer.push(item)
}

But this may reallocate and invalidate our precious slice:

    pub fn extend_from_slice(&mut self, other: &[T]) {
        assert!(other.len() <= self.additional_cap());
        self.buffer.extend_from_slice(other)
}

This is not the case in practice for slices of Copy types because it specializes into an exact space reservation, but the generic implementation does in fact allocate needlessly.

However, that’s nothing a bit of custom unsafe code can’t fix.


#24

Ouch, I’ve accidentally submitted the post early.

Anyway, those issues can be fixed with a custom unsafe implementation of extend_from_slice. So we’re probably good!

I look forward to toying with the API and actually running benchmarks of this vs other approaches in real code.

Thank you so much for doing this!


#25

What about that is needless? reserve “Does nothing if capacity is already sufficient”.


#26

As far as I understand, it get the lower bound on the amount of elements yielded by the iterator, and reserves space for lower_bound+1 elements. I may be reading that wrong, though.

Either way, these are just details of the current implementation and are explicitly not guaranteed to remain that way. So an implementation of a fixed capacity vector cannot use any Vec methods that insert more than one element at a time.

Which is a shame, because when implemented as a crate it’ll be missing out on specialization, so if we want e.g. extend_from_slice() to be optimized into a memcpy() we can’t do that and will have to use a separate function until specialization is stabilized.


#27

I was figuring that if someone did want an arbitrary part to be appended and repeated, they could append it (which I guess would need another method) then repeat it. The repeating code takes advantage of the tail being the tail in its doubling.

It’s already read one when it asks for the size_hint, so it needs to add one for the element it already has. Or, more directly, it’s about to ptr::write to get_unchecked(len), which will only be safe if it’s reserved space for at least one element, and size_hint could have returned zero (as many iterators always do).


#28

Thanks for pointing out the possible invalidation. I pushed some changes to the crate that fix this (at the cost of .extend_from_slice only being available for Copy types: This is most likely not inhibitive for multimedia uses, but still not ideal).

I also added some benchmarks based on the above decode_rle code samples.

running 4 tests
test bench_decode_rle_lib_naive ... bench:       7,465 ns/iter (+/- 739)
test bench_decode_rle_lib_opt   ... bench:       1,298 ns/iter (+/- 602)
test bench_decode_rle_naive     ... bench:       5,822 ns/iter (+/- 223)
test bench_decode_rle_vuln      ... bench:       3,160 ns/iter (+/- 284)

The other two are taken more or less verbatim from the initial post.


#29

I’ve got even bigger gains on my machine for repeating fragment of size 256, as much as 3x:

running 4 tests
test bench_decode_rle_lib_naive ... bench:       9,606 ns/iter (+/- 167)
test bench_decode_rle_lib_opt   ... bench:       1,617 ns/iter (+/- 634)
test bench_decode_rle_naive     ... bench:       9,128 ns/iter (+/- 196)
test bench_decode_rle_vuln      ... bench:       5,361 ns/iter (+/- 106)

I’ve fiddled a bit with the benchmarking harness, tried changing const into black-boxed variables to see if that affects performance (it didn’t).

On the other hand, bench_decode_rle_lib_opt is 2x slower than vuln for size 2. It only reaches performance parity on size 4 for me.*

I’ll try to add capacity(), fill() and fill_with() methods shortly and see how that works.

Also, I’m still wondering if such a structure would be useful for replacing slices in multimedia. So far the results with push() vs unsafe + copying by index are not encouraging, with push() being 2x slower across the board. I’ll toy with resize_with() and see if I can get better results with it.

*after I’ve annotated functions #[inline], before that it reached performance parity only on 5 and up. I’ve opened a PR to add inlining and a unit test.


#30
fn decode_rle_lib_optim(
    buffer: &mut Vec<u8>,
    repeating_fragment_len: usize,
    num_bytes_to_fill: usize,
) {
    // Initial copy so that vec looks like
    // o o o o o [ slice ] # # # # #
    let mut left_to_fill = num_bytes_to_fill;
    let mut fill_size = std::cmp::min(left_to_fill, repeating_fragment_len);
    let copied_data_start = buffer.len();
    {
        let (src, mut append) = buffer.with_fixed_capacity(left_to_fill);
        let slice_to_repeat = &src[(src.len() - fill_size)..]; // figure out what to repeat
        append.extend_from_slice(slice_to_repeat);
    }
    left_to_fill -= fill_size;
    
    // Now we can double the items we copy each time we call extend_from_slice
    // #1: o o o o o [ slice ] # # # # # # # # # # # # # # # # # # ...
    // #2: o o o o o [ slice ] [ slice ] # # # # # # # # # # # # # ...
    // #3: o o o o o [ slice ] [ slice ] [ slice ] [ slice ] # # # ...
    // #4: ....
    while left_to_fill > 0 {
        fill_size = std::cmp::min(left_to_fill, fill_size);
        let (src, mut append) = buffer.with_fixed_capacity(left_to_fill);
        let slice_to_repeat = &src[copied_data_start..(copied_data_start + fill_size)]; // figure out what to repeat
        append.extend_from_slice(slice_to_repeat);
        left_to_fill -= fill_size;
        fill_size *= 2;
    }
}

With this I get consistent results for all buffer sizes:

repeating_fragment_len == 1

test bench_decode_rle_lib_naive ... bench:       7,336 ns/iter (+/- 1,211)
test bench_decode_rle_lib_opt   ... bench:       1,426 ns/iter (+/- 642)
test bench_decode_rle_naive     ... bench:       6,291 ns/iter (+/- 636)
test bench_decode_rle_vuln      ... bench:       4,221 ns/iter (+/- 631)

repeating_fragment_len == 512

test bench_decode_rle_lib_naive ... bench:       7,234 ns/iter (+/- 951)
test bench_decode_rle_lib_opt   ... bench:       1,518 ns/iter (+/- 557)
test bench_decode_rle_naive     ... bench:       6,642 ns/iter (+/- 433)
test bench_decode_rle_vuln      ... bench:       3,543 ns/iter (+/- 306)

#31

So, who else read that source code snippit carefully enough to spot the unsoundness? :wink:


#32

I forgot to link to it, but yes, that was the main inspiration :wink:


#33

Oh. Ooooooh.

“Good catch” doesn’t even begin to describe it. Kudos!

This makes me want to have a QuickCheck testsuite for the entire Rust stdlib even more than I did before. Work on it has been started but it sure could use more attention!


#34

I had my doubts wrt performance of copies of exponentially increasing size, but this is excellent news!

The only downside is that I really don’t want to reimplement that in every crate, especially in light of the Rust PR linked above. So including something like repeat(slice_to_repeat: &[T], times: usize) in FixedCapacityVec would help a great deal.

@scottmcm while we’re at it, I don’t think repeat_tail() is a good idea from the API standpoint because to repeat an arbitrary subslice it requires the API user to call extend_from_slice() once and then repeat_tail with times-1 as a parameter, which requires the user to deal with the underflow (and to notice that it’s possible in the first place). Especially in light of the PR linked above.


#35

On the subject of property based testing, checkout https://github.com/altsysrq/proptest.


#36

Now that the vulnerability discovered by @scottmcm has been announced, I can talk about it in less vague terms :smile:

Fascinatingly, this vulnerability in the standard library existed because… drum roll somebody wanted an efficient way to append contents of vector to the same vector, and had to use unsafe to make it possible. And messed up math, because unsafe is really, really hard.

If they had a way to borrow existing vector contents and append to it at the same time, safely, it would not have happened.

So now we have yet another precedent for the need for this kind of abstraction, and it comes from stdlib itself.


#37

Note, of course, that this has also demonstrated that having it in std isn’t necessarily any safer. It could live in a carefully-written crate just as well.


#38

One interpretation of a security advisory for std: well, std can have soundness bugs too.

Another interpretation: maybe std gets more scrutiny, so the bugs get found.

(And it’s not an either-or; both can be true, and likely are.)


#39

Awesome stuff. I love how we are gathering a group of people that cares about safe abstractions and comes up with new ones when the existing ones don’t suffice :heart:. I’m sure y’all will come up with enough of them that I will never run out of projects to verify their safety formally :wink: .

That issue mentions that using xargo to compile your own libstd works. Have you tried that? xargo isn’t hard to use, feel free to ping me if you need assistance.


#40

I suggested in Zulip making this a VecViewMut<'a, T>(*mut T, usize, usize, PhantomData<'a>) or similar instead, and allowing this to be constructed not only from &mut Vec<T>, but also &mut ArrayVec<[T; N]>, &mut SmallVec<[T; N]>, etc. via a VectorLike trait. That might make this type more useful, since now you can use it to take “growable slices” of all these vector types without having to use a generic function (e.g. before one might had to use fn foo<V: VecLike<Item=u8>>(v: v) {} but with this one can write fn foo<'a>(v: VecViewMut<'a, u8>) { }). This type of view screms “Custom DSTs” though.