I’ve got even bigger gains on my machine for a repeating fragment of size 256, as much as 3x:
running 4 tests
test bench_decode_rle_lib_naive ... bench: 9,606 ns/iter (+/- 167)
test bench_decode_rle_lib_opt ... bench: 1,617 ns/iter (+/- 634)
test bench_decode_rle_naive ... bench: 9,128 ns/iter (+/- 196)
test bench_decode_rle_vuln ... bench: 5,361 ns/iter (+/- 106)
I’ve fiddled a bit with the benchmarking harness and tried changing the constants into black-boxed variables to see if that affects performance (it didn’t).
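For the curious, the kind of change I tried looks roughly like this (the constant name and the benchmark body are placeholders, not the actual code):

    #![feature(test)]
    extern crate test;
    use test::{black_box, Bencher};

    // hypothetical constant standing in for the fragment size under test
    const FRAGMENT_SIZE: usize = 256;

    #[bench]
    fn bench_decode_blackboxed(b: &mut Bencher) {
        // black_box() hides the value from the optimizer, so the compiler
        // cannot specialise the code for a compile-time-known size
        let size = black_box(FRAGMENT_SIZE);
        b.iter(|| {
            let mut out = Vec::with_capacity(size);
            for i in 0..size {
                out.push((i & 0xff) as u8);
            }
            black_box(out)
        });
    }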
On the other hand, bench_decode_rle_lib_opt is 2x slower than the vuln variant for a fragment of size 2; it only reaches performance parity at size 4 for me.*
I’ll try to add capacity(), fill() and fill_with() methods shortly and see how that works.
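Roughly what I have in mind (the type and the method bodies are just a sketch of mine, not the final API):

    // minimal stand-in for the buffer structure discussed in the post
    struct Buf { data: Vec<u8> }

    impl Buf {
        // total number of elements the buffer can hold
        fn capacity(&self) -> usize { self.data.len() }

        // set every element to the same value
        fn fill(&mut self, val: u8) {
            for el in self.data.iter_mut() { *el = val; }
        }

        // initialise every element from a closure (counter, RNG, etc.)
        fn fill_with<F: FnMut() -> u8>(&mut self, mut f: F) {
            for el in self.data.iter_mut() { *el = f(); }
        }
    }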
Also, I’m still wondering whether such a structure would be useful as a replacement for slices in multimedia code. So far the results for push() versus unsafe copying by index are not encouraging: push() is 2x slower across the board. I’ll toy with resize_with() and see if I can get better results with it.
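To clarify what I’m comparing there, the two variants boil down to something like this (a simplified sketch, not the actual decoder code):

    // variant 1: grow the output with push(), capacity/bounds check per byte
    fn copy_push(dst: &mut Vec<u8>, src: &[u8]) {
        for &b in src {
            dst.push(b);
        }
    }

    // variant 2: output is pre-allocated, copy by index with checks elided
    fn copy_index(dst: &mut [u8], src: &[u8]) {
        for i in 0..src.len() {
            unsafe { *dst.get_unchecked_mut(i) = *src.get_unchecked(i); }
        }
    }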
* After I annotated the functions with #[inline]; before that it reached performance parity only at size 5 and up. I’ve opened a PR to add the inlining and a unit test.
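(For context on why the attribute matters: small non-generic methods from another crate are only inlined across the crate boundary if they’re marked #[inline] or LTO is enabled, so for tiny fragments the call overhead dominates. On a method like the capacity() sketch above, the change is just:)

    impl Buf {
        #[inline] // allow the call to be inlined across the crate boundary
        fn capacity(&self) -> usize { self.data.len() }
    }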