I’ve got even bigger gains on my machine for a repeating fragment of size 256, as much as 3x:
running 4 tests
test bench_decode_rle_lib_naive ... bench: 9,606 ns/iter (+/- 167)
test bench_decode_rle_lib_opt ... bench: 1,617 ns/iter (+/- 634)
test bench_decode_rle_naive ... bench: 9,128 ns/iter (+/- 196)
test bench_decode_rle_vuln ... bench: 5,361 ns/iter (+/- 106)
I’ve fiddled a bit with the benchmarking harness and tried changing the constants into black-boxed variables to see if that affects performance (it didn’t).
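For the curious, the kind of change I tried looks roughly like this (the constant name and the benchmark body are placeholders, not the actual code):

    #![feature(test)]
    extern crate test;
    use test::{black_box, Bencher};

    // hypothetical constant standing in for the fragment size under test
    const FRAGMENT_SIZE: usize = 256;

    #[bench]
    fn bench_decode_blackboxed(b: &mut Bencher) {
        // black_box() hides the value from the optimizer, so the compiler
        // cannot specialise the code for a compile-time-known size
        let size = black_box(FRAGMENT_SIZE);
        b.iter(|| {
            let mut out = Vec::with_capacity(size);
            for i in 0..size {
                out.push((i & 0xff) as u8);
            }
            black_box(out)
        });
    }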
On the other hand, bench_decode_rle_lib_opt is 2x slower than the vuln variant for a fragment of size 2; it only reaches performance parity at size 4 for me.*
I’ll try to add capacity(), fill() and fill_with() methods shortly and see how that works.
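Roughly what I have in mind (the type and the method bodies are just a sketch of mine, not the final API):

    // minimal stand-in for the buffer structure discussed in the post
    struct Buf { data: Vec<u8> }

    impl Buf {
        // total number of elements the buffer can hold
        fn capacity(&self) -> usize { self.data.len() }

        // set every element to the same value
        fn fill(&mut self, val: u8) {
            for el in self.data.iter_mut() { *el = val; }
        }

        // initialise every element from a closure (counter, RNG, etc.)
        fn fill_with<F: FnMut() -> u8>(&mut self, mut f: F) {
            for el in self.data.iter_mut() { *el = f(); }
        }
    }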
Also, I’m still wondering whether such a structure would be useful as a replacement for slices in multimedia code. So far the results for push() versus unsafe copying by index are not encouraging: push() is 2x slower across the board. I’ll toy with resize_with() and see if I can get better results with it.
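To clarify what I’m comparing there, the two variants boil down to something like this (a simplified sketch, not the actual decoder code):

    // variant 1: grow the output with push(), capacity/bounds check per byte
    fn copy_push(dst: &mut Vec<u8>, src: &[u8]) {
        for &b in src {
            dst.push(b);
        }
    }

    // variant 2: output is pre-allocated, copy by index with checks elided
    fn copy_index(dst: &mut [u8], src: &[u8]) {
        for i in 0..src.len() {
            unsafe { *dst.get_unchecked_mut(i) = *src.get_unchecked(i); }
        }
    }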
* After I annotated the functions with #[inline]; before that it reached performance parity only at size 5 and up. I’ve opened a PR to add the inlining and a unit test.
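(For context on why the attribute matters: small non-generic methods from another crate are only inlined across the crate boundary if they’re marked #[inline] or LTO is enabled, so for tiny fragments the call overhead dominates. On a method like the capacity() sketch above, the change is just:)

    impl Buf {
        #[inline] // allow the call to be inlined across the crate boundary
        fn capacity(&self) -> usize { self.data.len() }
    }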