Zipping slices sucks

fn sum(x: &[i32], y: &[i32], z: &[i32]) -> i32 {
    // Re-nest ((x, y), z) into a flat (x, y, z) tuple.
    fn repack<'a>(((x, y), z): ((&'a i32, &'a i32), &'a i32))
                  -> (&'a i32, &'a i32, &'a i32) {
        (x, y, z)
    }
    let mut s = 0;
    for (&x, &y, &z) in x.iter().zip(y.iter()).zip(z.iter()).map(repack) {
        s += x + y + z;
    }
    s
}

The inner loop has 3 comparisons per iteration, when the equivalent in C would have 1. I have no idea how to solve this nicely, but it’s a problem. I’d love to hear some ideas.

I’m curious as to how the equivalent C code could manage only 1 comparison per iteration, since you have to make sure you haven’t reached the end of any of the slices, which means checking each slice individually.

I imagine the equivalent C would do a one-time min on the lengths, and then every iteration would just be comparing against the min.

I’m hopeful that there’s a fast path to be had with trusted_len here. Basically, if the lengths are trusted, take the min and apply assume or something.
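For concreteness, here’s a sketch of that idea on modern Rust, where std::hint::assert_unchecked plays the role of llvm.assume (at the time this would have been the unstable intrinsics::assume): compute the min once, then assert the bound to the optimizer so the indexed loop can drop its per-slice checks.

pub fn sum_assume(x: &[i32], y: &[i32], z: &[i32]) -> i32 {
    let n = x.len().min(y.len()).min(z.len());
    // SAFETY: n is the minimum of the three lengths, so every index
    // below n is in bounds for all three slices.
    unsafe {
        std::hint::assert_unchecked(n <= x.len());
        std::hint::assert_unchecked(n <= y.len());
        std::hint::assert_unchecked(n <= z.len());
    }
    let mut s = 0;
    for i in 0..n {
        s += x[i] + y[i] + z[i];
    }
    s
}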

OK, so to see what was possible without using zip, I did this:

pub fn sum(x: &[i32], y: &[i32], z: &[i32]) -> i32 {
    let n = min(x.len(), min(y.len(), z.len()));
    let mut i = 0;
    let mut s = 0;
    while i < n {
        // SAFETY: i < n, and n is the minimum of the three lengths,
        // so i is in bounds for all three slices.
        unsafe {
            s += *x.get_unchecked(i)
                + *y.get_unchecked(i)
                + *z.get_unchecked(i);
        }
        i += 1;
    }
    s
}

fn min(a: usize, b: usize) -> usize {
    if a < b { a } else { b }
}

Which LLVM helpfully vectorizes, and it indeed only does a single comparison per iteration. I’m not sure if the zip iterator could manage it though. I’ll do some more investigation.

Out of curiosity, I benchmarked the unsafe sum function and a naive version:

pub fn naive(x: &[i32], y: &[i32], z: &[i32]) -> i32 {
    // flatten: ((a, b), c) -> (a, b, c), same shape as repack above
    fn flatten<'a>(((a, b), c): ((&'a i32, &'a i32), &'a i32))
                   -> (&'a i32, &'a i32, &'a i32) {
        (a, b, c)
    }
    x.iter().zip(y.iter()).zip(z.iter())
        .map(flatten)
        .fold(0, |a, (x, y, z)| a + *x + *y + *z)
}

And a safe version of the loop:

pub fn naive_loop(x: &[i32], y: &[i32], z: &[i32]) -> i32 {
    let n = min(x.len(), min(y.len(), z.len()));
    let mut s = 0;
    for i in 0..n {
        s += x[i] + y[i] + z[i];
    }
    s
}

Results were:

test tests::naive       ... bench:      2094 ns/iter (+/- 106)
test tests::naive_loop  ... bench:      1412 ns/iter (+/- 44)
test tests::unsafe_loop ... bench:       360 ns/iter (+/- 16)

I can understand the unsafe one being faster… but the iterator version, which is supposed to avoid bounds checks, being slower than a naive loop? That’s… really less than ideal.

Edit: there’s a “compressed” test case on playpen if you want to reproduce this. Make sure to read the comment on the test data set.
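For anyone who wants the shape of the harness without the playpen link: a minimal nightly #[bench] setup like the one below is enough to run the comparison. The data size here is an assumption on my part, not the original “compressed” set.

#![feature(test)]
extern crate test;

#[bench]
fn unsafe_loop(b: &mut test::Bencher) {
    // Data size is an assumption; the original playpen case used its
    // own "compressed" inputs.
    let v: Vec<i32> = (0..10_000).collect();
    b.iter(|| sum(&v, &v, &v));
}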


Remember that when we boast about Rust’s iterators and how they behave exactly as efficiently as C++ iterators, the claim is about how Rust managed to get the slice iterators to compile to exactly the same code, and even that effort hasn’t reached the finish line just yet.

I’m extremely excited, because exactly that works! (At least in the two-iterator case… three or more still has improvements possible.)

Edit: let’s just experiment with that a bit. It seems I’m worse at benchmarking than at implementing this… I didn’t get everything exactly right in that commit (iterating by value vs. by pointer, etc.). It looks like triple iteration may already be equally fast…

It’s a shame that only works for the 2-zip case, and falls down for the k-zip case.

It doesn’t! That was just my benchmarks being inconsistent. It’s fixed in the next commit, and both the 2-slice and 3-slice cases are just as fast with ZipTrusted as with a numerical loop plus unchecked indexing.
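For readers who don’t want to dig through the commits: a ZipTrusted-style adapter can, roughly, look like the sketch below. This is my illustration of the shape, not the actual committed code; it computes the length bound once at construction and uses unchecked indexing in next(), so the loop is back to a single comparison.

pub struct Zip3<'a, T> {
    x: &'a [T],
    y: &'a [T],
    z: &'a [T],
    i: usize,
    n: usize,
}

pub fn zip3<'a, T>(x: &'a [T], y: &'a [T], z: &'a [T]) -> Zip3<'a, T> {
    // The only length comparison happens here, once.
    let n = x.len().min(y.len()).min(z.len());
    Zip3 { x, y, z, i: 0, n }
}

impl<'a, T> Iterator for Zip3<'a, T> {
    type Item = (&'a T, &'a T, &'a T);

    fn next(&mut self) -> Option<Self::Item> {
        if self.i < self.n {
            // SAFETY: i < n, and n is the minimum of the three lengths.
            let item = unsafe {
                (self.x.get_unchecked(self.i),
                 self.y.get_unchecked(self.i),
                 self.z.get_unchecked(self.i))
            };
            self.i += 1;
            Some(item)
        } else {
            None
        }
    }
}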

Niiiiice

TrustedLen ftw

llvm::assume is magic.

The solution to all optimisation problems:

let magic: f64 = get_magic();
llvm::assume(magic == magic);

:stuck_out_tongue:
