fn sum(x: &[i32], y: &[i32], z: &[i32]) -> i32 {
    fn repack<'a>(((x, y), z): ((&'a i32, &'a i32), &'a i32)) -> (&'a i32, &'a i32, &'a i32) {
        (x, y, z)
    }
    let mut s = 0;
    for (&x, &y, &z) in x.iter().zip(y.iter()).zip(z.iter()).map(repack) {
        s += x + y + z;
    }
    s
}
The inner loop has 3 comparisons per iteration, when the equivalent in C would have 1. I have no idea how to solve this nicely, but it’s a problem. I’d love to hear some ideas.
I’m curious as to how the equivalent C code could manage only 1 comparison per iteration, since you have to make sure you haven’t reached the end of any of the slices, which means checking each slice individually.
Ok, so to see what was possible without using zip, I did this:
pub fn sum(x: &[i32], y: &[i32], z: &[i32]) -> i32 {
    let n = min(x.len(), min(y.len(), z.len()));
    let mut i = 0;
    let mut s = 0;
    while i < n {
        unsafe {
            s += *x.get_unchecked(i)
                + *y.get_unchecked(i)
                + *z.get_unchecked(i);
        }
        i += 1;
    }
    s
}

fn min(a: usize, b: usize) -> usize {
    if a < b { a } else { b }
}
Which LLVM helpfully vectorizes, and indeed it only does a single comparison per iteration. I’m not sure if the zip iterator could manage that, though. I’ll do some more investigation.
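For reference, there’s also a fully safe pattern that can get the same effect: re-slice all three inputs to the common length before indexing. This is a sketch of that idea (whether LLVM actually elides the bounds checks here is an assumption about the optimizer, not something verified in this thread):

```rust
// Sketch: re-slicing to the common length n gives the optimizer a way
// to prove i < x.len() (etc.) inside the loop, which often lets it
// elide the per-index bounds checks without any unsafe code.
pub fn sum_sliced(x: &[i32], y: &[i32], z: &[i32]) -> i32 {
    let n = x.len().min(y.len()).min(z.len());
    let (x, y, z) = (&x[..n], &y[..n], &z[..n]);
    let mut s = 0;
    for i in 0..n {
        s += x[i] + y[i] + z[i];
    }
    s
}
```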
Out of curiosity, I benchmarked the unsafe sum function and a naive version:
pub fn naive(x: &[i32], y: &[i32], z: &[i32]) -> i32 {
    x.iter().zip(y.iter()).zip(z.iter())
        .map(|((a, b), c)| (a, b, c)) // flatten ((a, b), c) -> (a, b, c)
        .fold(0, |a, (x, y, z)| a + *x + *y + *z)
}
And a safe version of the loop:
pub fn naive_loop(x: &[i32], y: &[i32], z: &[i32]) -> i32 {
    let n = min(x.len(), min(y.len(), z.len()));
    let mut s = 0;
    for i in 0..n {
        s += x[i] + y[i] + z[i];
    }
    s
}
Results were:
test tests::naive ... bench: 2094 ns/iter (+/- 106)
test tests::naive_loop ... bench: 1412 ns/iter (+/- 44)
test tests::unsafe_loop ... bench: 360 ns/iter (+/- 16)
I can understand the unsafe one being faster… but the iterator version, which is supposed to avoid bounds checks, being slower than a naive loop? That’s… really less than ideal.
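The numbers above look like output from the nightly `#[bench]` harness. As a rough stand-in (my own sketch, not the benchmark actually used in this thread), the same comparison can be timed on stable Rust with `std::time::Instant`:

```rust
use std::time::Instant;

// Hypothetical stand-in for the thread's benchmark: times one of the
// sum variants over large slices. This is not the nightly #[bench]
// harness, so expect noisier numbers.
fn naive_loop(x: &[i32], y: &[i32], z: &[i32]) -> i32 {
    let n = x.len().min(y.len()).min(z.len());
    let mut s = 0;
    for i in 0..n {
        s += x[i] + y[i] + z[i];
    }
    s
}

fn main() {
    let x = vec![1i32; 100_000];
    let y = vec![2i32; 100_000];
    let z = vec![3i32; 100_000];
    let t = Instant::now();
    let s = naive_loop(&x, &y, &z);
    println!("sum = {}, took {:?}", s, t.elapsed());
}
```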
Remember that when we boast about Rust’s iterators behaving exactly as efficiently as C++ iterators, it’s because Rust managed to get the slice iterator to compile to the exact same code, and even that effort hasn’t quite reached the finish line yet.
I’m extremely excited, because exactly that works! (At least in the two-iterator case; three or more still has room for improvement.)
Edit: Let’s just experiment with that a bit. It seems I’m worse at benchmarking than at implementing this… I didn’t get everything exactly right in that commit (iterating by value vs. by pointer, etc.). It looks like triple iteration may already be equally fast…
It doesn’t! That was just my benchmarks being inconsistent. It’s fixed in the next commit, and both the 2-slice and 3-slice cases are just as fast with a numerical loop + unchecked indexing as with ZipTrusted.
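For illustration, here is a simplified sketch of that idea (my own hand-rolled version, not the actual ZipTrusted implementation): a three-slice zip that carries one shared index and a precomputed common bound, so `next()` performs a single comparison per element:

```rust
// Sketch: a zip over three slices with one shared index and one bound,
// so each next() performs a single comparison (i < n). The unsafe
// get_unchecked calls are sound because i < n <= each slice's length.
struct Zip3<'a> {
    x: &'a [i32],
    y: &'a [i32],
    z: &'a [i32],
    i: usize,
    n: usize,
}

impl<'a> Zip3<'a> {
    fn new(x: &'a [i32], y: &'a [i32], z: &'a [i32]) -> Zip3<'a> {
        let n = x.len().min(y.len()).min(z.len());
        Zip3 { x, y, z, i: 0, n }
    }
}

impl<'a> Iterator for Zip3<'a> {
    type Item = (i32, i32, i32);

    fn next(&mut self) -> Option<(i32, i32, i32)> {
        if self.i < self.n {
            let i = self.i;
            self.i += 1;
            // Safe: i < n, and n is the minimum of the three lengths.
            unsafe {
                Some((
                    *self.x.get_unchecked(i),
                    *self.y.get_unchecked(i),
                    *self.z.get_unchecked(i),
                ))
            }
        } else {
            None
        }
    }
}
```

With this, `Zip3::new(x, y, z).map(|(a, b, c)| a + b + c).sum::<i32>()` does one bound check per element instead of three.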