Hi, I have a strange problem with eliminating bounds checks for AVX optimizations. I managed to write a piece of safe code that compiles to AVX instructions, but a modified version that uses unsafe code and get_unchecked on the slices is not vectorized with AVX at all, although I expected it to improve further.
I have a small working example on Compiler Explorer:
//#[target_feature(enable = "avx2")] (not needed, we inline the fn)
#[inline]
pub fn do_work<const NCOMP: usize>(line0: &[u16], line1: &[u16], pred: &mut [u16], width: usize) {
    let pixels = width / NCOMP;
    let pred = &mut pred[..(pixels * NCOMP)];
    let line0 = &line0[..(pixels * NCOMP)];
    let line1 = &line1[..(pixels * NCOMP)];
    for i in 1..pixels {
        for j in 0..NCOMP {
            // This adds AVX code
            //pred[i*NCOMP+j] = (line1[i*NCOMP+j] as i32).saturating_sub(line1[(i-1)*NCOMP+j] as i32) as u16;
            unsafe {
                // This not..?
                *pred.get_unchecked_mut(i * NCOMP + j) =
                    (*line1.get_unchecked(i * NCOMP + j) as i32)
                        .saturating_sub(*line1.get_unchecked((i - 1) * NCOMP + j) as i32) as u16;
            }
        }
    }
}
#[target_feature(enable = "avx2")]
pub unsafe fn calc(line0: &[u16], line1: &[u16]) {
    let mut pred = vec![0; line0.len()];
    // do_work::<1>(line0, line1, &mut pred, line0.len());
    do_work::<2>(line0, line1, &mut pred, line0.len());
}
You can experiment with the NCOMP parameter: with 1, the results are good even with get_unchecked(). But with NCOMP > 1, the code is not optimized to AVX when get_unchecked() is used.
The main cases are NCOMP = {1, 2, 3, 4}, so it would be great if the code could be optimized for these.
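For comparison, here is a safe variant that might be worth testing. It is only a sketch: the name do_work_flat is made up for this post, and it drops the unused line0 parameter. It relies on the fact that the nested i/j loop visits every index k from NCOMP up to pixels*NCOMP and reads line1[k] and line1[k - NCOMP], so the loop can be flattened into a zip over two shifted views of the same slice. I have not confirmed that this produces AVX code for every NCOMP; that would need to be checked on Compiler Explorer.

```rust
/// Hypothetical flattened variant of `do_work` (not part of the original example).
/// Computes pred[k] = line1[k] - line1[k - NCOMP] for k in NCOMP..pixels*NCOMP,
/// which covers the same indices as the nested i/j loop.
#[inline]
pub fn do_work_flat<const NCOMP: usize>(line1: &[u16], pred: &mut [u16], width: usize) {
    let pixels = width / NCOMP;
    if pixels < 2 {
        // The original loop `for i in 1..pixels` does nothing in this case.
        return;
    }
    let n = pixels * NCOMP;

    // Three views of equal length (n - NCOMP); the slicing performs the
    // bounds checks once, up front, instead of once per element.
    let cur = &line1[NCOMP..n];
    let prev = &line1[..n - NCOMP];
    let out = &mut pred[NCOMP..n];

    for ((o, &c), &p) in out.iter_mut().zip(cur).zip(prev) {
        // Same arithmetic as the original: subtract in i32, then truncate to u16.
        *o = (c as i32).saturating_sub(p as i32) as u16;
    }
}
```

It can be called from calc the same way as before, for example do_work_flat::<2>(line1, &mut pred, line0.len()). Since the iterators carry their own lengths, there is no indexing left inside the loop and therefore no need for get_unchecked.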