No AVX optimization with slice get_unchecked()

cytrinox · February 14, 2022, 7:04am

Hi, I have a strange problem with eliminating bounds checks for AVX optimizations. I managed to write a piece of safe code that compiles to AVX instructions, but a modified version that uses unsafe code and get_unchecked on the slices results in no AVX optimization at all, although I expected it to improve things further.

I have a small working example on Compiler Explorer:

//#[target_feature(enable = "avx2")] (not needed, we inline the fn)
#[inline]
pub fn do_work<const NCOMP: usize>(line0: &[u16], line1: &[u16], pred: &mut[u16], width: usize) {
    let pixels = width/NCOMP;
    
    let pred = &mut pred[..(pixels*NCOMP)];
    let line0 = &line0[..(pixels*NCOMP)];
    let line1 = &line1[..(pixels*NCOMP)];

    for i in 1..pixels {
        for j in 0..NCOMP {
            // This adds AVX code
            //pred[i*NCOMP+j] = (line1[i*NCOMP+j] as i32).saturating_sub(line1[(i-1)*NCOMP+j] as i32) as u16;
            unsafe {
                // This not..?
                *pred.get_unchecked_mut(i*NCOMP+j) = 
                    (*line1.get_unchecked(i*NCOMP+j) as i32)
                    .saturating_sub(*line1.get_unchecked((i-1)*NCOMP+j) as i32) as u16;
            }
        }
    }
}


#[target_feature(enable = "avx2")]
pub unsafe fn calc(line0: &[u16], line1: &[u16]) {
    let mut pred = vec![0; line0.len()];
    // do_work::<1>(line0, line1, &mut pred, line0.len());
    do_work::<2>(line0, line1, &mut pred, line0.len());
}

You can experiment with the NCOMP parameter: with 1, the results are good even with get_unchecked(). But with NCOMP > 1, the code is not optimized to AVX when get_unchecked() is used.

The major cases are NCOMP = {1, 2, 3, 4}, so it would be great if the code could be optimized for these cases.
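For comparison, here is a minimal sketch of a bounds-check-free formulation that needs no unsafe code. It relies on the observation that i*NCOMP+j for i in 1..pixels and j in 0..NCOMP walks every index from NCOMP up to pixels*NCOMP, and that the "previous" element is always NCOMP positions earlier. The names do_work_indexed and do_work_flat are hypothetical, the unused line0 parameter is dropped, and whether the flattened loop actually compiles to AVX for a given NCOMP has not been verified here; check the generated assembly.

```rust
/// Reference version: the safe indexed loop from the post above
/// (hypothetical name; `line0` omitted because it is never read).
pub fn do_work_indexed<const NCOMP: usize>(line1: &[u16], pred: &mut [u16], width: usize) {
    let pixels = width / NCOMP;
    let pred = &mut pred[..pixels * NCOMP];
    let line1 = &line1[..pixels * NCOMP];
    for i in 1..pixels {
        for j in 0..NCOMP {
            pred[i * NCOMP + j] =
                (line1[i * NCOMP + j] as i32).saturating_sub(line1[(i - 1) * NCOMP + j] as i32) as u16;
        }
    }
}

/// Flattened version (hypothetical name): one pass over three equally long
/// sub-slices, so no per-element index is computed or bounds-checked.
pub fn do_work_flat<const NCOMP: usize>(line1: &[u16], pred: &mut [u16], width: usize) {
    let n = (width / NCOMP) * NCOMP;
    if n == 0 {
        return; // no complete pixel, nothing to predict
    }
    let prev = &line1[..n - NCOMP]; // element NCOMP positions earlier
    let cur = &line1[NCOMP..n]; // current element
    let out = &mut pred[NCOMP..n]; // the first pixel is left untouched
    for ((o, &c), &p) in out.iter_mut().zip(cur).zip(prev) {
        *o = (c as i32).saturating_sub(p as i32) as u16;
    }
}
```

Both functions should write identical values into pred for the same inputs, so the indexed version can serve as a check while experimenting with the flattened one.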

pcpthm · February 14, 2022, 8:09am

Not an answer, but is the saturating_sub usage correct? For x, y: u16, x as i32 - y as i32 cannot overflow, so saturating_sub never saturates, and after the cast back to u16 the result is the same as writing line1[i*NCOMP+j].wrapping_sub(line1[(i-1)*NCOMP+j]).

Edit: I tried to produce a minimal example but it seems complicated.
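The equivalence claimed above can be checked with a small sketch. The function names are hypothetical, and the check samples 256 values per operand (including 0 and u16::MAX) rather than all pairs, so it is a sanity check and not a proof.

```rust
/// The form used in the original post: widen to i32, subtract, truncate to u16.
pub fn diff_saturating(x: u16, y: u16) -> u16 {
    (x as i32).saturating_sub(y as i32) as u16
}

/// The suggested replacement: wrapping subtraction directly on u16.
pub fn diff_wrapping(x: u16, y: u16) -> u16 {
    x.wrapping_sub(y)
}

/// Compares both forms on a grid of sampled inputs.
/// A step of 257 yields 0, 257, ..., 65535, i.e. 256 values per operand.
pub fn forms_agree_on_sample() -> bool {
    (0..=u16::MAX).step_by(257).all(|x| {
        (0..=u16::MAX)
            .step_by(257)
            .all(|y| diff_saturating(x, y) == diff_wrapping(x, y))
    })
}
```

The i32 difference of two u16 values always lies between -65535 and 65535, far from the i32 limits, which is why saturation never occurs and the truncating cast reduces the result modulo 65536, exactly as wrapping_sub does.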

system · May 15, 2022, 8:09am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
