Potential Compiler Optimization Issue with &Vec<*const T> vs &[*const T]

See Godbolt and the GitHub issue.

In short, I found that a nested loop is not vectorized in a release build when the function takes &Vec<*const T> rather than &[*const T], even though the lengths of the arrays are known at compile time.

// cannot be vectorized
// (each a[i] / b[i] is assumed to point to at least 4000 i32 values)
#[no_mangle]
unsafe fn per_pos_add_0(a: &Vec<*const i32>, b: &mut [*mut i32]) {
    let a = a.as_slice();
    for i in 0..4000 {
        for j in 0..4000 {
            let v = a.get_unchecked(i).add(j);
            *b.get_unchecked_mut(i).add(j) += *v;
        }
    }
}

// can be vectorized
#[no_mangle]
unsafe fn per_pos_add_1(a: &[*const i32], b: &mut [*mut i32]) {
    for i in 0..4000 {
        for j in 0..4000 {
            let v = a.get_unchecked(i).add(j);
            *b.get_unchecked_mut(i).add(j) += *v;
        }
    }
}

I also compared the MIR of the two functions; the only difference is the extra deref of the Vec.

I am wondering whether this is a compiler bug or whether the missed optimization is intentional.

Any insights or explanations would be greatly appreciated.


It might have something to do with LLVM supporting C-style restrict (noalias) on function arguments, but not on arbitrary variables. A &[T] parameter can therefore be marked as definitely non-overlapping, whereas the pointer obtained from vec.as_slice() inside the function body is just some loaded pointer with no such guarantee.
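If that explanation is right, one workaround is to keep &[T] in the hot function's signature and do the Vec-to-slice conversion at the call site, so the aliasing information attaches to the parameter. A minimal sketch, reusing per_pos_add_1 from above (the run wrapper is hypothetical):

// Hypothetical wrapper, not from the original post: convert the Vec to a
// slice at the call site so the hot function receives a true slice parameter.
fn run(a: &Vec<*const i32>, b: &mut [*mut i32]) {
    // Safety: same preconditions as per_pos_add_1 (valid 4000x4000 buffers).
    unsafe { per_pos_add_1(a.as_slice(), b) }
}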


Note that there's never a reason for you to write a function with that signature. &[T] is always a superior parameter type to &Vec<T>. The only thing you can do with the latter but not the former is see the capacity, but on an immutable reference there's nothing useful you can do with that capacity anyway.

Passing a slice is more flexible and optimizes better. So since you'd still want to use the slice version even if they did optimize the same, I can't say I'm that concerned here.
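As a side note (my own illustration, not part of the reply above), a &[T] parameter also accepts a Vec, a fixed-size array, or a sub-slice at the call site, while &Vec<T> accepts only a Vec:

fn sum(xs: &[i32]) -> i32 {
    xs.iter().sum()
}

fn main() {
    let v = vec![1, 2, 3, 4];
    let a = [5, 6, 7];

    let _ = sum(&v);       // &Vec<i32> coerces to &[i32]
    let _ = sum(&a);       // &[i32; 3] coerces to &[i32]
    let _ = sum(&v[1..3]); // any sub-slice also works
}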

For good measure I'll also link Understanding Rust's Auto-Vectorization and Methods for speed increase - #5 by scottmcm, with my usual recommendations for getting good auto-vectorization.

(The following is not a guarantee, but describes implementation details of how things work today.)

&Vec<T> is passed as a single ptr to the vector.
&[T] is passed as a ptr to the first element and a usize with the count.

That means that what LLVM sees for them is drastically different, and what parameter attributes we can meaningfully put on them is also quite different.
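One way to see this difference from the Rust side (my own check, not from the reply): &Vec<T> is a thin reference to the Vec's header, while &[T] is a fat pointer that carries the length:

use std::mem::size_of;

fn main() {
    // &Vec<i32> is a single pointer to the Vec's (ptr, len, capacity) header.
    assert_eq!(size_of::<&Vec<i32>>(), size_of::<usize>());

    // &[i32] is a fat pointer: data pointer plus element count.
    assert_eq!(size_of::<&[i32]>(), 2 * size_of::<usize>());
}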

Is the optimization difference desirable? No.
Is it something that we can reasonably do anything about right now? Probably not.


Thanks for your response.

Indeed, it is quite common for a function to end up with an effective &Vec<T> instead of &[T] indirectly, through a nested structure:

pub struct NestedVec {
    other_field: i32,
    arr: Vec<NestedVecInner>,
}

pub struct NestedVecInner {
    other_field: &'static str,
    ptr: *const i32,
}

#[no_mangle]
unsafe fn per_pos_add_2(nested: &NestedVec, b: &mut [*mut i32]) {
    let a = &nested.arr;
    for i in 0..a.len() {
        for j in 0..b.len() {
            let v = a.get_unchecked(i).ptr.add(j);
            *b.get_unchecked_mut(i).add(j) += *v;
        }
    }
}

Godbolt.

This situation is quite typical in day-to-day code, and developers cannot restructure it to suit rustc without making breaking changes to their API. Moreover, Clang vectorizes the equivalent function properly.


Update: adding hints via core::hint::unreachable_unchecked() and is_aligned() did not help either.
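(For context, a hint of that kind would look roughly like the following; this is my guess at the shape of the attempt, not the original code.)

// Hypothetical variant of per_pos_add_2 with an optimizer hint.
unsafe fn per_pos_add_2_hinted(nested: &NestedVec, b: &mut [*mut i32]) {
    let a = nested.arr.as_slice();
    if a.len() != b.len() {
        // Promise the optimizer this branch can never be taken.
        core::hint::unreachable_unchecked();
    }
    for i in 0..a.len() {
        for j in 0..b.len() {
            let v = a.get_unchecked(i).ptr.add(j);
            *b.get_unchecked_mut(i).add(j) += *v;
        }
    }
}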


Why not provide an accessor here?

fn arr(&self) -> &[NestedVecInner] { &self.arr }
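In context, that could look like the sketch below (the per_pos_add_3 variant is hypothetical; whether it actually recovers the vectorization still depends on the parameter-attribute discussion above):

impl NestedVec {
    // Expose the data as a slice rather than leaking the Vec itself.
    pub fn arr(&self) -> &[NestedVecInner] {
        &self.arr
    }
}

// The hot function can then take the slice parameter directly.
#[no_mangle]
unsafe fn per_pos_add_3(a: &[NestedVecInner], b: &mut [*mut i32]) {
    for i in 0..a.len() {
        for j in 0..b.len() {
            let v = a.get_unchecked(i).ptr.add(j);
            *b.get_unchecked_mut(i).add(j) += *v;
        }
    }
}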
