In short, I found that in a release build a nested loop is not vectorized if the parameter type is &Vec<*const T> rather than &[*const T], even when the lengths of the arrays are known at compile time.
// cannot be vectorized
#[no_mangle]
unsafe fn per_pos_add_0(a: &Vec<*const i32>, b: &mut [*mut i32]) {
    let a = a.as_slice();
    for i in 0..4000 {
        for j in 0..4000 {
            let v = a.get_unchecked(i).add(j);
            *b.get_unchecked_mut(i).add(j) += *v;
        }
    }
}
// can be vectorized
#[no_mangle]
unsafe fn per_pos_add_1(a: &[*const i32], b: &mut [*mut i32]) {
    for i in 0..4000 {
        for j in 0..4000 {
            let v = a.get_unchecked(i).add(j);
            *b.get_unchecked_mut(i).add(j) += *v;
        }
    }
}
I also checked the MIR of the above functions. The only difference is the deref operation of Vec.
Is this a compiler bug, or is the missed optimization intentional?
Any insights or explanations would be greatly appreciated.
It might have something to do with LLVM supporting C restrict on function arguments, but not on arbitrary variables. That way a &[] argument can be known to be non-overlapping, but the result of vec.as_slice() is just some arbitrary pointer.
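To make the restrict point concrete, here is a minimal sketch of the same element-wise add written with reference parameters and with raw pointers. The function names are hypothetical and not from the original post. Rust's borrowing rules guarantee that the two reference parameters cannot overlap, which is the kind of fact rustc can usually pass on to LLVM for function arguments; the raw-pointer version carries no such guarantee.

```rust
// Hypothetical names, for illustration only.

// Reference parameters: `dst` is exclusive, so it cannot overlap `src`.
fn add_assign(dst: &mut [i32], src: &[i32]) {
    for (d, s) in dst.iter_mut().zip(src) {
        *d += *s;
    }
}

// Raw pointers: nothing in the signature rules out overlap between
// `dst` and `src`, so the compiler has to assume they may alias.
//
// Safety: both pointers must be valid for `len` elements.
unsafe fn add_assign_raw(dst: *mut i32, src: *const i32, len: usize) {
    for j in 0..len {
        unsafe { *dst.add(j) += *src.add(j) };
    }
}
```

Both versions compute the same result; they differ only in what the compiler is allowed to assume about overlap.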
Note that there's never a reason for you to write a function with that signature. &[T] is always a superior parameter type to &Vec<T>. The only thing you can do with the latter but not the former is see the capacity, but on an immutable reference there's nothing useful you can do with that capacity anyway.
Passing a slice is more flexible and optimizes better. So since you'd still want to use the slice version even if they did optimize the same, I can't say I'm that concerned here.
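As a small illustration of the flexibility argument, the sketch below uses a hypothetical helper named sum that takes a slice. The same function accepts a Vec, a fixed-size array, and a sub-slice, none of which would work with a &Vec parameter without allocating a new vector.

```rust
// Hypothetical helper: sums the elements of any contiguous i32 sequence.
fn sum(values: &[i32]) -> i32 {
    values.iter().sum()
}

// Calls `sum` with three different kinds of caller-side storage.
fn demo_slice_callers() -> (i32, i32, i32) {
    let v: Vec<i32> = vec![1, 2, 3, 4];
    let arr: [i32; 3] = [10, 20, 30];
    (
        sum(&v),       // &Vec<i32> coerces to &[i32] through Deref
        sum(&arr),     // &[i32; 3] coerces to &[i32]
        sum(&v[1..3]), // a sub-slice of the vector, no copy needed
    )
}
```

Existing callers that hold a Vec keep working unchanged at the source level, because the coercion happens at the call site.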
(The following is not a guarantee, but describes implementation details of how things work today.)
&Vec<T> is passed as a single ptr to the vector. &[T] is passed as a ptr to the first element and a usize with the count.
That means that what LLVM sees for them is drastically different, and what parameter attributes we can meaningfully put on them is also quite different.
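The difference in representation can be observed from the sizes of the two reference types. This is only a sketch of what current compilers do, in line with the caveat above; the function name reference_sizes is hypothetical.

```rust
use std::mem::size_of;

// Returns (size of &Vec<i32>, size of &[i32]) in bytes on the current target.
fn reference_sizes() -> (usize, usize) {
    // &Vec<i32> is a thin pointer to the Vec struct itself.
    // &[i32] is a wide pointer: data pointer plus element count.
    (size_of::<&Vec<i32>>(), size_of::<&[i32]>())
}
```

On today's targets the first value is one pointer-sized word and the second is two, so with &Vec the data pointer has to be loaded from memory inside the function instead of arriving directly as an argument.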
Is the optimization difference desirable? No.
Is it something that we can reasonably do anything about right now? Probably not.
This situation is quite typical in daily work, and developers cannot adapt their code to rustc's optimizer without making breaking changes. Moreover, Clang can vectorize this function properly.
Update: core::hint::unreachable_unchecked() and is_aligned() did not work.
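One more thing that might be worth trying while keeping the &Vec signature is to copy each row pointer into a local before the inner loop, so the inner loop no longer reads through a or b at all. The sketch below is an assumption-laden experiment, not a confirmed fix: the name per_pos_add_hoisted and the small constant N are hypothetical (the original uses 4000), and whether the loop actually vectorizes would have to be checked in the generated assembly.

```rust
// Hypothetical size for the sketch; the original post uses 4000.
const N: usize = 64;

// Same parameter types as per_pos_add_0, but each row pointer is read
// once per outer iteration and kept in a local variable.
//
// Safety: `a` and `b` must each hold at least N pointers, and every
// pointer must be valid for N elements.
unsafe fn per_pos_add_hoisted(a: &Vec<*const i32>, b: &mut [*mut i32]) {
    for i in 0..N {
        let (src, dst) = unsafe { (*a.get_unchecked(i), *b.get_unchecked(i)) };
        for j in 0..N {
            unsafe { *dst.add(j) += *src.add(j) };
        }
    }
}

// Builds two N x N matrices, runs the function, and returns the result.
fn demo_hoisted() -> Vec<Vec<i32>> {
    // Row i of `a` is filled with the value i; `b` starts as all ones.
    let a_rows: Vec<Vec<i32>> = (0..N).map(|i| vec![i as i32; N]).collect();
    let mut b_rows: Vec<Vec<i32>> = vec![vec![1; N]; N];
    let a_ptrs: Vec<*const i32> = a_rows.iter().map(|r| r.as_ptr()).collect();
    let mut b_ptrs: Vec<*mut i32> = b_rows.iter_mut().map(|r| r.as_mut_ptr()).collect();
    unsafe { per_pos_add_hoisted(&a_ptrs, &mut b_ptrs) };
    b_rows
}
```

The function signature is unchanged, so this would not be a breaking change for callers; only the body differs from per_pos_add_0.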