Potential Compiler Optimization Issue with &Vec<*const T> vs &[*const T]

See GodBolt and Github Issue.

In short, I found that a nested loop cannot be vectorized if the function signature is &Vec<*const T> rather than &[*const T] in the release build, even if the length of the arrays is given at compile time.

// cannot be vectorized
#[no_mangle]
unsafe fn per_pos_add_0(a:&Vec<*const i32>,b:&mut [*mut i32]){
    let a=a.as_slice();
    for i in 0..4000{
        for j in 0..4000{
            let v=a.get_unchecked(i).add(j);
            *b.get_unchecked_mut(i).add(j)+=*v;
        }
    }
}

// can be vectorized
#[no_mangle]
unsafe fn per_pos_add_1(a:&[*const i32],b:&mut [*mut i32]){
    for i in 0..4000{
        for j in 0..4000{
            let v=a.get_unchecked(i).add(j);
            *b.get_unchecked_mut(i).add(j)+=*v;
        }
    }
}

I also checked the MIR of the above functions. The only difference is the deref operation of Vec.

I am wondering whether this is a compiler bug or if the under-optimization is intentional?

Any insights or explanations would be greatly appreciated.

1 Like

It might have something to do with LLVM supporting C restrict on function arguments, but not arbitrary variables. This way &[] can be definitely non-overlapping, but vec.as_slice() is just some random pointer.

1 Like

Note that there's never a reason for you to write a function with that signature. &[T] is always a superior parameter type to &Vec<T>. The only thing you can do with the latter but not the former is see the capacity, but on an immutable reference there's nothing useful you can do with that capacity anyway.

Passing a slice is more flexible and optimizes better. So since you'd still want to use the slice version even if they did optimize the same, I can't say I'm that concerned here.

For good measure I'll also link Understanding Rusts Auto-Vectorization and Methods for speed increase - #5 by scottmcm - help - The Rust Programming Language Forum with my usual recommendations for getting good auto-vectorization.

(The following is not a guarantee, but describes implementation details of how things work today.)

&Vec<T> is passed as a single ptr to the vector.
&[T] is passed as a ptr to the first element and a usize with the count.

That means that what LLVM sees for them is drastically different, and what parameter attributes we can meaningfully put on them is also quite different.

Is the optimization difference desirable? No.
Is it something that we can reasonably do anything about right now? Probably not.

2 Likes