Potential Compiler Optimization Issue with &Vec<const T> vs &[const T]

zhongyi51 · May 19, 2024, 1:56pm

In short, I found that a nested loop cannot be vectorized if the function signature is &Vec<*const T> rather than &[*const T] in the release build, even if the length of the arrays is given at compile time.

// cannot be vectorized
#[no_mangle]
unsafe fn per_pos_add_0(a:&Vec<*const i32>,b:&mut [*mut i32]){
    let a=a.as_slice();
    for i in 0..4000{
        for j in 0..4000{
            let v=a.get_unchecked(i).add(j);
            *b.get_unchecked_mut(i).add(j)+=*v;
        }
    }
}

// can be vectorized
#[no_mangle]
unsafe fn per_pos_add_1(a:&[*const i32],b:&mut [*mut i32]){
    for i in 0..4000{
        for j in 0..4000{
            let v=a.get_unchecked(i).add(j);
            *b.get_unchecked_mut(i).add(j)+=*v;
        }
    }
}

I also checked the MIR of the above functions. The only difference is the deref operation of Vec.

I am wondering whether this is a compiler bug or if the under-optimization is intentional?

Any insights or explanations would be greatly appreciated.

kornel · May 19, 2024, 2:06pm

It might have something to do with LLVM supporting C restrict on function arguments, but not arbitrary variables. This way &[] can be definitely non-overlapping, but vec.as_slice() is just some random pointer.

scottmcm · May 19, 2024, 11:31pm

Note that there's never a reason for you to write a function with that signature. &[T] is always a superior parameter type to &Vec<T>. The only thing you can do with the latter but not the former is see the capacity, but on an immutable reference there's nothing useful you can do with that capacity anyway.

Passing a slice is more flexible and optimizes better. So since you'd still want to use the slice version even if they did optimize the same, I can't say I'm that concerned here.

For good measure I'll also link Understanding Rusts Auto-Vectorization and Methods for speed increase - #5 by scottmcm - help - The Rust Programming Language Forum with my usual recommendations for getting good auto-vectorization.

(The following is not a guarantee, but describes implementation details of how things work today.)

&Vec<T> is passed as a single ptr to the vector.
&[T] is passed as a ptr to the first element and a usize with the count.

That means that what LLVM sees for them is drastically different, and what parameter attributes we can meaningfully put on them is also quite different.

Is the optimization difference desirable? No.
Is it something that we can reasonably do anything about right now? Probably not.

zhongyi51 · July 5, 2024, 2:46pm

Thanks for your response.

Indeed, it is quite common for a function to indirectly have a signature &Vec<T> instead of &[T] through nested structure:

pub struct NestedVec{
    other_field:i32,
    arr:Vec<NestedVecInner>
}

pub struct NestedVecInner{
    other_field:&'static str,
    ptr:*const i32
}


#[no_mangle]
unsafe fn per_pos_add_2(nested:&NestedVec,b:&mut [*mut i32]){
    let a=&nested.arr;
    for i in 0..a.len(){
        for j in 0..b.len(){
            let v=a.get_unchecked(i).ptr.add(j);
            *b.get_unchecked_mut(i).add(j)+=*v;
        }
    }
}

Godbolt.

This situation is quite typical in daily work, and developers cannot optimize their code to adapt rustc without breaking changes. Moreover, Clang can vectorize this function call properly.

Update: core::hint::unreachable_unchecked() and is_aligned() did not work.

Neutron3529 · July 11, 2024, 8:56am

Why not provide an accessor here?

fn arr(&self)->&[NestedVecInner] {&self.arr}

system · October 9, 2024, 8:57am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Optimization comparison: Vec vs array and for vs while compiler	5	1184	July 2, 2022
Does vectorized impl of `<[_]>::contains` make sense? libs	6	542	September 1, 2023
Alloc: Vec::as_slice, String::as_str -> const libs	8	380	September 5, 2024
Decide if for-loop iterator is empty given a literal/const iterator compiler	6	299	August 20, 2024
Pre-Pre-RFC: Noncontinuous Lifetime Parameters language design	7	909	January 23, 2022

Potential Compiler Optimization Issue with &Vec<*const T> vs &[*const T]

Related topics

Potential Compiler Optimization Issue with &Vec<const T> vs &[const T]