Does vectorized impl of `<[_]>::contains` make sense?

AngelicosPhosphoros · June 3, 2023, 3:32pm

It can be implemented using specializations.

Or is it better to just wait until LLVM implements this?

the8472 · June 3, 2023, 4:01pm

We already have a specialization for u8 and I suspect searching slices of non-u8 primitives and not even looking for the position is not as common as say str.contains(&str). But if you have a case, go ahead and see what happens.

Or is it better to just wait until LLVM implements this?

Is there any indication they will anytime soon?

comex · June 3, 2023, 6:39pm

For the time being, LLVM IR isn't even expressive enough for this to be a legal optimization.

On Rust's side, the memory model (Stacked/Tree Borrows) requires function arguments that are references to point to valid memory. If I recall correctly, this includes both references to fixed-size types and references to slices. But LLVM IR can only express this concept via the dereferenceable attribute which requires a fixed number of bytes. So references to slices don't get tagged as dereferenceable at all.

From the optimizer's perspective, it's something like

pub unsafe fn contains(ptr: *const u32, count: usize, val: u32) -> bool {
    for i in 0..count {
        if *ptr.add(i) == val {
            return true;
        }
    }
    false
}

And it wouldn't be UB to call such a function with, say, count = 8 but ptr pointing to an array of only 5 elements – as long as one of those 5 elements matches. If there's a match, the loop will stop, so it doesn't matter if further iterations would be indexing out of bounds.

Therefore the optimizer can't group, say, elements 4..8 into a single vectorized load, because that might be loading out of bounds.

If you make the loop always load all the elements then LLVM does autovectorize:

pub fn contains1(slice: &[u32], val: u32) -> bool {
    let mut ret = false;
    for &elm in slice {
        ret |= elm == val;
    }
    ret
}

But of course this isn't a desirable implementation.

AngelicosPhosphoros · June 3, 2023, 7:01pm

Anyway, I opened an issue in LLVM tracker, maybe they would respond to it: LLVM should vectorize linear search if possible · Issue #63094 · llvm/llvm-project · GitHub

Oh, this is bad.

Are there some way to tell LLVM that whole range is dereferencable?

AngelicosPhosphoros · June 3, 2023, 7:11pm

I tried to do that by dereferencing last element, it didn't work.

scottmcm · June 3, 2023, 9:02pm

AFAICT, LLVM's loop optimizations only kick in if they can canonicalize the loop to a single induction variable and no early returns.

That's why binary_search works better in LLVM as while size > 1 instead of while max-min > 1, for example: Only one induction variable.

So I wouldn't expect this kind of loop to be implemented any time soon.

Also, as a historical note, the slice iterators used to be hard-coded to a 4-way unroll. That was removed because it was bad for compile time -- especially in debug -- and only really helpful for trivial predicates.

So it might be better for people to use things like Memchr — Rust HW library // Lib.rs when they know a search like this is a bottleneck, especially since specializations can't pick up newtypes.

Also, once we get std::simd, it'll be much easier for people to write efficient vectorized implementations themselves: https://rust.godbolt.org/z/45ebG1r49.

system · September 1, 2023, 9:03pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Potential Compiler Optimization Issue with &Vec<const T> vs &[const T] compiler	5	640	October 9, 2024
Aggressive loop hoisting (in MIR?)	14	2072	March 25, 2019
Resolving Rust's forward progress guarantees compiler	53	6876	December 22, 2024
Stacked Borrows Implemented Unsafe Code Guidelines	16	3616	March 25, 2019
Vec::contains should accept anything that T implements Borrow for libs	5	1236	March 25, 2019

Does vectorized impl of `<[_]>::contains` make sense?

Related topics