Does vectorized impl of `<[_]>::contains` make sense?

It can be implemented using specializations.

Example: godbolt

Or is it better to just wait until LLVM implements this?

We already have a specialization for u8 and I suspect searching slices of non-u8 primitives and not even looking for the position is not as common as say str.contains(&str). But if you have a case, go ahead and see what happens.

Or is it better to just wait until LLVM implements this?

Is there any indication they will anytime soon?

For the time being, LLVM IR isn't even expressive enough for this to be a legal optimization.

On Rust's side, the memory model (Stacked/Tree Borrows) requires function arguments that are references to point to valid memory. If I recall correctly, this includes both references to fixed-size types and references to slices. But LLVM IR can only express this concept via the dereferenceable attribute which requires a fixed number of bytes. So references to slices don't get tagged as dereferenceable at all.

From the optimizer's perspective, it's something like

pub unsafe fn contains(ptr: *const u32, count: usize, val: u32) -> bool {
    for i in 0..count {
        if *ptr.add(i) == val {
            return true;
        }
    }
    false
}

And it wouldn't be UB to call such a function with, say, count = 8 but ptr pointing to an array of only 5 elements – as long as one of those 5 elements matches. If there's a match, the loop will stop, so it doesn't matter if further iterations would be indexing out of bounds.

Therefore the optimizer can't group, say, elements 4..8 into a single vectorized load, because that might be loading out of bounds.

If you make the loop always load all the elements then LLVM does autovectorize:

pub fn contains1(slice: &[u32], val: u32) -> bool {
    let mut ret = false;
    for &elm in slice {
        ret |= elm == val;
    }
    ret
}

But of course this isn't a desirable implementation.

2 Likes

Anyway, I opened an issue in LLVM tracker, maybe they would respond to it: LLVM should vectorize linear search if possible · Issue #63094 · llvm/llvm-project · GitHub

Oh, this is bad.

Are there some way to tell LLVM that whole range is dereferencable?

I tried to do that by dereferencing last element, it didn't work.

AFAICT, LLVM's loop optimizations only kick in if they can canonicalize the loop to a single induction variable and no early returns.

That's why binary_search works better in LLVM as while size > 1 instead of while max-min > 1, for example: Only one induction variable.

So I wouldn't expect this kind of loop to be implemented any time soon.


Also, as a historical note, the slice iterators used to be hard-coded to a 4-way unroll. That was removed because it was bad for compile time -- especially in debug -- and only really helpful for trivial predicates.

So it might be better for people to use things like Memchr — Rust HW library // Lib.rs when they know a search like this is a bottleneck, especially since specializations can't pick up newtypes.

Also, once we get std::simd, it'll be much easier for people to write efficient vectorized implementations themselves: https://rust.godbolt.org/z/45ebG1r49.

2 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.