TryFrom<Vec<u8>> for Vec<NonZeroU8>?

Could we have this impl of TryFrom? It'd be very useful. Thanks.

On nightly this is done in place:

u8_vec.into_iter().map(TryFrom::try_from).collect::<Result<Vec<NonZeroU8>, _>>()
7 Likes

Not particularly good codegen, to be fair. Presumably a TryFrom impl would do more than a byte-at-a-time check (by delegating to memchr, most likely).

1 Like

Interestingly enough, even the following compiles down to a byte-by-byte cmp:

pub fn into_nonzero(vec: Vec<u8>) -> Option<Vec<NonZeroU8>> {
    if vec.iter().all(|x| *x != 0) {
        Some(vec.into_iter().map(|x| unsafe {
            NonZeroU8::new_unchecked(x)
        }).collect())
    } else {
        None
    }
}

I would have expected the compiler to do better here (I could see the TryFrom inhibiting some optimizations, but this is a plain all comparison against 0).

Right, I meant libcore would manually call rust/memchr.rs at master · rust-lang/rust · GitHub.

Autovectorization is inherently unreliable, especially when it has early returns like this... In this case, I suspect that LLVM doesn't want to read past the first zero byte, since it doesn't know the whole slice is dereferenceable — for all it knows the first byte after zero could fault.

I'm deviating a bit here, but I would expect the following to compile to something better than a byte-by-byte cmp:

pub fn justcheck(vec: &[u8]) -> bool {
    vec.iter().all(|x| *x != 0)
}
Assembly on rustc 1.54.0-nightly (4de757209 2021-05-01)
example::justcheck:
        xor     ecx, ecx
.LBB2_1:
        mov     rax, rcx
        cmp     rsi, rcx
        je      .LBB2_3
        lea     rcx, [rax + 1]
        cmp     byte ptr [rdi + rax], 0
        jne     .LBB2_1
.LBB2_3:
        cmp     rsi, rax
        sete    al
        ret

Maybe if this is fixed the original proposed solution would compile to something better as well?

2 Likes

A good alternative to vec.iter().all(|x| *x != 0) is !vec.contains(&0), which will call memchr. This still needs unsafe for converting the vec after the check though.

3 Likes

An alternative that might have been vectorized better would be

vec.iter().min() != Some(0)

Taking that as an idea for manually vectorizing leads to something that could potentially be slightly better than what a more general memchr might do, at least if optimizing for the case where finding a zero is rare, and you don't care where it was found. The difference is that when looking for a zero (or -1) you get to accumulate a minimum (maximum for -1) instead of figuring out if you saw the byte in each chunk again.

I couldn't help myself from implementing https://godbolt.org/z/sz7sKzoKa, where the inner loop that processes 64 bytes per iteration ends up looking like:

.LBB0_3:
        vpminub ymm1, ymm1, ymmword ptr [rax]
        vpminub ymm0, ymm0, ymmword ptr [rax + 32]
        add     rax, 64
        cmp     rax, rcx
        jb      .LBB0_3

This will always read the entire array, but short circuiting could be added with varying levels of tradeoff between different probabilities of finding a zero.

1 Like

The bytecount crate might be relevant here.

Umm...

Optimizations aside, we feel like TryFrom<Vec<u8>> for Vec<NonZeroU8> is conceptually nicer than these alternatives, and it can even add position information and whatnot a la std::ffi::NulError or std::str::Utf8Error.