Could we have this impl of TryFrom? It'd be very useful. Thanks.
On nightly this is done in place:
u8_vec.into_iter().map(TryFrom::try_from).collect::<Result<Vec<NonZeroU8>, _>>()
Not particularly good codegen, to be fair. Presumably a TryFrom impl would do more than a byte-at-a-time check (by delegating to memchr, most likely).
Interestingly enough, even the following compiles down to a byte-by-byte cmp
:
pub fn into_nonzero(vec: Vec<u8>) -> Option<Vec<NonZeroU8>> {
if vec.iter().all(|x| *x != 0) {
Some(vec.into_iter().map(|x| unsafe {
NonZeroU8::new_unchecked(x)
}).collect())
} else {
None
}
}
I would have expected the compiler to do better here (I could see the TryFrom
inhibiting some optimizations, but this is a plain all
comparison against 0).
Right, I meant libcore would manually call rust/memchr.rs at master · rust-lang/rust · GitHub.
Autovectorization is inherently unreliable, especially when it has early returns like this... In this case, I suspect that LLVM doesn't want to read past the first zero byte, since it doesn't know the whole slice is dereferenceable — for all it knows the first byte after zero could fault.
I'm deviating a bit here, but I would expect the following to compile to something better than a byte-by-byte cmp
:
pub fn justcheck(vec: &[u8]) -> bool {
vec.iter().all(|x| *x != 0)
}
Assembly on rustc 1.54.0-nightly (4de757209 2021-05-01)
example::justcheck:
xor ecx, ecx
.LBB2_1:
mov rax, rcx
cmp rsi, rcx
je .LBB2_3
lea rcx, [rax + 1]
cmp byte ptr [rdi + rax], 0
jne .LBB2_1
.LBB2_3:
cmp rsi, rax
sete al
ret
Maybe if this is fixed the original proposed solution would compile to something better as well?
A good alternative to vec.iter().all(|x| *x != 0)
is !vec.contains(&0)
, which will call memchr
. This still needs unsafe
for converting the vec after the check though.
An alternative that might have been vectorized better would be
vec.iter().min() != Some(0)
Taking that as an idea for manually vectorizing leads to something that could potentially be slightly better than what a more general memchr might do, at least if optimizing for the case where finding a zero is rare, and you don't care where it was found. The difference is that when looking for a zero (or -1) you get to accumulate a minimum (maximum for -1) instead of figuring out if you saw the byte in each chunk again.
I couldn't help myself from implementing https://godbolt.org/z/sz7sKzoKa, where the inner loop that processes 64 bytes per iteration ends up looking like:
.LBB0_3:
vpminub ymm1, ymm1, ymmword ptr [rax]
vpminub ymm0, ymm0, ymmword ptr [rax + 32]
add rax, 64
cmp rax, rcx
jb .LBB0_3
This will always read the entire array, but short circuiting could be added with varying levels of tradeoff between different probabilities of finding a zero.
The bytecount crate might be relevant here.
Umm...
Optimizations aside, we feel like TryFrom<Vec<u8>> for Vec<NonZeroU8>
is conceptually nicer than these alternatives, and it can even add position information and whatnot a la std::ffi::NulError
or std::str::Utf8Error
.
This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.