i was messing around trying to optimize some code, and accidentally made some code that outperforms the implementation of PartialEq from libcore for slices of length 3, 5, 6, and 7
unfortunately it defiantly doesn't obey strict provenience, and i don't think it's entirely sound (namely, it will segfault if passed a slice that is on the very edge of a segment), but since bytestring equality is such a common operation, i thought i would share anyways.
the core idea is that trailing_zeros and xor can be used to give us the index of the first non-matching byte, so small bytestrings can be loaded into registers in their entirety and compared without a loop.
#![feature(test)]
extern crate test;
use std::hint::black_box;
use test::Bencher;
#[cfg(target_endian = "little")]
fn shared_prefix_z(a: &[u8], b: &[u8], min_len: usize) -> usize {
// TODO: make sure the query Vec has a capacity of 8, so this is sound
unsafe {
let aa = (a.as_ptr() as *const u64).read_unaligned();
let bb = (b.as_ptr() as *const u64).read_unaligned();
((aa ^ bb).trailing_zeros() as usize / 8).min(min_len)
}
}
#[bench]
fn eq(b: &mut Bencher) {
b.iter(|| {
black_box(black_box(b"123456") == black_box(b"123456"))
})
}
#[bench]
fn hack(b: &mut Bencher) {
b.iter(|| {
black_box(shared_prefix_z(black_box(b"123456"),
black_box(b"123456"),
6) == 6)
})
}
ok so i'll zero-initialize it, then shrink the length. also i'm still kinda confused as to why reading uninit memory is UB instead of "Unspecified Result".
bytestrings have their length as part of their type, so this is using the faster array comparison route (thus why it only outperforms when the length is not a power of two).
worth noting that my algorithm isn't actually broken by changing byte patters, as long as the individual bytes are marked as undef and not the entire value, since it only reads the uninitialized memory once.
definitions can be changed. much of rust's specified UB (such as str being valid UTF-8) are not actually utilized by the compiler, but they are specified as UB because defining behavior is a one-way ratchet in a language with strong backwards compatibility guarantees.
it's far from impossible, especially if you pin a specific compiler version, optimization level, and target architecture. it's been done many times, to varying levels of success.
As far as I understand: MIRI doesn't catch it as UB, but it's still UB because the documentation of from_utf8_unchecked says so and thus the function can't be trusted to do anything predictable when given non-UTF-8 bytes, which is the definition of UB.
The library versions of from_utf8_unchecked list WF UTF-8 as a prerequisite, so they're out. str::as_bytes_mut is highly relevant, since it allows temporarily putting non UTF-8 bytes in the string, but still requires restoring UTF-8 before the borrow is allowed to expire. Even though no action occurs on lifetime expiry, morally the documentation places a WF UTF-8 assertion at that point.
Ultimately this gets into a very subtle part of Rust in both the difference between library UB and language UB, and the fact that language validity of references doesn't care about the contents of the referenced memory beyond that it exists and is borrowed.
Even our more precise documentation does a poor job in properly distinguishing between language UB ("immediate" UB) and library UB ("deferred" UB). It's something we're slowly working on improving.
No; it is a potential valid implementation for &str to be (ptr, len) but &[u8] to be (len, ptr). However, as casing between *const [u8] and *const stris documented to work as expected, since the memory representation of str and [u8] is equivalent and they have identical unsize kinds.
Using a pointer cast like this is AIUI the only way that is fully supported by the documentation as it stands today. …Or at least I think we merged the reference PR that clarifies how as casts between slice types work, I didn't actually go verify that.