Faster PartialEq for small arrays of non 2^n length

i was messing around trying to optimize some code, and accidentally made some code that outperforms the implementation of PartialEq from libcore for slices of length 3, 5, 6, and 7

unfortunately it defiantly doesn't obey strict provenience, and i don't think it's entirely sound (namely, it will segfault if passed a slice that is on the very edge of a segment), but since bytestring equality is such a common operation, i thought i would share anyways.

the core idea is that trailing_zeros and xor can be used to give us the index of the first non-matching byte, so small bytestrings can be loaded into registers in their entirety and compared without a loop.

#![feature(test)]

extern crate test;

use std::hint::black_box;
use test::Bencher;

#[cfg(target_endian = "little")]
fn shared_prefix_z(a: &[u8], b: &[u8], min_len: usize) -> usize {
	// TODO: make sure the query Vec has a capacity of 8, so this is sound
	unsafe {
		let aa = (a.as_ptr() as *const u64).read_unaligned();
		let bb = (b.as_ptr() as *const u64).read_unaligned();
		((aa ^ bb).trailing_zeros() as usize / 8).min(min_len)
	}
}

#[bench]
fn eq(b: &mut Bencher) {
	b.iter(|| {
		black_box(black_box(b"123456") == black_box(b"123456"))
	})
}

#[bench]
fn hack(b: &mut Bencher) {
	b.iter(|| {
		black_box(shared_prefix_z(black_box(b"123456"),
						black_box(b"123456"),
						6) == 6)
	})
}

It's easy to make something that outperforms when it's UB -- after all, you can't get any faster than unreachable_unchecked().

Note that that's not enough for it to be sound, because reading uninitialized memory as u64 is immediate UB.

Also, do you know exactly how long they are? If so, you should compare [u8; N]s instead, which has a different implementation from the slice one.

10 Likes

ok so i'll zero-initialize it, then shrink the length. also i'm still kinda confused as to why reading uninit memory is UB instead of "Unspecified Result".

bytestrings have their length as part of their type, so this is using the faster array comparison route (thus why it only outperforms when the length is not a power of two).

3 Likes

worth noting that my algorithm isn't actually broken by changing byte patters, as long as the individual bytes are marked as undef and not the entire value, since it only reads the uninitialized memory once.

That's called [MaybeUninit<u8>; 8], not u64.

That's irrelevant to it being UB.

2 Likes

it's extremely relevant for whether it will have the expected behavior, though.

hopefully rust exposes the planned llvm freeze intrinsic someday, so code like this can be written soundly.

It's undefined behavior by definition. There is no "expected" behavior; anything can happen. Trying to predict UB is fruitless.

3 Likes

definitions can be changed. much of rust's specified UB (such as str being valid UTF-8) are not actually utilized by the compiler, but they are specified as UB because defining behavior is a one-way ratchet in a language with strong backwards compatibility guarantees.

it's far from impossible, especially if you pin a specific compiler version, optimization level, and target architecture. it's been done many times, to varying levels of success.

Violating that is not direct UB, it is just a library invariant that leads to later UB, like IIRC in the Chars iterator it may lead to OOB reads.

3 Likes

has that change been officially made? last i heard it was something that was "being considered"

IIRC it got decided a few years ago.

1 Like
2 Likes

I think this is a confusion in terminology.

As far as I understand: MIRI doesn't catch it as UB, but it's still UB because the documentation of from_utf8_unchecked says so and thus the function can't be trusted to do anything predictable when given non-UTF-8 bytes, which is the definition of UB.

It is permitted for the implementation of from_utf8_unchecked to be

pub const unsafe fn from_utf8_unchecked(v: &[u8]) -> &str {
    let Ok(s) = str::from_utf8(v) else {
        unreachable_unchecked()
    };
    s
}

This would very much be UB caught by miri!

We mostly do not do this because it optimizes poorly.

Right, definitely seems so from the documentation of from_utf8_unchecked.

And yet, str docs say:

Constructing a non-UTF-8 string slice is not immediate undefined behavior

So how do you construct a non-UTF-8 str in a sound way? Or are the docs just wrong?

I don't think you can do it safely, but you should be able to do it soundly with some form of transmutation.

Right, I misspoke, had already corrected it.

Is transmuting &[u8] into &str sound? Is this in the reference?

yes

The library versions of from_utf8_unchecked list WF UTF-8 as a prerequisite, so they're out. str::as_bytes_mut is highly relevant, since it allows temporarily putting non UTF-8 bytes in the string, but still requires restoring UTF-8 before the borrow is allowed to expire. Even though no action occurs on lifetime expiry, morally the documentation places a WF UTF-8 assertion at that point.

Ultimately this gets into a very subtle part of Rust in both the difference between library UB and language UB, and the fact that language validity of references doesn't care about the contents of the referenced memory beyond that it exists and is borrowed.

Even our more precise documentation does a poor job in properly distinguishing between language UB ("immediate" UB) and library UB ("deferred" UB). It's something we're slowly working on improving.

No; it is a potential valid implementation for &str to be (ptr, len) but &[u8] to be (len, ptr). However, as casing between *const [u8] and *const str is documented to work as expected, since the memory representation of str and [u8] is equivalent and they have identical unsize kinds.

Using a pointer cast like this is AIUI the only way that is fully supported by the documentation as it stands today. …Or at least I think we merged the reference PR that clarifies how as casts between slice types work, I didn't actually go verify that.

2 Likes