Why is String's equality comparison slower than `&[u8]`

thomas-k-cameron · November 14, 2024, 11:20pm

I noticed that equality comparison for String known at compile time with length of 33 or more with non repeating bytes is slower than bytes (such as Vec<u8> or &[u8]) on x86.

My benchmark shows that it is 10x slower.

Here is my github repo for the behcmark.

github.com

thomas-k-cameron/rust-string-equality-vs-bytes-equality/blob/master/README.md

# Equality Comparison for String Known at Compile Time with Length of 33 or More with Non-Repeating bytes is Slower than Array of `u8`

I created a function that performs equality comparison agaisnt a uuid4.
match statement where each match arm is a uuid v4 that I randomly generated. 

These two functions will always produce the same reuslt, however, the benchmark shows that bytes variant is faster.

LINK: <https://rust.godbolt.org/z/fbnP1Ph9c>

```rust
fn str_match(totally_random_uuid_v4: &str) -> i64
{
    match totally_random_uuid_v4 {
        "68975c1f-bb5e-4640-a1bd-66bcce78b73e" => 1,
        "0cbb46f2-2bed-4168-8263-501dad40f515" => 2,
        // lots of uuid-s
        _ => -1
    }
}

This file has been truncated. show original

I noticed that String uses libc function bcmp. On the other hand, &[u8] or Vec<u8> doesn't use any libc function and it just generate assembly code that looks like strcmp.

I'm not quite sure why String's implementation is so different from the others. Can anyone give me an insight to this?

scottmcm · November 15, 2024, 12:05am

The reason is that you're not actually doing equality, you're doing pattern matching.

This is clearest in the MIR: https://rust.godbolt.org/z/K9z4WMsKz

The arm

b"68975c1f-bb5e-4640-a1bd-66bcce78b73e" => 1,

is actually

[b'6', b'8', b'9', b'7', b'5', b'c', b'1', b'f', b'-', b'b', b'b', b'5', b'e', b'-', b'4', b'640-', b'a', b'1', b'b', b'd-', b'6', b'6', b'b', b'c', b'c', b'e', b'7', b'8', b'b', b'7', b'3', b'e'] => 1,

and thus it's treated as a slice pattern that checks all the bytes individually.

If you switch it to using equality instead of a slice pattern,

#[no_mangle] pub fn bytes_match(totally_random_uuid_v4: &str) -> i64
{
    match totally_random_uuid_v4.as_bytes() {
        x if x == b"68975c1f-bb5e-4640-a1bd-66bcce78b73e" => 1,
        x if x == b"0cbb46f2-2bed-4168-8263-501dad40f515" => 2,
        // lots of uuid-s
        _ => -1
    }
}

Then you'll see that one use bcmp as well: https://rust.godbolt.org/z/WWsn5droT

As for why there's a performance difference with the different approaches that depends on the number of arms? My guess would be because of icache. One problem with microbenchmarks is that if you are running only that code, the lots-of-code version looks good until it starts to bust the icache. But the lots-of-code version is often worse in real life, because loading all that code into the icache pushes out the code for whatever else you're doing around it, making it overall worse.

jrose · November 15, 2024, 12:09am

…but that still seems like a missing optimization opportunity, a slice pattern full of constants should be converted to a memcmp somewhere in the compiler (though at what threshold it might be hard to say).

EDIT: ah, the memcmp is slower, hm.

sahnehaeubchen · November 15, 2024, 11:32am

If the constant is smaller or equal to 32 bytes the str match gets inlined and transformed into SSE comparisons. But the slice match does not, it remains a byte-wise constant comparison.

Does it make sense to change the generated MIR for slice matches to use slice equality for subsequent elements? This might even speed up compilation because there are fewer blocks.

EDIT:

Oh, I just read the "with length of 33 or more" in the original post

Topic		Replies	Views
Using a more efficient string matching algorithm libs	39	7653	September 15, 2022
Wild idea: deprecating APIs that conflate str and [u8] libs	59	3572	November 12, 2020
Faster PartialEq for small arrays of non 2^n length compiler	25	724	September 14, 2024
Slice equality? libs	7	6142	August 26, 2021
Could we implement some kind of small struct optimization? compiler	16	2039	July 7, 2021

Why is String's equality comparison slower than `&[u8]`

Related topics