The reason is that you're not actually doing an equality comparison; you're doing pattern matching.
This is clearest in the MIR: https://rust.godbolt.org/z/K9z4WMsKz
The arm
b"68975c1f-bb5e-4640-a1bd-66bcce78b73e" => 1,
is actually
[b'6', b'8', b'9', b'7', b'5', b'c', b'1', b'f', b'-', b'b', b'b', b'5', b'e', b'-', b'4', b'6', b'4', b'0', b'-', b'a', b'1', b'b', b'd', b'-', b'6', b'6', b'b', b'c', b'c', b'e', b'7', b'8', b'b', b'7', b'3', b'e'] => 1,
and thus it's treated as a slice pattern that checks all the bytes individually.
If you switch it to using equality instead of a slice pattern,
#[no_mangle] pub fn bytes_match(totally_random_uuid_v4: &str) -> i64 {
    match totally_random_uuid_v4.as_bytes() {
        x if x == b"68975c1f-bb5e-4640-a1bd-66bcce78b73e" => 1,
        x if x == b"0cbb46f2-2bed-4168-8263-501dad40f515" => 2,
        // lots of uuid-s
        _ => -1,
    }
}
then you'll see that it uses bcmp as well: https://rust.godbolt.org/z/WWsn5droT
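For completeness, here's a runnable sketch (with the arm list shortened to two UUIDs; the function names are mine) showing that the slice-pattern and equality-guard versions return the same results, even though they compile differently:

```rust
// Slice-pattern version: the compiler treats the byte-string literal as a
// slice pattern and checks the bytes individually.
fn match_pattern(s: &str) -> i64 {
    match s.as_bytes() {
        b"68975c1f-bb5e-4640-a1bd-66bcce78b73e" => 1,
        b"0cbb46f2-2bed-4168-8263-501dad40f515" => 2,
        _ => -1,
    }
}

// Equality version: the guards compare whole slices, which can lower to
// a bcmp/memcmp call instead of per-byte checks.
fn match_equality(s: &str) -> i64 {
    match s.as_bytes() {
        x if x == b"68975c1f-bb5e-4640-a1bd-66bcce78b73e" => 1,
        x if x == b"0cbb46f2-2bed-4168-8263-501dad40f515" => 2,
        _ => -1,
    }
}

fn main() {
    for s in [
        "68975c1f-bb5e-4640-a1bd-66bcce78b73e",
        "0cbb46f2-2bed-4168-8263-501dad40f515",
        "not-a-uuid",
    ] {
        assert_eq!(match_pattern(s), match_equality(s));
    }
    println!("both versions agree");
}
```

The behavioral equivalence is why this is purely a codegen question: either form is correct, and the difference only shows up in the emitted assembly.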
As for why the performance difference between the approaches depends on the number of arms: my guess would be the instruction cache (icache). One problem with microbenchmarks is that when you're running only that code, the lots-of-code version looks good right up until it starts to bust the icache. But in real life the lots-of-code version is often worse, because loading all that code into the icache evicts the code for whatever else you're doing around it, making things slower overall.