Hashing in Rust is slow. The current hashing scheme tends to produce very small chunks, which results in a lot of overhead compared to the actual hashing, which leads to a performance hit on the order of 25x for normal usage. A faster hashing algorithm makes no difference:
test hash::xxh::tests::bench_64k_oneshot ... bench: 16567 ns/iter (+/- 973) = 3955 MB/s test hash::xxh::tests::bench_long_str ... bench: 213 ns/iter (+/- 6) = 2093 MB/s test hash::sip::tests::bench_long_str ... bench: 905 ns/iter (+/- 12) test hash::xxh::tests::bench_str_under_8_bytes ... bench: 53 ns/iter (+/- 3) = 56 MB/s test hash::sip::tests::bench_str_under_8_bytes ... bench: 48 ns/iter (+/- 2)
But what if we relaxed the requirements to allow large chunks more often? Hash the binary representation instead of the semantics. This would make the hash result depend on architecture (endianness), compiler decisions (order of fields, padding) as well as
unsafe shenanigans (assume padding bytes are zero), but it could make common cases a whole lot faster.
For a motivating example, right now a
Vec<u16> can only be hashed one
u16 at a time because the bytes might need to be swapped on big endian, which forces the worst possible code path we can take, even though
to_le() is a no-op on little endian (i.e. most Rust installations). The compiler cannot be expected to optimize this away without arbitrary rewrite rules. Under the new scheme, the data would be hashed as a single
&[u8], allowing the hasher to do its job with no overhead.
Security would not suffer because we still use secure algorithms, in fact it would force an attacker to gather additional data.
Obviously, the stable hash would remain as either a prominent option or the default, but I expect most people would take the speed if the docs gave them the choice.