Implement XXHASH for core/hash

I did a quick POC to see if xxhash64 would be a good implementation for core/hash due to speed. I think speed wise, it makes sense. My reasoning for this is to keep users of the language to continue using the language std libs rather than implementing their own hasher (unless needed for cryptographic reasons).

thoughts?

Benchmarks with Hashmap collection

It seems to be susceptible to collision attacks.

1 Like

Based on your benchmarks xxhash is not a clear winner in terms of performance. I think it would be difficult to argue in favor of a switch to xxhash without significant performance gains, which appear to be lacking here.

1 Like

FXHash isn't any better. Back to the drawing boards...

//FXHASH
test hash::map::find_existing                ... bench:       9,736 ns/iter (+/- 220)
test hash::map::find_nonexisting             ... bench:       9,190 ns/iter (+/- 60)
test hash::map::get_remove_insert            ... bench:          47 ns/iter (+/- 1)
test hash::map::grow_by_insertion            ... bench:         246 ns/iter (+/- 17)
test hash::map::hashmap_as_queue             ... bench:          32 ns/iter (+/- 2)
test hash::map::new_drop                     ... bench:           4 ns/iter (+/- 0)
test hash::map::new_insert_drop              ... bench:          52 ns/iter (+/- 0)
test hash::set_ops::set_difference           ... bench:         131 ns/iter (+/- 14)
test hash::set_ops::set_intersection         ... bench:         133 ns/iter (+/- 2)
test hash::set_ops::set_is_subset            ... bench:         130 ns/iter (+/- 14)
test hash::set_ops::set_symmetric_difference ... bench:       1,303 ns/iter (+/- 15)
test hash::set_ops::set_union                ... bench:         462 ns/iter (+/- 44)

we've implemented FXHASH in some parts of the Rustc and it's also susceptible to collisions

That is true. Is replacing the uses of Fx in rustc the idea? Because from this...

...and from the comparisons to SIP, I interpreted the goal as replacing the default hasher in the standard library.

From the documentation:

By default, HashMap uses a hashing algorithm selected to provide resistance against HashDoS attacks.

LLVM learned blake3 a while ago: ⚙ D121510 [Support] Introduce the BLAKE3 hashing function implementation They already use it in the linkers.

Blake3 is strong enough, but last I checked it's perf was mostly compared to cryptographic hashes rather than something like Sip (especially Siphash13)

According to GitHub - rurban/smhasher: Hash function quality and speed tests (which is not perfect but is probably good enough here), SipHash13 (what we use as the default hasher in HashMap) is around 2x faster than Blake3

2 Likes

I did some additional work implementing XXHASH and Komihash. They win out via speed but, when translated to hashmap, the speed of SIP is faster. I think there's a need to rework std/collections/hashmap to allow better speeds for different hashes.

The APIs in std assume the streaming case is the generally as fast as the other cases. For xxHash this is not true. I talk about it here: Tracking Issue for `BuildHasher::hash_one` · Issue #86161 · rust-lang/rust · GitHub

The other option for xxHash is to not use streaming, and combine the hashes from different types yourself -- This is fine because it isn't like xxHash describes what should happen when you hash different fields of a struct together.

That said, even doing this, I was never able to get xxHash to be as fast as rustc_hash::FxHash, and while xxHash is higher quality, it still has plenty of collisions (which can be forced).

Generally speaking, I'm also opposed to adding something like xxHash to the stdlib too. SipHash13 is pretty fast (faster than people tend to think) and it would be a footgun. It's not like adding something from crates.io is difficult anyway.

agreed, Xxhash is out of the equation now

XXH2, yes, but XXH3? aHash lists XXH3's DOS resistance as unknown; if attacks have been discovered, that should probably be updated.

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.