Help wanted: fast hash maps in std

How is the implementation going to find the lengths of all variable length sub components?

See also:

https://github.com/contain-rs/hashmap2/pull/5/files

a fairly old (March) PR to a fork of the current HashMap implementation, which implements this already. Iā€™m not sure what the current state is.

I really should try hacking on HashMap myself as there are a bunch of other potential optimizations Iā€™d like to look at.

For some reason, MurmurHash3 hasnā€™t been extensively investigated. It performs vastly better than xxHash for short keys because it has much less state to populate, but has top-quality (non-cryptographic) properties. For example, a 1-byte array can give a 32 bit hash in ~4 ns in Scala, which handily beats everything on https://cdn.rawgit.com/Gankro/hash-rs/7b9cf787a830c1e52dcaf6ec37d2985c8a30bce1/index.html save FNV. A somewhat-carefully optimized Rust implementation could probably outdo that substantially.

I strongly support the idea of having context-dependent hashing (switch to a slower algorithm if many collisions are detected), but I am not sure that FNV is a good choice for a default for anything. You donā€™t have to spend very much more time to get at least somewhat better hashing even at very short key sizes.

1 Like

Iā€™m improving the cache awareness in this PR https://github.com/rust-lang/rust/pull/36692 This will add up to any hashing improvements.

1 Like

How much does the extra hash_u8(0xff) cost? If itā€™s large, itā€™s worth noting we can avoid this for String where s.capacity() > s.len() by writing the 0xff to the underlying buffer.

A lot, I did benchmark something similar here Extend the `Hasher` trait with `fn delimit` to support one-shot hashing by pczarn Ā· Pull Request #1666 Ā· rust-lang/rfcs Ā· GitHub

We should get rid of it altogether (there's already open issues on the subject).

1 Like

FWIW, even FnvHasher is on the slow side. Look at https://github.com/rust-lang/rust/pull/37229 where I replaced FnvHasher with a worse-but-faster hash function and sped up rustc itself by 2ā€“6% across most workloadsā€¦

I think it's important to contextualize a bit here. That workload is mostly 98%+ usize and u64 writes, and for those the PR hasher is a single instruction instead of 16 non-parallelizable ones (fnv). Even if the hash-fn is a bit bad and has bad distribution at the lower bits, solving partial-collisions is not that expensive in rustc hashmap implementation, thus the speedup.

You can avoid the string terminator hashing on a case by case basis.

Comparing inserting 10,000 elements (no reallocation) on HashMap with the default hasher (SipHash13)

Some benches with very short strings (stringified integers 0ā€¦10,000)

test insert_hashmap_string_10_000          ... bench:   1,248,610 ns/iter (+/- 19,435)
test insert_hashmap_string_oneshot_10_000  ... bench:   1,211,014 ns/iter (+/- 18,908)
test lookup_hashmap_10_000_exist_string         ... bench:     199,742 ns/iter (+/- 1,870)
test lookup_hashmap_10_000_exist_string_oneshot ... bench:     177,365 ns/iter (+/- 2,821)

ā€œone shotā€ hashing adaptor for strings code in a gist

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.