Help wanted: fast hash maps in std

sfackler · September 15, 2016, 4:29am

How is the implementation going to find the lengths of all variable length sub components?

comex · September 15, 2016, 5:13am

See also:

https://github.com/contain-rs/hashmap2/pull/5/files

a fairly old (March) PR to a fork of the current HashMap implementation, which implements this already. I’m not sure what the current state is.

I really should try hacking on HashMap myself as there are a bunch of other potential optimizations I’d like to look at.

Ichoran · September 17, 2016, 3:49am

For some reason, MurmurHash3 hasn’t been extensively investigated. It performs vastly better than xxHash for short keys because it has much less state to populate, but has top-quality (non-cryptographic) properties. For example, a 1-byte array can give a 32 bit hash in ~4 ns in Scala, which handily beats everything on https://cdn.rawgit.com/Gankro/hash-rs/7b9cf787a830c1e52dcaf6ec37d2985c8a30bce1/index.html save FNV. A somewhat-carefully optimized Rust implementation could probably outdo that substantially.

I strongly support the idea of having context-dependent hashing (switch to a slower algorithm if many collisions are detected), but I am not sure that FNV is a good choice for a default for anything. You don’t have to spend very much more time to get at least somewhat better hashing even at very short key sizes.

arthurprs · September 24, 2016, 9:46am

I’m improving the cache awareness in this PR https://github.com/rust-lang/rust/pull/36692 This will add up to any hashing improvements.

Veedrac · October 3, 2016, 4:46am

How much does the extra hash_u8(0xff) cost? If it’s large, it’s worth noting we can avoid this for String where s.capacity() > s.len() by writing the 0xff to the underlying buffer.

arthurprs · October 12, 2016, 7:52pm

A lot, I did benchmark something similar here Extend the `Hasher` trait with `fn delimit` to support one-shot hashing by pczarn · Pull Request #1666 · rust-lang/rfcs · GitHub

We should get rid of it altogether (there's already open issues on the subject).

nnethercote · October 27, 2016, 12:41am

FWIW, even FnvHasher is on the slow side. Look at https://github.com/rust-lang/rust/pull/37229 where I replaced FnvHasher with a worse-but-faster hash function and sped up rustc itself by 2–6% across most workloads…

arthurprs · October 27, 2016, 7:44am

I think it's important to contextualize a bit here. That workload is mostly 98%+ usize and u64 writes, and for those the PR hasher is a single instruction instead of 16 non-parallelizable ones (fnv). Even if the hash-fn is a bit bad and has bad distribution at the lower bits, solving partial-collisions is not that expensive in rustc hashmap implementation, thus the speedup.

bluss · October 27, 2016, 12:43pm

You can avoid the string terminator hashing on a case by case basis.

Comparing inserting 10,000 elements (no reallocation) on HashMap with the default hasher (SipHash13)

Some benches with very short strings (stringified integers 0…10,000)

test insert_hashmap_string_10_000          ... bench:   1,248,610 ns/iter (+/- 19,435)
test insert_hashmap_string_oneshot_10_000  ... bench:   1,211,014 ns/iter (+/- 18,908)
test lookup_hashmap_10_000_exist_string         ... bench:     199,742 ns/iter (+/- 1,870)
test lookup_hashmap_10_000_exist_string_oneshot ... bench:     177,365 ns/iter (+/- 2,821)

“one shot” hashing adaptor for strings code in a gist

system · March 25, 2019, 8:26am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Pre-RFC: bless the `type FastHashMap` pattern	92	4074	March 25, 2019
Help harden HashMap in libstd! libs	27	4435	March 25, 2019
A new default Hasher for HashMap? libs	26	6925	December 8, 2019
Global default hash function override for HashMap and HashSet libs	26	1259	August 20, 2024
Low-latency hashing libs	11	1348	December 18, 2024

Help wanted: fast hash maps in std

Related topics