How is the implementation going to find the lengths of all variable length sub components?
See also:
https://github.com/contain-rs/hashmap2/pull/5/files
a fairly old (March) PR to a fork of the current HashMap implementation, which implements this already. Iām not sure what the current state is.
I really should try hacking on HashMap myself as there are a bunch of other potential optimizations Iād like to look at.
For some reason, MurmurHash3 hasnāt been extensively investigated. It performs vastly better than xxHash for short keys because it has much less state to populate, but has top-quality (non-cryptographic) properties. For example, a 1-byte array can give a 32 bit hash in ~4 ns in Scala, which handily beats everything on https://cdn.rawgit.com/Gankro/hash-rs/7b9cf787a830c1e52dcaf6ec37d2985c8a30bce1/index.html save FNV. A somewhat-carefully optimized Rust implementation could probably outdo that substantially.
I strongly support the idea of having context-dependent hashing (switch to a slower algorithm if many collisions are detected), but I am not sure that FNV is a good choice for a default for anything. You donāt have to spend very much more time to get at least somewhat better hashing even at very short key sizes.
Iām improving the cache awareness in this PR https://github.com/rust-lang/rust/pull/36692 This will add up to any hashing improvements.
How much does the extra hash_u8(0xff)
cost? If itās large, itās worth noting we can avoid this for String
where s.capacity() > s.len()
by writing the 0xff
to the underlying buffer.
A lot, I did benchmark something similar here Extend the `Hasher` trait with `fn delimit` to support one-shot hashing by pczarn Ā· Pull Request #1666 Ā· rust-lang/rfcs Ā· GitHub
We should get rid of it altogether (there's already open issues on the subject).
FWIW, even FnvHasher is on the slow side. Look at https://github.com/rust-lang/rust/pull/37229 where I replaced FnvHasher with a worse-but-faster hash function and sped up rustc itself by 2ā6% across most workloadsā¦
I think it's important to contextualize a bit here. That workload is mostly 98%+ usize and u64 writes, and for those the PR hasher is a single instruction instead of 16 non-parallelizable ones (fnv). Even if the hash-fn is a bit bad and has bad distribution at the lower bits, solving partial-collisions is not that expensive in rustc hashmap implementation, thus the speedup.
You can avoid the string terminator hashing on a case by case basis.
Comparing inserting 10,000 elements (no reallocation) on HashMap with the default hasher (SipHash13)
Some benches with very short strings (stringified integers 0ā¦10,000)
test insert_hashmap_string_10_000 ... bench: 1,248,610 ns/iter (+/- 19,435)
test insert_hashmap_string_oneshot_10_000 ... bench: 1,211,014 ns/iter (+/- 18,908)
test lookup_hashmap_10_000_exist_string ... bench: 199,742 ns/iter (+/- 1,870)
test lookup_hashmap_10_000_exist_string_oneshot ... bench: 177,365 ns/iter (+/- 2,821)
āone shotā hashing adaptor for strings code in a gist
This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.