Wouldn’t it be nice if Rust hash maps were faster? Everybody says so. The libs team generally agrees that std should provide hash maps that are fast in cases where today’s `HashMap`s with the default hasher (SipHash 1-3) are not, but the libs team does not have the capacity to work on this directly right now. This is your invitation to help make that happen.
A suitable design for this is not clear, and this is a contentious issue, so a motivated person will need to work with interested parties to come up with a viable design, then take it through the RFC process.
Some likely constraints:
- A solution that removes from the default HashMap Rust’s long-established strong protection against collision-based attacks is unlikely to be accepted, for backwards-compatibility reasons if nothing else, so the most likely path forward is to add additional hashers in some form to std.
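The protection referred to here comes from per-map random seeding: each `RandomState` (the default `BuildHasher`) draws fresh random keys, so an attacker cannot precompute colliding inputs. A minimal sketch of that property, using only std APIs (`hash_with` is a hypothetical helper for illustration):

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

// Hash a value under a given state. Each `RandomState` carries its own
// random keys, so the same input hashes differently under different states.
fn hash_with(state: &RandomState, value: &str) -> u64 {
    let mut h = state.build_hasher();
    value.hash(&mut h);
    h.finish()
}

fn main() {
    let (a, b) = (RandomState::new(), RandomState::new());
    // With overwhelming probability these differ, which is what makes
    // precomputed collision attacks against the default HashMap impractical.
    assert_ne!(hash_with(&a, "key"), hash_with(&b, "key"));
}
```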
- The old plan for this depended on default type parameter fallback, so that `let map: HashMap<_, _, FnvState> = HashMap::new()` would create FNV hash maps while `let map: HashMap<_, _> = HashMap::new()` would continue to create SipHash hash maps. Unfortunately, today `HashMap::new` is only implemented for the default hasher, and changing that appears to be massively breaking, so this is unlikely to be workable; not to mention that default type parameter fallback is unstable. As a result it will only be possible to instantiate `HashMap<_, _, CustomState>` with the `default` method or some other custom constructor.
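The constraint above can be seen with today's stable APIs. A minimal sketch, with a toy FNV-1a hasher standing in for the hypothetical `FnvState` (std does not provide one):

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// Toy FNV-1a hasher for illustration only; std has no FNV hasher.
struct FnvHasher(u64);

impl Default for FnvHasher {
    fn default() -> Self {
        FnvHasher(0xcbf29ce484222325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for FnvHasher {
    fn finish(&self) -> u64 {
        self.0
    }
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= b as u64;
            self.0 = self.0.wrapping_mul(0x100000001b3); // FNV 64-bit prime
        }
    }
}

type FnvState = BuildHasherDefault<FnvHasher>;

fn main() {
    // `HashMap::new` only exists for the default SipHash state, so a map
    // with a custom hasher must come from `default()` (or `with_hasher`):
    let mut map: HashMap<&str, u32, FnvState> = HashMap::default();
    map.insert("answer", 42);
    assert_eq!(map.get("answer"), Some(&42));
}
```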
- It’s desirable to have the freedom to change the hashing algorithms used in std, so naming should be semantic instead of based on the algorithm. Today’s hasher is `DefaultHasher`; a new one might be called e.g. `FastHasher`.
- It is primarily short keys (< 20 bytes) that are slow with SipHash 1-3; its performance on medium-length keys is competitive. With long keys SipHash trails algorithms like xxHash, but long keys may not be a common case for hash maps.
- Previous investigations into the characteristics of various hash algorithms can be found here.
- Hybrid implementations that switch algorithms based on key length sound attractive, but they seem to be firmly in the realm of research, so they don’t seem prudent right now.
Solutions might entail a new `FastHasher` type and possibly new `FastHashSet` types or typedefs.
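One possible shape such typedefs could take, sketched with `DefaultHasher` as a stand-in for whatever faster algorithm std might eventually pick (all names here are hypothetical, not a settled design):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{HashMap, HashSet};
use std::hash::BuildHasherDefault;

// Hypothetical semantic names; `DefaultHasher` is only a placeholder for
// a faster algorithm std could substitute later without renaming anything.
type FastState = BuildHasherDefault<DefaultHasher>;
type FastHashMap<K, V> = HashMap<K, V, FastState>;
type FastHashSet<T> = HashSet<T, FastState>;

fn main() {
    // Per the constraint above, construction goes through `default()`:
    let mut map: FastHashMap<&str, i32> = FastHashMap::default();
    map.insert("x", 1);

    let mut set: FastHashSet<i32> = FastHashSet::default();
    set.insert(7);

    assert_eq!(map["x"], 1);
    assert!(set.contains(&7));
}
```

Naming the alias after the intent (`Fast`) rather than the algorithm keeps std free to swap implementations later, as the semantic-naming bullet above suggests.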
So that’s it. The door is open. We just need to solve some design problems to step through. Be the change you want to see in the std.