Pre-RFC: bless the `type FastHashMap` pattern

background: rust defaults to a hashing algorithm that is resistant to collision attacks. This comes at two notable out-of-the-box costs: HashMaps behave non-deterministically and slower.

Long ago we defined https://doc.rust-lang.org/nightly/std/collections/hash_map/struct.DefaultHasher.html which just vaguely promises to have these properties, without guaranteeing any stable algorithm.

While we are adamant on the fact that this is a good default since the issue at hand is obscure and serious, it is definitely not the right solution for everyone. Those who care about determinism or performance and are aware of this situation reach for a custom hasher crate. Notably FxHasher is used by rustc, webrender, and stylo.

I would like to bless this pattern and make it easier to reach for random users who aren’t well-versed in all of the options and the situation. This is done with three steps:

  • Define struct std::collections::hash_map::FastHasher, which opaquely wraps up FxHasher, claims that it is tuned for performance, and guarantees deterministic behaviour for a given input (for a given std version, but the impl may change between versions) (also we still don’t guarantee any particular relationship between iteration and insertion order; that’s what a linked hashmap is for)
  • Define type std::collections::FastHashMap = HashMap<K, V, FastHasher> (yes, at the top!)
  • Define fn HashMap::fast() -> FastHashMap<K, V>

The ideal hasher for your project depends on your keys, but Fx seems to be pretty good for a lot of usecases, and represents a simple win for a lot of crates. Projects which are interested in further tuning may investigate the ecosystem as before.

sfackler suggested breaking this down into subcategories like FastForSmallKeys and FastForBigKeys but I think that’s overkill. I think we want a nice simple story here since the crates.io escape hatch will always exist regardless of what we do.

15 Likes

Yeah, my SmallKey/BigKey suggestion was based on me misremembering the performance characteristics of hashers last time I benched them (I thought FNV was faster than Fx at small key sizes).

I’m not totally sold on the naming of “fast” - “why would I want to use a hash map when I could use a fast hash map!?”. I don’t know of a better option off the top of my head though that isn’t something way too verbose.

1 Like

I kinda like fast because it’s evocative.

“Wait, why isn’t Fast what new makes?? I should read the docs on fast!”

(which will give a very thorough explanation)

2 Likes

Surely we need a faster and deterministic hashing mechanism, but I wouldn’t call it just “fast”. That suggests unequivocally “better”, and people will start reaching for it instead of the secure one.

I would give it a scarier name like UnsafeHasher, UnsafeFastHasher, DosProneFastHasher or something along those lines. Since its use induces a well-known and important security risk, its name should clearly indicate this up front.

13 Likes

I presume this would be applied to a FastHashSet alias too.

Perhaps "Weak" is a better adjective?

An abstract hasher could choose different implementations based on size_of, if there's really a benefit. (edit: I realized Hasher doesn't know it's total input type ahead of time, so maybe not.)

3 Likes

Its certainly not Unsafe, which is a term of art in the context of the Rust std. I’m inclined to agree with @Gankra that documenting clearly why “fast” is not the default version is the proper course, but if we had to choose something negative Insecure would be more appropriate than Unsafe.

I might prefer Faster to Fast, because it suggests that its a faster default while implying that HashMap is still “fast enough”.

11 Likes

I can live with a “why isn’t this the default” name like FastHashMap or a “sounds scarier than it really is” name like UnsafeHashMap or InsecureHashMap, but I think we should at least consider ways of getting “collision resistant” into these names, since that’s the magical property that actually matters.

HashMap::withoutCollisionResistance() -> NonCollisionResistantHashMap<K, V> is probably too long

HashMap::withoutCR() or nonCR() -> NonCRHashMap<K, V> seems pretty workable if the docs explain what CR stands for and what it means. Of course, we need that documented anyway, and I’m not sure “Collision Resistant” is really self-evident to novices unfamiliar with the problem, so maybe “CR” isn’t losing anything.

Annoyingly I can’t think of any existing terms of art or abbreviations or verbiage that would help connote “opt-out of a security property that has a perf cost because you’re sure you don’t need it here”, because that only makes sense in a “slow but secure by default” context which is pretty rare in any language.

I think if the friction of using Fast and Secure impls is the same, most folks will pick Fast by default (I certainly would, unless I am explicitly writing a web-facing thing). A neat hack here would be to simultaneously add HashMap / HashSet to prelude, but avoid adding FastHashMap there.

7 Likes

The friction is slightly higher because most will mindlessly just use ::new(), as all examples will continue to use ::new() and that's what you do with every other type. You still have to "know" that ::fast() is a thing. And FastHashMap is longer to type :slight_smile:

For sorting algorithms we’ve got sort and sort_unstable, rather than sort_faster.

If the goal is to make users go to the docs to learn about hash map problems, then a cryptic name may be fine. NCRHashMap maybe? (ncr = non-collision-resistant) or DoSHashMap

9 Likes

Repeating my earlier suggestion, I believe “weak” is already a term of art regarding collision resistance.

This is an ok name but a weak map is a term-of-art in managed languages, indicating weak ownership of the keys (as in Arc::weak): WeakMap - JavaScript | MDN

9 Likes

Ugh, that’s annoying – Java even has it spelled exactly as WeakHashMap.

I think just having a FastHashMap type alias is sufficient. Someone seeing this type name for the first time will immediately wonder why the default HashMap isn’t fast and the answer will be in the documentation for both HashMap and FastHashMap.

The FastHashMap name is fine as it is: it captures the main reason why someone would want to use this over HashMap.

1 Like

The only change I would suggest is HashMap::new_fast() instead of HashMap::fast(). The key verb here is “new”.

Regarding separate “fast hashers” for small and large keys, I think this is best solved in the hasher itself: Hasher::write() can switch to a different hash function if it is passed a long slice.

2 Likes

This is a cool solution because it makes total friction go down while keeping the relative friction the same. Personally I’ve always felt like all the maps and sets - maybe all of the collections - should be in the prelude.

9 Likes

This would be pretty great! (LinkedList probably shouldn't be in the prelude because reasons...)

4 Likes

Don’t want to bikeshed too much here, but I do very much agree with the notion that this should come with a “negative” name like InsecureHashMap (similar to sort_unstable). Even if the relative friction for using the secure version is decreased, I think it’s important that out-of-context uses make it immediately clear that there is a downside to using this version. Relying on users to internalize documentation is not a good alternative IMO.

Otherwise, I think including this in std is a great idea!

13 Likes

pub type HashMap<K, V> = collections::HashMap<K, V>; in prelude?

I thought that the default hash implementation automatically adapted so that it only used a collision-resistant hash if it started seeing collision attacks?