Global default hash function override for HashMap and HashSet

michaeleisel · May 14, 2024, 8:43pm

Although Rust's default hash function (SipHash) is good in that it's DOS resistant, it's much slower than many high-quality, non-cryptographic functions out there. For many problem domains, DOS resistance isn't a concern, meaning devs in these areas are paying a performance price for something they don't ever benefit from. There are many programs that have lots and lots of HashMaps and HashSets without any need for any of them to be DOS resistant.

I propose that Rust allow developers to override the default hash function globally, perhaps in a similar way as they currently override the default allocator. This would allow developers to choose a better hash function for their needs without having to change the type signature for every single occurrence of HashMap/HashSet.

chrefr · May 14, 2024, 9:39pm

But the choice of whether some code needs HashDOS-resistant hashing function is not global, it's per map. So changing it globally can affect places where it was meant to be secure.

The only viable way I see to do that is to provide a GuaranteedDosResistant hasher, but at that point we can just always make the default hasher fast, and it doesn't solve the original motivation - protecting people that do not know they need to secure that. Also, it's a breaking change to require a specific hash for this.

pitaj · May 14, 2024, 10:29pm

Yeah I don't think this should be a global toggle.

What we can do instead is add a clippy restriction lint that triggers on HashMaps that don't manually specify a hasher.

Then you can turn on this lint in your repository if you want to be forced to make a decision for every HashMap you use.

michaeleisel · May 14, 2024, 11:15pm

People would still be able to set the hash function on a per map/set basis, the only thing changing here would be the default that it falls back to if they don't. And yes, changing it globally would affect places where it was meant to be secure, but the developer would be well aware of that. The point here is that, for many applications, there are few to no places where it's meant to be secure, and migration (changing those few-to-none places to use the SipHash hasher) would be easy.

pitaj · May 15, 2024, 1:15am

HashMap is already something you have to manually import. You can always create a type alias for the "fast version" and use that everywhere in your program.

kornel · May 15, 2024, 1:17am

I agree. This would be super useful.

I'm trying to use fxhash and ahash in my code, but I can't easily configure all my dependencies, and incompatibilities between crate interfaces are annoying.

Even though Rust code can be hasher-agnostic in theory, in practice this makes type inference fail too often, and makes struct and function definitions look more complicated.

Another usecase would be for people new to Rust who benchmark it against other languages. Rust can look slow compared to implementations that use weaker hashers, and trying a global built-in option is easier to recommend than changing code to use 3rd party crates.

It's hard to argue for a globally "less secure" option, because it's impossible to prove that any hypothetical program will never encounter any malicious data. It's always possible to imagine some convoluted counter-example scenario.

However, more pragmatically, people building executables can judge what is the likelihood of such attack happening, and whether the consequences are bad enough to pay the performance cost.

For example, I'm building image compression tools. They're already potentially very expensive, so I have CPU time and memory limits already imposed on them. CLI tools run interactively are almost impossible to attack, and the attack itself is pointless - at worst a mildly annoyed user will cancel the operation.

bjorn3 · May 15, 2024, 9:35am

There are two big reasons why IMO the default hasher should be random:

It avoids accidentally depending on iteration order, which isn't stable across rustc versions even with a stable hasher.
A seemingly innocent merge of two hashmaps can take O(n^2) when both hashmaps use the same seed for the hasher even without an attacker having to generate adversarial inputs based on the exact hasher that was used: Exposure of HashMap iteration order allows for O(n²) blowup. · Issue #36481 · rust-lang/rust · GitHub

kornel · May 15, 2024, 2:26pm

This is an orthogonal issue. The seed can stay random and prevent these issues even if the hash function is cryptographically weak.

josh · May 15, 2024, 3:41pm

I've seen proposals for making the default be "start out being fast, but switch to DoS-resistant if it looks like we're getting substantial collisions". If we can do that, that seems like the right default.

Jules-Bertholet · May 16, 2024, 12:22am

One easy win IMO would be to add FxHash to the standard library. Then at least people will know it exists, and won't stick with the slow default purely out of a desire not to add an external dependency.

chrefr · May 16, 2024, 12:24am

I think it's a good idea, but I don't think we should limit ourselves to one algorithm, just like we don't with the current secure hasher. Just add a FastHasher.

scottmcm · May 16, 2024, 12:46am

Yeah, one that's randomized enough to keep people from depending on the details -- so we can change the implementation -- would be nice.

(But maybe call it InsecureHasher, to avoid the "of course I want the fast one" effect.)

That said, deciding which to offer is no easy task. How do you benchmark "all the rust code out there that uses this other one" in any meaningful way? If someone wants a faster hasher they probably want a context-specific one too. (Like the best ones for contiguous runs of keys are different from those for strings, etc.)

Jules-Bertholet · May 16, 2024, 12:49am

I would suggest DosableHasher, as InsecureHasher implies that the default hasher is cryptographically secure, which it is not.

Vorpal · May 16, 2024, 7:11am

I thought ahash was straight upp better than fxhash: aHash/compare/readme.md at master · tkaitchuck/aHash · GitHub

So wouldn't it make more sense to use the best known implementation for implementing an unspecified fast hash (that can be changed in the future)?

SkiFire13 · May 16, 2024, 8:16am

Is that benchmark done with the aes target feature enabled or not? From the docs it seems to imply it would improve performance, but given it's disabled by default I would prefer to see how it performs without it.

Noratrieb · May 16, 2024, 4:02pm

There is generally no such thing as a "better hash". Hash functions have tradeoffs of speed and quality. ahash has higher quality than fxhash, while fxhash is quicker to hash. In the workload of rustc, with lots of integer keys, fxhash was significantly faster than ahash.

eggyal · May 16, 2024, 4:19pm

It's fairly easy to drop a type alias type HashMap<K, V, S = FasterHash> = std::collections::HashMap<K, V, S>; in as a direct replacement for use std::collections::HashMap;. Would setting a global default hasher be much more ergonomic than this? (Okay, it'd only need be once per crate instead of once per module).

jrose · May 16, 2024, 4:31pm

I think part of the (original) request was to be able to change HashMaps in dependencies. The conversation has moved on since that, but that’s why a type alias wasn’t immediately the answer.

the8472 · May 16, 2024, 5:02pm

Sounds like another thing that global exclusive features could be used for. Only letting the final consumer (the bin crate) opt into yolo-hash or whatever the feature would be called sounds like the right place, no dependency would be in a position to decide the default for all its consumers.

epage · May 16, 2024, 5:12pm

The following RFCs generalize the global allocator mechanism so its easier to add additional uses, like hashing algorithms

Topic		Replies	Views
Pre-RFC: bless the `type FastHashMap` pattern	92	4113	March 25, 2019
Help wanted: fast hash maps in std	29	6135	March 25, 2019
A new default Hasher for HashMap? libs	26	6949	December 8, 2019
Make Hasher portable? (mini RFC) libs	37	2748	March 25, 2019
RandomState::new_non_random libs	19	1828	May 21, 2021

Global default hash function override for HashMap and HashSet

Related topics