Global default hash function override for HashMap and HashSet

Although Rust's default hash function (SipHash) is good in that it's DOS resistant, it's much slower than many high-quality, non-cryptographic functions out there. For many problem domains, DOS resistance isn't a concern, meaning devs in these areas are paying a performance price for something they don't ever benefit from. There are many programs that have lots and lots of HashMaps and HashSets without any need for any of them to be DOS resistant.

I propose that Rust allow developers to override the default hash function globally, perhaps in a similar way as they currently override the default allocator. This would allow developers to choose a better hash function for their needs without having to change the type signature for every single occurrence of HashMap/HashSet.

5 Likes

But the choice of whether some code needs HashDOS-resistant hashing function is not global, it's per map. So changing it globally can affect places where it was meant to be secure.

The only viable way I see to do that is to provide a GuaranteedDosResistant hasher, but at that point we can just always make the default hasher fast, and it doesn't solve the original motivation - protecting people that do not know they need to secure that. Also, it's a breaking change to require a specific hash for this.

4 Likes

Yeah I don't think this should be a global toggle.

What we can do instead is add a clippy restriction lint that triggers on HashMaps that don't manually specify a hasher.

Then you can turn on this lint in your repository if you want to be forced to make a decision for every HashMap you use.

3 Likes

People would still be able to set the hash function on a per map/set basis, the only thing changing here would be the default that it falls back to if they don't. And yes, changing it globally would affect places where it was meant to be secure, but the developer would be well aware of that. The point here is that, for many applications, there are few to no places where it's meant to be secure, and migration (changing those few-to-none places to use the SipHash hasher) would be easy.

HashMap is already something you have to manually import. You can always create a type alias for the "fast version" and use that everywhere in your program.

3 Likes

I agree. This would be super useful.

I'm trying to use fxhash and ahash in my code, but I can't easily configure all my dependencies, and incompatibilities between crate interfaces are annoying.

Even though Rust code can be hasher-agnostic in theory, in practice this makes type inference fail too often, and makes struct and function definitions look more complicated.

Another usecase would be for people new to Rust who benchmark it against other languages. Rust can look slow compared to implementations that use weaker hashers, and trying a global built-in option is easier to recommend than changing code to use 3rd party crates.


It's hard to argue for a globally "less secure" option, because it's impossible to prove that any hypothetical program will never encounter any malicious data. It's always possible to imagine some convoluted counter-example scenario.

However, more pragmatically, people building executables can judge what is the likelihood of such attack happening, and whether the consequences are bad enough to pay the performance cost.

For example, I'm building image compression tools. They're already potentially very expensive, so I have CPU time and memory limits already imposed on them. CLI tools run interactively are almost impossible to attack, and the attack itself is pointless - at worst a mildly annoyed user will cancel the operation.

4 Likes

There are two big reasons why IMO the default hasher should be random:

2 Likes

This is an orthogonal issue. The seed can stay random and prevent these issues even if the hash function is cryptographically weak.

5 Likes

I've seen proposals for making the default be "start out being fast, but switch to DoS-resistant if it looks like we're getting substantial collisions". If we can do that, that seems like the right default.

1 Like

One easy win IMO would be to add FxHash to the standard library. Then at least people will know it exists, and won't stick with the slow default purely out of a desire not to add an external dependency.

5 Likes

I think it's a good idea, but I don't think we should limit ourselves to one algorithm, just like we don't with the current secure hasher. Just add a FastHasher.

5 Likes

Yeah, one that's randomized enough to keep people from depending on the details -- so we can change the implementation -- would be nice.

(But maybe call it InsecureHasher, to avoid the "of course I want the fast one" effect.)

That said, deciding which to offer is no easy task. How do you benchmark "all the rust code out there that uses this other one" in any meaningful way? If someone wants a faster hasher they probably want a context-specific one too. (Like the best ones for contiguous runs of keys are different from those for strings, etc.)

2 Likes

I would suggest DosableHasher, as InsecureHasher implies that the default hasher is cryptographically secure, which it is not.

I thought ahash was straight upp better than fxhash: aHash/compare/readme.md at master · tkaitchuck/aHash · GitHub

So wouldn't it make more sense to use the best known implementation for implementing an unspecified fast hash (that can be changed in the future)?

Is that benchmark done with the aes target feature enabled or not? From the docs it seems to imply it would improve performance, but given it's disabled by default I would prefer to see how it performs without it.

There is generally no such thing as a "better hash". Hash functions have tradeoffs of speed and quality. ahash has higher quality than fxhash, while fxhash is quicker to hash. In the workload of rustc, with lots of integer keys, fxhash was significantly faster than ahash.

1 Like

It's fairly easy to drop a type alias type HashMap<K, V, S = FasterHash> = std::collections::HashMap<K, V, S>; in as a direct replacement for use std::collections::HashMap;. Would setting a global default hasher be much more ergonomic than this? (Okay, it'd only need be once per crate instead of once per module).

I think part of the (original) request was to be able to change HashMaps in dependencies. The conversation has moved on since that, but that’s why a type alias wasn’t immediately the answer.

2 Likes

Sounds like another thing that global exclusive features could be used for. Only letting the final consumer (the bin crate) opt into yolo-hash or whatever the feature would be called sounds like the right place, no dependency would be in a position to decide the default for all its consumers.

3 Likes

The following RFCs generalize the global allocator mechanism so its easier to add additional uses, like hashing algorithms