A new default Hasher for HashMap?

Why not just have a fallback mechanism that uses a fast hash like aHash by default and falls back first to a secure hash and then to a BTreeMap using the concatenation of values passed to the hasher as key?

Because ultimately if the attacker can do perfect timing they can DoS any hash table by simply inserting, looking up and deleting random keys until they find a collision (detectable with timing) and keep doing so until they create an arbitrary number of colliding keys (or force table resizes until address space is exhausted).

Even being able to iterate the hash table in order should be enough to cause collisions with any hash: just add a bunch of keys and then delete all keys except the ones closest to a reference keys, and repeat keeping a bit more keys every iteration: if done right I think it allows to keep the number of colliding keys to the number of total keys within a constant factor.

Yes, and I can break AES by just guessing keys until I get the right one. The question everybody wants to know is if you can do it more quickly than that.

Assuming a hash table with a million entries (and thus, 1,111,112 buckets, but I can round down and just go with a million buckets, too). If I want to get them all to collide, and the hashing function always produces a different output for each input, I would have to make half a million guesses on average to get an item to collide with another one*. Assuming it takes a millisecond** to make each guess, it would take more than a year to force a million collisions.

* Since PRFs might produce the same output twice for each input, it would probably take more guesses. But searching an infinite space is harder.

** Honestly, most networks have more than a millisecond of latency, too.

3 Likes

Yes, online brute-force search is alway possible; that was already discussed, but the TL;DR is that a proper keyed hash doesn't introduce a DoS amplification factor: the adversary has to insert on the order of 2³² values (around 4.3 trillions) before hitting the first collision with non-neglible probability (assuming a 64b PRF such as SipHash); for an adversary looking for partial collisions, @notriddle already provided you the math.

hashDoS is a specific class of vulnerabilities where the adversary can compute, offline, many values colliding to the same hash, and only then cause the remote service to insert them all in a hash table, causing O(n²) ressource usage (with n the number of multicollisions).

In the case of CityHash and MurmurHash, this is possible because the hashDoS team discovered (through differential cryptanalysis, IIRC) an efficient attack that yields universal multicollisions: multiple values that collide under any hash key.

2 Likes

Purely out of curiosity, do you know if anyone has written up those attacks?

I can ask the researcher. It'd definitely be nice if there was more public info on them.

You only need many keys to end in the same bucket to slow down an hash table from O(1) to O(n), not have the same hash, so you only need to insert O(B) keys where B is the size of the hash table / number of buckets.

For the online attack, vulnerability depends on the protocol, but an efficient protocol with bulk inserts and support for accurate timing (e.g. one that lets you upload an arbitrary JITted script where you have an API with access to the hash table and a CPU cycle counter) should be able to result in somewhat practical DoS at least for small tables.