A new default Hasher for HashMap?

#1

I have been working on aHash for a while now. Its performance has steadily improved. As a hashing algorithm for a HashMap, I believe it to be significantly superior to SipHash-1-3, which is the current default.

This is my first time working on such a change, so it would be good to get some advice. What is the best next step? Is there a good way to test out the real-world performance impact? (Like maybe a compiler benchmark?)

Does anyone have any ideas for tests, benchmarks, features, or integrations they would like to see?

1 Like

#2

The HashMap implementation itself may be replaced in the near future: https://github.com/rust-lang/rust/pull/58623 , so I guess you can look at the benchmarks that were used and do the same.

0 Likes

#3

The README is a bit unclear about the properties you’re trying to achieve.

It’s described as a “non-cryptographic hashing algorithm”, which seems to imply that it is not collision resistant, but it simultaneously claims to be “DOS resistant”. To me these claims conflict.

The silver bullet for preventing hashDoS is a cryptographically secure PRF with a random key. Since this construction does not fall into that category, I’m curious what the basis is for claiming it resists hashDoS. Can you provide a rationale for why you believe this is the case?
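For concreteness, std’s current default is such a construction. A minimal sketch of hashing through RandomState, which keys SipHash-1-3 with per-instance random values:

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

fn main() {
    // RandomState seeds SipHash-1-3 with fresh random keys per instance,
    // so an attacker cannot precompute inputs that collide.
    let state = RandomState::new();
    let mut hasher = state.build_hasher();
    "attacker-controlled input".hash(&mut hasher);
    println!("{:016x}", hasher.finish());
}
```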

Have you run this construction by any of the authors of the previous hashDoS papers and gotten their opinion on it? I’m worried because there’s quite a history of constructions put forward as being “hashDoS resistant” which were subsequently broken when given proper scrutiny.

11 Likes

#4

Note that the compiler already doesn’t use the default hasher in most places. You could swap the use of fnv for yours and see if the perf suite comes out better, but that would just be for changing rustc – it wouldn’t say much about whether the default should change, since there are many hashers faster than the default.
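For reference, swapping the hasher behind a HashMap is a one-line type change. A sketch, assuming the ahash crate exposes an `AHasher` that implements std’s `Hasher` and `Default`:

```rust
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

// Hypothetical drop-in, mirroring how rustc aliases its fnv-based maps.
type AHashMap<K, V> = HashMap<K, V, BuildHasherDefault<ahash::AHasher>>;

fn main() {
    let mut map: AHashMap<&str, u32> = AHashMap::default();
    map.insert("query", 1);
    assert_eq!(map["query"], 1);
}
```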

0 Likes

#5
1 Like

#6

Not at all. Consider SHA-1: it’s a cryptographic hash, but it’s not DOS resistant, because it’s trivial to brute force a collision of, say, the lower 20 bits. So if you used SHA-1 as your hash map’s hashcode algorithm, a malicious party could collect a bunch of strings that cause a partial collision and turn your hashmap into a list, effectively DOSing the CPU on your server.
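To make the partial-collision point concrete, here is a minimal sketch of why only the low bits matter; the mask-based indexing mirrors how power-of-two-sized tables pick a bucket:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// A table with 2^20 slots only ever looks at the low 20 bits of the hash,
// so colliding just those bits is enough to pile keys into one bucket.
fn bucket_index(key: &str, capacity: u64) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    h.finish() & (capacity - 1) // power-of-two capacity: mask keeps low bits
}

fn main() {
    let cap = 1 << 20;
    println!("{} {}", bucket_index("a", cap), bucket_index("b", cap));
}
```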

So the criteria are different. DOS resistance implies that even if you know the algorithm, you can’t choose in advance a bunch of keys that will have partial collisions. A cryptographic hash guarantees, for example, that you ought not to be able to ‘undo’ the hash and work out what data produced it. AHash does NOT provide this feature. Obviously AES is a block cipher, so it can be easily reversed (knowing the key).

0 Likes

#7

For the AES version, the claim is based on the assumption that AES is strong. In particular it rests on the assumption that someone not knowing the key won’t be able to generate data that will result in a high probability of collision after two rounds of AES.

Looking at the structure of AES you can immediately see why this is highly likely. The first step is an S-box lookup. If the incoming data is XORed with a key the attacker cannot observe, they cannot control the lookup path, and the resulting data will be totally different. This is only amplified by the shift/mix step, and a second round repeats this process.

AES has been sufficiently analyzed that I don’t think any new predictable patterns will come up that will allow for exploit.
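For readers who haven’t looked at the intrinsics: a full AES round is a single instruction on x86-64. A minimal sketch (the function name is illustrative):

```rust
#[cfg(all(target_arch = "x86_64", target_feature = "aes"))]
unsafe fn mix_one_round(
    state: std::arch::x86_64::__m128i,
    round_key: std::arch::x86_64::__m128i,
) -> std::arch::x86_64::__m128i {
    // One full AES round in one instruction: the S-box substitution,
    // the row-shift/column-mix diffusion step, then an XOR with the key.
    std::arch::x86_64::_mm_aesenc_si128(state, round_key)
}
```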

For the fallback algorithm the case is harder to make, as it is operating on primitives. So what I can say is this: I have taken the time to understand how each of the hashes was broken and did not repeat the same mistake. (I can’t promise I haven’t invented a whole new way of being wrong.) But I do think any attack against it will require a significantly higher level of sophistication.

One reason for this is a very simple test I wrote: I assert that every single bit in the input will end up changing a large enough number of bits, nibbles, and bytes in the output to look like a totally different value. I make a similar assertion about changing every single bit in the key. Every hasher that has been broken that I am aware of would have failed this simple test. It’s easy for an attacker to cause a collision when one bit only affects one other bit: they just have to find a way to cancel it out. But if a single bit flip affects many other bits, and which ones depends on a key they cannot observe, it is much harder to attack.
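A minimal sketch of that sort of avalanche check, using std’s DefaultHasher as a stand-in (aHash’s real test also tracks nibbles and bytes, and the pass band here is illustrative):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn hash_u64(x: u64) -> u64 {
    let mut h = DefaultHasher::new();
    x.hash(&mut h);
    h.finish()
}

fn main() {
    let input: u64 = 0x0123_4567_89ab_cdef;
    let base = hash_u64(input);
    for bit in 0..64 {
        // Flip exactly one input bit and count how many output bits change.
        let changed = (base ^ hash_u64(input ^ (1u64 << bit))).count_ones();
        // A good hash flips about half of the 64 output bits on average;
        // require it to stay inside a generous band around that.
        assert!(
            (16..=48).contains(&changed),
            "flipping bit {} changed only {} output bits",
            bit,
            changed
        );
    }
}
```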

1 Like

#8

You’re describing the properties of collision resistance and preimage resistance respectively. So I guess your claim is that your construction is collision resistant, but not preimage resistant. I’d agree that would be sufficient for this particular use case, however…

For substantiating your security claims, the modern bar in cryptographic research is generally something like a security proof that your hash function is collision resistant so long as AES is a secure PRP.

3 Likes

#9

Oh, it absolutely holds that if a single round of AES is a secure PRP then AHash is provably collision resistant. That part is trivial. The problem is that a single round of AES is NOT a secure PRP! It is fairly distinguishable, even just visually.

0 Likes

#10

I’d agree that it should be fairly trivial to provide a proof your construction is secure if it’s based on a PRP, as a PRP is a powerful building block for avoiding collisions. However…

…you are using AES in a particularly unusual way, which is worrisome from a security perspective.

I would suggest looking through the academic literature for a construction which already has a security proof you can build on or adapt.

Encrypting the output of a universal hash function is a fairly well-understood technique.

2 Likes

#11

@bascule I’ve written up some of my thoughts here: https://github.com/tkaitchuck/aHash/wiki/Attacking-aHash-or-why-it’s-good-enough-for-a-hashmap Treating AES as 4 32-bit SPRPs is plenty secure, as the unpredictability required to prevent a DOS attack is many orders of magnitude lower than what is needed in a cryptographically secure hash.

0 Likes

#12

If I understand correctly, the hash function will differ depending on whether AES is available? Are there situations where this might be a problem, such as serializing a data structure on one machine and deserializing it on another?

0 Likes

#13

The default Hasher used in an instance of HashMap is already randomized through RandomState, so not even two HashMaps created within the same run of a program can have their hashes directly compared with each other (unless they have their BuildHasher overridden). Also, RandomState provides no way to seed with a specific value, so you cannot even attempt to construct two HashMaps using the same Hasher across multiple program runs via it.

If you’re using a Hasher instance for serialization purposes, you already need to choose a concrete implementation that guarantees reproducibility; changing the default for HashMap shouldn’t affect that.
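A minimal sketch of the distinction, assuming reproducibility within a single toolchain is enough (std makes no cross-release stability promise for DefaultHasher):

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

fn main() {
    // Default: each map gets its own random SipHash keys via RandomState,
    // so hashes are not comparable across maps or program runs.
    let _randomized: HashMap<&str, u32, RandomState> = HashMap::new();

    // For serialization-style reproducibility you must pin a concrete hasher;
    // BuildHasherDefault<DefaultHasher> uses fixed keys, so it is
    // deterministic within one toolchain (but not across Rust releases).
    let mut pinned: HashMap<&str, u32, BuildHasherDefault<DefaultHasher>> =
        HashMap::default();
    pinned.insert("key", 1);
}
```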

0 Likes