Help harden HashMap in libstd!

[quote=“dobenour, post:19, topic:4138, full:true”] @briansmith please check this, but I think that incrementing the seed to SipHash (not the key) would solve the problem, inasmuch as as SipHash is considered to be a strong PRF (pseudorandom function) and that is not affected by any of these attacks. Since SipHash is a PRF, changing even one bit of the input completely changes the output. By “seed”, I mean the common prefix that is used by all hasher invocations.[/quote]

Sounds plausible.

It seems strange to me to add this extra complexity to every single hash table, just in case merges another hash table with attacker-controlled contents into it. I’m curious what other implementations do or don’t do to avoid this.

1 Like

Aren’t we going to replace SipHash anyway?

Also, why not cuckoo hashing instead of robin hood? edit: links

basic theory web.stanford.edu/class/cs166/lectures/13/Small13.pdf original paper (.pdf) some concurrent work https://da-data.blogspot.nl/2013/03/optimistic-cuckoo-hashing-for.html

@Ofekmeister one just have to prove the Cuckoo benefits and open the PR.

Personally I think chucko hash will perform worse on average. There's this excellent paper with a comprehensive-ish analysis https://infosys.cs.uni-saarland.de/publications/p249-richter.pdf but they don't store the hash on the table and that's really a shame because doing so really unlocks the best optimizations for RH. I'm working on optimizing the current implementation, for instance https://github.com/rust-lang/rust/pull/36692

With what? I'm not aware of any plans to change the default hashing algorithm.

Performance is a concern, but not at the cost of security (in this particular case).

We have verified the bit distribution and avalanche properties of HighwayHash. A formal security analysis is pending publication, though new cryptanalysis tools may still need to be developed for further analysis.

It might be worth waiting a bit before jumping the gun, here.

And in any case, it would not solve the issue at hand. The collision attacks here come not from a weak hash, but from the ability to observe the distribution of keys in the buckets of two maps using a similar distribution and injecting a cluster from one into the cluster (in a close enough position) of the other.

Changing the hash algorithm does not solve the issue. Seeding it differently might (if it is strong).

Assuming that the computer even have (SSE4/AVX2) the crossing point seems to be at 10, so a sizable amount of real word cases would actually be slower.

And std lib actually uses the sip13 variant, so that crossing point would be even larger.

1 Like

I’ve posted a summary comment with the discussion of the libs team and our thoughts on how to proceed here.

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.