Help harden HashMap in libstd!

briansmith · October 3, 2016, 6:08am

[quote=“dobenour, post:19, topic:4138, full:true”] @briansmith please check this, but I think that incrementing the seed to SipHash (not the key) would solve the problem, inasmuch as as SipHash is considered to be a strong PRF (pseudorandom function) and that is not affected by any of these attacks. Since SipHash is a PRF, changing even one bit of the input completely changes the output. By “seed”, I mean the common prefix that is used by all hasher invocations.[/quote]

Sounds plausible.

It seems strange to me to add this extra complexity to every single hash table, just in case merges another hash table with attacker-controlled contents into it. I’m curious what other implementations do or don’t do to avoid this.

Ofekmeister · October 5, 2016, 3:21am

Aren’t we going to replace SipHash anyway?

Also, why not cuckoo hashing instead of robin hood? edit: links

basic theory web.stanford.edu/class/cs166/lectures/13/Small13.pdf original paper (.pdf) some concurrent work https://da-data.blogspot.nl/2013/03/optimistic-cuckoo-hashing-for.html

arthurprs · October 5, 2016, 10:59am

@Ofekmeister one just have to prove the Cuckoo benefits and open the PR.

Personally I think chucko hash will perform worse on average. There's this excellent paper with a comprehensive-ish analysis https://infosys.cs.uni-saarland.de/publications/p249-richter.pdf but they don't store the hash on the table and that's really a shame because doing so really unlocks the best optimizations for RH. I'm working on optimizing the current implementation, for instance https://github.com/rust-lang/rust/pull/36692

sfackler · October 5, 2016, 1:29pm

With what? I'm not aware of any plans to change the default hashing algorithm.

matthieum · October 5, 2016, 6:05pm

Performance is a concern, but not at the cost of security (in this particular case).

We have verified the bit distribution and avalanche properties of HighwayHash. A formal security analysis is pending publication, though new cryptanalysis tools may still need to be developed for further analysis.

It might be worth waiting a bit before jumping the gun, here.

And in any case, it would not solve the issue at hand. The collision attacks here come not from a weak hash, but from the ability to observe the distribution of keys in the buckets of two maps using a similar distribution and injecting a cluster from one into the cluster (in a close enough position) of the other.

Changing the hash algorithm does not solve the issue. Seeding it differently might (if it is strong).

arthurprs · October 5, 2016, 6:35pm

Assuming that the computer even have (SSE4/AVX2) the crossing point seems to be at 10, so a sizable amount of real word cases would actually be slower.

And std lib actually uses the sip13 variant, so that crossing point would be even larger.

alexcrichton · October 25, 2016, 9:43pm

I’ve posted a summary comment with the discussion of the libs team and our thoughts on how to proceed here.

alexcrichton · March 25, 2019, 8:27am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Help wanted: fast hash maps in std	29	6077	March 25, 2019
A new default Hasher for HashMap? libs	26	6752	December 8, 2019
Pre-RFC: The amortized hashing strategy ideas (deprecated)	18	4075	March 25, 2019
Pre-RFC: bless the `type FastHashMap` pattern	92	3949	March 25, 2019
Global default hash function override for HashMap and HashSet libs	26	1091	August 20, 2024

Help harden HashMap in libstd!

Related topics