Help harden HashMap in libstd!

#21

[quote=“dobenour, post:19, topic:4138, full:true”] @briansmith please check this, but I think that incrementing the seed to SipHash (not the key) would solve the problem, inasmuch as as SipHash is considered to be a strong PRF (pseudorandom function) and that is not affected by any of these attacks. Since SipHash is a PRF, changing even one bit of the input completely changes the output. By “seed”, I mean the common prefix that is used by all hasher invocations.[/quote]

Sounds plausible.

It seems strange to me to add this extra complexity to every single hash table, just in case merges another hash table with attacker-controlled contents into it. I’m curious what other implementations do or don’t do to avoid this.

1 Like
#22

Aren’t we going to replace SipHash anyway?

Also, why not cuckoo hashing instead of robin hood? edit: links

basic theory web.stanford.edu/class/cs166/lectures/13/Small13.pdf original paper (.pdf) some concurrent work https://da-data.blogspot.nl/2013/03/optimistic-cuckoo-hashing-for.html

#23

@Ofekmeister one just have to prove the Cuckoo benefits and open the PR.

Personally I think chucko hash will perform worse on average. There’s this excellent paper with a comprehensive-ish analysis https://infosys.cs.uni-saarland.de/publications/p249-richter.pdf but they don’t store the hash on the table and that’s really a shame because doing so really unlocks the best optimizations for RH. I’m working on optimizing the current implementation, for instance https://github.com/rust-lang/rust/pull/36692

#24

With what? I’m not aware of any plans to change the default hashing algorithm.

#25

Performance is a concern, but not at the cost of security (in this particular case).

We have verified the bit distribution and avalanche properties of HighwayHash. A formal security analysis is pending publication, though new cryptanalysis tools may still need to be developed for further analysis.

It might be worth waiting a bit before jumping the gun, here.

And in any case, it would not solve the issue at hand. The collision attacks here come not from a weak hash, but from the ability to observe the distribution of keys in the buckets of two maps using a similar distribution and injecting a cluster from one into the cluster (in a close enough position) of the other.

Changing the hash algorithm does not solve the issue. Seeding it differently might (if it is strong).

#26

Assuming that the computer even have (SSE4/AVX2) the crossing point seems to be at 10, so a sizable amount of real word cases would actually be slower.

And std lib actually uses the sip13 variant, so that crossing point would be even larger.

1 Like
#27

I’ve posted a summary comment with the discussion of the libs team and our thoughts on how to proceed here.

closed #28

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.