Pre-RFC: bless the `type FastHashMap` pattern

sfackler · November 29, 2018, 1:39am

Nope

Tom-Phinney · November 29, 2018, 2:59am

Why not put both the pros and cons together in the name: InsecureFastHashMap?

birkenfeld · November 29, 2018, 6:29am

I don't understand this claim that sort_unstable is a negative name. "unstable" is correct to use here, because it's exactly the technical term that describes the behavior.

In contrast, "Insecure" implies all kinds of things to different people, so it's pointless scaremongering. Just imagine the benchmark/language comparison entries, of course using InsecureHashMap. "I thought Rust was safe, why does it require insecurity for its HashMap?"

I really don't get the "let's make it as annoying as possible to use this" mindset of some replies here.

bovine_buddha · November 29, 2018, 7:05am

Hi, first post here on internals, just couldn’t resist the bikeshedding.

How about TrustedHashMap<K,V>, and HashMap::new_trusted? To denote the use case: Use this only for trusted code.

josh · November 29, 2018, 8:31am

Could we fix that, please? Use fast hashing until enough collisions occur and the table becomes imbalanced, and then switch to secure hashing?

Then we get the best of both worlds in every case: speed and safety.

H2CO3 · November 29, 2018, 10:29am

That's not the mindset. The mindset is that giving the faster but not DOS-resistant alternative a more positive or positive only name will encourage people to pick it, because it communicates that "this is unconditionally better". Users will think along the lines of "Why wouldn't I just always use the faster one?" – and sure they will.

The point is not to make it "as annoying as possible" – please don't put words in our mouth. The point is to be very clear about the fact that speed comes with a trade-off in security.

That's a problem with any sort of natural language expression used in programming, there's no way we can get around it. "Function", "trait", "structure", and even "unsafe" mean different things to different people. The point isn't to come up with a name which can communicate every single minute usage detail of the type at first glance without reading any of its documentation. Rather, its purpose should be that people notice that it's not something they can/should use as a default without further consideration.

Just like most users, benchmark authors shouldn't use InsecureFastHashMap by default, or at least not exclusively. They should ideally show the performance of both the slower but DOS-resistant and the faster but prone-to-attacks type, noting the difference.

kornel · November 29, 2018, 11:04am

WeakMap has a different meaning, but maybe WeakerHashMap would work? A reference is either weak or not, so gradation makes the term not apply to references.

felix.s · November 29, 2018, 11:41am

Maybe DetmHashMap, for ‘deterministic’? Gankro’s proposal already specifies that determinism should be mentioned in public documentation, and the term is not loaded with conspicuous affect, so it’s not immediately encouraging (or discouraging). Someone familiar with the relevant security issues will see the word ‘deterministic’ and understand the trade-off involved immediately; someone unfamiliar with them will be encouraged to look it up the documentation (where I hope it will be sufficiently explained).

Tom-Phinney · November 29, 2018, 3:08pm

Since the current one is HashMap, why not call the other what it is: DosProneFasterHashMap? That way both the performance advantage and the class of security susceptibility are both made apparent at the single point where the exact type is selected.

djc · November 29, 2018, 7:26pm

Also a bit off-topic, but if we start adding more things to prelude, maybe it would go about it in a more structured fashion (doing something like crater to analyze what things people typically import from std) rather than being haphard about it (although I do agree that HashMap/HashSet are attractive candidates – and the data would probably show it, too).

Diggsey · November 29, 2018, 8:07pm

What about TrustedHashMap? I think this strikes a good balance in that it’s not too negative, whilst being descriptive about what the limitations are - you should only fill it with keys from a trusted source.

RustyYato · November 29, 2018, 8:12pm

I see TrustedHashMap as positive, as in this is a trustworthy HashMap, so it seems a bit off to me.

Soni · November 29, 2018, 8:36pm

OverexertingHashMap? This indicates that it’s faster, but also harmful.

SimonSapin · November 29, 2018, 9:12pm

That would risk creating performance cliffs: a program could perform well until the input data reaches some hard-to-predict criteria, then it "falls off the cliff" and everything becomes slow(er) for no apparent reason.

josh · November 29, 2018, 9:49pm

It’s not a hard-to-predict criteria; it happens if the table becomes wildly imbalanced, which will only happen if someone is actively attempting to exploit the hash function.

And even after the switch to a secure hash function, the algorithmic speed would remain the same, and the constant factors would just go up.

Even if we want a hash table that always uses insecure hashing, it makes sense for the default secure hash table to optimistically try to go fast as long as it can.

nematsakis · November 30, 2018, 5:50am

Not all programmers think like that; that kind of thinking comes from experience. Others may simply assume that this was a later addition to the standard library that improved upon the performance of the original implementation. This happens all the time in mature language standard librarires (I'm thinking of things like Java's "new io" or python's "urllib2").

I strongly agree with the sentiment that a positive name would tend towards misuse. It's a hard naming problem, but i like the notion of including collision in the name, like CollisionVulnerableHashMap or CollisionSusceptibleHashMap but those are really long names that I'm not seriously suggesting.

matklad · November 30, 2018, 9:04am

I am against “discouragement via unwieldy name”. I think it is reasonable to assume that folks read the docs. There are much simpler ways to introduce DOS vulnerability then a wrong hash-table, for example, one can use an accidentally quadratic algorithm, for example. So, to have a hope of DOS-resistant app, we need to assume some amount of “know what you are doing” already.

The problem of a small fraction of people who don’t read the docs should be solved by inlining the docs into the names of things, at the expense of the majority who does read documentation. It should be solved by making docs more discoverable, by having a culture of understanding things, etc etc.

What I am worried about, though, is that folks who do know what they are doing would pick Fast by default, because that’s an easier choice to make, all else being equal. I centrally can see me picking Fast because “nah, this piece of code will never experience the Internet” in cases where performance does not actually matter. That’s why I like “longer to import” form of discouragement: it makes you to do a conscious decision when picking Fast, but does not strain your eyes when reading the code.

SimonSapin · November 30, 2018, 9:31am

If we’re really that confident that it can never happen accidentally, then indeed the perf cliff is not really a concern.

glaebhoerl · November 30, 2018, 10:41am

So am I, but I don't think this is what most people were suggesting. What would be desirable is clarity about (at least the existence of) tradeoffs or downsides, as opposed to just "fast". The challenge is how to accomplish that without it being unwieldy.

leonardo · November 30, 2018, 10:45am

HashMap2? LightHashMap?

Topic		Replies	Views
Help wanted: fast hash maps in std	29	6077	March 25, 2019
A new default Hasher for HashMap? libs	26	6752	December 8, 2019
Support turning off hashmap randomness language design	16	2198	January 21, 2024
Global default hash function override for HashMap and HashSet libs	26	1091	August 20, 2024
Help harden HashMap in libstd! libs	27	4383	March 25, 2019

Pre-RFC: bless the `type FastHashMap` pattern

Related topics