Why not put both the pros and cons together in the name:
I don’t understand this claim that
sort_unstable is a negative name. “unstable” is correct to use here, because it’s exactly the technical term that describes the behavior.
In contrast, “Insecure” implies all kinds of things to different people, so it’s pointless scaremongering. Just imagine the benchmark/language comparison entries, of course using
InsecureHashMap. “I thought Rust was safe, why does it require insecurity for its HashMap?”
I really don’t get the “let’s make it as annoying as possible to use this” mindset of some replies here.
Hi, first post here on internals, just couldn’t resist the bikeshedding.
How about TrustedHashMap<K,V>, and HashMap::new_trusted? To denote the use case: Use this only for trusted code.
Could we fix that, please? Use fast hashing until enough collisions occur and the table becomes imbalanced, and then switch to secure hashing?
Then we get the best of both worlds in every case: speed and safety.
That’s not the mindset. The mindset is that giving the faster but not DOS-resistant alternative a more positive or positive only name will encourage people to pick it, because it communicates that “this is unconditionally better”. Users will think along the lines of “Why wouldn’t I just always use the faster one?” – and sure they will.
The point is not to make it “as annoying as possible” – please don’t put words in our mouth. The point is to be very clear about the fact that speed comes with a trade-off in security.
That’s a problem with any sort of natural language expression used in programming, there’s no way we can get around it. “Function”, “trait”, “structure”, and even “unsafe” mean different things to different people. The point isn’t to come up with a name which can communicate every single minute usage detail of the type at first glance without reading any of its documentation. Rather, its purpose should be that people notice that it’s not something they can/should use as a default without further consideration.
Just like most users, benchmark authors shouldn’t use
InsecureFastHashMap by default, or at least not exclusively. They should ideally show the performance of both the slower but DOS-resistant and the faster but prone-to-attacks type, noting the difference.
WeakMap has a different meaning, but maybe
WeakerHashMap would work? A reference is either weak or not, so the comparative form makes it clear the term can’t be referring to references.
DetmHashMap, for ‘deterministic’? Gankro’s proposal already specifies that determinism should be mentioned in the public documentation, and the term is not loaded with conspicuous affect, so it’s not immediately encouraging (or discouraging). Someone familiar with the relevant security issues will see the word ‘deterministic’ and understand the trade-off involved immediately; someone unfamiliar with them will be encouraged to look it up in the documentation (where I hope it will be sufficiently explained).
Since the current one is
HashMap, why not call the other what it is:
DosProneFasterHashMap? That way both the performance advantage and the class of security susceptibility are made apparent at the single point where the exact type is selected.
Also a bit off-topic, but if we start adding more things to the prelude, maybe we should go about it in a more structured fashion (doing something like crater to analyze what things people typically import from
std) rather than being haphazard about it (although I do agree that
HashSet is an attractive candidate – and the data would probably show it, too).
TrustedHashMap? I think this strikes a good balance in that it’s not too negative, whilst being descriptive about what the limitations are - you should only fill it with keys from a trusted source.
I’d read TrustedHashMap as positive, as in “this is a trustworthy HashMap”, so it seems a bit off to me.
OverexertingHashMap? This indicates that it’s faster, but also harmful.
That would risk creating performance cliffs: a program could perform well until the input data reaches some hard-to-predict criterion, then it “falls off the cliff” and everything becomes slow(er) for no apparent reason.
It’s not a hard-to-predict criterion; it happens if the table becomes wildly imbalanced, which will only happen if someone is actively attempting to exploit the hash function.
And even after the switch to a secure hash function, the algorithmic speed would remain the same, and the constant factors would just go up.
Even if we want a hash table that always uses insecure hashing, it makes sense for the default secure hash table to optimistically try to go fast as long as it can.
Not all programmers think like that; that kind of thinking comes from experience. Others may simply assume that this was a later addition to the standard library that improved upon the performance of the original implementation. This happens all the time in mature language standard libraries (I’m thinking of things like Java’s “new io” or Python’s “urllib2”).
I strongly agree with the sentiment that a positive name would tend towards misuse. It’s a hard naming problem, but I like the notion of including collision in the name, like
CollisionSusceptibleHashMap, though those are really long names that I’m not seriously suggesting.
I am against “discouragement via unwieldy name”. I think it is reasonable to assume that folks read the docs. There are much simpler ways to introduce a DOS vulnerability than a wrong hash table; for example, one can use an accidentally quadratic algorithm. So, to have any hope of a DOS-resistant app, we need to assume some amount of “know what you are doing” already.
The problem of a small fraction of people who don’t read the docs should not be solved by inlining the docs into the names of things, at the expense of the majority who do read documentation. It should be solved by making docs more discoverable, by having a culture of understanding things, etc.
What I am worried about, though, is that folks who do know what they are doing would pick
Fast by default, because that’s an easier choice to make, all else being equal. I certainly can see myself picking
Fast because “nah, this piece of code will never experience the Internet” in cases where performance does not actually matter. That’s why I like the “longer to import” form of discouragement: it makes you make a conscious decision when picking
Fast, but does not strain your eyes when reading the code.
If we’re really that confident that it can never happen accidentally, then indeed the perf cliff is not really a concern.
So am I, but I don’t think this is what most people were suggesting. What would be desirable is clarity about (at least the existence of) tradeoffs or downsides, as opposed to just “fast”. The challenge is how to accomplish that without it being unwieldy.