Pre-RFC: bless the `type FastHashMap` pattern

JeffBurdges · November 30, 2018, 12:07pm

AttackableHashMap is a short non-abbreviation that clearly captures the problem.

I think VulnerableHashMap and InsecureHashMap sound fine too. And the abbreviation NonCRHashMap works too.

WeakerHashMap works but communicates less. Fast is the wrong word because collision resistance keeps the map fastish under DoS. Unsafe is the wrong word because unsafe refers to memory unsafety.

pachi · November 30, 2018, 12:31pm

IMHO, NonCRHashMap is very nice. It is short enough and still makes you wonder (and go to the docs) what that ‘Non’ prefix means in terms of missing functionality, and why you would choose something that lacks such a property.

alexheretic · November 30, 2018, 12:51pm

It’d be great to bless the use of FxHashMap. I’d suggest deprecating fn new(), replacing with fn new_secure() & fn new_fast(). Along with type aliases SecureHashMap & FastHashMap.

This way you state a preference and the constructor names clearly state the intended benefit. It answers the question “why would I not use ‘fast’?” - because it isn’t as secure, “why would I not use ‘secure’?” - because it isn’t as fast.

Or just use HashMap::default() if you don’t care.
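
A rough sketch of what the two aliases and constructors could look like in user code today. Everything here is an assumption for illustration: FnvHasher is a small hand-written FNV-1a hasher standing in for FxHasher (which lives outside std), and new_secure / new_fast are free functions because inherent methods cannot be added to HashMap from outside std.

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Stand-in for FxHasher: plain 64-bit FNV-1a. Unkeyed, so not DoS-resistant.
#[derive(Clone, Copy)]
pub struct FnvHasher(u64);

impl Default for FnvHasher {
    fn default() -> Self {
        FnvHasher(0xcbf29ce484222325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for FnvHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x100000001b3); // FNV 64-bit prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// The two aliases from the suggestion above.
pub type SecureHashMap<K, V> = HashMap<K, V, RandomState>;
pub type FastHashMap<K, V> = HashMap<K, V, BuildHasherDefault<FnvHasher>>;

/// Hypothetical free-function versions of the proposed constructors.
pub fn new_secure<K, V>() -> SecureHashMap<K, V> {
    HashMap::new()
}

pub fn new_fast<K, V>() -> FastHashMap<K, V> {
    HashMap::default()
}
```

Both aliases are still plain HashMap, so the whole existing API carries over; only the third type parameter differs.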

Gankra · November 30, 2018, 1:43pm

We are absolutely not deprecating new.

alexheretic · November 30, 2018, 1:50pm

Well the idea still works even keeping new. Basically having secure alongside fast in autocomplete, each suggests the reason for the other.

I’d still mark new for removal in the next edition, but it isn’t the crux of the argument.

JeffBurdges · November 30, 2018, 2:18pm

Anytime you write HashMap<K,V> you actually get HashMap<K,V,RandomState>, which then assures collision resistance in the type system. We’d need some new AttackableState: BuildHasher, which works exactly like RandomState but uses a faster hasher, like:

pub struct AttackableState(RandomState);
impl BuildHasher for AttackableState {
    type Hasher = AttackableHasher;
    fn build_hasher(&self) -> AttackableHasher {
        AttackableHasher(FxHasher::new_with_keys(self.0.k0, self.0.k1))
    }
}

type AttackableHashMap<K,V> = HashMap<K,V,AttackableState>;
type AttackableHashSet<K> = HashSet<K,AttackableState>;

or whatever color gets selected.

If I understand, you’re proposing to distinguish between these types with inherent methods, like:

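impl<K, V> HashMap<K, V, RandomState> {
    pub fn new_secure() -> HashMap<K, V, RandomState> { HashMap::new() }
}
impl<K, V> HashMap<K, V, AttackableState> {
    // HashMap::new() exists only for RandomState, so build from the hasher state instead;
    // this assumes AttackableState: Default.
    pub fn new_attackable() -> HashMap<K, V, AttackableState> { HashMap::default() }
}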
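
For anyone who wants to play with this outside std, here is a minimal sketch under some loud assumptions. RandomState does not expose its keys, so this version derives a per-map seed by finishing an empty hasher built from a fresh RandomState. AttackableHasher is a hypothetical multiply-rotate mix in the style of FxHasher, not the real thing, and the seeding adds no real collision resistance.

```rust
use std::collections::hash_map::RandomState;
use std::collections::{HashMap, HashSet};
use std::hash::{BuildHasher, Hasher};

/// Hypothetical fast hasher: rotate, xor, multiply per byte. Not the real FxHasher.
pub struct AttackableHasher(u64);

impl Hasher for AttackableHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 = (self.0.rotate_left(5) ^ u64::from(b)).wrapping_mul(0x517cc1b727220a95);
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Hypothetical builder that carries one random seed per map.
#[derive(Clone)]
pub struct AttackableState {
    seed: u64,
}

impl Default for AttackableState {
    fn default() -> Self {
        // RandomState keeps its keys private, so take the output of an
        // empty keyed hash as the seed instead of reading k0 / k1.
        AttackableState {
            seed: RandomState::new().build_hasher().finish(),
        }
    }
}

impl BuildHasher for AttackableState {
    type Hasher = AttackableHasher;

    fn build_hasher(&self) -> AttackableHasher {
        AttackableHasher(self.seed)
    }
}

pub type AttackableHashMap<K, V> = HashMap<K, V, AttackableState>;
pub type AttackableHashSet<K> = HashSet<K, AttackableState>;
```

Construction goes through AttackableHashMap::default() here, since HashMap::new() is only defined for the RandomState case.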

newpavlov · November 30, 2018, 2:29pm

I like WeakHashMap: it is short enough, has a clear negative connotation, can be explained as “weak hash” + “map”, and reuses the Java name, so it will be immediately familiar to some. I don’t think that similarity to WeakMap from JS is a big issue, and it certainly does not warrant long unwieldy names.

alexheretic · November 30, 2018, 2:38pm

In Java’s WeakHashMap, the weakness refers to the garbage collectability of the keys. In Rust, “weak” is used the same way in Rc. So it isn’t ideal.

Ixrec · November 30, 2018, 2:47pm

I’m strongly of the opinion that any name involving “weak” is a non-starter here, because “weak” already refers to weak ownership in C++, Java, C#, Haskell, Kotlin, JavaScript, and Rust itself (and apparently D doesn’t have this by any name). JavaScript was merely following existing usage when they picked that name.

newpavlov · November 30, 2018, 3:06pm

Yes, collision with rc::Weak will be unfortunate, but I think “weak” is too common a word to reserve it just for this use-case. Another alternative could be VulnHashMap: it has the same length, and reading “vuln” as “vulnerable” shouldn’t be an issue. We could use the more explicit VulnerableHashMap, but it’s a bit too unwieldy for my taste.

scottmcm · November 30, 2018, 8:48pm

As another example, in dotnet you almost always want SemaphoreSlim, not Semaphore, as the latter is actually an inter-process OS-level thing. I would absolutely expect that FastHashMap is what one should use by default, and the HashMap some legacy thing (perhaps because of API differences).

This is especially true as the edition release is likely to re-emphasize that we have a migration story for language features, but not for the library, so new things getting a name that implies you should use them over the other is absolutely something I'd expect for standard library evolution.

Soni · November 30, 2018, 8:53pm

LinearHashMap?

It’s obviously faster if you only have 3 elements. The Hasher should just be return 0.
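
Taking the joke literally, a "return 0" hasher is only a few lines. This is a sketch; ZeroHasher and the LinearHashMap alias are made-up names for illustration. Since every key gets the same hash, every lookup has to fall back to comparing keys one by one, which is where the "linear" comes from.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Ignores its input and always reports 0, so all keys collide.
#[derive(Default)]
pub struct ZeroHasher;

impl Hasher for ZeroHasher {
    fn write(&mut self, _bytes: &[u8]) {}

    fn finish(&self) -> u64 {
        0
    }
}

/// Hypothetical alias matching the name suggested above.
pub type LinearHashMap<K, V> = HashMap<K, V, BuildHasherDefault<ZeroHasher>>;
```

The map stays correct because HashMap still checks key equality; it just loses everything that makes it a hash map.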

cuviper · November 30, 2018, 8:58pm

Maybe something like SimpleHashMap? Implying that “simple” code is probably faster, but may lead folks to investigate the drawbacks (collision resistance).

Soni · November 30, 2018, 8:59pm

Nope, simpler is usually better for usability and that alone makes it preferred.

cuviper · November 30, 2018, 9:26pm

Other synonyms along that line are possible too, like NaiveHashMap.

Soni · November 30, 2018, 9:31pm

That sounds like it’d be slower.

cuviper · November 30, 2018, 9:51pm

Well, if there are many collisions, it will be slower.

Tom-Phinney · December 1, 2018, 2:53am

That choice also raises the problem of the diaeresis that should be over the i of Naïve to indicate that the ai is not a diphthong.

Soni · December 1, 2018, 3:00am

Use Naiive instead?

cuviper · December 1, 2018, 3:06am

Plain “naive” is the more common English spelling, but if you think that’s troublesome, we can keep looking…
