Global default hash function override for HashMap and HashSet

Vorpal · May 16, 2024, 8:05pm

It has been a while, but I seem to remember that ::new no longer works with a non-default hasher; you have to call ::with_hasher etc.

And of course, if a dependency is using a hash map, you now depend on it exposing a way of selecting the hasher.
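
For reference, a minimal sketch of the constructor difference described above. The alias `FixedState` and both function names are made up for illustration; `HashMap::with_hasher`, `HashMap::default` and `BuildHasherDefault` are the std APIs.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

// Hypothetical alias: a hasher builder with fixed (non-random) keys.
type FixedState = BuildHasherDefault<DefaultHasher>;

/// Builds a small map that uses a non-default hasher builder.
pub fn build_with_custom_hasher() -> HashMap<&'static str, u32, FixedState> {
    // `HashMap::new()` is only defined for the default `RandomState`,
    // so this line would not compile for our map type:
    // let mut map: HashMap<&str, u32, FixedState> = HashMap::new();

    // `with_hasher` is generic over the hasher builder instead.
    let mut map = HashMap::with_hasher(FixedState::default());
    map.insert("a", 1);
    map.insert("b", 2);
    map
}

/// `Default` also works for any hasher builder that implements `Default`.
pub fn build_with_default() -> HashMap<&'static str, u32, FixedState> {
    let mut map: HashMap<&'static str, u32, FixedState> = HashMap::default();
    map.insert("c", 3);
    map
}
```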

robertbastian · May 21, 2024, 1:27pm

Each instance of a hashmap has different requirements wrt speed and security, so I don't think a global flag is a solution. When pulling in a library, it's not possible to know whether DoS resistance is required for security, or even for correctness.

We need crate authors to think about hashes, and I think they (including myself) currently don't, because the ergonomics are bad. HashMap doesn't make you think about it, and pulling in an extra crate to get performance that you might not even care about yourself is not something that authors will do. Similarly for clippy lints, as soon as it's opt-in, it will not happen across the ecosystem.

The only path forward I'm seeing is to add a faster hasher to std, and to remove the default type parameter in an edition bump. For convenience there could be type aliases like SipHashMap and FxHashMap or something.
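
A rough sketch of what such aliases could look like in user code today. `ToyFxHasher` is a made-up multiply-and-rotate hasher standing in for a real fast hasher, and the alias names are the ones floated above, not existing std items. Note that std's `RandomState` is currently SipHash-based, but the docs do not guarantee the algorithm.

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Hypothetical fast hasher in the style of Fx hashing.
/// Not collision resistant; for illustration only.
#[derive(Default, Clone, Copy)]
pub struct ToyFxHasher {
    state: u64,
}

// Assumed odd multiplier; any constant with good bit mixing would do here.
const SEED: u64 = 0x51_7c_c1_b7_27_22_0a_95;

impl Hasher for ToyFxHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.state = (self.state.rotate_left(5) ^ u64::from(b)).wrapping_mul(SEED);
        }
    }

    fn finish(&self) -> u64 {
        self.state
    }
}

/// Explicitly DoS-resistant map: std's randomly keyed default.
pub type SipHashMap<K, V> = HashMap<K, V, RandomState>;

/// Explicitly fast, non-resistant map.
pub type FxHashMap<K, V> = HashMap<K, V, BuildHasherDefault<ToyFxHasher>>;

/// The author picks the trade-off by name at the use site.
pub fn count_words(text: &str) -> FxHashMap<&str, usize> {
    let mut counts = FxHashMap::default();
    for word in text.split_whitespace() {
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
}
```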

kornel · May 22, 2024, 3:01pm

I don't think that quite solves the problem, because:

- Library authors don't know how their library is going to be used. Maybe the library is going to be exposed to the internet, maybe it will be used in an off-line desktop app with trusted data, or maybe it will be used only at build time.
- Being agnostic about hasher is tedious. It proliferates generic arguments, and makes type inference flaky.

So in practice it'd be easiest and safest to just import SipHashMap and we're back to square one.
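
To make the second point concrete, here is a small sketch of a hasher-agnostic API. The function names are invented for the example; the point is only that `S` spreads to every signature and that the caller has to name it somewhere.

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashMap;
use std::hash::BuildHasher;

/// Hasher-agnostic: every function touching the map carries `S`.
pub fn tally<S: BuildHasher>(counts: &mut HashMap<String, u32, S>, key: &str) {
    *counts.entry(key.to_owned()).or_insert(0) += 1;
}

/// Hasher-agnostic constructor: the caller must now pick `S`.
pub fn new_tally<S: BuildHasher + Default>() -> HashMap<String, u32, S> {
    HashMap::default()
}

pub fn usage() -> u32 {
    // Without an annotation, nothing pins `S` down, because default type
    // parameters do not drive inference:
    // let mut counts = new_tally(); // error: type annotations needed
    let mut counts = new_tally::<RandomState>();
    tally(&mut counts, "x");
    tally(&mut counts, "x");
    counts["x"]
}
```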

CAD97 · May 22, 2024, 6:26pm

Another subtle issue is that using a globally configured hasher is going to be a not insignificant hit to performance if it isn't done by propagating an S: BuildHasher generic around. Specifically, it has to go through some kind of dynamic dispatch like dyn Hasher, which blocks inlining of the cheap hash function you want to be used.
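
A sketch of the two shapes being contrasted. `hash_dynamic` and its factory parameter are hypothetical stand-ins for whatever a globally configured hasher would look like; no such mechanism exists in std.

```rust
use std::hash::{BuildHasher, Hash, Hasher};

/// Generic path: monomorphized per `S`, so the hasher's methods are
/// statically known and can be inlined into the caller.
pub fn hash_static<S: BuildHasher>(build: &S, value: u64) -> u64 {
    build.hash_one(value)
}

/// Hypothetical "configured at run time" path: the concrete hasher is
/// hidden behind `dyn Hasher`, so each write is an indirect call.
pub fn hash_dynamic(make: &dyn Fn() -> Box<dyn Hasher>, value: u64) -> u64 {
    let mut hasher = make();
    // `Box<dyn Hasher>` itself implements `Hasher` and forwards every call
    // through the vtable.
    value.hash(&mut hasher);
    hasher.finish()
}
```

Both functions compute the same hash for the same underlying hasher; the difference is only in what the optimizer can see.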

Vorpal · May 22, 2024, 8:11pm

I was wondering about that earlier, but I assumed it had already been solved for the global allocator, since presumably we don't want to do dynamic dispatch there either? Or is allocation a blocker for inlining? Seems we might want to solve that anyway, especially if LTO doesn't handle it properly.

CAD97 · May 22, 2024, 11:59pm

It's not fully dynamic dispatch, since it's fixed at link time, so LTO could theoretically do some further inlining. (Essentially, each method on GlobalAlloc gets an extern fn defined by #[global_allocator] which is used to perform allocation.) However, allocation is still treated very specially.

Global allocation isn't a library call, it's a fundamental operation provided by the language runtime. This is required in order to justify optimizations such as removing allocations; allocation is not an observable effect of the program. What this means exactly is still an open question.
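
For comparison, a minimal sketch of how a global allocator is registered. The `Counting` type and the counter are made up for the example. Because the compiler is allowed to remove allocations, a count like this is not something a program can rely on in general; `black_box` is used only to discourage that here.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical allocator that counts allocations and forwards to the
/// system allocator.
pub struct Counting;

pub static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// One allocator per final binary, fixed at link time. `Box::new` and
// friends reach it without any generic parameter or `dyn` in user code.
#[global_allocator]
static GLOBAL: Counting = Counting;

/// Returns how many allocations were counted while boxing one value.
pub fn allocations_for_one_box() -> usize {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    let boxed = std::hint::black_box(Box::new(7u64));
    let after = ALLOCATIONS.load(Ordering::Relaxed);
    drop(boxed);
    after - before
}
```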

And global allocation also benefits from the fact that all of its state is global. Whereas with hashers, even ignoring BuildHasher itself, you have state local to each hash_one operation that needs to be shaped the same if it's going to be compiled non-generic.
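
A small sketch of that per-operation state. `TinyHasher` is invented for the example. The size of `DefaultHasher` is an implementation detail of std, so the code only compares the two sizes and does not assume an exact number.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{BuildHasher, Hasher};
use std::mem::size_of;

/// Hypothetical minimal hasher: its whole per-operation state is one `u64`.
#[derive(Default)]
pub struct TinyHasher(u64);

impl Hasher for TinyHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 = self.0.rotate_left(5) ^ u64::from(b);
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Sizes of the hasher state that lives on the stack during one hash
/// operation, for the toy hasher and for std's default hasher.
pub fn state_sizes() -> (usize, usize) {
    (size_of::<TinyHasher>(), size_of::<DefaultHasher>())
}

/// `hash_one` builds a fresh hasher, feeds it the value, and finishes it.
/// Generic code gets a correctly sized hasher for each `S`.
pub fn hash_with<S: BuildHasher>(build: &S, value: &str) -> u64 {
    build.hash_one(value)
}
```

Non-generic code would have to reserve space for the largest hasher it might be configured with, or box the state.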

This doesn't call for a #[global_hasher]; it calls for an implicit generic S: BuildHasher, so that code can be monomorphic over the hasher without the syntactic overhead.

system · August 20, 2024, 11:59pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
