Global default hash function override for HashMap and HashSet

It has been a while, but I seem to remember that ::new no longer works once you pick a non-default hasher; you have to call ::with_hasher etc.
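For reference, a minimal sketch of how this looks in current std: `HashMap::new` is only defined for the default `RandomState` hasher, so a map with any other `BuildHasher` has to be built with `with_hasher` or `default`. The alias `FixedKeyMap` and both helper functions are made up for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

// Hypothetical alias: a map whose hasher is built via `Default`
// instead of the randomly seeded `RandomState`.
type FixedKeyMap<K, V> = HashMap<K, V, BuildHasherDefault<DefaultHasher>>;

fn build_with_hasher() -> FixedKeyMap<&'static str, u32> {
    // `FixedKeyMap::new()` would not compile here: `new` exists only
    // for `HashMap<K, V, RandomState>`.
    let mut map = HashMap::with_hasher(BuildHasherDefault::<DefaultHasher>::default());
    map.insert("a", 1);
    map
}

fn build_with_default() -> FixedKeyMap<&'static str, u32> {
    // `HashMap` implements `Default` for any `S: Default`, so this works too.
    let mut map = FixedKeyMap::default();
    map.insert("b", 2);
    map
}
```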

And of course, if a dependency is using a hash map, you now depend on it exposing a way of selecting the hasher.

Each instance of a hashmap has different requirements wrt speed and security, so I don't think a global flag is a solution. When pulling in a library, it's not possible to know whether DoS resistance is required for security, or even for correctness.

We need crate authors to think about hashes, and I think they (including myself) currently don't, because the ergonomics are bad. HashMap doesn't make you think about it, and pulling in an extra crate to get performance that you might not even care about yourself is not something that authors will do. Similarly for clippy lints, as soon as it's opt-in, it will not happen across the ecosystem.

The only path forward I'm seeing is to add a faster hasher to std, and to remove the default type parameter in an edition bump. For convenience there could be type aliases like SipHashMap and FxHashMap or something.
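A rough sketch of what such aliases could look like, written against today's std. `ToyFastHasher` is a made-up, Fx-style stand-in (std has no fast hasher today, and this is not the real Fx crate), and the two helper functions are hypothetical. `RandomState` is currently SipHash-based, although std does not promise a specific algorithm.

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Toy multiply-rotate hasher standing in for a fast hasher with no
/// DoS resistance. Hypothetical, for illustration only.
#[derive(Default)]
struct ToyFastHasher(u64);

impl Hasher for ToyFastHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 = (self.0.rotate_left(5) ^ u64::from(b)).wrapping_mul(0x517c_c1b7_2722_0a95);
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

// With no default hasher parameter, every use site would have to name one of these.
type SipHashMap<K, V> = HashMap<K, V, RandomState>;
type FxHashMap<K, V> = HashMap<K, V, BuildHasherDefault<ToyFastHasher>>;

/// Keys come from outside, so pick the randomly seeded hasher.
fn untrusted_index(words: &[&str]) -> SipHashMap<String, usize> {
    let mut map = SipHashMap::new();
    for (i, w) in words.iter().enumerate() {
        map.insert(w.to_string(), i);
    }
    map
}

/// Keys are generated internally, so the fast hasher is acceptable.
fn trusted_squares(n: u32) -> FxHashMap<u32, u32> {
    let mut map = FxHashMap::default();
    for i in 0..n {
        map.insert(i, i * i);
    }
    map
}
```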


I don't think that quite solves the problem, because:

  1. library authors don't know how their library is going to be used. Maybe the library is going to be exposed to the internet, maybe it will be used in an off-line desktop app with trusted data, or maybe it will be used only at build time.

  2. Being agnostic about the hasher is tedious. It proliferates generic arguments, and makes type inference flaky.

  3. So in practice it'd be easiest and safest to just import SipHashMap and we're back to square one.
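To make point 2 concrete, here is a small sketch of a library type that stays agnostic about the hasher. `WordCounter` and its methods are hypothetical. The split into two impl blocks mirrors what std does for `HashMap::new`, because default type parameters are not used to drive inference in expressions.

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashMap;
use std::hash::BuildHasher;

/// Hypothetical library type that lets the caller choose the hasher.
struct WordCounter<S = RandomState> {
    counts: HashMap<String, usize, S>,
}

// The convenient constructor only exists for the default hasher...
impl WordCounter<RandomState> {
    fn new() -> Self {
        WordCounter { counts: HashMap::new() }
    }
}

// ...and everything else has to carry the `S: BuildHasher` bound.
impl<S: BuildHasher> WordCounter<S> {
    fn with_hasher(hasher: S) -> Self {
        WordCounter { counts: HashMap::with_hasher(hasher) }
    }

    fn add(&mut self, word: &str) {
        *self.counts.entry(word.to_string()).or_insert(0) += 1;
    }

    fn count(&self, word: &str) -> usize {
        self.counts.get(word).copied().unwrap_or(0)
    }
}
```

Every type that embeds a `WordCounter` and wants to stay agnostic needs its own `S` parameter and the same pair of impl blocks, which is where the tedium comes from.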

Another subtle issue is that using a globally configured hasher is going to be a not insignificant hit to performance if it isn't done by propagating an S: BuildHasher generic around. Specifically, it has to go through some kind of dynamic dispatch like dyn Hasher, which blocks inlining of the cheap hash function you want to be used.
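The two shapes can be compared side by side. This is only a sketch: `hash_static`, `hash_dynamic`, and `same_result` are hypothetical names, and the factory closure is an assumption about what a global hook might look like, not an existing API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{BuildHasher, BuildHasherDefault, Hash, Hasher};

/// Static dispatch: monomorphized per `S`, so the hasher's `write` calls
/// are visible to the optimizer and can be inlined.
fn hash_static<S: BuildHasher, T: Hash>(build: &S, value: &T) -> u64 {
    build.hash_one(value)
}

/// Dynamic dispatch: the code is not specialized per hasher, so every
/// `write` call goes through a vtable.
fn hash_dynamic<T: Hash>(make: &dyn Fn() -> Box<dyn Hasher>, value: &T) -> u64 {
    let mut hasher: Box<dyn Hasher> = make();
    // Compiles because `Box<dyn Hasher>` itself implements `Hasher`.
    value.hash(&mut hasher);
    hasher.finish()
}

/// Both paths compute the same hash; only the call mechanics differ.
fn same_result(value: u64) -> bool {
    let build = BuildHasherDefault::<DefaultHasher>::default();
    let make = || Box::new(DefaultHasher::new()) as Box<dyn Hasher>;
    hash_static(&build, &value) == hash_dynamic(&make, &value)
}
```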

I was wondering about that earlier, but I assumed it had already been solved for the global allocator, since presumably we don't want to do dynamic dispatch there either? Or is allocation a blocker for inlining? Seems we might want to solve that anyway, especially if LTO doesn't handle it properly.

It's not fully dynamic dispatch, since it's fixed at link time, so LTO could theoretically do some further inlining. (Essentially, each method on GlobalAlloc gets an extern fn defined by #[global_allocator] which is used to perform allocation.) However, allocation is still treated very specially.
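For anyone who hasn't used it, a minimal sketch of the mechanism using the stable `GlobalAlloc` trait. `CountingAlloc`, `ALLOCATIONS`, and `allocations_for_boxes` are made-up names; the wrapper just counts calls and forwards to the system allocator.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper that counts allocations and defers to `System`.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Chosen once for the whole program; no generic parameter is threaded
// through the code that allocates.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Returns how many allocations were observed while boxing `n` integers.
fn allocations_for_boxes(n: usize) -> usize {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    let boxes: Vec<Box<u64>> = (0..n as u64).map(Box::new).collect();
    // Keep the boxes observable so the optimizer is less likely to elide them.
    std::hint::black_box(&boxes);
    ALLOCATIONS.load(Ordering::Relaxed) - before
}
```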

Global allocation isn't a library call, it's a fundamental operation provided by the language runtime. This is required in order to justify optimizations such as removing allocations; allocation is not an observable effect of the program. What this means exactly is still an open question.

And global allocation also benefits from the fact that all of its state is global. Whereas with hashers, even ignoring BuildHasher itself, you have state local to each hash_one operation that needs to be shaped the same if it's going to be compiled non-generic.

This doesn't call for a #[global_hasher]; it calls for an implicit generic S: BuildHasher, so that code can be monomorphized over the hasher without the syntactic overhead.
