Post: optimizing hashmaps even more

Hey all, I wrote a post looking at Rust's hashmap: how it's currently implemented, how we could optimize it further, and directions we could explore to automatically apply these optimizations. Maybe some here might find it interesting. Thanks!

8 Likes

I love this concept. This is exactly the kind of meta-optimization where we make Rust better once and all Rust code benefits.

I'd love to see some prototypes that show how much benefit we could get out of, say, the enum optimization. How many hash tables in the Rust compiler itself have enums as keys, how easily could we try a hack that turns them into simple enum-keyed lookup tables, and could we do a perf run on the result?

1 Like

Doing this kind of optimization is inherently loopy and quite hidden, so I'm kind of worried about both how practical Implementing specialization capable of monomorphizing over use cases would be, and how nonlocal the reasoning is. Rust generally prefers to do what the user asked, how the user asked (e.g. standard types never doing small struct optimization), so while statically switching between enum lookup tables, phf, mixed phf, and fully dynamic tables would keep the API surface identical (and theoretically could keep stack size identical), I'm worried about that being a scarily nonlocal choice and hidden from the developer.

Of course, the crates ecosystem already has some good options if you know your case is statically bounded ahead of time. enum-map provides an easy enum lookup array, and phf provides an immutable perfect hash map. I don't know of, so there's definitely space for, any mutable PHF or mixed PHF off-the-shelf option available. (That said, rolling a mixed PHF with enum-map and a hash map doesn't seem overly onerous, either.)

That said, this does definitely look like an interesting avenue for experimentation.

2 Likes

Rust's HashMap type is tied to a specific hasher type, so the compiler is not allowed to change it to PHF. There's also a problem with usual hashers having a random state, so the hash can't be pre-computed at compile time. So this optimization may be possible in theory, but the specific libstd type has opted out of it already.

But I imagine if you had something like HashMap<k, V, StaticHasher>, then it should be possible to at least precompute hashes at compile time, so that get("string") would optimize to get_hashed(magic_constant, "string") (the string still needed for Eq).

Or perhaps you could make your own HashMap type with const fn get_const or such.

Oh yeah for sure, I initially mentioned this in the post but ended up taking it out. I would imagine this type of optimization would only be possible if the hasher opted into it. I think it would be possible to enable this for Rust's default hasher [1], but you're right we couldn't just enable this for all hashers.

[1]: As I understand it Rust's default hasher doesn't assure a specific algorithm, but instead assures the properties of DDoS resistance + generally good performance. As long as our optimizations don't violate guarantees we've made, I believe we should be able to optimize away!

3 Likes