Post: optimizing hashmaps even more

yoshuawuyts · May 18, 2021, 10:06pm

Hey all, I wrote a post looking at Rust's hashmap: how it's currently implemented, how we could optimize it further, and directions we could explore to automatically apply these optimizations. Maybe some here might find it interesting. Thanks!

josh · May 18, 2021, 10:36pm

I love this concept. This is exactly the kind of meta-optimization where we make Rust better once and all Rust code benefits.

I'd love to see some prototypes that show how much benefit we could get out of, say, the enum optimization. How many hash tables in the Rust compiler itself have enums as keys, how easily could we try a hack that turns them into simple enum-keyed lookup tables, and could we do a perf run on the result?

CAD97 · May 19, 2021, 12:04am

Doing this kind of optimization is inherently loopy and quite hidden, so I'm kind of worried about both how practical Implementing specialization capable of monomorphizing over use cases would be, and how nonlocal the reasoning is. Rust generally prefers to do what the user asked, how the user asked (e.g. standard types never doing small struct optimization), so while statically switching between enum lookup tables, phf, mixed phf, and fully dynamic tables would keep the API surface identical (and theoretically could keep stack size identical), I'm worried about that being a scarily nonlocal choice and hidden from the developer.

Of course, the crates ecosystem already has some good options if you know your case is statically bounded ahead of time. enum-map provides an easy enum lookup array, and phf provides an immutable perfect hash map. I don't know of, so there's definitely space for, any mutable PHF or mixed PHF off-the-shelf option available. (That said, rolling a mixed PHF with enum-map and a hash map doesn't seem overly onerous, either.)

That said, this does definitely look like an interesting avenue for experimentation.

kornel · May 19, 2021, 10:18am

Rust's HashMap type is tied to a specific hasher type, so the compiler is not allowed to change it to PHF. There's also a problem with usual hashers having a random state, so the hash can't be pre-computed at compile time. So this optimization may be possible in theory, but the specific libstd type has opted out of it already.

But I imagine if you had something like HashMap<k, V, StaticHasher>, then it should be possible to at least precompute hashes at compile time, so that get("string") would optimize to get_hashed(magic_constant, "string") (the string still needed for Eq).

Or perhaps you could make your own HashMap type with const fn get_const or such.

yoshuawuyts · May 19, 2021, 3:27pm

Oh yeah for sure, I initially mentioned this in the post but ended up taking it out. I would imagine this type of optimization would only be possible if the hasher opted into it. I think it would be possible to enable this for Rust's default hasher [1], but you're right we couldn't just enable this for all hashers.

[1]: As I understand it Rust's default hasher doesn't assure a specific algorithm, but instead assures the properties of DDoS resistance + generally good performance. As long as our optimizations don't violate guarantees we've made, I believe we should be able to optimize away!

system · August 17, 2021, 3:28pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Low-latency hashing libs	11	1525	December 18, 2024
Help wanted: fast hash maps in std	29	6293	March 25, 2019
Unstable hash architecture ideas (deprecated)	13	3427	March 25, 2019
Map! macro as a HashMap counter of vec! language design	16	10442	August 30, 2022
Global default hash function override for HashMap and HashSet libs	26	1611	August 20, 2024

Post: optimizing hashmaps even more

Related topics