Support comparison policies in HashMap

Currently HashMap and HashSet rely on the inherent implementations of the Eq and Hash traits. I propose moving this behaviour to the S (RandomState) parameter. This change is backwards compatible, but improves flexibility and makes it easier to implement custom comparisons.

The current approach as several downsides:

  1. Customizing comparisons requires implementing a new-type. This is often a clean way to model, it can also be unnecessarily complex. For example just because I want to compare strings case-insensitively, doesn't mean that it's a whole new kind of string.
  2. If you want to do a borrowed lookup using a dynamically sized type (e.g. str), you need to either implement yet another new-type around &str, or unsafely implement wrapping of str.
  3. The comparison is determined at compile-time. Sometimes it's needed to customize comparisons at runtime, for example culture/collation dependent string comparisons.
  4. Hasher constrains hashing implementations to a particular iterative approach. This can hurt performance (I've seen several complaints about this, for example)
  5. Borrow in the signatures of methods is rather confusing for beginners

As an alternative I'd introduce these traits:

pub trait EqPolicy<T: ?Sized, Q: ?Sized = T> {
    fn eq(&self, x: &T, y: &Q) -> bool;
}

pub trait HashPolicy<T: ?Sized> {
    fn hash(&self, x: &T) -> u64;
}

The methods of HashMap should then express their constraints in terms of these traits:

impl<K, V, S> HashMap<K, V, S>
where
    S: EqPolicy<K> + HashPolicy<K>,
{
    pub fn get<Q: ?Sized>(&self, k: &Q) -> Option<&V>
    where
        S: EqPolicy<K, Q> + HashPolicy<Q>;

    pub fn insert(&mut self, k: K, v: V) -> Option<V>;
}

This can be made backwards compatible by adding a blanket implementation based on the existing traits.

impl<S, T> HashPolicy<T> for S
where
    S: BuildHasher,
    T: Hash + ?Sized,
{
    fn hash(&self, x: &T) -> u64 {
        self.hash_one(x)
    }
}

impl<S, K, Q> EqPolicy<K, Q> for S
where
    S: BuildHasher,
    K: Borrow<Q> + ?Sized,
    Q: Eq + ?Sized,
{
    fn eq(&self, x: &K, y: &Q) -> bool {
        x.borrow() == y
    }
}

Playground which contains a compilable example, and demonstrates that that the new constraints are compatible with the old constraints.

I'm not sure what the best name for these traits is. Alternatives could be EqComparer and HashCalculator. Hasher is unfortunately already taken.

Similar changes could be done to BTreeMap/BTreeSet, but would require adding an additional generic parameter for the policy.

1 Like

That EqPolicy implementation is a blanket for every type, so it isn’t possible to override it for a new policy type.

Fixed that, and added an example of a customized policy to the playground

Haven’t read your whole post .. but I know you can use a newtype wrapper to override Eq and PartialEq, ergonomics suffer with a few .into() uses but if your storing a type you don’t control in a HashMap your probably better to just upgrade your entire project to your new newtype wrapper anyway

A great usecase for this feature which wasn’t mentioned in the main post is being able to use composite borrowed types as keys when querying. For example if you have (String, String) as key then it’s currently impossible to call get with a (&str, &str), and a newtype with a custom Eq implementation won't help [1]


  1. Technically you can make this work with a newtype and some weird hacks with trait objects, but it's far from ideal or intuitive ↩︎

5 Likes

The underlying hashbrown library (that std uses to implement HashMap/HashSet has Equivalent in hashbrown - Rust

Doesn’t that solve the same issue? Why is this design better than that?

3 Likes

The Equivalent trait helps with the problem that writing a new-type wrapper for borrowed unsized types is hard/unsafe.

But it doesn't get several of the benefits of the approach I suggest:

  • Requires new-types for both the owned and borrowed type
  • Forced to use the Hasher trait.
  • The use-case of runtime determined comparisons (e.g. culture dependent string comparisons) is not possible

So I prefer the abstraction I propose.

2 Likes

In Polars we quite extensively use the HashTable API which is more flexible still. My preference would be to standardize something similar instead of providing yet another policy-tied-to-type data structure which always ends up being too inflexible at some point.

4 Likes