HashMap/Set new/with_capacity and BuildHasher

What we want to accomplish

Say I have made a fast hashing function and I want to create a FastHashMap type alias that people can use as a drop-in replacement for std::collection::HashMap. The natural (and desireable) syntax for this would be:

pub type FastHashMap<K, V> = HashMap<K, V, FastHasher>;

(see for instance nohash_hasher::IntMap - Rust and fxhash::FxHashMap - Rust)

The problem

Consider this:

fn main() {
    let my_map = FastHashMap::with_capacity(32);
    expects_my_hash_map(my_map);
}
fn expects_my_hash_map(_map: FastHashMap<i32, f32>) {}

This fails with the following error message:

 1  error[E0599]: no function or associated item named `with_capacity` found for struct `HashMap<i32, f32, FastHasher>` in the current scope
   --> src/main.rs:29:29
    |
 29 |     let my_map = FastHashMap::with_capacity(32);
    |                             ^^^^^^^^^^^^^ function or associated item not found in `HashMap<i32, f32, FastHasher>`
    |
    = note: the function or associated item was found for
            - `HashMap<K, V>`

This is because HashMap::new and with_capacity are defined so:

impl<K, V> HashMap<K, V, RandomState> {
    pub fn new() -> HashMap<K, V, RandomState> {
        Default::default()
    }
    pub fn with_capacity(capacity: usize) -> HashMap<K, V, RandomState> {
        HashMap::with_capacity_and_hasher(capacity, Default::default())
    }
}

This means that new/with_capacity is incompatible with other BuildHasher:s. In other words, our FastHashMap cannot be used as a drop-in replacement like we would hope.

See also:

Solution that doesn't work

Consider this hypothetical change to std:

impl<K, V, S=RandomState> HashMap<K, V, S>
    where S: Default
{
    pub fn new() -> HashMap<K, V, S> {
        Default::default()
    }
    pub fn with_capacity(capacity: usize) -> HashMap<K, V, S> {
        HashMap::with_capacity_and_hasher(capacity, Default::default())
    }
}

This is not legal rust:

error: defaults for type parameters are only allowed in `struct`, `enum`, `type`, or `trait` definitions

    = note: `#[deny(invalid_type_param_default)]` on by default
    = warning: this was previously accepted by the compiler but is being phased out; it will become a hard error in a future release!
    = note: for more information, see issue #36887 <https://github.com/rust-lang/rust/issues/36887>

Solution that does work

Consider this extension trait:

trait HashMapExt {
    fn new() -> Self;
    fn with_capacity(x: usize) -> Self;
}

impl<K, V, S> HashMapExt for HashMap<K, V, S>
where
    K: Hash + Eq,
    S: BuildHasher + Default,
{
    fn new() -> Self {
        HashMap::with_hasher(S::default())
    }

    fn with_capacity(capacity: usize) -> Self {
        HashMap::with_capacity_and_hasher(capacity, S::default())
    }
}

By importing this trait into your code, now the original problem is solved:

fn main() {
    let std_map = HashMap::with_capacity(32);
    expects_std_hash_map(std_map);

    let my_map = FastHashMap::with_capacity(32);
    expects_my_hash_map(my_map);
}
fn expects_std_hash_map(_map: HashMap<i32, f32>) {}
fn expects_my_hash_map(_map: FastHashMap<i32, f32>) {}

HashMapExt could be made part of the std prelude. It would not break any existing code, but would enable making it easier to make drop-in replacements for std::collection::HashMap that work with new and with_capacity.

1 Like

I think it'd actually be really nice to support the non-working code you proposed. If you follow through the linked issue in the compiler output, you'll come across a comment talking about very similar code, and how it doesn't really fall into the difficulties issue #36887 was raised to avoid (wrt type inference).

Here's the link to the comments I think are applicable, though I'll inline their responses here. https://github.com/rust-lang/rust/issues/36887#issuecomment-803536903

So with all that said, I think we really might want to consider making default types available to impl blocks where they don't appear as a parameter to method (or related to any parameter types). I think it'd be useful more than just here, provided that turning something like the with_capacity from the std into the one you provided doesn't have backwards compatibility issues. (Because if that transformation isn't backwards compatible then maybe we do need to go the trait route anyway?)

IMO, your HashMapExt::new() is sufficiently covered by just using HashMap::default(), where Default is already in the prelude. For with_capacity, I could imagine even having a WithCapacity trait for any collection, but there has been some resistance to collection traits in the past.

Sorry for the dumb question, but why does HashMap::new/with_capacity depend on RandomState specifically? Why can't the impl just use impl<K, V, S: Default> HashMap<K, V, S> (no reference to RandomState)?

1 Like

If I understand it correctly, this will break code using HashMap::new() because the compiler won't be able to infer what S is. (Unless the user specified it, which is unlikely)

For example:

fn main()
{
    let foo = HashMap::new();
    foo.insert(1, 1);
    //Compiler can infer K and V from usage, but not S
}
4 Likes

From what I understand, while Rust allows default types in generic parameters for impl blocks, they are ignored (on stable Rust). So impl<K, V, S: Default> HashMap<K, V, S> is equivalent to impl<K, V, S> HashMap<K, V, S>

Edit: I totally misread that, thanks. It doesn't work well because the compiler can't infer a default hasher and you have to explicitly request the hasher type somewhere.

That's not a default type but a trait bound.

Oh, you're right. Thank you :sweat_smile: In that case it doesn't work because it can't infer a default, and you'd have to be explicit about what type of hasher you want.

To illustrate, here's an example: Rust Playground

#[derive(Default)]
struct DefaultType;
#[derive(Default)]
struct AnotherType;

struct TypeWrapper<Inner = DefaultType> {
    inner: Inner,
}

impl <Inner: Default> TypeWrapper<Inner> {
    pub fn new() -> Self {
        TypeWrapper {
            inner: Inner::default()
        }
    }
}

type AlternateWrapper = TypeWrapper<AnotherType>;

fn this_works(){
    let wrapper: TypeWrapper = TypeWrapper::new();
}

fn this_fails(){
    // Here the compiler can't infer the type of TypeWrapper::Inner
    let wrapper_inference_fail = TypeWrapper::new(); 
}

fn this_fails_too(){
    // The compiler decides that there's a conflict with the inner type.
    let wrapper_inference_conflict: TypeWrapper = AlternateWrapper::new(); 
}

So basically you have to choose the hasher somewhere and it still requires more work to change it :confused:

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.