`alloc`-only `HashMap`?

There have been times when I wanted HashMap/HashSet in some code which only has alloc to work with, but the default BuildHasher implementation being std-only prevents that from working.

What stops us from just moving the type definition to alloc and removing the default type parameter there, and providing the default again via a type alias in std? I think it's just methods/trait implementations which explicitly call out the RandomState argument (which only exists in std). These are:

  • new/with_capacity: I see no reason why we can't allow that for any S: Default instead, and this would allow new to be const once const traits happen (this also works for the with-allocator equivalents).
    • I think this shouldn't break existing code type inference since the existing (soon to be re-export) std name for it still has the default argument.
  • From<[(K, V); N]>: This can again be made to use Default like above, but the possibility of .into() into a place which might be generic and relying on it for type inference could cause type inference failures. Definitely at least worth a crater run before committing anything.

So I think it might be possible? I'm not an expert on how type inference works so there might be wider fallout than I'm thinking of.

1 Like

I don't actually know if that works. If it does, then I agree this seems obvious.

(I feel like people think it doesn't, and that's why it hasn't been done already, but maybe the gestalt is wrong and we can just do it?)

That would break inference.

But hold on, how is the compiler inferring S in this snippet?

let mut my_map = HashMap::new();
my_map.insert("hello", "there");
println!("{:?}", my_map);

What possibly determines it? Well, we can find out the hard way by using HashMap::default instead:

let mut my_map = HashMap::default();
my_map.insert("hello", "there");
println!("{:?}", my_map);
error[E0282]: type annotations needed for `HashMap<&str, &str, S>`
--> src/main.rs:4:22
 |
4 |     let mut my_map = HashMap::default();
 |         ----------   ^^^^^^^ cannot infer type for type parameter `S` declared on the struct >`HashMap`
 |         |
 |         consider giving `my_map` the explicit type `HashMap<_, _, S>`, where the type parameter `S` is specified

YUP! HashMap::new hardcoding S = RandomState is load bearing. Any code that creates a short-lived HashMap as part of some algorithm would run into this horrible error about how S is unknown!

An alias is too transparent to help in the expression case on its own.

5 Likes

Well then, RIP my aspirations of this being a quick and easy fix (I think this sort of inference breakage is technically allowed under semver as a minor version bump, but I expect it'd break too much existing code to be worth it).

Is this an area where people are trying to improve the language such that I can wait and this eventually becomes possible? Or is inference never going to be able to handle this change?

Is there a way to not provide HashMap::new in alloc, but provide it in the std re-export somehow?

Having stuff defined only when std or alloc is available already has some problematic behavior. Having stuff defined differently depending on whether you have access to std or not sounds like it would probably only cause more problems.

1 Like

A few alternatives:

Have you considered using the hashbrown crate? It is the crate std uses for its hashmap/hashset implementation. This crate is no_std compatible.

If you don't want to add a new dependency, then you could use BTreeMap and BTreeSet, which do live in the alloc crate.

If you do need HashMap specifically, then you are a bit out of luck.

To be fair, the fact that every no_std crate using maps via alloc needs to use hashbrown is kinda weird. Something that fundamental should be in alloc.

You mean alloc?

2 Likes

no_std crates can use BTreeMap too without issue, so it isn't that big of a deal imo

1 Like

That doesn't help if you want to use a HashMap.

There's also a discussion thread here: #t-libs > Moving hashed collection to liballoc

You would need some sort of Ord-wrapper to make it work for key types that are not Ord, assuming the keys can be totally ordered.

We do not even have core::io or alloc::io yet, despite there being massive numbers of crates that depend upon external projects that supply io::{Read,Write} outside std.

Anything that touches randomness sounds vastly more complex. lol

We could've some uninhabited type like

pub enum RandomState { };

or alternatively

pub type DefaultHasher = !;
impl BuildHasher for RandomState {
    type Hasher = !;
    fn build_hasher(&self) -> ! {
        panic!("Instantiated HashMap outside std!")
    }
}

Yet, these merely force you into declaring your randomness source, so the code would be less compatible than simply doing pub type HashMap = getrandom_hashmap::HashMap or whatever. Not great

If I read correctly, getrandom has 14 different target_os options and another 12 features, including many options only accessible through RUSTFLAGs to avoid cargo feature additivity. I doubt std ever supports all those options.

Also, we could even migrate the ecosystem away from std::collections::HashMap and towards some semi-blessed probabilistic data structures project that always uses getrandom everywhere, so morally shrink std but not really.

That situation already describes alloc as-is. If you depend on it there must be a GlobalAlloc instance somewhere but alloc itself will not (strongly) provide that symbol on a per-target basis. Instead, there's a declared GlobalAllocator singleton-type with the interface that's relaying calls to some user-defined symbol annotated with a macro, which must appear somewhere in the final crate graph.

There's a compatibility problem, we can't outright add the requirement to alloc without a fallback but fallbacks are the problem, but the concept itself is established. If that were a normal crate, I'd say to put the data structures under a new feature flag..

1 Like

Another option would be to add hashbrown's HashTable in alloc. As it doesn't have a hasher by itself, it would have no issue being added to alloc.

1 Like

That is an interesting suggestion, although that's a lot of new API surface to hash out (pun intended).

But if this existed, I would probably transition indexmap to use it instead of direct hashbrown, for example.

I don't know why you're focusing so much on randomness when nothing in my post requires any random input.

In std's current HashMap, the RandomState default implementation requires randomness, but I can provide my own argument that does whatever I want, which may or may not involve randomness.

My proposal was that code with only alloc would be able to use HashMaps but would have to provide an explicit BuildHasher. This could pull in randomness from some source, or depending on the code it might not need randomness at all if the keys are trusted input data.

1 Like

And as a bonus, exposing HashTable would also open up some usecases that people often complain about, like querying a HashMap<(String, String), T> using a (&str, &str), or using a type as key that requires outside context for hashing/equality.

2 Likes