Maps, Sets, and the value of Keys

Gankra · September 19, 2014, 11:55am

I’ve had this conversation a lot lately, so I’m just going to post this here and hope we can settle it once and for all.

So currently, Maps treat keys like trash. These are the two main MutableMap fns:

    fn swap(&mut self, k: K, v: V) -> Option<V>;
    fn pop(&mut self, k: &K) -> Option<V>;

swap takes a key by value, and in the case of a key-match returns a value, and just drops one of the keys. pop meanwhile removes a key, but never gives it back. The search methods are a similar story, they only yield values. As a consequence, Sets (which are just a Map<T, ()>) can’t tell you at all about what’s happening inside them:

    fn insert(&mut self, value: T) -> bool;
    fn remove(&mut self, value: &T) -> bool;

They just know if something was there, and nothing else.

Now, on one hand, this all makes sense. The current convention in Rust is that if two objects are Eq, they’re basically byte-wise equal (ignoring actual memory addresses). Similarly, the two biggest keys are integers and strings, objects which are completely indistinguishable when equal. Therefore, information about keys would just be noise when we really do generally only care about values. If we accept this, then this opens up room for some internal optimizations. On key collisions, the implementation can keep whichever key is easiest (probably the old one, to avoid a write). In fact, this is actually basically the case right now. HashMap keeps the old key, and TreeMap keeps the new key. The behavior of a generic Map is basically undefined with respect to which keys are kept or lost.

On the other hand, there’s no fundamental reason why Eq keys should be considered indistinguishable. It’s arbitrary code, the user can make it work how they want. Further, the keys could implement Drop, and therefore which does or doesn’t survive could really matter. Sets, which consist only of Keys, are exceptional minimal because of this behavior. All you can do is ask yes/no questions or get an iterator. Also, with tuple indexing, returning a “useless” key wouldn’t be a huge ergonomic loss.

So there’s the issue. We have Maps and Sets which basically assert that keys are uninteresting as actual values. This provides room for optimization, and keeps the API simple. However this constrains some niche hypothetical use cases. At very least, the current behavior is undocumented and inconsistent. I’d like to at least have a documented convention for how keys should be treated.

davidhalperin · September 20, 2014, 6:24pm

I’d be interested to hear any compelling examples you can come up with or precedent in other languages. As this stands, this doesn’t seem worthwhile IMHO.

I think Eq should fundamentally mean that the 2 objects are indistinguishable for useful purposes. Obviously that can’t be enforced but it should be the high level contract. Failing to meet that expectation seems both fundamentally avoidable and certain to lead to confusion. It seems like this would make the API more complex where the only gain would be making it convenient to allow people to make bad design decisions.

So perhaps the idea of Eq and/or the way collections treat keys should be documented more explicitly but I think the current state of affairs is fine. Thoughts? Am I missing something?

Gankra · September 20, 2014, 6:53pm

I’d like to clarify that I am not advocating any particular stance on this matter. I wrote out my thoughts on the matter, and I just want to obtain a consensus so that it can be documented and coded against. The current state of affairs is effectively “a concrete map will do whatever the author decided made sense at the time”, and I don’t think that’s an acceptable way to build out standard APIs.

After a cursory scan of some languages, the following behaviour can be found on a key collision in a Map:

Java: Change the value, not the key
Python: Ambiguous
C#: Throw an IllegalArgumentException (lolwut)
C++: Drop both the new key and value
Ruby: Ambiguous
Qt: Change the value, not the key

There’s some inconsistency here, but the general trend is that keys are considered uninteresting or forgettable during a collision. This seems like a perfectly reasonably solution. I’m not sure if I’m willing to commit to Eq === Indistinguishable, but that would also certainly be a sensible convention.

davidhalperin · September 20, 2014, 8:20pm

Sorry, to clarify are you suggesting (thought not necessarily advocating for) a richer API to return keys in more places so they can be used as more meaningful values? That was mainly what I was objecting to. I don’t think there’s a use case for swap or pop to return the key in addition to the value that’s not an anti-pattern and I don’t think there’s much precedent for collection APIs doing things like that. If this is just about what the convention for which key should win on an insert collision, I agree that should be documented. I’d vote in favor of it being defined as implementation dependent (since I think Eq !== Indistinguishable is an anti-pattern) but I don’t see that as mattering too much either way.

Gankra · September 20, 2014, 10:30pm

I think it’s worth considering exposing the keys, if only as a “we thought about it, and it wasn’t worth it” so I have somewhere to point when people mention it.

For what it’s worth, right now I’m leaning strongly towards “it’s not worth it”, because it clearly hasn’t been a problem, and no one can give me a concrete use case. Certainly for 1.0, where we’re going for minimal and lean, it seems unnecessary. I’m not sure if I’d go so far as to say it’s an anti-pattern, but I can certainly see where you’re coming from. It would likely be a symptom of a hack around something.

Edit: and I lumped in the “which key wins” thing because it is clearly part of the same overall “how much do we care about keys” issue.

bluss · September 21, 2014, 9:43am

If you want to look up the actual key value in Python, you can just make a dict of key -> key mappings. Since rust maps want to own both key and value, we don’t have that option.

Gankra · September 28, 2014, 3:35pm

@eddyb has a use case in the compiler that makes keys interesting. He basically wants a HashSet to use as a cache that he can query with equiv keys to get the real back out. He wants something like:

HashSet::find_equiv<Q: Hash<S> + Equiv<T>>(&self, &Q) -> Option<&T>

Specifically:

>  right now I have FnvHashMap<InternedTy<'tcx>, Ty<'tcx>>
> InternedTy is a special wrapper for Ty that just has the right Equiv, Eq and Hash impls1
> (to work as a key)
> I would prefer to use FnvHashSet<InternedTy<'tcx>>

frankmcsherry · April 8, 2015, 1:29pm

Sorry for reviving this, but it came up again recently for me. One distinction I see between Rust and most of these languages you surveyed is the linear-ness of Rusts types.

This sort of answers the "are they effectively indistinguishable?" question with a "no, because if I move/drop one, I still have the other". Cloning or dropping something can be expensive, and it would be great to avoid this when possible (e.g. hand things back to the user and let them drop them if appropriate, and spare them an early clone if they don't need one).

I have several cases where keys can be String or Vec<T>, in which case it is a pain to have one taken away from me, even if it is Eq to another one I happen to have nearby.

carllerche · April 17, 2015, 10:11pm

I’ve been needing to “pop” values from sets.

Basically: HashSet::pop(&mut self) -> Option<T>

This came up when I was trying to implement DFA minimization (Hopcroft’s algorithm) http://en.wikipedia.org/wiki/DFA_minimization. Basically, I have a set of partitions and loop, removing an element from the set each iteration. The set is also mutated later (items removed / added) so set operations are needed as well as being able to remove a value from the set each iteration (the specific value does not matter).

i30817 · April 18, 2015, 1:34am

never mind this comment, misunderstood the issue.

gsingh93 · April 28, 2015, 11:28pm

I’m also having performance issues where I have to excessively clone keys because I can’t get them back if they’re not needed. In my case, instead of using the entry API, I just use multiple calls on the hashtable to check if a key is present and if not, insert the key. This is less than optimal, but better than excessive clones.

system · March 25, 2019, 8:22am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
HashMap / HashSet 'get_key()', generalised contianer	8	2058	March 25, 2019
Post: optimizing hashmaps even more language design	5	1333	August 17, 2021
Impl IndexMut for HashMap libs	22	4076	December 21, 2019
What does the future hold for IndexSet? language design	4	1992	March 25, 2019
[Pre-RFC] NonInternalMut auto trait (never mind) libs	8	1110	March 25, 2019

Maps, Sets, and the value of Keys

Related topics