Add a `take_any` method to `HashMap`/`HashSet`

This is not the first time I've found myself missing a method to remove an arbitrary element from a set or map. This problem has been solved for BTreeSet and BTreeMap with the recent introduction of the pop_first/pop_last methods, but there is no equivalent for HashMap/HashSet.

What I'm looking for is as simple as this:

/// Removes an arbitrary element from the set.
fn take_any(&mut self) -> Option<T>

It is currently impossible to remove an element from a HashMap/HashSet without a preexisting reference to an element/key of the set/map (as required by remove or take). So the only way to get such a reference is to clone one beforehand:

fn take_any<T: Clone + Eq + std::hash::Hash>(set: &mut HashSet<T>) -> Option<T> {
    let result = set.iter().next()?.clone();
    set.remove(&result);
    Some(result)
}
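For completeness, the same clone-based workaround also works for HashMap; here is a hypothetical take_any_entry helper (not an existing API) that clones a key and then uses the stable remove_entry method:

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Removes and returns an arbitrary key-value pair from the map.
/// Hypothetical helper, not part of std; requires K: Clone so we can
/// obtain an owned key before calling `remove_entry`.
fn take_any_entry<K: Clone + Eq + Hash, V>(map: &mut HashMap<K, V>) -> Option<(K, V)> {
    let key = map.keys().next()?.clone();
    map.remove_entry(&key)
}
```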

It would be possible without the Clone bound using the yet-to-be-stabilized drain_filter function, which could drain the first item and keep the rest, but that seems expensive to me.

fn take_any<T: Eq + std::hash::Hash>(set: &mut HashSet<T>) -> Option<T> {
    let mut taken = false;
    set.drain_filter(|_| {
        let take = !taken;
        taken = true;
        take
    })
    .next()
}
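As a usage sketch, either version can drain a set one arbitrary element at a time; this uses the clone-based version (restated with the Eq + Hash bounds it needs so the snippet is self-contained and compiles on stable):

```rust
use std::collections::HashSet;
use std::hash::Hash;

// Clone-based workaround from above, restated for self-containment.
fn take_any<T: Clone + Eq + Hash>(set: &mut HashSet<T>) -> Option<T> {
    let result = set.iter().next()?.clone();
    set.remove(&result);
    Some(result)
}

fn main() {
    let mut set: HashSet<u32> = (0..4).collect();
    let mut drained = Vec::new();
    // Drain the set one arbitrary element at a time.
    while let Some(x) = take_any(&mut set) {
        drained.push(x);
    }
    assert!(set.is_empty());
    drained.sort();
    assert_eq!(drained, vec![0, 1, 2, 3]);
}
```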

This has been discussed a few times, but hasn't been added to the std types because it is actually an O(n) operation for a standard hash map design.

Other map types can support it efficiently though - for example IndexMap.


Is it? How can it cost more on average than deletion with a hash? Isn't that O(1) on average?

When deleting with a hash you already know what entry you're looking for. Without that you need to scan through the internal table looking for an occupied bucket.
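To make this concrete, here is a toy fixed-capacity open-addressing table (purely illustrative; std's hashbrown-based implementation differs): removal by key hashes straight to the right neighborhood, while take_any has to scan buckets linearly until it finds an occupied one.

```rust
// Toy illustration, NOT how std::collections::HashMap is implemented.
struct ToyTable {
    buckets: Vec<Option<u64>>, // None = empty bucket
}

impl ToyTable {
    fn with_capacity(cap: usize) -> Self {
        ToyTable { buckets: vec![None; cap] }
    }

    /// Linear probing from the hashed slot (assumes the table is never full).
    fn insert(&mut self, key: u64) {
        let cap = self.buckets.len();
        let mut i = (key as usize) % cap;
        while self.buckets[i].is_some() {
            i = (i + 1) % cap;
        }
        self.buckets[i] = Some(key);
    }

    /// O(1) expected: the hash tells us where to start probing.
    /// (Tombstone handling omitted for brevity.)
    fn remove(&mut self, key: u64) -> Option<u64> {
        let cap = self.buckets.len();
        let mut i = (key as usize) % cap;
        loop {
            match self.buckets[i] {
                Some(k) if k == key => return self.buckets[i].take(),
                Some(_) => i = (i + 1) % cap,
                None => return None,
            }
        }
    }

    /// O(capacity) worst case: with no hash to guide us, we must scan
    /// the whole table for any occupied bucket.
    fn take_any(&mut self) -> Option<u64> {
        self.buckets.iter_mut().find_map(|b| b.take())
    }
}
```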

True... That's unfortunate.

However, I still think the drain_filter version is better than the clone-based one, since it avoids the Clone bound. But that's the only benefit indeed.

I haven't checked this, but what would the performance characteristics be if one were to simply do .iter().next().unwrap()?

Iterators are supposed to be lazy, so the only things that could trip that up are:

  • empty set -> panic. This could be worked around by just doing if let Some(elt) = set.iter().next() { ... } to NOP on an empty set
  • allocation of the iterator created by the .iter() call may or may not be expensive

O(n)


Is that for the allocation alone? Or are you referring to the general case of using iterators?

The first would be surprising to me to be honest.

The second would indeed definitely be true. However, calling .take_any() n times would presumably also be O(n), so in that case it wouldn't matter too much, and it could be used as a stable alternative for .take_any().

It's O(n) in terms of raw capacity, not the number of items. In the worst case, you could have a map with only one item in the very last bucket. If you repeat take_any(), you'll keep re-scanning for occupied entries, multiplying the complexity to O(n*m), where n is the capacity and m the number of items.


There is no allocation involved in making an iterator.

Calling take_any until the set is empty is O(n*n).


Minor clarification: in the case of hashmap-like structures, it's often helpful to differentiate between

  • n, the number of items in the collection, and
  • c, the capacity of the collection (extremely handwavy).

Asymptotically, the difference disappears (c grows exactly linearly with the (historical maximum) value of n), but for the intuitive rather than formal use of big-O notation, the difference matters, because we care about the performance characteristics of small values of n and c, importantly including small n but (comparatively) large c.

With this expanded notation, you get that

  • .iter().next() is O(c) but
  • for _ in .iter() is just O(c + n)[1],
  • .take_any() is O(c), and
  • for _ in iter::from_fn(take_any) is O(c * n).

  1. Pedantry: because n is restricted to always be less than c, this should be just O(c); addition of big-O is a maximum operation. Differentiating between O(2c) and O(c + n) is a larger abuse of big-O notation than differentiating between O(c²) and O(c * n); the latter actually provides an asymptotically tighter bound when c and n grow at different rates, but the former doesn't. ↩︎
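The O(c * n) cost of draining via repeated scans can be made concrete with a small simulation (an illustrative model, not std's actual table) that counts bucket probes while draining a table whose n items sit in the last n of c buckets, the worst case for front-to-back scanning:

```rust
// Illustrative model: `cap` buckets with `n` items packed at the end,
// drained by repeatedly scanning from bucket 0 for the first occupied slot.
// Returns the total number of bucket probes performed.
fn probes_to_drain(cap: usize, n: usize) -> usize {
    let mut buckets: Vec<bool> = vec![false; cap];
    for b in buckets[cap - n..].iter_mut() {
        *b = true; // items live in the last n buckets
    }
    let mut probes = 0;
    loop {
        // Scan from the start for an occupied bucket, counting every probe.
        let mut found = None;
        for (i, occupied) in buckets.iter().enumerate() {
            probes += 1;
            if *occupied {
                found = Some(i);
                break;
            }
        }
        match found {
            Some(i) => buckets[i] = false,
            None => break, // one final full scan confirms the table is empty
        }
    }
    probes
}
```

With cap = 16 and n = 4, each successive scan probes 13, 14, 15, then 16 buckets, plus a final empty scan of 16, for 74 probes total; growing the capacity alone grows the cost even though n is unchanged.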
