Add a `take_any` method to `HashMap`/`HashSet`

This is not the first time I've found myself missing a method to remove an arbitrary element from a set or map. This problem has been solved for BTreeSet and BTreeMap with the recent introduction of the pop_first/pop_last methods, but there is no equivalent for HashMap/HashSet.

What I'm looking for is as simple as this:

/// Removes an arbitrary element from the set.
fn take_any(&mut self) -> Option<T>

It is currently impossible to remove an element from a HashMap/HashSet without a preexisting reference to an element/key of the set/map (as required by remove or take). So the only way to get such a reference is to clone one beforehand:

fn take_any<T: Clone + Eq + std::hash::Hash>(set: &mut HashSet<T>) -> Option<T> {
    let result = set.iter().next()?.clone();
    set.remove(&result);
    Some(result)
}
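For completeness, the same clone-based workaround also works for HashMap; here is a hypothetical take_any_entry helper (not an existing API) that clones a key and then uses the stable remove_entry method:

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Removes and returns an arbitrary key-value pair from the map.
/// Hypothetical helper, not part of std; requires K: Clone so we can
/// obtain an owned key before calling `remove_entry`.
fn take_any_entry<K: Clone + Eq + Hash, V>(map: &mut HashMap<K, V>) -> Option<(K, V)> {
    let key = map.keys().next()?.clone();
    map.remove_entry(&key)
}
```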

It would be possible without the Clone bound using the yet-to-be-stabilized drain_filter function, which could drain the first item and keep the rest, but that seems expensive to me.

fn take_any<T: Eq + std::hash::Hash>(set: &mut HashSet<T>) -> Option<T> {
    let mut taken = false;
    set.drain_filter(|_| {
        let take = !taken;
        taken = true;
        take
    })
    .next()
}
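As a usage sketch, either version can drain a set one arbitrary element at a time; this uses the clone-based version (restated with the Eq + Hash bounds it needs so the snippet is self-contained and compiles on stable):

```rust
use std::collections::HashSet;
use std::hash::Hash;

// Clone-based workaround from above, restated for self-containment.
fn take_any<T: Clone + Eq + Hash>(set: &mut HashSet<T>) -> Option<T> {
    let result = set.iter().next()?.clone();
    set.remove(&result);
    Some(result)
}

fn main() {
    let mut set: HashSet<u32> = (0..4).collect();
    let mut drained = Vec::new();
    // Drain the set one arbitrary element at a time.
    while let Some(x) = take_any(&mut set) {
        drained.push(x);
    }
    assert!(set.is_empty());
    drained.sort();
    assert_eq!(drained, vec![0, 1, 2, 3]);
}
```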

This has been discussed a few times, but hasn't been added to the std types because it is actually an O(n) operation for a standard hash map design.

Other map types can support it efficiently though - for example IndexMap.


Is it? How can it cost more on average than deletion with a hash? Isn't that O(1) on average?

When deleting with a hash you already know what entry you're looking for. Without that you need to scan through the internal table looking for an occupied bucket.
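To make this concrete, here is a toy fixed-capacity open-addressing table (purely illustrative; std's hashbrown-based implementation differs): removal by key hashes straight to the right neighborhood, while take_any has to scan buckets linearly until it finds an occupied one.

```rust
// Toy illustration, NOT how std::collections::HashMap is implemented.
struct ToyTable {
    buckets: Vec<Option<u64>>, // None = empty bucket
}

impl ToyTable {
    fn with_capacity(cap: usize) -> Self {
        ToyTable { buckets: vec![None; cap] }
    }

    /// Linear probing from the hashed slot (assumes the table is never full).
    fn insert(&mut self, key: u64) {
        let cap = self.buckets.len();
        let mut i = (key as usize) % cap;
        while self.buckets[i].is_some() {
            i = (i + 1) % cap;
        }
        self.buckets[i] = Some(key);
    }

    /// O(1) expected: the hash tells us where to start probing.
    /// (Tombstone handling omitted for brevity.)
    fn remove(&mut self, key: u64) -> Option<u64> {
        let cap = self.buckets.len();
        let mut i = (key as usize) % cap;
        loop {
            match self.buckets[i] {
                Some(k) if k == key => return self.buckets[i].take(),
                Some(_) => i = (i + 1) % cap,
                None => return None,
            }
        }
    }

    /// O(capacity) worst case: with no hash to guide us, we must scan
    /// the whole table for any occupied bucket.
    fn take_any(&mut self) -> Option<u64> {
        self.buckets.iter_mut().find_map(|b| b.take())
    }
}
```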

True... That's unfortunate.

However, I still think the drain_filter version is better than the clone-based one, since it avoids the Clone bound. But that's the only benefit indeed.

I haven't checked this, but what would the performance characteristics be if one were to simply do .iter().next().unwrap()?

Iterators are supposed to be lazy, so the only things that could trip that up are:

  • empty set -> panic. This could be worked around by just doing if let Some(elt) = set.iter().next() { ... } to NOP on an empty set
  • allocation of the iterator created by the .iter() call may or may not be expensive

O(n)


Is that for the allocation alone? Or are you referring to the general case of using iterators?

The first would be surprising to me to be honest.

The second would indeed definitely be true. However, calling .take_any() n times would presumably also be O(n), so in that case it wouldn't matter too much, and it could be used as a stable alternative for .take_any().

It's O(n) in terms of raw capacity, not the number of items. In the worst case, you could have a map with only one item in the very last bucket. If you repeat take_any(), you'll keep re-scanning for occupied entries, multiplying the complexity to O(n*m), where n is the capacity and m the number of items.


There is no allocation involved in making an iterator.

Calling take_any until the set is empty is O(n*n).


Minor clarification: in the case of hashmap-like structures, it's often helpful to differentiate between

  • n, the number of items in the collection, and
  • c, the capacity of the collection (extremely handwavy).

Asymptotically, the difference disappears (c grows exactly linearly with the (historical maximum) value of n), but for the intuitive rather than formal use of big-O notation, the difference matters, because we care about the performance characteristics of small values of n and c, importantly including small n but (comparatively) large c.

With this expanded notation, you get that

  • .iter().next() is O(c) but
  • for _ in .iter() is just O(c + n)[1],
  • .take_any() is O(c), and
  • for _ in iter::from_fn(take_any) is O(c * n).

  1. Pedantry: because n is restricted to always be less than c, this should be just O(c); addition of big-O is a maximum operation. Differentiating between O(2c) and O(c + n) is a larger abuse of big-O notation than differentiating between O(c²) and O(c * n); the latter actually provides an asymptotically tighter bound when c and n grow at different rates, but the former doesn't. ↩︎
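The O(c * n) cost of draining via repeated scans can be made concrete with a small simulation (an illustrative model, not std's actual table) that counts bucket probes while draining a table whose n items sit in the last n of c buckets, the worst case for front-to-back scanning:

```rust
// Illustrative model: `cap` buckets with `n` items packed at the end,
// drained by repeatedly scanning from bucket 0 for the first occupied slot.
// Returns the total number of bucket probes performed.
fn probes_to_drain(cap: usize, n: usize) -> usize {
    let mut buckets: Vec<bool> = vec![false; cap];
    for b in buckets[cap - n..].iter_mut() {
        *b = true; // items live in the last n buckets
    }
    let mut probes = 0;
    loop {
        // Scan from the start for an occupied bucket, counting every probe.
        let mut found = None;
        for (i, occupied) in buckets.iter().enumerate() {
            probes += 1;
            if *occupied {
                found = Some(i);
                break;
            }
        }
        match found {
            Some(i) => buckets[i] = false,
            None => break, // one final full scan confirms the table is empty
        }
    }
    probes
}
```

With cap = 16 and n = 4, each successive scan probes 13, 14, 15, then 16 buckets, plus a final empty scan of 16, for 74 probes total; growing the capacity alone grows the cost even though n is unchanged.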
