It is not the first time I'm missing a method to remove an arbitrary element from a set or map. This issue has been solved for BTreeSet and BTreeMap with the recent introduction of the pop_first/pop_last methods, but there is no equivalent for HashMap/HashSet.
What I'm looking for is as simple as that:
/// Removes an arbitrary element from the set.
fn take_any(&mut self) -> Option<T>
It is currently impossible to remove an element from a HashMap/HashSet without a preexisting reference to an element/key of the set/map (through remove or take). So to have such reference the only way is to clone one beforehand:
fn take_any<T: Clone>(set: &mut HashSet<T>) -> Option<T> {
let result = set.iter().next()?.clone();
set.remove(&result);
result
}
It will be possible without the Clone bound with the yet-to-be-stabilized drain_filter function that could allow to drain the first item and keep the rest, but it seems expansive to me.
fn take_any<T>(set: &mut HashSet<T>) -> Option<T> {
let mut taken = false;
set.drain_filter(|_| {
let take = !taken;
taken = true;
take
}).next()
}
When deleting with a hash you already know what entry you're looking for. Without that you need to scan through the internal table looking for an occupied bucket.
Is that for the allocation alone?
Or are you referring to the general case of using iterators?
The first would be surprising to me to be honest.
The second would indeed definitely be true. However, calling .take_any() n times would presumably also be O(n), so in that case it wouldn't matter too much, and it could be used as a stable alternative for .take_any().
It's O(n) in terms of raw capacity, not the number of items. In the worst case, you could have a map with only one item in the very last bucket. If you repeattake_any(), you'll keep re-scanning for occupied entries, multiplying the complexity to O(n*m).
Minor reclarification: in the case of hashmap-like structures, it's often helpful to differentiate between
n, the number of items in the collection, and
c, the capacity of the collection (extremely handwavy).
Asymptotically, the difference disappears (c grows exactly linearly with the (historical maximum) value of n), but for the intuitive rather than formal use of big-O notation, the difference matters, because we care about the performance characteristics of small values of n and c, importantly including small n but (comparatively) large c.
Pedantry: because n is restricted to always be less than c, this should be just O(c); addition of big-O is a maximum operation. Differentiating between O(2c) and O(c + n) is a larger abuse of big-O notation than differentiating between O(c²) and O(c * n); the latter actually provides an asymptotically tighter bound when c and n grow at different rates, but the former doesn't. ↩︎