So… apparently I keep screwing up uses of HashSet::insert (I essentially get it wrong every time, even when I explicitly think about it). My problem seems to be that HashMap::insert returns Some(foo) if the value already existed, while HashSet::insert returns false if the value already existed.
The reasoning behind that is that HashSet::insert returns true if the value was inserted (meaning it didn’t already exist). This is all sound, but I still kind of think of a Some as a true with data. So these two methods, even though they do almost the same, have the exact opposite return value.
I mean, I get all that, and the docs on it are good, but is anyone else also having issues with correctly calling that method? If this is an issue and not just carelessness on my part, we should probably put some thought into how to prevent such bugs.
Not sure how to solve this, but some ideas:
deprecate insert on HashSet and prefer a new try_insert, which gives you your value back if it couldn’t be inserted
deprecate insert on HashSet and prefer some new method which gives you the old value back and inserts yours
create some smart lint that figures out whether you need !hash_set.insert(foo) or hash_set.insert(foo)
The problem is that if you look into other languages you will see that:
on a hash map insertion the old value is returned (or null/none/…)
on a hash set insertion either
nothing is returned at all
true is returned if the length of the hash map changed/
the value was “new”
So changing this would be confusing for everyone being used to
other map/set implementations.
Also you semantically always insert a value into a has set, the question is just if the length of the set changes. I.e. if the value
was new.
@scottmcm: wouldn’t this try_insert be more or less a entry api?
Also you can’t get mutable access to a hash sets item (or an
hash maps key) as this would allow you to change the item,
which would change the hash, which in turn would require
the set to run some code (calculating the new hash, potentially
reducing the length by 1 if it now is equal to another item).
This can’t be done when returning a &mut T.
What could work is something like:
fn entry(&mut self, value: T) -> Entry<T>
with:
impl<'a, T> Entry<'a, T> {
fn is_new(&self) -> bool {
self.external().is_none()
}
/*[EDIT]: this method has a number of conceptual problems
implementing a &mut T, method on a standard HashSet turns
out to be a bad idea (in the general case)
fn modify<FN, R>(func: FN) -> R
where FN: FnOnce(&mut T) -> R
{...}
*/
fn external(&self) -> Option<&T> { ... }
fn external_mutl(&mut self) -> Option<&mut T> { ... }
}
// equiv. to get (but this means a Entry has two fields,
// 1st a reference to the now contained value, 2nd an
// option with the not used new value, we probably might
// not want to return a reference to the second value here
// for the case it is not exactly the same wrt. side-effects even
// through it should, or we do so make entry an enum and
// document the potential problem?)
impl<'a, T> Borrow<T> for Entry<'a, T> { ... }
//maybe also add a `fn into_external_value()` to Entry
impl<'a, T> Into<Option<T>> for Entry<'a, T> { ... }
(omitted some bounds on T)
(API would need some more consideration wrt. turning a Entry
into the opt. external value)
Yes, why not look to .entry() first to see if this can be improved with new methods there (In preference to deprecating or adding one-off methods to HashMap/HashSet)?
I see this problem mostly with HashSet::insert; it has that typical boolean problem where you might not remember which way the flag goes.
Which ones are actually inserted and which are dropped is
a implementation detail. It just happens that not inserting
is more efficient then replacing or keeping all.
The whole concept of hashing (at last wrt. maps/sets in the
normal case) is, that the hash and eq functions represent all
state of a object, and as such dropping either of two equal
objects should not be differentiable.
(I probably should have said the semantics of a set/hash set in
general computer since in my previous post)
Also as a side note this is why you should never put thinks
which have hidden state or side effects in a data structure
using hashing on it (i.e. HashSet item or HashMap key).
(Through it’s fine if the side effect is not directly observed,
e.g. a Arc of a sideeffect free type, freeing memory on drop
etc.). While you often can rely on insert dropping the new
value, it gets more messy with other operations like e.g.
union/intersection and rust theoretically could change which
value is inserted and which is keep as this is just an implementation
detail, through they won’t as it 1. makes no sense and 2. might
cause silent breakage of people wrongly implementing Hash or
misusing side-effects.