`HashSet::insert` considered harmful


#1

So… apparently I keep screwing up uses of HashSet::insert (I essentially get it wrong every time, even when I explicitly think about it). My problem seems to be that HashMap::insert returns Some(foo) if the value already existed, while HashSet::insert returns false if the value already existed.

The reasoning behind that is that HashSet::insert returns true if the value was inserted (meaning it didn’t already exist). This is all sound, but I still kind of think of a Some as a true with data. So these two methods, even though they do almost the same, have the exact opposite return value.

I mean, I get all that, and the docs on it are good, but is anyone else also having issues with correctly calling that method? If this is an issue and not just carelessness on my part, we should probably put some thought into how to prevent such bugs.

Not sure how to solve this, but some ideas:

  • deprecate insert on HashSet and prefer a new try_insert, which gives you your value back if it couldn’t be inserted
  • deprecate insert on HashSet and prefer some new method which gives you the old value back and inserts yours
  • create some smart lint that figures out whether you need !hash_set.insert(foo) or hash_set.insert(foo)

#2

:+1:

Maybe fn try_insert(&mut self, value: T) -> Result<&mut T, T>?

  • On success, gives you a borrow of what you inserted
  • On error, gives you your value back
    • I suppose this could give you a borrow of the value that’s already there too; dunno if that’s overkill

#3

The problem is that if you look into other languages you will see that:

  • on a hash map insertion the old value is returned (or null/none/…)
  • on a hash set insertion either
    1. nothing is returned at all
    2. true is returned if the length of the hash map changed/ the value was “new”

So changing this would be confusing for everyone being used to other map/set implementations.

Also you semantically always insert a value into a has set, the question is just if the length of the set changes. I.e. if the value was new.

@scottmcm: wouldn’t this try_insert be more or less a entry api?

Also you can’t get mutable access to a hash sets item (or an hash maps key) as this would allow you to change the item, which would change the hash, which in turn would require the set to run some code (calculating the new hash, potentially reducing the length by 1 if it now is equal to another item). This can’t be done when returning a &mut T.

What could work is something like:

fn entry(&mut self, value: T) -> Entry<T>

with:

impl<'a, T> Entry<'a, T> {
  fn is_new(&self) -> bool {
      self.external().is_none()
  }
/*[EDIT]: this method has a number of conceptual problems
  implementing a &mut T, method on a standard HashSet turns
  out to be a bad idea (in the general case)
  fn modify<FN, R>(func: FN) -> R
      where FN: FnOnce(&mut T) -> R
  {...}
*/
  fn external(&self) -> Option<&T> { ... }

  fn external_mutl(&mut self) -> Option<&mut T> { ... }
}

// equiv. to get (but this means a Entry has two fields,
// 1st a reference to the now contained value, 2nd an
// option with the not used new value, we probably might
// not want to return a reference to the second value here
// for the case it is not exactly the same wrt. side-effects even
// through it should, or we do so make entry an enum and 
// document the potential problem?)
impl<'a, T> Borrow<T> for Entry<'a, T> { ... }

//maybe also add a `fn into_external_value()` to Entry
impl<'a, T> Into<Option<T>> for Entry<'a, T> { ... }

(omitted some bounds on T) (API would need some more consideration wrt. turning a Entry into the opt. external value)


#4

Hmm, I don’t think this is true. You observably do not insert if an equal value is present – see playground.


#5

Yes, why not look to .entry() first to see if this can be improved with new methods there (In preference to deprecating or adding one-off methods to HashMap/HashSet)?

I see this problem mostly with HashSet::insert; it has that typical boolean problem where you might not remember which way the flag goes.


#6

Which ones are actually inserted and which are dropped is a implementation detail. It just happens that not inserting is more efficient then replacing or keeping all.

The whole concept of hashing (at last wrt. maps/sets in the normal case) is, that the hash and eq functions represent all state of a object, and as such dropping either of two equal objects should not be differentiable.

(I probably should have said the semantics of a set/hash set in general computer since in my previous post)


Also as a side note this is why you should never put thinks which have hidden state or side effects in a data structure using hashing on it (i.e. HashSet item or HashMap key). (Through it’s fine if the side effect is not directly observed, e.g. a Arc of a sideeffect free type, freeing memory on drop etc.). While you often can rely on insert dropping the new value, it gets more messy with other operations like e.g. union/intersection and rust theoretically could change which value is inserted and which is keep as this is just an implementation detail, through they won’t as it 1. makes no sense and 2. might cause silent breakage of people wrongly implementing Hash or misusing side-effects.