pre-RFC: `and_then` method on `std::collections::hash_map::Entry`


#1

Sorry if this has been suggested before. I’ve searched for similar suggestions, but didn’t find anything. I recently found myself writing some code that was most naturally implemented using a HashMap, but I needed to be able to mutate the values in the map only if they were already initialised; if the key produced Entry::Vacant then I could insert a default. Because of the requirement to modify the value before inserting a default I was unable to use the following pattern as suggested in the docs:

use std::collections::HashMap;

let mut map: HashMap<&str, u32> = HashMap::new();
map.entry("poneyland").or_insert(12);

assert_eq!(map["poneyland"], 12);

*map.entry("poneyland").or_insert(12) += 10;
assert_eq!(map["poneyland"], 22);

At first I implemented what I needed as a match on the Entry variant, modifying if it was Occupied and inserting if it was Vacant, but then I wondered why I couldn’t do something similar to and_then on Option. I wrestled with move semantics for a while before coming up with an implementation that worked, so I thought I’d open up a pre-RFC to gauge interest for the feature.


#2
  • Feature Name: hashmap_entry_and_then
  • Start Date: 2017-09-14
  • RFC PR:
  • Rust Issue:

Summary

Add an and_then method to std::collections::hash_map::Entry to allow mutable access to an occupied entry’s value, before any potential calls to or_insert.

Motivation

Entrys are useful for allowing access to existing values in a map while also allowing default values to be inserted for absent keys. The existing API is similar to that of Option, where or and or_with can be used if the option variant is None.

The Entry API is, however, missing an equivalent of Option's and_then method. If it were present it would be possible to modify an existing entry before calling or_insert without resorting to matching on the entry variant.

Guide-level explanation

If a value in a HashMap is required to be modified before inserting a default it is possible to do so using entry in combination with and_then and or_insert. This allows modification of occupied entries to be chained with insertion in the case where the entry is vacant, as demonstrated below:

use std::collections::HashMap;

struct Foo {
    new: bool,
}   

let mut map: HashMap<&str, Foo> = HashMap::new();

map.entry("poneyland")
    .and_then(|e| *e.new = false)
    .or_insert(Foo { new: true })

assert_eq!(map["poneyland"].new, true)

Without and_then we would have to use match over the two Entry variants: Occupied and Vacant. For example:

use std::collections::HashMap;
use std::collections::hash_map::Entry;

struct Foo {
    new: bool,
}

let mut map: HashMap<&str, Foo> = HashMap::new();

match map.entry("poneyland") {
    Entry::Occupied(entry) => {
        entry.into_mut().new = false;
    },
    Entry::Vacant(entry) => {
        entry.insert(Foo { new: true });
    },
}

assert_eq!(map["poneyland"].new, true)

Reference-level explanation

This feature has two constraints:

  1. The new and_then method is designed to work alongside the functionality provided by or_insert and or_insert_with, therefore it must return an Entry so that the methods can be chained.
  2. It must provide access to the contained value, so that the value can be mutated if necessary.

Reference implementation:

impl<'a, K, V> Entry<'a, K, V> {
    pub fn and_then<F>(self, mut f: F) -> Self
        where F: FnMut(&mut V)
    {
        match self {
            Occupied(mut entry) => {
                f(entry.get_mut());
                Occupied(entry)
            },
            Vacant(entry) => Vacant(entry),
        }
    }
}

The two constraints could also be satisfied if we require the user provided function to accept an OccupiedEntry and also return it. The user is then free to call get_mut on it in the function body in order to obtain a mutable borrow of the entry’s value.

Alternative implementation:

impl<'a, K, V> Entry<'a, K, V> {
    pub fn and_then<F>(self, f: F) -> Self
        where F: FnOnce(OccupiedEntry<K, V>) -> OccupiedEntry<K, V>
    {
        match self {
            Occupied(entry) => Occupied(f(entry)),
            Vacant(entry) => Vacant(entry),
        }
    }
}

This has the advantage that the closure is FnOnce as opposed to the more restrictive FnMut; however, the ergonomics for the user are slightly worse as it would have to be called like so:

map.entry("poneyland")
    .and_then(|mut e| { e.get_mut() += 1; e })
    .or_insert(42);

Drawbacks

This logic can be implemented by matching on the entry variant, as described in the Guide-level section, and as such could be seen as an unneccessary addition to the API.

Another potential drawback is that this change to the API is specifically intended to solve the problem of mutating an existing value before insertion of defaults, but it is conceivable that newcomers could use it instead of following more obvious patterns. For example:

use std::collections::HashMap;

let mut map: HashMap<&str, u32> = HashMap::new();

map.insert("poneyland", 0)

// Using `and_then` for simple mutation of an entry
map.entry("poneyland").and_then(|e| { *e += 1 });

// as opposed to just using the entry in place
*map.entry("poneyland") += 1;

However, the already good documentation of the hash lib should mitigate this risk.

It is also conceivable that newcomers to the API will attempt to use and_then and or_insert out of order, although the fact that or_insert does not return Entry should make it obvious how the two are supposed to interact and will result in a compile time error.

Rationale and Alternatives

It is difficult to conceive of an alternative that fits the criteria of being compatible with existing methods (in order to support chaining) and also provide mutable access to the entry’s value. An alternative is provided in the Reference-level section, but is not as ergonomic for the user of the API.

Any alternative implementations would have to provide both mutable access to the held value and return the entry, which are hard to balance due to move semantics.

Unresolved questions

Whether or not this change to the API adds significant value over simply using match and whether or not there are better alternative implementations that have been overlooked.

Out of scope for this RFC is the porting of other methods from Option, which may make the Entry API more user-friendly/flexible.

Also out of scope is the potential for an or_insert alternative (lets call it or_inplace, for example) to be implemented that returns Entry instead of &'a mut V, which would allow the chaining of and_then after a call to or_inplace.


#3

I think you’ve found a real hole in the API, but I think you may have garden-pathed yourself into choosing a solution that is going to be hard to teach and confusing even for experts. The control structure you are trying to express is if-then-else, with mutual exclusion between the two arms, but you’re expressing it in a way that makes them sound not mutually exclusive, “do this before that”. I actually couldn’t figure out what your goal was until I got to the very bottom of the RFC because of this.

Your match-based alternative does express the if-then-else pretty cleanly IMHO, but I can see where it has a lot of boilerplate. What I’m going to suggest to you is something along the lines of

map.entry("poneyland").mutate_or_insert(
    |e| { e.new = false },
    || { Foo { new: true } })

There are two reasons why both arms are closures: so their bodies will be lazily evaluated, and to make it visually obvious which order the arguments go in. (It would be even clearer with Smalltalk-like named arguments,

(map entryFor: 'poneyland')
    mutate:   [ :e | e setOld ] 
    orInsert: [ Foo new ].

but let’s not drag in that whole can of worms.)


#4

Thanks for replying. I opened this first and was then advised to go down the PR route, which has now been merged (https://github.com/rust-lang/rust/pull/44734). My original motivation was to make the entry API more general and much more in-line with how Option works, however I think you’re right when you say:

I think you may have garden-pathed yourself into choosing a solution that is going to be hard to teach and confusing even for experts

In the end the method name was changed to and_modify to indicate more of the intent of its specific implementation, but after seeing your suggestion I think there could be an argument for replacing my implementation with your design. (I’m not quite sure what the best way of going about that would be; Open up a new PR? Reopen the old one?)

In either case I don’t think we’re really addressing my original concern, which is that the API doesn’t behave very “monadically” (for want of a better term).