Multiple mutable borrows / finer grained borrow check


#1

Is there any fundamental reason that the following isn’t possible?

let mut map = HashMap::new();
...

let elt1 = map.get_mut("key 1").expect("getting elt 1");
let elt2 = map.get_mut("key 2").expect("getting elt 2");

elt1.do_something();
elt2.do_something_too();
elt1.do_something_else();

Obviously the current borrow checker won’t allow the line let elt2 = ... because there is already a mutable reference to map, and map does not have a get_mut_two(k1, k2) operation. But elt1 and elt2 do not overlap and get_mut does not move elements so will not invalidate pointers to existing elements. Tricky, but is is this actually possible to implement?

A simpler example:

struct A {
    data: Vec<T>,
    name: Option<String>,
};
impl A {
    fn set_name(&mut self, name: String) {
        if self.name.is_some() { panic!("Help! My name is not supposed to change!"); }
        self.name = Some(name);
    }
    fn read_data(&mut self) {
        for item in self.data {  // reference to self.data over the following block
            if let Some(name) = item.get_name() {
                self.set_name(name);  // requires a mutable reference to self, except it doesn't...
            } else ....
        }
    }
}

In this example, in-lining the set_name method or passing it a mutable reference directly to self.name would solve the problem. But why is this necessary? Well, because we don’t have any way of annotating set_name to say that it only mutates self.name and does not even read other members of self. Which, if available, would make the set_name function signature more complex, but might still be useful if set_name is used a lot.

To be clear, none of this functionality is required for anything I’ve wanted to do, it would just be nice.

So, is this possible? The second example should be quite straight-forward since the fields in question are fixed at compile-time. The first example is similar but I can’t currently see a way of doing it without run-time checks, but I’m not convinced this should be necessary (except to ensure "key 1" != "key 2").


#2

The second example is certainly possible if we made more extensive annotations. For example, instead of &mut A, we could have something like &mut A {name}, which is a mutable reference to Foo that is only permitted access to the name field. Then you could write something like this:

struct A {
    data: Vec<T>,
    name: Option<String>,
};
impl A {
    fn set_name(&mut self {name}, name: String) {
        if self.name.is_some() { panic!("Help! My name is not supposed to change!"); }
        self.name = Some(name);
    }

    fn read_data(&mut self) { .. }
}

Now the compiler could consider the call to name safe, since self.name is not borrowed. I guess one could imagine even encoding that borrow into the signature, sort of like:

fn set_name(self: &mut A { name: ref mut name, .. }, ..)

Or of course we could maybe move the annotation to the fn signature, sort of like a where clause:

fn set_name(&mut self, name: String) writes self.name { ... }

That’s probably the most realistic, but it’s also probably not how the type system would think of things under the covers.

Anyway, until now, we’ve shied away from these sorts of annotations out of a desire to keep the complexity of the overall annotations under control, but of course the cost of that is that one often has to restructure one’s code into distinct types or free functions to avoid these errors.

Now, your first example, with the hash-map, is much harder. For that we’d have to actually know that the keys are unequal, and we’d have to understand the details of how hashmap is implemented. It’d be much more plausible for slices ([T]), since indexing of those is built into the compiler, and the keys are integers, but even there it’s pretty hard.


#3

For the first example, which is very hard to prove to be safe via statical typing, maybe the following API could be provided in the future (“type-level usize” and “value N” are placeholders for type level integer syntax):

fn get_mut_multi<Q: ?Sized, N: type-level usize>(&mut self, k: [&Q; value N]) -> Option<[&mut V; value N]>
    where K: Borrow<Q>, Q: Hash + Eq

The library can then dynamically ensure that the returned references refer to distinct values. This doesn’t allow getting values multiple times, but at least it allows for getting multiple values at once.


#4

At the very least, get_mut_two (and get_mut_three? more than that and you probably have to start sorting and stuff, and you might want to just use RefCell instead) sound like useful methods to have on maps and slices.


#5

For slices, you can use split_at and split_at_mut.


#6

more than that and you probably have to start sorting and stuff, and you might want to just use RefCell instead

I, too, was thinking that sorting might be necessary for the uniqueness check… this would make the getting operation O(m log m), where m is the number of values being fetched. (The unsafe version where the programmer asserts that the keys are distinct, would be O(m)).

But I think there’s a way to get safe O(m) fetching: for starters, with small m, naive O(m^2) behaviour should be fine (or alternatively O(m log m) shellsort or other low-overhead sort…?), but with larger m, there can be a small hashmap of the pointers to the values being fetched that keeps track whether a value has been already borrowed during the call. (Not sure, if it’s possible to stack-allocate a hashmap? Dynamic allocation feels unacceptable for just a collection API call.)

This reproduces the RefCell-like dynamic borrow checking during the call without the need for all the values in the collection to be wrapped in RefCells. Of course, it might well be that a complex scenario like this isn’t needed, if the m is expected to be quite small.

Another design question would be, if the amount of values fetched should be statically determined or dynamic. Unless I’m mistaken, the static API needs type-level integers to work. The dynamic API is (at least I think it is) implementable today.

Now that I think about it, the dynamic API is more flexible… but the static API would be easy to use with heap-allocations only. EDIT: Now that I think about it, dynamic is definitely better, as it allows for everything the static API would allow, and more.


#7

I wonder if you are over-thinking this. I can only see simultaneous access being required where each item used is mentioned explicitly in the code, thus most uses will use a very small number of items simultaneously (two, possibly three). For these cases O(m^2) is hardly an issue.

Where m > 3 quite likely the code needs rethinking anyway.


#8

I think a get_multi_mut would be a good idea, however I am not so sure about requiring a fixed-length.

An algorithm might very well have a set of keys for which it wishes to interact with the values and compute that set of keys at runtime.

If one wishes to conserve the “non-allocating” behavior of the method (which is nice), it could be switched to:

fn get_multi_mut<Q: ?Sized>(&mut self, key_values: &mut [(&Q, Option<&mut V>)])
    where K: Borrow<Q>, Q: Hash + Eq

In this version, choosing how to handle duplicate keys is an open-choice: you could simply set the Option to None for the 2nd+ occurrence of a given key.

Another advantage is that it’s implementable today (even externally, by the way).

I also find it ugly :frowning:, but taste is always subjective!


#9

Perhaps a nicer API would be to wrap the original container in a new one (Wrapper::new(&mut map)); Wrapper would contain a hash set of borrowed values (perhaps starting with a linear search of a small array and switching to an allocated set when necessary), and it would itself implement versions of get/get_mut (and Index) which all take &self and return a Ref wrapper, dynamically tracking borrows like RefCell.

Definitely implementable externally, though… maybe not without violating some people’s idea of the Rust memory model. (You have to compare pointers rather than keys to deal with treacherous Eq instances, but since there is no index method that returns a raw pointer, you have to get a reference and compare that, and some memory model proposals include “instant aliasing death”, i.e. undefined behavior as soon as an aliasing reference is created. In practice, LLVM won’t do anything weird if you throw away the reference before dereferencing it.)