[blog post] Nested method calls via two-phase borrowing

nikomatsakis · March 1, 2017, 2:50pm

Continuing the discussion from Accepting nested method calls with an `&mut self` receiver, I wrote a new blog post called "Nested method calls via two-phase borrowing" that proposes a solution. Here is the conclusion:

I have presented here a simple proposal that tries to address the "nested method call" problem as part of the NLL work, without modifying the desugaring into MIR at all (or changing MIR's dynamic semantics). It works by augmenting the borrow checker so that mutable borrows begin as "reserved" and then, on first use, convert to active status. While the borrows are reserved, they impose the same restrictions as a shared borrow.

Read the whole thing, then let me know what you think.

ExpHP · March 1, 2017, 5:29pm

I keep seeing the phrasing “to use a mutable borrow,” but it’s not clear to me what that means (which is bothersome because it seems to be the crux of the dilema!). Is there a precise meaning for this? E.g. at the boundary of a function that takes it as an argument? Or maybe when it is dereferenced?

let x = 3;
let p = &mut x;

let q1 = p;  // Is this a "use" of p?
let q2 = *p; // How about this?
let q3 = &mut *p; // Eh?

Hm, and it’s always bothered me that closures mutably borrow their environment immediately…

let mut vec = vec![];
let mut f = |i| vec.push(i); // is creating this closure a use of &mut vec?
f(vec.len());

mbrubeck · March 1, 2017, 6:01pm

This focuses on the case where the “reserved” phase comes strictly before the “mutably borrowed” phase. Or, in the terms of alternative “two-phase lifetime” proposal, where the write lifetime is a suffix of the read lifetime. But there’s also an interesting case where the “write” lifetime is a prefix of the “read” lifetime. This is detailed in the Rustonomicon and I have seen it several times in real-world code:

struct Foo(u32);

impl Foo {
    fn mutate_and_share(&mut self) -> &u32 {
        self.0 = 5;
        // `self` will no longer be mutated after this point
        &self.0
    }
    fn share(&self) {}
}

fn main() {
    let mut foo = Foo(1);
    let loan = foo.mutate_and_share();
    foo.share();
}

It seems like this could be solved as part of the same mechanism, as long as a mutable borrow can be “deactivated” as well as “activated”. This could be considered “three-phase borrowing” since there would be a “reserved” phase, then a “mutably borrowed” phase, then a “shared” phase. Alternately, it could be implemented as “two-phase lifetimes” where the “write lifetime” is any continuous subset of the “read lifetime” (not necessarily a suffix).

nikomatsakis · March 1, 2017, 6:33pm

Heh, I wondered if someone would call me on that. I did indeed have a precise meaning in mind. I meant that it appears in MIR in some position other than being directly assigned to (often called a "use" in compilers, as opposed to x = ..., which is a "def").

So, in answer to your questions, the answer in all cases is yes:

let x = 3;
let p = &mut x;

let q1 = p;  // Is this a "use" of p? YES
let q2 = *p; // How about this? YES 
let q3 = &mut *p; // Eh? YES
...
let mut vec = vec![];
let mut f = |i| vec.push(i); // is creating this closure a use of &mut vec? YES
f(vec.len());

Basically, anytime that the variable name appears but is not being directly overwritten.

nikomatsakis · March 1, 2017, 6:40pm

Perhaps. I think the tricky part here how to communicate to the caller that it’s ok for them to allow the receiver to be mutated after mutate_and_share() returns. In particular, the question is this: what happens to the state that you did not return. Can mutate_and_share() assume that nobody will be touching that state? Right now, it can assume that, though only for the lifetime 'a. Given that mutate_and_share() can’t rely on destructors to run, I’m not sure if there is a profitable way to take advantage of that assumption. We’d also have to specify precisely what the rules are for when we decide that the “main borrow” is “deactivated”. Is it ok if the &u32 is returned embedded in a struct (e.g., Slice<'a>)? If so, how do we know that it’s ok? Seems like we have to look at the struct fields (perhaps we can deduce it from variance).

Anyway, it seems like a complex topic to me. I know there have been several threads on it.

JelteF · March 1, 2017, 8:46pm

I also posted this on reddit, but then saw that you wanted to discuss it here:

There’s a quite confusing typo in the second code block containing this:

/* 0 */ tmp0 = &mut vec;   // mutable borrow created here..
/* 1 */ tmp1 = &vec; // <-- shared borrow overlaps here         |
/* 2 */ tmp2 = Vec::len(tmp1); //                               |
/* 3 */ Vec::push(tmp0, tmp1); // ..but not used until here!

In line three tmp1 is used instead of tmp2. In the first code snippet with almost the same code this is done correctly.

nikomatsakis · March 1, 2017, 9:04pm

thanks, fixed.

comex · March 1, 2017, 11:14pm

If you mean whether it should be able to assume that, then surely it should be able to assume the state won't be written/mutably borrowed, since the same goes for regular &self -> &u32, but may be read/immutably borrowed, as the goal is to call &self methods afterward.

But after thinking about it, I guess you're worried about what unsafe code may currently be assuming?

I came up with an example. It's pretty nonsensical and unlikely, but I believe the current borrow rules guarantee it to be safe, whereas it would be unsafe if &'a mut self -> StructContainingImmutableRef<'a> allowed future immutable borrows of self. Basically, it involves a union with a safe method from &self to variant A, which normally always contains A, except for a method that takes &mut self, swaps the variant to B, and returns a struct wrapping an immutable reference to that.

comex · March 1, 2017, 11:34pm

We could consider reserved borrows to be something akin to the old const borrows we used to support: these would permit reads and writes of the original path, but not moves. There are some tricky cases to be careful of (for example, if you reserve *b where b: Box, you cannot permit people to mutate b, because that would cause the existing value to be dropped and hence invalidate your existing reference to *b), but it seems like there is nothing fundamentally stopping us. I did not propose this because (a) I would prefer not to introduce a third class of borrow restrictions and (b) most examples which would benefit from this change seem quite artificial and not entirely desirable (though there are exceptions). Basically, it seems ok for vec.get({vec.push(2); 0}) to be an error. =)

I had a thought that if a third class of borrow restriction were not only added (scary) but exposed in the type system (much scarier), it could make some helper methods more ergonomic - ones that look like:

fn get_baz(&mut self) -> &mut Baz { &mut self.foo.bar.baz }

Right now, calling that locks you out of the whole rest of self, because the contract is "return value is alive and won't be touched as long as self is alive and won't be touched". In theory, if you could change the signature to have the contract just be "return value is alive as long as self is alive", and then later "activate" it into a mutable borrow (which would still lock out all of self, but only temporarily), you could greatly reduce spurious lifetime conflicts.

But that doesn't help if baz is behind a pointer or option or something, because then that contract can't be guaranteed. So this is probably a dumb idea. If the problem needs solving at all, I guess it would be better to approach it with things like "which fields I borrow" specifications on methods (which are really ugly, but at least more flexible).

glandium · March 2, 2017, 2:39am

The v.push(v.len()) thing makes me wonder. Are such nested function calls codified? Someone on /r/rust mentions that they don't understand why it's not desugared to

let foo = v.len(); v.push(foo)

which is good enough in that simple case, but what about more complex things, like

foo.bar().baz(foo.qux())

where bar, baz and qux might do arbitrary things?

arielb1 · March 2, 2017, 9:46am

I think that if you start allowing “yet-to-be-borrowed” lifetimes to be present in types, you can’t rely on the borrow-checker to enforce them, because of things like:

let mut x = 1;
let mut y = 2;
let mut x_ = (&mut x,);
let mut y_ = (&mut y,);
mem::swap(&mut x_, &mut y_);

// must allow (one occurrence) of these lines in both orders, but
// not concurrently
read(*y);
*x_ = 2;

In the “normal” case, the “lifetime as set of points” proposal dually identifies a live lifetime with it’s set of exits. I suppose we can allow for “yet-to-be-live” lifetimes by identifying them with their set of starts and set of exits (and therefore, in the dual “set of points” representation, forcing all points between a start and an exit to be live). We can achieve this in inference by adding dataflow equalities that force regions to be contiguous.

Also, LSP computes the “precise” lifetimes of each borrow by itself - even if you have eqtys, 'a ~ 'b is defined as 'a <: 'b ∧ 'b <: 'a - LSP subverts eqty, so there is no need for borrowck to do its own dataflow.

nikomatsakis · March 2, 2017, 10:23am

What I am wondering is whether it can assume that the state will not be read. To step back, when you see a signature like this one:

fn foo<'a>(&'a mut Foo) -> &'a u32

The way I think of it is: you give me exclusive access to some memory containing Foo for the lifetime 'a. I give you back shared access to some memory containing a u32 for the remainder of the lifetime 'a.

There is no a priori reason to think that (a) the u32 is part of the Foo or, even if it is, that (b) I have relinquished my exclusive access to the other parts of Foo.

Now, it may be that one cannot make productive use of those other parts of Foo. For example, if I had (e.g.) started a thread that was accessing the other fields of Foo in parallel, I wouldn't have any way to make that thread stop executing before 'a concluded -- but this argument is subtle, and it relies on the fact that 'a is bound on the method (it's a kind of lifetime parametricity argument).

So I guess what I'm saying is that I can imagine yes that we could make some rules for when the caller must be "relinquishing" their borrow (or at least converting it to a shared borrow) but I'm wary and would like to think more deeply about it. At best that would be a sort of "pattern matching" on the signature, since we don't currently have a way to express that concept in the type system. (Which might be satisfactory.)

Regardless, this is a good example to keep in mind. I've been meaning to write up a post going through my list of "borrowck challenges" -- I think that NLL and the nested method thing addresses a good many of them, but certainly some remain. My personal list I just put into a gist with some very brief notes, but it's probably incomplete.

nikomatsakis · March 2, 2017, 10:26am

This is an interesting example. To be clear, the system I was thinking of would consider the mutable borrows of x and y to start immediately, since (in the MIR) the temporary would get used when it is packaged up into a tuple.

If we were just trying to track mutation regions, if seems like the mem::swap line would require that the type of x_ and y_ be equal (at that point), which would seem to imply that their write regions would overlap (at that point). This would then imply that the two borrows are overlapping and an error would be reported, right?

That is, I wouldn't expect this program to be legal, as long as any write occurs (since otherwise the write regions would be empty).

nikomatsakis · March 2, 2017, 10:27am

This is what I referred to as “changing the desugaring”. The previous internals thread did indeed discuss various options like that, along with their pros and cons.

nikomatsakis · March 2, 2017, 11:41am

I wanted to expand on my answer. I think I overstated the case in my conclusions and comparisons about the "equivalence" between this borrowck-based approach and a type-based approach (even one that guarantees continuous regions). In particular, I intentionally adopted a limited and simple version of "use" (i.e., one that cannot "see through" a move, or embedding the value into a structure). Most likely, if we took a type-based approach, you would get a similar feel, but with a richer notion of "use".

Example:

let a = &mut foo;
let b = a; // alternative: `let b = (a, );`
// is the borrow of `foo` active here?

With a type-based approach, presumably the type of b would be equal to the type of a (or, in the alternative case, it would be a tuple that includes a). I think that if we wanted a richer notion of "use" like this, I'd probably pursue some kind of type-based alternative -- you probably could model it with the borrowck, at least those examples, but there are a host of others where it might not work (e.g., let b = vec![a];).

That said, I don't really feel that's necessary. I think the main goal is to support the simple example of vec.push(vec.len()), not to make a super general concept of "delayed activation" that people will use all the time. I think the simple definition I gave is also readily explainable. When you do let x = &mut..., the next time you refer to x (basically), it is activated. When you do foo(&mut) or some such thing, it is as if you did let tmp = &mut; foo(tmp), and hence it is activated. But I have to ponder it -- perhaps it winds up being confusing in practice.

arielb1 · March 2, 2017, 1:06pm

Yup. The advantage of true type-based approaches is that they work well with functions. And I want to pass multiphase borrows to functions because that solves at least the get/get_mut problem and allows writing the @comex foo function.

On the other thread, I think that discontiguous borrows are an inherent problem of may_dangle lifetimes:

struct Dangling<#[may_dangle] 'r> {
    data: *const u32,
    lifetime: PhantomDanglingLifetime<'r>,
}

fn main() {
    let mut x = None;
    loop {
        let a = 0;
        x = Some(Dangling::new(&a));
    }
}

The reason is that liveness does a very good job in preventing “lifetime use-after-free”, but if we allow dead lifetimes that guarantee is gone - so we need a special category for “non-freed but inactive” lifetimes, and we probably want to bound their lives with another lifetime.

nikomatsakis · March 2, 2017, 6:48pm

Can you elaborate more on how you envision this working?

arielb1 · March 2, 2017, 10:20pm

I mean, at least I would want (note: this only requires delayed initialization + ref-as-a-type):

impl<K, V> HashMap<K, V> {
    fn get_v<'r, #[may_dangle] 'w>(Self: Ref<'r, 'w, Self>, key: &K) -> Ref<'r, 'w, V>;
}

And @comex could write his function in the same way.

eddyb · March 2, 2017, 10:47pm

Did you mean to write V in the output type instead of Self?

comex · March 3, 2017, 6:51am

Yeah, my playground link was something like that.

Actually, here's a much simpler example.

struct Something { mutex: Mutex<String> }
impl Something {
    fn one(&mut self) -> &str {
        self.mutex.get_mut().unwrap()
    }
    fn two(&self) {
        *self.mutex.lock().unwrap() = "Surprise!".to_owned();
    }
}

No unsafe code, fixed or invariant lifetimes, or structure returns, just (&mut self) -> &T, but calling two with the borrow from one still active would be Bad.

Topic		Replies	Views
Accepting nested method calls with an `&mut self` receiver language design	43	14073	March 25, 2019
Trace mutable borrows across function call boundaries language design	6	1150	March 25, 2019
Allow method calls without borrowing entire struct language design	17	3066	March 25, 2019
Make borrow safe earlier language design	3	949	March 25, 2019
Smarter borrow checking for function calls compiler	2	1049	March 25, 2019

[blog post] Nested method calls via two-phase borrowing

Related topics