As I mentioned in another thread, I’ve been working on getting the MIR
to handle the full compiler and test suite. Right now it’s stuck on
rustdoc due to the use of slice patterns like [x, ..y]
. These
patterns raise some interesting questions about how to integrate
indices into the borrow checker in a MIR-like setting. I thought I’d
jot down quickly the conflict and my proposed solution, along with
some thoughts on future directions we might go.
The Problem
The problem is that the borrow checker does not, by and large, reason
about indices. But slice patterns only make sense if you can reason
about indices. For example, a pattern like
[ref mut x, ..y, ref mut z]
, when matching against foo
, is roughly
equivalent to:
let len = foo.len();
if len >= 2 {
x = &mut foo[0];
y = &foo[1..len-1];
z = &mut foo[len-1];
...
} else {
// other match arms
}
To know that patterns like this are safe, you must consider the length
len
and reason that foo[0]
is disjoint from foo[len-1]
, and that
both of those are disjoint from foo[1..len-1]
. Adding this sort of
logic to the borrow checker in general is a slippery scope that we
have tried to avoid. Of course, such detailed reasoning shouldn’t be
necessary – in the case of slice patterns, we know by construction
that the indices are disjoint.
Trusted parts of MIR
Slice patterns thus join a small set of other things (bounds checks, matches) that are obviously safe in the AST but not obviously safe when desugared to the MIR level. In those other cases, I’ve taken the simplifying approach of generating “unsafe” MIR. For example, MIR generally assumes that all array indices are in bounds, and trusts the front-end to produce bounds checking code. I’d like to do the same for slice patterns, as I describe below. In all of these cases – slice patterns included – it should be possible to write an extended static check for MIR that checks that the unsafe constructors are in fact used safely; this check would require a data-flow analysis, or else a dependent type system, and hence would be somewhat more complex than a typical type checker. I also talk a bit about that below.
What do we do today?
In fact the same problem occurs today. The ExprUseVisitor
tries to
translate matches into individual borrows and thus it must figure out
how to translate a match pattern. To handle this, the
mem_categorization
code marks borrows that arise from a slice
pattern in a special way, which is recognized by the borrow checker
and treated differently. In addition, the borrow checker today is ok
with the idea of a single node in the AST originating several borrows,
and it assumes that they are all disjoint. But in MIR this single
pattern node has been busted up into smaller pieces, so we need to
recover that assumption somehow.
How to fix it most easily?
Currently, the MIR contains “index lvalues” LV1[LV2]
, where the idea
was that the borrow checker will make no attempt to reason about what
the index LV2
might be (i.e., statically, we might simplify x[y]
to just x[_]
). The idea is to add a second form of path that is
generated by slice patterns. It would look like LV1[LV2; ALIAS]
where ALIAS
is a pair of integers (pattern, item)
. The idea is
that two paths LV[_; (P, I1)]
where LV[_; (P, I2)]
where I1 != I2
are statically known to be disjoint (this is a trusted assertion).
We also need to add a new rvalue for slicing LV[LV1..LV2; ALIAS]
(this is only used for slice patterns; slice expressions are just
overloaded method calls).
Let me show you a snippet of code to give you a rough idea of how this would work:
let x = &foo[0]; // => generates &foo[tmp1]
let y = &foo[2]; // => generates &foo[tmp2]
match foo {
[ref a, ref b, ..c, ref d, ref mut e] => {
// => generates:
// len = foo.len()
// a = &foo[0; (0, 0)]
// b = &foo[1; (0, 1)]
// c = &foo[2..len-2; (0, 2)]
// d = &foo[len-2; (0, 3)]
// e = &foo[len-1; (0, 4)]
}
}
Now when the code is expanded, the various paths for a
, b
, etc
will be considered disjoint with one another.
Note that while the uses of these alias-hinted paths is confined to the evaluation of the match pattern (after that point, bindings have been made), those uses can still be spread across the CFG, since the match evaluation can span basic blocks. (However, they always come in one DAG, there is never an internal loop.) For example consider this slice pattern:
match foo {
[(ref a, 22), b] => { /* ARM 0 */ }
_ => { /* ARM 1 */ }
}
This will desugar to something like:
BB0:
tmp0 = foo[0; (0,0)].1 == 22;
if tmp0 goto BB1 else BB2
BB1:
a = &foo[0; (0,0)].0;
b = foo[1; (0,1)];
/* ARM 0 */
BB2:
/* ARM 1 */
Here you can see that the same path foo[0; (0,0)]
appears twice,
once in BB0 and once in BB1. Those uses will naturally be considered
overlapping (but the fact that we then access the 1
and 0
fields
of the tuple renders them again disjoint).
Loops and correctness
I was somewhat concerned about what would happen in the face of borrows that persist over loops, but I think I have convinced myself that everything works out fine. Here is an example that I will use to explain:
fn f(foo: [i32; 2]) {
let mut p;
loop {
match foo {
[ref mut a, ref mut b] => {
p = a;
...
}
}
}
}
This code is (rightly) rejected today:
<anon>:7:14: 7:23 error: cannot borrow `foo[..]` as mutable more than once at a time
<anon>:7 [ref mut a, ref mut b] => {
^~~~~~~~~
<anon>:7:14: 7:23 note: previous borrow of `foo[..]` occurs here; the mutable borrow prevents subsequent moves, borrows, or modification of `foo[..]` until the borrow ends
<anon>:7 [ref mut a, ref mut b] => {
^~~~~~~~~
The problem is that p
from the previous iteration will overlap with
the borrow a
. Under the newer model, this function translates to MIR
something like:
BB0:
a = &mut foo[0; (0,0)]; // (a)
b = &mut foo[1; (0,1)]; // (b)
p = a;
...
goto BB0;
From the POV of borrowck, line (a)
creates a loan. Due to the
assignment to p
, this loan will still be in scope when we branch
back around, which means we will still get an error, as expected.
Nonetheless, it seems creepy that a borrow from a previous iteration can persist until the next one – after all, the disjointedness reasoning applies only to a single match statement. If that same match statement executes twice, is there any guarantee that the paths are still disjoint/overlapping in the same way?
It turns out that I think there is such a guarantee, I think. Not, of course, in general, but specifically in the case where a loan flows through from a previous iteration. The reason is basically that the only case there would be a problem is if the slice has changed to a new pointer or the length of the slice has changed. Both of those would require mutation to the variable storing the slice, and that can’t happen if there is an outstanding loan.
Type checking
If we wanted to verify these alias sets, it is certainly doable. The basic idea is that the compiler must reason about the length of the vector and ensure that, whenever we create a value with a given alias set, it is disjoint from later values in that same alias set. However, this is somewhat non-trivial, particularly given loops; we would presumably want to infer such a region and insist that it has a dominator and post-dominator that are confined to a single loop nest.
Anyway, this is all doable, but it leads one to wonder if we can instead modify MIR to be more of a dependent type system that can express the invariants that we actually want. There is some progress and clues here (for example, the work that Ralf is doing on “lambda Rust”), but it’s something I’d like to keep thinking and maybe evolve towards. I imagine that immutable, SSA-like values – as opposed to the mutable stack slots I proposed – might be a useful component. They might be a useful addition, or of course we could go whole hog and migrate to an LLVM-like notation of allocas and values. This has its appeal, but it might complicate the borrowck (and mapping errors to a user’s model) if we traverse too far from the paths present in the language today, so I think we’ll have to feel our way.