The current region inference scheme has some known limitations. Many of these were uncovered and discussed as part of the work on RFC 1214. There have also been a number of recent regressions that are tangentially, at least, related to the way region inference currently works. At its worst, the shortcomings in region inference can lead to nondeterministic errors, where doing small things like adding redundant parentheses can cause or solve type inference failures. Yuck. This post covers a smattering of thoughts on the topic.

### Nondeterminism and lifetime parameters

#### Problem

The current region system allows a region to be inferred to one of:

- a scope or expression
- a named lifetime parameter (
`'a`

) `'static`

- a special empty region that has no corresponding syntax

To a first approximation, it works by starting region variables small and growing them in response to constraints. It assumes that there is a “least upper bound” operation that can be applied to any two regions to yield something sensible. When it was written, this was basically true, but in the meantime lifetime parameters have gotten a lot more complex. Now we have things like:

```
where 'b: 'a,
'b: 'c,
'd: 'a,
'd: 'c,
```

now what is the `LUB('a, 'c)`

? Basically, there is no single answer:
both `'b`

and `'d`

are greater than both `'a`

and `'c`

, but neither
`'b: 'd`

nor `'d: 'b`

holds. Right now, we pick something conservative
(`'static`

, in fact), but this is not great, because the order in
which we process constraints can affect the final solution.

For example, imagine three constraints:

`'a <= $0`

`'c <= $0`

`'b <= $0`

If we process these in order, we’ll first grow `$0`

to `'a`

and then
to `'static`

, where it remains. But if we process in the order 1, 3,
2, we get a different result. We first grow to `'a`

and then to `'b`

,
where `$0`

remains. Double yuck!

#### Proposed solution

Allow region variables to be inferred to the intersection of a list of
named lifetime parameters. So the result of `LUB('a, 'c)`

, in the
previous example, would be `{'b & 'd}`

, instead of `'static`

. If you
work this through, you’ll see that the order of processing constraints
no longer matters.

The LUB operation on two sets of named lifetime parameters X and Y can be defined roughly by:

```
concat([forall i, j. UBS(X[i], Y[j]))
```

where `UBS`

yields the set of disjoint upper-bounds (so, e.g.,
`UBS('a, 'c) = ['b, 'd]`

). (This UBS operation is already implemented,
see the method `minimal_upper_bounds`

on `transitive_relation`

.)

### Adopt a pure dataflow solution; impoverished constraint language

#### Problem

The regionck constraint model consists of a set of
constraints of the form `R1 <= R2`

, all of which must hold:

```
C = R <= R
| AND(C, C)
```

But, as discussed in RFC 1214, if you want to handle projections, you really need the ability to have “OR” constraints. Similar problems arise from where-clauses and a few other sources.

When solving this constraint set, the region inference algorithm also classifies nodes as “expanding” or “contracting” based upon what kinds of incoming edges they have. This is sort of a holdover from the days before there was an empty region. It is also problematic because it means we can’t incrementally solve region constraints at all: we have to wait until all constraints are known so that we can do these classifications.

This contraction mode means that if you have a variable with only
upper bounds, like a single constraint `$0 <= X`

, we will infer `$0`

to `X`

. Since this represents a maximal solution, it can be useful for
uncovering missing lower bounds, since often a solution of `X`

is too
large (for example, the missing constraint #29006 was uncovered by
such an error). nut the contraction mode also causes some difficulties
(e.g., changing this behavior happens to fix #29048).

#### Solution

Extend constraint language to include `OR`

:

```
C = R <= R
| AND(C, C)
| OR(C, C)
```

Of course, this has a big impact on how the code works. Basically, the idea would be that we have to exhaustively explore the options. Luckily, we mostly generate a pretty flat tree, so we ought to be able to do a strategy like:

- At any given level, gather up all the
`R <= R`

constraints that are reachable without going through an`OR`

node. - Solve those using data-flow, to find minimal regions. Call this interim solution X.
- Explore the required
`OR`

constraints, starting from X. For each constraint, at least one of the sides must hold. Note that there are a lot of possible combinations here. We certainly want to at minimum continue distingushing “verifies”, where inference is not affected.

Note that at any given level I have implicitly adopted a pure dataflow
model, where all variables start as empty and strictly grow. (We can
also grow the *root* level incrementally, which may allow us to make
some progress on problems like #21974, but there are also more
complications in that general area.)

We will lose the “bug-finding” behavior of the current algorithm. This
is, however, not a reliable way to uncovering bugs, so I’m not sure
that’s a great loss. Instead, I think we should just focus on doing
type-checking on the MIR as a kind of “double check” for inference.
In fact, I’d like to move *all* region checking to the MIR, as I
discuss in the next section.

### More principled treatment of regions in closures, non-lexical lifetimes, etc

This is not a real *solution* yet, more of a general direction. The
current typeck code feels overly complex, and it has a lot of trouble
squaring the type inference we need around closures with the region
hierarchy that closures ought to have. In particular, closures are
distinct functions, and so should have a distinct region hierarchy,
but we have to type check them altogether to get better inference for
local variables and the like.

In my ideal world, we’d address this by having typeck not handle
regions at all, instead substituting some kind of dummy region (let’s
call it `'dummy`

) everywhere (though we will have to track where
region bindings occur as well). We’d then build up the MIR, which will
have all regions just using `'dummy`

initially.

We’d then re-run a full type-check on the MIR. This will be relatively easy because the MIR is very simple. In this later pass, we will also replace regions with inference variables, and construct a proper graph. This should let us treat closures as completely distinct fns, like we want to. It also makes it easy to move to non-lexical lifetimes. (This doesn’t really solve #21974 yet, since we’ll still have to be doing trait selection.)

### All done.

Thanks for listening.