[Pre-RFC] Explicit region lifetimes for tutorials


#1

Summary

fn main() {
    'a: { // block label declaration
        let x = 4;
        let r : &'a usize = &'a x; // explicit lifetime 'a
    }
}

Motivation

Why are we doing this?

Lifetimes are a complicated topic for novices to grasp. They are connected both to memory safety and to generics. I saw a tutorial (maybe a past version of the Rust Book) using similar code, with a “this is not actual Rust code” disclaimer.

For ordinary generics, T can be assigned some concrete type, and the mysterious Vec<T> is reified into the easier Vec<u32>. Here the switch is illustrated:

fn main() {
    let x = Vec::<u32>::new();
    fn qqq(s: Vec<u32>) -> Vec<u32> { s }
    let r2 = qqq(x);
}

to fn qqq<T>(s: Vec<T>) -> Vec<T> { s }

But for lifetimes there are no concrete lifetimes visible in code. You can’t provide the simple case

fn main() {
    'a: {
        let x = 5;
        let r : &'a u32 = &'a x; 
        
        fn qqq(s: &'a u32) -> &'a u32 { s }
        
        let r2 = qqq(r);
    }
}

before explaining the complicated one:

fn main() {
    'a: {
        let x = 5;
        let r : &'a u32 = &'a x; 
        
        fn qqq<'y>(s: &'y u32) -> &'y u32 { s }
        
        let r2 = qqq(r);
    }
}

which reduces, by elision and removal of the extras, to the usual

fn main() {
    {
        let x = 5;
        let r : &u32 = &x; 
        
        fn qqq(s: &u32) -> &u32 { s }
        
        let r2 = qqq(r);
    }
}
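The elision step can already be checked in today's Rust. The following is a minimal sketch, not part of the proposal: the names qqq_explicit, qqq_elided and RefFn are made up for illustration. It shows that the generic-lifetime signature and the elided one are accepted as the same function pointer type.

```rust
// Explicit generic lifetime, as in the "complicated" version above.
fn qqq_explicit<'y>(s: &'y u32) -> &'y u32 {
    s
}

// The same signature after lifetime elision.
fn qqq_elided(s: &u32) -> &u32 {
    s
}

// Hypothetical alias for the common signature of both functions.
type RefFn = for<'y> fn(&'y u32) -> &'y u32;

fn main() {
    // Both functions coerce to the same function pointer type.
    let a: RefFn = qqq_explicit;
    let b: RefFn = qqq_elided;

    let x = 5;
    assert_eq!(*a(&x), 5);
    assert_eq!(*b(&x), 5);
}
```

Only the labeled block and the &'a x form from the earlier versions have no counterpart in current Rust.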

What use cases does it support?

Users manually annotate blocks of code with labels and assign/assert the lifetimes of their references. The compiler just ensures that the user’s guesses are correct (or maybe narrows the lifetime down if the user specified a narrower one than would be calculated automatically).

What is the expected outcome?

Documentation, and especially tutorials, could use explicit lifetimes as an easier illustrative step before explaining generic lifetimes. Documentation might even use fully specified, lifetime-annotated references from the start before switching to the simplified, inferred ones.

The & operator is said to have kind ' -> T -> T; with this change it could also be given both a lifetime and a type explicitly, without being forced to rely on inference.

Detailed design

TODO

Drawbacks

  • Possible confusion with generic lifetime parameters, which also have this style '-letter
  • Possible confusion with labeled break
  • As usual, complication of language, grammar and compiler

Alternatives

  • Status quo

Unresolved questions

  • Detailed design.
  • Should the reference-taking operator also support lifetimes, or only the reference type?
  • Will it be more or less useful after non-lexical borrows?
  • let r2 = qqq::<'a>(r); ?
  • Can compiler error messages be improved by this?
  • 'a: { or just 'a {?
  • Is it a good idea for a loop label 'a: loop { ... } to also denote a lifetime region, or not?
  • Recommended style. Maybe it should be verbose 'SomethingLonger: { ... &'SomethingLonger ... }, especially as the feature is aimed at documentation.
  • Attaching explicit lifetimes to lets instead of blocks 'a: let x = 5;

#2

:+1:

Explicit demarcation of lifetimes could also help make lifetime errors more comprehensible.


#3

Here’s an example of how messy it is to try and do this currently.

Edit: This would also open the door for a pretty-printing mode that writes out explicit lifetimes everywhere. Wire it up to playpen, and it might help people sort out lifetime issues by being able to see what the compiler’s inferring.


#4

I’m very much in favour of the principle here. I think it has great benefit for debugging lifetime errors as well as for tutorials.

There is some interaction with labelled blocks that needs to be addressed, although I don’t expect anything too tricky.

It would be nice (along with this) to allow explicit lifetimes in method calls etc. (in the same way we allow providing type params). This probably requires having '_ in order to be expressive enough.

It’s also worth thinking about exactly what it means to give an explicit lifetime: is it a bound on the underlying lifetimes, or should it be taken to mean exactly that lifetime? (There has to be some subsumption somewhere, I think.)


#5

My instinct is to make it exact. This makes it useful for demonstration and learning purposes: “with this lifetime it works, but if we substitute this one, it doesn’t” as opposed to “but I can use either and it just works, so what gives?”


#6

In general this seems like a good idea, but the examples in the RFC are wrong, aren’t they? The 'a references are references to values declared within the 'a scope, so they can’t be references of the lifetime 'a.

I also don’t think these lifetimes should work any differently from other lifetimes; that is, they should have the same subtyping relation as an implicit lifetime would. That complexity is a real part of the lifetime system, we shouldn’t have these lifetimes work unusually just because the way it actually works can be confusing - rather the opposite, I would think.
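To make the subtyping relation concrete, here is a small sketch in current Rust. The helper pick and the static GLOBAL are hypothetical names, introduced only for illustration. A longer-lived reference is accepted wherever a shorter lifetime is expected, and an explicitly labeled lifetime would presumably need to take part in the same relation.

```rust
static GLOBAL: u32 = 7;

// Hypothetical helper: the result is tied to the caller-chosen lifetime 'a.
fn pick<'a>(long: &'static u32, _short: &'a u32) -> &'a u32 {
    // A &'static u32 is accepted where a &'a u32 is expected,
    // because 'static outlives every 'a.
    long
}

fn main() {
    let local = 1;
    let r = pick(&GLOBAL, &local);
    assert_eq!(*r, 7);
}
```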


#7

I also don’t think the syntactic change proposed around the reference operator is necessary or beneficial. We already have an explicitly named lifetime, 'static, and you declare a static reference like this: const FOO: &'static i32 = &0;. The lifetime should only be provided in the type ascription, not as a part of the expression.
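For reference, this is how the existing 'static case looks in a complete program. It is a minimal sketch; the function static_ref is a name made up for the example. The lifetime appears only in the types, and the expressions stay plain &0 and FOO.

```rust
const FOO: &'static i32 = &0;

// Hypothetical helper: the named lifetime appears only in the return type.
fn static_ref() -> &'static i32 {
    FOO
}

fn main() {
    // The lifetime is written in the type ascription, not in the expression.
    let n: &'static i32 = static_ref();
    assert_eq!(*n, 0);

    // A local does not live for 'static, so this would be rejected:
    // let x = 4;
    // let r: &'static i32 = &x;
}
```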


#8

The 'a references are references to values declared within the 'a scope, so they can’t be references of the lifetime 'a.

Don’t understand. Isn’t “reference of lifetime 'a” by definition a “reference to something in scope at least 'a”?


#9

Yes, but x is not in scope for at least 'a. It should look like this:

    let x = 4;
    'a: {
        let r : &'a usize = &x;
    }

As it stands in the RFC, the lifetime of x is the longest lifetime that is shorter than 'a, but it is still shorter than 'a.

This is also an example of where let .. in would be useful, because it makes the lifetime of a variable very explicit:

    let x = 4 in {
        let r = &x;
    }

And presumably:

    'a: let x = 4 in {
        let r: &'a i32 = &x;
    }
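In current Rust the closest approximation I know of is to name the region through a generic function. The sketch below assumes a hypothetical helper called region, whose parameter 'a stands in for the labeled block. As in the corrected example, x is declared outside the region and the lifetime appears only in the type ascription.

```rust
// Hypothetical helper: the generic parameter 'a plays the role of the labeled block.
fn region<'a>(x: &'a usize) -> usize {
    // The explicit lifetime is written in the type only; the expression is plain `x`.
    let r: &'a usize = x;
    *r + 1
}

fn main() {
    // `x` is declared outside the "region", so it outlives 'a.
    let x = 4;
    let y = region(&x);
    assert_eq!(y, 5);
}
```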

#10

I’m not in favour of this RFC (also, I don’t really like the diagrams with named blocks), because it mixes two different meanings of the word “lifetime”:

  1. The lifetime of a variable. (This meaning is applicable to visualising as a named block)
  2. The “lifetime” as a parameter of a reference. I think a better term for that meaning would be borrow context. This term also avoids the ambiguity of whether “lifetime” in “reference’s lifetime” means the lifetime of a variable or the lifetime of the reference (see the comment above by @withoutboats); imho the correct answer is neither. In the RFC’s drawbacks, @vi0 lists “Possible confusion with generic lifetime parameters which also have this style '-letter”, but I fail to understand how a parameter of a reference differs from a generic lifetime. And if they really did differ, mixing the notation for them would be really confusing.

I want to say that mixing those two meanings can help develop an incorrect mental model, which indeed happened in my case. I’ve used Rust with that incorrect mental model for more than a year, and after I understood how it all works, I had to unlearn parts of the previous knowledge.

The (wrong) mental model I’ve had was based on “the lifetime parameters are lifetimes of variables”, and this conclusion can be easily made by looking at such diagrams as proposed in the RFC. This model would be correct for Cyclone-like region-based lifetimes (which inspired Rust), but Rust lifetimes are a totally different beast (that’s why I prefer to call them borrow contexts).

I’d like to show a few really simple examples of how the proposed notation can be misleading:

'a {
    let (x, mut y) = (1, 2);
    let xref: &'a i32 = &x;
    y = 3;
    println!("{}", x);
}

(Let’s not think about whether the 'a block should cover only the reference’s lifetime or not.) The example above totally fails to explain why the 'a parameter on xref doesn’t prevent modifying y but does prevent modifying x. You could hand-wave it away by saying "well, xref is a reference to an integer, and Rust borrows it from x, not y". But consider this:

fn foo<'a>(_: &'a mut i32, _: &'a i32) -> &'a () { &() }

let mut x = 1;
let y = 2;
let z = 3;
let lock: &'? () = foo(&mut x, &y);
// here

After the lock is stored in a variable, we can’t do anything with x, can’t modify y, and can do anything with z, despite the lock being a reference to just ()! If there were something you could put in place of '?, it would be a full sentence: "x is borrowed mutably and y immutably, and their lifetimes end here and there". This is what I call borrow context. (Note that this description also fits the parameter 'a in this particular invocation of foo).

As you see, using named blocks as parameters for references works only for simple examples, and the only connection between the two meanings of “lifetime” is that every variable in reference’s borrow context has to be alive when the reference is.
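The foo example can be turned into a program that compiles today. This is only a sketch: the body of foo and the wrapper function demo are my assumptions, since only the signature matters for the argument. The commented-out line marks the access that the compiler rejects while lock is still in use.

```rust
// Assumed body; only the signature matters for the borrow context.
fn foo<'a>(_: &'a mut i32, _: &'a i32) -> &'a () {
    &()
}

// Hypothetical wrapper so the final values can be inspected.
fn demo() -> (i32, i32, i32) {
    let mut x = 1;
    let y = 2;
    let mut z = 3;

    let lock: &() = foo(&mut x, &y);

    // x += 1;             // rejected if uncommented: `x` is mutably borrowed
    let y_copy = y;        // reading `y` is fine, it is only borrowed immutably
    z += 1;                // `z` is not part of the borrow context at all
    assert_eq!(*lock, ()); // last use of `lock`, both borrows end here

    x += 1;                // allowed again once `lock` is no longer used
    (x, y_copy, z)
}

fn main() {
    assert_eq!(demo(), (2, 2, 4));
}
```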

Maybe these kinds of diagrams actually help beginners understand borrowing (although I think that saying “x stops existing ⟨here⟩” would be enough), but adding them to the language would be harmful. Moreover, I’d say that anywhere such a diagram appears, it should be noted that it’s not only not legal Rust code, but also only a mental shortcut. While Rust prevents misusing lifetimes even with a wrong mental model, a deep understanding of lifetimes/borrow contexts is really important for writing safe abstractions for unsafe code.


#11

I agree that this is a subtle point, and in hindsight I regret my use of the term “lifetime”, since it is so readily confusable. I wish we had just stuck with “region”. My intention was to avoid confusion about “memory regions” (ranges of memory addresses) vs spans of the code (lifetimes), but I neglected to consider that there are two quite reasonable uses of the word lifetime:

  • (maximal) lifetime of a variable or memory location: this is the span of the code after which the value will be freed (I like to call this the “scope” of a variable).
  • lifetime of the reference: this is the span of the code where the reference is used (what you called the “borrow context”).

Note that the lifetime of the reference must be shorter than the lifetime of the value it refers to – but people often confuse the two. Put another way, in a type like &'a T, the 'a refers to the lifetime of the reference, not the underlying value.

That said, I don’t see how introducing this syntax has to encourage this confusion. If anything, I think it could help eliminate the confusion. Whenever I teach lifetimes, for example, I try to use examples like this (using the proposed notation):

{
    let mut a = 1;
    a += 1;
    'b: {
        let p: &'b i32 = &'b a;
        a += 1; // ERROR -- `p` is in scope
    }
    a += 1; // OK -- `p` is out of scope
}

Hopefully this makes clear that 'b is not the scope of a but rather an independent notion corresponding to “the time spent executing the block”.

I am :thumbsup: on this basic proposal, my one concern is that it is hard to label the full range of lifetimes today, and when we move to NLL, it will be impossible. But I think it’s fine to just support labeling blocks – the main purpose here is to be able to use concrete syntax to teach lifetimes, from my POV, and maybe debug the occasional error, and for that purpose labeling blocks seems “good enough”.

(Note though that you can see this imprecision even in my code above: the lifetime of p without the annotation would start just after the let, but with it “starts” at the entry to the block. However, that really doesn’t make a material difference in this case (it can in some obscure cases, but in those cases you can add more blocks).)
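For comparison, here is a sketch of the same example without the annotation, as accepted by current compilers with non-lexical lifetimes. The function name demo is made up for the example. The borrow held by p ends at its last use rather than at the end of the block, which is exactly the imprecision a block label cannot express.

```rust
// Hypothetical wrapper so the final value can be checked.
fn demo() -> i32 {
    let mut a = 1;
    a += 1;
    {
        let p: &i32 = &a;
        // a += 1;         // rejected if uncommented: `a` is borrowed and `p` is used below
        assert_eq!(*p, 2); // last use of `p`
        a += 1;            // accepted: the borrow ended at the last use of `p`
    }
    a += 1; // OK, `p` is out of scope
    a
}

fn main() {
    assert_eq!(demo(), 4);
}
```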


#12

Maybe just artificially limit a reference with a block-based lifetime to that block?

If the specified block’s lifetime is smaller than the inferred one, limit it. If it is bigger, fail compilation.