[lang-team-minutes] Elision 2.0

One thing I think was a slam dunk for fns, that I haven’t seen for structs is “Permit referencing the name of a parameter instead of declaring a lifetime”.

// Should be legal
struct Foo<'a, 'data, 'z> {
	a: &i32,
	data: &i32,
	z: &str,
}
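For comparison, here is a sketch of that struct in today's syntax. Every reference field has to name a lifetime parameter declared on the struct, and because the three parameters are distinct, the fields may borrow from unrelated owners. The `describe` helper is hypothetical and exists only to exercise the struct.

```rust
// Today's explicit spelling: each reference field names a lifetime
// parameter declared on the struct, so the three fields may borrow
// from unrelated owners.
struct Foo<'a, 'data, 'z> {
    a: &'a i32,
    data: &'data i32,
    z: &'z str,
}

// Hypothetical helper, for illustration only.
fn describe(foo: &Foo<'_, '_, '_>) -> String {
    format!("{}+{}:{}", foo.a, foo.data, foo.z)
}
```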

Thoughts?


AFAIK, named lifetimes are only really necessary (i.e., cannot easily be inferred by the compiler) when multiple fields in a struct share the same lifetime.

I am strongly in favor of adding the ability to "declare" lifetimes. However, I think that the lifetimeof keyword is not what you want. In particular, that's not the model that the compiler has internally.

For example here:

fn foo() {
    let x: i32 = 1;       // --- scope of x ------+
    let y: &'a i32 = &x;  // --- lifetime 'a --+  |
} //                   <-----------------------+  | 
  //               <------------------------------+

To be honest, I'm not crazy about the term lifetime. It's kind of hard to change it now, but I think there is a dangerous confusion that occurs. The "lifetime of x", or at least what I think you meant by that, corresponds to the span of code that begins when x is allocated (i.e., the let where it is pushed on the stack) and ends where x is popped (i.e., the exit from the enclosing block). I usually try to call that the scope of x.

In contrast, the lifetime of a reference corresponds to the region or span of the code where the reference is used. This is almost always shorter than the lifetime of the variable itself, as you can kind of see in the diagram above -- in this case, the lifetime 'a would end infinitesimally before the scope of x.
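Here is a small sketch of the distinction between the scope of `x` and the lifetime of a borrow of `x`. It assumes today's compiler, which has since adopted the NLL borrow checker mentioned later in this thread. Under NLL, the borrow held by `y` ends at its last use, well before the scope of `x` ends, so `x` can be mutated afterwards. The function name is hypothetical.

```rust
// Sketch: the scope of `x` vs. the lifetime of a borrow of `x`.
fn scope_vs_lifetime() -> i32 {
    let mut x: i32 = 1; // scope of `x` begins
    let y: &i32 = &x;   // borrow of `x` begins
    let copied = *y;    // last use of `y`: the borrow can end here
    x += 1;             // accepted: no borrow of `x` is still live
    x + copied          // 2 + 1
}                       // scope of `x` ends
```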

What I would prefer is if we can label blocks and expressions and then use those names as lifetimes in the code. This would permit us to explain the mechanisms of the type system with more clarity:

fn main() {
    let x: i32 = 1;
    'a: {
        // explicitly give this reference the lifetime `'a`,
        // corresponding to the labeled block. The lifetime must
        // be the label of some enclosing block or loop.
        let y: &'a i32 = &'a x;
    }
}

This isn't perfect, since internally the compiler has a whole range of lifetimes that are not blocks. Basically every statement (e.g., let y = &x), expression and subexpression has its own lifetime, corresponding to the duration of time in which they execute. Once we move to NLL, then we'll have an even larger set of lifetimes, corresponding to arbitrary sets of paths through the control-flow graph. But at least being able to label blocks might help to communicate that the lifetime of a reference is not, in fact, tied to a variable, but rather it's just a region of the code (which must be some subregion of the scope of the owner).
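Rust has since gained labeled block expressions (stable since Rust 1.65), but as far as I know they serve only as `break` targets. The label names a block, yet it cannot be written as a lifetime in a type such as `&'search T`, which is what the proposal above asks for. A minimal sketch follows; `first_even` is a hypothetical example function.

```rust
// A labeled block used purely as a control-flow target.
// The label `'search` cannot be used as a lifetime in a type.
fn first_even(values: &[i32]) -> Option<i32> {
    'search: {
        for &v in values {
            if v % 2 == 0 {
                break 'search Some(v);
            }
        }
        None
    }
}
```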


Would it be consistent to allow a lifetime to be declared “on the item”? The following would be sweet and simple if it could work:

trait MyTrait {

    // fn foo<'a, 'b>(&'a self, data: &'b [i32]) -> &'b [i32];
    fn foo(&self, data: &[i32]) -> &'data [i32];

    // fn bar<'a, 'b>(&'a self, data: &'b [i32]) -> &'a [i32];
    fn bar(&self, data: &[i32]) -> &'self [i32];

    // fn baz<'a, 'b, 'c>(&'a self, data: &'b [i32]) -> &'c [i32] where 'a: 'c, 'b: 'c;
    fn baz(&self, data: &[i32]) -> &'data+'self [i32];
}
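For reference, here is a sketch of the three desugared signatures as they compile today. In `baz` the result may borrow from either input, so its lifetime `'c` must be outlived by both (`'a: 'c, 'b: 'c`). `Holder` and the method bodies are hypothetical and exist only to show that the signatures can be implemented.

```rust
trait MyTrait {
    // `foo`: the result borrows from `data` only.
    fn foo<'a, 'b>(&'a self, data: &'b [i32]) -> &'b [i32];
    // `bar`: the result borrows from `self` only.
    fn bar<'a, 'b>(&'a self, data: &'b [i32]) -> &'a [i32];
    // `baz`: the result may borrow from either, so it can live no longer than both.
    fn baz<'a, 'b, 'c>(&'a self, data: &'b [i32]) -> &'c [i32]
    where
        'a: 'c,
        'b: 'c;
}

// Hypothetical implementor, for illustration only.
struct Holder {
    items: Vec<i32>,
}

impl MyTrait for Holder {
    fn foo<'a, 'b>(&'a self, data: &'b [i32]) -> &'b [i32] {
        // At most the first two elements of `data`.
        &data[..data.len().min(2)]
    }

    fn bar<'a, 'b>(&'a self, _data: &'b [i32]) -> &'a [i32] {
        self.items.as_slice()
    }

    fn baz<'a, 'b, 'c>(&'a self, data: &'b [i32]) -> &'c [i32]
    where
        'a: 'c,
        'b: 'c,
    {
        // Return whichever slice is longer.
        if data.len() > self.items.len() {
            data
        } else {
            self.items.as_slice()
        }
    }
}
```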

How about the following syntax for structs?

struct Foo<'self> {
   a: &i32
}

This syntax isn’t much shorter than the present syntax, but hopefully it is more googleable and easier to grasp for new users. In general I think that keywords are more googleable than sigils.

An assortment of embarrassingly random thoughts in no particular order:

  • In the const generics thread we were floating the possibility of removing ticks, and using lifetime a as the syntax to introduce lifetimes, and just a to refer to them (much like we’d have const X: Foo resp. just X for constants). This idea is in tension with adding more ticks, in <'>.

  • Recall that the spark for adding lifetime elision in the first place was a discussion some C++ programmers (might’ve been Chrome folk?) were having on a different forum, about Rust, which got shared to one of the Rust forums, where they were essentially WTFing over Rust’s noisy and verbose explicit lifetime syntax (fn foo<'a>(a: &'a Foo) -> &'a Bar was the only option at the time). Seeing non-Rustaceans having that reaction convinced us that it was in fact an actual problem and that we should do something about it. (I actually don’t remember where this pre-RFC discussion took place, and couldn’t find it just now; does anyone else?)

    Anyway, the point I’m getting around to is that for non-Rustaceans, Foo<'> also has the risk of coming across as line noise and leading to “what is this I can’t even”-style reactions. You need to already know a lot about Rust to even be able to guess at the meaning of an unmatched apostrophe standing on its own, there.

  • I feel like our thought process here is roughly:

    1. Lifetime elision not being apparent from the function signature for user-defined types is a problem.
    2. Should we fix it? Yes. Yes we should.
    3. Okay, so what syntax should we use?
    4. *surveys available options*
    5. It seems like all of these are pretty bad, but we said we were going to solve the problem, so I guess we have to choose one of them?

    Point being that we should at least consider the possibility that the cure could be worse than the disease. If all of the options for fixing the problem would result in ghastly syntax that would end up repelling people from the language on sight, it might be less bad to resign ourselves to continue living with the problem, as we’ve been doing so far.

    (Even better, of course, would be to find a non-ghastly syntax. Provided that we can.)

  • If we allow punning the name of a variable for the name of a lifetime associated with it, as also proposed, then the elided fn foo(x: &Foo) -> Bar<'> does not have that much of an advantage over the non-elided fn foo(x: &Foo) -> Bar<'x>, any more. Relative to the “current baseline”, the elided syntax has gotten more verbose, and the non-elided syntax has gotten less so. Could we live with deprecating elision for user-defined types outright, without a direct replacement syntax, and just have people use the name-punning syntax instead?

  • (Incidentally, if I remember correctly, a couple of years ago the ability to use function parameter names as lifetime names was proposed kind of frequently, and it was always shot down with the reasoning that while it would indeed be convenient, it’s founded on a misunderstanding of how lifetimes work and would cause people to form misleading mental models. Fast forward to the present, and the lang team itself is now proposing the change. Does anyone involved happen to remember when/why/how your thinking changed?)

  • I feel like a nice thing about the lifetime elision syntax when actual references are involved is how you can just visually match up the & symbols to see what is borrowing from what: fn foo(x: &Foo, y: Blah, z: Zzz) -> HashMap<i32, &Bar>. I prefer one of the syntaxes involving an & symbol for this reason, most likely Foo<&>, if the plain postfix Foo& is a non-starter due to the potential for confusion w.r.t. C++.

  • If we ever add “full” HKTs, allowing us to abstract over type constructors, and to refer to & itself as a type constructor (of kind type<lifetime, type> using my preferred kind syntax, or Lifetime -> * -> * in the Haskell notation most people are familiar with), then Foo<&> could be valid syntax, with a meaning that conflicts with the aforementioned one. This could conceivably be worked around in a number of ways, like introducing a type alias type Ref = &; and writing Foo<Ref>, or requiring & to be written as <&> like we currently do when explicitly referencing associated items (<&T>::Foo), or potentially others. (I don’t think this is a significant issue, it’s just a random thought.)

  • What if instead of decorating the types, we were to decorate the function arrow itself to indicate “borrowing is taking place across this function call”? Like, fn foo(x: &T) &-> Foo, or ->&, or something along those lines. I’m not remotely sure that I like this idea (it’s also a bit cryptic and syntaxy), just putting it out there.

  • We also have a ref keyword, which we might incidentally be phasing out with the "match ergonomics" improvements. Maybe we could use that somehow?
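For reference, today's Rust already offers two nearby spellings. The first is an explicit lifetime whose name can be chosen to echo the parameter; that is only a naming convention, not the proposed punning. The second is the anonymous `'_`, which Rust has since adopted. The sketch below uses hypothetical `Foo`/`Bar` types, and its last function shows the "line up the `&`s" property with a real reference in the output.

```rust
use std::collections::HashMap;

// Hypothetical types standing in for `Foo` and `Bar`.
struct Foo {
    name: String,
}

struct Bar<'a> {
    name: &'a str,
}

// Fully explicit: the lifetime is declared and, by convention only, named after `x`.
fn explicit<'x>(x: &'x Foo) -> Bar<'x> {
    Bar { name: &x.name }
}

// Elided but marked: `'_` signals that `Bar` borrows (here, from `x`).
fn marked(x: &Foo) -> Bar<'_> {
    Bar { name: &x.name }
}

// With real references, the `&`s line up: the output borrows from `x`.
fn index(x: &Foo) -> HashMap<i32, &str> {
    let mut map = HashMap::new();
    map.insert(0, x.name.as_str());
    map
}
```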


Perhaps something like:

fn foo(x: &Foo) -> Bar ref x

Less noisy? Less intimidating for non-Rustaceans?

I’ve been running into this a little, and I’ve been thinking about let ... in syntax again:

...map(let foo = foo.clone() in move |x| foo.thing(x))

It’s equivalent to

...map({ let foo = foo.clone(); move |x| foo.thing(x) })

but when the closure spans many lines and is more complex, it’s nice to not have to close the {}.

It is also more general than adding new syntax to closures.

That said, last time I floated the idea it was generally rejected as not adding enough new value.
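Here is a runnable sketch of the block form that let ... in would replace, in a case where the clone is actually required. The returned `Box<dyn Iterator>` is `'static` by default, so the closure cannot borrow `foo` and must own a clone. `Thing`, its `thing` method and `shifted` are hypothetical stand-ins for the names in the snippet above.

```rust
use std::rc::Rc;

// Hypothetical stand-in for the `foo` in the example above.
struct Thing {
    offset: i32,
}

impl Thing {
    fn thing(&self, x: i32) -> i32 {
        x + self.offset
    }
}

// The returned iterator must be `'static` (the default object bound for
// `Box<dyn Trait>`), so the closure captures a clone instead of borrowing `foo`.
fn shifted(foo: &Rc<Thing>, xs: Vec<i32>) -> Box<dyn Iterator<Item = i32>> {
    Box::new(xs.into_iter().map({
        let foo = Rc::clone(foo);
        move |x| foo.thing(x)
    }))
}
```

With let ... in, the argument to map would read `let foo = Rc::clone(foo) in move |x| foo.thing(x)`, which drops the extra braces.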


This is an interesting use case for let in! I’ve always been sort of surprised Rust doesn’t have this feature since:

  • All the keywords are already reserved.
  • I think it’s much more natural to scope lifetimes with let y = &x in { ... } than { let y = &x; ... }.
  • It doesn’t seem hard to me to figure out what it means when you see it (but I could be wrong I guess).

Of course this use case doesn’t scale super well to capturing multiple refs since you have to use tuples:

.map(let (foo, bar) = (foo.clone(), bar.clone()) in move |x| foo.thing(x, bar))

Since the general syntax is let <binding> in <expr>, and it is itself an expr, I was assuming you could cascade:

let foo = foo.clone() in
let bar = bar.clone() in
move |x| ...

Yes, that would work, though I’m not sure it’s better than the tuple form.

Either falls directly out of the syntax, so it just becomes a matter of using the form that suits the situation. The tuple form works well if you want to simultaneously alias multiple things from the outer scope:

let (foo, bar) = (bar.thingy(foo), foo.frob(bar)) in ...
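The tuple form and the cascaded form differ when the new bindings refer to each other. In the tuple form, both right-hand sides see the outer foo and bar. In the cascaded form, the second binding already sees the new foo. A minimal sketch with ordinary let statements (the same scoping applies), using `+` and `*` as hypothetical stand-ins for thingy and frob:

```rust
// Tuple form: both right-hand sides read the outer `foo` and `bar`.
fn simultaneous(foo: i32, bar: i32) -> (i32, i32) {
    let (foo, bar) = (bar + foo, foo * bar);
    (foo, bar)
}

// Cascaded form: the second binding already sees the new `foo`.
fn cascaded(foo: i32, bar: i32) -> (i32, i32) {
    let foo = bar + foo;
    let bar = foo * bar;
    (foo, bar)
}
```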

There is another use case that I think is also important, which is when a user-defined type appears in a parameter list. Basically, I think it should be visually evident when a type has references, regardless of where it appears:

// This version makes it clear that `x` and `y` 
// contain references, without having to consult
// the struct definition.
fn foo(x: Foo<'>, y: Foo<'>) { ... }

// This version, accepted today, does not.
fn foo(x: Foo, y: Foo) { ... }
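Both forms compile in today's Rust, with `'_` standing in for the proposed `'`. The hidden form is still accepted. As far as I know, it is only flagged by the `elided_lifetimes_in_paths` lint, which is allow-by-default and part of the `rust_2018_idioms` group. In this sketch, `Foo`'s field and the summing functions are hypothetical.

```rust
// A type that holds a reference (hypothetical, for illustration).
struct Foo<'a> {
    data: &'a [i32],
}

// Accepted today: nothing in the signature shows that `Foo` borrows.
fn total_hidden(x: Foo, y: Foo) -> i32 {
    x.data.iter().sum::<i32>() + y.data.iter().sum::<i32>()
}

// Also accepted today: `'_` marks the borrow without naming it.
fn total_marked(x: Foo<'_>, y: Foo<'_>) -> i32 {
    x.data.iter().sum::<i32>() + y.data.iter().sum::<i32>()
}
```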

I find that I rely frequently on the ability to visually scan for references and things in order to estimate whether refactorings will work, etc. For me this is an extended version of the principle that it's good to have a (lightweight) visual indicator of when borrowing / ownership transfer are at play.

I was debating about Foo<ref> as well earlier, though I don't think I ever floated it on the thread. It seems not entirely implausible. I was nervous because we are backing away from it in match, although I think that the two things aren't necessarily in conflict.

Crazy thought: what if instead of writing lifetime (a term that I do not like anymore, for reasons I've already enumerated in this thread), we used ref to introduce named lifetimes?

struct Foo<ref a> { // new version of 'a
    x: &a i32, // no need to write ' here
}

fn use_foo<ref a>(f: Foo<a>)

fn get_foo<ref a>(&a self) -> Foo<a>

Then we would be saying that this is the shorthand:

struct Foo<ref> {
    x: &i32
}

fn use_foo(f: Foo<ref>)

fn get_foo(&self) -> Foo<ref>

If you really wanted to go crazy, you'd replace the & and &mut type constructors with ref :), so that we write fn get_foo(ref self) -> Foo<ref>. But this would then motivate one to introduce ref a.b.c as an expression. At that point, you have a bit of a problem because ref P patterns are ... well ... already taken (this tension being what motivated us to introduce ref binding mode in the first place).
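To make the mapping concrete, here is a sketch of the ref a examples in today's syntax, with the proposed form in each comment. `Owner` is a hypothetical type added so that get_foo has a `self` to borrow from. Note that the struct shorthand `struct Foo<ref> { x: &i32 }` has no equivalent today, since struct fields must name their lifetimes.

```rust
// `struct Foo<ref a> { x: &a i32 }`
struct Foo<'a> {
    x: &'a i32,
}

// Hypothetical owner type, for illustration only.
struct Owner {
    value: i32,
}

impl Owner {
    // `fn get_foo<ref a>(&a self) -> Foo<a>`
    fn get_foo<'a>(&'a self) -> Foo<'a> {
        Foo { x: &self.value }
    }

    // `fn get_foo(&self) -> Foo<ref>`: the closest elided form today.
    fn get_foo_elided(&self) -> Foo<'_> {
        Foo { x: &self.value }
    }
}

// `fn use_foo<ref a>(f: Foo<a>)`
fn use_foo<'a>(f: Foo<'a>) -> i32 {
    *f.x
}
```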


That's... crazy indeed. I think I like it quite a bit! It certainly looks a lot cleaner than the current lifetime syntax, not to mention the other proposed new syntaxes for elision.

(And ref is also much shorter than lifetime would've been, which is a definite plus.)

We'd still need something to actually call them informally though? Presumably we'd be phasing out "lifetime" (or no?), and I assume we wouldn't actually call them "refs" or "references" to avoid confusion with &. Did you have anything in mind?

I thought of this too :) I'm not sure if I like it, but it seems plausible. & is both more convenient and has a lot of cultural precedent in systemsy languages across the board (C and C++, but also Go, Swift...), quite unlike '. On the other hand, ref might be more self-describing and less intimidating for people who aren't coming from other systems languages. (So I kind of like them both equally, I guess; which suggests staying with the status quo.)

If we also phase out ref in patterns as part of the "match ergonomics" effort, which I think would be a good thing to do on its own merits, then I don't think this would be an actual problem. It's not backwards-incompatible (so we don't even need epochs), because the old meaning only exists in patterns and the new meaning only exists outside of them, so they can co-exist if need be; and it's also not a significant issue in terms of explanation or mental models, because new and idiomatic code as well as documentation etc. would only have the new meaning in it, not the old one.

(ref in patterns would just be kept around to keep old code compiling, at some point maybe with a deprecation warning, or eventually phased out completely with epochs if we want to, etc.)

Using ref for this seems confusing to me. Even if match loses the possibility of specifying ref, other patterns will still have it. My first intuition when seeing get_foo<ref a> is “a const parameter turned into a reference”.

I’d also suggest that while some might find ' ugly, the lifetime-parameters concept is quite novel, so having it stand out makes sense to me both from an educational and readability standpoint, and I’d be quite sad to lose that. I certainly feel that hiding lifetimes too much will make it harder to use them. I’d also be worried about things like quickly seeing the lifetimes involved in complex error messages.

One further note: I can’t prove it, but I’m quite certain that outdated documentation will be an issue. My reasons: it was already an issue pre-1.0, and the Rust community is rich in blog articles, talk videos and slides that aren’t going to get updated.


It seems plausible that we can completely phase out ref and ref mut in patterns, though at the sacrifice of some precision (and it would require the move keyword to be usable on patterns).

In particular, I think this example has no direct equivalent:

struct Foo { x: Vec<i32>, y: Vec<i32> }

fn test(foo: &mut Foo) {
    match *foo {
        // `x` is bound as `&mut Vec<i32>`, `y` as `&Vec<i32>`.
        Foo { ref mut x, ref y } => { /* ... */ }
    }
}

If you do match foo { Foo { x, y } => ... }, you would get two mutable references. You can also do match &foo { Foo { x, y } => ... } to get two shared references. But I don't think you can get one mut and one shared.

I wasn't worried about this, because I figured we'd still have ref patterns for the most obscure cases, but if we had plans to remove it (and maybe repurpose it), might be something to keep in mind.
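Here is a sketch of the difference. Explicit ref mut / ref produce mixed binding modes in a single pattern. With match ergonomics alone, both bindings come out as `&mut`, and the closest workaround I know of is to downgrade one of them to a shared reference after the pattern, not inside it. The function names and bodies are hypothetical.

```rust
struct Foo {
    x: Vec<i32>,
    y: Vec<i32>,
}

// Explicit binding modes: one `&mut` and one `&` in a single pattern.
fn append_sum_explicit(foo: &mut Foo) {
    match *foo {
        Foo { ref mut x, ref y } => x.push(y.iter().sum()),
    }
}

// Match ergonomics only: both bindings are `&mut Vec<i32>`;
// `y` is then coerced to a shared reference outside the pattern.
fn append_sum_ergonomic(foo: &mut Foo) {
    let Foo { x, y } = foo;
    let y: &Vec<i32> = y;
    x.push(y.iter().sum());
}
```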


Yes, certainly, the further we stray from existing syntax, the bigger this danger is.

I do have another selfish reason to dislike ref: repurposing it like this would probably mean match won’t keep a way of explicitly specifying binding modes.

Yes. I've struggled with this. Certainly if the notation is ref r, people will call it a "reference", which seems at least potentially actively confusing. I guess you could imagine it being short for "reference lifetime".

Speaking more generally, I'm not sure how best to replace the word "lifetime". I always imagined it as the lifetime of the reference itself, but of course many people confuse it with the lifetime of the referent (the resource being referenced), and this seems obvious in retrospect. Still, I've kind of given up for now on finding a better alternative, and when presenting or teaching I just make a point to try and define it clearly as a "span in the code", usually with some highlighting and examples, and sometimes explicitly contrasting with the lifetime of the underlying resource.

Some other words I've considered, just for fun:

  • region, as in "region of the code". This is sort of the default in academic literature and -- ironically -- was what I actively discouraged early on, since I felt it was too easily confused with "region of memory".
  • scope -- people often object to this, since it is quite overloaded.
  • span -- it's a compiler internal term with little inherent meaning, but I think talking about a "span of the code" is .. maybe clear-ish? Unsure. At least it doesn't mean a lot to people as far as I know. Could also be "span of time", which is sort of nice.
    • this would really make the rustc terminology confusing :), since we use span as "portion of the code to use for reporting errors", but it's actually a reasonable match in its own way
  • duration -- I like time-related words, and "duration of the loan" seems obvious, but the word somehow lacks pizazz.
  • interval -- also time-related. (full disclosure: I used this for a similar concept in my PhD research)

My assumption was that it would be gone from patterns entirely. The "reference inversion" feature discussed in the match ergonomics thread would apply to all patterns I believe.

I understand where you're coming from, but also consider the possibility that the strange and unfamiliar syntax might make it look scarier than it has to be.

(Also subjectively yes, the ticks are a huge eyesore to me. Almost unbearably so when they aren't followed by an alphanumeric identifier, like in the case of <'>, or the <'_> syntax that's been proposed for introducing an anonymous lifetime.)

Yes, this would definitely be a significant issue with any kind of syntactic changes we make. Not quite as big as pre- vs. post-1.0, though: the old syntax would presumably still work, perhaps with a deprecation warning, and even if we end up introducing epochs and it's an actual error, there would be a clear and comprehensible error message.