[lang-team-minutes] Elision 2.0

nikomatsakis · May 5, 2017, 3:30pm

Yesterday we had our “pro-active” lang-team meeting. The topic of discussion was “Elision 2.0”, which is our “code name” for a set of changes with two overarching goals:

- make the “easy things easier” in the lifetime system
  - allow elision in more places, such as type declarations
  - a particular goal is making it easier to work with structs with a single lifetime parameter
- help develop stronger intuitions and visual signals about when borrowing is happening
  - as part of this, correct for some surprising or overzealous cases in elision

There is a draft RFC (more of an outline, at this point) that describes some early ideas in this direction. As part of this discussion, I’m very interested in (a) tweaking some of the details and (b) finding out if there are people who’d like to help work on the RFC! I’ve intentionally made it into a separate repo so that we can have multiple authors.

With that in mind, let me review the various pieces of the plan. I think that the consensus from the meeting was that we were all fairly comfortable with the “major ideas” I’m about to describe, but we are still not sure about some of the specific syntax to use (bikeshed time!). With that caveat aside, let’s dig into some of those ideas:

Allow lifetimes to be elided in the body of a type with just one lifetime parameter.

This is fairly straightforward. If you have a struct with just one lifetime parameter, elided lifetimes would be allowed just as in fn arguments, and we will supply that single lifetime parameter:

struct Foo<'a> {
    t: &i32 // assumed to be `&'a i32`
}
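
For comparison, here is a minimal sketch of the explicit spelling the elided field would stand for, written in syntax the compiler accepts today. The `read` and `demo` helpers are hypothetical and exist only to exercise the struct.

```rust
// Today's explicit spelling: every reference in the body names `'a`.
// Under the proposal, the field could be written as plain `&i32`.
struct Foo<'a> {
    t: &'a i32,
}

// Hypothetical helper, only here to exercise the struct.
fn read<'a>(foo: &Foo<'a>) -> i32 {
    *foo.t
}

// Borrows a local into a `Foo` and reads it back.
fn demo() -> i32 {
    let n = 7;
    let foo = Foo { t: &n };
    read(&foo)
}
```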

Allow structs with a single lifetime parameter to use an “anonymous” syntax.

There is some debate about what this syntax should be. I’ll give the two contenders here, and discuss the pros/cons below.

// Contender 1: The single tick.
struct Foo<'> {
    t: &i32
}

// Contender 2: Trailing ampersand.
struct Foo& {
    t: &i32
}

Prefer for this same “anonymous” syntax to be used in references to a type.

Right now, for a struct type with lifetime parameters like Foo, we have no way to “signal” that Foo contains a lifetime parameter without using an explicit name. This can lead to a lot of confusion. For example, in a signature like this, there is no obvious way to know that the receiver will remain borrowed as long as the return value is in use. The only way to tell is to consult the definition of Foo.

impl Bar {
    fn foo(&self) -> Foo { ... }
}

Similarly, in this case, it is not obvious that Foo has references in it which might prevent us from (e.g.) sending Foo to another thread. Unless we know the type definition of Foo, it appears visually that Foo is “fully owned data”, like a Vec<i32> would be:

impl Bar {
    fn use_foo(&self, f: Foo) { ... }
}

In contrast, in the explicit forms, the presence of borrows is visually obvious, but the syntax is wordy and clunky:

impl Bar {
    fn foo<'a>(&'a self) -> Foo<'a> { ... }
    fn use_foo<'a, 'b>(&'a self, f: Foo<'b>) { ... }
}
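
A compilable sketch of the explicit forms, to make the borrowing relationships concrete. The fields of `Foo` and `Bar`, the method bodies, and `demo` are assumptions added for illustration; only the signatures follow the snippet above.

```rust
struct Foo<'a> {
    t: &'a i32,
}

struct Bar {
    value: i32,
}

impl Bar {
    // `'a` ties the result to the receiver: `self` stays borrowed
    // for as long as the returned `Foo<'a>` is in use.
    fn foo<'a>(&'a self) -> Foo<'a> {
        Foo { t: &self.value }
    }

    // `'b` is independent of the receiver's borrow `'a`.
    fn use_foo<'a, 'b>(&'a self, f: Foo<'b>) -> i32 {
        self.value + *f.t
    }
}

fn demo() -> (i32, i32) {
    let bar = Bar { value: 20 };
    let other = 22;
    let from_bar = bar.foo(); // shared borrow of `bar` starts here
    let sum = bar.use_foo(Foo { t: &other }); // a second shared borrow is fine
    (*from_bar.t, sum) // `bar` remains borrowed until `from_bar` is last used
}
```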

The proposed solution is to allow that same anonymous syntax that we use to declare Foo to also reference Foo without naming the lifetime; this would be filled in with the same value that would be used for an elided lifetime. We would then deprecate the current elision rules when being applied to a struct with lifetime parameters unless the new syntax is used. (When referencing a type, the syntax can in fact be used to elide any number of lifetimes; so if you have struct Foo2<'a, 'b>, you can still write Foo2& or Foo2<'>, in which case it is eliding both lifetime parameters.)

Here then are those same two examples, using the two contender syntaxes:

impl Bar {
    fn foo(&self) -> Foo<'> { ... }
    fn use_foo(&self, f: Foo<'>) { ... }

    fn foo(&self) -> Foo& { ... }
    fn use_foo(&self, f: Foo&) { ... }
}

There are some other cases of elision that I would like to deprecate as well. Many of these exist simply because of how the current implementation works; I don’t believe they were intended by the original RFC, necessarily. This is my current list (I am not sure of how much the lang team agrees to each individual item, and I may have forgotten some):

impl Bar {
    // Elided lifetimes that expand to a named lifetime.
    fn foo<'a>(&'a self) -> Foo& { }    // currently accepted
    fn foo<'a>(&'a self) -> Foo<'a> { } // preferred
}

struct Foo2<'a, 'b> { }

impl Bar {
    // Mixed elided and not elided.
    fn foo<'a>(&'a self) -> Foo2<'a> { } // currently accepted
    fn foo<'a>(&'a self) -> Foo2<'a, 'a> { } // preferred option 1
    fn foo(&self) -> Foo2& { } // preferred option 2
}

Permit referencing the name of a parameter instead of declaring a lifetime.

In all of the cases so far, we’ve been able to elide the lifetime name completely. However, there are cases where you want to use an explicit name – for example, if you don’t wish to use the default. In those cases, Rust currently requires that you start giving names to lifetime parameters. However, this has some downsides:

it is often easier and more intuitive to think of which parameter the reference is borrowed from; the named lifetimes in these cases are just used to “link” the parameter and the return value.
it’s just ergonomically annoying to have to go back and add the <'a> to the function signature. Often, you only realize the need for it when writing the return type, in which case you have to stop and go backwards. This corresponds directly to what @aturon described as “friction” in accomplishing your task.

Therefore, we would like to introduce the ability to use the name of a parameter without declaring a named lifetime at all. This would be permitted so long as the type of that parameter has exactly one lifetime that appears in it; anything else is ambiguous, and would require the more explicit syntax.

An example should explain. Consider this snippet:

impl Bar {
    // Here, the result references the argument `data`, so we tag them both with `'a`.
    fn foo<'a>(&self, data: &'a [i32]) -> Foo<'a> { ... }

    // Here is an alternative, using the new feature:
    fn foo(&self, data: &[i32]) -> Foo<'data> { ... }
}
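
As a sketch of what the `'data` shorthand would mean, here is the explicit form in compilable shape. The `items` field, the method body, and `demo` are assumptions for illustration. The point is that the result borrows only from `data`, not from `self`.

```rust
struct Foo<'a> {
    items: &'a [i32],
}

struct Bar;

impl Bar {
    // Explicit form: `'a` links `data` to the result, while `&self`
    // gets its own, unrelated elided lifetime.
    fn foo<'a>(&self, data: &'a [i32]) -> Foo<'a> {
        Foo { items: data }
    }
}

fn demo() -> i32 {
    let data = vec![1, 2, 3];
    let bar = Bar;
    let foo = bar.foo(&data);
    // Fine: `foo` borrows only from `data`, so `bar` can go away first.
    drop(bar);
    foo.items.iter().sum()
}
```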

Naturally, there are some backwards compatibility concerns to address. For example, what happens if there is already a named lifetime with the same name as a parameter? The named lifetime should take precedence. However, to avoid confusion, I would propose that we issue a deprecation in cases where the named lifetime does not appear in the type of the parameter with the same name:

// OK; you could just remove the `<'data>` though.
fn foo<'data>(&self, data: &'data [i32]) -> Foo<'data> { ... }

// Also OK; again you could remove the `<'data>` without changing the meaning.
fn foo<'data>(&'data self, data: &'data [i32]) -> Foo<'data> { ... }

// OK; in this case, you could not remove the explicit names,
// because `'data` would be ambiguous since the type of `data` has
// two lifetimes in it, but it's still allowed since `data` refers
// to `'data`.
fn foo<'a, 'data>(&self, data: Foo2<'a, 'data>) -> Foo<'data> { ... }

// Deprecated, because `'data` does not appear in the type of `data`.
fn foo<'data>(&'data self, data: &[i32]) -> Foo<'data> { ... }

impl<'data> Foo<'data> {
    // Deprecated: `'data` shadows name of a parameter but does not
    // appear in its type.
    fn get(&self, data: &[i32]) -> &'data [i32] { }
}

One thing I do not know is whether we should allow explicit names and parameter names to intermix on a single fn. I suspect not, for clarity’s sake:

// Error: can't use `'data` shorthand on this fn,
// because it declares a named lifetime
// parameter `'a`.
fn foo<'a>(&'a self, data: &[i32], data2: &'a [i32]) -> Foo<'data> { ... }

impl<'a> Foo<'a> {
    // OK: But I would allow it here, even though there is
    // a named lifetime parameter in scope, because it is not
    // declared **on this item**. Note that it'd be a deprecation
    // warning if `'data` were declared on the impl.
    fn get(&self, data: &[i32]) -> &'data [i32] { }
}

An interaction: elision in impl Trait

Under the current RFC to “expand and stabilize impl Trait”, we proposed that lifetime bounds would not be “captured by default” in impl trait. This means that if you plan to have an impl Trait that will (e.g.) use data from your &self, that needs to be declared using the + syntax. At present, this requires a named lifetime parameter:

impl Bar {
    fn iter<'self>(&'self self) -> impl (Iterator<Item=u32> + 'self) {
        self.data.iter().cloned()
    }
}
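
A compilable variant of this snippet, as a sketch. It uses `'a` instead of `'self`, on the assumption that the compiler rejects lifetimes named after keywords, and it drops the parentheses around the bound. The `data` field type and `demo` are also assumptions.

```rust
struct Bar {
    data: Vec<u32>,
}

impl Bar {
    // The `+ 'a` declares that the hidden iterator type may hold on
    // to the borrow of `self`.
    fn iter<'a>(&'a self) -> impl Iterator<Item = u32> + 'a {
        self.data.iter().cloned()
    }
}

fn demo() -> Vec<u32> {
    let bar = Bar { data: vec![1, 2, 3] };
    bar.iter().map(|x| x * 2).collect()
}
```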

Clearly, this case could be made more concise with the ability to elide a lifetime name if it is the same as a parameter. That might be sufficient; it’d also be nice if we could use the “anonymous” syntax to cover this case, but it’s not obvious that either of the two candidates is a good fit. More on that below.

Infer the `T: 'a` annotations on type definitions.

Finally, last but not least, for all of this to work (in particular, for the anonymous struct decls to work), we need to be able to infer the “outlives requirements” that we currently require in a struct declaration. These requirements effectively “signal” what generic types are borrowed in the body of the type, and for how long. So, for example:

// `T` not borrowed, no `T: 'a` annotation
struct Foo<'a, T> {
    x: &'a i32,
    y: T
}

// `T` borrowed, hence `T: 'a` required
struct Bar<'a, T: 'a> {
    x: &'a T,
}

I think it’s safe to say that these annotations are annoying and not widely understood. They also add little value, since they can effectively be “derived” from the types of the fields (unlike, say, a K: Hash + Eq constraint). We already do not require these annotations on functions or in impl bodies, for the most part, because we allow fns and impls to assume that the lifetime requirements declared on their types hold.
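
To illustrate the last point, a small sketch of a function relying on the requirement declared on the type. The `get` and `demo` helpers are hypothetical.

```rust
// `T: 'a` is written out here because `T` is borrowed for `'a`.
struct Bar<'a, T: 'a> {
    x: &'a T,
}

// No `T: 'a` on this fn: it is assumed to hold because the argument
// type `Bar<'a, T>` already requires it.
fn get<'a, T>(bar: &Bar<'a, T>) -> &'a T {
    bar.x
}

fn demo() -> i32 {
    let n = 5;
    let bar = Bar { x: &n };
    *get(&bar)
}
```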

Unfortunately, we can’t use that same approach on types, because it relies on the fact that the types already have annotations; we have to do something a bit more sophisticated. Basically the idea is to use a global inference step (analogous to variance inference). This will be a fixed-point iteration: for those structs that directly contain references, we infer that the T: 'a annotation is needed, then we propagate to other types that contain that struct. There may be some complications but it should basically work.
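
The fixed-point idea can be sketched on a toy model. Everything here is an assumption made for illustration and not how the compiler represents types: every struct has exactly one lifetime `'a` and one type parameter `T`, and a nested struct is always instantiated with that same `'a` and `T`. The names `Field`, `infer_outlives`, and `demo` are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Toy model of a field in `struct S<'a, T>`.
enum Field {
    Owned,                // a field of type `T`
    Ref,                  // a field of type `&'a T`
    Nested(&'static str), // a field of type `Other<'a, T>`
}

// Returns the names of the structs that need `T: 'a`.
fn infer_outlives(defs: &BTreeMap<&'static str, Vec<Field>>) -> BTreeSet<&'static str> {
    let mut needs: BTreeSet<&'static str> = BTreeSet::new();
    loop {
        let mut changed = false;
        for (name, fields) in defs {
            if needs.contains(name) {
                continue;
            }
            // A struct needs `T: 'a` if it borrows `T` directly, or if
            // it contains a struct already known to need it.
            let required = fields.iter().any(|f| match f {
                Field::Owned => false,
                Field::Ref => true,
                Field::Nested(inner) => needs.contains(inner),
            });
            if required {
                needs.insert(*name);
                changed = true;
            }
        }
        // Fixed point reached: a full pass added nothing new.
        if !changed {
            return needs;
        }
    }
}

fn demo() -> Vec<&'static str> {
    let mut defs = BTreeMap::new();
    defs.insert("Inner", vec![Field::Ref]);
    defs.insert("Holder", vec![Field::Nested("Inner")]);
    defs.insert("Top", vec![Field::Owned, Field::Nested("Holder")]);
    defs.insert("Plain", vec![Field::Owned]);
    infer_outlives(&defs).into_iter().collect()
}
```

In this model `Holder` is visited before `Inner`, so it is only picked up on the second pass, which is why a single pass over the definitions is not enough.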

Bikeshed / ASCII Golf

So, I promised a good bikeshed, and I plan to deliver! As you’ve already seen, there are two candidate syntaxes. We spent some time discussing their pros and cons. Here are some notes. Maybe you can think of a third alternative.

The single tick

The first contender was Foo<'>. First off, here are some examples of it in practice:

Foo<'>
Foo<', T>
&Foo<'>
&Foo<', T>
&mut Foo<'>
&mut Foo<', T>

// Combined with `impl Iterator`:
impl Iterator<T> + ' // rather odd since there is nothing "to the right"

// NOT legal:
Foo<', 'a, T> // <-- can only use `'` if you elide *all* parameters

The pros of this approach:

Very close to the existing 'a

The cons:

Doubles down on “the tick”, which many users report as feeling strangely unbalanced
When combined with generic parameters, requires a comma
Seems strange in the “impl Iterator” context, though I guess it technically works ok

The storyline:

It’s useful to think about the story of someone learning about lifetimes in Rust. If the question is “ok, you’ve used some basic references, so what if you want to put one in a struct?”, the answer will be that you write struct Foo<'>, where the ' is a “visual signal” that there are references within the struct (important so the compiler can keep them from escaping the enclosing stack frame). You may then get into explaining named lifetimes already, or at least hinting that they are to come.

Variations

Instead of “the single tick” ', there were some other variations that I personally did not like as much, simply because of aesthetics:

'_ – kind of looks like inference, but it’s not inference; doesn’t represent multiple lifetimes
'.. – represents multiple lifetimes

The trailing ampersand

The next contender was Foo&. First off, here are some basic examples of it in practice:

Foo&
Foo&<T>

// When the `Foo` appears behind a reference, do not
// require the trailing `&`:
&Foo
&Foo<T>
&mut Foo
&mut Foo<T>

It is interesting to consider what to do in the case of a shorthand for &Foo<'a, T>. I chose to modify the elision rules to say that you only need to use the "trailing &" to signal a lifetime if the struct is not already borrowed (this would be very similar to the trait object lifetime default rules, basically). This slightly weakens the “visual signal of borrowing”. There is still an &, but you don’t have a visual indication that there are also references in the struct itself. This doesn’t seem that important to me; unless Foo is Copy, you wouldn’t be able to “escape” the referent of the reference anyway. I don’t see it causing confusion in the same way.

If you did want to write the trailing ampersand explicitly for some reason, it would look like:

&Foo&
&Foo&<T>
&mut Foo&
&mut Foo&<T>

You can combine this with impl Iterator by writing impl& Iterator<Item=u32>. This does however require the impl keyword, and would not work if we changed the meaning of a “bare trait” like Iterator<Item=u32>, as has been discussed (we’d only do that in a “new epoch”, of course).

I think that, like the single tick, you cannot combine anonymous and named lifetime parameters with this syntax. So Foo&<'a, T> would not be allowed. (It doesn’t have to be this way, though, conceivably we could allow you to supply a “trailing suffix” of the named lifetimes; I would actually like that in the compiler, but it seems confusing.)

The pros of this approach:

For simple cases, & is the consistent “borrowing symbol”
Many people report confusion about how named lifetimes are in the generic parameter list
- From a type theory perspective, it makes perfect sense…
Works reasonably well with impl& Iterator

The cons:

When you do need named lifetimes, they are more foreign
When combined with generic parameters, Foo&<T> is “heavy” (but no comma!)
Potential for confusion between &Foo (reference to a Foo) and Foo& (struct with references)

The storyline:

It’s useful to think about the story of someone learning about lifetimes in Rust. If the question is “ok, you’ve used some basic references, so what if you want to put one in a struct?”, the answer will be that you “write the & after the struct name to show it has references in it”, e.g. struct Foo&.

A question:

Should we also permit Foo&'a, which would be consistent?

Conclusion

That’s it! Thoughts?

ubsan · May 5, 2017, 3:38pm

Foo& is a really ugly syntax, and makes me think that Foo is a reference, rather than owning references; and &Foo& is absolutely awful, imo. +1 for <'>.

Edit: this wasn’t worded well. I don’t personally like Foo& because it reminds me of references in C++ (and I know it’d be a point of weirdness for C++ programmers). It also doesn’t feel “right” to me; it’s not introducing a reference, it’s introducing implicit lifetime parameters. &Foo& looks like an attempt to make & a delimiter, but I have found out that it’s not necessary. I also don’t like that a Foo& would be allowed to be one (or multiple) “mutable lifetimes”. I personally really like the <'> syntax.

leonardo · May 5, 2017, 3:39pm

You probably meant to write: The next contender was Foo&.

phaylon · May 5, 2017, 3:43pm

I just wanted to say that I’d vastly prefer the <'> variant to postfix &. It’s a cleaner transition to explicit <'a, 'b> syntax and back, and it feels a lot less noisy.

Would something like '? be possible? I do agree that a lone ' seems out of place. Something followed by a sigil seems like it would also make it easier to spot in macro invocations.

aturon · May 5, 2017, 3:56pm

I want to encourage everyone reading this thread to try to spend some time “sitting with” the various ideas and proposals, and trying to carefully take in the constraints. Syntax changes for something as core as lifetimes are going to feel weird to you if you know Rust well. So it’s important to give it some time, to imagine what code would actually look like (or even play with some real code).

I don’t think any of the proposals so far have fully “cracked the nut” here. We have a chance to push hard on the learning curve and productivity by improving the design, and we really need to take our time and think deeply. Please try to internalize the rationale for each of the existing proposals, and see if you can push the ideas further!

(This reminds me a lot of early design work around closure syntax before 1.0; we iterated through what was, in retrospect, some pretty depressing syntaxes for expressing captures/ownership. But it was necessary to spend the time in discomfort and exploration to land on the wonderfully slick way we determine ownership in closures today.)

KasMA1990 · May 5, 2017, 4:10pm

As a Rust noob, the idea of referencing parameter names as lifetimes feels like a fantastic improvement; it’s very intuitive!

As for the two syntaxes, I feel like using ampersand would make things more confusing above all. I can definitely see the downsides to the tick, but off the top of my head, I still think it makes for an overall good reading experience at least.

ubsan · May 5, 2017, 4:12pm

@aturon

so, I disagree with you on two points; one, I would really like (optional) explicit capturing for closures!

On to the other thing; I really, really like the <'> syntax; I actually do think it’s “cracked the nut”, at least for me. I don’t agree with the cons list, personally. I think the first is actually a pro; people will get used to “the tick” faster. The second is fine. And the third… I don’t really mind it? It might be my mental model of lifetime syntax, that, in 'a, ' is a thing that tells you that a lifetime is there, and a is the name of the lifetime itself.

kornel · May 5, 2017, 4:29pm

IMHO the need to have struct definitions with lifetimes explicit and obvious at the first glance exists only because struct definitions have to be consulted often, because function declarations using them are not self-explanatory.

Therefore, if function declarations are made self-explanatory, there won’t be such a strong need to check struct definitions, and the struct definitions can be made easier to write.

I’m in favour of encouraging function args and return types being self-explanatory (e.g. fn foo(&self) -> Foo<'>), which will allow simplifying struct definitions even further:

struct Foo {
    t: &i32
}

To me this is still explicit and clear, because there’s & in the body. If this type is used as Foo<'> elsewhere, then I won’t need to look at this definition to find the &.

There’s probably a concern “what about nested types with references?” — I think it’d be fine if the zero-tick syntax was allowed only in the simplest case of & being literally present in the definition. For nested lifetimes either the current syntax could remain required, or perhaps just the <'> in the type containing a reference:

struct Bar {
   f: Foo<'>,
}

so as long as the body of the definition is explicit (and there’s only one/unambiguous lifetime involved), the name shouldn’t need to repeat the same information.

kornel · May 5, 2017, 4:31pm

For the tick bikeshed, I’d like to propose Foo<&>

Foo<'a, 'b, T, U> ≈ Foo<&, T, U>

& is a reference. Vec<i32> is widely known as a type containing i32s somewhere, so Foo<&> could be read as a type containing references somewhere.

fn foo(&self, bar: Bar<&>) -> Baz<&> {…}

kornel · May 5, 2017, 4:39pm

For impl Type + 'a, how about using impl<'a> Type?

In regular impl {} blocks impl<'a> already means “this implementation is going to use these lifetimes”, so fn foo() -> impl<'a> Foo seems close enough to me.

aturon · May 5, 2017, 5:04pm

I'd love to hear more about this -- can you give an example or two where you had to work around the lack of capture clauses?

nikomatsakis · May 5, 2017, 5:07pm

Not sure what @ubsan has in mind, but I've noticed a problem with the "binary" nature of move in Rayon:

fn foo() {
    let x = vec![3];

    rayon::scope(|s| {
        let x = &x; // have to do this, because we want to borrow `x`

        for i in 0..10 {
            // here we really want `move` to apply to just `i`, not `x`
            s.spawn(move |_| use(x, i));
        }
    });
}
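
The same pattern can be sketched without an external crate by substituting the standard library's `std::thread::scope` for `rayon::scope`. That substitution, the closure body, and the `demo` name are assumptions for illustration. Without the `let x = &x;` line, `move` would move the whole `Vec` into the first spawned closure and the second loop iteration would not compile.

```rust
use std::thread;

fn demo() -> Vec<usize> {
    let x: Vec<usize> = vec![3];

    thread::scope(|s| {
        // Rebind `x` as a reference, so that `move` below copies the
        // reference instead of moving the `Vec` itself.
        let x = &x;

        let mut handles = Vec::new();
        for i in 0..4 {
            // `move` is needed so that `i` is owned by the closure.
            handles.push(s.spawn(move || x[0] + i));
        }

        // Join in spawn order so the results are deterministic.
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .collect::<Vec<usize>>()
    })
}
```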

nikomatsakis · May 5, 2017, 5:09pm

I had two thoughts on this.

The most obvious is move(i) || ..., though that might be annoying if there are many variables. (It’s basically the inverse of the let x = &x that I used to solve it.)

The next option is leveraging labeled blocks (which we don’t yet support, but I wish we did):

fn foo() {
    let x = vec![3];

    rayon::scope(|s| 'scope: { // <-- give name to this block
        for i in 0..10 {
            // declare that we want to move everything "inside" `'scope`
            s.spawn(move('scope) |_| use(x, i));
        }
    });
}

In this model, move || ... is short for move('static) || ....

nikomatsakis · May 5, 2017, 5:11pm

I think the shortcoming here is that the struct declaration no longer "mirrors" the use very well. For example, you could use this struct as any of the following: Foo<'>, Foo<'a>, but you could not use it as Foo (without a deprecation warning), even though that is the way it is declared.

I also would prefer not to have to scan the types of the fields to know if there are references. For example, in rustdoc, those types aren't even visible!

Ah, this was actually something I proposed way back when as well, though most people at the time seemed to prefer Foo<'>.

leodasvacas · May 5, 2017, 5:28pm

Yay a bikeshed. What if the single tick and the trailing ampersand had a baby, the trailing tick:

Foo'
Foo'<T>
&Foo'
&Foo'<T>
&mut Foo'
&mut Foo'<T>

aturon · May 5, 2017, 5:51pm

Interesting! For impl Trait, I suppose we'd get impl' SomeTrait. (And of course, impl SomeTrait' means that the trait has elided lifetime params, rather than the underlying concrete type).

By the way: one potential additional constraint is forward-compatibility with the world in which "bare trait" syntax is used for today's impl Trait, i.e. fn foo(self) -> Iterator<Item=u32>. The trouble is that, without the impl, there's not much "syntactic space" to put extra things like a ' or &.

Also, it's worth thinking through how all of this should play for trait objects. Box<Trait + '..>?

kornel · May 5, 2017, 6:23pm

I'd say that's purely a rustdoc deficiency. I don't think rustdoc has to literally copy the syntax as written in the source, and it could add explicit annotations where it is helpful.

For example if code is written as fn foo() -> Foo, I'd prefer rustdoc to document it as fn foo() -> Foo<'>. And similarly Foo { bar: &u8 } can be shown as Foo<'> { /* some fields omitted */ } in rustdoc.

jnicklas · May 5, 2017, 10:02pm

I’m guessing that it’s probably ambiguous in the grammar, so won’t work, but how about just using Foo' instead of Foo<'>?

struct Foo' {
    t: &i32
}

impl Bar {
    fn foo(&self) -> Foo' { ... }
    fn use_foo(&self, Foo') { ... }
}

withoutboats · May 5, 2017, 10:25pm

What I wanted to ask at the end of the meeting, but everyone had to go, was whether or not we should reconsider the original syntax we considered - the same as the ' syntax but using an & instead - Foo<&, T> and so on.

It seems like the choices presented so far are combinations of two independent choices:

Put the marker inside the param angle brackets - Foo<', T> and Foo<&, T> or put the marker at the end of the type name - Foo'<T> and Foo&<T>.
The marker should be a ' or an &.

I’m excited to hear about other possibilities, I also don’t think any of these are a slam dunk.

briansmith · May 5, 2017, 10:51pm

I think the current syntax is OK and it isn’t worth spending a significant effort trying to make it more convenient at this time. There are lifetime-related semantic changes that are a much higher priority, IMO. For example this:

struct A {}

impl A {
    fn f(&self) -> usize { 1 } 
    fn g(&mut self, _: usize) { }
}

fn main() {
    let mut a = A {};

    // Works fine, unsurprisingly.
    let x = a.f();
    a.g(x);

    // Fails to compile, surprisingly, but it should work.
    a.g(a.f()); 
}

Further, lots of this work seems to be optimizing for minimizing the effort of typing in code at the cost of readability of said code. IMO, it is better to leave the work of minimizing typing to editors and IDEs and optimize the language itself for readability.

More generally I’d rather see more effort spent on improving the borrow checker and type system, and I’d be very happy to trade syntax improvements (including macros) for them. (Although I shouldn’t need to say so, for clarity: I don’t mean to imply that this work and the macro work aren’t great.)
