Yesterday we had our “pro-active” lang-team meeting. The topic of discussion was “Elision 2.0”, which is our “code name” for a set of changes with two overarching goals:
- make the “easy things easier” in the lifetime system
  - allow elision in more places, such as type declarations
  - a particular goal is making it easier to work with structs with a single lifetime parameter
- help develop stronger intuitions and visual signals about when borrowing is happening
  - as part of this, correct for some surprising or overzealous cases in elision
There is a draft RFC (more of an outline, at this point) that describes some early ideas in this direction. As part of this discussion, I’m very interested in (a) tweaking some of the details and (b) finding out if there are people who’d like to help work on the RFC! I’ve intentionally made it into a separate repo so that we can have multiple authors.
With that in mind, let me review the various pieces of the plan. I think that the consensus from the meeting was that we were all fairly comfortable with the “major ideas” I’m about to describe, but we are still not sure about some of the specific syntax to use (bikeshed time!). With that caveat aside, let’s dig into some of those ideas:
Allow lifetimes to be elided in the body of a type with just one lifetime parameter.
This is fairly straightforward. If you have a struct with just one lifetime parameter, elided lifetimes would be allowed just as in fn arguments, and we will supply that single lifetime parameter:
struct Foo<'a> {
t: &i32 // assumed to be `&'a i32`
}
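For comparison, here is a minimal sketch of what the same struct has to look like in Rust as it stands, with the lifetime spelled out on the field. The `value` method is a hypothetical helper, added only so the borrow gets used.

```rust
/// As things stand, every reference in a struct body must name its lifetime.
pub struct Foo<'a> {
    pub t: &'a i32, // what the elided `&i32` would be assumed to mean
}

impl<'a> Foo<'a> {
    /// Hypothetical helper: reads through the borrowed field.
    pub fn value(&self) -> i32 {
        *self.t
    }
}
```

Under the proposal, only the field annotation would go away; the meaning of the struct would stay exactly as written here.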
Allow structs with a single lifetime parameter to use an “anonymous” syntax.
There is some debate about what this syntax should be. I’ll give the two contenders here, and discuss the pros/cons below.
// Contender 1: The single tick.
struct Foo<'> {
t: &i32
}
// Contender 2: Trailing ampersand.
struct Foo& {
t: &i32
}
Prefer for this same “anonymous” syntax to be used in references to a type.
Right now, for a struct type with lifetime parameters like `Foo`, we have no way to “signal” that `Foo` contains a lifetime parameter without using an explicit name. This can lead to a lot of confusion. For example, in a signature like this, there is no obvious way to know that the receiver will remain borrowed as long as the return value is in use. The only way to tell is to consult the definition of `Foo`.
impl Bar {
fn foo(&self) -> Foo { ... }
}
Similarly, in this case, it is not obvious that `Foo` has references in it which might prevent us from (e.g.) sending `Foo` to another thread. Unless we know the type definition of `Foo`, it appears visually that `Foo` is “fully owned data”, like a `Vec<i32>` would be:
impl Bar {
fn use_foo(&self, f: Foo) { ... }
}
In contrast, in the explicit forms, the presence of borrows is visually obvious, but the syntax is wordy and clunky:
impl Bar {
fn foo<'a>(&'a self) -> Foo<'a> { ... }
fn use_foo<'a, 'b>(&'a self, f: Foo<'b>) { ... }
}
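To make the explicit forms concrete, here is a small sketch that compiles as-is. The fields `items` and `data`, and the `usize` return value of `use_foo`, are assumptions made up for illustration; the post leaves the bodies elided.

```rust
/// Hypothetical definition: `Foo` holds a borrowed slice.
pub struct Foo<'a> {
    pub items: &'a [i32],
}

/// Hypothetical definition: `Bar` owns its data.
pub struct Bar {
    pub data: Vec<i32>,
}

impl Bar {
    /// The result borrows from `self`; the shared `'a` makes that visible.
    pub fn foo<'a>(&'a self) -> Foo<'a> {
        Foo { items: &self.data }
    }

    /// `f` carries its own, unrelated borrow `'b`.
    pub fn use_foo<'a, 'b>(&'a self, f: Foo<'b>) -> usize {
        self.data.len() + f.items.len()
    }
}
```

Reading the signatures alone is enough to see which values are borrowed and from where, which is exactly the information the elided forms hide.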
The proposed solution is to allow that same anonymous syntax that we use to declare `Foo` to also reference `Foo` without naming the lifetime; this would be filled in with the same value that would be used for an elided lifetime. We would then deprecate the current elision rules when they are applied to a struct with lifetime parameters, unless the new syntax is used.
(When referencing a type, the syntax can in fact be used to elide any number of lifetimes; so if you have `struct Foo2<'a, 'b>`, you can still write `Foo2&` or `Foo2<'>`, in which case it is eliding both lifetime parameters.)
Here then are those same two examples, using the two contender syntaxes:
impl Bar {
// Contender 1: the single tick.
fn foo(&self) -> Foo<'> { ... }
fn use_foo(&self, f: Foo<'>) { ... }
// Contender 2: trailing ampersand.
fn foo(&self) -> Foo& { ... }
fn use_foo(&self, f: Foo&) { ... }
}
There are some other cases of elision that I would like to deprecate as well. Many of these exist simply because of how the current implementation works; I don’t believe they were intended by the original RFC, necessarily. This is my current list (I am not sure of how much the lang team agrees to each individual item, and I may have forgotten some):
impl Bar {
// Elided lifetimes that expand to a named lifetime.
fn foo<'a>(&'a self) -> Foo& { } // currently accepted
fn foo<'a>(&'a self) -> Foo<'a> { } // preferred
}
struct Foo2<'a, 'b> { }
impl Bar {
// Mixed elided and not elided.
fn foo<'a>(&'a self) -> Foo2<'a> { } // currently accepted
fn foo<'a>(&'a self) -> Foo2<'a, 'a> { } // preferred option 1
fn foo(&self) -> Foo2& { } // preferred option 2
}
Permit referencing the name of a parameter instead of declaring a lifetime.
In all of the cases so far, we’ve been able to elide the lifetime name completely. However, there are cases where you want to use an explicit name – for example, if you don’t wish to use the default. In those cases, Rust currently requires that you start giving names to lifetime parameters. However, this has some downsides:
- it is often easier and more intuitive to think of which parameter the reference is borrowed from; the named lifetimes in these cases are just used to “link” the parameter and the return value.
- it’s just ergonomically annoying to have to go back and add the `<'a>` to the function signature. Often, you only realize the need for it when writing the return type, in which case you have to stop and go backwards. This corresponds directly to what @aturon described as “friction” in accomplishing your task.
Therefore, we would like to introduce the ability to use the name of a parameter without declaring a named lifetime at all. This would be permitted so long as the type of that parameter has exactly one lifetime that appears in it; anything else is ambiguous, and would require the more explicit syntax.
An example should explain. Consider this snippet:
impl Bar {
// Here, the result references the argument `data`, so we tag them both with `'a`.
fn foo<'a>(&self, data: &'a [i32]) -> Foo<'a> { ... }
// Here is an alternative, using the new feature:
fn foo(&self, data: &[i32]) -> Foo<'data> { ... }
}
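The closest you can get today is to declare the lifetime and deliberately give it the parameter's name. The sketch below compiles as-is; the `offset` and `items` fields are hypothetical, chosen only to give the method something to do.

```rust
/// Hypothetical definition: `Foo` holds a borrowed slice.
pub struct Foo<'a> {
    pub items: &'a [i32],
}

/// Hypothetical definition: `Bar` owns a starting offset.
pub struct Bar {
    pub offset: usize,
}

impl Bar {
    /// The result borrows from `data`, not from `self`; naming the lifetime
    /// after the parameter makes that link readable in the signature.
    pub fn foo<'data>(&self, data: &'data [i32]) -> Foo<'data> {
        Foo { items: &data[self.offset..] }
    }
}
```

Because the result is tied only to `data`, the `Bar` can go away while the `Foo` is still in use. The proposed shorthand would keep this meaning and just drop the `<'data>` declaration.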
Naturally, there are some backwards compatibility concerns to address. For example, what happens if there is already a named lifetime whose name shadows an existing parameter? The named lifetime should take precedence. However, to avoid confusion, I would propose that we issue a deprecation warning in cases where the named lifetime does not appear in the type of the parameter with the same name:
// OK; you could just remove the `<'data>` though.
fn foo<'data>(&self, data: &'data [i32]) -> Foo<'data> { ... }
// Also OK; again you could remove the `<'data>` without changing the meaning.
fn foo<'data>(&'data self, data: &'data [i32]) -> Foo<'data> { ... }
// OK; in this case, you could not remove the explicit names,
// because `'data` would be ambiguous since the type of `data` has
// two lifetimes in it, but it's still allowed since `data` refers
// to `'data`.
fn foo<'a, 'data>(&self, data: Foo2<'a, 'data>) -> Foo<'data> { ... }
// Deprecated, because `'data` does not appear in the type of `data`.
fn foo<'data>(&'data self, data: &[i32]) -> Foo<'data> { ... }
impl<'data> Foo<'data> {
// Deprecated: `'data` shadows name of a parameter but does not
// appear in its type.
fn get(&self, data: &[i32]) -> &'data [i32] { }
}
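To see why that last case deserves a warning, here is a sketch of it that compiles today. The `items` field and the body of `get` are assumptions for illustration; the point is only that the returned slice comes from the struct, even though the lifetime's name suggests the parameter.

```rust
/// Hypothetical definition: the struct's lifetime happens to be called `'data`.
pub struct Foo<'data> {
    pub items: &'data [i32],
}

impl<'data> Foo<'data> {
    /// Despite its name, `'data` is unrelated to the parameter `data`:
    /// the result is borrowed from the struct's own slice.
    pub fn get(&self, data: &[i32]) -> &'data [i32] {
        let items: &'data [i32] = self.items;
        let n = data.len().min(items.len());
        &items[..n]
    }
}
```

A caller can let the argument go out of scope and keep using the result, which is the opposite of what a reader of `&'data [i32]` would guess under the new shorthand.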
One thing I do not know is whether we should allow explicit names and parameter names to intermix on a single fn. I suspect not, for clarity’s sake:
// Error: can't use `'data` shorthand on this fn,
// because it declares a named lifetime
// parameter `'a`.
fn foo<'a>(&'a self, data: &[i32], data2: &'a [i32]) -> Foo<'data> { ... }
impl<'a> Foo<'a> {
// OK: But I would allow it here, even though there is
// a named lifetime parameter in scope, because it is not
// declared **on this item**. Note that it'd be a deprecation
// warning if `'data` were declared on the impl.
fn get(&self, data: &[i32]) -> &'data [i32] { }
}
An interaction: elision in impl Trait
Under the current RFC to “expand and stabilize impl Trait”, we proposed that lifetime bounds would not be “captured by default” in `impl Trait`. This means that if you plan to have an `impl Trait` that will (e.g.) use data from your `&self`, that needs to be declared using the `+` syntax. At present, this requires a named lifetime parameter:
impl Bar {
fn iter<'self>(&'self self) -> impl (Iterator<Item=u32> + 'self) {
self.data.iter().cloned()
}
}
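If you want to try this on a current compiler, the following sketch should build. It uses `'s` rather than `'self`, since `self` is a keyword and rustc rejects it as a lifetime name, and it drops the parentheses around the bound. The `data: Vec<u32>` field is an assumption.

```rust
/// Hypothetical definition: `Bar` owns a vector of numbers.
pub struct Bar {
    pub data: Vec<u32>,
}

impl Bar {
    /// The `+ 's` bound states that the returned iterator borrows from `self`.
    pub fn iter<'s>(&'s self) -> impl Iterator<Item = u32> + 's {
        self.data.iter().cloned()
    }
}
```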
Clearly, this case could be made more concise with the ability to elide a lifetime name if it is the same as a parameter. That might be sufficient; it’d also be nice if we could use the “anonymous” syntax to cover this case, but it’s not obvious that either of the two candidates is a good fit. More on that below.
Infer the `T: 'a` annotations on type definitions.
Finally, last but not least, for all of this to work (in particular, for the anonymous struct decls to work), we need to be able to infer the “outlives requirements” that we currently require in a struct declaration. These requirements effectively “signal” what generic types are borrowed in the body of the type, and for how long. So, for example:
// `T` not borrowed, no `T: 'a` annotation
struct Foo<'a, T> {
x: &'a i32,
y: T
}
// `T` borrowed, hence `T: 'a` required
struct Bar<'a, T: 'a> {
x: &'a T,
}
I think it’s safe to say that these annotations are annoying and not widely understood. They also add little value, since they can effectively be “derived” from the types of the fields (unlike, say, a `K: Hash + Eq` constraint). We already do not require these annotations on functions or in impl bodies, for the most part, because we allow fns and impls to assume that the lifetime requirements declared on their types hold.
Unfortunately, we can’t use that same approach on types, because it relies on the fact that the types already have annotations; we have to do something a bit more sophisticated. Basically the idea is to use a global inference step (analogous to variance inference). This will be a fixed-point iteration: for those structs that directly contain references, we infer that the `T: 'a` annotation is needed, then we propagate to other types that contain that struct. There may be some complications but it should basically work.
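To give a feel for the fixed-point iteration, here is a toy sketch. It is not how the compiler represents types: `Ty`, `StructDef`, `Outlives` and `infer_outlives` are all names invented for this example, and the model ignores everything except type parameters, references and uses of other structs.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Toy model of a field type; just enough structure for the sketch.
pub enum Ty {
    Scalar,                                         // e.g. `i32`
    Param(&'static str),                            // e.g. `T`
    Ref(&'static str, Box<Ty>),                     // e.g. `&'a T`
    Adt(&'static str, Vec<&'static str>, Vec<Ty>),  // e.g. `Bar<'b, U>`
}

pub struct StructDef {
    pub name: &'static str,
    pub lifetimes: Vec<&'static str>,
    pub params: Vec<&'static str>,
    pub fields: Vec<Ty>,
}

/// A pair `(T, 'a)` stands for the requirement `T: 'a`.
pub type Outlives = BTreeSet<(&'static str, &'static str)>;

/// Collects every type parameter mentioned anywhere inside `ty`.
fn params_in(ty: &Ty, out: &mut Vec<&'static str>) {
    match ty {
        Ty::Scalar => {}
        Ty::Param(p) => out.push(*p),
        Ty::Ref(_, inner) => params_in(inner, out),
        Ty::Adt(_, _, args) => args.iter().for_each(|a| params_in(a, out)),
    }
}

/// Records that every type parameter in `ty` must outlive `lt`.
fn require(ty: &Ty, lt: &'static str, reqs: &mut Outlives) {
    let mut found = Vec::new();
    params_in(ty, &mut found);
    for p in found {
        reqs.insert((p, lt));
    }
}

fn walk(ty: &Ty, defs: &[StructDef], known: &BTreeMap<&'static str, Outlives>, reqs: &mut Outlives) {
    match ty {
        Ty::Scalar | Ty::Param(_) => {}
        // Direct case: a reference forces its referent to outlive it.
        Ty::Ref(lt, inner) => {
            require(inner, *lt, reqs);
            walk(inner, defs, known, reqs);
        }
        // Propagation: translate the used struct's known requirements
        // into requirements on the arguments supplied here.
        Ty::Adt(name, lts, args) => {
            if let Some(def) = defs.iter().find(|d| d.name == *name) {
                for (p, l) in known.get(name).into_iter().flatten() {
                    let pi = def.params.iter().position(|x| x == p);
                    let li = def.lifetimes.iter().position(|x| x == l);
                    if let (Some(pi), Some(li)) = (pi, li) {
                        if let (Some(arg), Some(lt)) = (args.get(pi), lts.get(li)) {
                            require(arg, *lt, reqs);
                        }
                    }
                }
            }
            for a in args {
                walk(a, defs, known, reqs);
            }
        }
    }
}

/// Re-scans every struct until no requirement set grows any further.
pub fn infer_outlives(defs: &[StructDef]) -> BTreeMap<&'static str, Outlives> {
    let mut known: BTreeMap<&'static str, Outlives> =
        defs.iter().map(|d| (d.name, Outlives::new())).collect();
    loop {
        let mut changed = false;
        for def in defs {
            let mut reqs = known[def.name].clone();
            for field in &def.fields {
                walk(field, defs, &known, &mut reqs);
            }
            if reqs != known[def.name] {
                known.insert(def.name, reqs);
                changed = true;
            }
        }
        if !changed {
            return known;
        }
    }
}
```

Given a `Bar<'a, T>` with a field `&'a T` and a `Wrapper<'b, U>` with a field `Bar<'b, U>`, the first pass finds `T: 'a` on `Bar`, and a later pass carries it over to `Wrapper` as `U: 'b`. The loop terminates because the sets only ever grow and there are finitely many parameter/lifetime pairs.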
Bikeshed / ASCII Golf
So, I promised a good bikeshed, and I plan to deliver! As you’ve already seen, there are two candidate syntaxes. We spent some time discussing their pros and cons. Here are some notes. Maybe you can think of a third alternative.
The single tick
The first contender was `Foo<'>`. First off, here are some examples of it in practice:
Foo<'>
Foo<', T>
&Foo<'>
&Foo<', T>
&mut Foo<'>
&mut Foo<', T>
// Combined with `impl Iterator`:
impl Iterator<T> + ' // rather odd since there is nothing "to the right"
// NOT legal:
Foo<', 'a, T> // <-- can only use `'` if you elide *all* parameters
The pros of this approach:
- Very close to the existing `'a`
The cons:
- Doubles down on “the tick”, which many users report as feeling strangely unbalanced
- When combined with generic parameters, requires a comma
- Seems strange in the “impl Iterator” context, though I guess it technically works ok
The storyline:
It’s useful to think about the story of someone learning about lifetimes in Rust. If the question is “ok, you’ve used some basic references, so what if you want to put one in a struct?”, the answer will be that you write `struct Foo<'>`, where the `'` is a “visual signal” that there are references within the struct (important so the compiler can keep them from escaping the enclosing stack frame). You may then get into explaining named lifetimes already, or at least hinting that they are to come.
Variations
Instead of “the single tick” `'`, there were some other variations that I personally did not like as much, simply because of aesthetics:
- `'_` – kind of looks like inference, but it’s not inference; doesn’t represent multiple lifetimes
- `'..` – represents multiple lifetimes
The trailing ampersand
The next contender was `Foo&`. First off, here are some basic examples of it in practice:
Foo&
Foo&<T>
// When the `Foo` appears behind a reference, do not
// require the trailing `&`:
&Foo
&Foo<T>
&mut Foo
&mut Foo<T>
It is interesting to consider what to do in the case of a shorthand for `&Foo<'a, T>`. I chose to modify the elision rules to say that you only need to use the “trailing `&`” to signal a lifetime if the struct is not already borrowed (this would be very similar to the trait object lifetime default rules, basically). This slightly weakens the “visual signal of borrowing”. There is still an `&`, but you don’t have a visual indication that there are also references in the struct itself. This doesn’t seem that important to me; unless `Foo` is `Copy`, you wouldn’t be able to “escape” the referent of the reference anyway. I don’t see it causing confusion in the same way.
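The `Copy` caveat can be shown with a short sketch in today's explicit syntax. `copy_out` is a hypothetical function; it compiles only because this `Foo` derives `Copy`.

```rust
#[derive(Clone, Copy)]
pub struct Foo<'a> {
    pub t: &'a i32,
}

/// Because `Foo` is `Copy`, a copy can leave the short outer borrow
/// while still being tied to the inner lifetime `'long`.
pub fn copy_out<'short, 'long>(r: &'short Foo<'long>) -> Foo<'long> {
    *r
}
```

Without the `Copy` derive, `*r` would be a move out of a borrow and the function would be rejected, so the value could not get out from behind the visible `&`.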
If you did want to write the trailing ampersand explicitly for some reason, it would look like:
&Foo&
&Foo&<T>
&mut Foo&
&mut Foo&<T>
You can combine this with `impl Iterator` by writing `impl& Iterator<Item=u32>`. This does however require the `impl` keyword, and would not work if we changed the meaning of a “bare trait” like `Iterator<Item=u32>`, as has been discussed (we’d only do that in a “new epoch”, of course).
I think that, like the single tick, you cannot combine anonymous and named lifetime parameters with this syntax. So `Foo&<'a, T>` would not be allowed. (It doesn’t have to be this way, though; conceivably we could allow you to supply a “trailing suffix” of the named lifetimes. I would actually like that in the compiler, but it seems confusing.)
The pros of this approach:
- For simple cases, `&` is the consistent “borrowing symbol”
- Many people report confusion about how named lifetimes are in the generic parameter list
- From a type theory perspective, it makes perfect sense…
- Works reasonably well with `impl& Iterator`
The cons:
- When you do need named lifetimes, they are more foreign
- When combined with generic parameters, `Foo&<T>` is “heavy” (but no comma!)
- Potential for confusion between `&Foo` (reference to a `Foo`) and `Foo&` (struct with references)
The storyline:
It’s useful to think about the story of someone learning about lifetimes in Rust. If the question is “ok, you’ve used some basic references, so what if you want to put one in a struct?”, the answer will be that you “write the `&` after the struct name to show it has references in it”, e.g. `struct Foo&`.
A question:
Should we also permit `Foo&'a`, which would be consistent?
Conclusion
That’s it! Thoughts?