[lang-team-minutes] Elision 2.0

It seems plausible that we can completely phase out ref and ref mut in patterns, though at the sacrifice of some precision (and it would require the move keyword to be usable on patterns).

In particular, I think this example has no direct equivalent:

struct Foo { x: Vec<i32>, y: Vec<i32> }

fn test(foo: &mut Foo) {
    match *foo {
        Foo { ref mut x, ref y } => ...
    }
}

If you do match foo { Foo { x, y } => .. }, you get two mutable references. You can also do match &foo { Foo { x, y } => ... } to get two shared references. But I don't think you can get one mut and one shared.
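To make the three cases concrete, here is a minimal sketch that should compile on a compiler with default binding modes (match ergonomics). The function names and the arm bodies are invented for illustration; only the struct and the patterns come from the example above.

```rust
struct Foo {
    x: Vec<i32>,
    y: Vec<i32>,
}

// Explicit binding modes: `x` is `&mut Vec<i32>`, `y` is `&Vec<i32>`.
fn mixed(foo: &mut Foo) {
    match *foo {
        Foo { ref mut x, ref y } => x.extend(y.iter().copied()),
    }
}

// Matching on the `&mut Foo` itself: both bindings are `&mut Vec<i32>`.
fn both_mut(foo: &mut Foo) {
    match foo {
        Foo { x, y } => {
            x.push(0);
            y.push(0);
        }
    }
}

// Matching on `&foo` (a `&&mut Foo`): both bindings are `&Vec<i32>`.
fn both_shared(foo: &mut Foo) -> usize {
    match &foo {
        Foo { x, y } => x.len() + y.len(),
    }
}
```

Only `mixed` gets one mutable and one shared binding out of a single pattern, which is the case that seems to have no direct equivalent without ref.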

I wasn't worried about this, because I figured we'd still have ref patterns for the most obscure cases, but if we had plans to remove it (and maybe repurpose it), might be something to keep in mind.

1 Like

Yes, certainly, the further we stray from existing syntax, the bigger this danger is.

I do have another, selfish reason to dislike ref: repurposing it like this would probably mean match won’t keep a way of explicitly specifying bindings.

Yes. I've struggled with this. Certainly if the notation is ref r, people will call it a "reference", which seems at least potentially actively confusing. I guess you could imagine it being short for "reference lifetime".

Speaking more generally, I'm not sure how best to replace the word "lifetime". I always imagined it as the lifetime of the reference itself, but of course many people confuse it with the lifetime of the referent (the resource being referenced), and this seems obvious in retrospect. Still, I've kind of given up for now on finding a better alternative, and when presenting or teaching I just make a point to try and define it clearly as a "span in the code", usually with some highlighting and examples, and sometimes explicitly contrasting with the lifetime of the underlying resource.

Some other words I've considered, just for fun:

  • region, as in "region of the code". This is sort of the default in academic literature and -- ironically -- was what I actively discouraged early on, since I felt it was too easily confused with "region of memory".
  • scope -- people often object to this, since it is quite overloaded.
  • span -- it's a compiler internal term with little inherent meaning, but I think talking about a "span of the code" is .. maybe clear-ish? Unsure. At least it doesn't mean a lot to people as far as I know. Could also be "span of time", which is sort of nice.
    • this would really make the rustc terminology confusing :slight_smile:, since we use span as "portion of the code to use for reporting errors", but it's actually a reasonable match in its own way
  • duration -- I like time-related words, and "duration of the loan" seems obvious, but the word somehow lacks pizazz.
  • interval -- also time-related. (full disclosure: I used this for a similar concept in my PhD research)

My assumption was that it would be gone from patterns entirely. The "reference inversion" feature discussed in the match ergonomics thread would apply to all patterns I believe.

I understand where you're coming from, but also consider the possibility that the strange and unfamiliar syntax might make it look scarier than it has to be.

(Also subjectively yes, the ticks are a huge eyesore to me. Almost unbearably so when they aren't followed by an alphanumeric identifier, like in the case of <'>, or the <'_> syntax that's been proposed for introducing an anonymous lifetime.)

Yes, this would definitely be a significant issue with any kind of syntactic changes we make. Not quite as big as pre- vs. post-1.0, though: the old syntax would presumably still work, perhaps with a deprecation warning, and even if we end up introducing epochs and it's an actual error, there would be a clear and comprehensible error message.

I'd be very disappointed with that. Is the reference inversion something new? I have to admit I'm not really following that discussion anymore.

I wouldn't really call it "scary". More "unusual", which it is. And ref x to me only acts like it is something familiar, because it's not actually specifying a reference. As in struct Foo<ref a>{ head: &a str, tail: &a str } has two references (head and tail), not one (a).

There's also the question of how it's going to look outside of type parameters. For example, would

'items: loop {
    if cond {
        break 'items;
    }
}

become

ref items: loop {
    if cond {
        break ref items;
    }
}

?
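For reference, here is the current label syntax in a small compilable function. The function name and the search logic are made up for illustration; the point is only that loop labels share the tick with lifetimes, so a replacement for 'a would need an answer here too.

```rust
// Returns the position of the first cell equal to `target`, scanning row by row.
fn find(grid: &[Vec<i32>], target: i32) -> Option<(usize, usize)> {
    let mut found = None;
    'items: for (r, row) in grid.iter().enumerate() {
        for (c, &cell) in row.iter().enumerate() {
            if cell == target {
                found = Some((r, c));
                // Leaves both loops, not just the inner one.
                break 'items;
            }
        }
    }
    found
}
```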

1 Like

As a new user of Rust, I would like to say that I find foo<', T> very very unfortunate. Try reading it without the monospace font: foo<’, T>. The symbols are dancing around on the baseline… is it a speck of dust? or is it an apostrophe? :slight_smile: Also, the apostrophe followed by a comma almost looks like a ;.

This kind of syntax is anything but elegant and moves Rust more towards the line-noise camp of languages. I’m a Python guy, so lots of special characters and symbols are not “elegant” to my eyes :slight_smile: I feel code should be optimized for readability since it’s normally read by many more people than the person who writes it.

Do I understand it correctly that the proposal simply allows the programmer to use ' instead of 'a when there is just one lifetime? If so, please just keep the more explicit syntax. Explicit is better than implicit, and all that, as they say in the Python world :wink:

There is some precedent for using 'foo in programming languages: Lisp uses that syntax for its atoms. It’s pretty readable even though apostrophes are normally attached to the end of a word. But a free-floating apostrophe? Please don’t.

12 Likes

I'm not sure how bad of an idea that is - accessing the referent through a reference after the reference's lifetime is over is sketchy and probably UB. So from the point of view of the "callee", the lifetime of the reference is all that matters.

Of course there's also the confusion that occurs because "values" of the kind lifetime can never be actually observed (because they are "erased" before run-time), so you must always work on some kind of abstract level.

So, at this point, I would like to go and update the Elision 2.0 RFC draft. I feel like we’ve got a lot of resolution on most issues except for the syntax. I think there are basically four contenders here:

Foo'     Foo'<T>
Foo<'>   Foo<', T>
Foo<&>   Foo<&, T>
Foo<ref> Foo<ref, T>

I see the advantages and disadvantages as being roughly something like:

  • the tick ('): similar to 'a, but can be surprising on its own
  • the ampersand (&): similar to &T, doesn’t “lead you” to 'a notation, perhaps surprising on its own
  • the keyword (ref): not a sigil, but doesn’t “lead you” to 'a; maybe works best if we migrate away from 'a and towards <ref a>, but then it does sort of sound like a value, not a lifetime

I like that Foo<ref> basically means “a struct with references inside of it”, though, that seems…like something we could explain.

UPDATE: I added Foo' after the fact. I had originally left it out, but I think it’s too early to rule it out.

I would add:

  • for <'>: Consistent with current lifetimes, free of any current meaning. Kind of implies that it can apply to multiple lifetimes (' with a name refers to a specific lifetime; without a name it can refer to many).
  • for <&>: Currently read as a borrow operator. Inconsistent with individual lifetimes: a single 'a can apply to multiple references (or none, if it’s about phantom data), but the & operator, usually used for specifying individual borrows, would now imply multiple (or none).
  • for <ref>: Currently has a different meaning for bindings. Requires language changes with big impact to become consistent. Like &, it currently doesn’t imply multiplicity.

I also feel “a struct with references inside it” is actually the wrong reading. I like that it currently communicates that it is generic over a lifetime, because lifetimes are important beyond types that outright contain references. There are types that are generic over references only via phantom data, there are trait objects, and there’s HRTB, all of which I believe should be included in the evaluation.
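As a sketch of the phantom-data case: the hypothetical Handle type below stores an index rather than a reference, yet it is still generic over a lifetime that ties it to the Arena that issued it. All names here are invented for illustration.

```rust
use std::marker::PhantomData;

struct Arena {
    items: Vec<String>,
}

// Holds no reference at all, only an index plus a zero-sized lifetime marker.
struct Handle<'a> {
    index: usize,
    _arena: PhantomData<&'a Arena>,
}

impl Arena {
    // The returned handle borrows the arena for 'a even though it stores no pointer.
    fn first<'a>(&'a self) -> Option<Handle<'a>> {
        if self.items.is_empty() {
            None
        } else {
            Some(Handle { index: 0, _arena: PhantomData })
        }
    }

    fn get<'a>(&'a self, handle: &Handle<'a>) -> &'a str {
        &self.items[handle.index]
    }
}
```

Reading Handle<'a> as “a struct with references inside it” would be misleading here, since nothing in it is a reference.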

Another possibility is: Foo<'_>

(has some lifetime constraint)

It occurs to me that & and ' have kind of opposite advantages. We can look at it in terms of what things a function signature involves:

&/&mut and elided lifetimes <-> user types and elided lifetimes <-> explicit lifetimes

where the middle case is the one we're trying to find syntax for, and the other two are known quantities. Then Foo<', T> is consistent with the logic of explicit lifetimes, with respect to what symbol to use and where to put it. Meanwhile, Foo<&, T> is consistent with the logic of elided signatures with & and &mut, with respect to what symbol to use and where elision happens. That is, the existing syntax for elision with & and &mut, as well as the syntax extended with Foo<&, T>, both follow a simple rule: elision happens where there is an & symbol without a specified lifetime.

In other words, the ampersand & leads you from elided lifetimes with built-in types (&, &mut) to elided lifetimes with user-defined types, while the tick ' leads you from elided lifetimes with user-defined types to explicit lifetimes.
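The three kinds of signature can be sketched side by side. The Token type and the function names are hypothetical, and the middle case is spelled with the '_ placeholder mentioned elsewhere in this thread, which compilers available today accept; any of the contender syntaxes would occupy the same position.

```rust
// Hypothetical user type holding a borrowed string slice.
struct Token<'a> {
    text: &'a str,
}

// 1. Built-in reference, lifetime elided: the `&` marks where elision happens.
fn trim_ref(input: &str) -> &str {
    input.trim()
}

// 2. User type, lifetime elided: the case this thread wants a syntax for.
fn trim_token(token: Token<'_>) -> Token<'_> {
    Token { text: token.text.trim() }
}

// 3. Fully explicit lifetimes.
fn trim_explicit<'a>(token: Token<'a>) -> Token<'a> {
    Token { text: token.text.trim() }
}
```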

Purely logically speaking I find both of these roughly similarly compelling; the deal-breaker for me is really just that I find Foo<', T> to be an incredibly big eyesore and Foo<&, T> merely a normally-sized eyesore.

@mgeisler (and everyone else who liked their comment), what do you think about Foo<&, T>? Obviously it's not ideal either, but I find it considerably less "lexically offensive" than Foo<', T>; I'm not sure whether I'm representative.

2 Likes

Here's a still-relatively-new-to-the-language, haven't-done-anything-fancy-with-lifetimes opinion: I find Foo<'> no worse than Foo<'a>, but I dislike both of them, because twenty years of C, shell, and Python have trained me to expect U+0027 to be an open quote, so my eyes automatically scan for the close quote and when it isn't there there's a sort of mental jar as I remind myself that this is not a syntax error in this language. (It doesn't help that U+0027 is also used to introduce character literals, and is balanced in that context.)

I deeply dislike Foo<&>, because that should mean something to do with references, not lifetimes. This is again twenty years of C, or more precisely C++, where you might write template <typename T> struct Foo<T&> { ... } to specialize on reference-to-T. Yes, the ampersand is on the other side of the T there; it's still even more of a mental jar than the unbalanced single quote.

Foo<ref> ... well, again, that keyword should mean something to do with references, not lifetimes, but I don't have such deeply ingrained associations for ref because it's only a keyword in Rust, and you only use it inside patterns. Recycling it to mean something to do with lifetimes in this context is only about as bad as the many different meanings of static in C++; I could get used to it.

What about #? I don't think that's used for anything in this language yet, and #a actually looks like a symbol rather than an unbalanced string constant.

I'm not 100% sure what you mean by this. I think lifetimes have a lot to do with references. In fact, part of what I find appealing about Foo<&> is precisely that it brings references to mind. So I'm trying to understand just what it is that is so jarring -- is it because you expect that to mean "reference to Foo"? (That is sort of how I interpreted the rest of your comment.)

1 Like

I expect Foo<&> to be something not entirely unlike “the variant of Foo that applies to immutable borrows of some unspecified type”. I expect it to be in contrast to Foo<> which is the variant applying to moves of some unspecified type, and Foo<&mut> the variant applying to mutable borrows. Maybe it only makes sense to write these things with a concrete type, or at least a trait bound.

I don’t feel that I fully understand lifetimes, but in my head lifetimes are a property of objects, not references. Foo<'a> is an object with lifetime 'a, and the most obvious reason you bother writing that is because it internally refers to some other object and you want the compiler to ensure that that object’s lifetime is at least as long as 'a, but there are other reasons; for instance, perhaps it is only valid while a lock is held, but it doesn’t refer to the lock itself.

4 Likes

I don't feel that I fully understand lifetimes, but in my head lifetimes are a property of objects, not references.

In Rust, lifetime is a property of references, not objects. Foo<'a> is not an object with lifetime 'a. Such understanding is precisely what (I think) we are trying to prevent.
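One way to read that claim, as a small sketch (the function name is invented): the vector below exists for the whole function, while the borrow of it covers only the inner block. The lifetime the compiler checks is the span of the borrow, not the existence of the value.

```rust
fn demo() -> Vec<i32> {
    // `data` exists from here until it is returned to the caller.
    let mut data = vec![1, 2, 3];
    let total: i32 = {
        // The shared borrow is confined to this block.
        let view = &data;
        view.iter().sum()
    };
    // Mutation is allowed again: the borrow has ended, the value has not.
    data.push(total);
    data
}
```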

2 Likes

I’ve seen many more complaints about Rust’s ugly, noisy lifetime syntax than about lifetime elision in user-defined types (e.g. S<'a> => S) being confusing. So I think I agree with the points from @glaebhoerl’s list suggesting that S => S<'> may be trying to solve a non-problem (or at least a not-sufficiently-large problem). At the same time,

struct S<'a> {
    field: &'a u8
}

=>

struct S {
    field: &u8
}

certainly solves a real problem.

It would also be very nice if adding/removing a field with a lifetime to/from a struct wasn’t as “viral” as it is now, requiring adjustments in dozens of other places where S is used. Look at what people do to avoid the churn. The S => S<'> change seems to make it worse.
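A small sketch of that churn in the current syntax, using made-up types: once Config borrows its name, every type that contains it, and every impl on those types, has to carry the parameter, even where the borrowed field is never touched.

```rust
struct Config<'a> {
    name: &'a str,
    retries: u32,
}

// Needs <'a> only because `Config` does.
struct Client<'a> {
    config: Config<'a>,
}

// Needs <'a> only because `Client` does.
struct Session<'a> {
    client: Client<'a>,
    attempts: u32,
}

impl<'a> Session<'a> {
    fn new(client: Client<'a>) -> Session<'a> {
        Session { client, attempts: 0 }
    }

    // Uses only `retries`, yet sits in an `impl<'a>` block because of `name`.
    fn can_retry(&self) -> bool {
        self.attempts < self.client.config.retries
    }
}
```

Changing name to an owned String would let every <'a> above be deleted, which is the kind of ripple the complaint is about.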

9 Likes

Does this mean the Foo' syntax is out? To me it looks by far the nicest, and so far no one has objected to it from a technical standpoint.

Foo'     Foo'<T>
Foo<'>   Foo<', T>
Foo<&>   Foo<&, T>
Foo<ref> Foo<ref, T>

3 Likes

I did make a mistake: what I should have said is "Foo<'a> is an object whose lifetime is constrained to be shorter than 'a, whatever 'a is". But I simply cannot wrap my head around the notion that "lifetime is a property of references, not objects." References are a subset of objects. It has to make sense to talk about the lifetime of an arbitrary object.

I'm personally feeling skeptical of anonymous lifetimes. Anything that is not <'> is inconsistent with how explicit lifetimes are used. Elision is simple and beautiful and forms a nice mental model because it just removes stuff from the explicit form; that is kept when going from <'a> to <'>. Going from <'a> to <&> swaps one symbol for another, which seems arbitrary. But <', T> is just symbol spaghetti.

This is what I think things should look like:

I think we should back off from anonymous lifetimes and explore "parameter name instead of declaring a lifetime".

5 Likes