pre-RFC: Splitting `'lifetime` into two tokens

Currently the apostrophe ' is a part of the lifetime name itself for historical reasons that are no longer relevant. I don’t see obvious reasons for it to be the case now.

I suggest moving the treatment of ' from the lexer to the parser and treating ' and the lifetime name as two consecutive tokens: token::Apostrophe and token::Ident.
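To make the difference concrete, here is a minimal sketch of the two tokenizations. The `Token` enum and both function names are invented for illustration and are not the compiler's actual types; only the names Apostrophe and Ident come from the proposal, and the input is assumed to be a lifetime such as `'a`.

```rust
// Illustrative token type; not the compiler's real definition.
#[derive(Debug, PartialEq)]
enum Token {
    /// Current scheme: the apostrophe is part of the name.
    Lifetime(String),
    /// Proposed scheme: a bare `'` token...
    Apostrophe,
    /// ...followed by an ordinary identifier.
    Ident(String),
}

/// Current behavior: `'a` is a single token whose name includes the `'`.
fn lex_current(src: &str) -> Vec<Token> {
    vec![Token::Lifetime(src.to_string())]
}

/// Proposed behavior: `'a` becomes two tokens.
/// Assumes `src` starts with an apostrophe.
fn lex_proposed(src: &str) -> Vec<Token> {
    let name = src.strip_prefix('\'').unwrap_or(src);
    vec![Token::Apostrophe, Token::Ident(name.to_string())]
}
```

Under the second scheme the parser, not the lexer, would be responsible for recognizing the Apostrophe + Ident pair as a lifetime.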

Consequences:

Any problems with this change?

(This is not an entirely thought-out proposal, but I wanted to leave it here for feedback today.)

It seems incredibly strange to me that the lexer needs to care about name resolution, but this is just an implementation detail.

However, I am not sure that allowing spaces between ' and the ident is a good idea, and I think splitting it into two tokens would allow that?

I'm curious why? Wouldn't it just be bad style?

I personally prefer bad style to be unrepresentable, if possible :slight_smile:

There are also a couple of small technical advantages to keep lifetimes as tokens:

  1. proper syntax highlighting can be done in two phases: a fast lexer based one, and a slower syntax/semantic based one. It’s better to keep highlighting logic on the fast lexer level if possible.

  2. it’s easier to recover from unclosed quote: you either eat a single char or hit a whitespace

@petrochenkov what is the primary motivation here? I guess it’s that it might be nice to match '$foo and just avoid a “second set” of hygiene-like operations?

There has in the past been talk of trying to move away from 'a-like names, so that 'a can be understood as shorthand for something else (e.g., lifetime a) which I guess fits in here.

I have to agree with @matklad.

Besides this, there is the fact that this would not really improve the language a whole lot...

  • ' lifetime would be bad style, and would confuse programmers and others.
  • ' lifetime doesn't really add any features to the language over 'lifetime.

Comparing this to past syntactical changes, there seems to be much less of an improvement in the language.

For example, when the where keyword came to replace the colon for bounds on type parameters, the language improved, because one could write <T> ... where Box<T> : ..., which was not possible with just <T: ...>.

The other thing is that the use of where for that syntax made reading generics easier because it separated out the generic types from their trait requirements. This does not seem to make lifetime syntax easier to read.
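For readers who have not seen that kind of bound, here is a small sketch in current Rust. The function name `describe` is made up for this example. The bound is placed on `Box<T>` rather than on `T`, which the inline `<T: ...>` form cannot express.

```rust
use std::fmt::Debug;

// The requirement is on `Box<T>`, not on `T` itself, so it has to go
// in a `where` clause; `<T: Debug>` would be a different bound.
fn describe<T>(value: T) -> String
where
    Box<T>: Debug,
{
    format!("{:?}", Box::new(value))
}
```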

Yes, there are two kinds of identifiers right now - normal identifiers and lifetime identifiers. Every time we need to perform some operation on normal identifiers, we have to add a separate equivalent feature for lifetime identifiers (e.g. ident vs lifetime matchers in macros, #ident vs #'ident hygiene opt-out).

At the same time lifetime identifiers are just identifiers prefixed by ' by definition, so it would be nice to consistently treat them like that (more orthogonality and less feature duplication).
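As an illustration of the duplication described above, here is a minimal sketch in today's Rust using `macro_rules!`. The macro and function names (`make_picker`, `pick_first`) are invented for this example. Note that the ordinary name and the lifetime name need two different fragment specifiers.

```rust
// Two kinds of names, two kinds of matchers:
// `ident` for ordinary names and `lifetime` for `'a`-style names.
macro_rules! make_picker {
    ($name:ident, $lt:lifetime) => {
        // `$lt` is substituted as one whole token such as `'a`.
        fn $name<$lt>(x: &$lt u8, _y: &u8) -> &$lt u8 {
            x
        }
    };
}

// Expands to `fn pick_first<'a>(x: &'a u8, _y: &u8) -> &'a u8 { x }`.
make_picker!(pick_first, 'a);
```

With a separate apostrophe token, the second parameter could in principle be matched as a plain identifier, which is the orthogonality argument made here.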

Yes, I think this is kinda a natural consequence of splitting into two tokens, and it makes sense to update the lexer to behave like this.
I agree this is bad style (like a space after any other "unary operator" - & x, * x).

The lexer will have to work slightly harder in general now to disambiguate lifetimes '\s*ident (hard requirement), character literals 'ONE_CHARACTER'/\xNN/\u{NNNN} (hard requirement) and invalid single-quoted strings 'abcdef' (best effort, for error reporting).
It seems like there's nothing too hard here though; the only new thing compared to the current lexer is skipping whitespace after '.
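A rough sketch of that disambiguation, under simplifying assumptions: the `Quoted` enum and the function names are hypothetical, only single-character escapes like `\n` are handled (not `\xNN` or `\u{NNNN}`), and this is not how the actual compiler lexer is written.

```rust
#[derive(Debug, PartialEq)]
enum Quoted {
    /// `'x'` or a simple escape such as `'\n'`.
    CharLiteral,
    /// `'`, optional whitespace, then an identifier (name stored without `'`).
    Lifetime(String),
    /// Something like `'abcdef'`, detected on a best-effort basis.
    InvalidQuotedString,
    /// Nothing recognizable follows the apostrophe.
    Unknown,
}

fn is_ident_start(c: char) -> bool {
    c == '_' || c.is_alphabetic()
}

fn is_ident_continue(c: char) -> bool {
    c == '_' || c.is_alphanumeric()
}

/// Classifies the text starting at an apostrophe.
fn classify_after_quote(src: &str) -> Quoted {
    let rest = match src.strip_prefix('\'') {
        Some(r) => r,
        None => return Quoted::Unknown,
    };
    let chars: Vec<char> = rest.chars().collect();

    // Character literal: exactly one character, then a closing quote.
    if chars.len() >= 2 && chars[0] != '\\' && chars[0] != '\'' && chars[1] == '\'' {
        return Quoted::CharLiteral;
    }
    // Simple escape: backslash, one character, closing quote.
    if chars.len() >= 3 && chars[0] == '\\' && chars[2] == '\'' {
        return Quoted::CharLiteral;
    }

    // The new step under the proposal: skip whitespace after `'`.
    let mut i = 0;
    while i < chars.len() && chars[i].is_whitespace() {
        i += 1;
    }
    if i < chars.len() && is_ident_start(chars[i]) {
        let start = i;
        while i < chars.len() && is_ident_continue(chars[i]) {
            i += 1;
        }
        // An identifier closed by another quote looks like `'abcdef'`.
        if i < chars.len() && chars[i] == '\'' {
            return Quoted::InvalidQuotedString;
        }
        return Quoted::Lifetime(chars[start..i].iter().collect());
    }
    Quoted::Unknown
}
```

The character-literal checks run first so that `' '` (a space character) is never mistaken for the start of a lifetime with skipped whitespace.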

Alternatively, we can just avoid updating lexing rules beyond supporting tokenization of '#ident, '$ident and '$#ident.

This may be an ignorant suggestion, but why can't lifetime identifiers just have their own lexical grammar and be defined as starting with a single quote '? That would permit unification of identifiers in matches when that is desired.

Of course this suggestion would not work if there is ever a case where a lifetime identifier is used without the preceding single-quote.

Do you mean that both normal idents and lifetimes match ident? So something like this is a valid macro definition and use

macro m($actually_expects_ident: ident, $actually_expects_lifetime: ident) {
    fn $actually_expects_ident<$actually_expects_lifetime>() -> &$actually_expects_lifetime u8 { panic!() }
}

m!(i, 'lt);

, if the wrong kind of ident is passed, then it's reported during or after expansion

m!(i, j); // ERROR expected a lifetime identifier, not normal identifier in `&$actually_expects_lifetime u8`

, and hygiene opt-out still looks like #'ident?

I don’t like the idea of separating the lifetime-identifier marker, the single-quote, from the lifetime identifier. Permitting arbitrary whitespace and comments between the ' marker and the following alphanumeric string seems like an unnecessary footgun.

It seems preferable to include the marker ' as the first character of a lifetime ident, similar to inclusion of a leading underscore in some non-lifetime idents. That unification permits both lifetime and non-lifetime idents to be in a common hash table or hash map, or share match arms, yet still be distinguishable by inspection of the first character, just as idents with a leading underscore are distinguishable from ones without a leading underscore.
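A minimal sketch of that idea, assuming a simple string-keyed interner. `NameKind`, `kind_of` and `intern` are hypothetical names for this illustration and do not reflect how the compiler stores symbols.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Clone, Copy)]
enum NameKind {
    Lifetime,
    Plain,
}

/// The first character alone tells the two kinds apart.
fn kind_of(name: &str) -> NameKind {
    if name.starts_with('\'') {
        NameKind::Lifetime
    } else {
        NameKind::Plain
    }
}

/// Interns names of both kinds in one table and returns the name's index.
/// Because lifetimes keep their leading `'`, `a` and `'a` never collide.
fn intern(table: &mut HashMap<String, usize>, name: &str) -> usize {
    let next = table.len();
    *table.entry(name.to_string()).or_insert(next)
}
```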

Such identifier unification can lead to unintended pattern matching, as in your example in the prior post. If the formal parameter also begins with a ' that clearly mandates a lifetime ident for the match. In the more common scenario where the formal parameter does not specify a lifetime ident, both types of idents match and any error needs to be detected and reported via the post-match processing.

As I wrote initially, this may be an ignorant suggestion; its applicability depends on what happens when a lifetime ident is passed where a non-lifetime parameter is expected.

This is a nice idea. So the example now looks like this:

macro m($ident: ident, $'lifetime: ident) {
    fn $ident<$'lifetime>() -> &$'lifetime u8 { panic!() }
}

m!(i, 'lt);

Looks like a viable alternative to me.

FWIW, I was wondering if treating ' as a lifetime operator expr -> lifetime wouldn’t be a neat thing to have. While I’m not sure it would necessarily add a lot to the language per se, it may make things more succinct in some places (YMMV, of course). For example:

fn pick_x<'a, 'b>(x: &'a u32, y: &'b u32) -> &'a u32 {
    x
}

could be written as

fn pick_x(x: &u32, y: &u32) -> &'*x u32 {
    x
}

This makes the ' a full-fledged prefix-operator token, which would permit arbitrary whitespace, including many lines of comments, to come between a ' lifetime marker and the following ident. That's the footgun I was trying to avoid.

I think that making ' a stand-alone prefix operator also implies that the identifier spaces for lifetimes and non-lifetimes would be unified, leading to collisions in any hashtable or hashmap ident lookup unless lifetime designators were always stored with a leading '.
