I personally prefer bad style to be unrepresentable, if possible.
There are also a couple of small technical advantages to keeping lifetimes as tokens:
proper syntax highlighting can be done in two phases: a fast lexer-based one, and a slower syntax/semantic-based one. It’s better to keep highlighting logic on the fast lexer level if possible.
it’s easier to recover from an unclosed quote: you either eat a single char or hit whitespace
@petrochenkov what is the primary motivation here? I guess it’s that it might be nice to match '$foo and just avoid a “second set” of hygiene-like operations?
There has in the past been talk of trying to move away from 'a-like names, so that 'a can be understood as shorthand for something else (e.g., lifetime a) which I guess fits in here.
Besides this, there is the fact that this would not really improve the language a whole lot...
' lifetime would be bad style, and would confuse programmers and others.
' lifetime doesn't really add any features to the language over 'lifetime.
Comparing this to past syntactical changes, there seems to be much less of an improvement in the language.
For example, when the where keyword came to replace the colon for type parameter bounds, the language improved because one could write <T> ... where Box<T> : ..., which was not possible with just <T: ...>.
The other thing is that the use of where for that syntax made reading generics easier because it separated out the generic types from their trait requirements. This does not seem to make lifetime syntax easier to read.
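To make the where comparison concrete, here is a minimal sketch. The function names describe_inline and describe_boxed are made up for illustration; the point is only that a bound on Box<T> (rather than on T itself) has no inline <T: ...> spelling.

```rust
use std::fmt::Debug;

// A bound on T itself can be written inline in the parameter list.
fn describe_inline<T: Debug>(value: T) -> String {
    format!("{:?}", value)
}

// A bound on a type built from T (here Box<T>) cannot be written inline;
// it has to go in a where clause.
fn describe_boxed<T>(value: T) -> String
where
    Box<T>: Debug,
{
    format!("{:?}", Box::new(value))
}
```

Both functions format their argument with Debug; only the second one needs the where form to state its requirement.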
Yes, there are two kinds of identifiers right now - normal identifiers and lifetime identifiers.
Every time we need to perform some operation on normal identifiers, we have to add a separate equivalent feature for lifetime identifiers (e.g. ident vs lifetime matchers in macros, #ident vs #'ident hygiene opt-out).
At the same time lifetime identifiers are just identifiers prefixed by ' by definition, so it would be nice to consistently treat them like that (more orthogonality and less feature duplication).
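As a small illustration of the duplication in today's macros, the sketch below uses a hypothetical macro kind_of with one arm per matcher. The ident and lifetime fragment specifiers are the real ones; the #ident / #'ident hygiene opt-out is only a proposal and is not shown.

```rust
// Two separate matcher kinds are needed today: `ident` never matches 'a,
// and `lifetime` never matches a plain identifier.
macro_rules! kind_of {
    ($name:ident) => {
        "ident"
    };
    ($lt:lifetime) => {
        "lifetime"
    };
}
```

Calling kind_of!(foo) takes the first arm and kind_of!('a) falls through to the second, which is exactly the "second set" of features being discussed.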
Yes, I think this is kinda a natural consequence of splitting into two tokens and it makes sense to update the lexer to behave like this.
I agree this is bad style (like a space after any other "unary operator" - & x, * x).
The lexer will have to work slightly harder in general now to disambiguate lifetimes '\s*ident (hard requirement), character literals 'ONE_CHARACTER'/\xNN/\u{NNNN} (hard requirement) and invalid single-quoted strings 'abcdef' (best effort, for error reporting).
It seems like there's nothing too hard here though, the only new thing compared to the current lexer is skipping whitespace after '.
Alternatively, we can just avoid updating lexing rules beyond supporting tokenization of '#ident, '$ident and '$#ident.
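A rough sketch of the disambiguation described above, assuming the whitespace-skipping variant. This is not rustc's lexer: QuoteToken and classify_quote are hypothetical names, escapes such as \xNN and \u{NNNN} are left out, and the '#ident / '$ident forms are not handled.

```rust
#[derive(Debug, PartialEq)]
enum QuoteToken {
    Lifetime(String),  // ' then optional whitespace then an identifier
    CharLiteral(char), // exactly one character between two quotes
    Invalid,           // anything else, e.g. 'abcdef'
}

// Classifies the token that starts at the beginning of `src`.
fn classify_quote(src: &str) -> QuoteToken {
    let rest = match src.strip_prefix('\'') {
        Some(rest) => rest,
        None => return QuoteToken::Invalid,
    };

    // Character literal: one character followed directly by a closing quote.
    // Escape sequences are deliberately omitted from this sketch.
    let mut chars = rest.chars();
    if let (Some(c), Some('\'')) = (chars.next(), chars.next()) {
        if c != '\'' && c != '\\' {
            return QuoteToken::CharLiteral(c);
        }
    }

    // Lifetime: skip whitespace after the quote, then read an identifier.
    let trimmed = rest.trim_start();
    let ident: String = trimmed
        .chars()
        .take_while(|c| c.is_alphanumeric() || *c == '_')
        .collect();
    let starts_like_ident = ident
        .chars()
        .next()
        .map_or(false, |c| c.is_alphabetic() || c == '_');
    if !starts_like_ident {
        return QuoteToken::Invalid;
    }

    // A closing quote right after a multi-character name is the
    // best-effort "invalid single-quoted string" case.
    if trimmed[ident.len()..].starts_with('\'') {
        return QuoteToken::Invalid;
    }
    QuoteToken::Lifetime(ident)
}
```

The only step that goes beyond a lexer without whitespace skipping is the trim_start call; dropping it gives the alternative where lexing rules stay as they are.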
This may be an ignorant suggestion, but why can't lifetime identifiers just have their own lexical grammar and be defined as starting with a single quote '? That would permit unification of identifiers in matches when that is desired.
Of course this suggestion would not work if there is ever a case where a lifetime identifier is used without the preceding single-quote.
I don’t like the idea of separating the lifetime-identifier marker, the single-quote, from the lifetime identifier. Permitting arbitrary whitespace and comments between the ' marker and the following alphanumeric string seems like an unnecessary footgun.
It seems preferable to include the marker ' as the first character of a lifetime ident, similar to inclusion of a leading underscore in some non-lifetime idents. That unification permits both lifetime and non-lifetime idents to be in a common hash table or hash map, or share match arms, yet still be distinguishable by inspection of the first character, just as idents with a leading underscore are distinguishable from ones without a leading underscore.
Such identifier unification can lead to unintended pattern matching, as in your example in the prior post. If the formal parameter also begins with a ' that clearly mandates a lifetime ident for the match. In the more common scenario where the formal parameter does not specify a lifetime ident, both types of idents match and any error needs to be detected and reported via the post-match processing.
As I wrote initially, this may be an ignorant suggestion; its applicability depends on what happens when a lifetime ident is passed where a non-lifetime parameter is expected.
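To illustrate the shared-table idea from this post, here is a minimal sketch under the assumption that the leading ' is stored as part of the name. IdentKind, kind_of_name and build_table are hypothetical helpers, not anything from the compiler.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Clone, Copy)]
enum IdentKind {
    Lifetime,
    Plain,
}

// The kind is recovered by inspecting the first character, much like
// checking for a leading underscore.
fn kind_of_name(name: &str) -> IdentKind {
    if name.starts_with('\'') {
        IdentKind::Lifetime
    } else {
        IdentKind::Plain
    }
}

// One table holds both kinds; 'a and a are distinct keys because the
// quote is part of the stored name.
fn build_table(names: &[&str]) -> HashMap<String, IdentKind> {
    names
        .iter()
        .map(|name| (name.to_string(), kind_of_name(name)))
        .collect()
}
```

With the quote kept in the key, a lifetime 'a and a variable a can sit in the same map without colliding.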
FWIW, I was wondering if treating ' as a lifetime operator expr -> lifetime wouldn’t be a neat thing to have. While I’m not sure it would necessarily add a lot to the language per se, it may make things more succinct in some places (YMMV, of course). For example:
This makes the ' a full-fledged prefix-operator token, which would permit arbitrary whitespace, including many lines of comments, to come between a ' lifetime marker and the following ident. That's the footgun I was trying to avoid.
I think that making ' a stand-alone prefix operator also implies that the identifier spaces for lifetimes and non-lifetimes would be unified, leading to collisions in any hashtable or hashmap ident lookup unless lifetime designators were always stored with a leading '.