Support special characters in identifiers?

Rust already has the special form r# for escaping a keyword so that it can be used as an identity.

Is it possible to support special characters in an identity that starts with r# and ends with #, such as:

let r#foo-bar# = 1;

fn r#foo?#() {}

use r#the-crate#::Bar;

I think Rust identifiers follow the Unicode specification for identifiers*. Basically, an identifier can start with a character that has the XID_Start property (mostly letters), followed by characters that have the XID_Continue property (digits, letters, underscore). Raw identifiers are still valid Unicode identifiers, just not valid Rust identifiers because they conflict with a keyword.

Because a character like ? is not XID_Start or XID_Continue, it cannot be part of an identifier (without dropping the specification).

* This is also why you might see the crate 'unicode-xid' somewhere in your Cargo dependency list when building.
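As a rough illustration of the rule above, here is a minimal sketch that checks only the ASCII subset. The function name is_ascii_identifier is made up for this example, and the sketch assumes that within ASCII, XID_Start is just the letters and XID_Continue adds digits and underscore (plus Rust's allowance for a leading underscore). A real check would need the full Unicode tables.

```rust
/// ASCII-only approximation of the identifier rule (hypothetical helper).
/// Assumption: in ASCII, XID_Start = letters, XID_Continue = letters, digits, `_`.
/// Rust additionally accepts a leading `_`, but a lone `_` is not an identifier.
fn is_ascii_identifier(s: &str) -> bool {
    if s == "_" {
        return false;
    }
    let mut chars = s.chars();
    // First character: a letter or an underscore.
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return false,
    }
    // Remaining characters: letters, digits, or underscore.
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

fn main() {
    for name in ["foo_bar", "foo-bar", "foo?", "the-crate", "_tmp1"] {
        println!("{:?}: {}", name, is_ascii_identifier(name));
    }
}
```

Under these assumptions, foo-bar, foo? and the-crate are all rejected because - and ? fall outside both character classes.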

I think you mean "identifier" rather than "identity".

But even ignoring the Unicode specification and all the rationale behind its recommendation, why should we do this? All feature requests need stronger motivation than "why not?"

In particular, the only reason the r# "raw identifier" syntax exists in the first place is to work around a limitation of the editions system (a public method named await that some other crate wants to call from 2018 code). It had nothing to do with changing the character set for identifiers.
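For reference, a small sketch of what r# does today. The function and variable names are only for illustration. The prefix lets a keyword be used as a name, and it does not widen the set of allowed characters.

```rust
// `match` is a keyword, so `r#` is needed to use it as a function name.
fn r#match(x: i32) -> i32 {
    x + 1
}

fn main() {
    // The `r#` prefix is not part of the name: `r#foo` and `foo`
    // refer to the same identifier.
    let r#foo = 41;
    let answer = r#match(foo);
    println!("{}", answer);
}
```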

Not yet, it is still being worked on.

The use case r#any-text# was raised briefly in the discussion of RFC 2151 but was not pursued, because r# was introduced for cross-edition dependencies, as explained above by @lxrec.

Most use cases requiring non-Rust-identifiers can be solved using #[link_name] (for FFI) or #[serde(rename)] (for serialization).

This change would not be backwards compatible:

foo!(r#await, r#async);

Are there two identifiers r#await and r#async, or is it r#await, r# followed by the keyword async?
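To make the current behaviour concrete, here is a small sketch using a token-counting macro. The name count_tts is made up for this example. It shows that the input above is currently read as two raw identifiers separated by a comma.

```rust
// Counts the token trees passed to it (hypothetical helper macro).
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

fn main() {
    // Currently three token trees: `r#await`, `,`, `r#async`.
    let n = count_tts!(r#await, r#async);
    println!("{}", n);
}
```

If r#...# were allowed to span arbitrary characters, the same input could instead be read as a single identifier ending at the second #, followed by async.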

@Aloso any change to syntax beyond adding new keywords breaks macros, so it isn't a big reason to not change syntax. Even the change to add the raw identifiers was a breaking change if you consider macros.

When breaking changes are made, a crater run is performed to see how much code is affected.

If this change forbade macros from taking two consecutive idents, or a string following an ident, I believe it would break a lot of code. In either case, it would make the parser more complicated.

Also, I think that this feature is missing some motivation/justification.

Only if raw identifiers are used, which isn't often. Either way, I don't think that this is a good idea. It just makes the code harder to read, without any significant gain.

This was carefully considered in the RFC and its discussion, and it actually was not a breaking change. Previously, r# could only be tokenized as the start of a raw string, and would be an error otherwise, even in a macro. So anything that might have looked like a raw identifier before was already invalid.
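For context, this is the raw string form that r# introduced before raw identifiers existed. It is a minimal sketch, and the function name quoted is only for illustration.

```rust
// A raw string: `r#"` opens it and `"#` closes it, so inner quotes
// need no escaping.
fn quoted() -> &'static str {
    r#"say "hi""#
}

fn main() {
    println!("{}", quoted());
}
```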
