Support special character in identifier?

jmjoy · November 16, 2019, 11:40am

Rust already has the special form r# to escape a keyword so it can be used as an identity.

Is it possible to support special characters in an identity that starts with r# and ends with #, such as:

let r#foo-bar# = 1;

fn r#foo?#() {}

use r#the-crate#::Bar;

CDirkx · November 16, 2019, 12:07pm

I think Rust identifiers follow the Unicode specification for identifiers*. Basically, an identifier can start with a character that has the XID_Start property (mostly letters), followed by characters that have the XID_Continue property (digits, letters, underscore). Raw identifiers are still valid Unicode identifiers, just not valid Rust identifiers because they conflict with a keyword.

Because a character like ? is not XID_Start or XID_Continue, it cannot be part of an identifier (without dropping the specification).

* This is also why you might see the crate 'unicode_xid' somewhere in your Cargo dependency list when building.
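To make the rule above concrete, here is a minimal sketch of the "start character, then continue characters" check. It is an ASCII-only approximation, not the real XID_Start/XID_Continue tables (those come from Unicode data, e.g. via the unicode_xid crate), and the function names are made up for illustration.

```rust
// ASCII-only stand-in for XID_Continue: letters, digits, underscore.
// Assumption: the real property covers many more Unicode characters.
fn is_continue(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

// Hypothetical helper: returns true if `s` has the shape of an identifier,
// i.e. one start character followed only by continue characters.
fn is_ascii_identifier(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        // ASCII stand-in for XID_Start: a letter.
        Some(c) if c.is_ascii_alphabetic() => chars.all(is_continue),
        // Rust also accepts a leading underscore, but `_` alone is not an identifier.
        Some('_') => {
            let rest = chars.as_str();
            !rest.is_empty() && rest.chars().all(is_continue)
        }
        _ => false,
    }
}

fn main() {
    for name in ["foo_bar", "foo-bar", "foo?", "_x", "_"] {
        println!("{:?}: {}", name, is_ascii_identifier(name));
    }
}
```

Under this approximation, foo-bar and foo? are rejected because - and ? are not continue characters.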

Ixrec · November 16, 2019, 12:09pm

I think you mean "identifier" rather than "identity".

But even ignoring the Unicode specification and all the rationale behind its recommendation, why should we do this? All feature requests need stronger motivation than "why not?"

In particular, the only reason the r# "raw identifier" syntax exists in the first place is to work around a limitation of the editions system (a public method named await that some other crate wants to call from 2018 code). It had nothing to do with changing the character set for identifiers.
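For reference, a small sketch of what raw identifiers do today. The function and variable names are made up for illustration; match is used because it is a keyword in every edition.

```rust
// A function named after a keyword has to be declared and called with the r# prefix.
fn r#match(needle: &str, haystack: &str) -> bool {
    haystack.contains(needle)
}

fn main() {
    // For a non-keyword, `r#found` and `found` are the same identifier.
    let r#found = r#match("foo", "foobar");
    assert!(found);
    println!("found = {}", found);
}
```

Note that the prefix only lifts the keyword restriction; the name after r# still has to be an ordinary identifier.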

spunit262 · November 16, 2019, 1:33pm

Not yet, it is still being worked on.

kennytm · November 16, 2019, 3:56pm

The use case r#any-text# was discussed briefly in RFC 2151 but not considered, because r# was introduced for cross-edition dependencies, as explained above by @Ixrec.

Most use cases requiring non-Rust-identifiers can be solved using #[link_name] (for FFI) or #[serde(rename)] (for serialization).
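A minimal sketch of the FFI case: #[link_name] lets the Rust-side name differ from the symbol name. Here the C library's abs is bound under the made-up Rust name c_absolute_value, and the wrapper absolute_value is also just for illustration. The same attribute accepts symbol names that are not valid Rust identifiers. This assumes a toolchain that accepts `unsafe extern` (Rust 1.82 or later); on older toolchains the block is written as plain `extern "C"`.

```rust
use std::os::raw::c_int;

unsafe extern "C" {
    // The Rust name is free-form; the linker looks up the symbol "abs".
    #[link_name = "abs"]
    fn c_absolute_value(x: c_int) -> c_int;
}

// Hypothetical safe wrapper around the foreign function.
fn absolute_value(x: i32) -> i32 {
    // SAFETY: abs has no preconditions for this input
    // (only c_int::MIN is problematic, and it is not used here).
    unsafe { c_absolute_value(x) }
}

fn main() {
    println!("{}", absolute_value(-5));
}
```

The serde case works the same way in spirit: #[serde(rename = "...")] on a field keeps a normal Rust field name while the serialized key can contain characters such as -.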

Aloso · November 16, 2019, 5:05pm

This change would not be backwards compatible:

foo!(r#await, r#async);

Are there two identifiers, r#await and r#async, or is it the single identifier r#await, r# followed by the keyword async?
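To see how the current lexer reads that input, here is a small sketch using a token-counting macro (count_tokens is a made-up helper, not a standard macro). Today each raw identifier is a single token, so the invocation contains three token trees: r#await, the comma, and r#async.

```rust
// Hypothetical helper: counts the token trees passed to it.
macro_rules! count_tokens {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tokens!($($tail)*) };
}

fn main() {
    // Lexed today as: `r#await`  `,`  `r#async`
    let n = count_tokens!(r#await, r#async);
    println!("{} tokens", n); // prints "3 tokens"
}
```

With a closing # as proposed, the same text could instead be read as one identifier spanning r#await, r# and then a bare async, which is the ambiguity in question.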

RustyYato · November 16, 2019, 5:31pm

@Aloso any change to syntax beyond adding new keywords breaks macros, so it isn't a big reason to not change syntax. Even the change to add the raw identifiers was a breaking change if you consider macros.

Aloso · November 16, 2019, 5:57pm

When breaking changes are made, a crater run is performed to see how much code is affected.

If this change forbade macros from containing two consecutive idents, or a string following an ident, I believe it would break a lot of code. In either case, it would make the parser more complicated.

Also, I think that this feature is missing some motivation/justification.

RustyYato · November 16, 2019, 5:59pm

Only if raw identifiers are used, which isn't often. Either way, I don't think that this is a good idea. It just makes the code harder to read, without any significant gain.

cuviper · November 16, 2019, 6:33pm

This was carefully considered in the RFC and its discussion, and it actually was not a breaking change. Previously, r# could only be tokenized as the start of a raw string, and would be an error otherwise, even in a macro. So anything that might have looked like a raw identifier before was already invalid.
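For comparison, a short sketch of the older use of r#, the raw string literal, where the # marks belong to the string delimiters rather than to an identifier (the function name is made up for illustration):

```rust
// In a raw string, r#" opens the literal and "# closes it,
// so quotes inside need no escaping.
fn raw_string_example() -> &'static str {
    r#"say "hi""#
}

fn main() {
    assert_eq!(raw_string_example(), "say \"hi\"");
    println!("{}", raw_string_example());
}
```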

