Pre-RFC: Raw identifiers


#1
  • Feature Name: raw_identifiers
  • Start Date: 2017-07-07
  • RFC PR: (leave this empty)
  • Rust Issue: (leave this empty)

Summary

Add a raw identifier format r#catch, so crates written in future language epochs/versions can still use an older API that overlaps with new keywords.

Motivation

One of the primary examples of breaking changes in the epoch proposal is adding new keywords, and catch specifically is the first candidate. However, since the proposal also seeks compatibility between crates of different epochs, this would leave a crate in a newer epoch unable to use catch identifiers in the API of a crate in an older epoch.

A raw syntax that’s always an identifier would allow these to remain compatible.

Detailed design

I propose r#catch as a syntax which always parses as a catch identifier, ignoring any possible keywords. Specifically, r# is the beginning of an escaped identifier token, followed by any legal identifier sequence. The r# is not kept as part of the resulting identifier in any way.
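A minimal sketch of these semantics, written in the r# form that Rust eventually adopted: the prefix only affects lexing, so r#catch and catch name the same item, while r#match is the only way to use the keyword match as a name. The functions are hypothetical.

```rust
// Hypothetical function: `catch` is not a keyword in today's Rust, so the
// raw and plain spellings refer to the very same item.
fn r#catch() -> &'static str {
    "caught"
}

// Hypothetical function: `match` is a keyword, so the raw form is the only
// way to use it as a name.
fn r#match(x: i32) -> i32 {
    x * 2
}

fn call_both() -> (&'static str, &'static str, i32) {
    // The `r#` is not part of the identifier: `r#catch` and `catch` resolve
    // to the same function.
    (r#catch(), catch(), r#match(21))
}
```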

How We Teach This

We can call these simply “raw identifiers”. It may also be called stropping, but this term is not very common.

Using r#catch is visually similar to the existing r#"raw "escaped" string"# syntax, which will hopefully give an intuitive clue to advanced users what is going on.
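For comparison, both raw forms side by side as they compile in today's Rust; raw_forms is a hypothetical helper.

```rust
fn raw_forms() -> (&'static str, i32) {
    // Raw string: the `#` delimiters let the literal contain unescaped quotes.
    let s = r#"raw "escaped" string"#;
    // Raw identifier: `r#` before the keyword `loop` makes it an ordinary name.
    let r#loop = 3;
    (s, r#loop)
}
```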

This feature will most likely only be seen in rare corner cases. It may only need to be taught in the documentation for a breaking change that introduces a new keyword.

Documentation updates:

  • The Rust Programming Language Appendix A “Keywords” should mention the r# syntax.
  • Rust Reference 2.2 “Identifiers” should specify the r# syntax.
  • Perhaps also Grammar section 3.2.1 “Identifiers” and 3.5.1 “Keywords”.

Drawbacks

New syntax is always scary/noisy/etc.

It might not be intuitively “raw” to a user coming upon this the first time.

Alternatives

  • The exact syntax is a huge bikeshed.

    • C# uses @class, which might work for Rust, although it collides with the boxed pointer syntax of yore. It might also be ambiguous with pattern @ bindings.
    • Other escapes are possible, like \catch.
    • We could allow multiple # and terminate them to allow even non-identifier characters
      • let r##The #1 weirdest syntax!## = this;
  • The status quo: epochs with new keywords may not be able to use those in the API of prior epochs.

    • There may be contextual mitigations. In the case of catch, it couldn’t be a fully contextual keyword because catch { ... } could be a struct literal. That context might be worked around with a path, like old_epoch::catch { ... } to use an identifier instead. Contexts that don’t make sense for a catch expression can just be identifiers, like foo.catch(). This might not be possible for all future keywords.
  • There might also be a need for raw keywords in the other direction, e.g. so the older epoch can still use the new catch functionality somehow. I think this particular case is already served well enough by do catch { ... }, if we choose to stabilize it that way.
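Today's Rust has a keyword that behaves like the contextual mitigation described above: union is treated as a keyword only where a union item can appear, so it stays usable as an ordinary method name such as HashSet::union. A small sketch; IntOrFloat, bits_of_one, and union_len are hypothetical names.

```rust
use std::collections::HashSet;

// In item position, `union` acts as a keyword and declares a C-style union.
#[repr(C)]
union IntOrFloat {
    i: u32,
    f: f32,
}

fn bits_of_one() -> u32 {
    let v = IntOrFloat { f: 1.0 };
    // Reading a union field is unsafe; this reinterprets the f32 bits.
    unsafe { v.i }
}

fn union_len() -> usize {
    let a: HashSet<i32> = vec![1, 2].into_iter().collect();
    let b: HashSet<i32> = vec![2, 3].into_iter().collect();
    // Elsewhere `union` is a plain identifier, so no escaping is needed.
    a.union(&b).count()
}
```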

Unresolved questions

  • Do macros need any special care with such identifier tokens?
  • Should diagnostics use the r# syntax when printing identifiers that overlap keywords?
  • Does rustdoc need to use the r# syntax? e.g. to document pub use old_epoch::*

Evolving Rust through Epochs
#2

catch was discussed a lot in the “epoch” RFC, but it is a really bad example to work with.
It’s not necessary to make it a keyword: catch for catch blocks can be introduced as a context-dependent identifier with zero breakage in practice, which is much less breakage than library additions and type inference changes routinely cause.

This is kinda offtopic for raw identifiers in general, but still.


#3

Do you have a better motivating example?


#4

Off the top of my head, the only non-keywords I can come up with that we might plausibly want to make into (real, non-contextual) keywords someday are catch, async, await, dyn, and union. Though it’s likely half of those will always be mere contextual keywords, and I’ve probably missed some, not to mention excluded a lot of implausible suggestions (like lifetime)…

The big question I have about this RFC is how often any of these maybe-someday-keywords are actually used in public Rust APIs. Over in the epochs thread it was very important to point out that no epoch proposal can guarantee flawless interoperability between differently-epoch’d crates because of issues like this, but this is only one of many possible changes epochs might be used for, and it needs to be demonstrated that this is or will be a significant enough problem in practice to justify new syntax. For what little it’s worth, I’ve personally never seen a public method with one of those names.
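For reference, async, await, and dyn did become keywords in the Rust 2018 edition, so newer code calling an older API that uses those names has to spell them raw. The old_api module below is hypothetical and stands in for such an older crate.

```rust
// Hypothetical older-epoch API whose names are keywords for the caller.
mod old_api {
    pub fn r#async() -> &'static str {
        "async"
    }

    pub fn r#await(x: u32) -> u32 {
        x + 1
    }

    pub struct Config {
        pub r#dyn: bool,
    }
}

fn use_old_api() -> (&'static str, u32, bool) {
    // Every keyword-named item, field included, is spelled with `r#`.
    let cfg = old_api::Config { r#dyn: true };
    (old_api::r#async(), old_api::r#await(41), cfg.r#dyn)
}
```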


#5

Not right now. Maybe in a few years, when language constructions start fighting each other over remaining pieces of syntax.

I’d like is and is not reserved to some extent for x is Some(..), but it seems doable with context-dependent identifiers as well (I also like my for i in is { for j in js { ... } } :slight_smile:).


#6

I’d rather we just allow aliasing via use, e.g. use baseball::catch as catch_baseball;, as it localizes the keyword-as-identifier to just the use syntax. On the other hand, stropping would be useful if, e.g., one wanted to give a function the same name as an existing Rust keyword for interop with some other system, but I don’t know of any such systems.
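A sketch of the aliasing approach using the use ... as renaming Rust already has. The baseball module is hypothetical; since catch never actually became a keyword, the r# prefix here is optional and only stands in for a name that is a keyword on the caller's side.

```rust
// Hypothetical older-epoch module exporting a keyword-named function.
mod baseball {
    pub fn r#catch() -> &'static str {
        "out"
    }
}

// The raw spelling is needed only once, in the `use`; afterwards the alias
// is an ordinary identifier.
use baseball::r#catch as catch_baseball;

fn play() -> &'static str {
    catch_baseball()
}
```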


#7

This feature is also pretty important for cross-epoch compatible macros, particularly procedural ones. The motivation section should definitely mention that.
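A minimal macro_rules! sketch (procedural macros are not shown): a raw identifier passed to an $x:ident fragment stays a single identifier token, so generated code can refer to keyword-named fields. The getter! macro and the Token type are hypothetical.

```rust
// Hypothetical macro that generates a getter named after a field.
macro_rules! getter {
    ($name:ident : $ty:ty) => {
        fn $name(&self) -> $ty {
            self.$name
        }
    };
}

struct Token {
    r#type: u8,
    r#ref: bool,
}

impl Token {
    // `r#type` and `r#ref` match `$name:ident` like any other identifier.
    getter!(r#type: u8);
    getter!(r#ref: bool);
}
```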


#8

Actually, raw identifiers are the opposite of that. Stropping, according to the linked article, is explicitly marking something as not being an identifier.


#9

Would you kick out HashSet::union and BTreeSet::union!?! :wink:

My hope is to firm up the compatibility story. Rather than having to say “it’s broken but hopefully not a problem in practice,” we can offer a definite means of supporting it.

That doesn’t help for methods or associated items.

Can you elaborate on what you would say about that? It’s not clear to me what the advice for macro writers would actually be. Should they have to write all their identifiers in the raw form? (I hope not…) See @aturon’s reply about macros too.

It’s inconsistent – the examples in the Modern use section are all marking identifiers.


#10

@cuviper Then extend use to also create aliases to associated items.


#11

Inadequate for methods and fields, though (fields especially).
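To illustrate the point about fields and methods: no use path reaches them, so the raw form has to appear at every access. The Bond type is hypothetical; yield is reserved in every Rust edition, so the prefix is required here.

```rust
// Hypothetical older-epoch type whose field and method names are keywords.
pub struct Bond {
    pub r#yield: u32, // in basis points
}

impl Bond {
    pub fn r#match(&self, other: &Bond) -> bool {
        self.r#yield == other.r#yield
    }
}

fn compare() -> bool {
    let a = Bond { r#yield: 350 };
    let b = Bond { r#yield: 350 };
    // A `use` alias cannot rename a field or a method, so each access is
    // spelled with `r#`.
    a.r#match(&b) && a.r#yield == 350
}
```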


#12

I like the idea. On the other hand, I prefer following C#'s syntax, that is @.


#13

The biggest problem I see with C#'s syntax is confusion in patterns:

match Some(5) {
    @match @ Some(_) => println!("{:?}", @match),
    None => (),
}

I think it’s guaranteed to be unambiguous, and I have no idea if it would ever be used in the same line in real world usage, but it’s one downside to that syntax I see. I also dislike the r#ident syntax because it’s too visually similar to the raw string syntax.
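For comparison, the same pattern written with the r# prefix, the form Rust later adopted, compiles today; describe is a hypothetical helper.

```rust
fn describe(opt: Option<i32>) -> String {
    match opt {
        // `r#match @ Some(_)` binds the whole matched value to a variable
        // named `match`.
        r#match @ Some(_) => format!("{:?}", r#match),
        None => String::from("none"),
    }
}
```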


Some other languages’ syntaxes:

  • VB.NET: square brackets ([ident]). Definitely seems like it would be ambiguous with array expressions.
  • Swift and Kotlin: backticks (`ident`). I don’t think backticks are used anywhere in Rust yet.
  • Verilog: leading backslash (\ident). I’m not certain if this actually escapes keywords, seems to be for escaping strings that would be invalid as identifiers normally (like \1234), but I think it would work fine for Rust.
  • VHDL93: surrounding backslashes (\ident\). This is different to the other languages in that \ident\ is not treated the same as ident (assuming that ident is not a keyword so the escaping is not strictly required).
  • F#: double backticks (``ident``). Again should be valid in Rust since backticks are unused.
  • JavaScript doesn’t allow escaping reserved words, but does allow using them in some places. For example you can have a reserved word as a property on an object (var a = { if: 5 }), you just cannot use them for variable names, free functions, etc. Not sure whether supporting something like that is possible/makes sense in Rust.
  • Java/C/C++/Go: I believe these do not support escaping reserved words as identifiers.

Personally I’m in favour of backticks (`ident`), it seems the most lightweight and matching other contemporary languages has some advantage.


#14

I could imagine a delimited syntax being useful for possible future FFI cases (for example naming methods in languages with different identifier syntax; r#operator==()# or something). I could also imagine it not being useful for that, though.


#15

Actually, @aturon’s method seems more robust. I was envisioning manual escapes being useful and the compiler auto-escaping but these don’t necessarily need to be the same representation. Providing the full epoch information for the macro seems more generally applicable.


#16

I’d really like to have this feature. Especially when generating FFI code, I always have to make sure that the omnipresent type and similar identifiers get mangled accordingly.

I’d also vote for backticks as delimiters (as long as they’re not reserved for things like inline assembly) and for allowing all printable characters in between.


#17

Let’s try a few examples. I’m not trying to be exhaustive, just using a few interesting positions.

use foo::r#catch::bar;
let foo = r#catch { bar: 42 };
let foo = Foo { r#catch: 42 };
foo.r#catch = 42;
foo.r#catch(42);
use foo::@catch::bar;
let foo = @catch { bar: 42 };
let foo = Foo { @catch: 42 };
foo.@catch = 42;
foo.@catch(42);
use foo::\catch::bar;
let foo = \catch { bar: 42 };
let foo = Foo { \catch: 42 };
foo.\catch = 42;
foo.\catch(42);
use foo::`catch`::bar;
let foo = `catch` { bar: 42 };
let foo = Foo { `catch`: 42 };
foo.`catch` = 42;
foo.`catch`(42);
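The r# lines above compile in today's Rust once the items they refer to exist. This sketch defines a hypothetical foo::r#catch::bar, a struct named catch, and a Foo with both a field and a method named catch.

```rust
// Hypothetical module path so that `use foo::r#catch::bar;` resolves.
mod foo {
    pub mod r#catch {
        pub fn bar() -> i32 {
            42
        }
    }
}

use foo::r#catch::bar;

// Hypothetical struct literally named `catch`.
#[allow(non_camel_case_types)]
struct r#catch {
    bar: i32,
}

// Hypothetical struct with a field and a method that share the name `catch`.
struct Foo {
    r#catch: i32,
}

impl Foo {
    fn r#catch(&mut self, value: i32) {
        self.r#catch = value;
    }
}

fn demo() -> (i32, i32, i32) {
    let c = r#catch { bar: bar() }; // struct literal
    let mut f = Foo { r#catch: 0 }; // field initializer
    f.r#catch = 1; // field assignment
    f.r#catch(c.bar); // method call overwrites the field with 42
    (bar(), c.bar, f.r#catch)
}
```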

(catch has ceased to look like a real word to me…)

Wouldn’t FFI cases like this have to be mangled anyway? Linkers are usually a limiting factor too.

I don’t actually like allowing non-identifier characters myself. What would you use it for?


#18

Since we have 0xFF and 0o007, eh… 0icatch? A bit ugly.

use foo::0icatch::bar;
let foo = 0icatch { bar: 42 };
let foo = Foo { 0icatch: 42 };
foo.0icatch = 42;
foo.0icatch(42);
`catch` is shining.

:+1:


#19

I have a use case: Frequently within the compiler, clippy or other code I see things like krate, match_, … Raw identifiers would once and for all create an idiomatic naming scheme for such identifiers.
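A sketch of that naming scheme; summarize is a hypothetical function. One caveat from today's Rust: crate, self, super, and Self cannot be written as raw identifiers, so a name like krate would still be needed.

```rust
// Hypothetical function: `r#match` replaces the `match_` workaround, but
// `r#crate` is rejected by the compiler, so `krate` has to stay.
fn summarize(r#match: &str, krate: &str) -> String {
    format!("{} in {}", r#match, krate)
}
```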


#20

I have not really read this discussion, so I may be repeating someone else’s point, but I would consider a more general syntax for escaping any identifier (with whitespace, punctuation, etc). We have something like this in Kotlin, and it works really great for naming tests:

fun `test trait with bounds on itself`() = checkByCode("""
    trait Foo<T: Foo<T>> {
        fn foo(&self) { }
    }     //X

    impl Foo<()> for () { }

    fn bar<T: Foo<T>>(t: T) {
        t.foo()
    }    //^

    fn main() { bar(()) }
""")

Here, the stuff between backticks is an identifier.