pre-RFC: Extend Hash-sequences to all String Literals

Currently, one can use extra hash-marks with raw string literals. This is vital, since raw string literals cannot use a backslash to escape inner quotation marks.

let foo = r#"Hello, Bob "The Builder" Smith"#;

The main purpose for raw string literals is to avoid the need to escape backslashes, like those in Regex syntax (Regex::new(r"\w+")). But because quotation marks are often used in user interfaces and in various forms of code, people will reach for raw string literals just for the quotation-mark feature.
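For reference, here is a small sketch of how these forms behave in today's Rust. The function names are made up for illustration.

```rust
/// The same text written three ways in today's Rust.
fn builder_greetings() -> [&'static str; 3] {
    [
        "Hello, Bob \"The Builder\" Smith",    // ordinary literal: quotes must be escaped
        r#"Hello, Bob "The Builder" Smith"#,   // raw literal guarded by one hash
        r##"Hello, Bob "The Builder" Smith"##, // extra hashes are allowed, as long as they match
    ]
}

/// Raw literals leave backslashes alone, which is what regex patterns want.
/// Returns the raw spelling and the escaped spelling of the same pattern.
fn word_pattern() -> (&'static str, &'static str) {
    (r"\w+", "\\w+")
}
```

All three greetings are the same string, and both spellings of the pattern are the three characters `\`, `w`, `+`.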

But there are many use cases where someone wants to avoid escaping quotation marks, but would still like to use other escape sequences (and therefore can't use raw strings):

  1. null-terminated strings
let foo = #"first line: "Hello, World!" \0"#;
  2. byte strings
let bytes = b#"some stuff in another encoding: "\xF7\x84" "#;
  3. splitting a long string literal across lines
let long = #"\
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do \
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut \
enim ad minim veniam, quis nostrud exercitation ullamco laboris \
nisi ut aliquip ex ea commodo consequat.\
"#;
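For comparison, here is roughly what those three cases look like in today's Rust. It is only a sketch with made-up function names: you either escape every quotation mark, or switch to a raw literal and lose the escapes.

```rust
/// Case 1: a NUL-terminated string. The escape works, but every quote needs a backslash.
fn nul_terminated() -> &'static str {
    "first line: \"Hello, World!\" \0"
}

/// Case 2: a byte string carrying bytes that are not valid UTF-8.
fn other_encoding() -> &'static [u8] {
    b"some stuff in another encoding: \"\xF7\x84\" "
}

/// Case 3: a trailing backslash joins lines and skips the next line's leading whitespace.
fn joined() -> &'static str {
    "Lorem ipsum dolor sit amet, \
     consectetur adipiscing elit."
}

/// A raw literal keeps the quotes readable, but `\0` is now just a backslash and a zero.
fn raw_attempt() -> &'static str {
    r#"first line: "Hello, World!" \0"#
}
```

The last function is the trade-off in a nutshell: the quotes are readable, but the string no longer ends in a NUL byte.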

I'm sure there are more reasons I haven't thought of. People may even just prefer not switching to a different kind of string.

Proposed Change

I propose adding support for symmetric # wrapping, behaving exactly as it does on raw string literals (but without the r prefix), to string literals and byte string literals.

I highly recommend following Swift’s lead if anything is done here with non-raw hash-delimited string literals: adding hashes to the string boundaries adds hashes to the escape characters used within the string. So \0 becomes \#0*, {foo} becomes #{foo}, etc. As with raw strings, if you need hashes in your string body that would form an escape sequence, you can add additional hashes to the start and end delimiters.

This system is flexible enough that Swift does not have a raw string syntax and nobody has asked for one since this proposal was implemented.

* Or #\0; Swift had consistency reasons to prefer \#0, but Rust already has multiple kinds of escape characters, in format strings at least.
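To make the rule concrete, here is a minimal sketch of how the body of such a string might be unescaped, assuming the \# spelling. The function `unescape_guarded` is hypothetical, it only knows a handful of escapes, and it passes unknown sequences through unchanged where a real lexer would report an error.

```rust
/// Hypothetical helper: interpret the body of a string literal that was
/// delimited with `hashes` hash marks. An escape is a backslash followed by
/// exactly that many `#` characters and then the escape character.
fn unescape_guarded(body: &str, hashes: usize) -> String {
    // With 0 hashes the marker is `\`, with 1 it is `\#`, with 2 it is `\##`, ...
    let marker = format!("\\{}", "#".repeat(hashes));
    let mut out = String::new();
    let mut rest = body;

    while let Some(pos) = rest.find(marker.as_str()) {
        // Everything before the marker is literal text.
        out.push_str(&rest[..pos]);
        let mut chars = rest[pos + marker.len()..].chars();
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('t') => out.push('\t'),
            Some('0') => out.push('\0'),
            Some('\\') => out.push('\\'),
            Some('"') => out.push('"'),
            // Unknown escape: kept verbatim in this sketch.
            Some(other) => {
                out.push_str(&marker);
                out.push(other);
            }
            // Marker at the very end of the body: kept verbatim as well.
            None => out.push_str(&marker),
        }
        rest = chars.as_str();
    }
    out.push_str(rest);
    out
}
```

With one hash, a plain `\n` in the body stays as a backslash and an `n`, while `\#n` becomes a newline. With zero hashes the function behaves like an ordinary string's escapes.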

I've done this multiple times on accident, so I'd definitely love to see it.

Wow. Those Swift strings offer such an elegant way to merge our normal and raw string literals. After seeing that, I think we should just copy those wholesale. I love your idea of using the same hash mechanism for format placeholders as well.

I would probably prefer #\ for consistency with #{, but this is fantastic. Thank you so much for bringing these up.

Yeah, this seems like a great plan.

I do think there's an advantage to using \# rather than #\. If we use \#, we can unambiguously parse that whether you're in a #"-delimited string or not; that then allows us to either permit it all the time, or parse it and issue a rustfix-able error if we'd prefer not to allow it. If we use #\, we can't unambiguously parse that in non-#"-delimited strings, because it might have been meant as a literal # followed by an escape sequence.
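To illustrate with today's Rust (the function name is just for the example): `#\n` is already a valid sequence in an ordinary literal, whereas `\#` is not.

```rust
/// In an ordinary literal today, `#\n` already means a literal `#` followed
/// by a newline, so `#\` cannot be given a new meaning there without
/// changing how existing strings are read.
fn hash_then_newline() -> Vec<char> {
    "#\n".chars().collect()
}

// By contrast, a literal such as "\#n" does not compile today (unknown
// character escape), so `\#` is free to take on a new meaning, or to
// produce a targeted, fixable error outside #"-delimited strings.
```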

Opened a draft PR for the RFC:

Regarding format strings... at the moment, I believe raw string literals and ordinary string literals are essentially unified into the same kind of thing for proc macros to process them.[1] Which sounds like a reasonable thing to do for these guarded strings, too, especially as it automatically gives compatibility for existing macros. Unless you want format_args (which I think of as something that should behave like an ordinary proc macro) to be parsing the braces based on the kind of the string literal at hand, then they can't be the same thing...


[1] I'm not an expert on this though. Perhaps macros are somehow able to detect raw string literals already?

indoc detects raw strings using the span

I don't believe there's a way for proc macros to even get the string literal token to start with. I don't even see an API that allows you to check what type of literal it is, let alone which form of string.

I haven't looked into it but I'm guessing syn has to parse the span directly as well. In which case it can pretty easily provide that information to proc macros.

Unless you want format_args (which I think of as something that should behave like an ordinary proc macro) to be parsing the braces based on the kind of the string literal at hand, then they can't be the same thing...

That's exactly what I want, and I think the ergonomics are worth it.

Looks like I misremembered, indeed. Only syn has an API that ends up unifying raw string literals with string literals, and it does the same kind of parsing to detect the token and to extract the unescaped string. The basic proc_macro API only gives access to the escaped string anyway (via the Display/ToString implementation).
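As a rough sketch of what that kind of parsing could look like: the function below classifies a literal from its source text, assuming the input is what you'd get from calling to_string() on a proc_macro literal. It takes a plain &str so it can run outside a macro. `LiteralKind` and `classify` are made-up names, not anything from syn or proc_macro.

```rust
/// Hypothetical classification of a literal, based only on its source text.
#[derive(Debug, PartialEq)]
enum LiteralKind {
    Str,
    RawStr { hashes: usize },
    ByteStr,
    RawByteStr { hashes: usize },
    Other,
}

/// Counts leading `#` characters and checks that an opening quote follows.
fn guard_hashes(rest: &str) -> Option<usize> {
    let n = rest.bytes().take_while(|&b| b == b'#').count();
    rest[n..].starts_with('"').then_some(n)
}

/// Inspect the prefix of the literal's source text to decide what it is.
fn classify(src: &str) -> LiteralKind {
    if src.starts_with('"') {
        LiteralKind::Str
    } else if src.starts_with("b\"") {
        LiteralKind::ByteStr
    } else if let Some(rest) = src.strip_prefix("br") {
        match guard_hashes(rest) {
            Some(hashes) => LiteralKind::RawByteStr { hashes },
            None => LiteralKind::Other,
        }
    } else if let Some(rest) = src.strip_prefix('r') {
        match guard_hashes(rest) {
            Some(hashes) => LiteralKind::RawStr { hashes },
            None => LiteralKind::Other,
        }
    } else {
        // Numbers, chars, and anything this sketch doesn't recognize.
        LiteralKind::Other
    }
}
```

A macro that cared about the literal kind could branch on the result, which is the sort of information format_args would need in order to treat braces differently in guarded strings.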
