pre-RFC: Extend Hash-sequences to all String Literals

pitaj · August 7, 2023, 4:12am

Currently, one can use extra hash-marks with raw string literals. This is vital, since raw string literals cannot use a backslash to escape inner quotation marks.

let foo = r#"Hello, Bob "The Builder" Smith"#;

The main purpose for raw string literals is to avoid the need to escape backslashes, like those in Regex syntax (Regex::new(r"\w+")). But because quotation marks are often used in user interfaces and in various forms of code, people will reach for raw string literals just for the quotation-mark feature.
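For reference, here is a small sketch of how the existing forms relate in today's Rust. The function names are made up for illustration; only the literals themselves matter.

```rust
/// The same greeting written as an ordinary literal and as a raw literal.
/// Both entries hold identical text at runtime.
fn greeting_forms() -> [&'static str; 2] {
    [
        "Hello, Bob \"The Builder\" Smith",  // ordinary: inner quotes escaped
        r#"Hello, Bob "The Builder" Smith"#, // raw with one hash: no escaping
    ]
}

/// A regex-style pattern written both ways. The raw form keeps the
/// backslash as-is, the ordinary form has to double it.
fn pattern_forms() -> [&'static str; 2] {
    ["\\w+", r"\w+"]
}

/// Raw literals never process escapes: this is a backslash followed by `n`,
/// not a newline.
fn raw_backslash_n() -> &'static str {
    r"\n"
}
```

The last function is the catch: once you switch to `r#"..."#` for the quotation marks, every other escape sequence stops working too.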

But there are many use cases where someone wants to avoid escaping quotation marks, but would still like to use other escape sequences (and therefore can't use raw strings):

null-terminated strings

let foo = #"first line: "Hello, World!" \0"#;

byte strings

let bytes = b#"some stuff in another encoding: "\xF7\x84" "#;

splitting a long string literal across lines

let long = #"\
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do \
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut \
enim ad minim veniam, quis nostrud exercitation ullamco laboris \
nisi ut aliquip ex ea commodo consequat.\
"#;

I'm sure there are more reasons I haven't thought of. People may even just prefer not switching to a different kind of string.
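For comparison, this is roughly what the three cases above look like in Rust as it exists today, where every inner quotation mark has to be escaped to keep the other escapes available. The function names and the shortened text are illustrative only.

```rust
/// Null-terminated text: the quotes are escaped so that `\0` is still
/// processed as an escape.
fn nul_terminated() -> &'static str {
    "first line: \"Hello, World!\" \0"
}

/// Byte string with non-UTF-8 bytes: `\xF7` and `\x84` need escape
/// processing, so the quotes must be escaped as well.
fn other_encoding() -> &'static [u8] {
    b"some stuff in another encoding: \"\xF7\x84\" "
}

/// Line continuation: a trailing backslash skips the newline and the
/// leading whitespace of the next line, which a raw literal cannot do.
fn long_line() -> &'static str {
    "\
    He said \"Lorem ipsum dolor sit amet\", \
    and then stopped."
}
```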

Proposed Change

I propose adding support for symmetric # wrapping, behaving exactly as it does on raw string literals (but without the r prefix), to string literals and byte string literals.

jrose · August 7, 2023, 5:20am

I highly recommend following Swift’s lead if anything is done here with non-raw hash-delimited string literals: adding hashes to the string boundaries adds hashes to the escape characters used within the string. So \0 becomes \#0*, {foo} becomes #{foo}, etc. As with raw strings, if you need hashes in your string body that would form an escape sequence, you can add additional hashes to the start and end delimiters.

This system is flexible enough that Swift does not have a raw string syntax and nobody has asked for one since this proposal was implemented.

github.com

apple/swift-evolution/blob/main/proposals/0200-raw-string-escaping.md

# Enhancing String Literals Delimiters to Support Raw Text

* Proposal: [SE-0200](0200-raw-string-escaping.md)
* Authors: [John Holdsworth](https://github.com/johnno1962), [Becca Royal-Gordon](https://github.com/beccadax), [Erica Sadun](https://github.com/erica)
* Review Manager: [Doug Gregor](https://github.com/DougGregor)
* Previous Revision: [1](https://github.com/apple/swift-evolution/blob/102b2f2770f0dab29f254a254063847388647a4a/proposals/0200-raw-string-escaping.md)
* Status: **Implemented (Swift 5)**
* Implementation: [apple/swift#17668](https://github.com/apple/swift/pull/17668)
* Bugs: [SR-6362](https://bugs.swift.org/browse/SR-6362)
* Review: [Discussion thread](https://forums.swift.org/t/se-0200-enhancing-string-literals-delimiters-to-support-raw-text/15420), [Announcement thread](https://forums.swift.org/t/accepted-se-0200-enhancing-string-literals-delimiters-to-support-raw-text/15822/2)

## Introduction

Like many computer languages, Swift uses an escape character (`\`) to create a special interpretation of subsequent characters within a string literal. Escape character sequences represent a set of predefined, non-printing characters as well as string delimiters (the double quote), the escape character (the backslash itself), and (uniquely in Swift) to allow in-string expression interpolation.

Escape characters provide useful and necessary capabilities but strings containing many escape sequences are difficult to read. Other languages have solved this problem by providing an alternate "raw" string literal syntax which does not process escape sequences. As the name suggests, raw string literals allow you to use "raw" text, incorporating backslashes and double quotes without escaping.

We propose to alter Swift's string literal design to do the same, using a new design which we believe fits Swift's simple and clean syntax. This design supports both single-line and multi-line string literals, and can contain any content whatsoever.

This proposal has been extensively revised based on the Core Team feedback for [SE-0200](https://forums.swift.org/t/returned-for-revision-se-0200-raw-mode-string-literals/11630). It was discussed on the [Swift online forums](https://forums.swift.org/t/pure-bikeshedding-raw-strings-why-yes-again/13866).


* Or #\0; Swift had consistency reasons to prefer \#0, but Rust already has multiple kinds of escape characters, in format strings at least.
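To make the rule concrete, here is a minimal sketch of a decoder for the body of such a literal, written as ordinary Rust. It is not Swift's implementation or anything from the RFC: the function `unescape_guarded` is hypothetical, it uses the \# ordering, and it only knows a handful of escapes. With `hashes` set to 0 it behaves like an ordinary string literal.

```rust
/// Decode the body of a hypothetical hash-guarded literal.
///
/// An escape is only recognized when the backslash is followed by exactly
/// `hashes` hash marks; any other backslash is plain text. Returns `None`
/// for an unknown or unfinished escape. Illustration only.
fn unescape_guarded(body: &str, hashes: usize) -> Option<String> {
    // The escape introducer: `\` for 0 hashes, `\#` for 1, `\##` for 2, ...
    let introducer = format!("\\{}", "#".repeat(hashes));
    let mut out = String::new();
    let mut rest = body;

    while let Some(pos) = rest.find(&introducer) {
        // Everything before the introducer is copied verbatim.
        out.push_str(&rest[..pos]);
        let mut chars = rest[pos + introducer.len()..].chars();
        let decoded = match chars.next()? {
            'n' => '\n',
            't' => '\t',
            '0' => '\0',
            '\\' => '\\',
            '"' => '"',
            _ => return None, // unknown escape, including extra hashes
        };
        out.push(decoded);
        rest = chars.as_str();
    }

    out.push_str(rest);
    Some(out)
}
```

Under these assumptions, a one-hash body containing `\#n` decodes to a newline, while a bare `\n` in the same body stays a backslash followed by `n`, and inner quotation marks pass through untouched.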

jhpratt · August 7, 2023, 1:39pm

I've done this multiple times on accident, so I'd definitely love to see it.

pitaj · August 7, 2023, 5:13pm

Wow. Those Swift strings offer such an elegant way to merge our normal and raw string literals. After seeing that, I think we should just copy those wholesale. I love your idea of using the same hash mechanism for format placeholders as well.

I would probably prefer #\ for consistency with #{, but this is fantastic. Thank you so much for bringing these up.

josh · August 7, 2023, 7:26pm

Yeah, this seems like a great plan.

I do think there's an advantage to using \# rather than #\. If we use \#, we can unambiguously parse that whether you're in a #"-delimited string or not; that then allows us to either permit it all the time, or parse it and issue a rustfix-able error if we'd prefer not to allow it. If we use #\, we can't unambiguously parse that in non-#"-delimited strings, because it might have been meant as a literal # followed by an escape sequence.
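A small check of that point against today's Rust (the function name is just for illustration): a # followed by an escape sequence is already a valid, meaningful thing to write in an ordinary string.

```rust
/// In an ordinary literal, `#\n` already means: a literal `#` character,
/// then a newline escape. A `#\` escape introducer would collide with this.
///
/// By contrast, writing `\#` inside an ordinary literal is rejected today
/// as an unknown character escape, so that spelling carries no existing
/// meaning that could be confused.
fn hash_then_escape() -> &'static str {
    "#\n"
}
```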

pitaj · August 15, 2023, 3:46am

Opened a draft PR for the RFC:

github.com/rust-lang/rfcs

Unified String Literals

rust-lang:master ← pitaj:master

opened 03:45AM - 15 Aug 23 UTC

pitaj

+308 -0

[Rendered](https://github.com/pitaj/rfcs/blob/1a392007564d0bc8ffc62ad4d1fc133ec5…fc433d/text/0000-unified-string-literals.md) This RFC proposes to unify the syntax of the existing _string literal_ and _raw string literal_ forms, supporting both the use of escape sequences and avoiding the need to escape backslashes and quotation marks. This proposal also uses the new syntax to improve format string ergonomics, reducing the need for double-brace escapes.

steffahn · August 15, 2023, 4:19am

Regarding format strings... at the moment, I believe raw string literals and ordinary string literals are essentially unified into the same kind of thing for proc macros to process them. [1] Which sounds like a reasonable thing to do for these guarded strings, too, especially as it automatically gives compatibility for existing macros. Unless you want format_args (which I think of as something that should behave like an ordinary proc macro) to be parsing the braces based on the kind of the string literal at hand, then they can't be the same thing...

[1] I'm not an expert on this, though. Perhaps macros are somehow able to detect raw string literals already?

pitaj · August 15, 2023, 1:35pm

indoc detects raw strings using the span

I don't believe there's a way for proc macros to even get the string literal token to start with. I don't even see an API that allows you to check what type of literal it is, let alone which form of string.

I haven't looked into it but I'm guessing syn has to parse the span directly as well. In which case it can pretty easily provide that information to proc macros.

Unless you want format_args (which I think of as something that should behave like an ordinary proc macro) to be parsing the braces based on the kind of the string literal at hand, then they can't be the same thing...

That's exactly what I want, and I think the ergonomics are worth it.

steffahn · August 15, 2023, 1:47pm

Looks like I misremembered, indeed. Only syn has an API that ends up unifying raw string literals with string literals, and it does some parsing of the same kind to detect the token and to extract the un-escaped string. The basic proc_macro API only gives access to the escaped string anyway (via the Display/ToString implementation).
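As a rough sketch of the kind of parsing being described, assuming the macro only has the literal's source text as a string (for example, what the Display implementation yields), telling the forms apart could look something like this. The `StrForm` type and `classify` function are hypothetical and are not taken from syn or indoc.

```rust
/// Which form a string literal was written in (hypothetical type).
#[derive(Debug, PartialEq)]
enum StrForm {
    Plain,
    Raw { hashes: usize },
}

/// Classify the source text of a literal token.
/// Returns `None` if the text is not a plain or raw string literal.
fn classify(src: &str) -> Option<StrForm> {
    if src.starts_with('"') {
        return Some(StrForm::Plain);
    }
    // A raw string is `r`, then zero or more `#`, then the opening quote.
    let rest = src.strip_prefix('r')?;
    let hashes = rest.chars().take_while(|&c| c == '#').count();
    rest[hashes..]
        .starts_with('"')
        .then_some(StrForm::Raw { hashes })
}
```

A macro that wanted to treat guarded strings differently would need one more case along the same lines, which is the extra work format_args would take on.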

system · November 13, 2023, 1:47pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
