Raw r#"..."# string literals in Rust vs named R"abc(...)abc" strings in C++

mgeisler · April 9, 2022, 10:32am

Hi all,

In C++, there is a raw string syntax that looks like this:

std::string query = R"sql(
  SELECT email
  FROM Users
  WHERE username = "foo";
)sql";

The sql part serves as a delimiter, similar to how # is used in raw Rust string literals:

let query = r#"
  SELECT email
  FROM Users
  WHERE username = "foo";
"#;

In addition to making it unnecessary to escape things like double-quotes, the string between R" and ( also serves to describe the content of the string: in this case, readers will know that the string is an SQL query.

Auto formatters like clang-format can use this information to format the code inside the string literal (see RawStringFormats in the style options as well as the example in this StackOverflow answer).

This has been a super useful feature in practice for me. Has something similar been discussed for Rust?

When searching to see if this had been discussed already, I learned that Rust supports arbitrary suffixes on literals. So it's possible to write

macro_rules! blackhole { ($tt:tt) => () }
blackhole!("string"suffix); // OK

today. However, this is very limited since you have to use the syntax inside a macro call. So this doesn't work:

let x = "string"nope;
blackhole!(x);
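To make the limitation concrete, here is a minimal sketch of the one place a suffixed string literal is accepted today: as a token handed to a macro. The macro name `literal_source`, the helper function, and the `sql` suffix are made up for illustration; nothing in the language gives the suffix any meaning.

```rust
// Hypothetical macro: takes one token tree and returns its source text.
// The suffixed literal is only ever seen as a token, never parsed as an
// expression, which is why the compiler accepts it.
macro_rules! literal_source {
    ($tt:tt) => {
        stringify!($tt)
    };
}

fn suffixed_literal_source() -> &'static str {
    // `sql` is an arbitrary suffix chosen for this example.
    literal_source!("SELECT 1"sql)
}

fn main() {
    // Prints the literal as written in the source, suffix included.
    println!("{}", suffixed_literal_source());
}
```

A macro could in principle inspect the suffix and act on it, but that still forces every such string through a macro call.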

mgeisler · April 9, 2022, 12:17pm

Ah, it seems the discussion was here:

github.com/rust-lang/rust

RFC: Syntax for raw string literals

opened 09:40PM - 22 Sep 13 UTC

closed 04:15PM - 08 Oct 13 UTC

lilyball

C-enhancement A-grammar

A raw string literal is a string literal that does not interpret any embedded sequence, meaning no backslash-escapes. A lot of languages (certainly most that I've used) support some syntax for raw string literals. They're useful for embedding any string that wants to have a bunch of backslashes in it (typically because the function the string is passed to wants to interpret them itself), such as regular expressions. Unfortunately, Rust does not have a raw string literal syntax.

There's been [a discussion](https://mail.mozilla.org/pipermail/rust-dev/2013-September/005635.html) on the mailing list for the past few days about this. I will try to put a quick summary here.

There's two questions at stake. The first is, should Rust have a raw string literal syntax? The second is, if so, what particular syntax should be used?

I think the answer to the first is definitely Yes. It's useful enough, and has enough overwhelming precedence in other languages, that we should add it.

The question of concrete syntax is the harder one. The syntaxes that have been proposed so far, along with their Pros and Cons:

1. C++11 syntax, e.g. `R"delim(raw text)delim"`.
   Pros:
   - Reasonably straightforward
   - Can embed any character sequence
   Cons:
   - Syntax is slightly complicated (editorial note: I think any syntax that's flexible enough to contain any character is going to be considered slightly complicated).
2. Python syntax, e.g. `r"foo"`
   Pros:
   - Simple syntax
   Cons:
   - Can't embed any character sequence.
   - Python's implementation has really wacky handling of backslash escapes in conjunction with the quote character. Even reproducing that behavior does not allow for embedding any sequence, as `r"foo\""` evaluates to the string `foo\"` (with the literal backslash).
3. D syntax, e.g. `r"raw text"`, ``raw text``, or `q"(raw text)"`/`q"delim\nraw text\ndelim"`
   Pros:
   - Can embed any character sequence (with the third variant)
   Cons:
   - The first two forms aren't flexible enough, and the third form is a bit confusing. The delimiter behaves differently depending on whether it's a "nesting" delimiter (one of ([<{), another token, or an identifier.
4. C#/SQL/something else, using a simple raw string syntax such as `r"text"` where doubling up the quote inserts a single quote, as in `r"foo""bar"`
   Pros:
   - Simple syntax
   Cons:
   - Does not reproduce verbatim every character found in the source sequence, which makes it slightly harder/more confusing to read, and more annoying to do things like pasting a raw string into your source file (e.g. raw HTML).
5. Perl quote-like operators, e.g. `q{text}`. Unfortunately, most viable delimiters will result in an ambiguous parse.
6. Ruby quote-like operators, e.g. `%q{text}`. Unfortunately, this also is ambiguous (with the % token).
7. Lua syntax, e.g. `[=[text]=]`
   Pros:
   - Simple syntax
   - Can embed any character sequence
   Cons:
   - Syntax looks decidedly non-string-like
   - Custom delimiters are limited to sequences of =
   - Alex Crichton opined that seeing `println!([[Hello, {}!]], "world")` in an introduction to Rust would be awfully confusing (see previous point about being non-string-like).
8. Go syntax, e.g. ``raw text``. This is one of the variants of D strings as well
   Pros:
   - Simple syntax
   Cons:
   - Cannot embed any character sequence (notably, cannot embed backtick)
   - It's difficult or impossible to embed backticks in a markdown code sequence, which will make it awkward to use raw strings in markdown editors. May also be confusing with the usage of `foo` in doc comments.
9. A new syntax using ASCII Control characters STX and ETX
   Pros:
   - I don't think there are any
   Cons:
   - Can't type the keys on any keyboard
   - Text editors probably won't render the characters correctly either
   - Can't technically embed any character sequence, because ETX cannot be embedded, but in fairness it can embed any _printable_ sequence.
10. A syntax proposed over IRC is ``delim"raw text"delim``.
    Pros:
    - Can embed any character
    Cons:
    - Unusual syntax with no precedent in other languages. Functionally identical to C++11 syntax.
    - Hard to type in Markdown editors

Some form of Heredoc syntax was also suggested, but heredocs are really primarily concerned with embedding multiline input, not raw input. They also have issues around dealing with indentation and the first/last newline.

During this discussion, only two Rust team members (that I'm aware of) chimed in. Alex Crichton raised issues with the Lua syntax, and threw out the suggestion of Go's syntax, though only as something to consider rather than a recommendation. Felix Klock expressed a preference for C++11 syntax, and more generally stated that he wants a syntax with user-delimited sequences. There was also at least one community member in favor of C++11 syntax.

My own preference at this point is for C++11 syntax as well. At the very least, something similar to C++11 syntax, that shares all of its properties, but there seems to be no value in inventing a new syntax when there's precedent in C++11.

and the feature was then implemented in

mgeisler · April 9, 2022, 8:14pm

My question now is how people feel about raw strings, given that we've had them for a few years.

Personally, I'm repeatedly bitten by quoting strings which start or end with a double-quote: r#""hello world""#. The doubled double-quote is hard to decipher when that happens. In addition, the # character is quite a "hard" or "noisy" character to me. So overall, I feel that Rust raw strings are harder to read than C++ raw strings.
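For readers who have not hit this case, a small sketch of what the doubled double-quote means: the outer quote of each pair belongs to the delimiter and the inner one is content. The function names are invented for the example.

```rust
// The raw literal from the post: content starts and ends with a double-quote.
fn quoted_raw() -> &'static str {
    r#""hello world""#
}

// The same content written as an ordinary string literal with escapes.
fn quoted_escaped() -> &'static str {
    "\"hello world\""
}

fn main() {
    // Both spellings produce identical string contents.
    assert_eq!(quoted_raw(), quoted_escaped());
    println!("{}", quoted_raw());
}
```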

Have others felt the same?

chrefr · April 9, 2022, 9:35pm

I start with zero #'s and add more as necessary. I usually start the actual string after a newline, and also put a newline before the end, so I don't find it particularly clumsy.
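A rough sketch of that habit, with made-up function names and sample contents: use as few #'s as the content allows, and pad with newlines when the content itself begins or ends with a quote.

```rust
// No hashes needed: the content contains backslashes but no double-quote.
fn no_hashes() -> &'static str {
    r"C:\temp\file.txt"
}

// One hash: the content contains double-quotes.
fn one_hash() -> &'static str {
    r#"say "hi""#
}

// Two hashes: the content contains the sequence `"#`, which would
// otherwise terminate a one-hash literal.
fn two_hashes() -> &'static str {
    r##"a one-hash literal ends at "# like this"##
}

// Newline padding keeps the content's own quotes away from the delimiter.
// The leading and trailing newlines are part of the string.
fn padded() -> &'static str {
    r#"
"hello world"
"#
}

fn main() {
    for s in [no_hashes(), one_hash(), two_hashes(), padded().trim()] {
        println!("{s}");
    }
}
```

The padding trick changes the string's contents, so it only helps where surrounding whitespace is harmless (as in SQL) or is trimmed afterwards.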

mgeisler · April 10, 2022, 3:16pm

Yeah, I do the same. But how does it compare with C++ for you? Assuming you use C++, of course

I've now read through the discussion in #9411 and the use of delimiters to carry semantic information does not seem to have been discussed. Perhaps this was not a thing in C++ back in 2013?

CAD97 · April 10, 2022, 5:24pm

FWIW, the way to add semantic language injection information to Rust source would be one of three things, currently:

stable: comments

process(
    // #ide:inject-language=sql
    r##"
        ...
    "##,
);

unstable: expression attributes

process(#[ide::inject_language(sql)] r##"
    ...
"##);

stable (with major caveats): proc macros
```
process(sql!(
    ...
));
```

IIRC, the comment method is used by IntelliJ IDEA for language injection in Java and Kotlin sources.
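As a rough illustration of the macro route and its caveats, here is a sketch that uses a declarative `macro_rules!` macro as a stand-in for a real proc macro. The `sql!` macro and `query` function below are hypothetical; they just turn the tokens back into text, whereas an actual proc macro could parse or validate them.

```rust
// Hypothetical stand-in for a proc macro: accepts arbitrary token trees
// and returns them as a string via `stringify!`.
macro_rules! sql {
    ($($token:tt)*) => {
        stringify!($($token)*)
    };
}

fn query() -> &'static str {
    // The query is written as bare tokens, not inside a string literal.
    sql!(SELECT email FROM Users WHERE username = "foo";)
}

fn main() {
    // Whitespace in the result is chosen by the compiler, not by the source.
    println!("{}", query());
}
```

This shows where the caveats come from: the embedded language has to lex as Rust tokens, so a single-quoted SQL string such as 'foo' would likely be rejected, and the original whitespace and line breaks are not preserved.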

mgeisler · April 10, 2022, 5:42pm

Thanks for that overview! I've used the proc-macro approach recently via the quote! macro for embedding Rust code. It's been awesome to have proper indentation support as well as syntax highlighting inside the macro.

I use Emacs and I think it's more by accident than by design that I get those things, but it's nevertheless been very helpful

chrefr · April 10, 2022, 11:52pm

I rarely use C++ for the things I now use Rust for, so I've barely used C++ raw strings (only once, IIRC). I didn't realize the delimiter text could carry semantic meaning, so I just wrote some gibberish.

system · July 9, 2022, 11:53pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
