[pre-RFC] custom string literals


#1

This is my first RFC, so even if you think this is a terrible idea I would appreciate feedback on the RFC itself.

  • Feature name: string_like_macros
  • Start date:
  • RFC PR:
  • Rust Issue:

Summary

Allow users to define custom literals with a prefixed string.

Motivation

The current macro system requires the input to the macro to be valid Rust syntax. However, there is a class of potential macros where it makes sense for the input to be arbitrary text, or a DSL that isn’t valid Rust syntax (for example regex or SQL syntax).

One workaround for this is to pass a string as the only argument to a macro, but this has a few limitations:

  • Any compile-time manipulation requires function-like procedural macros, which as far as I can tell aren’t very close to stabilization.
  • The signature of the macro doesn’t indicate that the input must be a literal string, and the macro must verify that the input is a string (literal) itself.
  • There is a redundant set of braces. This is mostly just aesthetic.

This RFC proposes a method to define macros which perform a compile-time transformation on string literals. This is similar to the “b”, “r”, and “br” prefixes which produce bytestring, raw, and raw bytestring literals respectively.

Example use cases

  • String literals: S!"Owned string" compiles to "Owned string".to_string()
  • Hex byte array literals: x!"0e5fa2" compiles to [0x0e, 0x5f, 0xa2]
  • Regex literals: re!"abc.*(def)?" compiles to Regex::new(r"abc.*(def)?")
  • Interpolation: interp!"x=$x, y=$y" compiles to format!("x={}, y={}", x, y)
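
To make the hex example concrete, here is a minimal sketch of the string-to-bytes step that such a macro would perform at compile time, written as an ordinary function. The name hex_to_bytes and its error type are assumptions for illustration and are not part of the proposal.

```rust
/// Hypothetical helper: decode a hex string such as "0e5fa2" into bytes.
/// A real x! macro would do this at compile time and emit an array literal.
fn hex_to_bytes(s: &str) -> Result<Vec<u8>, String> {
    // Reject anything that is not an ASCII hex digit up front.
    if !s.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err(format!("not a hex string: {:?}", s));
    }
    // Each byte needs exactly two hex digits.
    if s.len() % 2 != 0 {
        return Err(format!("odd number of hex digits: {}", s.len()));
    }
    s.as_bytes()
        .chunks(2)
        .map(|pair| {
            let text = std::str::from_utf8(pair).map_err(|e| e.to_string())?;
            u8::from_str_radix(text, 16).map_err(|e| e.to_string())
        })
        .collect()
}
```

With this sketch, hex_to_bytes("0e5fa2") yields the bytes 0x0e, 0x5f, 0xa2, while malformed input becomes an error that a macro could report as a compile error.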

Detailed Design

This adds a new type of procedural macro: a string-like macro. This macro is invoked similarly to a function-like macro, or macro-by-example, with the following differences:

  • The argument is always bracketed with the " character rather than [], {}, or ()
  • The content inside the quotes can be arbitrary text (possibly with escapes, see Unresolved Questions)

If the macro is called foo, then it is used with syntax like foo!"Some string"

To define a string-like procedural macro, the programmer must write a function with a specific signature and attribute. Where foo is the name of the string-like macro:

  #[proc_macro_string]
  pub fn foo(s: &str) -> TokenStream

Note that in most cases, no hygiene is needed for the resulting TokenStream, so this could probably be stabilized before hygiene for TokenStreams has been fleshed out.

Escaping

Escape sequences in the input string are not processed before it is passed to the macro function. This is because the macro may treat escapes in a non-standard way, and so that the macro doesn’t have to re-escape the string when including it in the TokenStream.

However, in order for the lexer to determine when the string literal ends, it must have some knowledge of escapes. By default, it will assume that if a " is preceded by a backslash, then the quote is escaped, and therefore not the closing quote, unless the backslash is itself preceded by a backslash (meaning the backslash was escaped). Otherwise, the " character ends the literal.
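
The default rule can be sketched as a small scanner. This is only an illustration: the function name find_closing_quote is hypothetical, and it generalizes the rule slightly by treating any character after an unescaped backslash as escaped, which also covers longer runs of backslashes.

```rust
/// Hypothetical sketch of the default lexer rule.
/// `rest` is the source text immediately after the opening `"`.
/// Returns the byte index of the closing quote, if there is one.
fn find_closing_quote(rest: &str) -> Option<usize> {
    let mut escaped = false;
    for (i, c) in rest.char_indices() {
        if escaped {
            // This character was preceded by an unescaped backslash: skip it.
            escaped = false;
        } else if c == '\\' {
            // An unescaped backslash escapes the next character.
            escaped = true;
        } else if c == '"' {
            // An unescaped quote ends the literal.
            return Some(i);
        }
    }
    None
}
```

The text between the opening quote and the returned index would then be handed to the macro function unchanged.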

Alternatively, if the function has the attribute #[proc_macro_string(raw=true)], then the string literal is treated with the same semantics as a raw string. That is, it assumes no escapes, and there may be one or more wrapping # characters to ensure a unique delimiter. See Rust Reference.
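
For the raw variant, the delimiter matching could look roughly like the sketch below. The function name raw_literal_body is an assumption, and the input is assumed to start at the first # (or at the opening quote when no # is used).

```rust
/// Hypothetical sketch of raw-string delimiter matching.
/// `src` starts at the first `#`, or at the opening `"` if there are none.
/// Returns the text between the delimiters, with no escape processing.
fn raw_literal_body(src: &str) -> Option<&str> {
    // Count the leading `#` characters that wrap the literal.
    let hashes = src.bytes().take_while(|&b| b == b'#').count();
    // The opening quote must follow the hashes.
    let after_open = src[hashes..].strip_prefix('"')?;
    // The literal ends at the first `"` followed by the same number of `#`.
    let closer = format!("\"{}", "#".repeat(hashes));
    let end = after_open.find(&closer)?;
    Some(&after_open[..end])
}
```

Because the closing delimiter must carry the same number of # characters, quotation marks inside the body need no escaping.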

Similar implementations

How We Teach This

Add documentation to The Rust Reference and the Rust Book.

Drawbacks

It adds some complexity to the language.

Alternatives

  • The same proposal, but without the exclamation point between the identifier and the quoted text in the syntax
  • Make ident!"string" (or ident"string") syntactic sugar for ident!("string"). This would mean full procedural macros are necessary to do any compile-time manipulation of the string.
  • No change

Unresolved questions

  • How does this interact with RFC 1561?
  • Should escaping be done before passing to the macro function, or should that be the responsibility of the macro? If the former, which characters are escaped? If the latter, then how do we know if a quotation mark is escaped or not?
  • Should the prefix syntax work for char literals as well?

#2

Given that RFC 1913 (function-like procedural macros) is rescinded due to strong opposition about the motivation and other issues (custom_derive got a special pass due to serde and diesel), I highly doubt that this RFC will pass.

  1. it is even less needed than function-like procedural macros

  2. RFC 1576 is already merged so you could use $s:literal. This specifier is not restricted to string literals but is still pretty close.

  3. If you don’t care about the redundant set of braces, things like x!("123abc") are already possible today in stable Rust using the procedural-masquerade crate.


And I prefer the second alternative. The advantage is that simple transformations like macro_rules! S { ($s:expr) => { $s.to_string() } } and macro_rules! re { ($s:expr) => { Regex::new($s).unwrap() } } do not need a plugin. It is a very minor change compared with the main text. Also, interaction with other features like RFC 1561 is not a problem, since it is just syntactic sugar over the existing system.

But you have to write re!r"\d+" instead of re!"\d+".

(But in turn you can write sql!r#"SELECT * FROM a WHERE "Column" = {x}"# instead of sql!"SELECT * FROM a WHERE \"Column\" = {x}". I don’t know how you are going to make #[proc_macro_string(raw=true)] work with this.)

And for simple transformations like S and re, there is not much advantage compared with an inline function.
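
As a rough comparison, here is the S transformation written both ways with the parenthesized syntax that works today. The $s:literal specifier is the one from RFC 1576; the function name s is just an illustrative choice.

```rust
// Macro version: accepts only a literal and expands to a method call on it.
macro_rules! S {
    ($s:literal) => {
        $s.to_string()
    };
}

// Inline-function version of the same transformation.
#[inline]
fn s(text: &str) -> String {
    text.to_string()
}
```

Both S!("Owned string") and s("Owned string") produce the same String, so the macro mainly buys the shorter call syntax and the literal-only restriction.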


The D programming language takes a similar approach to the second alternative: ! is considered an infix operator and x!y is equivalent to x!(y), not just restricted to strings. D users claimed great ergonomic improvement by removing the (…), but since x!y is the syntax for templated things in D which is used everywhere, it is probably not a comparable experience here.

C++11 uses operator overloading to provide user-defined string literals ("foo"_bar). Similarly, ES6’s template literal (bar `foo`) is also sugar for a function call. This runtime approach is not sufficient if we want to implement interp!.


#3

I wasn’t aware of RFC 1576. With that I think the second alternative works well for simple transformations (although there would probably be better error messages if there was a literal_string fragment specifier).

And for simple transformations like S and re, there is not much advantage compared with an inline function.

The main advantage is the shorter syntax. (The S macro is a possible solution to needing a .to_string() to get a String from a string literal). But to your point, there’s no reason these simple transformations need to be macros. The second alternative could be expanded to also translate S"foo" to S("foo") or something like that.

However, I think the most interesting applications would be procedural macros, such as interpolation, interpolation with proper contextual escaping (for example sql or html), and other more complicated transformations, such as building a regex state machine at compile time, translating a usage message into an argument parser, or building a parser from a grammar in a DSL (at compile time). The last two could be done by functions if full CTFE was supported. Interpolation would either require a procedural macro or something like Scala’s custom interpolations, where the string is split into parts by the compiler and a function is called with the parts and the values to interpolate with.
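
To illustrate the Scala-style option, here is a minimal sketch of the function that would receive the pre-split pieces. It assumes the compiler has already turned "x=$x, y=$y" into the parts ["x=", ", y=", ""] and the values [x, y]; the name interp and this exact signature are hypothetical.

```rust
use std::fmt::Display;

/// Hypothetical receiver for a compiler-split interpolated string.
/// `parts` are the literal fragments and `values` the interpolated
/// expressions, so there is always one more part than there are values.
fn interp(parts: &[&str], values: &[&dyn Display]) -> String {
    assert_eq!(parts.len(), values.len() + 1, "parts and values do not line up");
    let mut out = String::new();
    for (i, part) in parts.iter().enumerate() {
        out.push_str(part);
        // Every part except the last is followed by a value.
        if let Some(value) = values.get(i) {
            out.push_str(&value.to_string());
        }
    }
    out
}
```

A contextual-escaping variant (for sql or html) would apply its escaping to each value at the point where this sketch calls to_string().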

For procedural macros, it would be nice to have something that reduces the boilerplate to make sure the TokenStream is a string literal and extract the contained string. But that could easily be a function in the proc_macro crate.


#4

I like this proposal, e.g. hex-literal could use it. Another possible direction is to use slightly changed postfix macros so we could write 100.ms! which will get desugared to Duration::from_millis(100), thus making using units of measure slightly more ergonomic.
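
For comparison, something close to this can be written today with an extension trait instead of a postfix macro. This is a different technique from the one suggested above, and the trait name Millis is made up for the example.

```rust
use std::time::Duration;

/// Hypothetical extension trait giving integers a `.ms()` method.
trait Millis {
    fn ms(self) -> Duration;
}

impl Millis for u64 {
    fn ms(self) -> Duration {
        Duration::from_millis(self)
    }
}
```

The call site then reads 100u64.ms(); the type suffix is needed because the trait is implemented for u64 only, which is part of what a dedicated literal syntax could avoid.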


#5

Note that Rust syntax already supports literal suffixes. That is, 92u32 is not a special case, arbitrary suffixes are allowed. IIRC, this works for strings as well, and the intention was to use this syntax for custom literals.
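
A small sketch of what this means in practice: a literal with an arbitrary suffix is a single valid token, so it can be passed to a macro even though it would be rejected as an ordinary expression. The macro name accepts_token is made up for the example.

```rust
// Matches any single token tree and ignores it.
macro_rules! accepts_token {
    ($tt:tt) => {
        true
    };
}

// Both arguments are single literal tokens with non-standard suffixes.
fn suffixed_literals_are_tokens() -> bool {
    accepts_token!(92foo) && accepts_token!("string"suffix)
}
```

It is then up to the macro to decide how to interpret the suffix, or whether to report an error.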


#6

No, that is not the case. There are some restrictions, but the macro input doesn’t have to be complete, syntactically correct Rust.

I’m also not sure I understand the motivation for “custom string literals”. What you propose isn’t really custom string literals, is it? It seems to me that what you have in mind is macros which are able to parse string input. Which is a fine idea, but:

  1. it doesn’t really need (and IMO shouldn’t introduce) any new syntax, because it’s just macros all the way down, and
  2. once function-like procedural macros are stabilized, this will be possible without any further addition to the language, so this RFC would be redundant.

#7

Yeah, I think doing this as postfix procedural macros is probably a better way to go.