This is my first RFC, so even if you think this is a terrible idea I would appreciate feedback on the RFC itself.
- Feature name: string_like_macros
- Start date:
- RFC PR:
- Rust Issue:
Summary
Allow users to define custom literals with a prefixed string.
Motivation
The current macro system requires the input to a macro to be valid Rust syntax. However, there is a class of potential macros where it makes sense for the input to be arbitrary text, or a DSL that isn’t valid Rust syntax (for example regex or SQL syntax).
One workaround for this is to pass a string as the only argument to a macro, but this has a few limitations:
- Any compile-time manipulation requires function-like procedural macros, which as far as I can tell aren’t very close to stabilization.
- The signature of the macro doesn’t indicate that the input must be a literal string, and the macro must verify that the input is a string (literal) itself.
- There is a redundant set of braces. This is mostly just aesthetic.
This RFC proposes a method to define macros which perform a compile-time transformation on string literals. This is similar to the “b”, “r”, and “br” prefixes which produce bytestring, raw, and raw bytestring literals respectively.
Example use cases
- String literals: S!"Owned string" compiles to "Owned string".to_string()
- Hex byte array literals: x!"0e5fa2" compiles to [0x0e, 0x5f, 0xa2]
- Regex literals: re!"abc.*(def)?" compiles to Regex::new(r"abc.*(def)?")
- Interpolation: interp!"x=$x, y=$y" compiles to format!("x={}, y={}", x, y)
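To make the interpolation case more concrete, here is a rough sketch of the transformation such a macro might perform. It is written as an ordinary function so that it compiles today; the name expand_interp is hypothetical, and it returns the expansion as source text rather than a TokenStream. It assumes a placeholder is a $ followed by ASCII letters, digits, or underscores, which is a simplification and not something this RFC specifies.

```rust
/// Hypothetical sketch: turn `x=$x, y=$y` into the source text
/// `format!("x={}, y={}", x, y)`.
fn expand_interp(s: &str) -> String {
    let mut fmt = String::new(); // the format string being built
    let mut args: Vec<String> = Vec::new(); // identifiers found after `$`
    let mut chars = s.chars().peekable();

    while let Some(c) = chars.next() {
        match c {
            '$' => {
                // Collect the identifier-like characters that follow `$`.
                let mut name = String::new();
                while let Some(&n) = chars.peek() {
                    if n.is_ascii_alphanumeric() || n == '_' {
                        name.push(n);
                        chars.next();
                    } else {
                        break;
                    }
                }
                if name.is_empty() {
                    fmt.push('$'); // a lone `$` is kept literally
                } else {
                    fmt.push_str("{}");
                    args.push(name);
                }
            }
            // Braces are special to `format!`, so double them.
            '{' => fmt.push_str("{{"),
            '}' => fmt.push_str("}}"),
            _ => fmt.push(c),
        }
    }

    let mut out = format!("format!(\"{}\"", fmt);
    for a in &args {
        out.push_str(", ");
        out.push_str(a);
    }
    out.push(')');
    out
}
```

Because the input is passed through unescaped, any backslash escapes in the literal end up unchanged inside the generated format string.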
Detailed Design
This adds a new type of procedural macro: a string-like macro. This macro is invoked similarly to a function-like macro, or macro-by-example, with the following differences:
- The argument is always bracketed with the " character rather than [], {}, or ()
- The content inside the quotes can be arbitrary text (possibly with escapes, see Unresolved Questions)
If the macro is called foo, then it is used with syntax like foo!"Some string"
To define a string-like procedural macro, the programmer must write a function with a specific signature and attribute, where foo is the name of the string-like macro:
#[proc_macro_string]
pub fn foo(s: &str) -> TokenStream
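As an illustration of what the body of such a function might look like, here is a sketch of the logic behind the x!"0e5fa2" example. Since #[proc_macro_string] does not exist yet, this is a plain function (the name expand_hex is hypothetical) that returns the expansion as source text; a real implementation would build a TokenStream and would presumably report errors through the compiler rather than a Result.

```rust
/// Hypothetical sketch: turn `0e5fa2` into the source text
/// `[0x0e, 0x5f, 0xa2]`, or return an error message.
fn expand_hex(s: &str) -> Result<String, String> {
    // Reject anything that is not a hex digit up front. This also
    // guarantees the input is ASCII, so slicing by byte index is safe.
    if !s.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err("hex literal may only contain hex digits".to_string());
    }
    if s.len() % 2 != 0 {
        return Err("hex literal needs an even number of digits".to_string());
    }

    let mut bytes = Vec::with_capacity(s.len() / 2);
    for i in (0..s.len()).step_by(2) {
        let pair = &s[i..i + 2];
        let byte = u8::from_str_radix(pair, 16)
            .map_err(|_| format!("invalid hex digits `{}`", pair))?;
        bytes.push(format!("0x{:02x}", byte));
    }
    Ok(format!("[{}]", bytes.join(", ")))
}
```

Note that the function only ever sees the text between the quotes, which is what lets it accept input that is not valid Rust syntax.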
Note that in most cases, no hygiene is needed for the resulting TokenStream, so this could probably be stabilized before hygiene for TokenStreams has been fleshed out.
Escaping
The input string is not escaped before it is passed to the macro function. There are two reasons for this: the macro may treat escapes in a non-standard way, and the macro then doesn’t have to re-escape the string when including it in the TokenStream.
However, in order for the lexer to determine where the string literal ends, it must have some knowledge of escapes. By default, it will assume that if a " is preceded by a backslash, then the quote is escaped, and therefore is not the closing quote, unless the backslash is itself preceded by a backslash (meaning the backslash was escaped). Otherwise, the " character ends the literal.
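The default rule can be sketched as a small scanner. This is only one possible reading of the rule: it treats a backslash as escaping whatever single character follows it, which generalizes the description above to longer runs of backslashes. The function name find_closing_quote is hypothetical, and src is assumed to be the text immediately after the opening quote.

```rust
/// Hypothetical sketch: return the byte index of the closing `"` in `src`,
/// or `None` if the literal is unterminated.
fn find_closing_quote(src: &str) -> Option<usize> {
    let mut escaped = false;
    for (i, c) in src.char_indices() {
        if escaped {
            // The previous character was an unescaped backslash, so this
            // character is skipped, whatever it is.
            escaped = false;
        } else if c == '\\' {
            escaped = true;
        } else if c == '"' {
            return Some(i);
        }
    }
    None
}
```

The text handed to the macro would then be &src[..i], with the backslashes still in place, since the lexer only needs to find the end of the literal and does not interpret the escapes.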
Alternatively, if the function has the attribute #[proc_macro_string(raw=true)], then the string literal is treated with the same semantics as a raw string. That is, it assumes no escapes, and there may be one or more wrapping # characters to ensure a unique delimiter. See Rust Reference.
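For comparison, the raw variant needs no escape handling at all. The sketch below assumes the lexer is looking at the text right after foo! and splits it into the literal body and whatever follows; split_raw_literal is a hypothetical name, and the matching rule (a quote followed by the same number of # characters as the opener) mirrors how raw string literals work today.

```rust
/// Hypothetical sketch: given the text right after `foo!`, return
/// `(body, rest)` for a raw-style literal, or `None` if it is malformed.
fn split_raw_literal(src: &str) -> Option<(&str, &str)> {
    // Count the leading `#` characters of the opening delimiter.
    let hashes = src.bytes().take_while(|&b| b == b'#').count();
    // The opening quote must come right after the hashes.
    let after_open = src[hashes..].strip_prefix('"')?;
    // The closing delimiter is a quote followed by the same number of `#`.
    let closer = format!("\"{}", "#".repeat(hashes));
    let end = after_open.find(&closer)?;
    Some((&after_open[..end], &after_open[end + closer.len()..]))
}
```

With this rule, quotes inside the body are unambiguous as long as they are not followed by enough # characters to form the closing delimiter.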
Similar implementations
How We Teach This
Add documentation to The Rust Reference and the Rust Book.
Drawbacks
It adds some complexity to the language.
Alternatives
- The same proposal, but without the exclamation point between the identifier and the quoted text in the syntax
- Make ident!"string" (or ident"string") syntactic sugar for ident!("string"). This would mean full procedural macros are necessary to do any compile-time manipulation of the string.
- No change
Unresolved questions
- How does this interact with RFC 1561?
- Should escaping be done before passing to the macro function, or should that be the responsibility of the macro? If the former, which characters are escaped? If the latter, then how do we know if a quotation mark is escaped or not?
- Should the prefix syntax work for char literals as well?