Idea: allow escaping tokens in macros so that they can be used as separators. For example, suppose that # is the escape character (please bikeshed). The following would then allow using + as a separator, which is currently not accepted:
macro_rules! foo {
    ($(a)#++) => {}
}
This macro would accept:
foo!(a+a+a+a);
Of course, to use # as a separator, one would use ##.
I believe that this would allow using any token as a separator, including some useful tokens that are currently forbidden (e.g. +, *).
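For anyone wondering why + is rejected today: as far as I understand the current parser, in `$(...)++` the first `+` is read as the one-or-more repetition operator and the second as a literal token, so there is no way to spell "separated by +". A small sketch on stable Rust (the names `idents_then_plus` and `demo_idents_then_plus` are made up for illustration):

```rust
// On current Rust, `$(...)++` is a `+` repetition with NO separator,
// followed by a literal `+` token. It does not mean "separated by `+`".
macro_rules! idents_then_plus {
    ($($x:ident)++) => {
        // Collect the matched identifiers as strings.
        [$(stringify!($x)),+]
    };
}

// Matches one or more identifiers and then a single trailing `+`.
fn demo_idents_then_plus() -> [&'static str; 3] {
    idents_then_plus!(a b c +)
}
```

By contrast, `idents_then_plus!(a + b + c)` fails to match any rule, which is the gap the escape proposal is meant to close.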
I like this idea. It's basically what I had in mind. I concur with @gbutler though: using the conventional \ as an escape char would sound most sensible.
Since this isn't a string literal, I would personally expect \t to just produce t and probably a warning about an unnecessary backslash telling me to change it to either t or \\t.
But that aside, doesn't the ambiguity objection apply to any potential escape metacharacter? Does \ have some pre-existing meaning in Rust macros that makes it worse than #?
\ has no meaning, but # does (in particular, $($x:ident)#* is currently valid and matches a#b), so \ seems like a better choice. (Inasmuch as there's any good choice here; every option looks ugly to me, but it's a rare use case anyway.) However, we'd have to modify the definition of a token to be able to use \.
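A quick sketch of the point about # already being a legal separator (the macro name `hash_separated` is hypothetical). One caveat I'm adding myself: on the 2021 edition and later, an identifier immediately followed by `#` is lexed as a reserved prefix, so the invocation below puts whitespace around the `#`.

```rust
// `#` is an ordinary token, so it is already accepted as a separator.
macro_rules! hash_separated {
    ($($x:ident)#*) => {
        // Collect the matched identifiers as strings.
        [$(stringify!($x)),*]
    };
}

// Written with spaces so that `a#` is not lexed as a reserved prefix
// on newer editions.
fn demo_hash_separated() -> [&'static str; 2] {
    hash_separated!(a # b)
}
```

This is why reusing # as the escape character would change the meaning of macros that compile today, while \ would not.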
Terrible thought: what if separators didn't have to be a single terminal?
// Matches `he said that you said that I talk too much`
($($x:ident)(said that)* talk too much) => { ... }
And look! We get escaping for (cough) "free!"
// Matches `Add + Sub + Mul + Div`
($($x:ident)(+)*) => { ... }
(Actually, I really quite like how it looks. And it's probably not an obscene implementation challenge as long as it is limited to fixed token sequences... but a design like this just begs to support matchers appearing in the separator.)
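For comparison, here is roughly how both examples can be emulated on stable Rust today, by matching the first item on its own and folding the "separator" tokens into the repetition body. This is only a sketch: the macro names `speakers` and `plus_separated` are made up, and unlike the proposed syntax these patterns require at least one item.

```rust
// Emulates the proposed `$($x:ident)(said that)*` on current Rust: the first
// item is matched separately and the multi-token "separator" becomes the
// leading part of the repetition body.
macro_rules! speakers {
    ($first:ident $(said that $rest:ident)* talk too much) => {
        [stringify!($first) $(, stringify!($rest))*]
    };
}

// The same trick covers single tokens that cannot be separators, such as `+`.
macro_rules! plus_separated {
    ($first:ident $(+ $rest:ident)*) => {
        [stringify!($first) $(, stringify!($rest))*]
    };
}
```

The cost is that every transcriber also has to treat the first element specially, which is the noise the parenthesized-separator idea would remove.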
This sounds cool... but I think the lexer will want your blood. Though, maybe it would be enough for $(...)(x y)* to parse for two lexed terminals? Here's my weak attempt at something pathological:
macro_rules! foo {
    ($($x:ident)(+-)+ $k:ident +) => {}
}
foo!(a +- b +);
//     ^    ^ once we get *here*, we EOF and see no - sign, then realize
//     |      that `$k` captures `b`, but we have no way to know
//     +--    that over *here*. I can imagine a worse scenario than this
//            that induces really nasty lookahead
Also, general question for the thread: what's the tracking issue for Macros (by example) 2.0? I'm curious what the UX is like right now. As nice as this idea is, it feels like a bandaid for making macro_rules! less painful, and I think this kind of thing should be built into the design of macro, so that parsing isn't nearly this exciting.
Doing something like that in the lexer would be a major language change because you would have to teach the lexer about all sorts of new tokens. Doing it in the parser is conceivable but sounds rather hard because you would need to support arbitrary amounts of lookahead (there might be a more efficient way, but I don't know it)...
In general, I think starting with single tokens is probably enough for now.
What I wrote down is unambiguous, sure, but my point is that I think you either have to make the lexer hate you by re-lexing the contents of a macro call after the definition is parsed, or you need unbounded lookahead, neither of which is an exciting prospect.
I've thought that the whole grammar here needs to change with macro macros. We should have a grammar in which these ambiguities just don't occur. The parens around the separator seem good; they could be mandatory in macro, so that no separator would be written $($x:ident)()* instead of $($x:ident)*, avoiding any potential ambiguity.
Hmm, let's see how they look nested in close quarters:
// nesting on the left
previously: $($($(a)* b)* c)*
now:        $($($(a)()* b)()* c)()*
// nesting on the right
previously: $(c $(b $(a)*)*)*
now:        $(c $(b $(a)()*)()*)()*   // (ouch)
            $(c $[b ${a}{}*][]*)()*   // if we could customize the delimiters...
                                      // (still ouch?)
and a "block-style" repetition, for people who like to format it that way:
($($Add:ident for $Type:ty;)()*) => {
    $(
        impl $Add for $Type {
            ...
        }
    )()*
};
// or the "bunched-together egyptian style" sometimes used
($($Add:ident for $Type:ty;)()*) => {$(
    impl $Add for $Type {
        ...
    }
)()*};
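For reference, this is roughly what the block-style version looks like in today's syntax, i.e. without the `()` before `*`. It is only a sketch: the `Marker` trait, its `name` method and the `impl_marker` macro are placeholders I made up so that the elided impl body has something to contain.

```rust
// A hypothetical marker trait, used only to give the macro something to implement.
trait Marker {
    fn name() -> &'static str;
}

macro_rules! impl_marker {
    // Current syntax: the repetition operator follows the group directly.
    ($($Trait:ident for $Type:ty;)*) => {
        $(
            impl $Trait for $Type {
                fn name() -> &'static str {
                    stringify!($Type)
                }
            }
        )*
    };
}

impl_marker! {
    Marker for u8;
    Marker for String;
}
```

Under the mandatory-parens idea, the only change would be `)*` becoming `)()*` in both the matcher and the transcriber.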
A piece of a terrible incremental muncher:
(
    // Munch the options one at a time into the $opt list.
    ($b:expr) [$kind:tt ($($opt:tt)()*)]
    #opts# [$($opt_tok:tt)()+] $($rest:tt)()* // find a [] tt
)
=> {arg_impl!{
    ($b) [$kind ($($opt)()* [$($opt_tok)()+] )] // append to end
    #opts# $($rest)()* // check for another
}};
It doesn't seem too bad except for the "nesting on the right" example. Though if it occurred, I think I might like to see a "token stream" matcher ($x:ts or maybe $x:tts) that matches like $($x:tt)()*.
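To make the comparison concrete, here is a sketch of what the existing `$($x:tt)*` idiom does on stable Rust, which is what a `$x:ts` matcher would abbreviate. The macro names `count_tts` and `forward_all` are made up for illustration.

```rust
// Counts token trees by peeling one `tt` off the front per recursion step.
// A delimited group such as `(c d)` counts as a single token tree.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

// Captures the whole input as a sequence of token trees and forwards it
// unchanged; this is the role the suggested `$x:ts` matcher would play.
macro_rules! forward_all {
    ($($x:tt)*) => { count_tts!($($x)*) };
}
```

With mandatory separator parens, both the capture and the forwarding would grow a `()` each, which is where a dedicated token-stream matcher would start to pay off.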