Available here: https://github.com/cmr/rfcs/blob/macro-future-proofing/text/0000-macro-future-proofing.md Snapshot as of posting:
- Start Date: 2014-12-21
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)
NOTE: Draft, not finalized.
Key Terminology
-
macro
: anything invokable asfoo!(...)
in source code. -
MBE
: macro-by-example, a macro defined bymacro_rules
. -
matcher
: the left-hand-side of a rule in amacro_rules
invocation. -
macro parser
: the bit of code in the Rust parser that will parse the input using a grammar derived from all of the matchers. -
NT
: non-terminal, the various āmeta-variablesā that can appear in a matcher. -
fragment
: The piece of Rust syntax that an NT can accept. -
fragment specifier
: The identifier in an NT that specifies which fragment the NT accepts. -
language
: a context-free language.
Example:
macro_rules! i_am_an_mbe {
(start $foo:expr end) => ($foo)
}
(start $foo:expr end)
is a matcher, $foo
is an NT with expr
as its
fragment specifier.
Summary
Future-proof the allowed forms that input to an MBE can take by requiring certain delimiters following NTs in a matcher. In the future, it will be possible to lift these restrictions backwards compatibly if desired.
Motivation
In current Rust, the macro_rules
parser is very liberal in what it accepts
in a matcher. This can cause problems, because it is possible to write an
MBE which corresponds to an ambiguous grammar. When an MBE is invoked, if the
macro parser encounters an ambiguity while parsing, it will bail out with a
"local ambiguity" error. As an example for this, take the following MBE:
macro_rules! foo {
($($foo:expr)* $bar:block) => (/*...*/)
};
Attempts to invoke this MBE will never succeed, because the macro parser
will always emit an ambiguity error rather than make a choice when presented
an ambiguity. In particular, it needs to decide when to stop accepting
expressions for foo
and look for a block for bar
(noting that blocks are
valid expressions). Situations like this are inherent to the macro system. On
the other hand, itās possible to write an unambiguous matcher that becomes
ambiguous due to changes in the syntax for the various fragments. As a
concrete example:
macro_rules! bar {
($in:ty ( $($arg:ident)*, ) -> $out:ty;) => (/*...*/)
};
When the type syntax was extended to include the unboxed closure traits,
an input such as FnMut(i8, u8) -> i8;
became ambiguous. The goal of this
proposal is to prevent such scenarios in the future by requiring certain
"delimiter tokens" after an NT. When extending Rustās syntax in the future,
ambiguity need only be considered when combined with these sets of delimiters,
rather than any possible arbitrary matcher.
Detailed design
The algorithm for recognizing valid matchers M
follows. Note that a matcher
is merely a token tree. A āsimple NTā is an NT without repetitions. That is,
$foo:ty
is a simple NT but $($foo:ty)+
is not. FOLLOW(NT)
is the set of
allowed tokens for the given NTās fragment specifier, and is defined below.
F
is used for representing the separator in complex NTs. In $($foo:ty),+
,
F
would be ,
, and for $($foo:ty)+
, F
would be EOF
.
input: a token tree M
representing a matcher and a token F
output: whether M is valid
- If there are no tokens in
M
, accept. - For each token
T
inM
:- If
T
is not an NT, continue. - If
T
is a simple NT, look ahead to the next tokenT'
inM
. IfT'
isEOF
, replaceT'
withF
. IfT'
is in the setFOLLOW(NT)
,T'
is EOF,T'
is any NT, orT'
is any identifier, continue. Else, reject. - Else,
T
is a complex NT.- If
T
has the form$(...)+
or$(...)*
, run the algorithm on the contents withF
set toEOF
. If it accepts, continue, else, reject. - If
T
has the form$(...)U+
or$(...)U*
for some tokenU
, run the algorithm on the contents withF
set toU
. If it accepts, continue, else, reject.
- If
- If
This algorithm should be run on every matcher in every macro_rules
invocation, with F
as EOF
. If it rejects a matcher, an error should be
emitted and compilation should not complete.
The current legal fragment specifiers are: item
, block
, stmt
, pat
,
expr
, ty
, ident
, path
, meta
, and tt
.
-
FOLLOW(item)
={}
-
FOLLOW(block)
=FOLLOW(expr)
-
FOLLOW(stmt)
=FOLLOW(expr)
-
FOLLOW(pat)
={FatArrow, Comma, Pipe}
-
FOLLOW(expr)
={Comma, FatArrow, CloseBrace, CloseParen, Lit}
(whereLit
is any literal, string or numeric) -
FOLLOW(ty)
={Comma, Eq, Gt, Lt, RArrow, FatArrow, OpenBrace, OpenParen, CloseBrace, CloseParen}
-
FOLLOW(ident)
= any token -
FOLLOW(path)
= any token -
FOLLOW(meta)
= any token -
FOLLOW(tt)
= any token
Note: the FOLLOW
sets as given are based on every MBE in the Rust
distribution, but should probably be tuned before the RFC is accepted.
Drawbacks
It does restrict the input to a MBE, but the choice of delimiters provides reasonable freedom.
Alternatives
- Fix the syntax that a fragment can parse. This would create a situation
where a future MBE might not be able to accept certain inputs because the
input uses newer features than the fragment that was fixed at 1.0. For
example, in the
bar
MBE above, if thety
fragment was fixed before the unboxed closure sugar was introduced, the MBE would not be able to accept such a type. While this approach is feasible, it would cause unnecessary confusion for future users of MBEs when they canāt put certain perfectly valid Rust code in the input to an MBE. Versioned fragments could avoid this problem but only for new code. - Keep
macro_rules
unstable. Given the great syntactical abstraction thatmacro_rules
provides, it would be a shame for it to be unusable in a release version of Rust. If evermacro_rules
were to be stabilized, this same issue would come up. - Do nothing. This is very dangerous, and has the potential to essentially freeze Rustās syntax for fear of accidentally breaking a macro.
Unresolved questions
Are the given FOLLOW
sets adequate?