Concept: Resolving macro_rules! and proc_macro tokens

...aka removing the "macro_rules! prevents new operators" concern.

This is a plan to convert macro_rules! from its current behavior that relies on "old style" glued tokens and token splitting to a new "mostly compatible" behavior that uses "new style" decomposed tokens with "joint" flags (that remember whether they are followed directly by another operator character).

  • For macros defined in and called from the new edition(s):
    • If the macro pattern contains Punct { joint: true }, the call site must have Punct { joint: true }. This means that e.g. a macro pattern of ++ must be provided ++ and not + +.
    • If the macro pattern contains Punct { joint: false }, the call site may provide a punct with or without the joint flag. This means that e.g. a macro pattern of + + can be provided ++.
    • If the macro pattern contains a $:tt matcher, the matcher consumes a whole run of "joint" punctuation plus the trailing non-"joint" punct that ends the run. Like every $:tt match, this is completely transparent and can be reparsed or broken down by later matchers.
  • For macros defined in the new edition(s) and called from the old edition(s):
    • Behave as shown above for macros defined in the new edition(s). No compatibility hazards here.
  • For macros defined in and called from the old edition(s):
    • Punct tokens only require jointness if they're the leading characters of a "known" operator cluster (the current behavior).
    • $:tt matchers always match exactly one "known" operator, even if the trailing punctuation is still joint to further punctuation.
  • For macros defined in the old edition(s) and called from the new edition(s):
    • Punct tokens: either behavior is reasonable, actually! Macros that write e.g. ~= at their definition site likely expect to be invoked with ~=, not ~ =. Requiring jointness more aggressively would be more consistent with other macros called from the new edition(s), but sticking to the "well known" operator clusters is more consistent with the behavior that exists where the macro was defined.
    • $:tt matchers match only the "well known" operators of the edition the macro is defined in. Alternatively, $:tt matchers could consume whole "well known" operators of the calling edition, or consume the entire joint operator cluster as in new-edition-defined macros. The joint-cluster approach is more consistent with other macros called from the new edition and is automatically forward-compatible with new operators there, but it is less consistent with the environment the macro was written for. (For example, maybe I have a macro like ($x:ident $op:tt= $expr:expr) => { $x = $x $op $expr } that relies on the known operator set of my edition to fake &&= and ||= operators. Perhaps that can be solved with token splitting, though?) Using the "well known" operators of the calling edition is a nonstarter: the point is to allow new operators within an edition, and exposing the known set to macros defined in old editions would mean new "well known" operator clusters could only be added in a new edition.
    • Ultimately I think having operator jointness behavior depend solely on the edition that the macro was defined in is more consistent than trying to be clever with "improving" things when called from a newer edition.
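To make the proposed new-edition rules above concrete, here is a rough model in plain Rust. It is only a sketch of the matching logic, not compiler code: the Punct struct, the operator character set in is_op_char, and the functions lex_puncts, pattern_matches, and tt_cluster_len are all hypothetical names invented for this illustration.

```rust
/// Hypothetical stand-in for a decomposed punctuation token.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Punct {
    ch: char,
    /// True when the next source character is also an operator character.
    joint: bool,
}

/// Assumed operator character set, chosen only for this sketch.
fn is_op_char(c: char) -> bool {
    "+-*/%^!&|<>=~".contains(c)
}

/// Lexes only the punctuation in `src`, skipping everything else.
fn lex_puncts(src: &str) -> Vec<Punct> {
    let chars: Vec<char> = src.chars().collect();
    let mut out = Vec::new();
    for (i, &ch) in chars.iter().enumerate() {
        if is_op_char(ch) {
            let joint = chars.get(i + 1).map_or(false, |&next| is_op_char(next));
            out.push(Punct { ch, joint });
        }
    }
    out
}

/// Literal punctuation rule: a joint pattern punct requires a joint input
/// punct; a non-joint pattern punct accepts either.
fn pattern_matches(pattern: &[Punct], input: &[Punct]) -> bool {
    pattern.len() == input.len()
        && pattern
            .iter()
            .zip(input)
            .all(|(p, i)| p.ch == i.ch && (!p.joint || i.joint))
}

/// `tt` rule: consume every joint punct plus the non-joint punct that ends
/// the run. Returns how many puncts the matcher would take.
fn tt_cluster_len(input: &[Punct]) -> usize {
    match input.iter().position(|p| !p.joint) {
        Some(end) => end + 1,
        None => input.len(),
    }
}
```

In this model, a pattern lexed from ++ accepts the input ++ but rejects + +, a pattern lexed from + + accepts both, and a tt matcher applied to &&= takes all three characters as one cluster.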

I think the described "new" behavior better matches intuition on how this should behave, anyway. Proc macros can already define new custom operators that have required jointness, and while the compiler still internally uses "glued" tokens and token splitting, there is already a desire to move it to the proc macro model (and perhaps something even more decomposed).
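For contrast, the glued-token behavior that exists today (and that old-edition macros would keep under this plan) can be observed with ordinary macro_rules!. This is a small sketch as I understand the current behavior; the macro and function names (plus_plus, count_tts, fake_assign, fake_and_assign, fake_or_assign) are made up for illustration.

```rust
// Pattern written as two separate `+` tokens. Today `++` is not a known
// operator, so both `++` and `+ +` at the call site match this arm.
macro_rules! plus_plus {
    (+ +) => {
        "matched"
    };
}

// Counts how many token trees the invocation is split into.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

// The trick from the post: `&&=` is currently seen as `&&` followed by `=`,
// so `$op:tt` captures `&&` and the literal `=` matches the rest.
macro_rules! fake_assign {
    ($x:ident $op:tt = $rhs:expr) => {
        $x = $x $op $rhs
    };
}

fn fake_and_assign(mut a: bool, b: bool) -> bool {
    fake_assign!(a &&= b); // expands to `a = a && b`
    a
}

fn fake_or_assign(mut a: bool, b: bool) -> bool {
    fake_assign!(a ||= b); // expands to `a = a || b`
    a
}
```

Here count_tts!(+=) is 1 because += is a known operator, while count_tts!(&&=) and count_tts!(++) are both 2. Under the new-edition $:tt rule each of those would be a single joint cluster, which is exactly why fake_assign depends on the edition it was defined in.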

I'd like to be able to stop singing "no new punctuation" every time there's a proposal that includes a new multi-character operator. What do you think of this plan for eliminating that semantic dependency on the set of operators known to Rust (for macros defined in new editions, anyway)?

This specific concern would be naturally resolved once $tt becomes a usual parser-driven matcher, like $item, and not "whatever 'token' happens to mean in the current impl".

The overall model, at least at first glance, does seem better to me. But the benefits seem marginal, and maintaining two impls in the compiler here would be painful, I think. So maybe we should use these new semantics only for macros 2.0.