Macros by example: splicing repetitions which don't have any captured fragments

TL;DR: Should we have some way to reference repetitions themselves during substitution, without needing a metavar captured inside them?

If one defines a repetition in a macro-by-example like $(foo)?, there is no way to drive substitution from that repetition, because it has no captures. Yet such a scenario would allow flag-like keywords in macros without resorting to proc macros, or substituting a fragment without using any of its captures.

The closest thing I found is the macro metavar RFC, although it hasn't been implemented and doesn't cover repetitions without captures.

I see two options here:

  1. A dummy fragment $frag:dummy (the metavar type here is just for illustration) which always captures an empty set of tokens and expands into an empty set of tokens. As a result, $(keyword $kwd:dummy)? would match the optional keyword repetition, and it would be expanded using $(code for keyword $kwd)?. I've seen some attempts to discuss this variant here and there, though I can't find the links now.

    Pros:

    • doesn't need special syntax

    Cons:

    • fragment capture may look nontrivial
    • possible issues with the parser
  2. Give labels to the repetitions themselves. Something like $'rep:(keyword)?, expanded using $'rep:(). I haven't seen this variant proposed.

    Pros:

    • clearly references repetition itself without ambiguity
    • shouldn't affect the source parser, as it just gives a name to an existing macro entity

    Cons:

    • requires new syntax, which may not fit the usual pattern of declaring $var:metatype and substituting $var

Use cases

My personal use case was a macro which defines a struct with a subset of fields from a fixed full set. It could indeed have been done using Option, but I had to optimize code size, and the reference implementation (which used those Options) bloated badly because it had to match all combinations of field presence to serialize just the needed subset. I wanted to create a macro with a matcher like $struct_name:ident { $(field_0)? $(field_1)? $(field_2)? } which would generate the specific subset of the struct in question based on the fields mentioned. Alas, that requires proc macros at the moment, so I resorted to manual boilerplate.


I have stumbled on the same issue. It's really easy to hit with slightly more complex macros. The solutions generally involve either proc macros or tt-munchers, neither of which is satisfactory. Proc macros are complex to write and maintain, require a separate crate, require repeatedly reparsing existing tokens, don't preserve macro hygiene, and can't access the $crate metavariable. tt-munchers are annoying to write, inscrutable in more complex cases, and negatively affect macro expansion performance, since one must basically implement a recursive token parser.
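
For reference, here is a minimal sketch of what the tt-muncher workaround tends to look like for flag-like keywords, applied to the struct-subset use case from the first post. The macro name subset_struct, the @build internal rule, the Pair struct and the field types (u8, u16, u32) are all made up for illustration; only the field_0/field_1/field_2 keywords come from the thread.

```rust
// Sketch of a tt-muncher: each recursion step consumes one keyword and
// appends the matching field declaration to an accumulator.
macro_rules! subset_struct {
    // Entry point: start with an empty accumulator `{}`.
    ($name:ident { $($keys:tt)* }) => {
        subset_struct!(@build $name {} $($keys)*);
    };
    // No keywords left: emit the struct from the accumulated fields.
    (@build $name:ident { $($fields:tt)* }) => {
        #[derive(Debug, PartialEq)]
        struct $name { $($fields)* }
    };
    // One rule per known keyword (the field types are hypothetical).
    (@build $name:ident { $($fields:tt)* } field_0 $($rest:tt)*) => {
        subset_struct!(@build $name { $($fields)* field_0: u8, } $($rest)*);
    };
    (@build $name:ident { $($fields:tt)* } field_1 $($rest:tt)*) => {
        subset_struct!(@build $name { $($fields)* field_1: u16, } $($rest)*);
    };
    (@build $name:ident { $($fields:tt)* } field_2 $($rest:tt)*) => {
        subset_struct!(@build $name { $($fields)* field_2: u32, } $($rest)*);
    };
    // Anything else is reported as an error instead of silently ignored.
    (@build $name:ident { $($fields:tt)* } $other:tt $($rest:tt)*) => {
        compile_error!("unknown field keyword");
    };
}

// Generates a struct with only field_0 and field_2.
subset_struct!(Pair { field_0 field_2 });
```

Even for three keywords this needs one rule per keyword plus an accumulator, which is the kind of boilerplate a way to splice a capture-less $(field_0)? repetition would avoid.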

I see issues with your solutions, though. $var: dummy looks like a clever solution, but it can cause parsing ambiguities (what do $($v: dummy)* or $($v: dummy)? $($u: dummy)? match?). Maybe it's not a big issue in practice, though, since it can probably be detected easily in simple cases, and anything more complex can be declared a macro error.

I think labeled matchers aren't a good solution, because they look like they duplicate the functionality of the usual matchers, but in a different way. Are there any other use cases for labeled matchers besides splicing in the same repetitions the same number of times? If not, I don't think having a separate splicing concept would pull its weight.

Since the primary use case is splicing in matchers consisting only of tokens, I have thought about introducing a special metavariable type tok, which just matches a literal token. I.e. if we have $v: tok(bar), then it matches bar with $v having the literal token bar as its value. Thus, you can splice $v at use site as usual. For example, consider a simple macro which duplicates elements of an array literal, preserving the trailing comma:

macro_rules! arr {
    ($($e: expr),+ $($comma: tok(,))?) => {
        [
            $($e, $e),+ $($comma)?
        ]
    };
}

let _x = arr![1];
// expands to
let _x = [1, 1];

let _y = arr![1,];
// expands to
let _y = [1, 1, ];

Here $comma matches the literal comma token and expands to it.

The issue that I see with my suggestion is that one may want to change the literal token in the expansion. For example, we may want a comma-separated list of expressions to expand into a semicolon-separated list of corresponding statements. In this case a trailing comma should change into a trailing semicolon, and the syntax above doesn't support it. It can be solved with both of your suggestions.


:tok(...) looks quite interesting, though it doesn't solve the case of swapping the match for completely different tokens, as you mentioned.

I mentioned that a dummy fragment may cause trouble for the parser/matcher. $($tok:dummy) should in principle work like $(), i.e. cause a compile error.

Labeled repetitions look like the cleanest solution to me so far, as they IMO put repetitions on an equal footing with other matchers. Though they introduce new syntax :thinking:

Damn I'm confused. Redesign declarative macros from scratch? Looks like task for a lunch to me :rofl:
/sarcasm


What do $()* and $()? $()? match today? Or do we not diagnose them because they can't be "observed", since there is no way to reference them?

Testing this, it appears Rust already handles the case by detecting and rejecting empty repetitions.
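
A minimal sketch of that check, for reference. As far as I can tell, the commented-out definition is rejected with an error along the lines of "repetition matches empty token tree". The macro names empty_rep and only_flags are made up for illustration.

```rust
// Rejected at definition time, so it is left commented out here:
// the repetition contains no tokens at all.
//
// macro_rules! empty_rep {
//     ($()*) => {};
// }

// Accepted: each iteration of the repetition consumes one `flag` token.
// Note that the expansion still cannot refer to the $(flag)* repetition
// itself, because it captures nothing.
macro_rules! only_flags {
    ($(flag)*) => { true };
    ($($other:tt)*) => { false };
}
```

So a capture-less repetition is fine in the matcher as long as it is non-empty; the missing piece is only on the expansion side.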

There's another thing that labeled repetitions would solve: nesting repetitions.

macro_rules! descartes_product {
    ({ $($set_a: expr),* }, { $($set_b: expr),* }) => {
        [
        //  vv---- which is which?
            $(
                $(
                    ($set_a, $set_b)
                ),*
            ),*
        ]
    }
}
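
One workaround that seems to be available today, sketched under assumptions: capture the second set as a single opaque tt and hand it to an internal rule, so that each repetition lives in its own rule and the two never have to be told apart. The name cartesian and the @row rule are made up for illustration, and the result is nested (one inner array per element of the first set) rather than flat.

```rust
macro_rules! cartesian {
    // Inner step: pair one element of the first set with every element
    // of the second set.
    (@row $a:expr, { $($b:expr),* }) => {
        [ $( ($a, $b) ),* ]
    };
    // Outer step: repeat over the first set only; the second set is
    // passed along untouched as one token tree.
    ({ $($a:expr),* }, $set_b:tt) => {
        [ $( cartesian!(@row $a, $set_b) ),* ]
    };
}
```

For example, cartesian!({ 1, 2 }, { 3, 4 }) should produce [[(1, 3), (1, 4)], [(2, 3), (2, 4)]]. It works, but the indirection is exactly what labeled repetitions would make unnecessary.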

Or even this, which I'd expect to work even today, but sadly it doesn't:

macro_rules! array_add{
    ($($mul: expr)?; $($left: expr, $right: expr);*) => {
        [
            $(
                $($mul * )? $left + $right
            ),*
        ]
    }
}
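
A sketch of what does work today, at the cost of duplicating the rule: split the optional multiplier into two arms, so that $mul is a plain, non-repeating metavariable and can be used at any repetition depth. This is only a workaround, not a fix for the single-arm macro above.

```rust
macro_rules! array_add {
    // No multiplier given: the input starts directly with `;`.
    (; $($left:expr, $right:expr);*) => {
        [ $( $left + $right ),* ]
    };
    // Multiplier given: $mul is captured outside any repetition, so it
    // may be spliced inside the repetition over $left and $right.
    ($mul:expr; $($left:expr, $right:expr);*) => {
        [ $( $mul * $left + $right ),* ]
    };
}
```

With this, array_add!(2; 1, 2; 3, 4) evaluates to [4, 10] and array_add!(; 1, 2; 3, 4) to [3, 7]. The number of arms doubles with every additional optional part, which is why this doesn't scale.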

These are of course silly examples, but I've written macros where such a feature would've helped.


While we are at it, why do we need to specify how many times a given expression will expand, if it is determined by (and needs to be equal to) the capturing part anyway?