Pre-RFC: at most one repetition macro patterns


#1

EDIT: updated alternatives

EDIT 2: more updates to alternatives, especially {m,n}, and add @kennytm’s comment on ambiguity

Summary

Add a repetition specifier to macros to repeat a pattern at most once: $(pat)?. Here ? behaves like + or * but represents at most one repetition of pat.

Motivation

There are two specific use cases in mind:

  1. Any macro rule with optional parts. Currently, you just have to write two rules and possibly have one “desugar” to the other.

    macro_rules! foo {
      (do $b:block) => {
        $b
      };
      // Desugars to the single-block rule above, then runs the second block.
      (do $b1:block and $b2:block) => {
        foo!(do $b1);
        $b2
      };
    }
    

    Under this RFC, one would simply write:

    macro_rules! foo {
      (do $b1:block $(and $b2:block)?) => {
        $b1
        $($b2)?
      }
    }
    
  2. Trailing commas. It’s kind of infuriating that the best way to make a rule tolerate trailing commas, to my knowledge, is to create another identical rule that has a comma at the end:

    macro_rules! foo {
      ($(pat),*,) => { foo!( $(pat),* ) };
      ($(pat),*) => {
        // do stuff
      }
    }
    

    or to allow multiple trailing commas:

    macro_rules! foo {
      ($(pat),* $(,)*) => {
        // do stuff
      }
    }
    

    Under this RFC, one would simply write:

    macro_rules! foo {
      ($(pat),* $(,)?) => {
        // do stuff
      }
    }
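
To make the trailing-comma case concrete, here is a minimal sketch, assuming a compiler that accepts the proposed ? operator. The macro name sum_all and the helper totals are hypothetical and exist only for illustration:

```rust
// Hypothetical macro: sums its arguments and tolerates one optional
// trailing comma via the proposed `$(,)?` pattern.
macro_rules! sum_all {
    ($($x:expr),* $(,)?) => {
        0 $(+ $x)*
    };
}

// Helper used only to exercise the macro with and without a trailing comma.
fn totals() -> (i32, i32, i32) {
    (sum_all!(), sum_all!(1, 2, 3), sum_all!(1, 2, 3,))
}
```

Both sum_all!(1, 2, 3) and sum_all!(1, 2, 3,) go through the same rule, so no second "forwarding" rule is needed.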
    

Guide-level explanation

In Rust macros, you specify some “rules” which define how the macro is used and what it transforms to. For each rule, there is a pattern and a body:

macro_rules! foo {
  (pattern) => { body }
}

The pattern portion is composed of zero or more subpatterns concatenated together. One possible subpattern is to repeat another subpattern some number of times. This is extremely useful when writing variadic macros (e.g. println):

macro_rules! println {
  // Takes a variable number of arguments after the template
  ($template:expr $(, $args:expr)*) => { ... }
}

which can be invoked like so:

println!("");           // 0 args
println!("", foo);      // 1 arg
println!("", foo, bar); // 2 args
...

The * in the pattern of this example indicates “0 or more repetitions”. One can also use + for “at least one repetition” or ? for “at most one repetition”.

In the body of a rule, one can specify that some code be repeated for every occurrence of the pattern in the invocation:

macro_rules! foo {
  ($($pat:expr),*) => {
    $(
      println!("{}", $pat);
    )* // Repeat for each `expr` passed to the macro
  }
}

The same can be done for + and ?.

The ? operator is particularly useful for making macro rules with optional components in the invocation or for making macros tolerate trailing commas.
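
As a rough illustration of an optional component, the sketch below assumes the proposed ? operator is available in both the pattern and the body. The macro greet, its punct = ... syntax, and the helper greetings are all made up for this example:

```rust
// Hypothetical macro: builds a greeting with an optional punctuation suffix.
macro_rules! greet {
    ($name:expr $(, punct = $p:expr)?) => {{
        let mut s = String::from("hello, ");
        s.push_str($name);
        // Expanded only when the optional fragment appeared in the invocation.
        $( s.push_str($p); )?
        s
    }};
}

// Helper used only to exercise the macro with and without the optional part.
fn greetings() -> (String, String) {
    (greet!("world"), greet!("world", punct = "!"))
}
```

The body uses $( ... )? in the same way it would use $( ... )* for a repetition: the enclosed code is emitted zero times or once, depending on whether the optional part matched.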

Reference-level explanation

? is identical to + and * in use except that it represents “at most once” repetition. The implementation ought to be very similar to theirs. If I understand correctly, only the parser needs to change. I don’t think it would be technically difficult to implement, nor do I think it would add much complexity to the compiler.

The ? character is chosen because

  • While there are grammar ambiguities, they can be easily fixed, as noted by @kennytm:

    There is ambiguity: $($x:ident)?+ today matches a?b?c and not a+. Fortunately this is easy to resolve: you just look one more token ahead and always treat ?* and ?+ to mean separate by the question mark token.

  • It is consistent with common regex syntax, as are + and *

  • It intuitively expresses “this pattern is optional”
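
The one-token lookahead described in the first bullet could be sketched roughly as follows. This is not the compiler's actual parser: tokens are modeled as single characters, the names KleeneOp and parse_rep_suffix are hypothetical, and rejecting a separator followed by ? is an assumption of this sketch rather than something the proposal specifies.

```rust
#[derive(Debug, PartialEq)]
enum KleeneOp {
    ZeroOrMore, // *
    OneOrMore,  // +
    ZeroOrOne,  // ? (the proposed operator)
}

// Classify the tokens that follow `$(...)`. Returns the optional separator
// and the repetition operator, or `None` if the suffix is not well formed.
fn parse_rep_suffix(tokens: &[char]) -> Option<(Option<char>, KleeneOp)> {
    // Recognize the two operators that exist today.
    fn op(c: char) -> Option<KleeneOp> {
        match c {
            '*' => Some(KleeneOp::ZeroOrMore),
            '+' => Some(KleeneOp::OneOrMore),
            _ => None,
        }
    }

    let first = *tokens.first()?;
    // A leading `*` or `+` is the operator itself, with no separator.
    if let Some(kleene) = op(first) {
        return Some((None, kleene));
    }
    // Otherwise look one token ahead.
    match tokens.get(1).copied().and_then(op) {
        // Separator followed by `*` or `+`. This includes `?*` and `?+`,
        // which keep today's meaning of "separated by `?`".
        Some(kleene) => Some((Some(first), kleene)),
        // A `?` not followed by `*` or `+` is the new at-most-once operator.
        None if first == '?' => Some((None, KleeneOp::ZeroOrOne)),
        // Assumption of this sketch: everything else is rejected.
        None => None,
    }
}
```

With this rule, $($x:ident)?+ continues to mean "one or more idents separated by ?", while $($x:ident)? on its own becomes the new optional form.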

Drawbacks

I can’t really think of anything. Feel free to suggest.

Rationale and Alternatives

One alternative to alleviate the trailing comma paper cut is to allow trailing commas automatically for any pattern repetitions. This would be a breaking change. Also, it would allow trailing commas in potentially unwanted places. For example:

macro_rules! foo {
  ($($pat:expr),*; $(foo),*) => {
    $(
      println!("{}", $pat);
    )* // Repeat for each `expr` passed to the macro
  }
}

would allow

foo! {
  x,; foo
}

Also, rather than have ? be a repetition operator, we could have the compiler do a “copy/paste” of the rule and insert the optional pattern. Implementation-wise, this might reuse less code than the proposal. Also, it’s probably less easy to teach; this RFC is very easy to teach because ? is another operator like + or *.

We could use another symbol other than ?, but it’s not clear what other options might be better. ? has the advantage of already being known in common regex syntax as “optional”.

It has also been suggested to add {M, N} (at least M but no more than N) either in addition to or as an alternative to ?. Like ?, {M, N} is common regex syntax and has the same implementation difficulty level. However, it’s not clear how useful such a pattern would be. In particular, we can’t think of any other language that includes this sort of “partially-variadic” argument list. It is also questionable why one would want to syntactically repeat some piece of code between M and N times. Thus, this RFC does not propose to add {M, N} at this time (though we note that it is forward-compatible).

Finally, we could do nothing and wait for macros 2.0. However, it will be a while (possibly years) before that lands in stable Rust. The current implementation and proposals are not very well-defined yet. Having something in the meantime would be nice to fix this paper cut. This proposal does not add a lot of complexity, but does nicely fill the gap.

Unresolved Questions

None that I can think of…


#2

Why is $b2 expanded using *?

There is ambiguity: $($x:ident)?+ today matches a?b?c and not a+. Fortunately this is easy to resolve: you just look one more token ahead and always treat ?* and ?+ to mean separate by the question mark token.

$($x:tt){2,7} :wink:


#3

@kennytm Thanks!

TBH, I thought this was the standard way to expand the repetition… Is it not? The Book doesn’t seem to say. I’m now guessing you are supposed to use *, +, or ? – whichever you used in the pattern?

Ah, I wasn’t sure if this was allowed or not… Also not very well specified in the Book.

I still don’t quite understand when this would be useful, though. Could you give an example use case?


#4

Having ? would be super useful precisely for the reasons you mention…

… However, do we want to improve macros by example at all?

Should we not wait for macros 2.0 and provide this there?

AFAIK macro_rules! will be deprecated eventually?


#5

@Centril Macros 2.0 is moving a bit slowly, and all the documentation and drafts I can find are vague and hand-wavy. I suspect it will be years before we get a stable macros 2.0, and in the meantime, this should be a workable, relatively simple fix for a paper cut.


#6

Yes if it’s defined as $(…)+ then you’re supposed to use + to expand it, though I’m not sure if the compiler will check this.

I don’t think it is useful, I just mean instead of N?, we could reuse the standard regex notation {0,N}.


#7

I guess it’s a possibility… will add to the “alternatives” section

Also, will add this to alternatives


#8

This is all super true - I’ve never read an RFC as vague as the macros 2.0 one. I feel that after the fact, this should have been an eRFC.

So I buy completely your answer.

That’s good. But given your answer, I wouldn’t put too much emphasis on the alternative.


#9

Even if we had {0, N} notation, I still think ? should be there. So I see them as complementary rather than as alternatives. The ? qualifier on rules is so pervasive and common that it makes sense to have custom syntax for it.


#10

I don’t mean we replace ? with {0,1}, I mean if we do want to support “repeat at most 8 times” then {0,8} should be better than 8?. This means we cover all regex quantifiers: *, +, ? and {m,n}.


#11

Sure, that sounds all good to me =) But {m, n} may cause more breakage?


#12

{m,n} does not cause any more breakage than ?; it only has special meaning when appearing after $(…). That said, as OP mentioned, {m,n} is not very useful, unlike ?.


#13

I personally have seldom needed {m,n}. However, I would like it considered during the RFC process of adding ?, precisely because it is included among the regex qualifiers due to sufficiently frequent need. Just because I haven’t needed it doesn’t mean that others won’t need it.


#14

After thinking about it, I don’t think {m,n} should be included in the RFC. {m,n} is meaningful in regex because many regexes actually do have meaningful repeat-at-least-m-but-no-more-than-n patterns. The same is not true of code IMHO:

  • Many languages have variadic or optional function arguments, but I don’t know of any that have semi-variadic functions that can take between m and n arguments.
  • It’s very rare to want to repeat some construct syntactically between m and n times in the source code. Usually, people use loops. One could imagine doing some sort of loop-unrolling in a macro, but this seems extremely niche.

#15

Updated the original post again…


#16

Ok, I think the conversation here seems to have settled, so I will open an actual RFC :slight_smile:


#17

Thanks for all the discussion!

RFC: https://github.com/rust-lang/rfcs/pull/2298


#18

Request for comments/FCP on newest implementation: https://github.com/rust-lang/rust/issues/51934