Pre-RFC: named capture groups for macros

Problem Statement

Working with multiple capture (repetition) groups in macros gets confusing fast. For example, consider this metavariable expressions example from The Little Book of Rust Macros:

macro_rules! attach_iteration_counts {
    ( $( ( $( $inner:ident ),* ) ; )* ) => {
        ( $(
            $((
                stringify!($inner),
                ${index(1)}, // this targets the outer repetition
                // and this, being an alias for `index(0)` targets the inner repetition
                ${index()}
            ),)*
        )* )
    };
}

The nested repetition is somewhat difficult to understand, and I think it gets even more confusing with soon-to-be-stabilized metavariable expressions, where index(0) and index(1) refer to repetitions by nesting depth (counted outward from the innermost repetition) rather than by anything visible in the matcher. Diagnostics are also unhelpful when the wrong index is used or repetitions are mismatched.
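For comparison, the indices that ${index(1)} and ${index()} produce can be approximated on stable Rust with runtime counters. This is only a sketch: the macro name attach_iteration_counts_stable is made up, and it collects into a Vec rather than building a tuple. It also has the same readability problem, since only nesting position tells the reader which $( ... )* is the outer repetition.

```rust
// Sketch (hypothetical macro): approximates `${index(1)}` (outer) and
// `${index()}` (inner) on stable Rust with runtime counters instead of
// metavariable expressions.
macro_rules! attach_iteration_counts_stable {
    ( $( ( $( $inner:ident ),* ) ; )* ) => {{
        let mut out: Vec<(&'static str, usize, usize)> = Vec::new();
        let mut outer_idx = 0usize;
        // Outer repetition: one pass per `( ... ) ;` group.
        $(
            let mut inner_idx = 0usize;
            // Inner repetition: one pass per identifier in the current group.
            $(
                out.push((stringify!($inner), outer_idx, inner_idx));
                inner_idx += 1;
            )*
            outer_idx += 1;
        )*
        out
    }};
}

fn main() {
    let counts = attach_iteration_counts_stable!((a, b); (c););
    // prints [("a", 0, 0), ("b", 0, 1), ("c", 1, 0)]
    println!("{:?}", counts);
}
```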

Proposal

Adopt the syntax $group_ident:( $inner ),* to create a capture group named group_ident. The group can then be expanded in the macro body with $group_ident( /* ... */ )*.

Additionally, allow this name to be used to identify the group in metavariable expressions, e.g. ${index($group_ident)}.

Example

The above macro could then be rewritten as:

macro_rules! attach_iteration_counts {
    ( $outer_group:( ( $inner_group:( $inner:ident ),* ) ; )* ) => {
        ( $outer_group(
            $inner_group((
                stringify!($inner),
                ${index($outer_group)}, // this targets the outer repetition
                // and this, being an alias for `index(0)` or `index($inner_group)`,
                // targets the inner repetition
                ${index()}
            ),)*
        )* )
    };
}

I find this quite a bit easier to understand. Repetition-related diagnostics would also be able to refer to groups by name, rather than only pointing at code locations and referring to groups by index.

This is roughly equivalent to named capture groups in regex that contain repeated tokens, e.g. (?<group1>(?<ident1>\w+),?)*. It is not exactly equivalent, since in regex a repeated group only keeps its last match, so ident1 ends up holding a single identifier rather than one per repetition.

Future Possibilities

Since named capture groups provide an unambiguous way to refer to specific repetitions within the macro body, it may also be possible to repeat a group in the expansion without using any of the metavariables it captures:

macro_rules! demo {
    ( $group1:($ident1:ident),*; $group2:($ident2:ident),* ) => {
        // repetition that uses `$ident1`: works today
        $group1( const $ident1: u32 = 0; )*
        // repetition that doesn't use `$ident1`: no direct stable syntax for this,
        // since the compiler cannot tell which group should drive the repetition
        $group1( println!("Hello, world!"); )*
    }
}

Note that the metavariable expression ${ignore($ident1)} will provide a way to do this in the near future, but referring to the capture groups directly by name seems cleaner than indirectly referring to them by what they capture.
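Until ${ignore(...)} is stable, a common workaround is to mention the metavariable inside a helper macro that expands to nothing, which tells the compiler which repetition drives the block without emitting anything for it. A minimal sketch, where discard! and hello_per_ident! are hypothetical helpers (not part of std); it also shows the indirection that naming the group would remove:

```rust
// Hypothetical helper: accepts any tokens and expands to nothing.
macro_rules! discard {
    ($($t:tt)*) => {};
}

// Emits one "Hello, world!" per captured identifier without using the
// identifiers themselves; `discard!($ident1)` only ties the repetition to `$ident1`.
macro_rules! hello_per_ident {
    ( $( $ident1:ident ),* ) => {{
        let mut out: Vec<&'static str> = Vec::new();
        $(
            discard!($ident1);
            out.push("Hello, world!");
        )*
        out
    }};
}

fn main() {
    for line in hello_per_ident!(a, b, c) {
        println!("{line}");
    }
}
```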

Named groups would also make nested repetitions more explicit and avoid ambiguity between them:

macro_rules! foo {
    (
        sig: ($arg_group:($arg:ident, $ty:ty),+) -> $ret_ty:ty;
        $name_group:(name: $fn_name:ident)?
    ) => {
        // This does not currently work and is confusing, partly
        // because it is ambiguous whether this group refers to the one
        // |<-- that captures `$arg` or the one that captures `$fn_name`
        // $(
        //     fn $fn_name($($arg: $ty),+) -> $ret_ty {
        //         todo!()
        //     }
        // )?
        //
        // The error for this is:
        // `error: meta-variable `fn_name` repeats 0 times, but `arg` repeats 1 time`

        // Clearer about which group each repetition refers to, so nested
        // repetition is less problematic
        $name_group(
            fn $fn_name($arg_group($arg: $ty),+) -> $ret_ty {
                todo!()
            }
        )?

        // (a full example would need some way to handle `fn_name`
        // not being specified, e.g. coalescing to a default)
    }
}

foo!(
    sig: (i32, i32) -> i32;
);
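On stable Rust, one way to get the expansion the commented-out code was after, plus the default name the comment above mentions, is to forward the optional name to a helper macro with one arm per case, so the two repetitions never have to be nested. This is a sketch under a few assumptions: make_fn! and the fallback name default_fn are hypothetical, the signature is written as name: Type pairs rather than the ident, ty pairs above, and the body returns Default::default() instead of todo!() so the generated functions can actually be called.

```rust
// Hypothetical helper: the first arm supplies a default name when none is given.
macro_rules! make_fn {
    ( ; $($arg:ident : $ty:ty),+ ; $ret:ty ) => {
        make_fn!(default_fn; $($arg: $ty),+; $ret);
    };
    ( $name:ident ; $($arg:ident : $ty:ty),+ ; $ret:ty ) => {
        #[allow(unused_variables)]
        fn $name($($arg: $ty),+) -> $ret {
            // Stand-in for `todo!()` so the generated function can be called.
            Default::default()
        }
    };
}

// Stable version of `foo!`: the optional name is forwarded as-is, so the
// `$(...)?` and `$(...),+` repetitions are never nested inside each other.
macro_rules! foo {
    ( sig: ($($arg:ident: $ty:ty),+) -> $ret_ty:ty; $(name: $fn_name:ident)? ) => {
        make_fn!($($fn_name)? ; $($arg: $ty),+ ; $ret_ty);
    };
}

foo!(sig: (a: i32, b: i32) -> i32; name: add);
foo!(sig: (x: u8) -> u8;);

fn main() {
    // prints "0 0"
    println!("{} {}", add(1, 2), default_fn(7));
}
```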

Is there a specific grammar reason for the : between the name and the parentheses?

Not necessarily, it just seemed more consistent with regular fragment specifiers.

I turned this into an RFC: [RFC] Named macro capture groups (rust-lang/rfcs pull request #3649, by tgross35)

Ran into the missing feature of matching non-metavariable groups recently. Big fan of this.