Pattern Grammar Ambiguity

(previous discussion on URLO)

The pattern grammar given in the reference has an ambiguity in it, which can lead to the compiler generating an unexpected error.

Consider the case where you want to initialize a mutable variable y as the target of an immutable reference x. Following the grammar rules listed, you can build a pattern like this:

  1. IdentifierPattern: ref? mut? IDENTIFIER (@ PatternNoTopAlt ) ? ⇒ mut y
  2. PatternWithoutRange: ... | IdentifierPattern | ... ⇒ mut y
  3. ReferencePattern: (& |&& ) mut? PatternWithoutRange&mut y

Continuing unchanged to Pattern&mut y.

So, according to the grammar listed in the reference, the following code should be legal:

fn main() {
    let x = &42;
    let &mut y = x;
    y *= 2;
    dbg!(y);
}

It, however, fails to compile with a type mismatch error. This is because the compiler assigns the mut token not to the mut? clause in IdentityPattern but to the mut? clause in ReferencePattern instead.

There is a practical workaround (use &(mut y) as the pattern instead), but it feels uncomfortable for the language grammar to contain this kind of ambiguity: A different parser operating on the same grammar could come up with different interpretations of the same program.

4 Likes

Does the grammar definition use an ordered set of alternates? That results in the current interpretation being the valid one.

The "Notation" section describes | as "Either one or another" and ? as "An optional item". Though it isn't explicit, these imply a BNF-style (generative) grammar instead of a PEG-style (ordered/greedy) grammar.

1 Like

Ambiguity (and to some degree, precedence) currently isn't covered in the reference. There isn't a full description of Rust syntax at the moment (other than the rustc implementation as a reference).

3 Likes

If &mut y was interpreted as &(mut y) by default, then would you have to write &mut (y) to match an actual mutable reference? This seems like it would be much more painful.

1 Like

An alternative for this specific case would be to let type inference resolve the ambiguity, as only one of the two choices will typecheck in any given situation. I suspect that would be more trouble than it’s worth to implement, however.

A more practical approach is to figure out how to add these small nuances of Rust’s actual behavior to the documentation, without making it too cumbersome to read.

Is that even a thing with LR-style parsers? The only place I've seen it so far is with PEGs, which are essentially versions of LL(1)-style parsers that are nerfed in terms of expressive power, but are easier to write. Since LR-style parsers can track multiple possible recognitions at the same time (kind of a consequence of the R in the LR) ordered choice isn't usually a restriction that needs to be made.

This seems like an overly-narrow view of PEGs, given that they can recognize some languages that are provably impossible for BNF-family grammars (with 𝒜𝓃𝓃𝒞𝓃 being the classic example). Last I checked, it was an open question whether or not there are any BNF languages that cannot be expressed in PEG.

FWIW, &mut mut y does already work too, parsed as a mutable reference and mutable binding.

1 Like

Given that a slightly extended version of (E)BNF in general is in principle capable of describing any context-free language¹ and even beyond², it would surprise me if there weren't any. PEGs are not capable of recognizing just any old context-free language, and are restricted to a proper subset precisely because of that ordered choice. Also note that 𝒜𝓃ℬ𝓃𝒞𝓃 is not context-free, so I'd love to see how a PEG-derived parser could parse that.

¹As demonstrated by the existence of e.g. SGLR parsing. The extensions are to provide disambiguations of the grammar, mainly through precedence and associativity.

² It seems to be possible to coax tree-sitter into using a context-aware lexer

This reminds me of when I solved this problem:

fn main() {
    let mut i = 0;
    let x = Some(&mut i);
    // challenge: increment the `0` without marking `x` mutable!

    // <- your code here
    
    // this should pass:
    assert_eq!(**x.as_ref().unwrap(), 1);
}
Solution:
fn main() {
    let mut i = 0;
    let x = Some(&mut i);
    // challenge: increment the `0` without marking `x` mutable!

    if let Some(&mut ref mut r) = x {
        *r += 1;
    }
    
    // this should pass:
    assert_eq!(**x.as_ref().unwrap(), 1);
}
3 Likes

Using the notation from the original PEG paper:

ABC <- &(AB !'b') 'a'* BC !.
AB <- 'a' AB 'b' / ε
BC <- 'b' BC 'c' / ε

Edit: A version of this appears in the paper on page 6, under the heading “Theorem: The class of PELs includes non-context-free languages.”


There are plenty of non-context-free languages that can be expressed in PEG, mostly thanks to the & and ! operators combined with the ordered choice scheme. It’s still unclear, however, whether or not PEG is a superset of CFG, or simply an unrelated set with some overlap.

1 Like

How does the ferrocene draft specification handle this ambiguity? If it doesn't, that should be reported.

This does seem like the correct binding strength. It should at the current stage of the game be specified in prose to resolve the ambiguity.

Long-term, we'd like to have a machine-readable grammar with machine-readable ambiguity resolution rules. However, the tooling to properly do so doesn't really exist yet.

2 Likes

I can’t find anything in that document that addresses the ambiguity. In the preamble, they explicitly define the grammar to be context-free, with prose extensions to describe contextual restrictions. The formal grammar there is essentially the same as the one in the reference:

§5.1.1 Identifier Patterns

IdentifierPattern ::=
   ref? mut? Binding BoundPattern?

BoundPattern ::=
   @ Pattern

An identifier pattern yields a binding. An identifier pattern with keyword mut yields a mutable binding. …


§5.1.6 Reference Patterns

ReferencePattern ::=
   & mut? PatternWithoutRange

The type of a reference pattern is determined as follows:

So, the effects of the ambiguity are spelled out more explicitly but not resolved. The pattern & mut y has two valid parse trees with different semantics: One assigns the mut keyword to the ReferencePattern, matching a mutable reference, and the other assigns it to the IdentifierPattern, generating a mutable binding.


Edit: Issue #239 filed against Ferrocene’s specification repository.