Possible parser defect

according to my best reading of the rust reference, the following code should compile, but it fails with a parse error:

fn main() {
    { 8 } == 1;
}


note that this does not fail when the block is on the right side of the comparison. this probably belongs in weird-exprs.rs. in general a lot of the forms of ExpressionStatement are rarely used, and thus probably under-tested. this is the same way we got the break rust; internal compiler error.


the rust reference says:

This can cause an ambiguity between [a block expression] being parsed as a standalone statement and as a part of another expression; in this case, it is parsed as a statement.

so afaict this means { 8 } == 1; is parsed as if { 8 } is a standalone statement and then == 1; is an invalid expression so it errors.
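For anyone who hits this, here is a minimal sketch of forms that should sidestep the statement-position rule quoted above. The function names are made up for illustration. The idea is simply to keep the block from being the first token of a statement, so it is parsed as part of the larger expression.

```rust
// Hypothetical helper names; each returns the comparison result so it can be inspected.

fn parenthesized() -> bool {
    // The statement starts with `(`, so the block is parsed in expression position.
    ({ 8 }) == 1
}

fn bound_with_let() -> bool {
    // The right-hand side of `let` is always parsed as an expression.
    let result = { 8 } == 1;
    result
}

fn block_on_right() -> bool {
    // As noted in the original post, a block on the right-hand side is fine.
    1 == { 8 }
}

fn main() {
    println!("{}", parenthesized());
    println!("{}", bound_with_let());
    println!("{}", block_on_right());
}
```

All three compare 8 with 1, so each should evaluate to false. The compiler may emit lint warnings about style, but none of these forms should hit the parse error.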

the fact that rust syntax has to be explained in prose instead of formally doesn't feel great.


In the past I've looked for a normative grammar document. I could never find one, unfortunately, and the lack of one bothers me too. Even if such a definition were never used directly for parser generation, it seems like a fundamental piece of documentation at the very least.


Arguably the ambiguity is visible in the grammar present in the reference, it just doesn't show how the ambiguity is resolved.

that's because they decided to use BNF to express the grammar. BNF is designed in terms of generating text rather than parsing it, so it can be ambiguous: in particular, the BNF | operator doesn't tell you which side to match if both sides could. if they rewrote it as a PEG, it would be unambiguous, since a PEG instead has the / ordered choice operator, which always picks the first alternative that matches the input.
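To make the difference concrete, here is a toy sketch. It is not a real parser, and the function names are hypothetical. It treats each alternative as a literal prefix and contrasts "first match wins" (like the PEG / operator) with "every match is a valid derivation" (like the BNF | operator).

```rust
/// PEG-style ordered choice: commit to the first alternative that matches.
fn ordered_choice<'a>(input: &str, alternatives: &[&'a str]) -> Option<&'a str> {
    alternatives
        .iter()
        .copied()
        .find(|alt| input.starts_with(*alt))
}

/// BNF-style choice: every matching alternative is a possible derivation,
/// so more than one result means the choice is ambiguous at this point.
fn unordered_choice<'a>(input: &str, alternatives: &[&'a str]) -> Vec<&'a str> {
    alternatives
        .iter()
        .copied()
        .filter(|alt| input.starts_with(*alt))
        .collect()
}

fn main() {
    let input = "ab";

    // Two candidates both match, and nothing says which to prefer.
    println!("{:?}", unordered_choice(input, &["a", "ab"]));

    // Ordered choice resolves it purely by position in the rule.
    println!("{:?}", ordered_choice(input, &["a", "ab"]));
    println!("{:?}", ordered_choice(input, &["ab", "a"]));
}
```

Note that swapping the order of the alternatives changes the result of the ordered choice, which is exactly the sense in which a PEG encodes a disambiguation decision in the grammar itself.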


There was a declarative grammar previously, but it disagreed with the reference prose and the implementation. There was also wg-grammar, and we did do some work on making a grammar, but the end state was the "ungrammar" used by rust-analyzer and a conclusion that there weren't any good machine-verifiable tools for specifying the syntax that can express the disambiguation precedence in an intuitive manner.

A formal grammar that correctly parses Rust but does so in a way that obfuscates the syntax structure solely for the purpose of formally encoding syntax precedence isn't particularly beneficial to anyone. All of the real tools want to parse a superset of Rust so they can offer decent error messages and recovery; the point of the reference grammar isn't to actually parse Rust, but to inform implementations how to parse Rust.

For this, the semiformal grammar in The Reference along with the prose description of disambiguation is currently seen as sufficient. I'm not an active part of the spec team, but the eventual spec document will include a formal definition of the syntax/parse of valid Rust programs.


This is an interesting article about ambiguities in grammars:

  • PEGs are unambiguous, not because the syntax can be parsed in only one way (i.e. is compatible with deterministic parsers), but because a PEG simply makes a specific choice when it encounters an ambiguity.

  • BNF is likely to be ambiguous, but making it unambiguous may require rewriting it with extra terms, and such a transformation is often unintuitive and harder for humans to understand.

