Grammar of Rust

Hi all

In June I started trying to verify the grammar inside the compiler, beginning with getting the make check-grammar target to build: https://github.com/rust-lang/rust/pull/34994

Now I am a little stuck understanding the current grammar infrastructure. Can anyone give me a short introduction?

I am currently at rustfest.eu; if you are here as well, please find me.

Regards Stefan

The grammar situation in general is quite poor at the moment; the language is basically defined by the parser. There is a plan to improve this (GitHub issue) and provide a reference grammar, but it is far from complete.

I don’t know about the grammar-checking infrastructure in the repo, but I would guess it has to start with a proper grammar definition, and there isn’t one at the moment.

Currently available options are:

Ah, and the README at https://github.com/rust-lang/rust/tree/master/src/grammar is completely misleading.

In that folder, only the lexer is implemented in ANTLR4 (a Java tool). I think there was some code to compare it with rustc’s native lexer, but it isn’t run as part of the build.

That folder also contains lexer and parser definitions for lex/bison (C tools). There is no code that compares syntax trees, though.

My first goal was to finish the ANTLR4 grammar to generate a lexer. This would enable checks such as: if the lexer rejects a source file, the compiler must reject it too.
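The check described above is an implication over a test corpus: "grammar lexer rejects" must imply "rustc rejects". A minimal sketch, assuming both tools have already been run over each file and their accept/reject results recorded; CorpusResult and soundness_violations are hypothetical names, not existing tooling in the repo.

```rust
/// Outcome of running both tools over one source file (hypothetical record type).
#[derive(Debug, Clone)]
pub struct CorpusResult {
    pub path: String,
    /// Did the grammar-derived (e.g. ANTLR4) lexer accept the file?
    pub grammar_lexer_accepts: bool,
    /// Did rustc accept the file?
    pub rustc_accepts: bool,
}

/// Returns the paths that break the rule "grammar lexer rejects => rustc rejects".
/// Such a file means the reference lexer is too strict (or rustc too lenient).
/// Files the lexer accepts but rustc rejects are fine: lexing is only the first stage.
pub fn soundness_violations(results: &[CorpusResult]) -> Vec<&str> {
    results
        .iter()
        .filter(|r| !r.grammar_lexer_accepts && r.rustc_accepts)
        .map(|r| r.path.as_str())
        .collect()
}

fn main() {
    // Invented sample data, for illustration only.
    let results = vec![
        CorpusResult { path: "ok.rs".into(), grammar_lexer_accepts: true, rustc_accepts: true },
        CorpusResult { path: "type_error.rs".into(), grammar_lexer_accepts: true, rustc_accepts: false },
        CorpusResult { path: "raw_str.rs".into(), grammar_lexer_accepts: false, rustc_accepts: true },
    ];
    for path in soundness_violations(&results) {
        println!("lexer rejected but rustc accepted: {path}");
    }
}
```

Only the one-directional implication is checked; the converse (lexer accepts implies compiler accepts) would be wrong, since most compile errors happen after lexing.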


parser-lalr.y, parser-lalr-main.c, tokens.h and lexer.l are extracts from the more complete rust-grammar repo: https://github.com/bleibig/rust-grammar

It is the most up-to-date grammar I’m aware of. Everything from before 2015, when the language underwent its final rush to stability, is probably junk due to constant grammar churn (all the ANTLR work is hopelessly bitrotted, and ANTLR is not an appropriate tool anyway).

IMO Rust isn’t ideally thought of as LL(k) anymore; I tried to keep it that way for a long time, but it has grown a lot of constructs that work better with an LR-family parser. I highly recommend just deleting anything ANTLR-related and focusing on LR(k) or LALR(k) grammars.
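As a toy illustration of the LL/LR difference (not Rust's actual grammar): a left-recursive rule such as E → E '+' NUM sends a naive LL/recursive-descent parser into infinite recursion unless the rule is rewritten, while a shift-reduce parser handles it directly and yields left-associative trees. The sketch below hard-codes the two reductions instead of using a table generated by an LALR tool, so it is only a minimal model of the technique; Tok, Sym, reduce and parse are hypothetical names.

```rust
#[derive(Debug, Clone, Copy)]
enum Tok { Num(u32), Plus }

/// Grammar symbols on the parse stack; E carries a printed tree.
#[derive(Debug)]
enum Sym { Num(u32), Plus, E(String) }

/// One reduction step; returns false when no handle is on top of the stack.
fn reduce(stack: &mut Vec<Sym>) -> bool {
    let n = stack.len();
    // Rule 1: E -> E '+' NUM (left-recursive: E appears first on the right-hand side).
    if n >= 3 && matches!(&stack[n - 3..], [Sym::E(_), Sym::Plus, Sym::Num(_)]) {
        let rhs = stack.pop();
        stack.pop(); // the '+'
        let lhs = stack.pop();
        if let (Some(Sym::E(l)), Some(Sym::Num(r))) = (lhs, rhs) {
            stack.push(Sym::E(format!("({l} + {r})")));
        }
        return true;
    }
    // Rule 2: E -> NUM, only for the first number of the input.
    if matches!(stack.as_slice(), [Sym::Num(_)]) {
        if let Some(Sym::Num(x)) = stack.pop() {
            stack.push(Sym::E(x.to_string()));
        }
        return true;
    }
    false
}

fn parse(tokens: &[Tok]) -> Result<String, String> {
    let mut stack: Vec<Sym> = Vec::new();
    for tok in tokens {
        // Shift: push the next input token onto the stack.
        stack.push(match tok {
            Tok::Num(n) => Sym::Num(*n),
            Tok::Plus => Sym::Plus,
        });
        // Reduce: rewrite handles on top of the stack until none remain.
        while reduce(&mut stack) {}
    }
    // Accept only if the whole input reduced to a single E.
    match stack.as_slice() {
        [Sym::E(tree)] => Ok(tree.clone()),
        _ => Err(format!("syntax error, leftover stack: {stack:?}")),
    }
}

fn main() {
    let toks = [Tok::Num(1), Tok::Plus, Tok::Num(2), Tok::Plus, Tok::Num(3)];
    println!("{:?}", parse(&toks)); // Ok("((1 + 2) + 3)")
}
```

The nesting ((1 + 2) + 3) falls out of the left-recursive rule with no grammar rewriting, which is the kind of convenience LR-family generators such as bison or LALRPOP provide at scale.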

The verify.rs and testparser.py tools come from the rust-grammar repo, but they are only part of it; they’re intended to be used with the rlex / rparse stubs in that repo, which generate a comparable syntax-tree dump from the production compiler. This is a reasonable approach to bringing a grammar-derived parser up to parity, but it has bitrotted somewhat. Also, of course, real lex and yacc are unlikely to be the targets you want to stick with long term; they’re a stopgap.
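The core of that approach is comparing two syntax-tree dumps and reporting where they first disagree. A minimal sketch, assuming both dumps are s-expressions whose atoms contain no whitespace or quoted strings; the dump format and the function names sexp_tokens and first_divergence are invented for illustration, and the real rlex/rparse output may look different.

```rust
/// Split an s-expression dump into "(", ")" and atom tokens, ignoring whitespace.
fn sexp_tokens(dump: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut atom = String::new();
    for c in dump.chars() {
        if c == '(' || c == ')' || c.is_whitespace() {
            // A delimiter ends the current atom, if any.
            if !atom.is_empty() {
                tokens.push(std::mem::take(&mut atom));
            }
            if !c.is_whitespace() {
                tokens.push(c.to_string());
            }
        } else {
            atom.push(c);
        }
    }
    if !atom.is_empty() {
        tokens.push(atom);
    }
    tokens
}

/// Compare two dumps token by token, so layout differences are ignored.
/// Returns None if equivalent, else (token index, expected, found).
fn first_divergence(reference: &str, candidate: &str) -> Option<(usize, String, String)> {
    let a = sexp_tokens(reference);
    let b = sexp_tokens(candidate);
    let len = a.len().max(b.len());
    (0..len).find_map(|i| {
        let x = a.get(i).cloned().unwrap_or_else(|| "<end>".to_string());
        let y = b.get(i).cloned().unwrap_or_else(|| "<end>".to_string());
        (x != y).then(|| (i, x, y))
    })
}

fn main() {
    // Hypothetical dumps of `1 + 2` from the production compiler and a grammar-derived parser.
    let reference = "(ExprBinary Add (ExprLit 1) (ExprLit 2))";
    let candidate = "(ExprBinary Add\n  (ExprLit 1)\n  (ExprLit 3))";
    match first_divergence(reference, candidate) {
        None => println!("trees match"),
        Some((i, want, got)) => println!("token {i}: expected {want:?}, found {got:?}"),
    }
}
```

Because parentheses are kept as tokens, equal token sequences imply equal tree structure, so a flat comparison is enough here.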

Niko is working on https://github.com/nikomatsakis/lalrpop, which should eventually be a good target for building an LR-family frontend for Rust, based on a grammar similar to the one in the yacc file.

HTH.


Hm, after some further digging, these are the three best (live, post-2015) resources:

- Source of the lex/yacc (LALR) grammar in the Rust repo: https://github.com/bleibig/rust-grammar
- Niko’s partial port of that grammar from yacc to LALRPOP: https://github.com/nikomatsakis/rustypop
- Jason’s ANTLR4 grammar: https://github.com/jorendorff/rust-grammar

I would recommend focusing on Niko’s (rustypop), though it might be worth emailing Jason to ask what the story is with his. I suspect it was written as part of the work on the Programming Rust book: http://shop.oreilly.com/product/0636920040385.do
