Grammar of Rust


#1

Hi all

In June I started trying to verify the grammar inside the compiler. My first step was getting the make check-grammar target to build: https://github.com/rust-lang/rust/pull/34994

Now I am a little stuck on understanding the current grammar infrastructure. Can anyone give me a short intro?

I am currently at rustfest.eu. If you are here as well, please find me.

Regards, Stefan


#2

The grammar situation in general is quite poor at the moment: the language is basically defined by the parser. There is a plan to make things better (GitHub issue) and provide a reference grammar, but it is far from complete.

I don’t know about grammar checking infra in the repo, but I guess it should start with a proper grammar definition, and there isn’t one at the moment.

Currently available options are:


#3

Ah, and the README at https://github.com/rust-lang/rust/tree/master/src/grammar is completely misleading.

In that folder, only the lexer is implemented in ANTLR4 (a Java tool). I think there was some code to compare it with rustc's native lexer, but it isn’t run as part of the build.

That folder also contains a lexer and parser definition for lex/bison (C tools). There is no code that compares syntax trees, though.


#4

My first goal was to finish the ANTLR4 grammar to create a lexer. This would enable checks such as: if the lexer does not accept a source file, the compiler must not accept it either.
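A minimal sketch of that check, with everything in it hypothetical: `Verdict`, `find_violations`, and the two closures are illustrative names, and the closures stand in for however the grammar-derived lexer and the compiler would actually be invoked. It only reports files the reference lexer rejects but the compiler accepts.

```rust
/// Outcome of running one tool on one source file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Accepted,
    Rejected,
}

/// Returns the names of all files that break the invariant
/// "rejected by the reference lexer implies rejected by the compiler".
///
/// `sources` holds (file name, file contents) pairs. Both closures are
/// placeholders (assumptions) for the real tools.
pub fn find_violations<'a, L, C>(
    sources: &[(&'a str, &'a str)],
    reference_lexer: L,
    compiler: C,
) -> Vec<&'a str>
where
    L: Fn(&str) -> Verdict,
    C: Fn(&str) -> Verdict,
{
    let mut violations = Vec::new();
    for &(name, text) in sources {
        // Only this one direction is checked: lexer says no, compiler says yes.
        if reference_lexer(text) == Verdict::Rejected && compiler(text) == Verdict::Accepted {
            violations.push(name);
        }
    }
    violations
}
```

The opposite direction (compiler rejects, lexer accepts) is deliberately not flagged, since a file can lex fine and still fail to parse or type-check.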


#5

parser-lalr.y, parser-lalr-main.c, tokens.h and lexer.l are extracts from the more-complete rust-grammar repo: https://github.com/bleibig/rust-grammar

It is the most up-to-date grammar I’m aware of; everything pre-2015 – when the language underwent its final rush-to-stability – is probably junk due to constant grammar churn (all the ANTLR work is hopelessly bitrotted, and it’s not an appropriate tool anyways).

IMO Rust isn’t ideally thought of as LL(k) anymore; I tried to keep it that way for a long time, but it’s grown a lot of bits that work better in LR-family. I highly recommend just deleting anything ANTLR-related and focusing on LR(k) or LALR(k) grammars.

The verify.rs and testparser.py tools also come from the rust-grammar repo, but they are only part of the picture: they’re intended to be used with the rlex / rparse stubs in that repo, which generate a comparable syntax tree dump from the production compiler. This is a reasonable approach to bringing a grammar-derived parser up to parity, but it’s bitrotted some. Also, of course, true lex-and-yacc are not likely the targets you want to stick with long term; they’re a stopgap.
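To illustrate the dump-comparison idea, here is a rough sketch that finds the first line where two textual tree dumps disagree. It is not the logic of verify.rs or testparser.py; the one-node-per-line dump format and the names `Divergence` and `first_divergence` are assumptions made for the example.

```rust
/// Location and content of the first line where two dumps disagree.
#[derive(Debug, PartialEq, Eq)]
pub struct Divergence {
    /// 1-based line number of the first mismatch.
    pub line: usize,
    /// Line from the compiler's dump, or None if that dump had already ended.
    pub compiler: Option<String>,
    /// Line from the grammar-derived parser's dump, or None if it had ended.
    pub grammar: Option<String>,
}

/// Compares two dumps line by line, ignoring trailing whitespace.
/// Assumes (hypothetically) that each dump prints one tree node per line.
pub fn first_divergence(compiler_dump: &str, grammar_dump: &str) -> Option<Divergence> {
    let mut a = compiler_dump.lines().map(str::trim_end);
    let mut b = grammar_dump.lines().map(str::trim_end);
    let mut line = 0;
    loop {
        line += 1;
        match (a.next(), b.next()) {
            // Both dumps ended at the same point: they agree.
            (None, None) => return None,
            (x, y) if x == y => continue,
            (x, y) => {
                return Some(Divergence {
                    line,
                    compiler: x.map(str::to_owned),
                    grammar: y.map(str::to_owned),
                })
            }
        }
    }
}
```

Reporting only the first mismatch keeps the output readable when one early parse difference shifts everything after it.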

Niko is working on https://github.com/nikomatsakis/lalrpop which should, eventually, be a good target for building an LR-family frontend for Rust, based on a grammar similar to the one in the yacc file.

Hth.


#6

Hm, further digging. These are the 3 best (live, post-2015) resources:

https://github.com/bleibig/rust-grammar : Source of the lex/yacc (LALR) grammar in the Rust repo

https://github.com/nikomatsakis/rustypop : Niko’s partial port of that from yacc to LALRPOP

https://github.com/jorendorff/rust-grammar : Jason’s ANTLR4 grammar

I would recommend focusing on Niko’s (rustypop), though it might be worth emailing Jason to ask what the story is on his. I suspect it was made as part of the Programming Rust book: http://shop.oreilly.com/product/0636920040385.do