This is not quite true. In particular, the entire concept of a lexer fundamentally relies on the fact that pretty much all interesting classes of grammar are closed under composition with finite-state transducers.
That is to say, for a grammar class that is closed in this way, composing a grammar from it with a finite-state transducer yields a grammar in the same class. This is true of LL, LR, LALR, CFG, etc.
However, finite-state transducers can only express transformations up to the complexity of the regular languages. Once you go beyond the regular languages, it’s no longer an FST, and the closure properties no longer hold.
As a result, taking a CFG and composing it with a “lexer” that expresses things beyond regular languages may yield something that isn’t a CFG at all - it might be context-sensitive, or it may even be undecidable.
In other words: there is, in fact, a formalism for describing lexers, and it is the one that inspired their use in parsers. That formalism is the “finite-state transducer”, and if Rust’s “lexer” cannot be expressed as one, that’s potentially a real problem.
Nested comments are a very good example here: finite-state transducers, like finite-state automata, cannot count. If Rust actually does handle nested comments by nesting them, rather than having the first */ end the comment, it may not be possible to express comments as part of the lexer without badly violating the definition of the term.
(It may be possible to end-run that specific issue by showing that the Rust grammar itself falls into a class that is closed under composition with “balanced parenthesis languages”, but that’s a whole different kettle of fish, and possibly a rotten one.)
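To make the counting point about nested comments concrete, here is a rough sketch in Rust of two toy comment strippers. The names `State`, `strip_flat` and `strip_nested` are made up for illustration and have nothing to do with how rustc actually lexes; both deliberately ignore line comments and string literals. The first is a genuine finite-state scanner for the "first */ ends the comment" rule. The second handles nesting, and it needs a depth counter that can grow without bound, which is exactly the memory a finite-state machine doesn't have. (For what it's worth, the Rust Reference does describe block comments as nesting.)

```rust
/// States of a finite-state scanner for NON-nested block comments.
/// The set of states is fixed up front; no counter is involved.
#[derive(Clone, Copy, PartialEq, Debug)]
enum State {
    Code,    // outside any comment
    Slash,   // saw '/', which might open a comment
    Comment, // inside /* ... */
    Star,    // inside a comment, saw '*', which might close it
}

/// Strips non-nested block comments: the first "*/" ends the comment.
/// Simplification: line comments and string literals are not handled.
fn strip_flat(src: &str) -> String {
    let mut out = String::new();
    let mut state = State::Code;
    for c in src.chars() {
        state = match (state, c) {
            (State::Code, '/') => State::Slash,
            (State::Code, _) => { out.push(c); State::Code }
            (State::Slash, '*') => State::Comment,
            (State::Slash, '/') => { out.push('/'); State::Slash }
            (State::Slash, _) => { out.push('/'); out.push(c); State::Code }
            (State::Comment, '*') => State::Star,
            (State::Comment, _) => State::Comment,
            (State::Star, '/') => State::Code,
            (State::Star, '*') => State::Star,
            (State::Star, _) => State::Comment,
        };
    }
    // A trailing '/' that never opened a comment is ordinary text.
    if state == State::Slash {
        out.push('/');
    }
    out
}

/// Strips NESTED block comments. `depth` has no upper bound, so this
/// is no longer a finite-state machine.
/// Simplification: line comments and string literals are not handled.
fn strip_nested(src: &str) -> String {
    let mut out = String::new();
    let mut depth: usize = 0;
    let mut chars = src.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '/' && chars.peek() == Some(&'*') {
            chars.next(); // consume '*'
            depth += 1;
        } else if c == '*' && chars.peek() == Some(&'/') && depth > 0 {
            chars.next(); // consume '/'
            depth -= 1;
        } else if depth == 0 {
            out.push(c);
        }
    }
    out
}
```

On the input `a /* x /* y */ z */ b`, the finite-state version stops at the first `*/` and leaves ` z */ b` behind, while the counting version removes the whole comment. You can cap `depth` at some fixed maximum and fold it into the state set to get back to a finite-state machine, but then you are no longer accepting arbitrarily deep nesting.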