This is a reply to this GitHub comment as well as a report of my current experiments with rust-analyzer. I fully agree that going from spans to opaque ids seems to be a good idea! This is what rust-analyzer is using for macros (well, will be using once #724 is merged).
Specifically, the setup is as follows:
`TokenTree`s are strictly for interfacing with macros. The “AST” (which is actually a Concrete Syntax Tree) itself is built directly from text (to account for comments & whitespace).
To do macro expansion, there’s an explicit step that converts the CST for the macro argument to the `enum TokenTree { Leaf(Token), Subtree(Vec<TokenTree>) }` representation. Because macros care about the identity of tokens, tokens (currently only idents) are assigned incrementing ids in the process of transforming the CST into `TokenTree`s. That is, ids are local to each macro call.
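As a concrete illustration, here is a minimal sketch of that lowering step. The type and function names are invented for this example and do not match rust-analyzer's actual code; only the id-assignment idea is the point.

```rust
// Hypothetical sketch: rust-analyzer's real types look different;
// this only illustrates assigning per-macro-call ids to idents.

#[derive(Debug, Clone, PartialEq)]
struct Token {
    text: String,
    id: Option<u32>, // only idents get ids, per the text above
}

#[derive(Debug, Clone, PartialEq)]
enum TokenTree {
    Leaf(Token),
    Subtree(Vec<TokenTree>),
}

/// Lower a flat list of source tokens into a `TokenTree`, handing out
/// incrementing ids to identifiers. Ids restart at 0 for each macro call.
fn lower_to_token_tree(tokens: &[&str]) -> TokenTree {
    let mut next_id = 0;
    let leaves = tokens
        .iter()
        .map(|&text| {
            // Crude "is this an ident?" check, good enough for a sketch.
            let is_ident = text
                .chars()
                .next()
                .map_or(false, |c| c.is_alphabetic() || c == '_');
            let id = if is_ident {
                next_id += 1;
                Some(next_id - 1)
            } else {
                None
            };
            TokenTree::Leaf(Token { text: text.to_string(), id })
        })
        .collect();
    TokenTree::Subtree(leaves)
}

fn main() {
    // `foo` and `bar` get ids 0 and 1; the punctuation token gets none.
    println!("{:?}", lower_to_token_tree(&["foo", "+", "bar"]));
}
```

Because the counter restarts per call, two invocations of the same macro hand out overlapping ids, which is fine: identity only needs to be tracked within one expansion.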
A macro then transforms one `TokenTree` into another `TokenTree`, taking care of maintaining token identities where necessary.
The `TokenTree` is then parsed back into a CST. This step is pretty messy at the moment, because it basically goes via `to_string` and does not preserve “structured” hygiene information (though we do remember which source text range maps to which target text range, for IDE purposes).
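A tiny sketch of why the stringification route is lossy: the leaf's id simply never makes it into the output string, so hygiene information would have to be re-invented when the text is re-lexed. All names here are invented for illustration.

```rust
// Hypothetical sketch of the lossy stringification step; not the real code.

#[derive(Debug)]
enum TokenTree {
    Leaf { text: String, id: Option<u32> },
    Subtree(Vec<TokenTree>),
}

/// Flatten a token tree back into source text. Note that `id` is
/// dropped on the floor: the string carries no hygiene information.
fn tt_to_string(tt: &TokenTree) -> String {
    match tt {
        TokenTree::Leaf { text, .. } => text.clone(),
        TokenTree::Subtree(children) => children
            .iter()
            .map(tt_to_string)
            .collect::<Vec<_>>()
            .join(" "),
    }
}

fn main() {
    let tt = TokenTree::Subtree(vec![
        TokenTree::Leaf { text: "foo".into(), id: Some(0) },
        TokenTree::Leaf { text: "+".into(), id: None },
        TokenTree::Leaf { text: "bar".into(), id: Some(1) },
    ]);
    // The ids 0 and 1 are unrecoverable from this output.
    println!("{}", tt_to_string(&tt));
}
```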
However, we don’t try to be incremental “within” the macro. Adding a token to the start of a macro invocation just shifts all the ids. In fact, although the parser itself is incremental, we don’t try to match pre-edit and post-edit trees at all: trees are assumed to change completely after every edit. This works pretty well with salsa nonetheless: the trick is to place a mediating query between the syntax tree and further analysis.
For example, to do name resolution, we first extract a (position-independent) set of imports and items declared in a file, and then run resolution on top of that. The first phase, lowering syntax trees to a set of imports and items, is repeated after every change, but is fast, and happens for a single file only. The second phase is executed only when the results of the first phase change (i.e., if you actually add an import). This setup does not require precise tracking of changes: even if you, say, rewrite the file completely by switching branches, name resolution is reused as long as the set of declared items stays the same.
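To make the two-phase setup concrete, here is a hand-rolled sketch of such a mediating query. The real thing is a salsa query; everything below is invented for illustration, the caching is done by hand, and the "lowering" is a crude line scan rather than a real syntax tree walk.

```rust
use std::collections::BTreeSet;

// Phase 1: cheap, position-independent lowering -- extract the set of
// declared item names from the file text. Runs after every edit.
fn item_set(text: &str) -> BTreeSet<String> {
    text.lines()
        .filter_map(|line| {
            let line = line.trim();
            line.strip_prefix("struct ")
                .or_else(|| line.strip_prefix("fn "))
                .map(|rest| {
                    rest.split(|c: char| !c.is_alphanumeric() && c != '_')
                        .next()
                        .unwrap_or("")
                        .to_string()
                })
        })
        .collect()
}

struct Analysis {
    cached_items: Option<BTreeSet<String>>,
    resolutions_run: u32,
}

impl Analysis {
    // Phase 2: expensive name resolution, re-run only when the item set
    // produced by phase 1 actually changes.
    fn resolve(&mut self, text: &str) {
        let items = item_set(text);
        if self.cached_items.as_ref() != Some(&items) {
            self.resolutions_run += 1; // stand-in for real name resolution
            self.cached_items = Some(items);
        }
    }
}

fn main() {
    let mut analysis = Analysis { cached_items: None, resolutions_run: 0 };
    analysis.resolve("struct Foo;\nfn main() {}");
    // Rewriting a body leaves the item set unchanged: no re-resolution.
    analysis.resolve("struct Foo;\n\nfn main() {\n}");
    println!("resolutions run: {}", analysis.resolutions_run);
}
```

The `BTreeSet` comparison is what makes the query "mediating": phase 2 never sees the raw text, only the stable lowered summary, so a branch switch that preserves the item set short-circuits before any expensive work.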
Although the bread and butter of most IDEs is being lazy and skipping large chunks of the source code completely, this won’t work for Rust. The reason is that name resolution is a fixed-point iteration algorithm, intertwined with macro expansion, which you more or less have to run for a whole crate at a time. Luckily, salsa + mediating queries should handle this pretty well: if you don’t change the top-level structure, nothing is recomputed; if you add a new struct, name resolution is recomputed, but macro expansions are not affected, etc. Hopefully, even editing code inside a top-level macro, like `lazy_static`, will require only this macro to be re-executed, with a caveat.
Caveat:
A major requirement for an incremental parser is error tolerance. If the user added `let x =`, that shouldn’t break the parse tree before or after it. That’s easy to achieve in a hand-written parser; it’s really not rocket science. However, I am not sure how this would work with procedural macros. Say you’ve implemented lazy-static as a proc macro. Now, the user types `let x =` inside a lazy-static invocation. If you use something like `syn`, then it will (I think) just refuse to parse this invalid expression, and so, globally, you’ll transition from a state where there is a top-level item declared by the macro to a state without one. So you need to re-run name resolution, etc… Error recovery inside macros would be a fun problem to solve once we sort out the non-macro stuff!
BTW, if anyone finds this stuff fascinating, check this rust-analyzer issue and PR which start to tie together the unified “parsing/macro expansion/name resolution” frontend: