The future of syntax extensions and macros

I'd really like to get a handle on just what set of operations is required. My personal rough preference at this juncture is something like:

  • a base API where plugins get token trees (TT) in, TT out (see the sketch after this list)
  • some additional side APIs for things like interners, but I don't know what this set is
  • yes, I prefer IPC to dlopen, for a variety of reasons
    • plugin crashes don't break the compiler
    • security
    • sidestep ABI issues, LD_LIBRARY_PATH questions, etc.
    • probably a few more
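
To make the shape of that base API concrete, here is a minimal sketch of the plugin side, assuming the compiler spawns the plugin as a child process and pipes the token stream over stdin/stdout. The wire format and the `expand` function are placeholders of my own, not a proposal:

```rust
// A minimal sketch of the TT-in, TT-out plugin side of an IPC-based
// expansion protocol. Everything here is hypothetical: the wire format
// (tokens as text on stdin/stdout) and `expand` are placeholders.
use std::io::{self, Read, Write};

// The plugin's whole job: take the input token stream, return the
// expanded token stream. Tokens cross a process boundary as text,
// so a crash in here kills only the plugin process, not the compiler.
fn expand(input: &str) -> String {
    // Placeholder expansion: echo the tokens back unchanged.
    input.to_string()
}

fn main() -> io::Result<()> {
    let mut input = String::new();
    io::stdin().read_to_string(&mut input)?; // token stream from the compiler
    let output = expand(&input);
    io::stdout().write_all(output.as_bytes())?; // expansion back to the compiler
    Ok(())
}
```

The point is just the process boundary: a panic or abort inside `expand` takes down the plugin process, and the compiler can report a clean error instead of crashing along with it.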

Eventually, I would want to add the optional ability to send/receive ASTs, both for convenience and for performance. My thought, though, is that this would come after the initial release. Instead, we would begin with a nursery library for parsing/serializing TT <-> AST. This way we can prototype the AST and parser and experiment with them; I think this use case has lots of interesting demands that the current AST and parser do not satisfy:

  • extensibility: it'd be nice to be able to add a keyword, like yield, but otherwise parse the Rust grammar unchanged (see the sketch after this list). There is some research in this area worth looking into; e.g., I've had good experiences using the Polyglot extensible compiler framework, which has a lot of novel ideas in this direction. I'm sure there's more.
  • including more information: I think syntax extensions will want easy access to everything in the source, probably including even comments.
  • ease of grafting, copying nodes around, etc.: we should make the API a joy to use. I know there are some nice builders out there.
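
To give a flavor of the extensibility point, here is a hypothetical sketch of what a grammar-extension hook in the nursery library might look like; every type and method here is invented for illustration:

```rust
// Hypothetical extension hook in the imagined nursery library. The
// idea: register one extra keyword plus a parse callback, and
// otherwise reuse the stock Rust grammar unchanged.

struct TokenStream; // stand-in for a stream of token trees
struct Expr;        // stand-in for a parsed expression node

struct ParserBuilder;

impl ParserBuilder {
    fn new() -> Self {
        ParserBuilder
    }

    // Register a new keyword and a callback that parses the tokens
    // following it into an expression.
    fn with_keyword(self, _kw: &str, _parse: fn(&mut TokenStream) -> Expr) -> Self {
        self
    }

    fn build(self) {
        // construct the extended parser here
    }
}

fn parse_yield(_rest: &mut TokenStream) -> Expr {
    // parse `yield <expr>`, delegating back to the base expression
    // grammar for `<expr>`
    Expr
}

fn main() {
    // The base Rust grammar plus one extra keyword; everything else
    // is parsed unchanged.
    ParserBuilder::new()
        .with_keyword("yield", parse_yield)
        .build();
}
```

The interesting design question is how much of the base grammar a callback like `parse_yield` can delegate back to, which is exactly the kind of thing a nursery library would let us experiment with.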

In fact, even just TT <-> TT interfaces raise some questions. I am not sure that the tokens we use should be the same tokens that Rust uses. It might be useful to have a more primitive notion of token. For example, I have often wished for the ability to use symbols that the Rust tokenizer doesn't recognize in my macros. Plus, I don't want changes to the tokenizer to break syntax extensions, if we can help it. So I'd prefer if our tokens were something pared down. I haven't thought this all the way through, but maybe something like:

  • The delimiters (, [, {, }, ], ), each pair of which encloses a delimited tree
  • String constants
  • Comments
  • Floats
  • Integers
  • Symbols (contiguous runs of other characters, like !=, <>, or !@#!@#!)
  • Words (identifiers, keywords, reserved words, etc.)

(These tokens would then be "re-tokenized" to get the Rust tokens.) Anyway, I'm not sure if that's the right breakdown (I haven't, for example, gone to look at the list of Rust tokens to see what I'm overlooking), and maybe this concern is just overblown. But it's something to think about.
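
As a rough sketch, that breakdown might translate into a token type along these lines; the names and representation are illustrative only, and how spans attach is deliberately left open:

```rust
// A rough sketch of the pared-down token type described above.
enum Delimiter {
    Paren,   // ( ... )
    Bracket, // [ ... ]
    Brace,   // { ... }
}

enum PrimToken {
    // A delimited subtree: the delimiter plus the tokens inside it.
    Delimited(Delimiter, Vec<PrimToken>),
    Str(String),     // string constants
    Comment(String), // comments are kept, not thrown away
    Float(String),   // numeric text; the re-tokenizer assigns meaning
    Int(String),
    Symbol(String),  // contiguous runs of other characters, e.g. `!=`
    Word(String),    // identifiers, keywords, reserved words
}

fn main() {
    // `foo!(a != b)` might deliver something like:
    let _args = PrimToken::Delimited(
        Delimiter::Paren,
        vec![
            PrimToken::Word("a".into()),
            PrimToken::Symbol("!=".into()),
            PrimToken::Word("b".into()),
        ],
    );
}
```

Re-tokenizing would then map `Word("fn")` to Rust's `fn` keyword, split each `Symbol` into Rust operators, and reject anything the Rust tokenizer doesn't accept, while a macro that wants `!@#!@#!` simply never re-tokenizes.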

Some other open questions though:

  • should we give syntax extensions access to the rest of the surrounding AST? I'd prefer not to, at least to start, but eventually read-only access might be ok. In that case, it'd be very good if they asked for it piecemeal, so we can track what they looked at. This seems to require an AST-based API (though I guess we could reserialize).
  • what about attributes? We've long had a plan to make outer attributes act like macros. @nrc has raised some interesting questions about whether we could permit outer attributes to be used with arbitrary token trees. The idea would be that #[foo] could be followed by any sequence of token trees ending in a ; or a {...}. This sort of seems to work (see the illustration after this list), though there's probably a gotcha somewhere in expressions (e.g., if/else), and it may limit future extensions of the grammar. (Also, inner attributes don't fit here.)
  • presuming we keep attributes attaching only to Rust code, I think that if a decorator were implemented as a syntax extension, we would reserialize the Rust code (until such time as we support an AST-based interface).
  • what about namespacing and so on? I've been toying with some more name resolution ideas which I think are getting somewhere. I'm hoping to write them up soon.
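
To make the attribute idea (and its gotcha) concrete, here is some made-up surface syntax; none of this parses today, and `route` and `trace` are invented names:

```
// `#[foo]` followed by arbitrary token trees, terminated by a `;`:
#[route]
GET "/users/:id" => show_user;

// ...or terminated by a `{...}` block:
#[route]
resource users { index; show; }

// The expression gotcha: a naive "stop at the first `{...}`" rule would
// end the attribute's operand after the `if` block, orphaning the `else`:
#[trace]
if ready { go() } else { wait() }
```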

OK, that's all I can think of at this moment. :smile: