The future of syntax extensions and macros

I’ve previously laid out the current state of affairs in Rust (0, 1, 2, 3, 4). In this post I want to highlight what I think are issues with the current system and if/when I think we should fix them. I plan to do some focused design work on this leading to an RFC and to work on the implementation as well. Help with both will be greatly appreciated, let me know if you’re interested.

I’d like to gather the community’s thoughts on this before getting to the RFC stage (and I’m only really requirements gathering at this stage, not coming up with a plan).

RFC issue 440 describes many of the issues, I hope I have collected all of those which are still relevant here.

high-level motivation

The most pressing issue is the instability of procedural macros. We’ve seen people do really cool things with them; they are an extremely powerful mechanism, and stabilising them is one of the most requested features. However, there are many doubts about the current system. We would like to have a system of procedural macros we are happy with and move towards stability.

There are many issues with macro_rules macros which we would have liked to address before 1.0, but couldn’t due to time constraints and prioritisation. Many of the ‘rough edges’ would be breaking changes to fix. It is unclear how difficult such changes would be to cope with, and whether it is worth making them. In particular, we can have the best macro system in the world, but if the old one has momentum and no-one uses the new one, then that would not be a good investment.

Finally, the two macro systems interact a lot. We need to make sure that any decisions we make in one area will leave us the freedom to do what we want in the other.

macro_rules issues

  • Our hygiene story should be more complete. See blog post 3 for details of what is not covered by Rust’s hygiene implementation. In particular, type and lifetime variables should be covered, and we should have unsafety and privacy hygiene (see the sketch of the unsafety gap after this list). One open question here is how items introduced by macro expansion should be scoped.

  • Modularisation of macros needs work. It would be nice if macros could be named in the same way as other items and importing worked the same way for macros as for other items. This will require some modifications to our name resolution implementation (which is probably a good thing, it is kinda ugly at the moment).

  • Having to use absolute paths for naming is a drag. It would be better if naming respected the scopes in which macros are defined.

  • Ordering of macro definitions and uses shouldn’t matter, in the same way that it doesn’t for other items. E.g.,

foo!();
macro_rules! foo { ... }

should work.

  • Ordering of uses shouldn’t matter, e.g.,
macro_rules! foo {
    () => {
        baz!();
    }
}
macro_rules! bar {
    () => {
        macro_rules! baz {
            () => {
                ...
            }
        }
    }
}

foo!();
bar!();

should work.

  • There are some significant-sounding bugs that should be addressed, or at least assessed, e.g., around parsing macro arguments: 6994, 3232, and 27832.
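
As promised above, here is a minimal sketch of one of the hygiene gaps, the lack of unsafety hygiene (the macro name with_unsafe is mine, purely for illustration). This compiles today precisely because unsafety is not hygienic: the caller’s expression is spliced into the macro’s unsafe block, so the caller never writes unsafe themselves:

// The caller's expression ends up inside the macro's `unsafe` block,
// so unsafe operations at the use site go unmarked.
macro_rules! with_unsafe {
    ($e:expr) => { unsafe { $e } };
}

fn main() {
    let x = 42u32;
    let p = &x as *const u32;
    // No `unsafe` at the use site, yet this dereferences a raw pointer.
    let y = with_unsafe!(*p);
    println!("{}", y);
}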

stability

The big question is whether we can adapt the current system with minimal breakage, or whether we need to define a macros 2.0 system and if so, how to do that. I believe that even if we go the macros 2.0 path, it must be close enough to the current system that the majority of macros continue to work unchanged, otherwise changing will be too painful for the new system to be successful. That obviously limits some of the things we can do.

procedural macro issues

  • The current breakdown of procedural macros into different kinds using traits is a bit ad hoc. We should try to rationalise this breakdown. Hopefully we can merge ItemDecorator and ItemModifier, remove IdentTT, and make MacroRulesTT more of an implementation detail. We should also find a simpler interface than the current mess of traits and data structures.

  • All kinds of macros should take and produce token trees, rather than ASTs. However, there should be good facilities for working with the token trees as ASTs, provided either by the compiler or by external libraries (see the sketch after this list).

  • There should be powerful and easy to use libraries for working with hygiene and spans.

  • The plugin registry system is not ideal, instead we should allow scoped attributes and support modularisation of procedural macros.

  • Having to build procedural macros as separate crates is not ideal, it would be nice to be able to have procedural macros defined inline like macro_rules macros. I imagine we would still allow something like the current system, but also support inline macros as a kind of sugar (nesting these inside macro_rules macros will make things fun).
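
To make the ‘token trees in, token trees out’ shape concrete, here is a self-contained toy sketch; the TokenTree type and the expand signature are illustrative stand-ins, not an existing compiler API:

// Toy types for illustration only; not a real compiler interface.
#[derive(Clone, Debug)]
enum TokenTree {
    Token(String),             // a single token, e.g. `fn`, `x`, `+`
    Delimited(Vec<TokenTree>), // a `(...)`, `[...]`, or `{...}` group
}

// The proposed uniform shape for every kind of macro: token trees in,
// token trees out. AST manipulation happens behind a library, not at
// the interface boundary.
fn expand(input: Vec<TokenTree>) -> Vec<TokenTree> {
    // A trivial "expansion" for the sake of the example: wrap the
    // input in a new delimited group.
    vec![TokenTree::Delimited(input)]
}

fn main() {
    let input = vec![
        TokenTree::Token("1".to_string()),
        TokenTree::Token("+".to_string()),
        TokenTree::Token("2".to_string()),
    ];
    println!("{:?}", expand(input));
}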

stability

Currently procedural macros have access to all of the compiler. If we were to stabilise macros like this, then we could never change the compiler or language. We need to present APIs to procedural macros that allow the language to evolve whilst they remain stable. Various ways have been suggested for this, such as working only with quasi-quoting, relying on external libraries, some kind of extensible AST API, extensible enums, and a more flexible token tree structure.

other stuff

  • Scoped attributes were mentioned for procedural macros, it would be nice to allow tools to make use of these too. This could allow more precise interaction with the various attribute lints (unknown attribute, etc.).

  • Concatenating idents is supported by concat_idents; however, it is not very useful due to hygiene constraints. I believe we can do much better. (See tracking issue.) We might also want to allow macros in ident position. (A small nightly-only example follows this list.)

  • Proper modularisation of macros would require moving some of name resolution to the macro expansion phase. This would be a big change. However, name resolution is due for some serious refactoring and we could move what is left to the AST->HIR lowering step or even later and make it ‘on-demand’ during type checking. The latter would allow us to use the same code for associated items as for plain names.

  • We seem to be paying the price for some unification/orthogonality without getting a great deal of benefit. It would be nice to either make macro_rules macros more built-in and/or be able to do more compiler stuff as pure procedural macros (e.g., all the built-in macros, cfg, maybe even macro_rules macros).
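
Here is the concat_idents example promised above (nightly-only; the expression-position usage mirrors the standard library docs):

#![feature(concat_idents)]

fn foobar() -> u32 { 42 }

fn main() {
    // In expression position the macro can refer to an existing item...
    let f = concat_idents!(foo, bar);
    assert_eq!(f(), 42);

    // ...but it cannot *define* one, which is what people usually want
    // (e.g. generating function names):
    //
    //     fn concat_idents!(foo, bar)() {}  // error: a macro cannot
    //                                       // appear in ident position
}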

out of scope?

There are some things which are touched on here or in the last few blog posts, but which I don’t want to think about at the moment. I think most of them are orthogonal to the main focus here (modularisation, hygiene, stabilising procedural macros) and backwards compatible. Things I can think of:

  • other plugin registry stuff - the plugin registry is pretty ugly, I’d rather it disappeared completely. Eliminating the registration of procedural macros is a step towards that, but I don’t want to think about lints, LLVM passes, etc. for now.
  • Macro uses in more positions (I mentioned ident position above, I’m not sure how important it is to consider that now. I’m not sure if there are other positions we should consider) - we can add more if there is demand, but it doesn’t intersect with much else here.
  • Expansion stacks and other implementation stuff - we can probably do better, but I think it is mostly orthogonal to the high level design.
  • Tooling - we need better tooling around macros, but it’s pretty orthogonal to this work.
  • Compiler as a library.
  • gensym in macro_rules.
  • Allow matching of + and * in macro patterns.
  • External deriving implementations - hopefully this will extend the procedural macros work, but I don’t think it affects the design at this stage.
  • Dealing with repetition in sequences - i.e., generating n of something in a macro, counting the number of repetitions in a pattern.

I'm not sure I understand this concern. If the new macro system is less gunky, more powerful (hygiene, modularization) and easier to use (no more absolute paths!), won't people writing new code naturally want to use it instead of the old one? Of course, existing code with old-style macros in it will continue to be that way, but even if those macros aren't rewritten (which some of them at least will be), they will naturally form a smaller and smaller share of the total Rust pie over time. (This is basically the same strategy C++ employs when it wants to "replace" an old feature with a new one.) We could even add deprecation warnings for old-style macros after some point to encourage cleaning up the stragglers.

What does the scenario you're afraid of look like?


The scenario I’m afraid of is that we provide an awesome new macro system and a large number of people don’t use it anyway, because they don’t like change and the old one is good enough; then we can’t deprecate the old system because there is too much code out there using it, and we’re stuck with two systems forever. Alternatively, the majority of people do use the new system, but there is enough legacy code out there which is unmaintained and uses the old system that we have to continue to support it (the second scenario is less likely for now because Rust is young, but it will not be long before this is a worry).

As for the concern of replacing one macro system with another, perhaps tooling which converts the old system to the new one by doing source transformations should be considered. Apple seems to be using this technique to great success with Obj-C and Swift, though I’ll admit that their developer audience doesn’t really match Rust’s, and is probably bigger.

So, if the new system is a superset of the old system (there’s no reason I can think of, why this requirement would be unreasonable), then it should be possible to transform every existing macro into the new form. Perhaps this requires ways to opt out of the new hygiene, privacy, etc improvements, but it’s likely people will want an escape hatch for those anyways.


So far in my limited experience writing Rust code, I find that my use of macros is mostly to work around limitations in the language/toolset:

  • In order for a unit test or benchmark to show up in the results as a separate entry, the test/benchmark has to be in a separate function. If there were a function run_bench("bench_name", || { bench body }) that had the same effect as #[bench] fn bench_name(b: &mut test::Bencher) { bench body }, I wouldn’t have to use macros at all in tests/benchmarks. Note that this is the type of situation where I want concat_idents! to generate function names. (A sketch of the current workaround follows this list.)

  • It seems people are frequently using macros to generate code to work around the inability to write generic code parameterized by integers instead of (or in addition to) types.

  • Because const fn doesn’t work, I’ve written macros that do what const fn could do. (I’m glad const fn seems to be very close to becoming stable.)
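
As a concrete illustration of the first bullet, here is a sketch of the macro workaround (nightly-only, since Bencher is unstable; the macro name bench is mine). Note that the caller still has to spell out the full function name, because concat_idents! cannot construct an identifier in item position:

#![feature(test)]
extern crate test;

// Stamp out one separately-named #[bench] function per benchmark.
macro_rules! bench {
    ($name:ident, $body:expr) => {
        #[bench]
        fn $name(b: &mut test::Bencher) {
            b.iter(|| $body);
        }
    };
}

bench!(bench_vec_push, {
    let mut v = Vec::with_capacity(1);
    v.push(1u32);
});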

IMO, it is worth looking at what people are doing with macros and dividing them into four groups: (1) one group of “this doesn’t require the use of macros in the first place,” (2) one group of “we should improve the language so that macros aren’t needed for this use case,” (3) one group of “the current macro system handles this OK,” and (4) one group of “this is strong motivation for improving/replacing the macro system.”

I have a feeling that a lot of current uses are in #1 and #2.


I’ll throw in one other interesting aspect of plugins today, which is that they force the compiler to be linked in a particular fashion. Due to the fact that we just dlopen a plugin and then run some code, expecting to share all the types in question, the compiler must be dynamically linked to all intermediate libraries like the standard library, libsyntax, librustc, etc. This is not only somewhat onerous for distribution (munging LD_LIBRARY_PATH, etc), it seems like it may be tying our hands a little too much in terms of the technical implementation.

Note that this does not impact the actual user-facing interface of plugins/macros/etc, it’s just part of the technical implementation. If, however, a well-defined ABI were used to serialize data across the boundary to plugins (e.g. some well-known serialization format) then we wouldn’t necessarily have this requirement and we may have more freedom in how we distribute the compiler in the future.

Basically just saying it’d be nice to take the conservative route of not requiring a dynamically linked compiler, although we may still ship one for the near future!


I think we'll be able to deprecate the older system and transition people. The biggest (implementation) concern about keeping the existing system around is probably hygiene: hopefully we can model the existing system using whatever new system we build up. Some minimal divergence is probably ok, but I am sure people are taking advantage of the big gaps in hygiene (e.g. lifetimes, or item references). I guess it's still a bit unclear how many are doing so, and whether their code would keep working even if we improved hygiene. We'll just have to see as we go.

I don't quite know what you mean by this. Can you elaborate?

This is my hope too. The technical question is if we can locally opt in to hygiene for types (for example).

Concretely, I would like a macro defined today as macro_rules! foo {...} to work tomorrow as macro! foo {...} without modification if there was no abuse of hygiene, and with only fixes addressing hygiene if there was. However, I'm not sure if we can apply hygiene locally like this, i.e., only to macro! expansions but not macro_rules! ones. I hope it is possible.

I mean that we treat macro_rules like a syntax extension and that makes our syntax extension code a little bit more ugly. But we don't benefit greatly from it - it is not implemented as a syntax extension, it is built into the compiler. I'm not sure this is a big thing tbh, just a thought that we might be able to improve here, though I doubt it will have very much impact.

There will need to be a bit more back-and-forth than just TT in, TT out - the plugin will need access to the interner and attribute usage marker in some way, for example. It would probably be possible to work the whole thing out over some IPC layer though, which could potentially even let the plugins not have to be dlopened, which has always been a bit of a sketchy proposition.

I'd really like to get a handle on just what set of operations is required. My personal rough preference at this juncture is something like:

  • a base API where plugins get TT in, TT out
  • some additional side APIs for things like interners, but I don't know what this set is
  • yes, I prefer IPC to dlopen, for a variety of reasons
    • plugin crashes don't break the compiler
    • security
    • sidestep ABI issues, LD_LIBRARY_PATH questions, etc
    • probably a few more

Eventually, I would want to add the optional ability to send/receive ASTs. This is both for convenience and performance. My thought though is that this would come after the initial release. Instead, we would begin with a nursery library for parsing/serializing TT <-> AST. This way we can prototype the AST and parser and experiment with it, I think it has lots of interesting demands that the current AST and parser do not satisfy:

  • extensibility: it'd be nice to be able to add a keyword, like yield, but otherwise parse Rust grammar unchanged. There is some research in this area that is worth looking into; e.g., I've had good experiences using the Polyglot extensible compiler framework, which has a lot of novel ideas in this direction. I'm sure there's more.
  • including more information: I think syntax extensions will want easy access to everything in the source, probably including even comments.
  • ease of grafting and copying around etc: we should make the API a joy to use. I know there are some nice builders out there.

In fact, even just TT <-> TT interfaces raise some questions. I am not sure that the tokens we use should be the same tokens that Rust uses. It might be useful to have a more primitive notion of token. For example, I have often wished for the ability to use symbols that the Rust tokenizer doesn't recognize in my macros. Plus, I don't want changes to the tokenizer to break syntax extensions, if we can help it. So I'd prefer if our tokens were something pared down. I haven't thought this all the way through, but maybe something like:

  • The delimiters (, [, {, }, ], ), each of which encompasses a delimited tree
  • String constants
  • Comments
  • Floats
  • Integers
  • Symbols (contiguous strings of other characters, like !=, <> or !@#!@#!)
  • Words (identifiers + keywords + reserved-words etc)

(These tokens would then be "re-tokenized" to get the Rust tokens.) Anyway, I'm not sure if that's the right breakdown (I haven't, for example, gone to look at the list of Rust tokens to see what I'm overlooking), and maybe this concern is just overblown. But it's something to think about.
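
For what it's worth, a rough encoding of that breakdown as a data type might look like this (the names are purely illustrative, not a proposed API):

#[derive(Debug)]
enum Delimiter { Paren, Bracket, Brace }

#[derive(Debug)]
#[allow(dead_code)]
enum RawToken {
    Delimited(Delimiter, Vec<RawToken>), // ( ... ), [ ... ], { ... }
    StrLit(String),                      // string constants
    Comment(String),
    Float(String),                       // kept as text, not parsed
    Int(String),
    Symbol(String),                      // e.g. `!=`, `<>`, `!@#!@#!`
    Word(String),                        // identifiers, keywords, etc.
}

fn main() {
    // `foo != bar` as raw tokens:
    let toks = vec![
        RawToken::Word("foo".to_string()),
        RawToken::Symbol("!=".to_string()),
        RawToken::Word("bar".to_string()),
    ];
    println!("{:?}", toks);
}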

Some other open questions though:

  • should we give syntax extensions access to the rest of the surrounding AST? I'd prefer not to, at least to start, but eventually read-only access might be ok. In that case, it'd be very good if they ask for it piecemeal, so we can track what they looked at. This seems to require an AST-based API (though I guess we could reserialize).
  • what about attributes? We've long had a plan to make outer attributes act like macros. @nrc has raised some interesting questions about whether we could permit outer attributes to be used with arbitrary token trees. The idea would be that #[foo] can be followed by any number of token trees that ends in a ; or {...}. This kind of sort of seems to work, though there's probably a gotcha somewhere in expressions (e.g., if/else), and it is maybe limiting on future extensions of the grammar. (Also, inner attributes don't fit here.)
  • presuming we keep attributes as only attaching to Rust code, then I think that if a decorator were implemented as a syntax extension, we would reserialize the Rust code (until such time as we support an AST-based interface).
  • what about namespacing and so on? I've been toying with some more name resolution ideas which I think are getting somewhere. I'm hoping to write them up soon.

OK, that's all I can think of at this moment. :smile:

I don't understand why this is an issue. IMHO, this provides separation of concerns and is easier to reason about and implement. E.g., in cross-compilation contexts it's clear what code is compiled for which arch and where it runs.


Fundamentally, the most basic possible interface is gluing the macro input back together into a string and handing that off to the plugin, expecting a lump of raw source text back. I actually had a fork of rustc a while ago that did exactly this over stdin/stdout. Means you can do silly things like write plugins in Python, since it's now just a question of text processing. :stuck_out_tongue:
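
A minimal sketch of that scheme, assuming the plugin is an ordinary process speaking plain text over stdin/stdout (the wrapping fn expanded() is purely illustrative):

use std::io::{self, Read, Write};

fn main() {
    // Read the macro input as raw source text...
    let mut input = String::new();
    io::stdin().read_to_string(&mut input).unwrap();
    // ...do some "expansion" (here, trivially wrap it in a function)...
    let output = format!("fn expanded() {{ {} }}", input);
    // ...and hand the generated source text back to the compiler.
    io::stdout().write_all(output.as_bytes()).unwrap();
}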

You don't strictly need anything beyond that, but I think it would be good to expose some sort of IPC layer that gives plugins access to the following:

  • Lexing. Both "turn this big string into a whack of tokens", and "what's the next token in this string?".
    • Actually, that first one could be done by actually passing some opaque "handle" to the plugin that it can trade in for either source text or tokens.
  • Parsing. "Here are some tokens; give me back an opaque handle that represents an expression at the beginning, plus the number of tokens it used."
    • I surveyed every procedural macro I could find on rust-ci a while ago and found that only one actually needed AST manipulation, and that was just parsing very simple arithmetic expressions.
    • Anything more complicated can link against syntex (or something akin to it).

As for lexing things that aren't in the language itself: at that point, you might as well just have a "Flotsam" token type that gobbles up all text that would otherwise cause a parsing error. Macros can either deal with it, or the parser will crash if it finds one outside a macro invocation.

The problem with just passing strings around is implementing hygiene. After all, that’s why the C macro system is so hard to use…


That would be something you’d have to implement on top, if you cared about it. My point was more that, if you really want the absolute usable minimum, a big string is pretty much it. Also, hygiene isn’t really a huge concern for various cases, like generating Cap’n Proto structures from IDL, which is something that could easily be implemented as a simple “stream filter”.

But really, beyond that, that’s where you’d want a higher-level interface. In my prototype, I had a provision for specifying whether a macro wanted to use plain text or something higher level. There’s no reason both can’t be supported (maintenance and implementation costs aside, obviously).

Does generating complete source files from IDL really want compiler extensions? This sounds like a use case for line number annotations and just generating the file in a separate step.

I don’t understand this idea that there are some forms of code generation that “aren’t worthy” of in-language invocation syntax. Either you want code generation (which is what macros are), or you don’t. If you’re going to support code generation, what’s the point of drawing an arbitrary line in the sand and saying “oh no, that’s not special enough; do that in a build script”? You might as well just support everything you can reasonably support.

Anyway, that’s all beside the point, which was that plain strings are sufficiently useful in some cases (any DSL that involves a direct rewrite counts, too). If the language grew a syntax for annotating hygiene on identifiers (strawman: ${"name", ctx}), it’d be sufficient for that, too.

Because the Cap'n Proto IDL won't parse as a Rust token tree. This means either that your compiler plugin is invoked with just a file name, in which case there is no difference between it and a build script, or that your IDL is completely wrapped in quotes, which is significantly worse than a build script and external file.

I have no problem with writing IDLs as Rust macros or syntax extensions, but that won't work with an IDL language that uses # for comments.

I’m not arguing this any more as it’s getting completely off topic.

I'll just chime in that I disagree with this heavily. The more expressive a language is, the greater the burden on the reader to understand what someone wrote in their DSL.

As far as plugin extensions are concerned, I'd love it if the plugin system could look more like the regular language.

I will be blogging about ideas/plans for improvement, a little at a time, starting with concatenating idents and macros in ident position - http://www.ncameron.org/blog/untitledconcat_idents-and-macros-in-ident-position/
