Discussion: Adding grammar information to Procedural Macros for proper custom syntax support in the toolchain

I am writing this post on the Internals Forum to start a discussion on the matter and to gather more information. I'm not a toolchain developer or a member of any of the Working Groups, so I know little about whether this is actively being worked on. As far as I am aware, the latest semi-relevant discussion on the forum was here, but perhaps I have missed something. In my eyes, we lack a good place to discuss this.

Problem and use cases

There are several reasons why I would like this to be discussed here. The lack of grammar information in proc macros affects many areas of the toolchain, and there have been isolated attempts at raising this in projects like rust-analyzer and rustfmt, but what I believe we need is a unified interface that both of these projects - and any others that would benefit - can use. I will focus on these two purely because that's as far as my knowledge of the topic reaches.

This is the responsibility of neither an IDE integration nor a formatting tool. If rust-analyzer were to implement something like this for its own purposes, rustfmt would need a separate implementation, since rustfmt does not depend on rust-analyzer. The rustfmt team also, understandably, has little interest in implementing this themselves: it's a heavy effort that is outside the scope of the project.

Function-like macros with custom syntax appear in crates that are widely used throughout the ecosystem. Examples include serde_json with 310 million downloads (although the json! macro itself isn't used that often), sqlx with 19 million, delegate with 2.5 million, yew with 1.3 million, leptos with 0.6 million, and so on (as of writing). To my knowledge, this currently affects frontend frameworks the most, since basically all of them use some flavor of JSX or RSX written using macros, but it is not limited to them.

rustfmt

rustfmt already has partial support for formatting some function-like macros (see "Does rustfmt format macro calls?", rust-lang/rustfmt Discussion #5437 on GitHub), but the user experience when working with macros that use non-compliant syntax would be significantly improved by, for example, a hook that libraries could use to define their own formatting rules and syntax-tree parsers.

This requires additional design work, but it would improve the Rust toolchain's ability to handle other kinds of syntax and, in turn, make function-like macros much more appealing to use. It would also remove the need for library developers to maintain their own CLI tools specifically for formatting macros. On the user's side, it would remove the pain of configuring formatting differently for every project (these CLI tools do not share a unified interface); on the library side, it would make it easier for developers to offer a positive experience to their consumers. Examples of such CLI tools include dioxus-cli, leptosfmt (thankfully, this one is a drop-in for rustfmt) and yew-fmt (also compatible, but an unofficial tool!).
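To make the hook idea more concrete, here is a minimal Rust sketch. Everything in it is hypothetical: `MacroFormatter`, `Registry` and the toy `CollapseWhitespace` formatter are illustrative names, not part of rustfmt or any existing crate. The library ships an implementation for its macro, and the formatter looks it up by macro name, leaving unknown macros (or hooks that decline) untouched.

```rust
use std::collections::HashMap;

/// Hypothetical hook a macro library could ship so that a formatter can
/// reformat invocations of its macro. Nothing like this exists in rustfmt.
pub trait MacroFormatter {
    /// Return the reformatted body, or `None` to leave it as written.
    fn format_body(&self, body: &str) -> Option<String>;
}

/// Toy library-provided formatter: collapses runs of whitespace.
pub struct CollapseWhitespace;

impl MacroFormatter for CollapseWhitespace {
    fn format_body(&self, body: &str) -> Option<String> {
        Some(body.split_whitespace().collect::<Vec<_>>().join(" "))
    }
}

/// Formatter side: maps macro names to library-provided hooks.
#[derive(Default)]
pub struct Registry {
    hooks: HashMap<String, Box<dyn MacroFormatter>>,
}

impl Registry {
    pub fn register(&mut self, macro_name: &str, hook: Box<dyn MacroFormatter>) {
        self.hooks.insert(macro_name.to_string(), hook);
    }

    /// Unknown macros, and hooks that return `None`, keep their original body.
    pub fn format(&self, macro_name: &str, body: &str) -> String {
        self.hooks
            .get(macro_name)
            .and_then(|hook| hook.format_body(body))
            .unwrap_or_else(|| body.to_string())
    }
}
```

Note that this shape means the formatter has to load and run library code, which is one of the costs of the hook approach.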

A solution to this problem is something that has been requested and mentioned on multiple occasions in different projects. See examples:

rust-analyzer

Within custom-syntax macros, the IDE integration could benefit from proper syntax highlighting, from attributing the correct documentation items to selected keywords, and from code completion.

As a workaround in place of grammar metadata on macros, you can develop IDE extensions for individual libraries. Those suffer from the same problems as external formatting tools: they are often unofficial hacks that become outdated over time and are of lower quality than something library developers could provide. Leaving that task to library maintainers is an option, but in practice it just doesn't happen. Of the major frontend frameworks, only dioxus has an official extension, and that extension (dioxuslabs.dioxus) only wraps functionality from dioxus-cli. It provides none of the features mentioned earlier.

This has been considered, among others, here:


No one seems to want to deal with this, or to know how, even though I see it as a viable improvement. Addressing this gap could greatly improve the library development experience and streamline tool integration. Being able to expose the expected input syntax would be a great tool in the hands of library developers. I encourage everyone to share their thoughts and experiences to help us potentially move towards a solution. This is not the first time this has been brought up, and it is unlikely to be the last.

I would love to elaborate on the exact details of what kind of interface would need to be exposed for the use cases here, but I have very limited information on the internal design of the toolchain. I implore you to share your thoughts, though. I did what I could with the information I had. There are still many questions to be asked!

Hopefully this is of high enough quality for my first interaction with the Forum... 😁


I thought it might be useful to point out the prior art on this from Emacs Lisp, since the same problem exists there. Macros in Emacs Lisp can have a declare "special form" (which really just means something that looks like a function call but is implemented directly by the compiler) that lets the macro author either select from a few popular preset ways of doing indentation or provide an arbitrary function.

I can't speak to the merit of having an API for communicating a grammar versus having an API for handing control of indentation over to the macro author, but I can say that a grammar by itself is not enough information. There are many possible ways to format the same grammar, and which one makes sense may depend on the specific macro's semantics.


Thank you for providing your point of view!

I agree that adding grammar metadata by itself would not directly solve anything - what I'm advocating is that it would provide a starting point for developing a solution to the issues mentioned.

A similar problem was raised in this comment on rust-lang/rustfmt#8, in response to this forum post: @calebcartwright correctly pointed out that if rustfmt were to use exposed grammar to format macros, it would not have enough information to decide how to do so, whether because macro authors would need a place to define certain rules, because formatting should be configurable, or because, as you pointed out, some of that information might be context-dependent in general.

I don't have an answer to the question of what would suffice, but this is exactly why I opened this discussion thread. If we imagine a formatting tool that does take advantage of grammar metadata in macros, the first approach that comes to my mind is for that tool to expose its own interface for providing style information, in addition to consuming grammar information.

I would leave this up to the consumer of such an API because style information would not really be useful to other consumers, so the burden of accessing that data should fall on the tool itself. If a user developed their own formatting tool with diverging functionality, it would likely need a different interface for that data, too. This would allow each tool to tailor its interface to the kind of data it needs to operate; different tools have different fundamental values and goals.

This is kind of like how rustfmt has its own config file, instead of using something like Cargo.toml to set formatting preferences. While part of the toolchain, it's also a standalone tool. Grammar can be used without formatting rules to achieve things different from formatting, such as solving the problems that the users of rust-analyzer have reported. It's a one-way dependency. Only the formatting tool needs both.
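A small sketch of that one-way dependency, under assumed names: `Grammar`, `Style`, `keyword_positions` and `format_clauses` are all hypothetical, and the "grammar" is reduced to a list of clause keywords for a SQL-like macro. An IDE-style consumer reads only the grammar, while the formatter combines the grammar with style options it owns.

```rust
/// Grammar metadata a macro might expose (hypothetical): here just the
/// keywords that start a clause in its custom syntax.
pub struct Grammar {
    pub keywords: Vec<&'static str>,
}

/// Style options owned by one particular formatter, not by the macro.
pub struct Style {
    pub indent: usize,
}

fn is_keyword(g: &Grammar, word: &str) -> bool {
    g.keywords.iter().any(|k| *k == word)
}

/// An IDE-like consumer needs only the grammar, e.g. to know which
/// words (by index) to highlight as keywords.
pub fn keyword_positions(g: &Grammar, body: &str) -> Vec<usize> {
    body.split_whitespace()
        .enumerate()
        .filter(|(_, w)| is_keyword(g, w))
        .map(|(i, _)| i)
        .collect()
}

/// A formatter needs both: the grammar to find clause boundaries and its
/// own style to lay them out (one clause per line, later clauses indented).
pub fn format_clauses(g: &Grammar, s: &Style, body: &str) -> String {
    let mut clauses: Vec<String> = Vec::new();
    for word in body.split_whitespace() {
        if is_keyword(g, word) || clauses.is_empty() {
            clauses.push(word.to_string());
        } else if let Some(clause) = clauses.last_mut() {
            clause.push(' ');
            clause.push_str(word);
        }
    }
    let sep = format!("\n{}", " ".repeat(s.indent));
    clauses.join(sep.as_str())
}
```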

Regarding the counterproposition of:

an API for handing control of indentation over to the macro author

I definitely do see merit in that, as it would be the much simpler solution to the problem of adding formatting, and I am all for reducing complexity. The problem I have with it is that it would not address the issues raised by rust-analyzer users, whereas exposing grammar would open up the possibility of doing so. A macro developer would also be able to support many consumers by implementing the same interface once, with the exception of tools that require additional information.

Perhaps, if I find time, which right now is unfortunately unlikely, I could develop a proof-of-concept library + CLI tool and share my findings. But for now, I encourage anyone with perspective on this to share their side.

I'd love to see something like this happen, but ... how?

rustfmt uses rustc_ast while proc-macros (usually) use a custom AST built over syn.

Just focussing on formatting (since that's one operation, vs the many that rust-analyzer supports), I can see a few options:

  1. Tell rustfmt to "try formatting this as if it were a specified AST type, failing silently on error". This is the cheap, cheaty, limited option (which would also put pressure on rustfmt and rustc_ast to be more permissive).
  2. Support an optional custom formatter alongside proc-macro definitions. This is flexible but hacky (pushing proc-macro authors to re-implement formatters however they see fit). It also doesn't scale to the many functions that rust-analyzer supports.
  3. Re-implement via syn: separate parse and apply steps in proc-macro definitions, expecting the AST yielded by the first step to support a reformat operation. Have syn support this reformat operation on its types, duplicating the behaviour observed from rustfmt. User-defined ASTs must also implement this, but can base their impls on the syn impls. This option would be a lot of work (duplicating various behaviour), but might be the most viable, except that, when it comes to rust-analyzer, there are many operations which these AST types would be expected to support (requiring a lot of redundant code).
  4. Stabilise the AST used by rustc, replacing or merging with syn. This involves a lot less duplicated code than the prior option, but places major constraints on rustc. (There are approaches allowing the Rust grammar to still evolve, such as supporting multiple backwards-compatible versions of the AST, but this may be placing more burden on rustc developers than the prior option does on everyone else.)

Of the above, (3) appears the most viable to me. It would, however, require that custom proc-macros define types, not functions (or in the case of custom attribute macros, a type for the result of parsing the attribute and another type representing the result of parsing (but not expanding) the item). These types would then need to implement some trait covering required functionality (parse, expand) and optional functionality (reformat, suggest expansions for some cursor, ...).
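A rough sketch of what such a trait could look like, assuming tokens are modelled as `Vec<String>` rather than `proc_macro::TokenStream` so that it stays self-contained. `MacroDef` and the toy `Sum` macro are hypothetical; `parse` and `expand` are required, while `reformat` has a default that opts out.

```rust
/// Tokens modelled as plain strings; a real design would use
/// proc_macro::TokenStream.
pub type Tokens = Vec<String>;

/// Hypothetical trait for option (3): a proc macro defined as a type.
pub trait MacroDef: Sized {
    /// Required: parse the macro input into this macro's own AST.
    fn parse(input: &[String]) -> Result<Self, String>;
    /// Required: expand the parsed AST into output tokens.
    fn expand(&self) -> Tokens;
    /// Optional: print the parsed AST back as formatted source text.
    /// The default `None` means "leave the invocation as written".
    fn reformat(&self) -> Option<String> {
        None
    }
}

/// Toy macro: `sum!(a, b, c)` expands to `a + b + c`.
pub struct Sum {
    terms: Vec<String>,
}

impl MacroDef for Sum {
    fn parse(input: &[String]) -> Result<Self, String> {
        let mut terms = Vec::new();
        for (i, tok) in input.iter().enumerate() {
            // Terms sit at even positions, commas at odd ones.
            let is_comma = tok == ",";
            if (i % 2 == 1) != is_comma {
                return Err(format!("unexpected `{}` at token {}", tok, i));
            }
            if !is_comma {
                terms.push(tok.clone());
            }
        }
        if terms.is_empty() {
            return Err("expected at least one term".to_string());
        }
        Ok(Sum { terms })
    }

    fn expand(&self) -> Tokens {
        let mut out = Vec::new();
        for (i, term) in self.terms.iter().enumerate() {
            if i > 0 {
                out.push("+".to_string());
            }
            out.push(term.clone());
        }
        out
    }

    fn reformat(&self) -> Option<String> {
        Some(self.terms.join(", "))
    }
}
```

A tool that only wants formatting would call `parse` then `reformat`, never `expand`; an IDE could add further optional methods in the same way.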


This is difficult to accomplish for rustfmt because, to correlate the macro with its settings, rustfmt would have to perform name resolution, and name resolution in Rust is a complicated fixpoint process that involves expanding macros. I strongly oppose doing that in rustfmt, as IMO it should remain very lightweight: something I can easily run every now and then, or even automatically on save.

For rust-analyzer it's doable, but will require more than just syntactic information.


I did some tinkering locally and arrived at the exact same conclusion of (3) making the most sense. I do dislike the amount of code duplication it would introduce... I would like to avoid that as much as possible for simplicity's sake, but I'm not sure I see a way to do so, even just when doing the bare minimum of formatting parsed macro inputs into a formatted TokenStream representation.

Proc macro libraries exporting structs instead of functions is something I've thought of too, and it would make them much more flexible, especially in scenarios like this one. I am not sure how realistic it is to get something like that accepted, though.

I agree with your stance on adapting rustfmt. Here's an excerpt from my response to the previously mentioned comment at rust-lang/rustfmt#8:

(calebcartwright)

I feel like positioning rustfmt as a general purpose formatting platform would run counter to that or at a minimum create a very conceivable surface for those tenets to be directly contradicted.

(me)

I sympathize with that concern. This is one of the thoughts that have been on my mind. Expanding it into a general-purpose formatting platform could indeed lead to complexities that might compromise consistency and stability.

To clarify, rustfmt is mentioned here mainly because it’s been the focal point for much of the community's requests and discussions on the topic. I’m not entirely convinced that adapting rustfmt is necessarily the best or only approach, but there’s clearly significant interest in improving the toolchain's capabilities when working with macros.

Different use cases and values exist across the community. I believe it would be beneficial to create a space where these discussions can take place. I am interested in exploring how we can address these needs without compromising on the goals of any part of the toolchain.

I wouldn't want to allow macros to run arbitrary code during formatting. It's already bad enough that macro expansion is entirely unsandboxed, can run for arbitrary time and access arbitrary resources. If rustfmt had hooks for macros, all of these issues would be created at the formatting layer. That's just insane. That's log4j levels of "let's insert arbitrary code in unexpected places because fuck security".

Not to mention, how is that code even supposed to be compiled? Does rustfmt now silently run rustc on all macro crates in the project? How is it going to find them? How will it pass proper compilation options? Formatting is supposed to be simple, fast, reliable, and idempotent (some projects gate their CI on rustfmt emitting no changes). How can any of that be guaranteed when rustfmt compiles and executes arbitrary code?

IMHO, inserting formatting hooks should be out of the question. A declarative approach to formatting is more viable, but that can't be part of existing macro source code, so it would require introducing some new declarative grammar and layout configuration files. Also, is there any reason to believe that some common declarative formatting model would be enough for existing use cases?
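To illustrate what "declarative" could mean here, a minimal sketch assuming a made-up `LayoutRule` type and `apply` function: the rule is plain data (it could live in a separate config file), which a formatter interprets with its own fixed code, so nothing from the macro crate is compiled or executed at format time. Whether a model this small covers real use cases is exactly the open question; this one ignores nesting entirely.

```rust
/// Hypothetical declarative layout rule for one macro: pure data that
/// the formatter interprets, so no macro code runs at format time.
#[derive(Debug, Clone, PartialEq)]
pub struct LayoutRule {
    /// Character that separates items inside the macro body, e.g. ','.
    pub separator: char,
    /// Put every item on its own line instead of keeping them on one line.
    pub one_item_per_line: bool,
    /// Indentation (in spaces) used when items go on separate lines.
    pub indent: usize,
}

/// Interpret a rule against the text between the macro's delimiters.
/// Naive: splits on every separator, ignoring nesting and string literals.
pub fn apply(rule: &LayoutRule, body: &str) -> String {
    let items: Vec<&str> = body
        .split(rule.separator)
        .map(str::trim)
        .filter(|item| !item.is_empty())
        .collect();
    if rule.one_item_per_line {
        let pad = " ".repeat(rule.indent);
        let mut out = String::from("\n");
        for item in &items {
            out.push_str(&pad);
            out.push_str(item);
            out.push(rule.separator);
            out.push('\n');
        }
        out
    } else {
        let sep = format!("{} ", rule.separator);
        items.join(sep.as_str())
    }
}
```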

Finally, there are always macros with parsing and semantics special enough that there is no hope of defining them via context-free grammars. Those cases can't be handled declaratively. How common are they in the ecosystem?


Thank you for the harsh criticism! Sometimes that's what's needed, too.

I have also recently realized just how much trust I'm putting in the libraries I use by relying on macros and the build system, and wish those were sandboxed with at least some rudimentary permission system. To be fair, addressing that security concern is not my goal here, but I understand where you're coming from.

Reliability, simplicity and performance are also concerns I have with this approach. Perhaps this is just an artifact of daydreaming in the realm of perfection, especially considering I am trying to tinker with an ecosystem that is already actively in production use. A declarative approach does sound like a more reasonable compromise, considering the limitations as well as the fact that it could still potentially cover all of the use cases above.

Designing a system like that would take real effort, but the result would most likely be significantly more satisfying than whatever would come out of my previous train of thought.

Finally, there are always macros with parsing and semantics special enough that there is no hope of defining them via context-free grammars. Those cases can't be handled declaratively. How common are they in the ecosystem?

I am not sure I've seen any usage of context-dependent grammar macros in Rust, in general, but my perspective is limited.

An idea: rustfmt's approach to formatting macros is mostly to leave them unchanged. What if another tool did the opposite: custom formatting for macros while leaving the rest of the code unchanged? Then users could choose whether to involve macros and name resolution in their projects’ formatting process or not, by running both tools in sequence, or just rustfmt. This would also mean that there doesn't have to be a universal solution built into rustfmt.
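A rough sketch of the core of such a tool, assuming a hypothetical `rewrite_macro_bodies` function: it finds `name!(...)` invocations by scanning the text, hands only the body to a formatting function, and copies everything else through unchanged. A real tool would work on tokens and handle all delimiter kinds, strings and comments; this naive version does not.

```rust
/// Rewrite only the bodies of `name!( ... )` invocations in `src` with
/// `fmt`, leaving every other byte unchanged. Naive: handles only `(`/`)`
/// delimiters and does not skip string literals or comments.
pub fn rewrite_macro_bodies(src: &str, name: &str, fmt: impl Fn(&str) -> String) -> String {
    let needle = format!("{}!(", name);
    let mut out = String::with_capacity(src.len());
    let mut rest = src;
    while let Some(start) = rest.find(needle.as_str()) {
        let body_start = start + needle.len();
        // Skip matches that are the tail of a longer name, e.g. `checksum!(`.
        let part_of_longer_name = rest[..start]
            .chars()
            .next_back()
            .map_or(false, |c| c.is_alphanumeric() || c == '_');
        if part_of_longer_name {
            out.push_str(&rest[..body_start]);
            rest = &rest[body_start..];
            continue;
        }
        // Find the matching `)` by tracking nesting depth.
        let mut depth = 1usize;
        let mut body_end = None;
        for (i, c) in rest[body_start..].char_indices() {
            match c {
                '(' => depth += 1,
                ')' => {
                    depth -= 1;
                    if depth == 0 {
                        body_end = Some(body_start + i);
                        break;
                    }
                }
                _ => {}
            }
        }
        let end = match body_end {
            Some(end) => end,
            None => break, // unbalanced: leave the remainder untouched
        };
        out.push_str(&rest[..body_start]);
        out.push_str(&fmt(&rest[body_start..end]));
        out.push(')');
        rest = &rest[end + 1..];
    }
    out.push_str(rest);
    out
}
```

Running rustfmt first and a tool like this second would keep rustfmt itself free of name resolution and macro-specific logic.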


I'll cite what Lukas said on this forum and in rust-lang/rust-analyzer#15452: proc macros that don't consume their input and that avoid erroring out would go a long way towards making rust-analyzer behave nicely in the face of custom syntax.
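One possible reading of that advice, as a hedged sketch with tokens modelled as strings (`expand_or_passthrough` is a hypothetical helper, not a `proc_macro` API): when parsing fails, the macro still emits the user's original tokens, followed by a `compile_error!` invocation carrying the message, instead of returning only the error. For custom syntax the passed-through tokens may not be valid Rust on their own, so a real macro would need to be more careful about what it re-emits.

```rust
/// Tokens are modelled as strings; a real macro would work on
/// proc_macro::TokenStream and attach proper spans to the error.
pub fn expand_or_passthrough<F>(input: &[String], parse_and_expand: F) -> Vec<String>
where
    F: Fn(&[String]) -> Result<Vec<String>, String>,
{
    match parse_and_expand(input) {
        Ok(expanded) => expanded,
        Err(msg) => {
            // Keep the user's tokens in the output so the IDE can still map
            // them back to source, and report the error alongside them.
            let mut out = input.to_vec();
            out.push(format!("compile_error!({:?});", msg));
            out
        }
    }
}
```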

For function-like, JSX-style macros—which seem to be the majority of function-like macros where the lack of tooling support is most acutely felt?—I'm not confident in any general solution that I might propose.


I think it's more viable to support macro formatting at the IDE level. I don't see a reasonable path towards formatting macros with rustfmt in any generality, but IDEs already expand macros. They know how to compile and run them. If proc macros could output some (contents-dependent) formatting info alongside the expansion, the IDE could use it to reformat the macro. This would also alleviate most of the context-dependent concerns: context is already handled by the macro code, which just needs to output some simple markup for the contents, including possible line breaks, tab stops and spacing.
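To make "simple markup" concrete, here is a hedged sketch of one possible shape, a much simplified take on Wadler-style pretty-printing documents. `Doc` and `render` are hypothetical names: the macro would emit `Text`, optional `Line` breaks and `Nest`ed indentation, and the IDE would only need a small renderer that lays out a `Group` flat when the whole group fits in the remaining width.

```rust
/// Hypothetical layout markup a proc macro could emit next to its expansion.
pub enum Doc {
    /// Literal text copied to the output.
    Text(String),
    /// A possible break: a space if the enclosing group is flat,
    /// otherwise a newline followed by the current indentation.
    Line,
    /// Children rendered with `n` extra spaces of indentation after breaks.
    Nest(usize, Vec<Doc>),
    /// Children laid out flat if the whole group fits in the remaining width.
    Group(Vec<Doc>),
}

fn flat_width(doc: &Doc) -> usize {
    match doc {
        Doc::Text(s) => s.chars().count(),
        Doc::Line => 1,
        Doc::Nest(_, children) | Doc::Group(children) => children.iter().map(flat_width).sum(),
    }
}

fn render_into(doc: &Doc, width: usize, indent: usize, flat: bool, col: &mut usize, out: &mut String) {
    match doc {
        Doc::Text(s) => {
            out.push_str(s);
            *col += s.chars().count();
        }
        Doc::Line if flat => {
            out.push(' ');
            *col += 1;
        }
        Doc::Line => {
            out.push('\n');
            out.push_str(&" ".repeat(indent));
            *col = indent;
        }
        Doc::Nest(n, children) => {
            for child in children {
                render_into(child, width, indent + n, flat, col, out);
            }
        }
        Doc::Group(children) => {
            // Inside a flat parent everything stays flat; otherwise check fit.
            let fits = flat || *col + flat_width(doc) <= width;
            for child in children {
                render_into(child, width, indent, fits, col, out);
            }
        }
    }
}

/// Render the markup for a given maximum line width.
pub fn render(doc: &Doc, width: usize) -> String {
    let mut out = String::new();
    let mut col = 0;
    render_into(doc, width, 0, false, &mut col, &mut out);
    out
}
```

With a document for `div { a b }` built from these pieces, a wide width keeps it on one line and a narrow one breaks it onto indented lines: the macro decides where breaks may go, the tool only decides whether to take them.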

Technically this means that cargo could do the same, but personally I wouldn't want to tightly couple cargo and rustfmt. Currently running cargo fmt doesn't depend on project structure, and I'd prefer it to stay this way. It could indeed be a separate tool cargo-macrofmt, which would build the code and reformat specifically the macro invocations. It would also preserve the stability promises of rustfmt itself.


rust-analyzer doesn't handle formatting for non-macro code, so it would need to be able to do that first. However, to build a formatter inside rust-analyzer, we'd need to get rid of mutable syntax trees and then move over to a Roslyn-style trivia model. The former needs to happen for performance reasons anyway, but the latter might take more time and effort.
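For readers unfamiliar with the term, a minimal sketch of the idea behind a Roslyn-style trivia model (all names here are hypothetical and unrelated to rust-analyzer's actual types): whitespace and comments are attached to neighbouring tokens as leading/trailing trivia, the exact source text can be rebuilt by concatenation, and a formatter produces new tokens with different whitespace instead of mutating a tree in place.

```rust
/// Whitespace and comments are not tree nodes of their own: they hang off
/// the neighbouring token as leading/trailing trivia.
#[derive(Clone, Debug, PartialEq)]
pub enum Trivia {
    Whitespace(String),
    Comment(String),
}

#[derive(Clone, Debug, PartialEq)]
pub struct Token {
    pub leading: Vec<Trivia>,
    pub text: String,
    pub trailing: Vec<Trivia>,
}

fn trivia_text(t: &Trivia) -> &str {
    match t {
        Trivia::Whitespace(s) | Trivia::Comment(s) => s.as_str(),
    }
}

/// The exact source text is recovered by concatenating trivia and tokens.
pub fn full_text(tokens: &[Token]) -> String {
    let mut out = String::new();
    for tok in tokens {
        for t in &tok.leading {
            out.push_str(trivia_text(t));
        }
        out.push_str(&tok.text);
        for t in &tok.trailing {
            out.push_str(trivia_text(t));
        }
    }
    out
}

fn comments(trivia: &[Trivia]) -> Vec<Trivia> {
    trivia.iter().filter(|t| matches!(t, Trivia::Comment(_))).cloned().collect()
}

/// A toy formatter: builds new tokens (nothing is mutated in place), keeping
/// comments but normalising whitespace to one space between tokens.
pub fn normalize_spacing(tokens: &[Token]) -> Vec<Token> {
    let last = tokens.len().saturating_sub(1);
    tokens
        .iter()
        .enumerate()
        .map(|(i, tok)| {
            let mut trailing = comments(&tok.trailing);
            if i < last {
                trailing.push(Trivia::Whitespace(" ".to_string()));
            }
            Token { leading: comments(&tok.leading), text: tok.text.clone(), trailing }
        })
        .collect()
}
```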

In that case it's good that RA isn't the only Rust IDE in existence.


I can see this being problematic for collaborative projects:

  • how would CI check formatting for macros?
  • what if some user doesn't use an IDE, or doesn't use one that supports macro formatting?

And what if IDEs implement this functionality differently?

I don't believe that I implied that rust-analyzer was the only Rust IDE in existence, but if I did, it certainly wasn't my intention! Regardless, I'm only writing down what needs to happen in rust-analyzer, given that the original author had a rust-analyzer section. Those things will need to happen, but the work to migrate to a Roslyn-style trivia model hasn't been prioritized yet. It might very well be!

Peanut gallery chiming in here (like OP, I haven't contributed in the past, but I have tried to read all the threads on this topic):

Would it be possible to add another rustfmt config setting that lists out the macros which a given user is willing to treat as having a normal AST? e.g.

# not sure where the hardcoded list is
# I didn't see it in macros.rs::rewrite_macro
basic_macros = [
  "println",
  "eprintln",
  "json",
  "vec",
]

I would imagine that for some use cases, this would create a path forward for folks who want formatting.
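A minimal sketch of how a formatter might use such a list. The `basic_macros` setting above is a proposal, not an existing rustfmt option, and `format_basic_macro_args` is a hypothetical name. Allowlisted macros get their arguments split at top-level commas and re-joined; anything else is left alone. The splitting is naive: it tracks bracket depth but ignores string literals and does not check that bracket kinds match.

```rust
/// Format the arguments of an allowlisted macro as a comma-separated list
/// of expressions. Returns `None` (leave untouched) for unlisted macros or
/// unbalanced brackets.
pub fn format_basic_macro_args(allowlist: &[&str], name: &str, args: &str) -> Option<String> {
    if !allowlist.iter().any(|m| *m == name) {
        return None;
    }
    let parts = split_top_level_commas(args)?;
    Some(parts.join(", "))
}

/// Split on commas that are not nested inside (), [] or {}.
fn split_top_level_commas(args: &str) -> Option<Vec<String>> {
    let mut parts = Vec::new();
    let mut current = String::new();
    let mut depth: i32 = 0;
    for c in args.chars() {
        match c {
            '(' | '[' | '{' => depth += 1,
            ')' | ']' | '}' => {
                depth -= 1;
                if depth < 0 {
                    return None;
                }
            }
            ',' if depth == 0 => {
                parts.push(current.trim().to_string());
                current.clear();
                continue;
            }
            _ => {}
        }
        current.push(c);
    }
    if depth != 0 {
        return None;
    }
    if !current.trim().is_empty() {
        parts.push(current.trim().to_string());
    }
    Some(parts)
}
```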

Drawbacks:

  • rustfmt.toml cannot pull in rules from elsewhere (and adding dynamic loading or an import system to rustfmt configs is controversial/undesirable), so library authors can't ship macro formatting rules
  • any macro that requires non-trivial syntax is unsupported in this approach

Rustfmt can already deal with stdlib macros. Supporting custom macros in this way looks relatively brittle, since rustfmt has no way of knowing which exact macro was imported with a given name. Not impossible (after all, macro name collisions are uncommon, and a project can vet which macros it uses), but still icky.

Also, what is "trivial syntax"? json! surely isn't anywhere near trivial. I'd restrict that moniker to simple function-like macros (i.e. ones taking a comma-separated list of expressions). I guess that's possible to support, but the pain points are things like rsx!, which is beyond any simple approach.


I don't see why some people refusing to use certain tools is a valid reason not to add support for the feature for everyone.

Not checking it in CI is always an option. No enforcement, no problem.

In any case, once a protocol is accepted as a specification of formatting, it can be implemented in any number of tools. This includes IDEs, but it can just as well be implemented in a CLI tool, which you would use in CI (in fact, that's the most likely way that IDE support will be implemented, at least initially). Agreeing on a common formatting is purely a political problem, not a technical one.


Yep, totally agree on this; I just wanted to highlight that if it is useful to be able to format macros for which every argument is a valid expression, independently of the ability to format macros which define their own syntax tree, then this is a viable path forward to support those cases.