Discussion: Adding grammar information to Procedural Macros for proper custom syntax support in the toolchain

Actually, it's already the case that a macro invoked as macro!() will have its arguments formatted if they look like fully syntactically valid Rust code. (Though the exact rules as to what syntax is allowed can be a little opaque.) Invocations as macro!{} get left as-is. I don't see/use macro![] enough to determine what heuristics it uses for when to format its arguments.

I've been bitten before by rustfmt removing a for<T> from a macro invocation for no obvious reason. It quietly just works most of the time, and it's only really noticeable when things go wrong or suddenly change.

Try running rustfmt on this example.

input
macro_rules! m {($($t:tt)*) => {}}

m!(
    fn f ( ) -> i32 { 0 }
);

m!{
    fn f ( ) -> i32 { 0 }
}

m![
    fn f ( ) -> i32 { 0 }
];
output
macro_rules! m {
    ($($t:tt)*) => {};
}

m!(
    fn f() -> i32 {
        0
    }
);

m! {
    fn f ( ) -> i32 { 0 }
}

m![
    fn f() -> i32 {
        0
    }
];

For Slint, I have developed an LSP server. Some editors, such as VS Code, support running several language servers for the same file, so rust-analyzer takes care of everything else while slint-lsp takes care of what is inside the slint! macro.


Oh, interesting... thanks for clarifying that for me.

I'd somehow convinced myself that eprintln! wasn't formatted correctly while other macros were, but now that I'm actually testing that hypothesis I see that I was wrong.

... I guess I'll have to keep a closer eye on when rustfmt doesn't do anything. I didn't think we used brace-style macros in our code that much, but maybe we do more than I realize.

For a small (or even a complex, self-contained) DSL such as JSON or Slint, this approach of using a separate formatter makes a great deal of sense. For something like impl_scope! (essentially just Rust with a couple of tweaks), it doesn't.

This approach would be simpler and likely be sufficient for impl_scope! (some macro input might be rejected by a strict Rust parser, but most would not be).


OK, circling back: the use case where I ran into this was with cfg_if, which is supposed to look like real Rust code but actually isn't. That's why I thought it was an issue with rustfmt, as opposed to a hard technical problem to get right.

So in this case, the cfg_if crate would somehow need to communicate to rustfmt what inside a macro is considered a "normal" AST, which is... well, hard.

cfg_if::cfg_if! {
    if #[cfg(target_arch = "wasm32")] {

        // stuff in here will be ignored by rustfmt
        // because it's inside a braced macro

        let loader = super::wasm::load_aws_config().foo().bar();

        loader
    } else {
        aws_config::defaults(BehaviorVersion::latest())
    }
}

Isn't if cfg!(target_arch = "wasm32") the way to handle this rather than a macro to allow if #[cfg(…)]? Maybe I'm missing the use case for cfg_if here…

Looked at the docs…seems that it also supports having function/method definitions inside of the blocks, so it works at top-level, trait, and impl "scopes" rather than just inside of functions.

if cfg!() requires all the branches to be compilable at the current target, cfg_if!() doesn’t.
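To make that difference concrete, here is a minimal sketch (the function names are made up for illustration). `cfg!` expands to a `true`/`false` literal, so every branch must still type-check on the current target, whereas `#[cfg]`, which is what `cfg_if!` generates, removes the disabled item before type-checking.

```rust
/// `cfg!(...)` becomes a plain boolean literal, so BOTH branches are
/// type-checked on every target; only the selected value differs.
fn platform_name() -> &'static str {
    if cfg!(target_arch = "wasm32") {
        "wasm"
    } else {
        "native"
    }
}

// `#[cfg(...)]` removes the item entirely on non-matching targets, so the
// disabled version never has to compile there. `cfg_if!` emits attributes
// like these for each of its branches.
#[cfg(target_arch = "wasm32")]
fn loader() -> &'static str {
    "wasm loader"
}

#[cfg(not(target_arch = "wasm32"))]
fn loader() -> &'static str {
    "native loader"
}
```

On a non-wasm host both functions return the native variant. Calling an API that only exists on wasm targets inside the `cfg!` branch would fail to compile there, while the same call inside the `#[cfg(target_arch = "wasm32")]` item would not.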

cfg_if!() could use less weird syntax, though, like:

cfg_if! {
    if cfg(target_arch = "wasm32") { ... }
}

or even

cfg_if! {
    if target_arch == "wasm32" { ... }
}

The reason cfg_if doesn't get formatted is that rustfmt by design doesn't do macro expansion, name resolution, or anything else that requires looking beyond a single file. If rustfmt were able to perform macro expansion, it should not be too hard for it to figure out how to format the code inside the if branches (though not the if condition). There is no need to add actual grammar information support to proc macros for cfg_if and other macros that copy the part that should be formatted verbatim into the output.
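A stripped-down, hypothetical `my_cfg_if!` shows what "copied verbatim" means here. It is a sketch handling a single if/else pair, not the real `cfg_if` implementation: the branch bodies are matched as `item` fragments and re-emitted unchanged, with only a `#[cfg]` attribute added, so their text is plain Rust that a formatter could handle if it could see through the macro.

```rust
// Hypothetical, simplified cousin of `cfg_if!`: one `if`/`else` pair only.
macro_rules! my_cfg_if {
    (
        if #[cfg($meta:meta)] { $($then:item)* }
        else { $($otherwise:item)* }
    ) => {
        // Each item is copied through untouched; only a `#[cfg]` is prepended.
        $( #[cfg($meta)] $then )*
        $( #[cfg(not($meta))] $otherwise )*
    };
}

my_cfg_if! {
    if #[cfg(target_arch = "wasm32")] {
        fn backend() -> &'static str { "wasm" }
    } else {
        fn backend() -> &'static str { "native" }
    }
}
```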

There could be some sort of conventional syntactic indication that the code inside a macro invocation is meant to be normal Rust code (maybe macro! {{ }}, with double braces, or something similar). It would enable formatting at the cost of slight syntax weirdness while keeping rustfmt simple and separate from the compiler.

Using () parentheses instead of {} does exactly that.


TIL. The formatting for a cfg_if!()-like use case is a bit awkward, though:

cfg_if!(if cfg(target_arch = "wasm32") {
    do_something();
} else {
    do_something_else();
});

Going to comment here rather than clutter up #8. Thanks for starting (continuing?) this discussion.

serde_json::json!, tokio::select!, sqlx::query!, and leptos::view! are all extremely common macros within their respective domains. I'd classify tokio::select! in particular as quasi-syntactical, since in practice some version of it is almost guaranteed to end up in any non-trivial async Rust codebase. I bring this up again because I suspect the pain of not having proper formatting support for these macros is greatly underestimated, and we need more concrete data on the subject to motivate the rustfmt devs to prioritise the issue.

I also think there's a thread here to pull at around the precedent of dioxus-cli, leptosfmt etc. being standalone tools. In particular leptosfmt being a drop-in replacement for rustfmt is quite interesting. It seems to indicate a different path to take towards sustainably integrating custom macros into the formatting ecosystem. Instead of trying to modify core tools like rustfmt and rust-analyzer directly, they could introduce an extensions system that macro developers could use to provide third-party plugins, similar to what cargo has. Of course, that adds its own complexity, but it's bounded complexity, while the use cases it enables (even outside of formatting) are unbounded.

Clippy suffers from a very similar lack of extensibility, which has also motivated drop-in alternatives like Dylint.

Does anyone foresee any particularly difficult challenges with a plugin approach?


The most challenging part will probably be agreeing on a stable API surface for handling the plugins. And the lack of name resolution can still cause surprises. But having plugins to handle nonstandard macro formatting is certainly more doable than teaching rustfmt about "common enough" macro names (like how vec! is handled).

A secondary thing is that "silently" running new binaries during compilation has security implications. Say you've done cargo install cool-cli. It's fine, and you've even used cool for a month or so. Now you upgrade your Rust toolchain, and surprise, you also installed a second binary, rustfmt-vec, which now gets run during compilation without you ever having known about it, because cool-cli was compromised to also distribute that formerly silent payload.

So some sort of opt in is probably desired in order to mitigate that, even if only for current editions. It may be annoying, but one option that would also mitigate surprises around textual activation is to list all active fmt addons in rustfmt.toml keyed by the macro name that they're activated for. So something like:

[macros]
json = "rust-json-fmt"
select = "tokio fmt"
query = "sqlx fmt"
view = "leptos fmt"
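A sketch of the lookup side, assuming the hypothetical `[macros]` table above (neither the table nor these helper functions exist in rustfmt today). Because rustfmt doesn't do name resolution, the key can only be the textual macro name, taken here as the last path segment, so `tokio::select!` and an unrelated local `select!` map to the same addon. The toy parser only understands `name = "command"` lines; a real implementation would use a proper TOML parser.

```rust
use std::collections::HashMap;

/// Toy reader for the hypothetical `[macros]` table: only `name = "command"`
/// lines inside that section are recognised.
fn parse_macro_table(config: &str) -> HashMap<String, String> {
    let mut table = HashMap::new();
    let mut in_macros = false;
    for line in config.lines().map(str::trim) {
        if line.starts_with('[') {
            // Entering a new section; only `[macros]` is of interest.
            in_macros = line == "[macros]";
            continue;
        }
        if !in_macros {
            continue;
        }
        if let Some((key, value)) = line.split_once('=') {
            let value = value.trim().trim_matches('"');
            table.insert(key.trim().to_string(), value.to_string());
        }
    }
    table
}

/// Purely textual lookup: `serde_json::json` and `json` both use the key "json".
fn addon_for<'a>(table: &'a HashMap<String, String>, macro_path: &str) -> Option<&'a str> {
    let name = macro_path.rsplit("::").next()?;
    table.get(name).map(String::as_str)
}
```

Keying on the bare name keeps rustfmt single-file and resolution-free, at the cost of the false matches the security discussion above worries about; the explicit opt-in list is what limits that risk.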

As for what the interface would actually be, think about how to define an appropriate narrow waist. I think an appropriate one could be for rustfmt to slice out just the macro! { … } text, remove any contextually required block indentation, pass that through the addon via the same CLI that rustfmt uses, and then re-add the contextual block indentation before splicing it back into the formatted source.
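The dedent, format, re-indent, and splice pipeline could look roughly like this. It is a sketch under several assumptions: the addon is modelled as a plain closure (a real rustfmt would pipe the text through the addon's CLI), `body` is assumed to be the byte range covering whole lines between the delimiters, and indentation is assumed to be ASCII spaces.

```rust
use std::ops::Range;

/// Removes the indentation shared by all non-blank lines; returns it too.
fn dedent(text: &str) -> (String, usize) {
    let indent = text
        .lines()
        .filter(|l| !l.trim().is_empty())
        .map(|l| l.len() - l.trim_start().len())
        .min()
        .unwrap_or(0);
    let body = text
        .lines()
        .map(|l| if l.trim().is_empty() { "" } else { &l[indent..] })
        .collect::<Vec<_>>()
        .join("\n");
    (body, indent)
}

/// Puts `indent` spaces back in front of every non-empty line.
fn reindent(text: &str, indent: usize) -> String {
    let pad = " ".repeat(indent);
    text.lines()
        .map(|l| if l.is_empty() { String::new() } else { format!("{pad}{l}") })
        .collect::<Vec<_>>()
        .join("\n")
}

/// Formats the macro body at `body` with `addon`, leaving the rest of
/// `source` byte-for-byte unchanged.
fn format_macro_body(source: &str, body: Range<usize>, addon: impl Fn(&str) -> String) -> String {
    let (plain, indent) = dedent(&source[body.clone()]);
    let formatted = addon(&plain);
    let mut out = String::with_capacity(source.len());
    out.push_str(&source[..body.start]);
    out.push_str(&reindent(formatted.trim_end(), indent));
    out.push_str(&source[body.end..]);
    out
}
```

The addon only ever sees unindented text and never needs to know where in the file the macro sits, which is what makes this a narrow waist.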

That feels like it could be reasonably straightforward to implement unstably (like most of rustfmt.toml cfg is) and experiment with in rustfmt, if someone wants to propose that to the team and do the work needed to actually implement it.


Has there been any new progress on this?

There's not been any new development here.

Well, there's been some movement towards preparing cfg_match! for stabilization, and rustfmt now does the same mod discovery for it that it does for cfg_if!. But there's no special formatting handling for those or any other macro.

Also still relevant is that rustfmt doesn't do any kind of name resolution.


I wrote a program to format json! macros. It first parses the macro's input token stream into a syn tree, then dumps the parsed tree as a JSON string, and finally writes that string back to the right location in the .rs file.

It's not fast, but it uses syn as the parser.
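The tool above relies on syn. As a dependency-free illustration of just the re-printing step, the sketch below re-indents a json!-style body character by character. It assumes the body contains only JSON literals: json! also accepts interpolated Rust expressions, which a character-level pass cannot handle safely, and that is exactly where a real parser such as syn earns its keep.

```rust
/// Re-indents a JSON-only `json!` body. Assumes no interpolated Rust
/// expressions; whitespace outside string literals is discarded and rebuilt.
fn pretty_json_body(input: &str, indent: &str) -> String {
    let mut out = String::new();
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;
    // Start a new line at the given nesting depth.
    let newline = |out: &mut String, depth: usize| {
        out.push('\n');
        for _ in 0..depth {
            out.push_str(indent);
        }
    };
    for c in input.chars() {
        if in_string {
            // Copy string literals verbatim, tracking backslash escapes.
            out.push(c);
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
            continue;
        }
        match c {
            '"' => {
                in_string = true;
                out.push(c);
            }
            '{' | '[' => {
                out.push(c);
                depth += 1;
                newline(&mut out, depth);
            }
            '}' | ']' => {
                depth = depth.saturating_sub(1);
                // Drop pending indentation; keep `{}` / `[]` on one line.
                let len = out.trim_end().len();
                out.truncate(len);
                if !out.ends_with(|p: char| p == '{' || p == '[') {
                    newline(&mut out, depth);
                }
                out.push(c);
            }
            ',' => {
                out.push(c);
                newline(&mut out, depth);
            }
            ':' => out.push_str(": "),
            c if c.is_whitespace() => {}
            c => out.push(c),
        }
    }
    out
}
```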

I believe the key to letting rustfmt understand macro DSLs is to provide a dedicated syntax description file.

Today, a macro_rules! macro is little more than a pattern-matching engine that expands by recursion. The DSL it defines freely interleaves Rust expressions with its own keywords. At minimum, we should be able to format the Rust fragments inside it.
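For macro_rules! macros, the matcher already says which pieces are Rust: fragment specifiers such as expr, block, ty, or item capture ordinary Rust syntax, while tt or ident captures may be DSL tokens. A rough sketch of a helper that scans a matcher for those Rust-shaped captures follows; the name `formattable_fragments` and the chosen specifier list are assumptions for illustration, and the scan ignores repetition structure and nesting.

```rust
/// Fragment specifiers whose captures are ordinary Rust syntax (an assumed
/// list; `tt`, `ident`, `literal`, etc. are treated as possible DSL tokens).
const RUST_FRAGMENTS: &[&str] = &["block", "expr", "item", "pat", "path", "stmt", "ty"];

/// Returns `(name, specifier)` for every `$name:spec` in a macro_rules!
/// matcher whose specifier captures plain Rust a formatter could handle.
fn formattable_fragments(matcher: &str) -> Vec<(String, String)> {
    let is_ident_char = |c: &char| c.is_alphanumeric() || *c == '_';
    let mut found = Vec::new();
    let mut rest = matcher;
    while let Some(pos) = rest.find('$') {
        rest = &rest[pos + 1..];
        // `$name` ...
        let name: String = rest.chars().take_while(is_ident_char).collect();
        // ... followed by `:spec`; `$(` and `$crate` have no specifier.
        if let Some(after_colon) = rest[name.len()..].strip_prefix(':') {
            let spec: String = after_colon.chars().take_while(is_ident_char).collect();
            if RUST_FRAGMENTS.contains(&spec.as_str()) {
                found.push((name, spec));
            }
        }
    }
    found
}
```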

I once experimented with a “smart formatter” that parsed a macro’s match arms and emitted a PEST grammar, but writing it felt like building yet another interpreter.

For procedural macros the situation is even worse: the grammar is hidden inside ad-hoc syn structures or private parsing functions.

The simplest way forward is to let macro authors supply a grammar file—e.g., in tree-sitter or any other community-standard format. Rustfmt could then format only the parts it recognizes while leaving the rest untouched.

The “Declarative (macro_rules!) macro improvements” initiative seems like the right place to surface this idea: if more functionality can be expressed with declarative macros, fewer crates will need to resort to proc-macros, making a shared grammar format even more valuable.

One issue with this is that rustfmt does not do name resolution as part of formatting, so in its current design, it cannot do macro-specific formatting.

That said, as a very minimal starting point, I've sometimes wondered whether we could have grammars, rules, or patterns associated with macro names within an individual project's rustfmt configuration. For instance, "format this like println".


From my experience a lot of macros try to align their syntax to be rust-like or contain rust code inside. Currently macro invocations using () are formatted if they look like a function call.

How far could we go if this is extended a bit? Like:

  • Every {} that contains only valid Rust is formatted like Rust
  • Add an extra indent for every {}
  • Align the line beginnings to the indent (but don't add new line breaks)
  • Etc.
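The second and third rules could be prototyped without any Rust parsing at all. A minimal sketch, assuming braces inside string literals and comments can be ignored (a real implementation would tokenize first): each line is re-indented according to how many `{` are open at its start, and no line breaks are added or removed. The function name is hypothetical.

```rust
/// Re-indents every line to the brace depth at its start, keeping line
/// breaks as-is. Assumes no `{`/`}` appear inside strings or comments.
fn align_to_brace_depth(body: &str, indent: &str) -> String {
    let mut depth: usize = 0;
    let mut lines = Vec::new();
    for line in body.lines() {
        let trimmed = line.trim();
        // A line that opens with `}` belongs to the enclosing level.
        let leading_closers = trimmed.chars().take_while(|&c| c == '}').count();
        let level = depth.saturating_sub(leading_closers);
        if trimmed.is_empty() {
            lines.push(String::new());
        } else {
            lines.push(format!("{}{}", indent.repeat(level), trimmed));
        }
        // Update the depth seen by the following lines.
        for c in trimmed.chars() {
            match c {
                '{' => depth += 1,
                '}' => depth = depth.saturating_sub(1),
                _ => {}
            }
        }
    }
    lines.join("\n")
}
```

Because it never reflows, a DSL like `div { class: "foo" }` keeps its author's line structure and only gets consistent indentation.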

There would then be a tradeoff between formatting quality and macro compatibility when deciding how many rules to apply.

This would of course mean that some parts of the calls are not formatted. But the biggest pain point seems to be having to write rust code without formatting.

Macro authors could then design their macros to be somewhat compatible with this and opt in by telling their users to use () for the invocation. And users would (maybe?) be happy because there would be a bit more commonality between the syntax of different macros.

And if the formatting breaks something it is always possible to switch to {} braces for the call.

I’m afraid the “just look at the braces” approach can’t work in practice.

  1. Hard-coded bracket rules have already been tried and abandoned. The last time we baked special knowledge about delimiters into rustfmt itself was lazy_static!{.

  2. Braces are ambiguous. A pair of {} can be

  • a block of Rust statements,
  • a struct literal,
  • a macro body that looks like a struct literal but isn’t (e.g. quote! { struct S {} }), or
  • a DSL that merely borrows Rust’s punctuation (e.g. Dioxus rsx! { div { class: "foo" } }).

  Without name resolution, rustfmt has no reliable way to know which case it’s facing.

  3. Even “valid Rust” inside braces can be the wrong thing to touch. In quote! { let #ident = #expr; } the tokens let, #ident, and #expr are not a normal let statement; re-indenting or re-breaking the line can change the span information that quote relies on. We’ve seen this break real crates.

  4. Parentheses don’t save us either. vec![a, b, c] uses [], sql!(...) uses (), and html!{...} uses {}. Telling macro authors “just standardise on ()” both ignores existing conventions and pushes the compatibility burden onto every crate in the ecosystem.
