Human readable macro cache for proc_macros

Currently rustc processes proc macros during compilation. This has a couple of disadvantages:

  • Anyone building the code must run macro code on their machine, which may pose security hazards.
  • Macros need dedicated dependencies that must be built as well, which costs compilation time.
  • It is difficult to inspect what code is generated by macros.
  • Tools like rust-analyzer have a hard time dealing with macros and often rely on hacky solutions.

In order to overcome these issues, I would propose a human readable macro cache and two compilation modes (syntax only a suggestion):

rustc <crateroot.rs> --emit=macros --macro-cache-dir=<path_to_cache_dir>

In this mode macros are expanded, and each expansion is cached into a suitably named file in the specified cache dir.

rustc <crateroot.rs> --macro-cache-dir=<path_to_cache_dir>

This works like the normal build mode, except that proc_macro code is never invoked. Instead, the compiler tries to read the expansions from the specified cache dir. If this fails, the compilation aborts.

Each cache file should contain three sections, separated by "§§":

  • the first section contains metadata about the macro invocation, as JSON.
  • the second section contains the full macro expression that is being replaced.
  • the third section contains the expanded macro code.

For example, for the wstr macro, a macro that is used to generate UTF-16 literals, such an expansion file could look like this (<hash> would be substituted by suitable hash values):

{"rust_macro_cache_version": 1, "macro_kind": "function", "macro_item": "wstr_<hash>::wstr", "macro_hash": "<hash>", "expand_crate": "foocrate", "expand_module": "foocrate::foo::bar", "expand_linenum": 32, "macro-frag-spec": "expr", "targets": "*", "hash": "<hash>"}
§§
wstr!("Hello 🦀")
§§
[0x0048u16, 0x0065u16, 0x006Cu16, 0x006Cu16, 0x006Fu16, 0x0020u16, 0xD83Eu16, 0xDD80u16].as_slice()
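The third section can be checked against the standard library. The following is only a sketch: `utf16_expansion` is a hypothetical helper, not part of the wstr crate, and the exact output formatting of a real wstr expansion is an assumption. It shows the UTF-16 code units of "Hello 🦀", including 0x0020 for the space and the surrogate pair for the crab emoji.

```rust
/// Hypothetical helper: render a string as the kind of slice literal
/// a `wstr!`-style expansion might contain (formatting is an assumption).
fn utf16_expansion(s: &str) -> String {
    let units: Vec<String> = s
        .encode_utf16() // standard library: iterates over UTF-16 code units
        .map(|u| format!("0x{:04X}u16", u))
        .collect();
    format!("[{}].as_slice()", units.join(", "))
}

fn main() {
    // Prints the eight code units: five letters, the space, and a surrogate pair.
    println!("{}", utf16_expansion("Hello 🦀"));
}
```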

macro_kind would be "function", "attribute" or "derive". "macro-frag-spec" would be one of the macro fragment specifiers (similar to those listed for declarative macros) that matches the kind of code fragment expected in the macro expansion context, e.g. "expr", "stmt" or "item". targets could be used to allow the macro cache to be rejected on certain targets. In order to record macro hygiene, identifiers can carry an additional suffix (like "§1" or "§2") or something similar indicating which name scope an identifier should live in (macro name scope or expansion scope).

Using § as a special symbol is just a suggestion. I picked it because it is very commonly available in fonts (due to being in Latin-1) but not really useful in handwritten code, since it is not present on all keyboards.
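To make the three-section layout concrete, here is a rough sketch of how a tool might split such a file. Everything in it is hypothetical: `CacheEntry` and `parse_cache_entry` are names invented for illustration, and it assumes the "§§" separator sits on its own line with Unix line endings. The metadata is left as raw JSON text rather than parsed.

```rust
/// The three sections of a hypothetical macro cache file.
#[derive(Debug, PartialEq)]
struct CacheEntry<'a> {
    metadata_json: &'a str, // raw JSON text, not parsed here
    invocation: &'a str,    // the macro expression being replaced
    expansion: &'a str,     // the expanded code
}

/// Split a cache file into its three sections.
/// Assumption: sections are separated by a line containing only "§§".
fn parse_cache_entry(text: &str) -> Result<CacheEntry<'_>, String> {
    // splitn(3, ..) stops after the second separator, so the expansion
    // section is kept whole even if it contains the separator again.
    let mut parts = text.splitn(3, "\n§§\n");
    match (parts.next(), parts.next(), parts.next()) {
        (Some(meta), Some(inv), Some(exp)) => Ok(CacheEntry {
            metadata_json: meta.trim(),
            invocation: inv.trim(),
            expansion: exp.trim(),
        }),
        _ => Err("expected three sections separated by §§".to_string()),
    }
}

fn main() {
    let file = "{\"rust_macro_cache_version\": 1}\n§§\nwstr!(\"Hi\")\n§§\n[0x0048u16, 0x0069u16].as_slice()\n";
    match parse_cache_entry(file) {
        Ok(entry) => println!("{:?}", entry),
        Err(e) => eprintln!("invalid cache file: {}", e),
    }
}
```

In the cached build mode described above, a failure here would correspond to the compilation aborting.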

It is not possible to losslessly format this as snippets of regular source code. The macro hygiene information is essential for correct name resolution, yet it is not representable in regular Rust source. Spans are also necessary for pretty errors and debuginfo, which regular Rust source can't represent either. And finally, "invisible" delimiters are not representable either, yet they do affect parsing.

This proposal doesn't help in any way with that afaict as the macro cache will have to be built with the exact same rustc version as would consume it due to the format being unstable, which will almost certainly not match whatever service created the macro cache (crates.io?).

I fear that this will make external tools think that they can parse the macro cache, even though its format will almost certainly have to be unstable.

Rust-analyzer and RustRover use rust-analyzer-proc-macro-server for running proc macros, which is distributed with each rustc toolchain. This is not a hacky solution at all IMHO. Also, unlike your macro cache suggestion, this doesn't require writing the source file to disk, which is not possible for interactive use in IDEs. People expect the macros to be re-expanded without having to save the file they are editing.

Yes, this is why my suggestion includes a slight extension to regular source code. As mentioned, identifiers can be suffixed to ensure that they are put in the correct namespace for name resolution, preserving macro hygiene (an identifier "foo" in the macro input scope would stay "foo", but one in the macro's own scope would be changed to "foo§1"). Invisible delimiters could be represented similarly.
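As a rough illustration of the suffix notation only (not of how rustc tracks hygiene internally), a reader of the cache could separate the name from its scope marker like this. `Scope` and `split_hygiene_suffix` are hypothetical names, and the rule that a numeric suffix after "§" selects a macro-local scope is an assumption based on the "foo§1" example.

```rust
/// Hypothetical scope marker recovered from an identifier in the cache.
#[derive(Debug, PartialEq)]
enum Scope {
    /// No suffix: the identifier lives in the macro input (call-site) scope.
    CallSite,
    /// "§N" suffix: the identifier lives in the macro's own scope number N.
    MacroLocal(u32),
}

/// Split "foo§1" into ("foo", MacroLocal(1)); plain "foo" stays CallSite.
/// Identifiers with a malformed suffix are returned unchanged as CallSite.
fn split_hygiene_suffix(ident: &str) -> (&str, Scope) {
    match ident.split_once('§') {
        Some((name, id)) if !name.is_empty() => match id.parse::<u32>() {
            Ok(n) => (name, Scope::MacroLocal(n)),
            Err(_) => (ident, Scope::CallSite),
        },
        _ => (ident, Scope::CallSite),
    }
}

fn main() {
    for ident in ["foo", "foo§1", "bar§2"] {
        println!("{:?}", split_hygiene_suffix(ident));
    }
}
```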

This proposal would of course require making the cache format stable. This is certainly a restriction, but given that changes are already constrained by the stable API of the proc_macro crate, I don't see why such a format would be significantly more restrictive.

I am genuinely curious why you are so convinced that the format cannot be made stable. Could you point out some specific reasons for this?

Here I kind of disagree: a tool which provides a fully undocumented and unstable interface and tells people not to use it unless they are "rust-analyzer" is a hacky solution IMO. You can only use this option if you tightly coordinate with the Rust team. This may work for very specific high-profile cases like rust-analyzer, but for other tools it is entirely useless. I do agree with you that my solution lacks the interactive abilities. However, the same format used for the files could also be offered through an interactive tool, which would have the benefit of being a stable interface everybody could rely on if needed. The main idea would still be to store the expansions on disk for the commit.

The full list of things in the expansion context can be found at ExpnData in rustc_span::hygiene, combined with Transparency in rustc_span::hygiene. Most of this data is entirely unstable. Transparency::Opaque can't be created using the stable proc macro API, yet is created by various stable macros in the standard library and rustc itself. allow_internal_unstable must never be stable, as it would allow using unstable features on stable. The standard library, however, needs it in the implementation of several stable macros like panic!() to call unstable functions and use various unstable language features. Something like macro_def_id references the internal DefId of the macro. To serialize this in a lossless way you have to expose the StableCrateId and DefPath. The DefPath has seen frequent changes to the DefKind categorization.

As for spans, all data is in SpanData in rustc_span. It may seem a lot more manageable, but the lo and hi fields reference the source map, which contains a SourceFile (also in rustc_span) for each source file, which again is not very stable.

Even if a proc macro itself doesn't generate most of these things (at least macro_def_id is still generated for proc macros), a proc macro can take a span that contains any of these as an input argument and pass it through to the output, either unchanged or with non-trivial modifications using methods like resolved_at, located_at or subspan. This applies especially once the expand_expr interface is stable, allowing proc macros to expand arbitrary macros that are rustc built-ins or part of the standard library.

Thank you for the elaboration. While I did not expect my proposal to also cover built-in macros (just like declarative macros), I do realize that going through with it would be much more challenging than I had expected.