Caching of Proc-Macro expansions

It's not unreasonable to be frustrated. What I'm about to write is a bit speculative and I want to emphasize that I didn't get consensus within the rust-analyzer team, so this is entirely a case of “I personally think this would be a good, long-term direction for rust-analyzer to pursue, but I think that world is not unlikely.”

Anyways: I think some declaration on the build system level (either Cargo, or Buck, or Bazel...) saying that this macro is impure or has external dependencies would be really helpful for rust-analyzer (and I assume, RustRover) because the pre-RFC, as formulated, means that an IDE doesn't know if a macro is impure until the macro is executed. I don't really have a strong opinion as to what that declaration is, but some information that allows determining purity without executing the proc macro would be great.

To step back and explain why this matters: today, rust-analyzer has an external procedural macro expansion server for two reasons:

  1. ABI Instability, which is elaborated on in this blog post. This means that nightlies would have procedural macro support in rust-analyzer with minimal effort expended by rust-analyzer.
  2. Non-determinism/crash resiliency, which is alluded to here. If a procedural macro crashes, we don't want that to take down the entire IDE!

This has several implications:

  • Every procedural macro expansion is an IPC call. An IPC call can occur on every single keystroke when you are, say, writing in an attribute macro-annotated function or method (like #[tracing::instrument]!) and that feels a little wasteful.
    • Full disclosure: this isn't the biggest performance issue today, there are algorithmic improvements to be made first.
    • This likely merits a standalone pre-RFC discussion and is not a fleshed-out idea, but if rust-analyzer could override aspects of the proc macro API (namely, set_span) so that we can filter/override certain expansions, we can provide substantially more accurate completions/hovers in things like #[tokio::main], which move spans around to provide better compilation errors, which, inadvertently, makes IDE functionality (autocomplete, hovers, etc.) less accurate. Here's a simple example: hovering over a variable shouldn't include Tokio's builder method! However, allowing overrides in the proc-macro crate would mean that every set_span would become an IPC call.
  • The procedural macro server doesn't have access to rust-analyzer's analysis, which means that any in-memory fixups/analysis we'd want to perform in order to provide more accurate functionality is not possible. I don't think it's reasonable for the proc-macro-srv to call rust-analyzer back, as it's also used by RustRover!

I also think it'd be nice if a future edition could automatically migrate procedural macros to something fully deterministic like WebAssembly except for those that declare they're impure/have dependencies.

I find this claim rather disappointing. I already listed a few example macros for this above, so it wouldn't have been that hard to check how popular they are. As that was not done, here are explicit numbers based on the download counts of the crates

I'm partially responsible for an extremely large monorepo with a lot of Rust code and the claim that most proc macros are deterministic is, in fact, true. The veracity of this claim does not diminish the utility of sqlx::migrate!, include_dir!, or diesel_migrations::embed_migrations!. However, I think the veracity of that claim should influence the discussion in favor of supporting all impacted tools, not just build tools, with rust-analyzer being one such example.


I realized I have some gaps between the implications and why I think purity needs to be known a priori:

  1. We need to know whether a proc-macro is pure to determine how we run it, so we can't wait for a call inside of it to tell us. This can be used to infer risk for running the proc-macro in-process (thereby taking down the entire rust-analyzer process), which is faster and enables more correctness (I am assuming the ABI stuff is somehow, magically fixed).
  2. For build systems like Buck and Bazel, they need to know ahead of time whether a build target is fully hermetic/deterministic in order to ensure optimal usage of remote caching or remote execution. Procedural macros are just one kind of build target.
  3. I've wanted rust-analyzer to be able to start up without querying the build system via {buck cquery, cargo-metadata} by having a rust-analyzer-native persistent/remote cache of its name resolution data. Macros influence name resolution. I do not think it would be a good outcome for the name resolution cache if rust-analyzer were required to wait for the depinfo file, the generation of which is a non-trivial subset of a full build. I think rust-analyzer should be able to start up faster than a build!
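
To make the first point concrete, here is a minimal sketch of picking an execution strategy from a declaration that is available before the macro ever runs. Everything in it is hypothetical: `Purity`, `ExecutionStrategy`, and `choose_strategy` are illustrative names, not part of rust-analyzer, Cargo, or the pre-RFC, and it assumes (as above) that the ABI problem is solved.

```rust
/// Hypothetical purity declaration, as it might be read from build-system
/// metadata without executing the proc macro. Not an existing API.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Purity {
    /// Output depends only on the input token stream.
    Pure,
    /// Output also depends on the declared files or environment variables.
    Impure {
        files: Vec<String>,
        env_vars: Vec<String>,
    },
    /// Nothing was declared, so the IDE has to assume the worst.
    Unknown,
}

/// Where the IDE runs the expansion.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExecutionStrategy {
    /// Fast, with access to the IDE's analysis, but a crash takes the IDE down.
    InProcess,
    /// One IPC call per expansion, but isolated from the IDE process.
    OutOfProcess,
}

/// Decide how to run a macro before running it.
/// Assumption: only macros declared pure are trusted to run in-process.
pub fn choose_strategy(purity: &Purity) -> ExecutionStrategy {
    match purity {
        Purity::Pure => ExecutionStrategy::InProcess,
        Purity::Impure { .. } | Purity::Unknown => ExecutionStrategy::OutOfProcess,
    }
}
```

The point of the sketch is only that `choose_strategy` takes the declaration as its input; with a run-time declaration, that input would not exist until after the first expansion.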

Thanks for this response. At least for me that's more helpful than the previous "let's just use a Cargo.toml setting" suggestions, as it tries to explain why rust-analyzer needs this information in a certain way.

One particular thing I wondered while reading your post is that much of the talk is that a declarative solution would allow skipping the proc-macro entirely, while the run-time declaration would require running the proc-macro first. That sounds like an obvious thing, but maybe I'm missing an important point: How would you cache the output of something without producing the output in the first place? That would mean you likely have to run the macro at least once to get the output for the cache. With the approach proposed in the pre-RFC you would get the output + a set of rules for when to invalidate the output (e.g. never, if something changes, always). With the declarative approach you would need to load the declaration, then run the proc-macro to get the output, and finally collect the registered env variables and files from the proc-macro run. At least for me that doesn't sound that much simpler.
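
As a rough illustration of "output + a set of rules for when to invalidate it", here is a small sketch of such a cache. All names (`Invalidation`, `ExpansionCache`, the fingerprint map) are made up for this example and are not taken from the pre-RFC or from rust-analyzer; expansions are modelled as plain strings and dependencies as name/fingerprint pairs.

```rust
use std::collections::HashMap;

/// Hypothetical invalidation rule reported alongside an expansion.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Invalidation {
    /// Pure macro: the cached output stays valid for the same input.
    Never,
    /// Valid while every listed dependency (file path or env variable name)
    /// still has the fingerprint recorded at expansion time.
    WhenChanged(Vec<(String, u64)>),
    /// Never reuse the cached output.
    Always,
}

/// One cached expansion together with its rule.
pub struct CachedExpansion {
    pub output: String,
    pub rule: Invalidation,
}

/// Cache keyed by the macro input (modelled here as a string).
#[derive(Default)]
pub struct ExpansionCache {
    entries: HashMap<String, CachedExpansion>,
}

impl ExpansionCache {
    /// Store the output of a macro run together with its invalidation rule.
    pub fn insert(&mut self, input: &str, output: &str, rule: Invalidation) {
        self.entries.insert(
            input.to_string(),
            CachedExpansion { output: output.to_string(), rule },
        );
    }

    /// Return the cached output if the rule still holds, given the current
    /// fingerprints of all known dependencies.
    pub fn get(&self, input: &str, current: &HashMap<String, u64>) -> Option<&str> {
        let entry = self.entries.get(input)?;
        let fresh = match &entry.rule {
            Invalidation::Never => true,
            Invalidation::Always => false,
            Invalidation::WhenChanged(deps) => deps
                .iter()
                .all(|(name, fingerprint)| current.get(name) == Some(fingerprint)),
        };
        fresh.then(|| entry.output.as_str())
    }
}
```

Either way the macro has to run once before `insert` can be called; the two approaches differ only in whether the rule is known before or after that first run.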

As that is a potential breaking change I would suggest not doing that, because a proc-macro that produces deterministic output does not necessarily need to be compatible with the wasm target. For the sake of an example let's just say there is an include_sqlite_table!() macro that embeds the content of a table from a given sqlite database into your binary. That macro would produce deterministic output, but it wouldn't be compatible with the wasm target due to the C dependency. (Not that I'm aware that this particular macro exists, but it sounds like a valid use case and it demonstrates the problem.)

I would argue that we shouldn't mix WASM into the design of this API, as that seems to be far out in the future.

Again I'm not sure if mixing up these concepts is a good idea. Consider a proc-macro abort!(bool) that just calls std::process::exit if passed true and otherwise evaluates to nothing. It would be perfectly fine to declare that as a deterministic macro, as the output only depends on the input; nevertheless it will break rust-analyzer if it tries to run that in-process.

That sounds like a case that would actually benefit from declaring dependencies beforehand, but as already demonstrated that's not really possible as soon as the macro reads a file or environment variable. The other solution would be to officially declare that proc-macros shouldn't do that and remove this possibility in some future edition. I'm not even sure what the official position is on what proc-macros are and aren't allowed to do. (Also, if we apply the same argument as for impure proc-macros, these other build systems also wouldn't be relevant, as the vast majority of projects use cargo. To be sure: I'm not saying we should do that, I'm just pointing out a similarity.)

Wasm doesn't preclude having C dependencies. The current wasm32-unknown-unknown target can't due to an ABI mistake we are trying to fix, but wasm32-wasi and wasm32-unknown-emscripten can have, and in fact do have, C dependencies. You are right, however, that there are C libraries that can't (yet) be compiled for wasm.


Note that SQLite has great support and a good ecosystem for the wasm target: sqlite3 WebAssembly & JavaScript Documentation Index

Thanks for clarifying that the wasm32 target might support native dependencies at some point. I'm well aware of this possibility, but:

  • That's not there yet, due to the Rust - C ABI incompatibility for the wasm32-unknown-unknown target
  • It doesn't change the argument that there might be a (native) dependency that is not supported by the wasm32 target.
