Pre-RFC: Same-crate proc macros

(Just want to make a note before this: this is my first RFC/Pre-RFC and I'm reasonably new to Rust compiler dev in general, so please be patient, but do point out issues with this. Thanks!)

(also, if you have any more examples, please tell me, as I want to add more but can't think of any more)

Glossary

Same-crate proc macros: The subject of this RFC: proc macros that can be used in the same crate where they're defined.

External proc macros: Proc macros as they exist today, defined in a different (external) crate from the one where they are used. This does not necessarily require the current system where the crate type has to be proc-macro.

Summary

Add support for same-crate proc macros. My idea for how this could be implemented involves compiling the crate twice (with some caveats that should keep performance similar to today's).

Motivation

This could be quite useful in some scenarios and would likely increase usage of proc macros in the Rust ecosystem. One example use case: implementing bindings for OpenGL or a similar C library in Rust. There, you may want a proc macro that can take in the exact C code and convert it into valid Rust enums or the like. That is of course a very specific scenario, but the same idea applies elsewhere.
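As a hypothetical illustration of the motivation (this does not work today; the `c_enum!` macro, its body, and same-crate invocation are all invented for this sketch), a bindings crate could define and use such a macro in one place:

```rust
// Hypothetical: a proc macro defined and used in the same crate.
// Neither this `c_enum!` macro nor same-crate invocation exists today.
#[proc_macro]
pub fn c_enum(input: TokenStream) -> TokenStream {
    // ...translate a pasted C enum body into a Rust enum...
    todo!()
}

// Used later in the very same crate:
c_enum! {
    enum GLenum { GL_POINTS = 0x0000, GL_LINES = 0x0001 };
}
```

Today this would require splitting the macro out into a separate proc-macro crate, even if nothing else uses it.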

Guide-level explanation

(see the glossary for terms)

Same-crate proc macros are pretty easy to think about: they're just proc macros that can be used in the same crate where they're defined. Internally, more happens during compilation than with external proc macros, as the crate is compiled twice; however, compilation performance should be similar to external proc macros, since each function is still only compiled once. This doesn't really affect readability; the only change is that, for example, the crates my_cool_library_proc and my_cool_library could be merged.

Reference-level explanation

Internally, the compiler's flow (or possibly cargo's? I'm unsure) would be altered by inserting two steps. The first would happen somewhere after parsing but before codegen; I only have a high-level overview of the compiler's flow, so at the moment that's the best granularity I can give. At this stage, if the crate uses same-crate proc macros, the compiler would apply some form of filter to remove the code irrelevant to the proc macros (including the actual proc macro invocations), saving that code for later. It would also set a special flag, then continue compiling to a dynamic library for use as a proc macro, as usual. The second step comes after the dynamic library is actually generated: the compiler would jump back, re-add the code saved earlier, and remove the proc macro definitions. It would then apply the proc macros and compile again into whatever output format the user wants. I'm unsure what level of compiler changes this would require, or whether there are corner cases.
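Very roughly, the two inserted steps might be sketched as follows (pure pseudocode; none of these functions or types exist in rustc, the names are invented for illustration only):

```rust
// Pseudocode sketch of the proposed two-pass flow.
fn compile_crate_with_same_crate_macros(source: Ast) -> Artifact {
    // Pass 1 (after parsing, before codegen): keep only the items
    // needed by the proc-macro definitions; save everything else.
    let (macro_items, saved_rest) = split_out_non_macro_code(&source);
    let macro_dylib = compile_as_proc_macro_dylib(macro_items);

    // Pass 2: restore the saved code, drop the macro definitions,
    // expand using the dylib from pass 1, then compile normally
    // into whatever output format the user requested.
    let main_items = saved_rest.without_macro_definitions();
    let expanded = expand_macros(main_items, &macro_dylib);
    compile_normally(expanded)
}
```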

Drawbacks

This would require some major changes to the compiler and a significant amount of work in testing to ensure this wouldn't break anything major. Other than that, I don't think there would be many hiccups on the user's end.

Rationale and alternatives

I have no idea whether this design/implementation is the best option; it's simply what I came up with, and there may be better ways to implement this. However, this design would likely work, and it shouldn't cause many regressions. This couldn't be implemented as a library or macro, since that would require many other language features to be implemented so that you could implement functions for another crate, which needless to say would be a bad idea and could lead to bad situations.

Prior art

I don't think there's much prior art, as rust's proc macros are pretty unique.

Unresolved questions

Related issues that are currently out of scope include rust-lang/rust#130856. How this is implemented may change before this is merged, but as far as I can tell, not much would need to change between merging and stabilization.

Future possibilities

I don't feel there are many future possibilities for this, but this could bring more attention to proc macros in general and lead to more work on the previously mentioned rust-lang/rust#130856.

Unfortunately, this is much more complicated than "just do it," because (procedural) macro expansion can expand to mod statements and expand the set of source files. Additionally, you can't even just skip expanding proc macros, as now any non-proc-macro-gated code referring to item names defined by the proc macro expansion are also errors, and you can't know the set of unknown names until you expand the proc macro.

So proc macro mode would need to silently ignore any code with any name, type, or impl resolution errors… but now encountering real errors in the proc macro stage is a horrible experience, as code doesn't emit errors until it's used. There's no "oops, you can't use that at proc macro time," just an error about an undefined name or missing impl that isn't there if the code doesn't get called from the proc macro.

A better solution imo is to allow colocation without conflating runtime and macrotime code. I.e., by allowing multiple crates (compilation units) to exist within a single package, e.g. via something along the lines of #[proc_macro] mod crate macros;.
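Interpreting that strawman (the `#[proc_macro] mod crate` syntax is purely hypothetical, and how items would be exposed to the parent is an assumption of this sketch), it might look something like:

```rust
// Hypothetical syntax from the suggestion above: a module that is
// organizationally part of this package but compiled as a separate
// proc-macro crate, so runtime and macrotime code aren't conflated.
#[proc_macro]
mod crate macros {
    use proc_macro::TokenStream;

    #[proc_macro_derive(MyDerive)]
    pub fn my_derive(input: TokenStream) -> TokenStream {
        // ...generate an impl from the input...
        input
    }
}

// The parent crate could then use macros::MyDerive as if it came
// from a separate proc-macro dependency crate.
```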

7 Likes

Oh I see, I didn't think of that. Yeah that could work, although that could decrease readability. Of course it's better than my idea since it would actually work :sweat_smile:

Kind of curious though: would it just be too complex to reasonably implement something that filters out all non-proc-macro code (and its imports) and then reinserts it for the actual build?

I feel like another critical problem of same-crate proc-macros is handling of dependencies. It is common for a proc-macro to require proc-macro-specific dependencies. If you are building the crate twice (once as the actual crate, and once as the proc-macro), you'll end up building the entire dependency tree twice (for both the regular library and the proc-macro). That can be extremely expensive. Additionally, some dependencies may not even be buildable on the host (for example, cross-compiling with some sys crate where its library is only available on the target). Also due to feature unification, the two sides could end up with different features/behavior.

5 Likes

So, all of parsing, macro expansion, and item name resolution is by necessity a single fixed-point iteration algorithm, as macros can be used by namespaced path, and macro expansion can add new items into the namespace. Your "something" would need to defer the expansion of any local proc macro, do tree shaking to identify items reachable from a proc macro, hope there aren't any mysterious errors from trying to use items/impls that don't exist yet, compile just the reachable items, then reset back to the start to expand the new proc macros and finish name resolution (potentially retroactively causing a name resolution error in the proc macro code due to an introduced name clash) before finally continuing with normal compilation. You basically do have to compile two separate crates, and you're just inferring a kind of #[cfg([not(]proc_macro))] to separate the crates.

So, yeah, what would need to be done is known and fairly straightforward. But it's a lot of effort for a low payoff when we could just require the user to please don't write a proc macro that can potentially depend on its own expansion. What is or isn't "proc macro code" isn't obvious to the compiler. If you require the developer to make it obvious, that'll look something like mod crate where organizationally you have a module but for compilation purposes it's a separate crate that the parent imports.

1 Like

Hmm, wait. Multiple crates can already exist in the same package, so why not add macros to the already existing crate types bin, examples, etc? I guess it would be novel if the "main" crate of a package depended on one of its sibling crates and not vice versa [1] but certainly not a blocker?


  1. Assuming it does rather than just providing it to dependencies. ↩︎

1 Like

This is a proposal from a few years ago: Proc_macro in an existing library crate - #11 by ogoffart

1 Like

If you went the multiple-crates-in-the-same-package route, this could perhaps be handled with a [macro-dependencies] section in Cargo.toml

Edit: oh, now I see that's very similar to what @ogoffart was proposing in the linked post: [proc-macro.dependencies]
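Such a section might look roughly like this in Cargo.toml (the section name and semantics are hypothetical, following the linked proposal; the dependency choices are just examples):

```toml
[dependencies]
# Regular dependencies, built for the target as usual.
log = "0.4"

[proc-macro.dependencies]
# Hypothetical: dependencies built only for the host-side
# proc-macro half of the package, keeping them out of the
# target build and out of feature unification with it.
syn = "2"
quote = "1"
```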

1 Like

I’d been thinking build-dependencies was close enough, as regular dependencies are shared between library and binaries, but I guess that would lose parallelism if you also had a build script.

1 Like