I’m moving this discussion from GitHub to here at the suggestion of @centril.
I’ve been thinking about how macros work in Rust, and realized that what we’re after is ‘Rust within Rust’. Derive macros attempt this, but are limited because they can only provide access to the token stream of the thing that they are wrapping; if you need to know about the crate your macro is executing within, or need to embed external files (e.g., your externally generated lookup table), then things get messier. So when I read about Jai’s `#run` idea, I got pretty excited. Combined with @Zoxc’s compiler interface ideas, I think that we can make ‘Rust within Rust’ a reality. Here is what I’m thinking (BTW, my ideas have shifted a little from what I wrote on GitHub, so please read this too):
Following what @Zoxc suggested, make a crate available for the compiler itself. The crate ships with the compiler (you can’t get the crate on https://crates.io/, although there should probably be an entry there so that everyone knows that it is a part of rustc), and is documented via `rustup doc`. Just like any other crate, you can write `use rustc_interface::prelude::*;`, and everything Just Works™. An important part of this is that the crate needs to be available to any code, not just code that is running within a macro when the compiler calls it. This will make it easier to test out potentially complicated ‘macro’ code separately, and only use it when it is ready for prime-time. The crate also defines a public `Macro` trait that any code that wants to be called by the compiler needs to implement. This defines one function, which would have a signature similar to `fn macro_callback(compiler_state_stack: &mut Vec<&mut CompilerState>);`
This is the entry point for macros being called by the compiler at compile time.
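To make the shape of this concrete, here is a rough sketch of what the trait and a trivial implementor might look like. Everything here is hypothetical: the fields of `CompilerState`, the `EmbedLookupTable` type, and the `run_outside_compiler` helper are made up for illustration, and a real `rustc_interface` crate would define its own types. Only the `macro_callback` signature comes from the proposal above.

```rust
/// Hypothetical stand-in for the compiler's state. The real type would be
/// provided by the proposed crate; these fields are illustrative only.
#[derive(Debug, Default)]
pub struct CompilerState {
    /// Name of the crate being compiled.
    pub crate_name: String,
    /// Items that macros have asked the compiler to add.
    pub generated_items: Vec<String>,
}

/// The trait that any code called by the compiler would implement.
pub trait Macro {
    fn macro_callback(compiler_state_stack: &mut Vec<&mut CompilerState>);
}

/// Example implementor: adds a lookup table to the enclosing compilation.
pub struct EmbedLookupTable;

impl Macro for EmbedLookupTable {
    fn macro_callback(compiler_state_stack: &mut Vec<&mut CompilerState>) {
        // The top of the stack is the compilation that invoked this macro.
        if let Some(state) = compiler_state_stack.last_mut() {
            state
                .generated_items
                .push("static TABLE: [u8; 4] = [1, 2, 3, 4];".to_string());
        }
    }
}

/// Because the crate is usable from ordinary code, the macro can be
/// exercised without the compiler, e.g. from a unit test.
pub fn run_outside_compiler() -> Vec<String> {
    let mut state = CompilerState {
        crate_name: "demo".to_string(),
        ..Default::default()
    };
    {
        let mut stack: Vec<&mut CompilerState> = vec![&mut state];
        EmbedLookupTable::macro_callback(&mut stack);
    }
    state.generated_items
}
```

The last function is the point of making the crate available to any code: the ‘macro’ can be driven and tested like any other function before the compiler ever calls it.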
With that in place, the compiler can have a `#run` keyword implemented. When it encounters `#run`, it does the following:
- Makes sure that its state information is accurately reflected in the `CompilerState` object that it will be passing to the callback.
- Pauses the current compilation, pushing the `CompilerState` onto a stack.
- Creates a new `CompilerState` object that is used by a new instance of the compiler to compile the `#run` block. If there are multiple `#run` blocks nested within one another, you keep repeating these steps until you reach the inner-most block.
- Once the inner-most block is reached, it has the compiler state information for each compiler instantiation up to that point.
- Because the top-most `CompilerState` on the stack corresponds to the inner-most `#run` block, we know that there aren’t any more `#run` blocks to process. We can complete the compilation of that `CompilerState` block as a library object. We keep the top-most `CompilerState` instance after popping it off the stack.
- The compiler now starts working its way down the stack. The top-most `CompilerState` instance is adopted as the current state, and the rest of the stack is fed to the newly compiled library’s `macro_callback()` instance.
- Once the stack is empty, the last `CompilerState` instance is adopted as the new state, and compilation continues.
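The steps above can be modeled as a toy simulation. This is only my reading of the push/pop order, not a real compiler: `Source`, `Library`, the `log` field, and the `compile` function are all hypothetical names, and the stack owns its states here (instead of holding `&mut` references) to keep the sketch short.

```rust
/// Hypothetical source unit: a name plus an optional nested `#run` block.
pub struct Source {
    pub name: String,
    pub run_block: Option<Box<Source>>,
}

/// Toy compiler state; the real one would hold far more.
#[derive(Debug, Default)]
pub struct CompilerState {
    pub unit: String,
    /// Notes left by callbacks that ran against this state.
    pub log: Vec<String>,
}

/// Stand-in for a `#run` block that has been compiled to a library.
pub struct Library {
    pub from_unit: String,
}

impl Library {
    /// Toy callback: only touches the enclosing (top-most) state.
    pub fn macro_callback(&self, stack: &mut Vec<CompilerState>) {
        if let Some(enclosing) = stack.last_mut() {
            enclosing
                .log
                .push(format!("ran #run block `{}`", self.from_unit));
        }
    }
}

pub fn compile(root: &Source) -> CompilerState {
    // Pause each compilation and push its state, descending through
    // nested `#run` blocks until the inner-most one is reached.
    let mut stack: Vec<CompilerState> = Vec::new();
    let mut current = root;
    loop {
        stack.push(CompilerState {
            unit: current.name.clone(),
            log: Vec::new(),
        });
        match &current.run_block {
            Some(inner) => current = &**inner,
            None => break,
        }
    }

    // The top of the stack is the inner-most block: finish it and pop it.
    let mut finished = stack.pop().expect("stack holds at least the root");

    // Work back down the stack: the finished block becomes a library whose
    // callback sees the remaining stack, then the next state is resumed.
    while !stack.is_empty() {
        let library = Library {
            from_unit: finished.unit.clone(),
        };
        library.macro_callback(&mut stack);
        finished = stack.pop().expect("checked non-empty above");
    }

    // The stack is empty: this is the outer-most state, and ordinary
    // compilation would continue from here.
    finished
}
```

For a root unit `main` containing a `#run` block `outer_run`, which in turn contains `inner_run`, the callback for `inner_run` runs against `outer_run`’s state first, and then the callback for `outer_run` runs against `main`’s state.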
Advantages of this approach
- Like derive macros, you don’t have to learn a new macro language; it’s just Rust all the way down.
- You have the full power of the standard library; you can access the filesystem, you can even have a macro that pops up a GUI if you need to (possibly via WASM and a call to an external browser).
- You have access to the compiler’s complete state, not just the little bit that the macro was involved with.
- With some work, it should be possible to get a debugger to step through the `#run` code. That way, you can see where your macro failed.
- Speaking of debuggers, they could be macros as well, able to inspect and mutate the `CompilerState` objects as needed. This may be handy when debugging the compiler itself.
- Others?
Disadvantages
- Security. Compilation would be equivalent to running untrusted code on your machine. This isn’t that big an issue because the target audience is developers, who probably know more than the average person about how to get into and out of trouble. There is a chance that someone will write a macro that, when compiled, churns out a virus/worm that promptly tries to infect everything around it, but people already have that ability, and getting someone else to download and compile your crate the very first time is more complicated than just writing an ordinary worm.
- Compilation time. Although the API as suggested is a pure-ish function (you could make it pure by having it return the vector, instead of mutating it in-place), you still have a stack of compilations happening, which will happen each and every time a `#run` statement is encountered. By being careful and making the API a truly purely functional API, we should be able to cache compile steps and reuse them, but that would require a great deal more thought. Again, this isn’t that big a deal because you can already hurt yourself with macros.
- Others?
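To illustrate the caching idea from the compilation-time point, here is a minimal sketch assuming the callback really is pure, so that the same block source and the same input state always produce the same output. `RunCache`, its string-based keys, and the `misses` counter are hypothetical simplifications; a real cache would need a stable way to snapshot or hash a `CompilerState`.

```rust
use std::collections::HashMap;

/// Hypothetical cache of `#run` results, keyed by the block's source text
/// and a snapshot of the state it was given.
pub struct RunCache {
    entries: HashMap<(String, String), String>,
    /// Number of times the block actually had to be run.
    pub misses: usize,
}

impl RunCache {
    pub fn new() -> Self {
        RunCache {
            entries: HashMap::new(),
            misses: 0,
        }
    }

    /// Returns the cached output if this (block, state) pair was seen
    /// before; otherwise runs the block and remembers the result.
    pub fn expand<F>(&mut self, block_source: &str, input_state: &str, run: F) -> String
    where
        F: FnOnce(&str, &str) -> String,
    {
        let key = (block_source.to_string(), input_state.to_string());
        if let Some(hit) = self.entries.get(&key) {
            return hit.clone();
        }
        self.misses += 1;
        let output = run(block_source, input_state);
        self.entries.insert(key, output.clone());
        output
    }
}
```

This only works if the block has no hidden inputs. A `#run` block that reads the filesystem or pops up a GUI is not pure, so it would either have to be excluded from caching or have those inputs folded into the key.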
Any thoughts/comments would be appreciated.