Compiler plugins vs code generation


#1

Hi!

I would like to talk about macros and compiler plugins in the context of IDEs. It’s something I’ve been thinking about recently and I’d like to share a braindump.

I’m working on IntelliJ Rust. Like most other IntelliJ IDEA-based IDEs, it is designed to fully understand the semantics of Rust code without consulting the compiler/racer/RLS. That is, we are going to implement lexing, parsing, name resolution and type inference ourselves.

A major roadblock for this approach to IDEs is macros and compiler plugins. We are nowhere near handling macros at the moment, but we’ll have to do something with them eventually :slight_smile:

While macros by example are hard but doable, proper handling of compiler plugins seems impossible. So I am wondering how the effects of a compiler plugin can be understood without the compiler itself.

It’s interesting that a code generation tool (one that literally dumps source code into some directory, which is then fed to the compiler) would work with an IDE out of the box. What about designing compiler plugins in such a way that the result of their work is representable as Rust source code? I guess this may have some nasty interactions with hygiene and MIR-level plugins?

It’s also interesting that “custom derive” plugins should not be a problem for the IDE, because you know the effect of a plugin (trait X is implemented for type Y). So another approach would be to augment plugins with some kind of metadata about their effects.
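To make the “you know the effect” point concrete, here is a minimal sketch using only the built-in derives. The types `Point` and `Meters` are made up for illustration; the hand-written impl is roughly what `#[derive(Clone)]` amounts to, not the compiler’s exact output.

```rust
// What the user writes: the derive list names the traits that will be
// implemented, so a tool can record "Point: Debug + Clone + PartialEq"
// without running the expansion.
#[derive(Debug, Clone, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// A second (hypothetical) type with the same effect spelled out by hand.
// For name resolution and type inference, only the impl header matters.
struct Meters(f64);

impl Clone for Meters {
    fn clone(&self) -> Self {
        Meters(self.0)
    }
}
```

An IDE that only needs to answer “does `Point` implement `Clone`?” can stop at the attribute; it needs the expansion only if the user wants to look inside the generated methods.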

Another tool which may prefer code generation to arbitrary plugins is a debugger. If you are debugging an error which manifests itself amid the generated code, what source line should you see in the debugger?


#2

Is there a specific reason for this? RLS’ purpose is to avoid duplicating implementation logic and to let IDEs use the full range of semantic information and native compiler performance.

I suspect you may be able to get away with a slightly simpler implementation by ignoring lifetimes completely (which means you won’t be able to give any sort of lifetime-related diagnostics), but it’s still an enormous amount of work.

Rust is more complicated than all the other languages supported by IntelliJ combined, if you include the trait system, so I’m not entirely sure how wise it would be to repeat the same recipe.


#3

In some sense, the IntelliJ platform is a framework for building RLS-like tools. I don’t know which is more work: implementing a compiler frontend in IntelliJ, or implementing incrementality, indices and the ability to work with broken code in rustc. And it makes sense just to try different approaches to IDEs :slightly_smiling:

“Rust is more complicated than all the other languages supported by IntelliJ combined”

Hm, I’m not sure that this is true. I don’t think that Rust’s type system (especially without lifetimes) is more complex than Scala’s (I may be wrong of course). And even Java is sometimes more complex than Rust (subtyping, use-site variance, overloading).

I totally agree that this is an enormous amount of work :slight_smile:


#4

And what other tools, besides debuggers and IDEs, can be impaired by heavy usage of plugins? I can think of correctness proofs and security audits.


#5

The fundamental problem with using source text as an intermediate representation is that you lose hygiene (as you note), which is one of the prime benefits of the Rust macro system. I don’t think there is any way to preserve hygiene in a source text representation (you could rename of course, but then you must ensure the source text is not changed by the user).
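A small sketch of what gets lost, using a made-up `macro_rules!` macro `add_offset!`. The second function is what you would get by pasting the macro body in as plain text without any renaming; the names are illustrative only.

```rust
// The `offset` declared inside the macro body is hygienic: it is a different
// binding from any `offset` that appears in the caller's expression.
macro_rules! add_offset {
    ($e:expr) => {{
        let offset = 100;
        $e + offset
    }};
}

fn hygienic() -> i32 {
    let offset = 1;
    // `$e` is the caller's `offset` (1); the macro's own `offset` is 100.
    add_offset!(offset)
}

// The same body substituted as raw source text: the inner `let` now shadows
// the caller's variable, so both operands refer to the macro's binding.
fn pasted_as_text() -> i32 {
    let offset = 1;
    {
        let offset = 100;
        offset + offset
    }
}
```

With hygiene the first function returns 101; the textual version returns 200. Renaming the macro’s local to something unique would fix the pasted text, which is the renaming caveat mentioned above.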

On the other hand, if the tools are smart enough, then working with macros should really be no problem. Macros are simply a program transformation, just as the rest of the compiler transforms source code to binary (to some extent debug info can take this into account, just as it does for the rest of compilation). Which is not to say it is easy, but it is totally possible, even with procedural macros (see DrRacket for an example of a tool with great support for macros).

In the compiler, we have expansion traces embedded in the spans for source code to handle this sort of thing, and the compiler and some tools make use of them (you could consider this the metadata describing the effects of the macro, although it is not exported in any form at the moment). Over the winter, a student had a project to make code search work with macros and it works really well (I hope to have a demo soon; unfortunately there are some unrelated memory issues with the tool at the moment).

So, in terms of solving the problem in IntelliJ, you have two options: either make the parser macro-aware, or interact with the compiler. I prefer the second approach (but I’m obviously biased). The first approach is certainly feasible, but not easy, and I can’t think of an 80/20 solution that would get you wins. I think it will only get worse too, because in the future we plan to tie name resolution much more closely to macro expansion. On the other hand, if you do manage to implement it all (and I’m very impressed with what you’ve done so far) then we’d be pretty close to having a second Rust compiler, which would be really great.

We do want incremental compilation anyway (compilation times are a real sore spot for Rust at the moment), and the implementation is well under way. The rest is also hard, but hopefully not too hard!


#6

Hm, my understanding is that to implement procedural macros we will have to implement a Rust interpreter in Kotlin, which sounds way more complicated than a Racket interpreter in Racket.


#7

You could compile the macro using the existing Rust compiler, then call it like a library (I assume Kotlin has something like JNI you can use). That is essentially what the Rust compiler does, and it should be much easier than implementing an interpreter.


#8

Yeah, a “macro as a service” approach should be feasible. You don’t need a large invocation context to expand the macro, right? That is, if somewhere in the file I have foo!(token_tree), will it be sufficient to know the name foo, the definition of foo, and the text of token_tree to expand the macro? Or will the compiler parse the whole file/crate to do the expansion?


#9

This is a really interesting thread. I want to add a couple of thoughts.

Short term

Code generation is of course a thing now with Rust. After all, it is currently the only stable way to do “compiler plugin”-like things. Typically this is done by (ab)using the build.rs mechanism in cargo; see e.g. LALRPOP. Unfortunately, the user experience here is somewhat lacking. In order to move more users off of nightly NOW, there’s been talk of making a few minor tweaks to address these shortcomings.

Some things I’ve heard kicked about:

  • extend compiler with ability to have multiple source paths
    • right now, build products must either go in the source directory (bad) or be referenced via include! (worse)
  • extend compiler with some way to map spans in generated code to original input
    • presumably we would not support full stack traces though (but maybe?)
  • maybe extend cargo with some kind of first-class “preprocessor dependency”
    • then people don’t have to write a build.rs file
    • but also we can compile the code optimized
    • this is maybe just something I want :slight_smile:
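For readers who have not seen the build.rs pattern, here is a minimal sketch of the generator side. The generator itself (`render_constants`, `write_generated`, the file name `constants.rs`) is hypothetical; only `OUT_DIR` and `include!` are the real Cargo/Rust pieces.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical generator: turns a list of names into Rust constants.
fn render_constants(names: &[&str]) -> String {
    let mut out = String::from("// @generated, do not edit by hand\n");
    for (i, name) in names.iter().enumerate() {
        out.push_str(&format!(
            "pub const {}: usize = {};\n",
            name.to_uppercase(),
            i
        ));
    }
    out
}

/// Writes the generated source into `out_dir` and returns its path.
/// In a real build.rs, `main` would call this with the directory that
/// Cargo passes in the OUT_DIR environment variable.
fn write_generated(out_dir: &Path, names: &[&str]) -> io::Result<PathBuf> {
    let path = out_dir.join("constants.rs");
    fs::write(&path, render_constants(names))?;
    Ok(path)
}

// The crate then pulls the file in with:
//     include!(concat!(env!("OUT_DIR"), "/constants.rs"));
```

The output is ordinary Rust source on disk, which is why an IDE or debugger can read it without knowing anything about the generator.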

I think there are no plans to permit hygiene to be emulated. The importance of this is somewhat unclear; for many code preprocessing tasks, some kind of “gensym-like” capability is sufficient, and it can be achieved in practice by prepending __ to names (LALRPOP for example prepends as many _ as it needs to make a unique string that does not appear in the input; this is not perfect, but also works pretty OK in practice).
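The underscore trick can be sketched in a few lines. This is an illustration of the idea as described here, not LALRPOP’s actual code; `fresh_prefix` and `fresh_name` are made-up names.

```rust
/// Returns the shortest run of underscores (at least two) that does not
/// occur anywhere in `input`.
fn fresh_prefix(input: &str) -> String {
    let mut prefix = String::from("__");
    while input.contains(prefix.as_str()) {
        prefix.push('_');
    }
    prefix
}

/// Builds a generated identifier that cannot collide with any name spelled
/// out in `input`, because its prefix never appears there.
fn fresh_name(input: &str, base: &str) -> String {
    format!("{}{}", fresh_prefix(input), base)
}
```

For example, if the user’s input already contains `__x`, the generated names would start with three underscores. It only protects against names that appear literally in the input, which is the “not perfect” part.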

Medium term

Of course, longer term, I think we would all like stable compiler plugins and @nrc has been hard at work on spelling out just what that means. One of the questions that has been raised actually is how (stable) compiler plugins should interface with the compiler. Today’s “interface” (expose all internal data structures) is clearly untenable. Some kind of more limited interface seems obviously preferable.

One of the things that I was considering was whether we could actually make the interface operate purely over IPC – that is, rather than linking compiler code as a dynamic library, we would run it as a process. This has some advantages for the compiler (ABI is a non-issue, better isolation), but I would think it would be way nicer for integration with IntelliJ (and other IDEs). It might be slower though. This is unclear to me. Thoughts?
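To give the IPC idea some shape, here is a toy sketch of a plugin running as a separate process and speaking a line-based protocol over stdin/stdout. Everything in it is an assumption made up for illustration: the `expand <name> <tokens>` request format, the `ok`/`err` replies, and the `double` macro. Nothing like this exists in rustc today.

```rust
use std::collections::HashMap;
use std::io::{self, BufRead, Write};

/// Hypothetical expander: takes the text of a token tree, returns source text.
type Expander = fn(&str) -> String;

fn expand_double(tokens: &str) -> String {
    format!("(({t}) + ({t}))", t = tokens)
}

fn registry() -> HashMap<&'static str, Expander> {
    let mut macros: HashMap<&'static str, Expander> = HashMap::new();
    macros.insert("double", expand_double);
    macros
}

/// Handles one request line of the (made-up) form `expand <name> <tokens>`.
fn handle_request(line: &str, macros: &HashMap<&'static str, Expander>) -> String {
    let mut parts = line.trim_end().splitn(3, ' ');
    match (parts.next(), parts.next(), parts.next()) {
        (Some("expand"), Some(name), Some(tokens)) => match macros.get(name) {
            Some(expand) => format!("ok {}", expand(tokens)),
            None => format!("err unknown macro `{}`", name),
        },
        _ => String::from("err malformed request"),
    }
}

/// Reads requests line by line and writes one reply per request. A plugin
/// process would call this with locked stdin and stdout.
fn serve<R: BufRead, W: Write>(input: R, mut output: W) -> io::Result<()> {
    let macros = registry();
    for line in input.lines() {
        writeln!(output, "{}", handle_request(&line?, &macros))?;
    }
    Ok(())
}
```

The point of the sketch is that the client only sends a macro name and the text of the token tree, so an IDE written in Kotlin could talk to the same process as the compiler. Spans, hygiene information and error reporting are left out entirely and would make a real protocol much larger.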


#10

er, meant to cc @alexcrichton and @erickt


#11

I think there is a third possibility of using #[path=] attribute, if a build product is a module.


#12

The advantage of using build.rs right now is that build.rs’s primary goal is not to generate code. Once plugins are stable, build.rs files will continue to be used for other purposes.

Whereas if the compiler gets some sort of first-class support for code generation, this mechanism will likely become obsolete in the future.


#13

The way DrRacket provides excellent tool support in the presence of macros is by using the compiler itself, not by re-implementing anything. In other words, it uses the RLS approach, and I think it would be totally infeasible to build a high-quality IDE for Racket any other way (this is also how the various Racket emacs modes work).

Presumably IntelliJ already solves this problem for Scala, which has procedural macros.


#14

I’m not really familiar with Scala macros and their handling in the IntelliJ plugin, but from what I know, Scala macros are much more IDE-friendly than Rust ones. In Scala, a macro invocation looks exactly like a function call: it accepts typed expressions and returns a typed result. So you don’t need anything special to support macro invocations.


#15

For Scala, you don’t need to do anything special on the parsing side just to find the end of the expression, but if you want to correctly rename a variable, for example, you’d need to handle the macros.


#16

The same is true for Rust, due to the token-tree approach. However, Scala’s approach allows running type inference without expanding macros.
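A small illustration of the token-tree point, with a made-up macro `count_tokens!`: the parser can always find the end of the invocation by matching delimiters, but the contents need not be Rust expressions at all, so nothing inside can be typed before expansion.

```rust
// Counts the token trees it is given. A delimited group such as `(a b c)`
// is a single token tree.
macro_rules! count_tokens {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tokens!($($tail)*) };
}

// The argument is not a valid Rust expression, yet the invocation parses.
fn sql_like_len() -> usize {
    count_tokens!(select * from users where id = 42)
}

// A parenthesized group counts as one token tree, plus `d`.
fn group_len() -> usize {
    count_tokens!((a b c) d)
}
```

Here `sql_like_len()` is 8 and `group_len()` is 2. A Scala-style macro would instead require its arguments to be well-typed expressions, which is what lets type inference run first.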

Hm, if a Scala macro looks like a function call, then I can just rename variables inside a macro call the same way as for a function call, right? Note that I am talking about macro usage here, not about macro definition.

For reference, here is a blog post describing how IDEA supports Scala macros: https://blog.jetbrains.com/scala/2015/10/14/intellij-api-to-build-scala-macros-support/. Basically, each macro is special-cased, but it is relatively easy to add new special cases.


#17

Maybe unstable (“nightly”) compiler plugins should keep seeing the entire internal compiler interface forever, while stable compiler plugins should see only the limited interface.

After the first stable compiler interface is introduced, only a few compiler plugins could be marked stable. But, as in the rest of Rust, more and more (though not all) of the compiler API would be stabilized over time, and more and more compiler plugins would become stable.

Some super-tricky compiler plugins might remain unstable forever.


#18

“For the reference, here is a blog post describing how IDEA supports Scala macros: https://blog.jetbrains.com/scala/2015/10/14/intellij-api-to-build-scala-macros-support/. Basically, each macro is special cased, but it is relatively easy to add new special cases.”

That suggests that macros need to be handled as if they’re core language forms, but with an easy way to write them. Which makes it seem like they pose the same problem in Scala as in Rust.