I was looking into the compiler API exposed through rustc_driver and saw that there is a MutVisitor trait one can implement to mutate different language items/expressions when working with the AST. This trait also exists for the MIR. However, it does not seem to exist for the HIR (only Visitor exists). My first question is why this is the case. I looked at previous posts and they all mention that the HIR is immutable, but I don't understand why it needs to be that way. Second, assuming there is a principled reason the HIR is immutable, would it be possible to copy the HIR, perform some modifications on the copy, and pass it down for further compilation? If this is not possible, it would also be good to understand why. Thank you very much in advance.
Could you expand a bit on this answer? I am new to working with the compiler.
My understanding is that "arena allocated" means that the whole HIR lives in the same region of memory and shares the same lifetime. My guess as to why the HIR is NOT clonable in the arena is that the compiler uses pointers for type comparisons (so if we cloned it, the same types would have different identifiers). Given that it has interior shared references, I now also understand why it is not mutable.
I do not understand what you meant by "Mutating one thing requires cloning everything."
I wanted to create a copy of the HIR (maybe to a new arena) that I could modify or maybe just build my own based on the original copy and compile that instead. Is this just not possible?
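One possible reading of "Mutating one thing requires cloning everything" can be sketched with a toy tree. Everything here (`Expr`, `alloc`, `replace_lit`, `eval`) is hypothetical and far simpler than rustc's real HIR; `alloc` merely stands in for arena allocation. The point it tries to show is that when nodes are only reachable through shared references, a "change" to one node means allocating a new node plus a new copy of every ancestor that points at it.

```rust
// Toy model only: `Expr` and `alloc` are hypothetical and are not rustc's HIR types.
#[derive(Debug)]
enum Expr<'hir> {
    Lit(i64),
    Add(&'hir Expr<'hir>, &'hir Expr<'hir>),
}

/// Stand-in for arena allocation: the node is leaked, so the returned shared
/// reference stays valid for as long as the caller needs it.
fn alloc<'hir>(e: Expr<'hir>) -> &'hir Expr<'hir> {
    Box::leak(Box::new(e))
}

/// "Changes" every literal equal to `from` into `to`. Nothing is edited in
/// place: a changed node is allocated anew, and so is every ancestor of it.
/// Subtrees that contain no change are reused as they are.
fn replace_lit<'hir>(e: &'hir Expr<'hir>, from: i64, to: i64) -> &'hir Expr<'hir> {
    match *e {
        Expr::Lit(n) if n == from => alloc(Expr::Lit(to)),
        Expr::Lit(_) => e,
        Expr::Add(l, r) => {
            let (new_l, new_r) = (replace_lit(l, from, to), replace_lit(r, from, to));
            if std::ptr::eq(new_l, l) && std::ptr::eq(new_r, r) {
                e // no child changed, so the old node can be shared
            } else {
                alloc(Expr::Add(new_l, new_r)) // a child changed: rebuild this node
            }
        }
    }
}

/// Evaluates the toy expression, only used to observe the result of a rewrite.
fn eval(e: &Expr<'_>) -> i64 {
    match *e {
        Expr::Lit(n) => n,
        Expr::Add(l, r) => eval(l) + eval(r),
    }
}
```

In this toy the old tree stays valid and untouched, and only the path from the root to the changed leaf is rebuilt. In the real compiler, I would assume the situation is harder, because other data computed from the HIR refers to the original nodes and would not automatically follow a rebuilt copy.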
A similar issue came up recently while applying rewrites to a program that required type information, which the AST alone could not provide.
Since directly mutating the HIR did not work well, the rewrites were kept at the AST level.
Whenever additional information was needed, the AST was lowered to HIR to obtain it.
One important detail is that each AST rewrite may invalidate the HIR, so a fresh lowering is required whenever the AST is modified. That said, parsing (which is not the costly part) can be skipped, as the existing AST can be reused; it implements Clone.
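The workflow described above can be sketched roughly as follows. This is a toy model under stated assumptions, not rustc code: `Ast`, `Hir`, `lower`, and `Session` are hypothetical names, and `lower` is assumed to consume its input, which is why the kept AST is cloned before each lowering.

```rust
// Toy model only: `Ast`, `Hir`, `lower`, and `Session` are hypothetical
// stand-ins, not rustc types or functions.
#[derive(Clone, Debug, PartialEq)]
struct Ast {
    /// `(name, initializer)` pairs, e.g. ("x", "1") for `let x = 1;`.
    bindings: Vec<(String, String)>,
}

#[derive(Debug)]
struct Hir {
    /// The "type information" that only becomes available after lowering.
    types: Vec<&'static str>,
}

/// Assumed to consume its input, so callers clone the AST they want to keep.
fn lower(ast: Ast) -> Hir {
    let types = ast
        .bindings
        .iter()
        .map(|(_, init)| if init.parse::<i64>().is_ok() { "i64" } else { "&str" })
        .collect();
    Hir { types }
}

struct Session {
    ast: Ast,
    hir: Option<Hir>, // cached lowering, dropped whenever the AST changes
    lowerings: usize, // counts how often lowering actually ran
}

impl Session {
    fn new(ast: Ast) -> Self {
        Session { ast, hir: None, lowerings: 0 }
    }

    /// Lowers a clone of the current AST on demand and caches the result.
    fn hir(&mut self) -> &Hir {
        if self.hir.is_none() {
            self.hir = Some(lower(self.ast.clone()));
            self.lowerings += 1;
        }
        self.hir.as_ref().expect("populated above")
    }

    /// Applies an AST-level rewrite and invalidates the stale lowering.
    fn rewrite(&mut self, f: impl FnOnce(&mut Ast)) {
        f(&mut self.ast);
        self.hir = None;
    }
}
```

Parsing never appears in this loop: the session holds on to the AST and only the lowering is repeated after each rewrite.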
Depending on what you want to do, you might target the MIR, HIR or AST. Most rustc consumers target the MIR, and it's also the only stage available in rustc_public for now.
But in fact, while the MIR and the HIR can mostly be mapped back to source code (via spans), they're not intended for that, so implementing a rewriting tool on top of them is going to be hard. On the other hand, from the AST it's hard to obtain information, as you noticed.
In general rustc is just not planned for rewriting tools, as it's not something a compiler usually does (it does some rewriting in diagnostic fixes, but it's very minor). If that's what you want to do, you might want to consider using rust-analyzer instead - while not as accurate as rustc, it is pretty accurate and unlike rustc, it definitely is intended for rewriting source code.
In this case, rewriting the source program was required, so MIR was not a suitable option, as its output does not resemble the original.
While it's clear that rustc is not designed for rewriting tools -- something that becomes apparent when building one -- I'm curious about the use of rust-analyzer in this context.
Can it provide precise type information, or is it more of an approximation that may differ from the compiler's? Additionally, when checking whether a program compiles, is that effectively the ground truth, or may it differ from the compiler?
The compiler, of course, is the source of truth. Specifically when asking whether a program can compile, rust-analyzer is unlikely to give you accurate answers, since its diagnostics are currently few and far between.
Type information (and other things such as macro expansion, path resolution etc.) is pretty accurate, but we do have bugs and things where we don't match rustc. It has gotten considerably better recently, and is still improving. We also share some parts of rustc (e.g. the solver), and that helps us stay conformant.
Since directly mutating the HIR did not work well, the rewrites were kept at the AST level. Whenever additional information was needed, the AST was lowered to HIR to obtain it.
How were you able to map the information found on the HIR back to the AST to perform your modifications based on it? I am currently using spans but I am not sure if this approach is reliable. Some insight on this would be very helpful. Thank you both.
Yes, that was my intuition as well.
Unfortunately, my rewriting tool often needed to verify that the program indeed compiles, which has pushed me toward using the rustc APIs rather than rust-analyzer.
That said, I would be very interested in improving the ergonomics of these APIs; I imagine that rust-analyzer could also benefit from such improvements.
Do you know anyone who might be interested in this topic or who could mentor me in implementing these changes?
How were you able to map the information found on the HIR back to the AST to perform your modifications based on it? I am currently using spans but I am not sure if this approach is reliable.
Yes, I am using spans.
They also seemed brittle to me, but I haven't run into any issues yet.
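For what it's worth, one defensive way to do the span-based mapping can be sketched like this. It is a toy model: `Span`, `AstNodeId`, and `SpanIndex` are hypothetical stand-ins, and the `from_expansion` field only imitates what rustc's `Span::from_expansion()` reports. The idea is to index AST nodes by source range and to refuse to map a HIR node back whenever the span comes from a macro expansion or more than one AST node shares the same range, instead of guessing.

```rust
use std::collections::HashMap;

// Toy model only: `Span`, `AstNodeId`, and `SpanIndex` are hypothetical stand-ins.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Span {
    lo: u32,
    hi: u32,
    /// True if the node was produced by macro expansion.
    from_expansion: bool,
}

type AstNodeId = usize;

#[derive(Default)]
struct SpanIndex {
    by_range: HashMap<(u32, u32), Vec<AstNodeId>>,
}

impl SpanIndex {
    /// Records an AST node under its source range. Nodes that come from a
    /// macro expansion are skipped, since their spans may not be unique.
    fn insert(&mut self, span: Span, id: AstNodeId) {
        if !span.from_expansion {
            self.by_range.entry((span.lo, span.hi)).or_default().push(id);
        }
    }

    /// Maps a HIR-side span back to an AST node, but only when exactly one
    /// AST node has that range. Ambiguous or unknown spans yield `None`.
    fn lookup(&self, hir_span: Span) -> Option<AstNodeId> {
        if hir_span.from_expansion {
            return None;
        }
        match self.by_range.get(&(hir_span.lo, hir_span.hi))?.as_slice() {
            [id] => Some(*id),
            _ => None,
        }
    }
}
```

Returning `None` in the ambiguous cases means a rewrite is skipped rather than applied to the wrong node, which seems like the safer failure mode if spans do turn out to be brittle.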