Proposal: Borrow #run idea from JAI

ckaran · January 17, 2019, 3:53pm

I’m moving this discussion from github to here at the suggestion of @centril.

I’ve been thinking about how macros work in rust, and realized that what we’re after is ‘rust within rust’. derive macros attempt this, but are limited because they can only provide access to the token stream of the thing that they are wrapping; if you need to know about the crate your macro is executing within, or need to embed external files (e.g., your externally generated lookup table), then things get messier. So when I read about JAI’s #run idea, I got pretty excited. Combined with @Zoxc’s compiler interface ideas, I think that we can make ‘rust within rust’ a reality. Here is what I’m thinking (BTW, my ideas have shifted a little from what I wrote on github, so please read this too):

Following what @Zoxc suggested, make a crate available for the compiler itself. The crate ships with the compiler (you can’t get the crate on https://crates.io/, although there should probably be an entry there so that everyone knows that it is a part of rustc), and is available via rustup doc. Just like any other crate, you can write use rustc_interface::prelude::*, and everything Just Works™. An important part of this is that the crate needs to be available to any code, not just code that is running within a macro when the compiler calls it. This will make it easier to test out potentially complicated ‘macro’ code separately, and only use it when it is ready for prime-time. The crate also defines a public ‘Macro’ trait that any code that wants to be called by the compiler needs to implement. This defines one function, which would have a callback similar to fn macro_callback(compiler_state_stack: &mut Vec<&mut CompilerState>); This is the entry point for macros being called by the compiler at compile time.
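To make the shape of that interface concrete, here is a minimal sketch. No rustc_interface crate with this API exists, so CompilerState, its fields, the &self receiver on macro_callback, and the AddLookupTable type are all assumptions invented for illustration. The only piece taken from the proposal is the callback's parameter type.

```rust
/// Hypothetical stand-in for the compiler state. A real version would come
/// from the proposed (currently nonexistent) rustc_interface crate.
#[derive(Debug, Default)]
pub struct CompilerState {
    /// Name of the crate or #run block being compiled (illustrative only).
    pub unit_name: String,
    /// Source items injected by macros so far (illustrative only).
    pub generated_items: Vec<String>,
}

/// Sketch of the proposed `Macro` trait with its single entry point.
/// The `&self` receiver is an assumption; the proposal only gives the
/// parameter list.
pub trait Macro {
    fn macro_callback(&self, compiler_state_stack: &mut Vec<&mut CompilerState>);
}

/// Example macro: adds one generated item to the top-most state on the stack.
pub struct AddLookupTable;

impl Macro for AddLookupTable {
    fn macro_callback(&self, compiler_state_stack: &mut Vec<&mut CompilerState>) {
        if let Some(state) = compiler_state_stack.last_mut() {
            state
                .generated_items
                .push("static TABLE: [u8; 4] = [0, 1, 2, 3];".to_string());
        }
    }
}

/// Because the trait is ordinary Rust, the macro can be exercised without
/// the compiler being involved at all.
pub fn run_outside_compiler() -> Vec<String> {
    let mut outer = CompilerState {
        unit_name: "my_crate".to_string(),
        ..Default::default()
    };
    {
        let mut stack: Vec<&mut CompilerState> = vec![&mut outer];
        AddLookupTable.macro_callback(&mut stack);
    }
    outer.generated_items
}

fn main() {
    println!("{:?}", run_outside_compiler());
}
```

The run_outside_compiler function is the point of making the crate available to any code: the same macro can be driven from an ordinary test before it is ever invoked by the compiler.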

With that in place, the compiler can have a #run keyword implemented. When it encounters #run, it does the following:

Makes sure that its state information is accurately reflected in the CompilerState object that it will be passing to the callback.
Pauses the current compilation, pushing the CompilerState onto a stack.
Creates a new CompilerState object that is used by a new instance of the compiler to compile the #run block. If there are multiple #run blocks nested within one another, you keep repeating these steps until you reach the inner-most block.
Once the inner-most block is reached, it has the compiler state information for each compiler instantiation up to that point.
Because the top-most CompilerState on the stack corresponds to the inner-most run block, we know that there aren’t any more #run blocks to process. We can complete the compilation of that CompilerState block as a library object. We keep the top-most CompilerState instance after popping it off the stack.
The compiler now starts working its way down the stack. The top-most CompilerState instance is adopted as the current state, and the rest of the stack is fed to the newly compiled library’s macro_callback() instance.
Once the stack is empty, the last CompilerState instance is adopted as the new state, and compilation continues.
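The procedure above can be simulated in a few lines. This is only one reading of it, under loud assumptions: Unit, CompilerState, compile_as_library and compile are hypothetical names, "compiling" a block is modelled as building a closure, and each callback simply records a note in the state that encloses it.

```rust
/// Hypothetical source unit: some code plus at most one nested #run block.
struct Unit {
    name: &'static str,
    nested_run: Option<Box<Unit>>,
}

/// Hypothetical compiler state; `log` records what callbacks did to it.
#[derive(Debug)]
struct CompilerState {
    name: &'static str,
    log: Vec<String>,
}

/// Stand-in for a #run block compiled as a library.
type Callback = Box<dyn Fn(&mut Vec<CompilerState>)>;

/// "Compiles" a finished state into a callback that may mutate the stack.
fn compile_as_library(state: &CompilerState) -> Callback {
    let name = state.name;
    Box::new(move |stack: &mut Vec<CompilerState>| {
        if let Some(enclosing) = stack.last_mut() {
            enclosing.log.push(format!("modified by #run block `{}`", name));
        }
    })
}

fn compile(root: &Unit) -> CompilerState {
    // Descend: pause each compilation and push its state until the
    // inner-most #run block has been reached.
    let mut stack = Vec::new();
    let mut current = Some(root);
    while let Some(unit) = current {
        stack.push(CompilerState { name: unit.name, log: Vec::new() });
        current = unit.nested_run.as_deref();
    }

    // Unwind: the top-most state has no further #run blocks, so finish it,
    // run its callback against the rest of the stack, then adopt the next one.
    let mut finished = stack.pop().expect("the root state is always pushed");
    while !stack.is_empty() {
        let callback = compile_as_library(&finished);
        callback(&mut stack);
        finished = stack.pop().expect("stack was checked to be non-empty");
    }
    finished
}

fn main() {
    let root = Unit {
        name: "crate",
        nested_run: Some(Box::new(Unit {
            name: "outer_run",
            nested_run: Some(Box::new(Unit { name: "inner_run", nested_run: None })),
        })),
    };
    println!("{:?}", compile(&root));
}
```

Even in this toy form, every level of nesting costs a full extra compile, which is the compilation-time concern raised under Disadvantages.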

Advantages of this approach

Like derive macros, you don’t have to learn a new macro language; it’s just rust all the way down.
You have the full power of the standard library; you can access the filesystem, you can even have a macro that pops up a GUI if you need to (possibly via WASM and a call to an external browser).
You have access to the compiler’s complete state, not just the little bit that the macro was involved with.
With some work, it should be possible to get a debugger to step through the #run code. That way, you can see where your macro failed.
Speaking of debuggers, they could be macros as well, able to inspect and mutate the CompilerState objects as needed. This may be handy when debugging the compiler itself.
Others?

Disadvantages

Security. Compilation would be equivalent to running untrusted code on your machine. This isn’t that big an issue because the target audience are developers, who probably know more than the average person about how to get into and get out of trouble. There is a chance that someone will write a macro that, when compiled, churns out a virus/worm that promptly tries to infect everything around, but people already have that ability, and getting someone else to download and compile your crate the very first time is more complicated than just writing an ordinary worm.
Compilation time. Although the API as suggested is a pure-ish function (you could make it pure by having it return the vector, instead of mutating it in-place), you still have a stack of compilation happening, which will happen each and every time a #run statement is encountered. By being careful and making the API a truly purely functional API, we should be able to cache compile steps and reuse them, but that would require a great deal more thought. Again, this isn’t that big a deal because you can already hurt yourself with macros.
Others?
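The caching idea in the compilation-time point relies on purity: if an expansion depends only on its input, its result can be stored and reused. A minimal sketch of that idea, assuming a hypothetical ExpansionCache keyed on the input source text (nothing here is an existing rustc mechanism):

```rust
use std::collections::HashMap;

/// Hypothetical cache for compile-time expansions, keyed by input source.
struct ExpansionCache {
    results: HashMap<String, String>,
    /// How many times an expansion actually had to run.
    evaluations: usize,
}

impl ExpansionCache {
    fn new() -> Self {
        ExpansionCache { results: HashMap::new(), evaluations: 0 }
    }

    /// Returns the cached output if this input was seen before; otherwise
    /// runs the expansion once and stores its result. This is only sound
    /// when `expansion` is a pure function of `input`.
    fn expand<F>(&mut self, input: &str, expansion: F) -> String
    where
        F: Fn(&str) -> String,
    {
        if let Some(cached) = self.results.get(input) {
            return cached.clone();
        }
        self.evaluations += 1;
        let output = expansion(input);
        self.results.insert(input.to_string(), output.clone());
        output
    }
}

/// Toy stand-in for an expensive, pure expansion.
fn shout(input: &str) -> String {
    input.to_uppercase()
}

fn main() {
    let mut cache = ExpansionCache::new();
    let first = cache.expand("table", shout);
    let second = cache.expand("table", shout);
    println!("{} {} ran {} time(s)", first, second, cache.evaluations);
}
```

An expansion that reads the filesystem or pops up a GUI is not a pure function of its input, so it could not be cached this way. That tension with the "full power of the standard library" advantage is part of why caching would need more thought.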

Any thoughts/comments would be appreciated.

scottmcm · January 17, 2019, 4:17pm

This description is rather mechanism-focused. Can you elaborate on the kinds of problems you want to solve with this, and why build scripts and existing macros are unacceptable for those cases?

Swoorup · January 17, 2019, 4:38pm

I don’t really see the need to run a script during code compilation. If it is required it should be via a separate script. I am fine with pure limited functions.

ckaran · January 17, 2019, 4:42pm

First, I want to clean up the language. Right now there are both procedural and declarative macros, which to me is one macro system too many. I know that removing anything is forbidden now that rust has reached 1.0, but I figure if python can make the switch from 2.x -> 3.x, then rust will eventually be able to switch from 1.x to 2.x, cleaning up anything that has been discovered along the way, and this could be part of that. But before it can be a part of that, we need to really hammer out the details as it could have a major impact on both the language and the compiler.

Second, I want to introduce more power at compile time. Although the API I proposed doesn’t have the power to do this, I can imagine a few small tweaks that make it easy to integrate rust into an IDE. That is, you write some code in your IDE, it hands the code to the rust compiler, and the compiler calls the IDE’s callback handler with a CompilerState reference that the IDE can then use to update what it is showing the user.

Third, this subsumes the old compiler plugin API. Since macros have access to the complete state of the compiler (which includes information about the code being compiled), plugins become #run statements for crates that can be maintained on https://crates.io, which you can now include in your Cargo.toml file. Each time you compile, your whole tool chain will be updated automatically. This also means that the compiler itself becomes more modular as individual tools are externalized.

AFAIK, build scripts can’t handle the incremental build system that IDEs need, and existing macros don’t have enough information to handle the issues above. I hope all that makes sense.

skysch · January 17, 2019, 5:16pm

This is a lot of complexity, and you're glossing over a lot of the code that would need to be written to make it work, and user code that would be needed to make it do anything new. And that code will have bugs and caveats that will certainly degrade the user experience. Overall, this looks neat, but I don't see how it will actually allow people to build better programs without an amount of work that is comparable to what they already have to do to build better programs.

In particular, you're making a passing mention of important points and brushing them aside:

Hurting yourself with macros is a problem. It needs to be fixed, not ignored or made worse. You've come to the wrong conclusion, IMO.

If developers were magic genies who could solve problems without doing work, you'd have a good point. But developers are not: they can only solve problems if they understand them. Making things more complex does not simplify the problem. Similarly, you've reached the wrong conclusion.

ckaran · January 17, 2019, 6:07pm

I agree that this is a lot of work and complexity. And I agree that it would be nice to fix the headaches with macros. However, I see it as a power/safety trade-off. The more powerful macros become, the more problems they can cause. But if we remove the power from macros to make them safer, then they also become less useful. C’s macro system is relatively safe (from a compilation point of view). You can write a macro that eats up all your memory just like you can in rust, but you can’t write macros that can do arbitrary things like what I’m proposing. Personally, I’d rather have the power, but that becomes yet another design choice in the language.

skysch · January 17, 2019, 6:29pm

That power already exists outside of the rust language. You can use any program to generate arbitrary Rust code and compile it, including ones written in rust. The question is what is added by making this happen within the language during a single run of rustc if you intend to invoke a stack of compilers anyway…
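As an illustration of generating Rust code from an ordinary Rust program, here is a small sketch. The function name generate_table_source and the table of squares are made up for the example. In a real project a Cargo build script would typically write such a string to a file under OUT_DIR and the crate would pull it in with include!.

```rust
/// Produces Rust source text for a lookup table of squares.
/// (Hypothetical helper; any program could emit text like this.)
fn generate_table_source(name: &str, size: usize) -> String {
    let values: Vec<String> = (0..size).map(|i| (i * i).to_string()).collect();
    format!(
        "pub static {}: [u32; {}] = [{}];\n",
        name,
        size,
        values.join(", ")
    )
}

fn main() {
    // Print the generated source; a build script would write it to a file.
    print!("{}", generate_table_source("SQUARES", 4));
}
```

What a generator like this cannot see is the code around the place where its output is used, which is the gap the rest of the discussion is about.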

notriddle · January 17, 2019, 6:42pm

Please spell out more about what you actually want to show to the user, and less of how you want to implement it

I'm not sure how all this stuff is supposed to look to the user. Here's what I found in the Jai primer that you linked, that seems related:

the original article:

In Jai, the same task looks like this:

generate_linear_srgb :: () -> [] float {
     srgb_table: float[SRGB_TABLE_SIZE];
     for srgb_table {
         << it = real_linear_to_srgb(cast(float)it_index / SRGB_TABLE_SIZE)
     }
     return srgb_table;
}

srgb_table: [] float = #run generate_linear_srgb(); // #run invokes the compile time execution

real_linear_to_srgb :: (f: float) -> float {
    table_index := cast(int)(f * SRGB_TABLE_SIZE);
    return srgb_table[table_index];
}

That sounds like const fn, except that the restrictions have been lifted. Rust's const fn needs to have those restrictions in order to make sure that the type system remains sound, since the compiler needs to be able to prove whether two types [T; 32] and [T; my_const_fn()] are the same type or not. Because Jai's type system doesn't actually promise to prove anything in particular (they "trust the programmer"), it doesn't need this kind of restriction. #run directives don't have any special APIs in Jai, assuming I'm correctly understanding the primer. They are just ordinary Jai code. The only reason they can introspect types is because Jai has ubiquitous RTTI/Reflection.
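A small sketch of the type-equality point, using a made-up table_len function: because the const fn is evaluated at compile time, the compiler can prove that [u8; table_len()] and [u8; 32] are the same type, so a value of one is accepted where the other is expected.

```rust
/// Evaluated by the compiler; restricted to operations it can run itself.
const fn table_len() -> usize {
    4 * 8
}

/// Only accepts arrays whose length is exactly 32.
fn takes_32(values: [u8; 32]) -> usize {
    values.len()
}

fn main() {
    // The array length is a const fn call, yet it type-checks against
    // [u8; 32] because the compiler evaluates table_len() during compilation.
    let values: [u8; table_len()] = [0; table_len()];
    println!("{}", takes_32(values));
}
```

If table_len could read a file or depend on anything else outside the compiler's control, the compiler could no longer decide whether the call to takes_32 is well-typed.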

However, you're describing a CompilerState struct and a Macro trait. The Jai primer describes nothing like that; that's much closer to the kind of weird, special-case code that procedural macros use. How are they supposed to be exposed to the #run-ing code? If #run only works with code that is specially prepared for running at compile time, then how is it any different from procedural macros? Procedural macros do have access to the whole standard library, including file I/O.

Trying to imagine a trade-off between procedural macros and const functions, your trade-offs seem wrong, too:

The fact that macro rules are a new macro language is actually really nice when you're writing simple stuff. It makes it a lot more terse, like regexes vs. manually parsing with bytes.

Can't procedural macros in 2018 already do that? I know they can already pull in other libraries, like syn and quote...

Do you mean "explicitly accessed through a CompilerState" object, like proc macros, or "implicitly accessed by calling types and functions", like what the Jai primer described? This is where my confusion about the whole post comes from; what will all this look like from a user's perspective?

That can be added to existing proc macros. That's not actually an advantage of any new metaprogramming API; it's a feature that can, and should, be added to rustc for what's already there.

How much API surface are you asking to expose here? Because this sounds like a long, long, long series of RFCs.

Procedural macros and Cargo build scripts already run arbitrary code on the build machine with no sandboxing.

Just for clarity, we don't just want purity. It is also important that any methods of hooking into the compiler should be ordering-agnostic. let output_source = run_1(run_2(input_source)) and let output_source = run_2(run_1(input_source)) aren't necessarily the same result.
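A toy illustration of that ordering problem, with run_1 and run_2 as made-up source-to-source rewrites. Both are pure functions, yet the result depends on which one runs first:

```rust
/// Hypothetical compile-time pass: renames `foo` to `bar`.
fn run_1(source: &str) -> String {
    source.replace("foo", "bar")
}

/// Hypothetical compile-time pass: renames `bar` to `baz`.
fn run_2(source: &str) -> String {
    source.replace("bar", "baz")
}

fn main() {
    let input_source = "foo bar";
    // run_2 first, then run_1.
    let a = run_1(&run_2(input_source));
    // run_1 first, then run_2.
    let b = run_2(&run_1(input_source));
    println!("{:?} vs {:?}", a, b);
}
```

Purity alone therefore does not make the passes safe to reorder or to run in parallel.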

ckaran · January 17, 2019, 6:42pm

The same power as you get with macros right now; you get information about where the macro is being called and what it’s being called upon, which lets you make decisions that you can’t make from an external program. This includes behavior that might be dependent on the type of the object that it is being called upon, like anything that requires knowledge of traits. Calling an external program and then manually copying the results in is error prone. The alternative is to add the tool to your build system, in which case you need to make sure that the tool, the build system, and your actual code all stay up to date. We already have a good build system that makes sure that we’re up to date; it’s Cargo. Making #run a part of the language takes advantage of what Cargo offers.

skysch · January 17, 2019, 7:04pm

As far as I understand it, cargo doesn't offer any of these benefits when you rely on unstable compiler internals.

ckaran · January 17, 2019, 7:15pm

I'm sorry, I didn't mean that I wanted an exact duplicate of JAI's functionality; you're right that it wouldn't be possible without breaking rust's guarantees (which would defeat the whole point of rust, IMHO). I meant that the arbitrary code execution would be neat. Which brings up your second point.

I completely missed the fact that the procedural macros in the 2018 edition allow executing arbitrary code. I'm going to have to go back and play around with that to see what the actual compile-time limits are; I suspect that about 95% of my proposal is already covered if they really are able to execute arbitrary code.

The former; I don't like implicit access, it reminds me of OpenGL's state machine model which always irritated me.

Good! In that case, 2018 may already be at the 98%-99% mark of my proposal.

...and now we're back down to 1%. I actually don't know how much API would need to be exposed. That's why I'm thinking that this proposal should aim towards rust 2.0. Before it could really be implemented, rustc's internals would need to really stabilize, which I expect is going to take a long time.

Agreed, but my assumption is that the #run statements (actually, I need to stop calling them that now that I know that procedural macros can execute arbitrary code at compile time) will be executed in the same order as they would be encountered while compiling the code. Of course, this only works if there is a known total ordering over all compilation, which may not be possible. It would certainly hamper any efforts to parallelize the compiler's internals.

Hrmmm.... OK, I can see that this is a poorly thought-out proposal. It's too ambiguous, and doesn't add enough to what rust already offers in the 2018 edition. The only thing about it that I still like is the ability to query the compiler about its state in a uniform manner. If you can iterate a TokenStream outside of the block that it is acting over, then you'll get most of the same effect. It may still be useful to have immutable shared references to some CompilerState instance, so that IDEs can get extra information, but the mutation part would be very, very difficult to implement in a correct manner.

How do I yank a proposal, or at least mark it as a bad idea? I want to keep a record of this in case anyone else comes up with the same idea, but there's not much point in continuing with it.

notriddle · January 17, 2019, 7:28pm

Well, I’ve marked it as auto-closing tomorrow. Speak now, or forever hold your peace!

ckaran · January 17, 2019, 7:41pm

Thank you!

scottmcm · January 17, 2019, 8:48pm

I'd just like to say that I think this procedural development is awesome as a way for people on the forum to be able to focus their attention in the most valuable places

notriddle · January 17, 2019, 9:47pm

2 posts were split to a new topic: Is executing arbitrary code at build time a good idea?

notriddle · January 19, 2019, 1:00am

This topic was automatically closed after 29 hours. New replies are no longer allowed.
