Pre-RFC: proc-macros with side effects

Neutron3529 · June 13, 2024, 7:25am

Summary

Allow proc-macros to have some side effects. Side effects include writing files (such as logs) and modifying (static) variables whose values persist across the whole compilation.

It is worth mentioning that allowing side effects does not mean moving proc-macros out of the sandbox. With a proper design, proc-macros could still be sandboxed while keeping their side effects.

Motivation

Register function

Sometimes functions defined in different locations must be registered together. For example, R needs an R_init_packageName function to register the callable functions. There are currently three ways to achieve this, but only the ugliest one is acceptable.
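To make the registration problem concrete, here is a minimal sketch, assuming the registration code is emitted as C text (the post itself does not show this code). Given the collected function names and their argument counts, it renders an R_init_<package> function using R's R_CallMethodDef table and R_registerRoutines. The wrap__<name> prefix for the C-callable wrappers is a hypothetical naming scheme. Whatever collects the names, every exported function has to end up in this single list, which is what independent macro invocations cannot easily do today.

```rust
/// One exported function: its name and the number of R arguments it takes.
pub struct Export {
    pub name: &'static str,
    pub nargs: usize,
}

/// Render a C `R_init_<package>` function that registers every export with R.
/// The `wrap__<name>` wrapper symbols are a hypothetical naming convention.
pub fn render_r_init(package: &str, exports: &[Export]) -> String {
    let mut c = String::new();
    c.push_str("#include <R.h>\n#include <Rinternals.h>\n#include <R_ext/Rdynload.h>\n\n");

    // Declare each wrapper so the table below can take its address.
    for e in exports {
        let params = vec!["SEXP"; e.nargs].join(", ");
        let params = if params.is_empty() { "void".to_string() } else { params };
        c.push_str(&format!("SEXP wrap__{}({});\n", e.name, params));
    }

    // One table entry per function: R-visible name, pointer, arity.
    c.push_str("\nstatic const R_CallMethodDef CallEntries[] = {\n");
    for e in exports {
        c.push_str(&format!(
            "    {{\"{0}\", (DL_FUNC) &wrap__{0}, {1}}},\n",
            e.name, e.nargs
        ));
    }
    // R expects the table to end with a NULL entry.
    c.push_str("    {NULL, NULL, 0}\n};\n\n");

    c.push_str(&format!("void R_init_{}(DllInfo *dll) {{\n", package));
    c.push_str("    R_registerRoutines(dll, NULL, CallEntries, NULL, NULL);\n");
    c.push_str("    R_useDynamicSymbols(dll, FALSE);\n}\n");
    c
}
```

The generator itself is trivial; the hard part, and the subject of this pre-RFC, is obtaining the complete exports slice from macros that each see only one function.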

DRY (Don't Repeat Yourself)

According to the documentation, Don't Repeat Yourself is one of the reasons to write macros. If we ignore this rule, registration becomes simple: just list the function again, as extendr does:

use extendr_api::prelude::*;

/// Return string `"Hello world!"` to R.
/// @export
#[extendr]
fn hello_world() -> &'static str {
    "Hello world!"
}

// Macro to generate exports.
// This ensures exported functions are registered with R.
// See corresponding C code in `entrypoint.c`.
extendr_module! {
    mod rext;
    fn hello_world;
}

As you can see, hello_world appears twice: once in its definition and once in extendr_module!. Thanks to this repetition, rextendr does not need to generate the registration function itself.

Since the reason for writing a macro is to avoid repetition, and since it is easy to forget to register a new function manually, extendr's approach is not a good enough choice.
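A minimal sketch of what modifying a static variable would buy here, assuming the compiler keeps a single proc-macro process alive for the whole crate (the behavior this pre-RFC wants documented, not a current guarantee). The two functions are hypothetical stand-ins for macro entry points and work on plain strings instead of proc_macro::TokenStream so the sketch runs outside a proc-macro crate.

```rust
use std::sync::Mutex;

/// Process-wide state shared by every invocation of the (simulated) macros.
/// It only survives between invocations if the same proc-macro process is reused.
static REGISTERED: Mutex<Vec<String>> = Mutex::new(Vec::new());

/// Hypothetical stand-in for `#[extendr]` on `fn <name>`:
/// record the name as a side effect and return the item unchanged.
pub fn on_extendr_attr(fn_name: &str, item: String) -> String {
    REGISTERED.lock().unwrap().push(fn_name.to_string());
    item
}

/// Hypothetical stand-in for a module macro that no longer needs the list repeated:
/// it drains everything recorded so far and emits one registration line per function.
pub fn on_module_macro(module: &str) -> String {
    let names = std::mem::take(&mut *REGISTERED.lock().unwrap());
    names
        .iter()
        .map(|n| format!("register({module}::{n});\n"))
        .collect()
}
```

The register(...) output is a placeholder; the point is that the module macro no longer needs the user to repeat fn hello_world.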

Safety

Another R plugin crate, savvy, takes a different approach: instead of a macro, it uses a separate tool, savvy-cli, to write the wrappers directly:

savvy-cli update

This is much simpler than rextendr, but it violates another rule: safety. Since there are plans to sandbox Rust's proc-macros and build scripts, proc-macros and build scripts could eventually be regarded as safer than arbitrary executables (although they are not sandboxed yet).

Once the sandbox is ready, the CLI approach should be regarded as the unsafe one (again, that is not the case today).

Some dirty attempt

Actually, writing a registration function with a proc-macro is already possible, but the proc-macro's code gets very dirty.

Let us start with the documented example:

use proc_macro::TokenStream;
#[proc_macro_attribute]
pub fn show_streams(attr: TokenStream, item: TokenStream) -> TokenStream {
    println!("attr: \"{}\"", attr.to_string());
    println!("item: \"{}\"", item.to_string());
    item
}

With this macro, the following program compiles:

#![feature(custom_inner_attributes, prelude_import)]
#![tests_pm::show_streams]
fn main() {
    println!("Hello, world!");
}

and the proc-macro prints:

attr: ""
item: "#![feature(custom_inner_attributes, prelude_import)] #[prelude_import] use
std::prelude::rust_2021::*; #[macro_use] extern crate std; fn main()
{ println!("Hello, world!"); }"

By using such a proc-macro as a crate-level inner attribute together with a small parser, it is possible to write a registration function. The cost is that the proc-macro must read the entire crate, and the parser must be able to expand macros so that macro-generated functions are not missed.
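The "small parser" could start as crudely as the hypothetical sketch below, which scans the item.to_string() text that show_streams prints for names following the fn keyword. It is an illustration, not a real parser: it only sees functions written literally in the source, so functions produced by other macros are missed (the limitation described above), and fn inside string literals or doc comments would give false positives.

```rust
/// Collect the names that follow the `fn` keyword in a stringified token stream.
/// `TokenStream::to_string()` separates most tokens with spaces, so a
/// whitespace split is good enough for this rough sketch.
pub fn find_fn_names(source: &str) -> Vec<String> {
    let tokens: Vec<&str> = source.split_whitespace().collect();
    tokens
        .windows(2)
        .filter(|w| w[0] == "fn")
        .map(|w| {
            // `fn main()` may print as `main()`, so keep only the identifier part.
            w[1].split(|c: char| !(c.is_alphanumeric() || c == '_'))
                .next()
                .unwrap_or("")
                .to_string()
        })
        .filter(|name| !name.is_empty())
        .collect()
}
```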

Since it is already possible to write macros that collect all such items, and there is a real need to collect them together, there is no reason to disallow reading or modifying a static (global) variable.

IO

The most important point is IO. Some people think IO should be banned because a proc-macro should do nothing but transform tokens. But that view ignores some special cases.

Sometimes non-binary output files are genuinely needed. For example, when writing a plugin for another language, we should write documentation for the plugin's users rather than for Rust users, so we cannot simply hand them rustdoc output and say "look, that's the doc".

We could still write an inner-attribute macro that creates a pseudo doctest target which just prints all the FFI documentation, and then tell users to run cargo doc to generate it, but that is very dirty, and we need a cleaner way to write such documents.
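As an illustration of the kind of output the IO case needs, here is a minimal sketch, assuming each exported function's doc comment has already been collected as plain text. It renders roxygen2-style comments (the #' prefix, the @name tag, and documenting a NULL object are roxygen2 conventions) for R users and writes them to a file. The function names and the target path are hypothetical; inside a proc-macro, that file write is exactly the side effect under discussion.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Render roxygen2-style documentation for R users from (name, doc) pairs.
pub fn render_r_docs(exports: &[(&str, &str)]) -> String {
    let mut out = String::new();
    for (name, doc) in exports {
        for line in doc.lines() {
            out.push_str(&format!("#' {line}\n"));
        }
        // Document the symbol by name; `NULL` is the usual placeholder object.
        out.push_str(&format!("#' @name {name}\nNULL\n\n"));
    }
    out
}

/// The side effect in question: persist the rendered docs to disk.
pub fn write_r_docs(path: &Path, exports: &[(&str, &str)]) -> io::Result<()> {
    fs::write(path, render_r_docs(exports))
}
```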

Explanation

Since the proc-macro process currently starts and exits only once while compiling a crate, this RFC mainly proposes documenting that behavior.

Currently, macros are expanded in source order, even if the macro calls std::thread::sleep_ms(1000):

use tests_pm::*;
#[show_streams] // first
fn foo(){}
#[show_streams] // second
mod foo{
  #[tests_pm::show_streams] // third, even if `mod foo` were in another file
  fn foo(){}
}
#[show_streams] // fourth
fn main(){}

Since the proc-macro process is only started once, a static variable can reliably store state, and since the expansion order is fixed, there is little ambiguity.
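To see why the expansion order matters once macros share state, consider a hypothetical sketch in which each (simulated) invocation claims the next free slot in a shared table, for example to place its function in a registration array. The slot an item receives depends entirely on the order in which the compiler calls the macro, which is the concern SkiFire13 raises later in the thread.

```rust
use std::collections::BTreeMap;
use std::sync::Mutex;

/// Slot table shared by every (simulated) invocation in one compiler session.
static SLOTS: Mutex<BTreeMap<String, usize>> = Mutex::new(BTreeMap::new());

/// Stand-in for one attribute-macro invocation: give `name` the next free slot.
pub fn claim_slot(name: &str) -> usize {
    let mut slots = SLOTS.lock().unwrap();
    let next = slots.len();
    *slots.entry(name.to_string()).or_insert(next)
}

/// Simulate one full compilation that expands items in the given order.
pub fn expand_in_order(order: &[&str]) -> BTreeMap<String, usize> {
    SLOTS.lock().unwrap().clear();
    for name in order {
        claim_slot(name);
    }
    SLOTS.lock().unwrap().clone()
}
```

With the order from the example above, foo gets slot 0; if the compiler ever visited main first, foo would get slot 1 instead.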

Drawbacks

Running proc-macros in parallel is ruled out, since the macros must expand in order.
It makes proc-macros harder to debug: each call may modify global state, which makes the macro harder to trace (although using a crate-level inner attribute instead has even more problems).

Rationale and alternatives

As discussed above, writing a proc-macro and using it as an inner attribute is a solution that can be derived from documented features.

Although a proc-macro cannot produce output files by itself, it can easily generate an executable target that outputs everything the proc-macro wants to emit.

Both approaches rely on (not very well) documented behavior, and both increase code complexity.

As for current and pending alternatives:

Idea: stateful procedural macros. Although this approach seems more flexible, it requires a lot of implementation work.
Global Registration (a kind of pre-rfc). This suggests defining an opaque type that can be read at runtime. With side effects in proc-macros, we could easily construct such a slice.

Prior art

As mentioned above, there are two crates for writing R plugins: rextendr uses the extendr_module! macro, which forces users to list each exported function again; savvy does not need this, but instead requires running a CLI command to generate a C wrapper.

Unresolved questions

a) Should we add additional syntax for macros with side effects?

A reasonable choice is adding a ! to attributes:

#[with_side_effect!(...)]item

The item could be {} if no item is needed.

b) Is it possible to expand macros directly into a source file?

#[attr!] // expand first, write to foo.rs
foo
mod foo; // use the generated source directly

This seems chaotic, but it might help with debugging the macro.

Future possibilities

There could be an additional configuration option (maybe in Cargo.toml) to decide whether proc-macros are allowed to have side effects. Such support might come after disabling a feature becomes possible.

SkiFire13 · June 13, 2024, 8:17am

Some alternatives that weren't mentioned:

Idea: stateful procedural macros doesn't cover the IO part, but even then it already had major drawbacks;
Global Registration (a kind of pre-rfc) for some sort of registration that can be queried at runtime, though it won't work if you have to do additional processing at compile time;

The expansion order could change at any time; I don't think it would be wise to document it. If anything, it would be nice to force proc-macros not to depend on it. For example, an idea that was floating around in Idea: stateful procedural macros was to have each proc macro produce a payload that would then be reduced at the end with some sort of associative operation.
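The payload-and-reduce idea can be sketched as follows, under the assumption (not something rustc offers today) that each invocation returns a payload and the compiler folds all payloads together once expansion finishes. Here the payload is a set of exported names and the reduction is set union, which is associative and commutative, so the final result is the same for every expansion order. Payload, invocation and reduce are hypothetical names.

```rust
use std::collections::BTreeSet;

/// What one (hypothetical) macro invocation contributes: the names it exports.
pub type Payload = BTreeSet<String>;

/// Stand-in for one invocation: it returns a payload instead of mutating a static.
pub fn invocation(name: &str) -> Payload {
    BTreeSet::from([name.to_string()])
}

/// Fold all payloads with set union. Because union is associative and
/// commutative, payloads could be combined in any order, or in parallel.
pub fn reduce(payloads: impl IntoIterator<Item = Payload>) -> Payload {
    payloads.into_iter().fold(Payload::new(), |mut acc, p| {
        acc.extend(p);
        acc
    })
}
```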

This also ignores the need for caching the results of deterministic proc-macros, which are the vast majority and IMO should be the default behavior. Note that this has been shown to be highly beneficial for compile times.
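A sketch of why determinism enables caching, assuming a cache keyed on the macro's input text (this only illustrates the argument; ExpansionCache is a hypothetical type, not a rustc mechanism). A pure expansion can be memoized safely, whereas a macro that pushes to a static or writes a file would silently lose that side effect on every cache hit.

```rust
use std::collections::HashMap;

/// Memoizes expansions by input. Only sound for deterministic, side-effect-free macros.
pub struct ExpansionCache {
    entries: HashMap<String, String>,
    /// How many times the expansion function actually ran.
    pub misses: usize,
}

impl ExpansionCache {
    pub fn new() -> Self {
        Self { entries: HashMap::new(), misses: 0 }
    }

    /// Return the cached expansion for `input`, running `expand` only on a miss.
    pub fn expand(&mut self, input: &str, expand: impl Fn(&str) -> String) -> String {
        if let Some(out) = self.entries.get(input) {
            return out.clone();
        }
        self.misses += 1;
        let out = expand(input);
        self.entries.insert(input.to_string(), out.clone());
        out
    }
}
```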

Neutron3529 · June 13, 2024, 12:09pm

Thank you for your reply:)

I'll modify my RFC later.

The main drawback of stateful procedural macros is that they need a lot of implementation work. My RFC only needs documentation.

My RFC could be regarded as a crate-level implementation of that idea. Maybe I could mark this RFC as temporary: it would work for several editions while waiting for the final proc-macro design.

The documented behavior could change in a future edition. The cost is at most three years (one edition cycle). It might be a problem, but not a large one.

What's more, if generating a "global static variable" is needed, you cannot stop users from writing a big macro that just reads the entire source.

Firstly, there may be no schedule for implementing such a feature. Even if there is one, tying the documented order to an edition is enough. This RFC could be regarded as a temporary one that works for several editions while waiting for the final implementation of proc-macros.

My RFC includes an alternative: let macro writers manage the cache, the output, and the state themselves, via a very complex and confusing option that allows macros to be expanded directly into source files:

#[attr!] // expand first, write to foo.rs
foo
mod foo; // use the generated source directly

The macro only needs to be expanded the first time. Later builds could use the source file directly, and the macro itself could focus only on managing state changes.

We cannot write code for every possible situation, but we can give proc-macros the ability to write source files (for example, with the suffix -macro.rs) and let the macro decide whether to overwrite the existing output. Since it is the expansion, rather than the macro call itself, that costs CPU time, writing source files could be regarded as a caching method.
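The "decide whether to overwrite" policy could look like the hypothetical sketch below, assuming the macro were permitted to write into the source tree using the -macro.rs suffix convention. Comparing before writing keeps the file's modification time stable when nothing changed, so timestamp-based tools do not see a spurious edit.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Write `contents` to `path` only if the file is missing or differs.
/// Returns `Ok(true)` if the file was (re)written.
pub fn write_if_changed(path: &Path, contents: &str) -> io::Result<bool> {
    match fs::read_to_string(path) {
        // Already up to date: leave the file (and its mtime) untouched.
        Ok(existing) if existing == contents => Ok(false),
        // Missing, different, or unreadable as UTF-8: regenerate it.
        _ => {
            fs::write(path, contents)?;
            Ok(true)
        }
    }
}
```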

SkiFire13 · June 13, 2024, 5:25pm

Macros cannot serialize their output because they don't have direct access to spans (which are required to preserve hygiene).

Neutron3529 · June 14, 2024, 2:52am

Although they cannot access spans, they could write to a -macro.rs file directly.

With this structure:

// before expand
#[foo]
item;
// after expand
mod item_macro; // expand item directly into `item-macro.rs`
use item_macro::*;
pub use item_macro::{needed items};

There is no need to access spans directly.

As for hygiene, maybe we should have a better solution. Currently, we can make a struct have two fields with the same name using hygiene.

Or we could add a new method, .new_ident(TokenStream, "name"), that creates a new identifier which does not conflict with anything in the TokenStream. Since we are talking about hygiene, the identifier's name is not important at all, so we could simply rename it to obtain an identifier that is safe to expand.
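For illustration, a purely name-based version of the proposed .new_ident(TokenStream, "name") might behave like the gensym sketch below. fresh_ident is a hypothetical helper working on plain identifier strings, not a proc_macro API: it appends a numeric suffix until the name collides with nothing already present in the input. This only avoids textual clashes; as SkiFire13 points out in reply, it cannot reproduce hygiene when the input itself contains same-name identifiers that differ only in hygiene.

```rust
use std::collections::HashSet;

/// Return `base`, or the first of `base_1`, `base_2`, ... that is not in `taken`.
pub fn fresh_ident(taken: &HashSet<&str>, base: &str) -> String {
    if !taken.contains(base) {
        return base.to_string();
    }
    (1..)
        .map(|i| format!("{base}_{i}"))
        .find(|candidate| !taken.contains(candidate.as_str()))
        .expect("a finite set cannot exhaust all suffixes")
}
```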

binarycat · June 14, 2024, 6:31am

One thing to note: non-macro metaprogramming (e.g. bindgen) is often much better for compile times, since it produces files that can be checked into version control instead of being regenerated every single time.

Neutron3529 · June 14, 2024, 6:37am

It depends on who writes the FFI interface. If you write a Rust program that uses library-provided C FFIs, bindgen is more attractive. But if you write a Rust plugin that provides C FFIs, the FFI interface may change, and it is easy to forget to regenerate the bindings after an FFI change.

djc · June 14, 2024, 7:01am

A number of crates use an integration test to generate code in a way that makes the test fail if the generated code changes. This is a pretty nice approach. Example that generates gRPC bindings with prost:

github.com: open-telemetry/opentelemetry-rust-contrib/blob/main/opentelemetry-stackdriver/tests/generate.rs

use std::collections::HashMap;
use std::ffi::OsStr;
use std::fs;
use std::path::PathBuf;
use std::process::Command;

use futures_util::stream::FuturesUnordered;
use futures_util::stream::StreamExt;
use walkdir::WalkDir;

/// Download the latest protobuf schemas from the Google APIs GitHub repository.
///
/// This test is ignored by default, but can be run with `cargo test sync_schemas -- --ignored`.
#[tokio::test]
#[ignore]
async fn sync_schemas() {
    let client = reqwest::Client::new();
    let cache = PathBuf::from(env!("CARGO_MANIFEST_DIR")).join("proto/google");
    let schemas = PREREQUISITE_SCHEMAS
        .iter()

(This file has been truncated.)
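The pattern djc describes can be reduced to the hypothetical sketch below (not the linked file). The check regenerates the code, rewrites the checked-in file, and fails if the contents changed, so CI catches a forgotten regeneration while a local run leaves the updated file ready to commit. generate_bindings and the file path are placeholders for a real generator such as prost or bindgen.

```rust
use std::fs;
use std::path::Path;

/// Placeholder for the real code generator.
pub fn generate_bindings() -> String {
    "pub fn hello_world() -> &'static str { \"Hello world!\" }\n".to_string()
}

/// Regenerate `path` and panic if the checked-in copy was out of date.
/// Intended to be called from an integration test (e.g. under `tests/`).
pub fn check_generated(path: &Path) {
    let fresh = generate_bindings();
    let existing = fs::read_to_string(path).unwrap_or_default();
    if existing != fresh {
        fs::write(path, &fresh).expect("failed to update generated file");
        panic!("{} was out of date and has been regenerated; commit it", path.display());
    }
}
```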

SkiFire13 · June 14, 2024, 7:31am

And that's fine because the hygiene is different, so the names are different too.

Macros will still be callable from older editions where this is possible to do, so they will always have to support this.

This won't work when the same-name-different-hygiene identifiers are given to the proc macro as input.

And even if you want to add something like that, you may as well not pass "name" since you want an identifier different from any other.

Neutron3529 · June 15, 2024, 8:39am

You're right. I have a new idea: adding two new methods in the future:

token_stream.to_hygiene_string(); // maybe that's enough
let hygiene_ident = ident.hygiene(token_stream);

Since such methods are mainly needed for exporting scripts to disk, they could be added later.

You may be making a wrong assumption about my situation. In my case, you are the one who decides to write the FFI interfaces. When you change an FFI call, you mean for the call to change, not "I'm crazy, please stop me from doing that". The additional test is unnecessary in my use case.

system · September 13, 2024, 8:39am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
