Allow proc. macros to query limited contextual information

It might be advantageous to some procedural macros to have limited knowledge about their context. So far a procedural macro can only inspect the AST that it is allowed to operate on without any contextual information.

Limited contextual information, for example the module_path! at which the procedural macro has been invoked, could significantly widen what a procedural macro can achieve.

In one of my code bases the feature to query the call-site module_path! would solve an ambiguity problem we currently have to work around with "hacks". On an abstract level we try to find unique serializable identifiers to call methods of traits for communication between different end points.

Now imagine we had a TraitMethodInfo<const ID: u64> trait with which we can query some useful, reflection-like information about a trait method. The problem is that we need to calculate this unique ID: u64 at proc. macro expansion time.

pub trait TraitMethodInfo<const ID: u64> {
    /// Is true if the trait method has a `&mut self` receiver.
    const IS_MUT: bool;
    /// The number of inputs without the `self` receiver.
    const NUM_INPUTS: usize;
}

Now let us consider the following code:

mod foo1 {
    #[derive_trait_method_info]
    pub trait Foo {
        fn foo(&self);
    }
}
mod foo2 {
    #[derive_trait_method_info]
    pub trait Foo {
        fn foo(&self);
    }
}

Note: Using a workaround we can indirectly implement traits on trait definitions (at least in our own use case).

In this example, both foo1::Foo and foo2::Foo have exactly the same internal structure. So a procedural macro without contextual information about their module path would not be able to disambiguate them and would end up generating the same, and therefore non-unique, ID for both the foo1::Foo::foo and foo2::Foo::foo trait methods.
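To make the collision concrete, here is a minimal sketch. It assumes the ID is an FNV-1a hash over path segments; the names `fnv1a`, `method_id` and `Registry` are made up for illustration and are not part of any real crate or of the OP's code base. Without the module path both methods hash the same segments, while adding the (hand-written) module segment gives two distinct IDs that can both be used as const generic arguments.

```rust
/// The trait from the post, repeated so the sketch is self-contained.
pub trait TraitMethodInfo<const ID: u64> {
    /// Is true if the trait method has a `&mut self` receiver.
    const IS_MUT: bool;
    /// The number of inputs without the `self` receiver.
    const NUM_INPUTS: usize;
}

// Standard 64-bit FNV-1a parameters (the choice of hash is an assumption).
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Folds `bytes` into `seed` with FNV-1a; usable in const contexts.
const fn fnv1a(seed: u64, bytes: &[u8]) -> u64 {
    let mut hash = seed;
    let mut i = 0;
    while i < bytes.len() {
        hash ^= bytes[i] as u64;
        hash = hash.wrapping_mul(FNV_PRIME);
        i += 1;
    }
    hash
}

/// Hypothetical ID scheme: hash every path segment followed by `::`.
pub const fn method_id(segments: &[&str]) -> u64 {
    let mut hash = FNV_OFFSET;
    let mut i = 0;
    while i < segments.len() {
        hash = fnv1a(hash, segments[i].as_bytes());
        hash = fnv1a(hash, "::".as_bytes());
        i += 1;
    }
    hash
}

// What a macro sees today: only `Foo` and `foo`, identical for both traits.
pub const WITHOUT_PATH: u64 = method_id(&["Foo", "foo"]);
// What it could compute if it knew the module path (written by hand here).
pub const FOO1_ID: u64 = method_id(&["foo1", "Foo", "foo"]);
pub const FOO2_ID: u64 = method_id(&["foo2", "Foo", "foo"]);

/// Hypothetical carrier type; the two impls only coexist because the IDs differ.
pub struct Registry;

impl TraitMethodInfo<FOO1_ID> for Registry {
    const IS_MUT: bool = false;
    const NUM_INPUTS: usize = 0;
}

impl TraitMethodInfo<FOO2_ID> for Registry {
    const IS_MUT: bool = false;
    const NUM_INPUTS: usize = 0;
}
```

If both impls used `WITHOUT_PATH` as their ID, the compiler would reject them as conflicting implementations.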

This ambiguity is a problem for us. I'd happily appreciate other ideas for potential solutions to our problem, or a discussion of whether a feature like this would be useful to other Rust projects. If deemed useful I'd be willing to write an RFC.

We could solve this by adding another parameter to procedural macros that is similar to syn::Path in order to provide the proc. macro with this kind of limited context information.

If you read until this point, thank you a lot for your attention!


As one of the people who has interacted the most with the #macros channel of the Rust community Discord, which, besides these forums, is the most "official" / centralised place where people discuss macros, I can attest that there is an important need for eager expansion of macros (or, at least, some specific subset of it, such as include…!, env!, module_path!). One very desired use case, for instance, is that of allowing macros to know whence they are called (module path and file path), of which the OP is one example, as well as being able to cleanly / properly load certain file contents.

That is, you may consider trying to implement println! yourself without magic compiler help, and you will notice that you can't reproduce the following behavior:

macro_rules! my_template {() => (
    "{greeting}, {name}!"
)}
println!(my_template!(), greeting = "Hello", name = "World");

// or:

macro_rules! em {( $fmt:expr $(,)? ) => (
    concat!("<em>", $fmt, "</em>")
)}
println!(em!("Hello, {}!"), "World");
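For reference, a small sketch of that built-in behavior, using format! instead of println! so the result can be inspected. The function names `greet` and `emphasize` are made up for the example, and `em!` takes a `:literal` here rather than `:expr`, which is a deliberate narrowing for the sketch. The point is that the std formatting macros expand the macro call in format-string position before parsing the placeholders, which a user-written macro cannot do.

```rust
// A macro that expands to a format string with named placeholders.
macro_rules! my_template {() => (
    "{greeting}, {name}!"
)}

// A macro that wraps a format string; `concat!` is itself expanded eagerly.
macro_rules! em {( $fmt:literal $(,)? ) => (
    concat!("<em>", $fmt, "</em>")
)}

/// The format string comes from `my_template!()`, expanded by `format!` itself.
pub fn greet() -> String {
    format!(my_template!(), greeting = "Hello", name = "World")
}

/// Same idea with a positional placeholder nested inside `em!`.
pub fn emphasize(name: &str) -> String {
    format!(em!("Hello, {}!"), name)
}
```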

As to a possible API for this, I created a PoC a few months ago:

Basically, by featuring a special callback / preprocessor pattern (much like paste! does), it is possible to let macro authors operate on "eagerly expanded macros" without requiring any special language support, beyond that information being available and such callback macros having been written.

So, for instance, in the case of the OP, they could have the attribute expand to:

with_builtin!( let $module_path = module_path!() in {
    their_real_proc_macro!( ($module_path), $trait_def );
});

Or some other syntactic variations / ideas:

  • magic match! macro (to eagerly expand macros):

    match! module_path!() {
        ( $($module_path:tt)* ) => (
            their_real_proc_macro!( ( $($module_path)* ), $trait_def );
        );
    }
    
  • callback-style (CPS):

    with_module_path! {( $($module_path:tt)* ) => (
        their_real_proc_macro!( ( $($module_path)* ), $trait_def );
    )}
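The callback shape itself can already be tried out with plain macro_rules!. The sketch below is only a stand-in: `with_fake_module_path!` and `qualified_name!` are hypothetical names, and the path is a hard-coded literal, because a declarative macro has no way to obtain the expanded value of module_path!() today. It only shows how the helper hands its tokens to a callback macro.

```rust
/// Hypothetical stand-in for a compiler-provided `with_module_path!`:
/// it prepends a hard-coded path literal and forwards to the callback macro.
macro_rules! with_fake_module_path {
    ( $callback:ident ! ( $($args:tt)* ) ) => {
        $callback!( "my_crate::foo1", $($args)* )
    };
}

/// Plays the role of the "real" macro: it receives the path as a literal token.
macro_rules! qualified_name {
    ( $module_path:literal, $item:ident ) => {
        concat!($module_path, "::", stringify!($item))
    };
}

// The callback sees the path as an ordinary literal, not as a macro call.
pub const FOO_PATH: &str = with_fake_module_path!(qualified_name!(Foo));
```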
    

And so on and so forth. The key idea is that this could be used for things such as the include! family of macros, as well as env!, etc.

  • This would solve many "impurity" problems of macros trying to access the environment or the filesystem and having their expansion be "incorrectly" cached: by using the built-in include…! macros from the language (which can be guarded not to access paths outside the CARGO_MANIFEST_DIR or something along those lines), we get to have a guarded and tracked access to these environmental elements.

    That being said, for the case of env! and include…! specifically, another (non-incompatible!) approach seems to be underway: special proc-macro APIs, such as proc_macro::tracked_env (no equivalent for the fs, though). But given how unergonomic setting up proc-macros is and will keep being for the foreseeable future, I think that macro_rules!-accessible APIs ought not to be overlooked (that being said, in the same fashion that I've written ::with_builtin_macros, it is not hard to offer macro_rules!-targeted proc-macro helpers, such as paste!).


Regarding the OP: @Robbepop, since your objective is to hash "macro call location information" so as to generate unique ids, it so happens that your actual problem might be XY-ed if you embed the Span::call_site() among the hashed stuff. It's not pretty, but could get the job done while waiting for some of these APIs to be implemented (if they ever are!).



Thanks for this input! Never thought about this and it probably won't result in generating deterministic output between compilations. However, it is worth a try.

I don't see any Hash implementation for Span or a way to write one, though.

Yep, unfortunately... it seems like the only way to hack your way around it is probably to use its Debug output. But even if that works, it would not be a recommendable option.

Or this could be solved via the unstable SourceFile API in proc_macro2.

Span and SourceFile both seem like they would have issues when your proc-macro invocation is itself generated by another macro, e.g. something like

macro_rules! mkfoo {
  () => {
    #[derive_trait_method_info]
    pub trait Foo {
        fn foo(&self);
    }
  };
}

mod foo1 {
    mkfoo!();
}
mod foo2 {
    mkfoo!();
}

Also, since SourceFile is affected by --remap-path-prefix, it can't really be relied on for much of anything except diagnostic prints: it may have no relation to the filesystem, and two different files may have the same value.


(Please forgive me, I don't know anything about how macros are expanded in the compiler, so if there are obvious errors in the following it is due to ignorance, not malice).

Are macros expanded exclusively in the order that they are encountered, or is there something like a global completion buffer that contains partially expanded macros? I'm asking because if it's the latter, then as long as there isn't a cycle in the dependency graph, expansion should eventually complete. This could be exploited to create not just eagerly expanded macros, but to deal with macro expansion in a general way.

I don't know enough about the precise internals of how eager expansion is handled w.r.t. it depending on expanding other code (cc @Aaron1011?), but I can tell you that it does have its limits.

For instance, consider:

mod foo {
    macro_rules! mk {() => (
        #[macro_export]
        macro_rules! m {() => ( "Ferris" )}
    )}
    mk!();
    pub use m;
}

fn main ()
{
    let _ = crate::foo::m!(); // OK
    let _ = concat!("Hello ", crate::foo::m!()); // Error
}
  1. mk!() gets defined and called.

  2. m!() is defined and located both at the absolute path of the crate's root, crate::m, as well as in a syntactically limited scope after its definition (the call to mk!()) and until the end of the module foo.

  3. The "latter" m! is publicly re-exported so that it gets attached to the crate::foo::m path as well.

  4. We can see in the OK line that this does work.

And yet, concat!, which is one of the built-in macros which eagerly expands the macros it is given, is unable to eagerly expand crate::foo::m!.

More generally, even just using crate::m!(), outside of any eager expansion considerations whatsoever, causes a "future-compatibility-lint", i.e., it shall eventually become a hard error.


So foolproof eager expansion shall, for sure, be tricky. But assuming the "typical" use case of macro invocations not defining new macros (nor use statements regarding them), it does seem that those should not be that hard to support, and would already allow the very vast majority of use cases :slightly_smiling_face:

@dhm, you've just blown my mind!

I'll outline what I had been thinking of over the past few days, but now that I see your example, I strongly suspect that it won't work. NOTE! This isn't a complete idea, it's the start of an idea which may or may not be able to be completed by someone other than me (that is, someone with a deep understanding of the macro expansion system).

  1. We treat macro expansion kind of like futures. Each time you encounter a new macro to expand, it's pushed to the tail of a FIFO queue.
  2. Pop macros from head, and try to expand them. If you are able to, great, if not, push them back on the tail of the queue.
  3. Repeat until the queue is empty, or 10 minutes have passed, whichever comes first.
  4. If the queue is empty, then you've fully expanded that macro. Reset the timer and keep on compiling.

The idea is that we don't have to solve the Halting Problem (10 minute limit), nor do we need to figure out a priori what the dependencies between macros are. As macros complete, they affect some state somewhere, which other macros then reference, which means they can make their own progress towards completion.
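A toy model of the queue idea above, just to pin down the mechanics. Everything in it is an assumption made for illustration: a "macro" is a name plus the names it needs to be defined first, and the 10 minute timer is replaced by a budget of expansion attempts so the sketch stays deterministic. It says nothing about how rustc actually schedules expansion.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical pending macro: its name and the names it needs defined first.
pub struct Pending {
    pub name: &'static str,
    pub needs: Vec<&'static str>,
}

/// Pops from the head, expands if possible, otherwise pushes back to the tail.
/// Returns the expansion order, or the names still queued once the budget
/// (a stand-in for the time limit) is used up.
pub fn expand_all(
    pending: Vec<Pending>,
    mut budget: usize,
) -> Result<Vec<&'static str>, Vec<&'static str>> {
    let mut queue: VecDeque<Pending> = pending.into();
    let mut done: HashSet<&'static str> = HashSet::new();
    let mut order = Vec::new();

    while let Some(item) = queue.pop_front() {
        if budget == 0 {
            // Out of attempts: report everything that is still waiting.
            queue.push_front(item);
            return Err(queue.iter().map(|p| p.name).collect());
        }
        budget -= 1;

        if item.needs.iter().all(|n| done.contains(n)) {
            // "Expanding" just records that this name is now available.
            done.insert(item.name);
            order.push(item.name);
        } else {
            // Not ready yet: retry after the others have had a turn.
            queue.push_back(item);
        }
    }
    Ok(order)
}
```

With the macros from the earlier example queued as `concat`, `mk`, `m`, the order comes out as `mk`, `m`, `concat`; two macros that need each other simply burn through the budget.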

So why am I concerned that this might not work? If I understand what @dhm just pointed out correctly, then there is a chance that the scoping rules will cause some kind of conflict that I'm not currently educated well enough to know & predict. That is, concat!() is known to eagerly expand its contents. If this behavior is now stable, then my expansion method could be a breaking change, solely because it may permit concat!() to complete when it can't currently. So, is the order of macro expansion something that is guaranteed stable? Or can we mess with it?

Other thoughts

BTW, as I suspect all of you can see, my 10 minute rule was just made up. If something like this was adopted, I'd expect flags that would let you specify how long you are willing to wait, with some useful aliases to give estimates on how long you'll have to wait for the compile to complete. E.g., -Ztime_to_brew_coffee, -Ztime_to_go_to_lunch, -Ztime_until_heat_death_of_the_universe, -Ztime_until_the_meeting_is_over.

Also, I'm assuming that crate authors are, in general, nice people that want their stuff to compile quickly and be used by everyone. It should be fairly obvious that with very little effort a malicious author could make your compiles last an arbitrary period of time. I'm not trying to defend against that, just against the accidental buggy macro that takes longer than it should.

Macro expansion is already a fix-point operation, since macros can expand to other macros or to use items that import macros. To prevent an infinite loop, at the end of every iteration the compiler checks whether at least one macro expansion succeeded. If not, you get an error for each macro that has not yet been expanded.
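As a rough model of that loop (not the compiler's actual data structures; `Invocation` and `expand_to_fixpoint` are names invented for this sketch), each pass tries every unresolved invocation and the loop stops with an error as soon as a whole pass makes no progress:

```rust
use std::collections::HashSet;

/// Hypothetical invocation: its name and the macro names it must resolve first.
pub type Invocation = (&'static str, Vec<&'static str>);

/// Repeats passes over the unresolved invocations until none are left.
/// A pass without any successful expansion ends the loop with an error
/// listing every invocation that is still unresolved.
pub fn expand_to_fixpoint(
    mut unresolved: Vec<Invocation>,
) -> Result<Vec<&'static str>, Vec<&'static str>> {
    let mut defined: HashSet<&'static str> = HashSet::new();
    let mut order = Vec::new();

    while !unresolved.is_empty() {
        let before = unresolved.len();

        // Keep only the invocations that could not be expanded in this pass.
        unresolved.retain(|(name, needs)| {
            let ready = needs.iter().all(|n| defined.contains(n));
            if ready {
                defined.insert(*name);
                order.push(*name);
            }
            !ready
        });

        // No progress in a full pass: report instead of looping forever.
        if unresolved.len() == before {
            return Err(unresolved.into_iter().map(|(name, _)| name).collect());
        }
    }
    Ok(order)
}
```

Unlike a time limit, the progress check terminates on its own: every pass either resolves at least one invocation or ends the loop.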

I think the problem with concat!() is that there is currently no way for the expansion of a single macro to say that it needs to be retried at a later time, once the inner macro does resolve and can be expanded.


What happens if a macro does something ridiculous, like flipping a boolean between true and false each time it's expanded? Or is that taken care of by stack overflow?

(Genuinely curious, what I know about the subsystem can be written on one side of a sticky note, in crayon, and leave room left over, so I'm learning from you guys as you answer my questions!)

It's handled by the macro recursion limit, which is a (really small by default) simple check on macro expansion depth, roughly analogous to a stack overflow in imperative code.

macro_rules! silly {
    (true) => (silly!{false});
    (false) => (silly!{true});
}

silly!(true);

error: recursion limit reached while expanding `silly!`
 --> src/lib.rs:3:17
  |
3 |     (false) => (silly!{true});
  |                 ^^^^^^^^^^^^
...
6 | silly!(true);
  | ------------- in this macro invocation
  |
  = help: consider adding a `#![recursion_limit="256"]` attribute to your crate (`playground`)
  = note: this error originates in a macro (in Nightly builds, run with -Z macro-backtrace for more info)
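For contrast, a recursive macro whose input shrinks on every step does terminate. In the sketch below (`count_tokens!` is just an illustrative name), the expansion depth grows with the number of tokens, so a long enough input would presumably run into the same recursion limit as silly! does.

```rust
/// Counts its token trees by peeling one off per recursive expansion.
macro_rules! count_tokens {
    // Base case: nothing left to count.
    () => { 0usize };
    // Recursive case: one token tree plus however many remain.
    ( $head:tt $($tail:tt)* ) => { 1usize + count_tokens!($($tail)*) };
}

// Three token trees need three nested expansions plus the base case.
pub const THREE: usize = count_tokens!(a b c);
```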

The reason that "eager expansion" doesn't work is that the macro just sees the input tokens (with some built-in exceptions that are processed in a later pass(?)) and emits output tokens. Providing eager expansion to proc macros would "just" be adding the ability for the proc macro to request the server to expand a macro with some span information, and then halting and resuming the proc macro invocation so that the new macro expansion can be processed by the fixpoint expansion.

I believe that the last time I ran into this it was mentioned that the safeguard was supposed to be specifically against the use of the crate::m! path, and that using #[macro_use]d m! or a pub use of the m! without the absolute path is supposed to work. Given the interaction with concat!, though, I'm unsure about that, now.


OK, just so I'm 100% clear on this, this is similar to what I was saying with the futures idea, correct? Or am I completely wrong?