RFC: $crate for proc_macro

CAD97 · July 8, 2022, 5:47am

Previously: $crate metavariable for procedural macros?

Summary

Procedural macros often pair with a runtime crate, and need to generate paths to that runtime crate. We add a way for procedural macros to use a $crate-like token to refer to their runtime crate.

Motivation

Procedural macros most often refer to their runtime library crate by assuming that the user of the macro has an explicit dependency on the library crate and has not renamed it, which allows the macro to emit extern crate library or use ::library paths. This scheme breaks if the runtime crate is renamed in Cargo. To combat this, a technique like proc-macro-crate can be used to look up the crate name from the Cargo manifest. However, this still fails when derives are reexported, because then the derive user's crate does not depend on the library crate at all! The best known solution, as used by bevy and encase, is to provide the procedural macro implementation in an implementation crate which takes a path to the library crate as a parameter. Anyone who wants to wrap your library then provides their own copy of your derive, with the package name lookup customized to use their library crate as the entry point instead.

Guide-level explanation

Macro users

Basically nothing changes. If macro authors use the new functionality, it will be possible to rename crates which provide macros and reexport them from wrapper crates without running into "crate not found" style errors.

Macro writers

When writing a procedural macro that needs to refer to some types in a runtime library, use a new accepted signature for declaring procedural macros:

#[proc_macro]
pub fn my_function(
    input: TokenStream,
    library_path: TokenStream,
) -> TokenStream {
    /* implementation */
}

#[proc_macro_attribute]
pub fn my_attribute(
    input: TokenStream,
    annotated_item: TokenStream,
    library_path: TokenStream,
) -> TokenStream {
    /* implementation */
}

#[proc_macro_derive(MyDerive)]
pub fn my_derive(
    annotated_item: TokenStream,
    library_path: TokenStream,
) -> TokenStream {
    /* implementation */
}

This provides a new library_path: TokenStream argument to your procedural macro entry point. library_path contains a sequence of tokens usable as a module path provided by your library crate, typically to a module containing any symbols which the macro expansion needs to refer to.

The tokens provided to library_path make up a path accepted by the declarative macros pattern $(::)? $($path_segment:ident)::+. Splitting library_path into individual tokens and trying to use them in any way except printing them as the library_path stream is not guaranteed to have any particular behavior. (For example, it would be valid for library_path to be a single source-invalid identifier which the compiler recognizes as referring to the chosen library path.) Additionally, the token hygiene/spans must be preserved for the library_path to function.
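To make the accepted shape concrete, here is a small declarative macro that accepts exactly the pattern quoted above and reports the segments it matched. The macro name library_path_segments is hypothetical. This only illustrates the grammar of the path and says nothing about how the compiler would construct or resolve library_path.

```rust
/// Hypothetical helper (not part of the proposal): accepts the pattern
/// `$(::)? $($path_segment:ident)::+` and returns the segments as strings.
macro_rules! library_path_segments {
    ($(::)? $($path_segment:ident)::+) => {
        vec![$(stringify!($path_segment)),+]
    };
}
```

Both ::library::__macro_support and library::__macro_support are accepted and yield the same two segments. A real macro should still treat library_path as opaque and only print it as a whole, as described above.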

In your library crate, you reexport your procedural macros as such:

#[macro_library_path(crate::__macro_support)]
pub use library_macros::{my_function, my_attribute, MyDerive};

The path provided to #[macro_library_path] is the path used by library_path. The provided path is required to be an absolute path (that is, start with either crate or a name in the extern prelude), and the path must be externally visible from the crate root. When used in the expansion of the procedural macro, library_path will refer to the provided path and can use any pub item in it, no matter what crate uses the macro, even if the user crate does not have visibility of your library crate.

If a procedural macro is used directly from the procedural macro crate without specifying #[macro_library_path], it is treated as if the use was written with #[macro_library_path(::crate)]. When a macro is used through a reexport that is not in the procedural macro crate, it inherits that reexport's #[macro_library_path], unless the new use is itself a reexport that provides a new #[macro_library_path]. In particular, when a procedural macro is invoked, the procedural macro server is called with the #[macro_library_path] provided where the macro name was used. If that use does not have a #[macro_library_path], the lookup moves to where that name was used from, continuing until a #[macro_library_path] is found. If a procedural macro crate is reached first, the root of the first non-proc-macro crate that the item was used in is used instead.
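The lookup described above can be modeled in ordinary Rust. This is only a sketch of my reading of the rule, not compiler code: UseSite, resolve_library_path, and the crate names used with them are all made up for illustration, and the chain is assumed to be ordered from the invocation site back to the use that imports directly from the proc-macro crate.

```rust
/// One `use` of the macro name (hypothetical model type).
#[derive(Debug, Clone)]
pub struct UseSite {
    /// Crate in which this `use` (or reexport) appears.
    pub krate: &'static str,
    /// Path given to `#[macro_library_path(...)]` on this `use`, if any.
    pub macro_library_path: Option<&'static str>,
}

/// Returns `(crate, path within that crate)` for the library path handed
/// to the macro. `chain` runs from the invocation site back toward the
/// proc-macro crate; the proc-macro crate itself is not included.
pub fn resolve_library_path(chain: &[UseSite]) -> Option<(&'static str, &'static str)> {
    // The first attribute found while walking back wins.
    for site in chain {
        if let Some(path) = site.macro_library_path {
            return Some((site.krate, path));
        }
    }
    // No attribute anywhere: fall back to the root of the crate that
    // imports straight from the proc-macro crate.
    chain.last().map(|site| (site.krate, "crate"))
}
```

With a chain user -> facade -> library, where only library carries the attribute, the result is the path inside library. With no attribute at all, it is the root of the crate closest to the proc-macro crate.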

Example

In the procedural macro crate library_macros:

#[proc_macro_derive(Trait)]
pub fn my_derive(
    annotated_item: TokenStream,
    library_path: TokenStream,
) -> TokenStream {
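    let syn::DeriveInput { attrs, vis, ident, generics, data } =
        syn::parse_macro_input!(annotated_item as syn::DeriveInput);
    // quote! interpolates proc_macro2 tokens, so convert the bridge type first.
    let library_path = proc_macro2::TokenStream::from(library_path);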
    let (impl_generics, ty_generics, where_clause) = generics.split_for_impl();
    let expanded = quote! {
        impl #impl_generics #library_path::Trait for #ident #ty_generics #where_clause {
            fn consume(&self, food: #library_path::Food) {
                // just throw it away, they won't know the difference
                ::std::mem::drop(food);
            }
        }
    };
    expanded.into()
}

In the runtime library crate library:

#[macro_library_path(crate)] // inferred if omitted
pub use library_macros::Trait;

pub trait Trait {
    fn consume(&self, food: Food);
}

pub struct Food {
    calorie_count: usize,
}

Reference-level explanation

TODO: explain how this functions in more detail.

Implementation notes:

The library_macros crate is still only compiled a single time for the compiler host platform. The library_path is purely a runtime concept to library_macros.
All procedural macros are always treated as taking a library_path, and a library_path is passed over the procedural macro bridge. (It is for this reason the default #[macro_library_path] is provided, rather than requiring the presence of the attribute; to support old-style macros which don't take library_path.) If the procedural macro is not declared to take the library_path argument, it simply is discarded by the bridge and not provided to the function.
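A rough model of the dispatch described in these notes, written as ordinary Rust so it runs outside the compiler. Tokens is a String stand-in for proc_macro::TokenStream, and DeriveEntry, call_derive, legacy_derive, and new_derive are hypothetical names; the real bridge is considerably more involved.

```rust
/// Stand-in for `proc_macro::TokenStream` (assumption for this sketch).
pub type Tokens = String;

/// The two derive entry-point shapes the bridge would have to accept.
pub enum DeriveEntry {
    /// Old-style: `fn(annotated_item) -> TokenStream`.
    Legacy(fn(Tokens) -> Tokens),
    /// New-style: `fn(annotated_item, library_path) -> TokenStream`.
    WithLibraryPath(fn(Tokens, Tokens) -> Tokens),
}

/// The bridge always receives a `library_path` and drops it for
/// entry points that were not declared to take one.
pub fn call_derive(entry: &DeriveEntry, item: Tokens, library_path: Tokens) -> Tokens {
    match entry {
        DeriveEntry::Legacy(f) => f(item), // library_path is discarded here
        DeriveEntry::WithLibraryPath(f) => f(item, library_path),
    }
}

/// Old-style macro: has to hard-code the runtime crate path.
pub fn legacy_derive(item: Tokens) -> Tokens {
    format!("impl ::library::Trait for {item} {{}}")
}

/// New-style macro: uses whatever path the bridge passed along.
pub fn new_derive(item: Tokens, library_path: Tokens) -> Tokens {
    format!("impl {library_path}::Trait for {item} {{}}")
}
```

The caller side is identical for both kinds of macro, which is why the default #[macro_library_path] is needed: there is always some path to send, even when the macro ignores it.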

Drawbacks

Additional surface area complicating the procedural macro bridge.
Makes the procedural macro entry points more magic by both
- allowing them to be declared with different arities, and
- providing yet another argument not distinguished by type.

Rationale and alternatives

This fills an obvious need in the ecosystem; people are building workarounds which cover most use cases but which require significant manual intervention to set up and can still break in edge cases. Additionally, reading the macro caller's Cargo.toml is not something that procedural macros are necessarily guaranteed to be able to do, such as if they were to be sandboxed into wasm without ad hoc filesystem access.

While this can be almost completely polyfilled, it requires significant manual work (e.g. providing a new proc macro crate for each new facade) and the extra crates involved negatively impact compile time compared to the build and module system supporting this use case.

Alternatively to providing the runtime library path in an entry argument, we could support the procedural macro outputting a special compound token like $crate which is resolved to refer to the path provided to #[macro_library_path]. However, using literally $crate is likely a bad idea, as procedural macros which emit macro_rules! definitions would like to emit literal $crate for the macro_rules! implementation.

More likely is instead providing an API like Ident::macro_library_path() which returns a compound identifier which resolves to the configured #[macro_library_path], but which cannot be constructed directly. However, this is a pure library addition on top of the bridge support, which can be polyfilled by 3rd party crates and/or added to the standard proc_macro distribution at a later date.

Prior art

In the crates ecosystem:

proc-macro-crate, which offers a reusable way to read Cargo.toml to determine how your library crate can be named
bevy_encase_derive (link to PR), which wraps encase_derive_impl to provide a version of encase_derive which uses bevy_macro_utils to set the path to encase through bevy's facade

Note also that this functionality can be, and sometimes already is, emulated for function-like procedural macros by exporting a declarative macro wrapper instead, e.g.

#[macro_export]
macro_rules! functionlike {
    ( $($tt:tt)* ) => {
        $crate::__proc_macros::functionlike! {
            #![crate = $crate]
            $($tt)*
        }
    }
}

In other ecosystems:

None known yet.

Unresolved questions

Unknown unknowns.

Future possibilities

Cargo packages that provide both a proc-macro crate and a runtime crate versioned together will naturally be a primary user of this technique. In that case, the default macro library path should likely be the associated library crate, since it is known by the build system.
A global Ident::macro_library_path() (see rationale-and-alternatives).
Supporting more than one path for the library to pass to the procedural macro implementation.

jhpratt · July 8, 2022, 6:17am

I have (had?) an RFC planned for having fn proc_macro::resolve_crate(crate_name: &str) -> Option<proc_macro::Ident> (or Result), which would naturally be a compiler built-in function. Cargo could provide a mapping of crate names to their renamed equivalents, and the function would just use that mapping. This handles the renaming of crates in Cargo.toml with only one or two functions. There should probably be a way to specify a semver version for the case where multiple versions of a crate are present, but that should be easily doable in the same manner.
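A minimal sketch of the lookup, assuming Cargo hands over a plain map from original package names to the names the crates are visible under. CrateRenames is a hypothetical type, and the function returns a String where the built-in would return a proc_macro::Ident.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the mapping Cargo would provide:
/// original package name -> name the crate is visible under in this build.
pub struct CrateRenames {
    by_package: HashMap<String, String>,
}

impl CrateRenames {
    pub fn new(pairs: &[(&str, &str)]) -> Self {
        let by_package = pairs
            .iter()
            .map(|(package, visible)| (package.to_string(), visible.to_string()))
            .collect();
        Self { by_package }
    }

    /// Mirrors the shape of the proposed `resolve_crate`; `None` means the
    /// invoking crate has no dependency on a package with that name.
    pub fn resolve_crate(&self, crate_name: &str) -> Option<String> {
        self.by_package.get(crate_name).cloned()
    }
}
```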

This would only be half of the RFC I had planned, but it's the only part that would affect proc macros (the other half would be macro_rules!/macro).

Could you elaborate? Are you generating code that relies on a third crate? The need for handling renamed dependencies is clear, but I'm not sure I follow why the proc macro author wouldn't know what the path of the code it's generating should be.

I should probably make a public list of things that I want to write RFCs for. I already have one that's private. Then coordination can occur at a high level rather than duplicating efforts. I'll try to remember to do this this coming weekend.

CAD97 · July 8, 2022, 7:01am

Specifically, it's the case that the bevy crate includes e.g. pub use bytemuck::Pod. So you can write e.g. #[derive(bevy::utils::Pod)] with just a dependency on bevy. The bytemuck derive cannot name the bytemuck crate from the user crate because (in the extreme) the user crate doesn't even have a path to bytemuck available.

(The paths and such used here are illustrative and not necessarily the ones used in practice.)

Any derive necessarily has this issue even if no crates are renamed; if you pub use a derive, downstream users of your crate cannot use the derive without a direct dependency on the derive's runtime crate.

Would it happen to be something like $mod by any chance?

(And this was really just writing out a small amount of previously discussed concepts together, so don't worry about this being duplicated work. These are primarily here to restart and get a bit more discussion on the concept(s).)

SabrinaJewson · July 8, 2022, 8:43am

As another alternative, Rust could get around to supporting declarative attributes and derives. That way, Bytemuck could write Pod like this (imaginary syntax):

#[macro_rules_derive]
macro_rules! Pod {
    ($($tt:tt)*) => {
        $crate::derive::pod!{ ($crate) $($tt)* }
    };
}

Note this trick can be used today for bang macros. This would have additional benefits elsewhere, since attribute macros no longer need a separate crate. Another way of achieving this suggested by Yandros is to add ::macro_rules_attribute to libcore, although I don’t really like that because it’d make proc ⇔ decl a breaking change.

For the proc implementation, I would oppose adding more parameters to proc macro functions. Instead, I think it would be better for clarity and forward compatibility to put it in a struct:

pub struct AttrInput { /* ... */ }
impl AttrInput {
    pub fn args(&self) -> TokenStream;
    pub fn item(&self) -> TokenStream;
    pub fn library_path(&self) -> TokenStream;
}
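To illustrate the forward-compatibility point, here is the same shape modeled with String in place of TokenStream so it runs as ordinary Rust. The field layout, the new constructor, and my_attribute are assumptions for the sketch only. A macro body written against the accessors keeps compiling if another accessor is added to the struct later, whereas adding a parameter changes the function signature.

```rust
/// Stand-in for `proc_macro::TokenStream` (assumption for this sketch).
pub type TokenText = String;

/// Model of the proposed input struct; the fields are hypothetical.
pub struct AttrInput {
    args: TokenText,
    item: TokenText,
    library_path: TokenText,
}

impl AttrInput {
    pub fn new(args: &str, item: &str, library_path: &str) -> Self {
        Self {
            args: args.to_string(),
            item: item.to_string(),
            library_path: library_path.to_string(),
        }
    }
    pub fn args(&self) -> TokenText { self.args.clone() }
    pub fn item(&self) -> TokenText { self.item.clone() }
    pub fn library_path(&self) -> TokenText { self.library_path.clone() }
}

/// A macro body that only touches the accessors it needs.
pub fn my_attribute(input: AttrInput) -> TokenText {
    format!("#[{}::marker({})] {}", input.library_path(), input.args(), input.item())
}
```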

Nemo157 · July 8, 2022, 9:00am

An alternative design I've had in the back of my mind for a while is to allow declaring "runtime-dependencies" in the Cargo.toml which give you a token to refer to the crate with, something like:

# Cargo.toml
[runtime-dependencies]
library = "1.0"

// generated via the `runtime-dependencies`
// const ::library: TokenStream;

#[proc_macro_derive(Trait)]
pub fn my_derive(
    annotated_item: TokenStream,
) -> TokenStream {
    let syn::DeriveInput { attrs, vis, ident, generics, data } =
        syn::parse_macro_input!(annotated_item as syn::DeriveInput);
    let (impl_generics, ty_generics, where_clause) = generics.split_for_impl();
    let expanded = quote! {
        impl #impl_generics #library::Trait for #ident #ty_generics #where_clause {
            fn consume(&self, food: #library::Food) {
                // just throw it away, they won't know the difference
                ::std::mem::drop(food);
            }
        }
    };
    expanded.into()
}

Though, that will commonly result in a source dependency loop which can be a pain for publishing and other tooling (technically there is a build dependency loop too, but I think it would be possible to resolve that with some care with how the build process actually occurs).

Comparing the approaches I think I like something closer to your design more, it's very similar to what I'm already using for an expression proc-macro—just wrapping it in a macro_rules! macro to inject the $crate token—but extended to work with attribute and derive proc-macros too.

#[macro_export]
macro_rules! format_args {
    ($fmt:literal $(, $($arg:tt)*)?) => {
        $crate::𓀄::stylish_macros::format_args!(crate=$crate, $fmt $(, $($arg)*)?)
    };
    ($fmt:expr $(, $($arg:tt)*)?) => {
        $crate::𓀄::with_builtin!(let $fmt_lit = $fmt in {
            $crate::𓀄::stylish_macros::format_args!(crate=$crate, $fmt_lit $(, $($arg)*)?)
        })
    };
}

jonasbb · July 8, 2022, 9:51am

What would the second #[macro_library_path] do? Would it overwrite the first one or would it be concatenated? Or phrased differently: How would reexporting a reexport work, i.e., A reexports from B which reexports from the macro crate M.

bjorn3 · July 8, 2022, 10:38am

How would you publish such a crate if the runtime dependency depends on the proc macro? Crates.io doesn't allow publishing crates until all their dependencies are published first.

Nemo157 · July 8, 2022, 12:41pm

You can publish and yank a version without the circular dependency first, then you can publish circular dependencies just fine; crates.io doesn’t verify that the version constraints work, just that the named crate exists.

bjorn3 · July 8, 2022, 2:41pm

Doesn't it also check that a semver compatible version exists? Also publish and yank is kind of an ugly hack and non-trivial to automate in CI.

Nemo157 · July 8, 2022, 2:56pm

Nope.

It doesn't need automating, just has to occur once to make the crate name exist in crates.io.

CAD97 · July 8, 2022, 5:23pm

Requiring a decl macro trampoline to inject the $crate compound token would work (and allow the ecosystem to decide how best to pass the token), but requiring it for what is basically a universal need is unfortunate.

Procedural macros usually assume that they can use ::{std, core} as well as prelude names without problems, and a macro only generating paths to std items could thus avoid the need for a runtime library portion... but it's still probably better to proxy through a library crate, because

- the crate you're expanding in could be #![no_std] and not have (or even rename) ::std, and
- the crate you're expanding in could be yes_std and cargo-rename a dependency to ::core shadowing the standard one.

(The effort of writing a macro resilient to deliberate try-to-break-me use is surprising.)

I've used it before, including to pass features to a watt-compiled macro

I actually agree that I'd like restructuring proc macro declarations to just something like

pub fn my_function(input: proc_macro::BangInput) -> proc_macro::TokenStream;
pub fn my_attribute(input: proc_macro::AttributeInput) -> proc_macro::TokenStream;
pub fn MyDerive(input: proc_macro::DeriveInput) -> proc_macro::TokenStream;

and maybe even support passing adhoc key-value metadata along (e.g. feature flags)

possible syntax bikeshed

pub macro MyDerive = wrap_with_ancillary_token_streams! {
    ::library_macros::MyDerive,
    {
        macro_support_module: $crate::__macro_support,
        extra_traits: cfg!(feature(extra_traits)),
    },
};

though as written this would probably require saying cfg! checks the cfg of the crate the cfg span belongs to?

I clarified the OP; calling a procedural macro uses the first #[macro_library_path] in the use chain of custody back to the original definition.

crates-io doesn't; the local pre-publish verification does by checking that a build can be successful, but it also uses the path dependencies. (I think; publish verify may potentially ignore path dependencies.)

jhpratt · July 8, 2022, 7:36pm

In the future when people are allowed to write proc macros and library code in the same crate, is there any reason they couldn't have a dependency and re-export it themself?

Not at all. My high-level concept was to permit crate!(foo) resolving to the possibly-renamed crate originally called "foo", erroring if the crate doesn't exist. It would be analogous to the proc macro function I mentioned in my previous comment.

Nemo157 · July 8, 2022, 7:39pm

What if there are multiple crates in the current build-tree originally called "foo" (presuming this means "defined with package.name = "foo"")? (Possible via multiple registries, or multiple versions of a crate from a single registry).

jhpratt · July 8, 2022, 7:58pm

As with my first comment, that's where a second "parameter" to the macro would be necessary. It's not too difficult to imagine crate!(foo, 0.1) working. Multiple registries would be an issue in any situation.
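Roughly what I have in mind for the two-parameter form, sketched as a plain function. Dep, ResolveError, and the prefix-based version match are all simplifications invented for this example; a real implementation would use proper semver requirements and would also need an answer for multiple registries.

```rust
/// One dependency as the build system sees it (hypothetical model type).
pub struct Dep {
    pub package: &'static str,
    pub version: &'static str,
    /// Name the crate is visible under in the invoking crate.
    pub extern_name: &'static str,
}

#[derive(Debug, PartialEq)]
pub enum ResolveError {
    NotFound,
    Ambiguous,
}

/// Simplified match: "0.1" accepts "0.1" and "0.1.3" but not "0.10.0".
/// This is NOT real semver matching.
fn version_matches(version: &str, requested: &str) -> bool {
    version == requested
        || version
            .strip_prefix(requested)
            .map_or(false, |rest| rest.starts_with('.'))
}

/// Models `crate!(foo)` (version = None) and `crate!(foo, 0.1)`.
pub fn resolve_crate(
    deps: &[Dep],
    package: &str,
    version: Option<&str>,
) -> Result<&'static str, ResolveError> {
    let mut matches = deps.iter().filter(|dep| {
        dep.package == package && version.map_or(true, |v| version_matches(dep.version, v))
    });
    match (matches.next(), matches.next()) {
        (None, _) => Err(ResolveError::NotFound),
        (Some(dep), None) => Ok(dep.extern_name),
        (Some(_), Some(_)) => Err(ResolveError::Ambiguous),
    }
}
```

Without the version, two copies of foo in the build are ambiguous; with it, the lookup can pick one.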

CAD97 · July 8, 2022, 8:01pm

This would additionally require allowing expanding bang-macros in path-fragment position; it's currently impossible to write m!()::name. This could of course potentially be made possible, but is not trivial, and might not be possible to do without arbitrary length lookahead. (You could say it would be <m!()>::name, but that also doesn't work because then m!() is a type.)

Especially since ::name actually refers to lib.name which can have duplicates within a single registry.

I don't think in the same crate is possible; same package is. Though this may be more just a terminology question of whether the crate refers to the package or the library. (extern crate says that the library is the crate, not the package.)

This still requires the proc macro to be written in a way that it takes the macro support module path, otherwise it's not possible to override. e.g. this is exactly what bevy_encase_derive is doing, just necessarily as a separate package from the bevy facade.

CAD97 · July 21, 2022, 2:20am

I just made a realization which I intended originally but apparently forgot about: there is, by design, no need for anyone other than the runtime crate to provide #[macro_library_path]. Forbidding reapplying the attribute is probably better than allowing reapplying it.

system · October 19, 2022, 2:21am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
