Proc_macro in an existing library crate

In multiple crates that I work on, proc_macro is used as a performance optimization helper for a parsing library.

For example, in the case of unic_locale I use it to let users specify a validated locale identifier at build time with no runtime cost:

use unic_locale::{locale, Locale};

const LOC: Locale = locale!("en-US");            // validated at compile time
let loc2: Locale = "en-US".parse().unwrap();     // parsed and validated at runtime

The former is validated at build time and expands to a pre-computed struct.
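To make the build-time/runtime split concrete, here is a rough sketch of what such an expansion could look like. The struct fields and the expansion shape are illustrative assumptions, not unic_locale's actual representation:

```rust
// Hypothetical stand-in for unic_locale's Locale type (illustrative only).
#[derive(Debug, PartialEq)]
pub struct Locale {
    pub language: &'static str,
    pub region: Option<&'static str>,
}

// The user writes:  const LOC: Locale = locale!("en-US");
// After expansion, the compiler might see a pre-computed literal,
// so no parsing or validation happens at runtime:
const LOC: Locale = Locale {
    language: "en",
    region: Some("US"),
};

fn main() {
    assert_eq!(LOC.language, "en");
    assert_eq!(LOC.region, Some("US"));
    println!("{:?}", LOC);
}
```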

Unfortunately, proc_macro brings a fairly high maintenance overhead, because lib crates and proc_macro crates must be separate.

It requires me to maintain a unic_locale_impl crate, a unic_locale_macro crate that imports unic_locale_impl, and a unic_locale crate that re-exports both.

Is there interest in such a feature? I originally filed this as an issue and was asked to come back with an RFC.


The first thing that comes to mind for me in these sorts of cases is const fn, or at least a hypothetical, more powerful future const fn.

This is one of proc_macro's major limitations, along with the lack of a $crate-like metavariable, which means your macro either needs to take the lib crate's name as a parameter (very annoying and verbose) or you cross your fingers and hope the user hasn't renamed your lib crate in their Cargo.toml.

One thing that would be useful in discussions like these is a list of reasons (or links to reasons) why Rust made proc_macros the way they are (with no "normal lib" features). Unfortunately I don't know the answers, but if anyone else does, I think sharing them here would help the discussion.

There are many things a const fn cannot do (and will never be able to do) that a macro can. I think const fn is a bit off topic.


For this particular use case, spitting out a precomputed struct, const fn is right on topic. IIRC, the author of the regex crate expressed desires to build compile time regex instances using const fn. This seems like very much the same thing.
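A minimal sketch of that direction, using only stable const fn features; the `Lang` type and `parse_lang` function here are hypothetical stand-ins, not regex's or unic_locale's API:

```rust
// A toy const fn validator: accepts only a two-letter lowercase language
// subtag. An invalid literal becomes a compile-time panic, which gives the
// same "validate at build time" effect the proc macro provides.
#[derive(Debug, PartialEq, Clone, Copy)]
pub struct Lang([u8; 2]);

pub const fn parse_lang(s: &str) -> Lang {
    let b = s.as_bytes();
    assert!(b.len() == 2, "expected a two-letter subtag");
    assert!(b[0].is_ascii_lowercase() && b[1].is_ascii_lowercase());
    Lang([b[0], b[1]])
}

// Evaluated entirely at compile time; `parse_lang("EN!")` here would
// fail the build instead of panicking at runtime.
const EN: Lang = parse_lang("en");

fn main() {
    assert_eq!(EN, Lang(*b"en"));
    println!("{:?}", EN);
}
```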


I'm not talking only about const; the proc macro here is useful for offloading work from runtime to build time in general, and can be used as:

fn find_matching(locales: Vec<Locale>) {
    let en_us = locale!("en-US");

    for loc in locales {
        if loc == en_us {
            println!("Match found!");
        }
    }
}

It's true that it doesn't work as a full substitute yet, but as the const implementation matures it could one day be the solution to your multi-crate issue. Ideally, even in this example, you could eventually replace the macro call with a call to a const function that does the parsing and validation.

This is a technical limitation that we would be happy to see lifted, but the technical details are hairy. I'm not sure who has the technical expertise to help drive the feature, but I think everyone would be excited to see progress here.


Would it be possible for someone who understands the context to provide a list of blockers that have already been filed? It would be good to understand what has to happen before this can be tackled.

This requires splitting source code for something that looks like a single crate into multiple parts that are compiled (or interpreted) separately, possibly for different targets if we are cross-compiling.

This is a major change to the compiler, but it would also be useful for other things (like SYCL, for example) where you need to generate code for different targets from a single source.


How difficult is it to enumerate the data-structure and component boundaries within the compiler where target-architecture annotation would need to be added? Presumably proc-macros would be just a virtual target-architecture, though probably a distinguished member of the target-architecture enumeration, along with the compiler's host target-architecture.

This same effort could also lay the groundwork for the modular ABI proposal that is in a concurrent thread, as in some sense those different ABIs can be considered to be alternate target-architectures.

This requires splitting source code for something that looks like a single crate into multiple parts that are compiled (or interpreted) separately, possibly for different targets if we are cross-compiling.

But this is already the case with build.rs, how is that different?

There could be something like this in Cargo.toml:

[proc-macro]
path = "macro/macro.rs"

# Or could it be merged with [build-dependencies]?
[proc-macro.dependencies]
syn = "1"
quote = "1"
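For comparison, this is how a build script and its dedicated dependencies are declared today; the crate name and versions below are illustrative:

```toml
[package]
name = "unic_locale"
version = "0.1.0"
build = "build.rs"       # compiled for the host, run before the library

[build-dependencies]     # visible only to build.rs, not to the library
syn = "1"
```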

That's an interesting middle-ground proposal! Previously, we'd imagined just writing them anywhere in your library and having rustc figure it out. But having it work like build.rs means that cargo knows how to compile a separate crate as part of the build step for your library. That would be much simpler to implement.


This would be fantastic. It'd be nice if the path were treated as a whole module (e.g., src/macros/mod.rs for 2015 edition or src/macros.rs for 2018 edition modules) so you could structure your code such that everything for proc macros was in a particular (multi-file) module tree.

It would have to be treated as a crate root, not a module root, so it would probably want to be src/macros/lib.rs. And then it would need its own set of dependencies, as @ogoffart already identified.


Would it be feasible for said macro root crate to depend on the normal crate it's a part of? For example, the crate mentioned in this thread looks like this:

facade dependencies:

  • macro crate
  • impl crate

macro dependencies:

  • impl crate

impl dependencies: none (well, except third party ones)

It can go one way or the other. Either the impl crate can use the proc macro, or the proc macro crate can use the impl library.

I think that, if one direction has to be chosen as the default, making the impl crate have access to the macro as crate::macro_name is better than the inverse (where it would be available to the outside world but not within the impl).

In your case you'd still be able to reduce the number of crates from three to two.

I have a personal project where the dependency is actually circular (the macro directly depends on itself, even), but because I use watt for the macro impl, this isn't exposed to the user. Your example could easily reduce to one crate if the macro is precompiled to wasm and run via watt.

The proc-macro crate cannot depend on the crate itself.

Note that, already today, you can re-use code between the macro crate and the implementation crate.

Imagine a directory structure like this:

 ├── Cargo.toml
 ├── src
 │   └── lib.rs
 └── macro
     ├── Cargo.toml
     ├── src
     │   └── lib.rs
     └── common
         └── mod.rs

In the main crate's src/lib.rs you have:

// re-export the macro
pub use my_macro_crate::my_macro;

// import the common code
#[path = "../macro/common/mod.rs"]
mod common;

// optionally re-export it
pub use common::*;

And macro/src/lib.rs can also make use of the shared code with:

#[path="../common/mod.rs"]
mod common;

But one must be careful inside the common code: crate:: may not refer to the crate you expect, since the module is compiled into both crates.

This works because the macro crate lives in a subdirectory of the other crate, so you only need to publish two crates on crates.io instead of three.

But ideally, if we could have a [proc-macro] section in the root Cargo.toml, we would only need to publish a single crate, which would be much more convenient.

Under this new model, we could also redefine what it means to write quote!(crate::foobar), where crate refers to the actual parent crate, maybe depending on the call-site span. But that might actually be more difficult.

One use case is making your life easier when implementing a crate, i.e. just abstraction, but done with a proc macro. The other use case is extending the public API of a crate with a proc macro.

I have encountered both use cases, i.e. they are both extant in the wild. So having to pick at most one of them is a bit like having to choose between sync and async code support in rustc: they serve different use cases, so having to choose between them is not really acceptable.

As for the proc macro as a build.rs-like construct: that's an interesting idea, but I'd add that I would want to be able to declare multiple proc macros in one crate (for either use case outlined above, in any mixture), so it shouldn't be too much like build.rs, which is limited to just one per crate.

Perhaps it can't in actuality, but can we maybe pull a sleight of hand to make it look like it can? E.g. by using another hidden crate or some such?

I was not aware this existed. It could indeed be useful when implementing a proc macro given the current infrastructure, but ultimately it's still a hack compared to the conceptual ideal, where I simply don't have to care about proc-macro crates and impl crates; I'd be able to write a proc macro as easily as a function, i.e. without extra crate shenanigans. Anything less than that will still leave people pining for that ideal.

You can already declare multiple proc macro entry points in a single proc-macro crate. A "build-rs like proc-macro" would still be able to declare multiple proc macros from the single crate.

As I said, there's two possibilities.

  1. The proc-macros defined are available within the library as crate::macro and usable within the library.
  2. The proc-macros do not exist within the library. They can be accessed by someone using the library as ::your_crate::macro, but the library cannot access or use the proc macros.

A dependency from the proc macro onto the library is only possible in the second case. That's because in the first case, the dependency is circular: the library uses the proc macro uses the library uses [snip]

I think it makes sense to handle "build-rs like proc_macro" like build-rs: it can't depend on the library directly, because it's used to build the library.

There exist simple (enough) hacks to get around this bootstrap problem:

  • Use mod to mount (parts of) the library into the buildscript/proc_macro (requires consumers to compile that subset of the library twice)
  • Use an extra crate to factor out the shared code (I still want workspace-private crates that don't have to be published to crates.io)
  • The buildscript/proc_macro uses an older version of the library from crates.io (requires consumers to compile at least two versions of the library; be careful about depending on the same version (infinite recursion doesn't work) or growing the bootstrap chain longer)
  • Use WASM or other precompilation to drive the buildscript/proc_macro (requires some setup, but provides the best buildscript/proc_macro compile time to users, as only the developer compiles your library more than once)
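As an illustration of the extra-crate option, the workspace layout might look like this (crate names are hypothetical):

```
workspace/
├── Cargo.toml      # [workspace] members = ["shared", "macro", "lib"]
├── shared/         # parsing/validation logic; a plain lib crate
├── macro/          # proc-macro crate; depends on `shared`
└── lib/            # public facade; depends on `shared`, re-exports `macro`
```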

Somewhat obviously, I'm personally in favor of the "extra crates" version (especially if we get workspace private crates eventually, please?) and/or the WASM version (especially if we can get support baked into cargo for uploading/using a WASM precompiled version).


For the particular case I listed at the top of this thread, I don't think that would work.

I need the proc_macro to be able to use my library, and then I need to expose it to users of my library (my library itself does not have to use the proc_macro).