Proc-macro output dir

I've opened an issue about that

Currently, Cargo sets the OUT_DIR environment variable for crates that have a build.rs, but neither the directory nor the variable exists for crates that only use a proc-macro.

Both workarounds, bypassing the check (by detecting whether the compile was triggered by R) or adding an empty build.rs, seem unreasonable.

Is it possible to enable the proc-macro's OUT_DIR?

What's the use case? A build script needs a tempdir to communicate files between its run and the subsequent compile. A proc-macro can just generate tokens into the source code, so it doesn't need one for the same reasons (and it can use a normal tempdir for any intermediate files).
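
A minimal sketch of how the two positions above could meet in code: use OUT_DIR when Cargo happens to set it, and fall back to an ordinary temp directory otherwise. The function names and the `rmin-macro-out` folder name are hypothetical, chosen only for illustration; nothing here is an API of Cargo or rmin.

```rust
use std::env;
use std::ffi::OsString;
use std::path::PathBuf;

/// Picks a directory for intermediate files: `out_dir` when it is present and
/// non-empty, otherwise a folder named `fallback_name` under the system temp dir.
pub fn choose_output_dir(out_dir: Option<OsString>, fallback_name: &str) -> PathBuf {
    match out_dir {
        Some(dir) if !dir.is_empty() => PathBuf::from(dir),
        _ => env::temp_dir().join(fallback_name),
    }
}

/// Convenience wrapper that reads the real environment.
/// Assumption: OUT_DIR is only set when the crate being compiled has a build.rs.
pub fn output_dir_from_env() -> PathBuf {
    choose_output_dir(env::var_os("OUT_DIR"), "rmin-macro-out")
}
```

Passing the variable in as a parameter keeps the decision testable without touching the process environment.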

The linked issue sounds like the macro is generating an equivalent of FFI bindings or rustdoc for another language. I can see how this is more convenient than parsing Rust files outside of the compiler.

Macros are not supposed to be writing files, and this use-case is a bit at odds with eventually sandboxing them.

You could use syn to parse the files yourself, if they don't rely on specific cfgs or other macros.

There's also rustdoc JSON output (nightly only unfortunately). It contains much more information about the types in the crate.

You could use the proc macro to generate the output as a Rust function or constant, then use the crate as a dev-dependency and write the output from build.rs, or even from a standalone tool (packaging OUT_DIR is tricky too).
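
A rough sketch of that suggestion, assuming the macro emits the wrapper text as a constant and some other program writes it to disk. The constant name `RMIN_R_WRAPPER` and the function `write_r_wrapper` are hypothetical; in a real setup the constant would be generated by the macro rather than written by hand.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Assumption: the #[export] macro would emit a constant like this into the
// user's crate instead of writing a file itself.
pub const RMIN_R_WRAPPER: &str = "#' @export\nfine <- function(id) .Call(.c0, id)\n";

/// Writes the generated wrapper text to `dest`, creating parent directories
/// first. Intended to be called from a build script of a dependent crate or
/// from a small standalone tool.
pub fn write_r_wrapper(dest: &Path) -> io::Result<()> {
    if let Some(parent) = dest.parent() {
        fs::create_dir_all(parent)?;
    }
    fs::write(dest, RMIN_R_WRAPPER)
}
```

The macro itself stays free of file-system side effects; only the caller of `write_r_wrapper` touches the disk.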

I'm currently writing a crate that generates R packages. R needs a DLL to load, plus a wrapper that defines the simple functions calling into it.

The thing I want could be:

// Rust source

/**fine

@description
a simple function tell I'm fine

@details
blah blah

@param id integer.
*/
#[export]
pub fn fine(id: Sexp<i32>)->Owned<character> {
    Owned::raw_from_str(format!("I'm fine{}",id[0]))
}

Some wrapping could be done directly with the rust macro:

    mod fine {
        use super::*;
        #[no_mangle]
        extern fn _rust_fine_wrapper__0 (id: Sexp<i32>) -> Owned < character > {
            if id.missing() {
                rmin::handle_panic(||panic!("Parameter missing detected\n  missing id: {}", id.missing()))
            } else {
                _rust_fine_wrapper_unsafe__0(id)
            }
        }
        #[no_mangle]
        extern fn _rust_fine_wrapper_unsafe__0 (id: Sexp<i32>) -> Owned < character > {
            rmin::handle_panic(||fine(id))
        }
    }

But there are two things that go beyond your description that a proc-macro can just generate tokens into the source code.

The first one is auto-registration. An R DLL needs an entry point that registers its functions. The proc-macro can generate it, but it needs to read and write some shared state (currently a static Mutex).

mod _please_do_not_use_rmin_export_interface_as_your_mod_name_ {
    use ::rmin::{Sexp, Owned, reg::*};
    use ::core::ptr::null;
    extern "C" {
        fn _rust_fine_wrapper__0(arg0: Sexp<()>)->Owned<()>;
        fn _rust_fine2_wrapper__1()->Owned<()>;
        fn _rust_add_wrapper__2(arg0: Sexp<()>, arg1: Sexp<()>)->Owned<()>;
        fn _rust_add4_wrapper__3(arg0: Sexp<()>, arg1: Sexp<()>, arg2: Sexp<()>, arg3: Sexp<()>)->Owned<()>;
        fn _rust_another_sum_wrapper__4(arg0: Sexp<()>)->Owned<()>;
        fn _rust_fine_wrapper_unsafe__0(arg0: Sexp<()>)->Owned<()>;
        fn _rust_fine2_wrapper_unsafe__1()->Owned<()>;
        fn _rust_add_wrapper_unsafe__2(arg0: Sexp<()>, arg1: Sexp<()>)->Owned<()>;
        fn _rust_add4_wrapper_unsafe__3(arg0: Sexp<()>, arg1: Sexp<()>, arg2: Sexp<()>, arg3: Sexp<()>)->Owned<()>;
        fn _rust_another_sum_wrapper_unsafe__4(arg0: Sexp<()>)->Owned<()>;
    }
    const R_CALL_METHOD:&[R_CallMethodDef]=&[
        R_CallMethodDef {name:c".c0".as_ptr(), fun:_rust_fine_wrapper__0 as *const _, numArgs: 1},
        R_CallMethodDef {name:c".c1".as_ptr(), fun:_rust_fine2_wrapper__1 as *const _, numArgs: 0},
        R_CallMethodDef {name:c".c2".as_ptr(), fun:_rust_add_wrapper__2 as *const _, numArgs: 2},
        R_CallMethodDef {name:c".c3".as_ptr(), fun:_rust_add4_wrapper__3 as *const _, numArgs: 4},
        R_CallMethodDef {name:c".c4".as_ptr(), fun:_rust_another_sum_wrapper__4 as *const _, numArgs: 1},
        R_CallMethodDef {name:c".u0".as_ptr(), fun:_rust_fine_wrapper_unsafe__0 as *const _, numArgs: 1},
        R_CallMethodDef {name:c".u1".as_ptr(), fun:_rust_fine2_wrapper_unsafe__1 as *const _, numArgs: 0},
        R_CallMethodDef {name:c".u2".as_ptr(), fun:_rust_add_wrapper_unsafe__2 as *const _, numArgs: 2},
        R_CallMethodDef {name:c".u3".as_ptr(), fun:_rust_add4_wrapper_unsafe__3 as *const _, numArgs: 4},
        R_CallMethodDef {name:c".u4".as_ptr(), fun:_rust_another_sum_wrapper_unsafe__4 as *const _, numArgs: 1},
        R_CallMethodDef {name: null(), fun: null(), numArgs: 0}
    ];

    #[no_mangle]
    extern fn R_init_rext(info:*mut DllInfo){
        unsafe {
            R_registerRoutines(
                info,
                null(),
                R_CALL_METHOD.as_ptr(),
                null(),
                null()
            );
            R_useDynamicSymbols(info, 0);
            R_forceSymbols(info, 0); // changing this to 1 makes most of the functions unsearchable, which is sad for people who want to compile in Rust and load in R directly.
        }
    }
}
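
A simplified sketch of the static-Mutex approach behind a module like the one above, assuming each `#[export]` expansion records one entry and a final step renders the table. The type and function names are hypothetical, and only the checked `.cN` entries are rendered here, not the `.uN` ones. It relies on the compiler keeping one macro process alive for the whole crate, which is not a documented guarantee.

```rust
use std::sync::Mutex;

/// Metadata one `#[export]` invocation would record (hypothetical fields).
#[derive(Clone, Debug, PartialEq)]
pub struct Exported {
    pub name: String,
    pub num_args: usize,
}

// Shared state inside the proc-macro crate; survives only as long as the
// compiler reuses the same macro process for every expansion in the crate.
static EXPORTS: Mutex<Vec<Exported>> = Mutex::new(Vec::new());

/// Records an export and returns the index assigned to it.
pub fn record_export(name: &str, num_args: usize) -> usize {
    let mut exports = EXPORTS.lock().unwrap();
    exports.push(Exported { name: name.to_string(), num_args });
    exports.len() - 1
}

/// Renders one R_CallMethodDef line per export, followed by the null terminator entry.
pub fn render_call_methods(exports: &[Exported]) -> String {
    let mut out = String::new();
    for (i, e) in exports.iter().enumerate() {
        out.push_str(&format!(
            "R_CallMethodDef {{name:c\".c{i}\".as_ptr(), fun:_rust_{name}_wrapper__{i} as *const _, numArgs: {n}}},\n",
            name = e.name,
            n = e.num_args
        ));
    }
    out.push_str("R_CallMethodDef {name: null(), fun: null(), numArgs: 0}\n");
    out
}
```

A real macro would parse the rendered text into tokens before returning it; the string form is used here only to keep the sketch free of external crates.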

The second need is writing an .R wrapper that tells R what to load and how to use the exported plugins:

# R uses `#` as a comment, roxygen2 uses `#'` as doc comments

#' fine
#' 
#' @description
#' a simple function tell I'm fine
#' 
#' @details
#' blah blah
#' 
#' @param id integer.
#' 
#' @export
fine <- function(id) .Call(.c0, id)

As you can see, the best way to produce the R wrapper file is to generate it from the proc-macro.
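
To illustrate the mapping from the Rust doc comment to the roxygen2 block, here is a minimal sketch. The function `render_r_wrapper` and its parameters are hypothetical and not part of rmin; it assumes the doc text has already been extracted from the `/** ... */` comment.

```rust
/// Turns the text of a Rust doc comment into a roxygen2 block followed by a
/// one-line `.Call` wrapper. `index` selects the registered `.cN` symbol.
pub fn render_r_wrapper(name: &str, params: &[&str], index: usize, doc: &str) -> String {
    let mut out = String::new();
    for line in doc.lines() {
        let line = line.trim();
        if line.is_empty() {
            // Blank doc lines become bare roxygen lines.
            out.push_str("#'\n");
        } else {
            out.push_str("#' ");
            out.push_str(line);
            out.push('\n');
        }
    }
    out.push_str("#'\n#' @export\n");

    // Build the `.Call(.cN, arg, ...)` argument list.
    let args = params.join(", ");
    let mut call_args = format!(".c{index}");
    for p in params {
        call_args.push_str(", ");
        call_args.push_str(p);
    }
    out.push_str(&format!("{name} <- function({args}) .Call({call_args})\n"));
    out
}
```

Called as `render_r_wrapper("fine", &["id"], 0, doc_text)`, this produces text of the same shape as the wrapper shown above.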

Why not use the cargo xtask pattern for this?

The crate structure might be a little bit complex:

I wrote rmin, and some related crates.

R users (not only me) write their own plugins that use rmin.

Perhaps cargo doc (which does not seem to work for a cdylib) or cargo test could be the command that generates the R bindings. But that is dirty: writing a file from a doc test seems worse than allowing the macro to write the docs directly. What's more, these commands are very easy to forget.

Since generating the binding is the goal, the best way is simply to allow it. I tried cargo test/cargo doc with an additional function that writes the output file, and it works fine. But the program then behaves more like a virus:

It works fine in a normal build, but for the developer it rewrites its own source files.

That might not be a good idea.

As for building the wrapper with an extra tool, that seems nice, but supporting the extra tool is a problem. Since the tool is separate from the main macros, I cannot ensure that the tool has the same version as the macro, which adds extra complexity to my problem.

I just wonder, why not allow proc-macros to write output to a specific directory?

R has a similar mechanism, using Makevars to execute a series of make tasks. There is no need to use cargo xtask.


Some off-topic discussion: syn could be an option, but not a very good one, since the first compile of the syn crate takes about 20 seconds. Due to R's design, if I want to compile a small function right after opening a new R session, it may take 20 s just to compile syn.

Doesn't it at least cache built binaries? That seems like a bad design. Most languages I know of have native extensions you compile in advance to the session, rather than compile on demand. Python comes to mind, where this would be handled by setup.py.

If compiling in advance isn't possible, one option would be to use sccache. That would work for a dev setup, but not for a user setup.

Some of the caches are stored in the /tmp/Rtmp**** folder, which is cleaned up when the R session closes.

Totally agree on this. Adding OUT_DIR to proc-macros explicitly seems like encouraging side effects in proc-macros, which is something we try to move away from in general.

As for side effects, I have a question:

How can functions be registered without side effects?

Currently, the need is to generate a register function that registers every exported function. Without side effects, how could a proc-macro know which functions should be registered?

A proc macro is not the intended tool for this. Consider a build.rs or cargo xtask instead. Consider precompiling instead of building it on demand. You are going against the intended usage and that creates a rough user experience.

If I'm understanding correctly, not even build.rs works for this. OUT_DIR is for intermediate artifacts that later get consumed during your build. It sounds like you want your proc-macro to generate a final artifact. This is currently not supported in build.rs either, though it's a common topic. I believe this is one of the use cases discussed in Need a reliable way to get the target dir from the build script · Issue #9661 · rust-lang/cargo · GitHub

Right now global registration is a side effect, there's no way around that. However some are working to change this.

Ah, that explains a few things. I found using clap_mangen, completions and similar from build scripts to be very unergonomic. I don't want to build the man generation into the final binary in order to reduce the binary size. What is the current suggested workaround for this?

Currently, letting a proc-macro write out the results does work, but no documentation guarantees that this behavior will not change. This is why I started this thread.

It seems that allowing writes to some directory is acceptable, since you have to write something.

Is there any disadvantage to allowing a "stable output dir"?

IMHO, if the need exists, the better way is to allow it (just like unsafe), rather than to write documentation saying that you cannot get around this restriction while the best workaround carries a lot of cost.

Those suggestions would introduce a lot of unnecessary logic.

My current implementation reads data from the #[export] macro and saves the metadata in a static variable. (Thanks to Rust's current implementation, the proc-macro is started only once per crate, so the static variable can hold all the exported metadata.) At the end of the file, an extra done! macro reads the static variable, writes a module that registers all the exported functions, and then writes the docs.

build.rs cannot write registration modules. xtask can read all the functions, but I can only come up with this very ugly implementation:

cargo build --features rmin-macro-output-meta > output-meta.txt
reg-parse output-meta.txt > ./src/register.rs # suppose we have such binary executable
doc-parse output-meta.txt > ./wrapper.R
cargo build --release

In such an implementation, every step is well-documented. (Using println in a proc-macro comes from the official example that shows how to write proc-macros, and calling other executables seems to be a normal use of xtask.)

BUT, the cost is, we have to generate more temporary files and several binary executables, which makes the crate harder to use.
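
To make that cost concrete, here is a sketch of the parsing step a helper such as the hypothetical reg-parse would need. The line format `export <name> <num_args>` is an assumption made up for this example; the thread does not specify what the metadata dump looks like.

```rust
/// One exported function as read back from the metadata dump.
#[derive(Debug, PartialEq)]
pub struct MetaEntry {
    pub name: String,
    pub num_args: usize,
}

/// Parses lines of the assumed form `export <name> <num_args>`.
/// Any other line is skipped, since unrelated output may be mixed into the dump.
pub fn parse_meta(text: &str) -> Vec<MetaEntry> {
    text.lines()
        .filter_map(|line| {
            let mut parts = line.split_whitespace();
            if parts.next()? != "export" {
                return None;
            }
            let name = parts.next()?.to_string();
            let num_args = parts.next()?.parse().ok()?;
            Some(MetaEntry { name, num_args })
        })
        .collect()
}
```

Every such helper is another place where the dump format and the macro version have to stay in sync.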

Currently, the first run takes 3-6 s, and I cannot imagine what it would cost with the awkward xtask setup above.

Maybe I misuse that crate, but I cannot find any better usage.