Pre-RFC: Add a builtin macro to indicate a build dependency on a file

Motivation

It is not uncommon for proc macros to introduce a build dependency on non-Rust files. For example, pest reads a grammar file, and graphql-client reads a schema file and a query file.

Currently there is no direct way for those proc macros to inform the compiler about the dependency. One workaround is to use the include_str! or include_bytes! macro and assign its result to a const. This isn’t ideal: the generated program doesn’t necessarily need that content, and including all the data in the generated source file may slow down the build and bloat memory usage, especially when the file is large.
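
For illustration, the workaround looks roughly like this (the path and constant name are placeholders):

// Workaround today: include the file purely to create a dependency. The constant
// may never be used, but rustc will now rebuild this crate when the file changes.
const _GRAMMAR: &str = include_str!("../grammar.pest");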

The largest example I can find right now is GitHub’s GraphQL schema file, which is ~256KiB; that isn’t too bad to include. But a proc macro may at some point want to read much larger files, such as archives or system images, at which point using include_str! or include_bytes! may not be an option.

Cargo build scripts can output rerun-if-changed= to inform Cargo about such a dependency, but a proc macro has no way to do so at the moment.

Detailed design

Introduce a new compiler built-in macro, depends_on (name to be bikeshedded), which takes a file name as an argument (either an absolute path or a path relative to the current file?). It informs the compiler that when the specified file is modified, the current file should be recompiled.

For example:

depends_on!("../grammar.pest");
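
A proc macro could emit the call as part of its expansion. A minimal sketch, assuming the quote crate, with depends_on! standing in for the proposed (not yet existing) built-in:

// Inside the proc macro implementation: emit the dependency declaration
// alongside the rest of the generated code so the compiler records it.
let expanded: proc_macro2::TokenStream = quote::quote! {
    depends_on!("../grammar.pest");
    // ...generated parser code would follow here...
};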

Alternatives

There are several possible alternatives:

Add an opt-in parameter to proc macros

For example, we could pass an additional parameter, say &mut Context, to the proc macro function, through which the function can add new dependencies. To avoid a breaking change, this could be controlled by an extra parameter on #[proc_macro] and #[proc_macro_derive] indicating whether such an argument should be passed in.
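
A rough sketch of what that could look like; the with_context attribute argument, the Context type, and the add_dependency method are all hypothetical names for this proposal and do not exist today:

use proc_macro::TokenStream;

// Hypothetical opt-in: the attribute argument asks the compiler to pass a context.
#[proc_macro(with_context)]
pub fn derive_parser(input: TokenStream, ctx: &mut Context) -> TokenStream {
    // Hypothetical API: register a file so the compiler recompiles when it changes.
    ctx.add_dependency("src/grammar.pest");
    input
}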

Allow proc macros to return more than just TokenStream

Change the return type of proc macros to something like impl Into<ProcMacroResult>, and have ProcMacroResult contain the current TokenStream as well as dependency information. Then we just provide impl From<TokenStream> for ProcMacroResult.

This way we don’t break any existing proc macro, but it becomes possible to extend what a proc macro can return.
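
A minimal sketch of what such a type could look like; ProcMacroResult and its fields are placeholder names for the proposal, not an existing API:

use proc_macro::TokenStream;
use std::path::PathBuf;

// Hypothetical result type: the generated tokens plus the files the expansion depends on.
pub struct ProcMacroResult {
    pub tokens: TokenStream,
    pub dependencies: Vec<PathBuf>,
}

// Existing macros that return a plain TokenStream keep working through this conversion.
impl From<TokenStream> for ProcMacroResult {
    fn from(tokens: TokenStream) -> Self {
        ProcMacroResult { tokens, dependencies: Vec::new() }
    }
}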

Have the compiler not actually import file content in certain cases

This was suggested below by CAD97.

Either the compiler doesn’t read the content when include_str! or include_bytes! is assigned to an underscore const, or an attribute can be added to communicate that.

Have the compiler track file opens in proc macros

This was suggested below by josh.

Have the standard file functions track file opens when called from a proc macro, and emit dependencies accordingly.
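
Under this alternative, an ordinary read inside the macro implementation would be enough; a minimal sketch (the path is illustrative):

// If the compiler tracked file opens made during macro expansion, this plain
// std::fs read inside the proc macro would already record the dependency.
let grammar = std::fs::read_to_string("src/grammar.pest")
    .expect("failed to read grammar file");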

Do nothing

Just have proc macros generate include_bytes! for that purpose.


What do you think?

(Edit: added several proposals from the replies below to the alternatives.)


Just to note: state of the art is const _: &[u8] = include_bytes!("");. If we want to just bless this pattern with some more magic around that macro, we could maybe make that pattern (which deliberately makes the include not usable) add the data dependency without actually reading the file and inlining it.

As an alternative to a new macro or special-casing include_bytes!, add a #[generated()] attribute that can be emitted from code generation, one of whose properties allows adding a data dependency by noting the external resources used to generate it.


Can you elaborate on that pattern? How is it used? Why does it work?

It’s the exact same as include_str!, I’ve just elided the actual path. Underscore const is #54912, and evaluates the const but leaves it unnameable (thus unusable and always optimized out).

Using include_bytes! is more general than include_str! because it works on non-UTF-8 files. Adding the macro adds a dependency on the external file because rustc will recompile the crate when that file changes, as its contents are nominally part of the crate (even if unused).

Perhaps we could have the standard file functions, when called from a proc macro, track what files they open, and then emit dependencies accordingly? That would make this Just Work automatically.


I’ve proposed File -> TokenStream a couple times, and that definitely would have to do that. Unfortunately, I don’t think that would work well for anything that wants its own lexer and definitely not for binary data.

Maybe we could have three “read an external file” functions that handle data dependency and span assignment? meta::include(Path) -> TokenStream, meta::include_str(Path) -> String, meta::include_bytes(Path) -> Vec<u8> (modulo naming bikeshed).

Although it would make this Just Work™, it feels rather implicit. There may be cases left uncovered, e.g. a proc macro invoking an external executable to generate something, as well as cases incorrectly covered, e.g. a proc macro opening a file for writing.

The second case reminds me that the compiler could also pass in an output path (via an environment variable or similar) for the proc macro to write such information to, or maybe the proc macro could just println! to inform the compiler… None of these sounds like a good idea, though…


I thought about having some global functions for that, but it seems to me that using global state is kinda against Rust’s general principles. For example, if at some point the compiler decides to invoke proc macros in parallel, such settings may cause problems.

The ideal is that the rustc internals are parallelized anyway. I do agree that exposing this as a “global” to proc macros feels “off”, though. (A meta::Context or similar passed to the proc macro definition function would eliminate the “global” access, though it would still need synchronization.)


Hi, do any of the proposed solutions cover directories, or new test data/subfolders being added to a directory? For test-generator I have the case that files/directories may be added, renamed, or removed. Therefore I would favor a feature that observes a dedicated folder, for example a Cargo feature like

[trigger]
test-data = "data/*"

Every time files change in there, a rebuild is triggered.

EDIT FYI: test-generator enumerates directories using the glob crate.


Another example for which this feature can be useful is shader files for graphical apps, e.g. written with vulkano.

I think adding a macro specifically for declaring dependencies, adding an opt-in parameter, and extending the return value could all be revised to support directories the same way build scripts do (changes to the directory itself, such as adding or removing a file, trigger a rebuild, but changes to the content of files inside don’t); you would just need to list all the directories, subdirectories, and files you care about in the output somehow.

Continuing to use include_str! and include_bytes! but hinting the compiler not to actually read, as suggested by CAD97, may be less ideal for handling this case, since those macros were not designed to “read a directory”.

Having the compiler track file opens as suggested by josh may or may not work (you can probably track read_dir, I guess).

Anyway, that’s a very interesting use case that I indeed didn’t think about, but I would say that most of the solutions proposed here are likely able to handle it.

Maybe the build-script feature cargo:rerun-if-changed=PATH might help to solve the issue with the conditional rebuild (build-script docs).

AFAICS, the process might work as follows; please correct me if I am wrong:

  1. The build script build.rs would just glob/iterate the specific files of the test-data directory, printing rerun-if-changed for all of them to the console.

  2. Cargo will evaluate the console output of build.rs, and if it differs from the previous file set (?) or the timestamps of those files changed, then the crate will be rebuilt.

In this case, it would be possible to list/enumerate generator input files via build.rs. Drawback: it would rebuild the whole crate and not single files.
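
A minimal sketch of such a build script, assuming the glob crate is listed under [build-dependencies] and the watched directory is data/ (both are placeholders):

// build.rs: emit a rerun-if-changed directive for the directory itself and for every
// file under it, so added, removed, or modified files all trigger a rebuild of the crate.
fn main() {
    println!("cargo:rerun-if-changed=data");
    for entry in glob::glob("data/**/*").expect("invalid glob pattern") {
        if let Ok(path) = entry {
            println!("cargo:rerun-if-changed={}", path.display());
        }
    }
}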

Proposal: the rerun-if-changed feature could be extended to specify the .rs file that should be rebuilt.

EDIT: a ready-to-use crate: https://crates.io/crates/build-helper

To define dependencies on input files, so far it is necessary to implement a build script that feeds rerun-if-changed instructions into the Cargo process.

It would be much easier if Cargo.toml supported a config key directly, e.g.

    rerun-if-changed = "data/*"

Supporting glob syntax/patterns, the dependency list could be adapted on each build.

I’ll take a look at this but I wonder how difficult an include_hash!("data/foo") macro would be. That seems like it hits all the necessary points.

This doesn’t work for proc macros. It at least requires an additional declaration from users of the proc macro, which isn’t ideal.

Its semantics would still require the compiler to read the whole file when that may not be strictly necessary.

Well, as for pest grammars, the following post explains how to define dependencies on files or directories via a build script.

There is no need to “include” any files; the build script just lists all entities/files/dirs at a specified location (glob) and sends this list via stdout to the parent Cargo process (the rerun-if-changed=PATH statement). This way it is possible to watch directories for new files or modifications of files, triggering a crate-wide rebuild.

AFAICS, this would work perfectly for PEST grammar files, too.
