Would it be possible to have a binary-formatted cache specific to each individual procedural macro invocation? I would primarily use this to cache data collected from files in the filesystem, as specified by the invocation. (This would be amazing for me, as current invocations can take up to 5-6ms and I plan to have hundreds.)
A lot of popular macros already achieve this via temporary files in the build folder, but that's far too hacky a solution to become the community-approved one, especially as they leave unmanaged dump data behind. I would personally really like to avoid this approach.
Does cargo clean already clean up random files in the target folder? If not, a way to improve the current practice is to have some metadata (maybe in Cargo.toml) to say that a given file or directory in the target directory should be removed with cargo clean.
The other problem is that two crates could potentially clash on which file or directory they use. Perhaps there should be some API that returns a unique per-crate directory inside target that is consistent for the same crate but differs between crates. (This could also just be a convention of prefixing the directory with your crate name.) This unique directory would then be automatically removed by cargo clean. This seems like the best approach.
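As a rough illustration of that convention, a macro could derive its cache path from the target directory and the consuming crate's name. This is only a sketch: `per_crate_cache_dir` is a hypothetical helper, and in practice the inputs would come from environment variables such as `CARGO_TARGET_DIR` and `CARGO_PKG_NAME`.

```rust
use std::path::PathBuf;

/// Sketch of the naming convention: a cache directory inside `target`,
/// namespaced per crate so that independent crates cannot clash.
/// Both parameter values here are illustrative, not a real cargo API.
fn per_crate_cache_dir(target_dir: &str, crate_name: &str) -> PathBuf {
    PathBuf::from(target_dir)
        .join("proc-macro-cache") // shared parent, removed by `cargo clean`
        .join(crate_name)         // per-crate namespace avoids clashes
}

fn main() {
    let dir = per_crate_cache_dir("target", "my_crate");
    assert_eq!(dir, PathBuf::from("target/proc-macro-cache/my_crate"));
    println!("{}", dir.display());
}
```

Because the whole subtree lives under the target directory, `cargo clean` would remove it along with everything else without any extra metadata.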
Now, what I really, really want is for cargo to provide a checksum of every file in the crate as an input for build.rs. I suppose cargo is already hashing the files to discover when it needs to recompile, and I don't want to hash them again.
If those APIs were provided and crates migrated to use them, they could gain most benefits of a centralized cache, but they could continue to do their own thing, ensuring minimal churn.
Proc macros are not supposed to communicate with each other. There is intent to cache the expansion of proc macros as part of incremental compilation, but caching partial results between multiple independent proc macro invocations in the same compilation isn't a supported use case.
That said, the current proc macro server will run proc macros multiple times in the same process, so you can opportunistically memoize work using a global value in your proc macro lib. This isn't guaranteed to continue working, and using globals for anything but memoizing theoretically pure computation is in bad form, but it is something you're capable of doing.
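A minimal sketch of that opportunistic memoization, assuming a hypothetical pure computation the macro performs; the `CACHE` global and `expensive_pure_work` are placeholders for whatever your macro actually derives from its input:

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Process-wide memoization table. This only pays off because the current
// proc macro server happens to run multiple invocations in one process;
// that behavior is not guaranteed.
static CACHE: OnceLock<Mutex<HashMap<String, u64>>> = OnceLock::new();

/// Stand-in for the expensive, theoretically pure work the macro does.
fn expensive_pure_work(input: &str) -> u64 {
    input.bytes().map(u64::from).sum()
}

/// Memoized wrapper: later invocations in the same process reuse
/// earlier results instead of recomputing them.
fn memoized(input: &str) -> u64 {
    let cache = CACHE.get_or_init(|| Mutex::new(HashMap::new()));
    let mut map = cache.lock().unwrap();
    *map.entry(input.to_string())
        .or_insert_with(|| expensive_pure_work(input))
}

fn main() {
    let first = memoized("some input");
    let second = memoized("some input"); // served from the cache
    assert_eq!(first, second);
}
```

Keeping the cached computation pure is what makes this safe: if the process is restarted between invocations, the only cost is recomputing, never wrong output.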
If it's in a location that gets removed by cargo clean (and the proc macro respects CARGO_TARGET_DIR and doesn't just assume it can use ./target), that is just the normal behavior for intermediate artifacts in the build process. You don't delete intermediaries, as a future rebuild can utilize them, and clean removes all of the build cache in a uniform manner.
If proc macros are dumping intermediate state in a location that cargo clean doesn't remove (when the option exists to use a location it does), that's just poor form on the part of the proc macro implementation and should be reported as a bug.
Just thinking out loud—you expect proper file locking inside the build cache (proc macros may be invoked in parallel) plus reading the cache files to be more performant than... collecting filesystem data? Isn't this just substituting one bit of IO for another?
A much better solution would be to somehow change the system such that you don't need to cache or redo the work. One common solution is to share a macro_rules! assembled once between the other consumers, although I can never recall the exact rules around how macro-expanded macro name resolution behaves.
Speaking from experience: if the macro performs a computationally intensive task, this might not be so easy or even feasible, especially if the implementation uses an algorithm with a tight or lower complexity bound (Θ(...) and Ω(...), respectively).
What's left is rearchitecting the data structures involved to be more cache-efficient, adding parallelization for a nice-but-not-asymptotically-relevant speedup, and otherwise optimizing the code itself. But in the abstract those gains are expected to range from modest to negligible (e.g. it would be unrealistic to expect a 5x perf gain from this for the macro as a whole).
While it's far from possible in every case, I'm more alluding to changing how the macro is used (so the computationally difficult work doesn't get computed as often) rather than improvements within the macro.
One specific example, although vague, is a concept for imitating Unreal Header Tool's generated.h. The naive approach would be to have each file call UHT to determine what needs to be injected, but a better approach is to have a single call into UHT generate all the necessary code snippets in one pass, along with a macro_rules! that can grab the appropriate generated code for each file.
In this way we replace N expensive invocations with one expensive invocation and N comparatively cheap calls that just fetch the work that's already done, and without any direct communication between macros, just coordinated use of the shared build environment.
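The shape of that pattern can be sketched with plain macro_rules!. Everything below is hypothetical: the arm names and constants stand in for whatever snippets the single expensive pass would actually emit.

```rust
// Hypothetical output of the one expensive generation pass: a single
// macro_rules! whose arms hand each call site its precomputed snippet.
macro_rules! generated {
    (widgets) => {
        pub const WIDGET_COUNT: usize = 3;
    };
    (gadgets) => {
        pub const GADGET_COUNT: usize = 7;
    };
}

// Each consumer then makes one cheap call that just fetches its share
// of the already-done work, instead of redoing the expensive pass.
mod widgets {
    generated!(widgets);
}
mod gadgets {
    generated!(gadgets);
}

fn main() {
    assert_eq!(widgets::WIDGET_COUNT, 3);
    assert_eq!(gadgets::GADGET_COUNT, 7);
}
```

One caveat in line with the earlier comment about macro name resolution: a macro_rules! defined this way is only visible textually after its definition, so the generator has to ensure the definition precedes all of its consumers.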