Const fn + proc macros

During some of the discussion about the dangers of automatic proc macro expansions by IDEs I saw an interesting idea discussed which seems to solve a number of cross-cutting concerns:

What if proc macros were implemented in terms of const fn?

Some possibilities I can see with this:

Allow defining proc macros within the same crate they're used in

I'm sure there are gaps here people will no doubt point out, but using const fn would seem to address the main concerns for why proc macros are separated out, namely that they could impact codegen in the crate they're used in. But what if proc macros were const fns that could only call other const fns defined in core/std or a crate's dependencies (or potentially even the same crate, although that seems tricky)?

"Sandboxing"

With const fn proc macros, they'd be evaluated by Miri, which, at least for now, heavily restricts what an attacker can do during macro expansion. For example, you could still use include_str! or include_bytes! to embed a secret into the compiled binary, but unless that binary were executed, it couldn't automatically exfiltrate those secrets just because an IDE performed macro expansion.

Those IDE expansions are definitely a very helpful feature to have but, per some recent demonstrations, a rather scary one to enable for any project you don't trust. Perhaps a more restricted macro system could find a middle ground?

While this makes sense conceptually, I think in practice const fns are not nearly expressive enough to let people write programs like serde_derive.

This sounds tempting, but for many purposes Miri is simply not fast enough. Complex macros can already be quite slow in debug builds, and Miri is even slower than debug builds.

I think the idea would be to have a const fn that receives a token stream and outputs a token stream.

Yes, that is how procedural macros work. The problem mentioned by alex is that many things are not possible in const fns, such as calling trait methods or panicking. But these features are being worked on.
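To give a feel for the expressiveness gap, here is a minimal sketch of a const fn doing a small piece of macro-like work: checking whether an identifier is a keyword. It assumes stable Rust, where (as far as I know) comparing slices with `==` and iterating with `for` both go through traits and are therefore unavailable in const fns, so everything is written with `while` loops and byte comparisons. The names `bytes_eq`, `is_keyword` and `KEYWORDS` are made up for illustration.

```rust
/// Byte-wise equality written with a `while` loop instead of `==` on slices.
const fn bytes_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut i = 0;
    while i < a.len() {
        if a[i] != b[i] {
            return false;
        }
        i += 1;
    }
    true
}

/// A deliberately tiny, illustrative keyword list (not the full Rust list).
const KEYWORDS: [&str; 4] = ["fn", "let", "struct", "enum"];

/// Returns true if `ident` is one of `KEYWORDS`.
const fn is_keyword(ident: &str) -> bool {
    let mut i = 0;
    while i < KEYWORDS.len() {
        if bytes_eq(ident.as_bytes(), KEYWORDS[i].as_bytes()) {
            return true;
        }
        i += 1;
    }
    false
}

// Both constants are computed by the compiler's const evaluator.
const FN_IS_KEYWORD: bool = is_keyword("fn");
const FOO_IS_KEYWORD: bool = is_keyword("foo");
```

Even this trivial check needs hand-rolled loops, which hints at how much more work something the size of serde_derive would be under the same restrictions.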

Also, include_str! and include_bytes! can be used in const functions.

I already mentioned that:

"For example, you could still use include_str! or include_bytes! to embed a secret into the compiled binary, but unless that binary were executed, it couldn't automatically exfiltrate those secrets just because an IDE performed macro expansion."
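A small sketch of that distinction, under the assumption that the embedded data arrives as a byte slice: a const fn can compute over the bytes at compile time, but nothing in const evaluation lets it open a file or a socket to send them anywhere. The byte string below is a stand-in for what `include_bytes!` would produce, inlined so the example needs no external file; `checksum` and the constant names are hypothetical.

```rust
// Stand-in for `include_bytes!("some/secret/file")`, inlined so that the
// sketch is self-contained. The contents are made up.
const EMBEDDED: &[u8] = b"not-a-real-secret";

/// A toy rolling checksum, evaluated entirely by the const evaluator
/// when used to initialise a `const`.
const fn checksum(data: &[u8]) -> u32 {
    let mut sum: u32 = 0;
    let mut i = 0;
    while i < data.len() {
        sum = sum.wrapping_mul(31).wrapping_add(data[i] as u32);
        i += 1;
    }
    sum
}

// The result ends up in the compiled artifact; it only leaves the machine
// if that artifact is later executed and does something with it.
const EMBEDDED_CHECKSUM: u32 = checksum(EMBEDDED);
```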

A general reply about the current limits of this proposal, in terms of both Miri features and performance:

  • They're being worked on
  • A system like this would be supplemental to the existing proc macro system, because that isn't going away any time soon

Then it doesn't fix the security issue, does it? To make Rust more secure, both procedural macros and build scripts must be sandboxed (or rust-analyzer must be prevented from running cargo check).

From what I know, Zig seems to get away with not having macros and using only its equivalent of const fn, which is comptime.

VSCode appears to be moving to a model where workspaces must opt in to expanding proc macros and running build scripts.

But it would be nice to have safer proc macros that could potentially be expanded by default.

"With const fn proc macros they'd be evaluated by Miri, which at least for now heavily restricts what is possible for an attacker to do with macro expansion"

If you want to go that way, the compiler might as well compile macros to WebAssembly, which is a target that's actually designed for the use case you mention (sandboxing untrusted code) with good performance.

There are some proc macros that would break the sandbox, so we'd need a capability model to account for them.

Another, more fundamental issue with const fns is allocation. Const fns cannot allocate at this point in time.

But without that capability, we can forget about basing proc macros on const fns for heavy lifting, since there are a lot of existing macros out there that rely on allocation.
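To illustrate what the lack of allocation means in practice, here is a rough sketch of a "token stream in, token stream out" const fn that has to work with a fixed-capacity buffer instead of a growable `Vec`. The `Tok` and `Tokens` types, the `CAP` limit and the `negate` function are all hypothetical stand-ins; real proc macros operate on `proc_macro::TokenStream`, which is not usable this way.

```rust
/// Hypothetical stand-in for a token.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Tok {
    Ident(&'static str),
    Punct(char),
    /// Filler for unused buffer slots.
    Empty,
}

/// Arbitrary capacity chosen for the sketch; there is no way to grow it.
const CAP: usize = 8;

/// Fixed-capacity "token stream": a buffer plus the number of used slots.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Tokens {
    buf: [Tok; CAP],
    len: usize,
}

/// Prepends `!` to the input tokens.
const fn negate(input: &[Tok]) -> Tokens {
    let mut out = Tokens { buf: [Tok::Empty; CAP], len: 0 };
    out.buf[0] = Tok::Punct('!');
    out.len = 1;
    let mut i = 0;
    while i < input.len() {
        // Indexing past CAP fails: a compile error during const evaluation,
        // a panic at runtime.
        out.buf[out.len] = input[i];
        out.len += 1;
        i += 1;
    }
    out
}

// Evaluated at compile time.
const NEGATED_X: Tokens = negate(&[Tok::Ident("x")]);
```

Any macro whose output size depends on its input would hit the `CAP` ceiling, which is exactly the kind of restriction existing allocation-heavy macros could not live with.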

One thing that we have to keep in mind here is warning/checkbox fatigue. If a dev always has to check some "do you want good intellisense in this project" setting for every project, she's just going to check the box without really considering it (unless, maybe, she's reviewing a potentially malicious crate). It's not going to help at all with the supply chain issue where some dependency-of-a-dependency depends on some innocuous crate that added a secret stealer to its build.rs in some patch update.

That said, I can definitely get behind providing targeted improvements that reduce our reliance on running arbitrary code at IDE time.

IntelliJ Rust has a hardcoded list of "not-a-macro," or effectively "inert," proc macros, such as #[tokio::test] or #[tokio::main]. Providing some standardized IDE-consumable metadata that a proc macro is "functionally inert" would allow IDEs to skip running the attribute macro.

Similarly, derive macros that just do what they say on the tin - emit an implementation of some trait - don't necessarily need to be run by the IDE (exception: to know concrete associated types). Metadata to tell the IDE that would allow the IDE to skip running the derive macro. (Even if the macro isn't pure, and requires filesystem access!)

Common buildscript functionalities are another candidate for uplifting to (IDE) metadata. A buildscript that just uses autocfg to set some config flags isn't uncommon; this could be skipped by the IDE with some metadata that says what defaults to set. cc and cxx_build buildscripts also can be skipped by the IDE if we have a way to tell it to do so.
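As a rough sketch of why such a build script is a good candidate for metadata: its only observable effect is a handful of `cargo:rustc-cfg=...` lines printed to stdout. The flag names and helper functions below are made up, and a real autocfg-based script would probe the compiler rather than hard-code the list.

```rust
/// Hypothetical flags a probing build script might decide to set.
const DETECTED_FLAGS: [&str; 2] = ["has_i128", "has_atomic_u64"];

/// Formats each flag as the directive line Cargo reads from a build
/// script's stdout.
fn cfg_directives(flags: &[&str]) -> Vec<String> {
    flags
        .iter()
        .map(|flag| format!("cargo:rustc-cfg={}", flag))
        .collect()
}

/// What the `main` function of such a `build.rs` would boil down to.
fn emit(flags: &[&str]) {
    for line in cfg_directives(flags) {
        println!("{}", line);
    }
}
```

If the crate could declare default values for those flags, an IDE could apply them directly instead of compiling and running the script.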

And of course, providing some official way to run proc macro plugins on wasm for the macros that are just pure AST transforms removes the need to run arbitrary (unsandboxed) code for them as well.

Once we've lowered the number of crates that require running arbitrary code to get a reasonable IDE experience (that is, one that isn't missing completions and type checking for major common crates), then it's beneficial to start warning "did you mean to run this" when a crate wants to run arbitrary code at IDE time.

Yeah, I wish there was a way to provide a "fallback" to proc macros for IDE purposes.

So, e.g., if you have a derive_serialize proc macro, you could annotate it with #[fallback=fake_derive_serialize], where fake_derive_serialize is another proc macro that just produces methods filled with unimplemented!().
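For illustration, this is roughly what the output of such a fallback could look like, written out by hand. The `Serialize` trait here is a simplified hypothetical one (not serde's), and `Point` is an example type; the point is only that the impl exists with the right signatures so an IDE can resolve methods and type-check callers.

```rust
/// Simplified, hypothetical serialization trait.
trait Serialize {
    fn serialize(&self) -> String;
}

struct Point {
    x: i32,
    y: i32,
}

// What a hypothetical `fake_derive_serialize` might emit for `Point`:
// correct signatures, no real body.
impl Serialize for Point {
    fn serialize(&self) -> String {
        unimplemented!()
    }
}

/// Compiles only if `T` implements `Serialize`, which is all an IDE
/// needs for completion and type checking.
fn assert_serialize<T: Serialize>() {}
```

Calling `serialize` on such an impl would panic, so the fallback would only be suitable for analysis, never for a real build.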

Is there a reason editors do not sandbox their compiler invocations, other than just not having implemented support yet? At least on Linux, using something like bubblewrap is very easy, and I regularly use a cargo sandboxer with --unshare-net, which blocks the recent proof-of-concept.

It's outside of the threat model. Currently the whole stack assumes that the code is trusted. Even something like cargo metadata can execute arbitrary code (Security breach with Rust macros - #4 by matklad), and you can't reasonably sandbox that, as it needs network, write access to disk and ability to spawn external processes.

While we’re at it, I’d consider slowly phasing out token streams in favour of something more structured and thus amenable to static analysis. Right now it’s impossible to know what kind of arguments can be taken by a macro without expanding it, i.e. running the procedure implementing the macro.

The simplest example:

foo!(Bar)

What is Bar? Is it an expression? A pattern? Just an identifier? In what scope will it be resolved? If foo is a pattern-match macro, maybe this can be inferred from the definition, but for procedural macros it cannot be known without running the procedure. And this information is crucial to IDEs and refactoring tools.
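For contrast, here is a small sketch of how declarative macros make that information visible in their definitions: the fragment specifier (`expr`, `pat`, `ident`) states what `Bar` is before anything is expanded. The macro and function names are made up for the example.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct Bar;

// `Bar` is parsed as an expression: a value of the unit struct.
macro_rules! foo_expr {
    ($e:expr) => { $e };
}

// `Bar` is parsed as a pattern to match against.
macro_rules! foo_pat {
    ($v:expr, $p:pat) => { match $v { $p => true } };
}

// `Bar` is just an identifier; it is never resolved to any item.
macro_rules! foo_ident {
    ($i:ident) => { stringify!($i) };
}

fn bar_as_expr() -> Bar {
    foo_expr!(Bar)
}

fn bar_as_pat(value: Bar) -> bool {
    foo_pat!(value, Bar)
}

fn bar_as_ident() -> &'static str {
    foo_ident!(Bar)
}
```

The same token `Bar` means three different things, and a tool can tell which from the matcher alone. A procedural macro receiving `Bar` gives no such hint.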


Actually that gave me an idea: what about being able to invoke const fns from within declarative macros? One could write a ${ ... } block in the expansion body which could invoke any const fn it wanted with all matched productions and would then generate a token stream to be spliced into the macro body.

It would subsume this proposal, since you can always write

macro_rules! foo {
    ($tokens: tt) => { ${ expand_foo($tokens) } }
}

const fn expand_foo(tokens: TokenStream) -> TokenStream {
    /* ... */
}

but it will encourage people to put as little in the const fn as possible and keep the macro statically parseable.

That can't work. To get the MIR for a function such that you can const eval it, you first need to expand the crate. To expand the crate when using a proc macro defined inside the current crate you would have to get the MIR for the proc macro function to const eval it.

I think compiling proc macros for wasm by default would be a much more realistic method for sandboxing. Macro expansion currently happens before the TyCtxt is created, which means that const eval is not possible yet at that point. Creating the TyCtxt requires the expanded and resolved AST as input. Changing this would be much harder than adding a pre-existing wasm engine. In addition, const eval by default (without all the shims in rust-lang/miri) is much more limited than wasm. You can't even perform heap allocations. Wasm is also much, much faster. Const eval can easily be more than 1000x as slow as native execution. Wasm runs at near native speed when JIT-compiled.

Can it not be done lazily, just-in-time, with an error in case of a cyclic dependency? It doesn’t strike me as inherently impossible.