Const fn + proc macros

VSCode appears to be moving to a model where workspaces must opt in to expanding proc macros and running build scripts.

But it would be nice to have safer proc macros that could potentially be expanded by default.

With const fn proc macros, they'd be evaluated by Miri, which at least for now heavily restricts what an attacker can do with macro expansion.

If you want to go that way, the compiler might as well compile macros to WebAssembly, which is a target that's actually designed for the use case you mention (sandboxing untrusted code) with good performance.

There are some proc macros that would break the sandbox, so we'd need a capability model to account for them.

Another, more fundamental issue with const fns is allocation. Const fns cannot allocate at this point in time.

But without that capability, we can forget about basing proc macros on const fns for heavy lifting, since there are a lot of existing macros out there that rely on allocation.
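To make the allocation point concrete, here is a minimal sketch of the kind of work a const fn can do today using only borrowed data. The names `count_commas` and `FIELD_SEPARATORS` are made up for illustration. Anything that needs to grow a `Vec` or `String`, as most existing proc macros do when building token streams, would not fit this shape.

```rust
/// Counts commas in a macro-like input at compile time.
/// Works in a const context because it only reads borrowed bytes:
/// nothing here allocates on the heap.
pub const fn count_commas(input: &str) -> usize {
    let bytes = input.as_bytes();
    let mut count = 0;
    let mut i = 0;
    // `for` loops go through `Iterator`, which is not usable in const fn
    // on stable Rust at the time of writing, so iterate with `while`.
    while i < bytes.len() {
        if bytes[i] == b',' {
            count += 1;
        }
        i += 1;
    }
    count
}

/// Evaluated by the compiler's const evaluator, not at run time.
pub const FIELD_SEPARATORS: usize = count_commas("a, b, c");
```

The function can still be called at run time like any other function; the `const` item is what forces compile-time evaluation.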

One thing that we have to keep in mind here is warning/checkbox fatigue. If a dev always has to check some "do you want good intellisense in this project" setting for every project, she's just going to check the box without really considering it (unless maybe she's reviewing a potentially malicious crate). It's not going to help at all with the supply chain issue where some dependency-of-a-dependency depends on some innocuous crate that added a secret stealer to its build.rs in some patch update.

That said, I can definitely get behind providing targeted improvements that reduce our reliance on running arbitrary code at IDE time.

IntelliJ Rust has a hardcoded list of "not-a-macro," or effectively "inert," proc macros, such as #[tokio::test] or #[tokio::main]. Providing some standardized IDE-consumable metadata that a proc macro is "functionally inert" would allow IDEs to skip running the attribute macro.
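As a rough sketch of what such a lookup amounts to on the IDE side (this is not IntelliJ Rust's actual code; `INERT_ATTRIBUTES` and `is_inert_attribute` are hypothetical names), the hardcoded version is just a membership test. Standardized metadata would replace the constant with data shipped by the macro crate.

```rust
/// Hypothetical hardcoded list of attribute macros that an IDE could
/// treat as inert, i.e. analyze the annotated item as if the attribute
/// were not there. The two entries are the examples named in the thread.
const INERT_ATTRIBUTES: &[&str] = &["tokio::test", "tokio::main"];

/// Returns true if the IDE may skip expanding the attribute at `path`.
pub fn is_inert_attribute(path: &str) -> bool {
    INERT_ATTRIBUTES.contains(&path)
}
```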

Similarly, derive macros that just do what they say on the tin - emit an implementation of some trait - don't necessarily need to be run by the IDE (exception: to know concrete associated types). Metadata to tell the IDE that would allow the IDE to skip running the derive macro. (Even if the macro isn't pure, and requires filesystem access!)

Common buildscript functionalities are another candidate for uplifting to (IDE) metadata. A buildscript that just uses autocfg to set some config flags isn't uncommon; this could be skipped by the IDE with some metadata that says what defaults to set. cc and cxx_build buildscripts also can be skipped by the IDE if we have a way to tell it to do so.

And of course, providing some official way to run proc macro plugins on wasm for the macros that are just pure AST transforms removes the need to run arbitrary (unsandboxed) code for them as well.

Once we've lowered the number of crates that require running arbitrary code to get a reasonable IDE experience (that is, one that isn't missing completions and type checking for major common crates), then it's beneficial to start warning "did you mean to run this" when a crate wants to run arbitrary code at IDE time.

Yeah, I wish there was a way to provide a "fallback" to proc macros for IDE purposes.

So, e.g., if you have a derive_serialize proc macro, you could annotate it with #[fallback=fake_derive_serialize], where fake_derive_serialize is another proc macro that just produces methods filled with unimplemented!().
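To illustrate what such a fallback would need to emit, here is a hand-written sketch of a plausible expansion. The trait `FakeSerialize`, the struct `Point`, and the helper `assert_serializable` are all invented for this example and are not serde's API. The point is that the signatures are enough for completion and type checking, even though the bodies do nothing useful.

```rust
/// Hypothetical stand-in for a serialization trait (not `serde::Serialize`).
pub trait FakeSerialize {
    fn serialize(&self) -> String;
}

pub struct Point {
    pub x: i32,
    pub y: i32,
}

// What a hypothetical `fake_derive_serialize` fallback might generate for
// `Point`: the right trait and method signatures, with placeholder bodies.
impl FakeSerialize for Point {
    fn serialize(&self) -> String {
        unimplemented!()
    }
}

/// Compiles only when `T` implements the trait, which is the fact an IDE
/// needs for type checking; it never calls the placeholder body.
pub fn assert_serializable<T: FakeSerialize>() {}
```

Calling `serialize` on a `Point` would panic at run time, which is acceptable here because the fallback is only meant to be seen by the IDE, never by a real build.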

Is there a reason editors do not sandbox their compiler invocations, other than just not having implemented support yet? At least on Linux, using something like bubblewrap is very easy, and I regularly use a cargo sandboxer with --unshare-net, which blocks the recent proof-of-concept.

It's outside of the threat model. Currently the whole stack assumes that the code is trusted. Even something like cargo metadata can execute arbitrary code (Security breach with Rust macros - #4 by matklad), and you can't reasonably sandbox that, as it needs network, write access to disk and ability to spawn external processes.

While we’re at it, I’d consider slowly phasing out token streams in favour of something more structured and thus amenable to static analysis. Right now it’s impossible to know what kind of arguments can be taken by a macro without expanding it, i.e. running the procedure implementing the macro.

The simplest example:

foo!(Bar)

What is Bar? Is it an expression? A pattern? Just an identifier? In what scope will it be resolved? If foo is a pattern-match macro, maybe this can be inferred from the definition, but for procedural macros it cannot be known without running the procedure. And this information is crucial to IDEs and refactoring tools.
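For contrast, here is a small sketch of how a declarative macro answers that question statically. The macro name `classify` and the helper `classify_examples` are made up for illustration. The fragment specifiers in each matcher (`ident`, `expr`) tell a tool what kind of syntax the argument is without running any code.

```rust
// Each arm declares the syntactic category of its argument up front,
// so an IDE can read it off the definition instead of executing anything.
macro_rules! classify {
    ($name:ident) => {
        "identifier"
    };
    ($value:expr) => {
        "expression"
    };
}

/// `Bar` matches the first arm as a bare identifier (it does not need to
/// resolve to anything); `1 + 2` falls through to the expression arm.
pub fn classify_examples() -> (&'static str, &'static str) {
    (classify!(Bar), classify!(1 + 2))
}
```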


Actually that gave me an idea: what about being able to invoke const fns from within declarative macros? One could write a ${ ... } block in the expansion body which could invoke any const fn it wanted with all matched productions and would then generate a token stream to be spliced into the macro body.

It would subsume this proposal, since you can always write

macro_rules! foo {
    ($tokens: tt) => { ${ expand_foo($tokens) } }
}

const fn expand_foo(tokens: TokenStream) -> TokenStream {
    /* ... */
}

but it will encourage people to put as little in the const fn as possible and keep the macro statically parseable.

That can't work. To get the MIR for a function such that you can const eval it, you first need to expand the crate. To expand the crate when using a proc macro defined inside the current crate you would have to get the MIR for the proc macro function to const eval it.

I think compiling proc macros for wasm by default would be a much more realistic method for sandboxing. Macro expansion currently happens before the TyCtxt is created which means that const eval is not possible yet at that point. Creating the TyCtxt requires the expanded and resolved ast as input. Changing this would be much harder than adding a pre-existing wasm engine. In addition const eval by default (without all the shims in rust-lang/miri) is much more limited than wasm. You can't even perform heap allocations. Wasm is also much much faster. Const eval can easily be more than 1000x as slow as native execution. Wasm runs at near native speed when jitted.

Can it not be done lazily, just-in-time, with an error in case of a cyclic dependency? It doesn’t strike me as inherently impossible.
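As a rough sketch of the idea (this is not how rustc is structured; every name here is hypothetical), lazy expansion with cycle detection can be modeled as a depth-first walk that marks items as in progress and reports an error when it re-enters one:

```rust
use std::collections::HashMap;

/// Progress marker for each item in the hypothetical expansion graph.
#[derive(Clone, Copy, PartialEq, Eq)]
pub enum State {
    InProgress,
    Done,
}

/// Reported when an item transitively requires its own expansion.
#[derive(Debug, PartialEq, Eq)]
pub struct CycleError {
    pub item: String,
}

/// Expands `item` on demand. `deps` maps each item to the items that must
/// be expanded (and, in the proposal, const-evaluated) before it.
pub fn expand_lazily(
    item: &str,
    deps: &HashMap<&str, Vec<&str>>,
    states: &mut HashMap<String, State>,
) -> Result<(), CycleError> {
    match states.get(item) {
        Some(State::Done) => return Ok(()),
        // Re-entering an item that is still being expanded is a cycle.
        Some(State::InProgress) => {
            return Err(CycleError { item: item.to_string() });
        }
        None => {}
    }
    states.insert(item.to_string(), State::InProgress);
    if let Some(children) = deps.get(item) {
        for child in children {
            expand_lazily(child, deps, states)?;
        }
    }
    states.insert(item.to_string(), State::Done);
    Ok(())
}

/// Convenience entry point that starts from a fresh state table.
pub fn expand_all(root: &str, deps: &HashMap<&str, Vec<&str>>) -> Result<(), CycleError> {
    let mut states = HashMap::new();
    expand_lazily(root, deps, &mut states)
}
```

This only models the ordering problem; it says nothing about how hard it would be to make the compiler's expansion and type-checking phases interleave this way.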

In terms of "same crate" macros, there was a proposal to use the same package for proc macros to eliminate separate *_derive crates, and build them the same way build.rs is:

Of course that's orthogonal to the security issue.

We could have proc macros compile to wasm if a specific flag is passed (--wasm-macros?). Then, all "unimplemented" functions in std would raise a special type of panic. IDEs could then pass --wasm-macros by default, and if the build fails because of an unimplemented wasm std function, prompt the user for permissions (perhaps listing the dependency ancestry of the crate whose compilation failed). If a more sophisticated sandbox is implemented (for example, allowing fs read access to the workspace), the same fallback strategy can still be used.

How well this can work depends on how many popular crates need resources outside of the sandbox. But I think that this helps deal with the issue of left-pad needing network access.
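The fallback strategy can be sketched as a small decision function on the IDE side. Everything here is hypothetical, including the types `SandboxResult` and `Decision` and the idea that a sandboxed expansion reports which capability it was missing; no such interface exists in rustc or any IDE today.

```rust
/// Hypothetical outcome of expanding a macro inside the wasm sandbox.
#[derive(Debug, PartialEq, Eq)]
pub enum SandboxResult {
    /// The macro expanded without leaving the sandbox.
    Expanded(String),
    /// The macro hit a std function the sandbox does not provide.
    MissingCapability(String),
}

/// What the IDE does next.
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    UseExpansion(String),
    /// Prompt the user, naming the crate and what it tried to do.
    AskUser { crate_name: String, capability: String },
}

/// Sandbox first; only bother the user when the sandbox was not enough.
pub fn decide(crate_name: &str, result: SandboxResult) -> Decision {
    match result {
        SandboxResult::Expanded(tokens) => Decision::UseExpansion(tokens),
        SandboxResult::MissingCapability(capability) => Decision::AskUser {
            crate_name: crate_name.to_string(),
            capability,
        },
    }
}
```

The design goal is that prompts become rare enough to be meaningful, which ties back to the checkbox fatigue concern raised earlier in the thread.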

I don't think const fns can adequately replace proc macros. You may, for example, want to write part of the project in a DSL that would be transpiled into Rust and have that part in separate files, which would require you to have access to file I/O.

As I've mentioned a couple of times now, you can still do that via include_bytes! and include_str!

That's probably not a good idea, if you have to parse a lot of stuff. Though I'll admit I can't imagine a case where the developer would need to parse a large number of files before the software can be compiled. Still, since this does not solve the build.rs problem, it seems like a half-measure.

Expanding proc macros is pretty important for IDEs to be able to comprehend code. I'd argue that far more crates rely on proc macros than on code generated by build.rs.

For what it's worth, many of the most-used libraries on crates.io have build scripts, including:

  • log
  • syn
  • libc
  • proc-macro2
  • serde
  • num-traits

Some relatively self-contained crates can be checked without running any build scripts, but I expect that a very large proportion of Rust projects depend on one of these, or on some other crate with a build script.

Notably, almost all crates that use proc macros have syn and/or proc-macro2 in their dependency graphs, which means they also depend indirectly on build scripts, at least for now.

As it stands, none of those crates (at least in their current form) would be helpful in the context of a const fn macro system, as they are not implemented in terms of const fn.

I think that the goal is to minimize the proc macros and build.rs scripts that need to be audited. An entire tree of crates depending on, let's say, serde doesn't imply you need to look for compile-time shenanigans in those crates if they don't use either proc macros or build.rs.

Well, scratch that. Someone has already squatted that syntax.