- Feature Name: proc_macro_sandbox
- Start Date: 2023-08-20
Summary
Procedural macro crates can opt in to precompilation to WebAssembly as part of the publish process. The crate compiles locally (as already done by cargo publish) and then is securely verified by a crates.io-managed service, conceptually similar to docs.rs, which enforces that sources in the crate exactly reproduce the Wasm artifact before the new release becomes available to any package registry users. Users can opt in to running procedural macros they depend on via Wasm sandbox by installing a suitable Wasm runtime as a Rustup component.
Motivation
Auditability: Rust's current only supported approach to procedural macros exposes anyone who uses them to a variety of unsavory possibilities: macros are arbitrary Rust code that can spawn processes and make network requests. So can build.rs, but the implementation of a typical procedural macro is 2 orders of magnitude larger than a typical build script. We reduce audit burden by sandboxing macros such that their only possible interaction with the outside world is inspecting tokens and producing tokens. An estimated 99% of macros are amenable to sandboxing; unsandboxed macros remain expressible exactly as before, but would invite great scrutiny, analogous to unsafe code during audit.
Determinism: Inadvertent proc macro nondeterminism is a source of pain in build systems. For example, Buck likes to race a local recompilation against a download from distributed cache, taking whichever finishes first. When artifacts from the same source code diverge, builds can fail or take longer than needed. In a recent instance, rustc nondeterminism was found to be the largest contributor to slow builds due to poor cache utilization in a codebase using Remote Execution and a distributed build cache at Meta. These bugs can be as simple as a proc macro using HashMap with randomized iteration order, and are impossible in a Wasm sandbox because there is no way to access OS randomness within the sandbox. Hash-based containers use pseudo-randomness with a hardcoded seed.
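As a rough illustration of the hardcoded-seed idea in ordinary native Rust (not the sandbox's actual mechanism), the sketch below swaps the standard library's randomly keyed hasher for one built from fixed keys. The names FixedSeedMap and field_order are hypothetical and exist only for this example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

/// A HashMap whose hasher is always constructed with the same fixed keys,
/// instead of the per-map random keys used by the default `RandomState`.
/// (Hypothetical alias, for illustration only.)
type FixedSeedMap<K, V> = HashMap<K, V, BuildHasherDefault<DefaultHasher>>;

/// Inserts the given field names into a fixed-seed map and returns them in
/// the map's iteration order, the way a macro might when emitting code.
fn field_order(fields: &[&str]) -> Vec<String> {
    let mut map: FixedSeedMap<String, usize> = HashMap::default();
    for (index, name) in fields.iter().enumerate() {
        map.insert(name.to_string(), index);
    }
    map.keys().cloned().collect()
}
```

Calling field_order twice with the same input yields the same order, because nothing in the hashing depends on process-level randomness. With the default HashMap hasher, two maps built from identical input may iterate in different orders.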
Isolation: For monorepo-scale build systems to work on billion-line codebases, perfect understanding of the inputs of every build step is critically important. A rustc invocation must not result in reading some arbitrary file from the local filesystem in a manner that is not tracked by the build system, for it will not know the crate needs to be rebuilt when that file changes. Workarounds exist but it is opt-in by the macro, rather than something that can be enforced codebase-wide through sandboxing.
0-second compile time: Today, procedural macros are widely understood to be "bad for compile time". Citing work by @nnethercote and @lqd who have jointly been investigating and improving Rust compile time for years: "These crates incur a lot of compilation costs, being among the most popular crates and quite slow to compile. Also, they block compilation of proc macro crates that are themselves often slow to compile. This leads to long and slow dependency chains". Users opting in to sandboxing the macros they depend on (which, again, 99% of macros would be compatible with) would almost never need to compile syn or quote or proc-macro2, which commonly appear on the critical path before useful parts of an application can begin to compile. 10-15 seconds (depending on syn features) is not an enormous benefit in CI, but is enough to be frustrating in interactive situations such as opening a project in an IDE.
Macro ecosystem rejuvenation: In response to the previous motivation, compile times for crates like syn and serde_derive are begrudgingly tolerable to most people today, but this has come at the cost of brutal concessions to functionality over many years. A precompiled codepath for those sensitive to build time would open the doors to more complete and powerful macro libraries, for example greater ability to focus on fantastic diagnostics rather than scrutinizing whether a user-facing diagnostics improvement would be worth the build time. Syn-based diagnostics today are far from where they would be if build time were a less pressing concern.
Even faster incremental builds: While the "0-second compile time" applies to the macro dependency itself, which is relevant to clean builds and otherwise cached by Cargo, 2 other opportunities exist for this feature to benefit incremental compile times and interactive latency (IDE) too. First, precompiled macros are optimized builds, rather than the unoptimized native builds used today. Even with a 50% overhead from a high-performance Wasm runtime compared to native code, complex macros like serde_derive will still expand faster than they do today. Second, precompilation makes room for architectural advancements which are prohibitively expensive in an unoptimized macro, such as the "ultimate non-generic DeserializeSeed" concept for massively reducing monomorphization cost in Serde-generated code.
User choice: Design choices throughout the rest of this document are motivated by an understanding that many users and environments are uninterested in a precompiled artifact for legal or technological reasons, and satisfied with unisolated natively compiled macros. Both macro author and macro user must have opted-in for the sandboxed artifact to kick in; if not, the user will build the macro from source just like today. Furthermore, macro crates containing a precompiled implementation are 100% compatible with old versions of Cargo, with the embedded Wasm artifact simply being ignored and the library built from source.
Guide-level explanation
Guide-level explanation for macro authors:
The vast majority of procedural macros do nothing more than inspect input tokens and emit output tokens. No filesystem access, no subprocesses, no network I/O. For such macros, sandboxing their execution in a WebAssembly-based runtime offers a number of benefits (auditability, determinism, and 0-second compile time for users).
A crate author can precompile their procedural macro to an efficient, reproducible, sandboxable Wasm artifact as part of publishing it to crates.io by introducing the following field in Cargo.toml:
[package]
name = "serde_derive"
version = "..."
[lib]
proc-macro = true
sandboxed = true # <---
Now cargo publish, which ordinarily would locally verify a build for the host platform before uploading, will instead (or additionally?) build for a target platform called wasm32-macro. The resulting Wasm artifact is uploaded to crates.io in the same .crate file as the crate's source code.
There will be a delay before your new release becomes publicly available on crates.io. The server will repeat the build, confirming that it produces precisely the same Wasm binary that your computer did. So, no need to try any funny business.
Restrictions apply to macros that use sandboxed = true:
- Your macro must not have any (enabled) transitive dependency which is a procedural macro.
- While your macro may have a build script or depend on crates that have a build script, those build scripts will only run if a user of your macro is rebuilding it from source. They will not run when you or crates.io builds your macro for the wasm32-macro target, nor when a user's build picks up the precompiled implementation.
- Your macro must build using the latest stable release of Rust. This is the version that you must publish with. This is the version that crates.io will verify your build with.
- As a consequence of the previous, your macro must not use any unstable APIs within the proc_macro crate.
- Publish's --no-verify setting is incompatible with precompilation.
- While usage of sandbox-incompatible standard library APIs such as std::fs or std::thread will not cause your macro to fail to build, such APIs unconditionally return an error or panic during macro expansion. All you get to do in a precompiled macro is inspect tokens and produce tokens/diagnostics.
- Only one single precompiled build is produced. As part of publish, you control what features of your crate are enabled in that build, using publish's --features and --all-features and --no-default-features flags. Users who depend on any other feature set will not get the precompiled build. (In the future, we may allow specifying a collection of feature sets to precompile for.)
- Unwinding does not happen. To keep code size in check for precompiled artifacts, builds are equivalent to -Zbuild-std=panic_abort -Zbuild-std-features=panic_immediate_abort. Proc macros are never supposed to panic in the first place. If you observe panics, a generic error message will be shown to the user, instructing them that they can temporarily disable the proc macro sandbox and rebuild from source to get the message and line number at which the macro panicked, to report a bug.
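Since a panic inside the sandbox loses its message, one way to stay within the last restriction is to route every failure through an ordinary Result and turn it into a diagnostic. The following is a minimal sketch under loud assumptions: plain strings stand in for token streams so the example runs without the proc_macro crate, and the names expand, expand_or_report, and the Marker trait are hypothetical.

```rust
/// Hypothetical expansion logic. `&str` and `String` stand in for the input
/// and output token streams of a real procedural macro.
fn expand(input: &str) -> Result<String, String> {
    let name = input.trim();
    // Very loose identifier check, enough for the illustration.
    if name.is_empty() || !name.chars().all(|c| c.is_alphanumeric() || c == '_') {
        return Err(format!("expected an identifier, found `{}`", name));
    }
    Ok(format!("impl Marker for {} {{}}", name))
}

/// Entry point that never panics: a failed expansion becomes a
/// `compile_error!` invocation carrying the message as a diagnostic.
fn expand_or_report(input: &str) -> String {
    match expand(input) {
        Ok(tokens) => tokens,
        Err(message) => format!("compile_error!({:?});", message),
    }
}
```

Under these assumptions, expand_or_report("Point") produces an impl block, while malformed input produces a compile_error! invocation that the user sees as a normal compiler diagnostic rather than the generic panic message.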
Guide-level explanation for macro users:
By default, procedural macros you depend on are always built from source. Be aware that macros involve running arbitrary Rust code on your computer at compile time, so there is a large degree of trust in depending on a macro provided by somebody else. If you are sensitive to such issues, you may wish to dedicate extra scrutiny to auditing your proc macro dependencies each time they're added or upgraded.
Many procedural macros publish both source code and a sandboxable precompiled build to crates.io. Sandboxing mitigates the "arbitrary code" aspect of macros, making them easier to audit and trust. Sandboxed macros are limited to doing nothing more than inspecting tokens and producing tokens.
They also take 0 seconds to compile, since the compilation has already been done securely by crates.io.
To opt in to using these sandboxed macros in your Cargo builds when one is provided by the macro's author, run rustup component add proc-macro-sandbox. To stop using precompiled macros and go back to building all macro dependencies from source without isolation, run rustup component remove proc-macro-sandbox.
Reference-level explanation
Reference-level explanation for compiler:
A new Tier 2 target platform is introduced, wasm32-macro. Almost everything about it is similar to wasm32-unknown-unknown, with the following exceptions:
- Bin crates, including tests, cannot be built for this target.
- A build of libproc_macro is available for this target.
When one builds a --crate-type=proc-macro --emit=link --target=wasm32-macro, the output is a .wasm artifact. The API exported by the Wasm artifact is closely related to the API exported by the natively compiled .so of a procedural macro.
Instead of the current --extern serde_derive=path/to/libserde_derive.so, rustc can be passed --extern serde_derive=path/to/libserde_derive.wasm. If any Wasm macro is passed to a rustc invocation, it must also be passed -Zproc-macro-sandbox= containing the path to a suitable Wasm runtime, normally installed by the user as a rustup component. A particular proc-macro-sandbox is specific to a particular version of rustc, just like proc-macro-srv is.
Rather than dlopen-ing the macro to dynamically link it into the rustc process and communicate with it over serialization through the proc macro bridge, rustc will:
- Spawn a subprocess of the given proc-macro-sandbox executable (a single one reused throughout the duration of the rustc execution),
- Perform IPC to load it with the .wasm artifact and a set of Host Functions performing the function of the proc macro bridge (the subset of it corresponding to stable APIs),
- Expand macros by dispatching invocations to the subprocess.
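The RFC does not specify a wire format for this IPC. Purely to make the load-then-expand flow concrete, here is a sketch of one possible framing: a one-byte tag, a little-endian length, and an opaque payload. The Request enum, its variants, and both function names are assumptions made for this example, not part of the proposal.

```rust
use std::io::{self, Read, Write};

/// Hypothetical requests rustc might send to the sandbox subprocess.
#[derive(Debug, PartialEq)]
enum Request {
    /// Load a `.wasm` artifact, given as raw bytes.
    Load(Vec<u8>),
    /// Expand one macro invocation; the payload is a serialized token stream.
    Expand(Vec<u8>),
}

/// Writes one request as: tag byte, u32 little-endian length, payload.
fn write_request<W: Write>(out: &mut W, request: &Request) -> io::Result<()> {
    let (tag, payload) = match request {
        Request::Load(bytes) => (0u8, bytes),
        Request::Expand(bytes) => (1u8, bytes),
    };
    out.write_all(&[tag])?;
    out.write_all(&(payload.len() as u32).to_le_bytes())?;
    out.write_all(payload)
}

/// Reads one request in the framing produced by `write_request`.
fn read_request<R: Read>(input: &mut R) -> io::Result<Request> {
    let mut header = [0u8; 5];
    input.read_exact(&mut header)?;
    let len = u32::from_le_bytes([header[1], header[2], header[3], header[4]]) as usize;
    let mut payload = vec![0u8; len];
    input.read_exact(&mut payload)?;
    match header[0] {
        0 => Ok(Request::Load(payload)),
        1 => Ok(Request::Expand(payload)),
        other => Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("unknown request tag {}", other),
        )),
    }
}
```

In a real implementation the writer and reader would be the subprocess's stdin and stdout (or another channel), and the host functions standing in for the proc macro bridge would need their own messages. None of that is modeled here.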
The proc-macro-sandbox is developed alongside libproc_macro. It implements a high-performance Wasm runtime based on Wasmtime, although this remains an implementation detail.
Reference-level explanation for Cargo:
Cargo recognizes the [lib] sandboxed = true setting described in the guide-level explanation, and enforces the applicable Cargo-related restrictions.
As part of cargo publish on a sandboxed proc macro, Cargo first communicates with crates.io to query the current compiler version being used for server-side verification of precompiled artifacts. This will almost always be the most recent stable Rust, or briefly, the one before. If the rustc currently configured for Cargo to use is not that one, Cargo fails with an informative message.
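The toolchain check described above amounts to an exact version comparison. The sketch below assumes both versions arrive as plain major.minor.patch strings. How Cargo would actually query crates.io is not modeled, and the names Version, parse_version, and check_publish_toolchain are hypothetical.

```rust
/// A `major.minor.patch` compiler version.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Version {
    major: u32,
    minor: u32,
    patch: u32,
}

/// Parses a string such as "1.72.0"; returns None for anything else.
fn parse_version(text: &str) -> Option<Version> {
    let mut parts = text.trim().split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    let patch = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None;
    }
    Some(Version { major, minor, patch })
}

/// Succeeds only if the local rustc is exactly the version the verification
/// service reports; otherwise returns an informative message.
fn check_publish_toolchain(local: &str, verifier: &str) -> Result<(), String> {
    let local_version = parse_version(local)
        .ok_or_else(|| format!("could not parse local rustc version `{}`", local))?;
    let verifier_version = parse_version(verifier)
        .ok_or_else(|| format!("could not parse verifier rustc version `{}`", verifier))?;
    if local_version == verifier_version {
        Ok(())
    } else {
        Err(format!(
            "sandboxed proc macros must be published with rustc {}, but the configured rustc is {}",
            verifier, local
        ))
    }
}
```

An exact match is required because the server must reproduce the uploaded artifact byte for byte, which is only plausible when both sides use the same compiler.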
Building the precompiled artifact during publish, and during server-side verification, consists of an invocation similar to:
cargo +stable build \
--release \
--target wasm32-macro \
-Z unstable-options \
-Z build-std=std,panic_abort \
-Z build-std-features=panic_immediate_abort
Rustc will produce a .wasm output which Cargo must include in the packaged .crate archive, inside of a directory called target/wasm32-macro.
When building a proc macro as a dependency, Cargo determines whether the crate's author has opted in to sandboxing based on the Cargo.toml, and determines whether the local user has opted in to sandboxing by looking for a proc-macro-sandbox Rustup component for the current toolchain (the same way that Cargo currently knows to find the current toolchain's rustfmt for cargo fmt, clippy-driver for cargo clippy, etc).
If both opt-ins are present, Cargo passes -Zproc-macro-sandbox= to rustc, and --extern containing the .wasm artifact, rather than building the macro and its dependencies as .so from source.
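The decision Cargo makes here can be summarized as a small pure function. This is a sketch only: the type and field names are hypothetical, and the feature-set condition is taken from the guide-level restriction that users depending on any other feature set do not get the precompiled build.

```rust
/// Which form of the macro Cargo hands to rustc.
#[derive(Debug, PartialEq)]
enum MacroArtifact {
    /// Pass the packaged `.wasm` via --extern and enable the sandbox.
    PrecompiledWasm,
    /// Build the macro and its dependencies natively, as today.
    BuildFromSource,
}

/// Hypothetical summary of what Cargo knows about one proc macro dependency.
struct MacroDependency {
    /// The published manifest contains `sandboxed = true`.
    author_opted_in: bool,
    /// The features requested by this build equal the precompiled feature set.
    features_match_precompiled: bool,
}

/// Uses the precompiled artifact only when the author opted in, the user has
/// the sandbox component installed, and the feature sets line up.
fn select_artifact(dep: &MacroDependency, sandbox_component_installed: bool) -> MacroArtifact {
    if dep.author_opted_in && sandbox_component_installed && dep.features_match_precompiled {
        MacroArtifact::PrecompiledWasm
    } else {
        MacroArtifact::BuildFromSource
    }
}
```

Every other combination falls back to building from source, which is also what an old Cargo that ignores the embedded artifact would do.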
Reference-level explanation for crates.io:
Proc macros published to crates.io containing a precompiled artifact do not immediately become available to users. There need not be any indication that the version has been published, except maybe to the crate's logged-in owners.
Asynchronously, a crates.io-managed service that is conceptually similar to docs.rs will fetch these newly published pending proc macro releases, and build them using the latest stable Rust toolchain.
As described in restrictions #1 and #2 in the guide-level explanation, this will not involve arbitrary code execution on the server-side, as proc macro dependencies are not allowed and build scripts do not run. This greatly reduces the surface area for attacking this service. The only binaries running are a stable rustc and a stable Cargo.
If the server-side build does not reproduce a .wasm artifact with the same exact content as the uploaded one, crates.io is notified and the release appears in a permanent yanked-like status. It can be downloaded for forensic analysis using the usual download endpoint, but is impossible to pull into builds, and cannot be unyanked.
If server-side verification successfully reproduces the uploaded .wasm, crates.io is notified and the release becomes instantly available to users (and docs.rs) just as an ordinary upload would be.
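The release lifecycle described in this section can be sketched as a three-state status with a byte-for-byte comparison deciding the transition. The names below (ReleaseStatus, verify_release, and the two predicates) are hypothetical and do not reflect crates.io's actual data model.

```rust
/// Lifecycle of a release that carries a precompiled artifact.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ReleaseStatus {
    /// Uploaded, awaiting server-side verification; not visible to users.
    Pending,
    /// The server reproduced the artifact; behaves like an ordinary release.
    Available,
    /// The server-side build differed; permanent yanked-like status.
    Rejected,
}

/// Compares the uploaded artifact with the server's rebuild, byte for byte.
fn verify_release(uploaded_wasm: &[u8], rebuilt_wasm: &[u8]) -> ReleaseStatus {
    if uploaded_wasm == rebuilt_wasm {
        ReleaseStatus::Available
    } else {
        ReleaseStatus::Rejected
    }
}

/// Only verified releases may be pulled into builds.
fn usable_in_builds(status: ReleaseStatus) -> bool {
    status == ReleaseStatus::Available
}

/// Rejected releases stay downloadable for forensic analysis.
fn downloadable(status: ReleaseStatus) -> bool {
    matches!(status, ReleaseStatus::Available | ReleaseStatus::Rejected)
}
```

Whether a pending release is downloadable by anyone other than its owners is not stated in the text, so the sketch conservatively treats it as not downloadable.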
There is already some mechanism substantially similar to this in crates.io, because "Documentation" links only appear in the UI after the docs.rs build of that crate has succeeded. If a documentation build failed, crates.io does not show a Documentation link.
Reference-level explanation for rustup:
Rustup distributes proc-macro-sandbox as a new component. It is not part of the default profile.
Drawbacks
Though the user-facing surface area is small, this is undeniably a complex feature with involvement across multiple Rust subteams.
- The current, arbitrary unsandboxed native code execution of procedural macros is a good enough model for most Rust users, even those who never audit the source code of their dependencies.
- Macro compile times are not painfully bad for most people. (The Motivation section suggests why this may be a misleading impression and comes at significant expense to the ecosystem.)
- Non-human-readable package contents are anathema. (This sentiment appears to be elevated in the crates.io ecosystem and is not present to the same extent in other places, like Python's wheels.)
- "Someone else is always auditing the code and will save me from anything bad in a macro before it would ever run on my machines." (At one point serde_derive ran an untrusted binary for over 4 weeks across 12 releases before almost anyone became aware. This was plain-as-day code in the crate root; I am confident that professionally obfuscated malicious code would be undetected for years.)
- Procedural macros, including their transitive dependency graphs, are normally easy and pleasant to audit.
- Reproducible builds are hard, and will never work as envisioned, or will be onerous to maintain support for.
- High-profile crate publishers like dtolnay probably won't ever get their crates.io credentials hacked by state-sponsored actors, and even if he does, eventually crates.io will be able to do 2FA publishes, which will block any threat to anybody.
- If build scripts won't be sandboxed eventually, sandboxing proc macros is worthless. (Typical macros contain 100× more code than typical build scripts.)
- This will be hard to support in build systems other than Cargo. (They can literally do nothing and everything continues to work.)
Rationale and alternatives
Alternative: black-box vs white-box sandbox model
I think of what's proposed above as the black-box plan. A lot of mechanics are handled under the hood by Cargo and rustc. The interface surface area to users (both macro authors and consumers) is tiny.
I considered a white-box alternative. The only thing Rustup provides is a generic Wasm runtime with no specificity to macros. The whole thing is a single API to hand it your own Wasm blob and your own set of Host Functions. The only thing Cargo provides to crates is a cfg for whether this Wasm runtime is installed. Literally nothing else from rustc or crates.io or cargo.
Crates are responsible for producing their own Wasm artifact by whatever means, using the existing wasm32-unknown-unknown target most likely, adding conditional compilation in their procedural macro crate root to shuffle wasm into the provided runtime, defining their own host fns that plug everything together with proc_macro's API.
I think this loses a range of benefits to security and compile performance, but would be comparatively trivial to ship.
Alternative: pre- vs post-publish verification
In the RFC, I propose that the macro author runs cargo publish, and it returns successfully without the published release appearing on crates.io right away. Server-side verification runs asynchronously, just like docs.rs today.
I know little about crates.io internals, but the following alternative might be easier to implement, depending on how those internals work:
The author runs cargo publish and it fails because the uploaded .crate has not already been verified server-side. The author runs some other command first. The server-side build occurs, and crates.io commits a checksum of the successfully verified .crate into a database. Only after this point may the author run cargo publish. The same .crate is produced again locally (or was cached) and this time the publish can succeed.
Alternative: no server-side verification
What's the point? It's sandboxed! Trust the ecosystem to audit, as they do with sources.
Is server-side verification prohibitively expensive in CPU time (unlikely compared to docs.rs which processes vastly more crates and traffic) or maintenance cost (a lot more likely)?
Alternative: should any of the restrictions be lifted now or later?
How valuable is it really to deny build scripts and proc macro dependencies during server-side verification? Is it trivial to implement verification in a hardened way? Docs.rs and crater do arbitrary remote code execution, but their blast radius may be small compared to smuggling through a malicious wasm.
Alternative: Cargo.toml syntax
This RFC proposes:
[lib]
proc-macro = true
sandboxed = true # <---
A more compact spelling would be:
[lib]
proc-macro = "sandboxed"
with the downside of being instantly incompatible with any pre-existing stable Cargo.
Prior art
Compiler MCP: "Build-time execution sandboxing"
Watt. A 4-year-old working proof-of-concept that compiles procedural macros to WebAssembly and executes in either a from-source slow interpreter or an optimized Wasmtime runtime, complete enough to pass serde_derive's test suite. Unfortunately we never got expansion performance good enough to ship officially in serde_derive for the following reasons:
- Watt is not coupled with compiler version, so there end up being redundant IPCs: first between the Wasm code and the proc macro dylib, then between the dylib and rustc over proc macro bridge. This RFC proposes developing proc-macro-sandbox in concert with libproc_macro which enables Wasm to communicate directly with rustc over host fns, eliminating the function of the proc macro bridge that a native proc macro would use. I am confident that expansion performance will be on par with natively compiled debug-mode macros, or better.
- The need for a build script to detect the presence of a suitable Wasmtime runtime on every macro call, and the need to compile all the macro's dependencies unconditionally (syn) before Wasmtime detection applies, both eat away significantly from the benefit.
- The developer experience cannot be nearly as seamless as proposed by this RFC. Watt involved juggling a [patch] of the proc-macro2 crate to swap out its libproc_macro-based implementation with Watt's stable Wasm-based one.
Unresolved questions
- The optional feature situation. This RFC proposes precompiling exactly one feature combination, based on the feature flags passed to cargo publish; if someone's build uses the macro with some other choice of features enabled, they'll get a from-source build. Is this good enough? It is good enough for serde_derive.
Future possibilities
Future possibility: delete the whole thing!
If things go wrong, the Wasm spec becomes a disaster over time, stakeholders go bankrupt, Wasmtime goes unmaintained and no suitable replacement runtime emerges, ... what does Rust do?
This RFC has been designed to make this easy and inconsequential to deprecate and delete. cargo publish would inform you to remove sandboxed = true from your manifest, citing an explanatory blog post. Rustup would no longer ship the proc-macro-sandbox component and all macros would seamlessly resume building from source.
Future possibility: opt-in rejection of un-sandboxed macros
In the future, some users may want assurance that all macros in their dependency graph must be sandboxed.
Maybe there'd be a setting in ~/.cargo/config.toml that tells Cargo to refuse to build any unsandboxed macro (with exemption for local path dependencies).
Maybe we'd need a configurable allowlist of vetted unsandboxed macros, with all others being rejected.