Pre-RFC: Sandboxed, deterministic, reproducible, efficient Wasm compilation of proc macros

  • Feature Name: proc_macro_sandbox
  • Start Date: 2023-08-20


Summary

Procedural macro crates can opt in to precompilation to WebAssembly as part of the publish process. The crate compiles locally (as already done by cargo publish) and is then securely verified by a service, conceptually similar to docs.rs, which enforces that the sources in the crate exactly reproduce the Wasm artifact before the new release becomes available to any package registry users. Users can opt in to running the procedural macros they depend on in a Wasm sandbox by installing a suitable Wasm runtime as a Rustup component.


Motivation

Auditability: Rust's only currently supported approach to procedural macros exposes anyone who uses them to a variety of unsavory possibilities: macros are arbitrary Rust code that can spawn processes and make network requests. So can build scripts, but the implementation of a typical procedural macro is 2 orders of magnitude larger than a typical build script. We reduce the audit burden by sandboxing macros such that their only possible interaction with the outside world is inspecting tokens and producing tokens. An estimated 99% of macros are amenable to sandboxing; unsandboxed macros remain expressible exactly as before, but would invite greater scrutiny during audit, analogous to unsafe code.

Determinism: Inadvertent proc macro nondeterminism is a source of pain in build systems. For example, Buck likes to race a local recompilation against a download from a distributed cache, taking whichever finishes first. When artifacts from the same source code diverge, builds can fail or take longer than needed. In a recent instance, rustc nondeterminism was found to be the largest contributor to slow builds due to poor cache utilization in a codebase at Meta using Remote Execution and a distributed build cache. These bugs can be as simple as a proc macro using HashMap with its randomized iteration order, and they are impossible in a Wasm sandbox because there is no way to access OS randomness from within it. Hash-based containers fall back to pseudo-randomness with a hardcoded seed.
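For illustration, plain Rust can already demonstrate the fixed-seed behavior: std's DefaultHasher is documented to hash identically across every instance created via new()/default(), unlike the randomly seeded RandomState. A sandboxed libstd could wire HashMap's default to something like this (the DeterministicMap alias is mine, purely illustrative):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

// A HashMap whose hasher is seeded identically every time. DefaultHasher
// instances created through new()/default() use fixed keys, so iteration
// order is reproducible across processes running the same std version.
type DeterministicMap<K, V> = HashMap<K, V, BuildHasherDefault<DefaultHasher>>;

fn field_order(fields: &[&str]) -> Vec<String> {
    let mut map: DeterministicMap<String, usize> = DeterministicMap::default();
    for (i, f) in fields.iter().enumerate() {
        map.insert(f.to_string(), i);
    }
    // With RandomState this order could differ between two rustc
    // invocations, producing divergent expansions; here it is stable.
    map.keys().cloned().collect()
}

fn main() {
    let a = field_order(&["id", "name", "email"]);
    let b = field_order(&["id", "name", "email"]);
    assert_eq!(a, b); // identical order on every invocation
    println!("{:?}", a);
}
```

The same inputs always yield the same iteration order, which is exactly the property a distributed build cache needs.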

Isolation: For monorepo-scale build systems to work on billion-line codebases, a perfect understanding of the inputs of every build step is critically important. A rustc invocation must not end up reading some arbitrary file from the local filesystem in a manner that is not tracked by the build system, for then it will not know the crate needs to be rebuilt when that file changes. Workarounds exist, but they are opt-in by the macro rather than something that can be enforced codebase-wide through sandboxing.

0-second compile time: Today, procedural macros are widely understood to be "bad for compile time". Citing work by @nnethercote and @lqd who have jointly been investigating and improving Rust compile time for years: "These crates incur a lot of compilation costs, being among the most popular crates and quite slow to compile. Also, they block compilation of proc macro crates that are themselves often slow to compile. This leads to long and slow dependency chains". Users opting in to sandboxing the macros they depend on (which, again, 99% of macros would be compatible with) would almost never need to compile syn or quote or proc-macro2, which commonly appear on the critical path before useful parts of an application can begin to compile. 10-15 seconds (depending on syn features) is not an enormous benefit in CI, but is enough to be frustrating in interactive situations such as opening a project in an IDE.

Macro ecosystem rejuvenation: In response to the previous motivation, compile times for crates like syn and serde_derive are begrudgingly tolerable to most people today, but this has come at the cost of brutal concessions to functionality over many years. A precompiled codepath for those sensitive to build time would open the door to more complete and powerful macro libraries, for example greater freedom to focus on fantastic diagnostics without scrutinizing whether a user-facing diagnostics improvement would be worth the build time. Syn-based diagnostics today are far from where they would be if build time were a less pressing concern.

Even faster incremental builds: While "0-second compile time" applies to the macro dependency itself, which is relevant to clean builds and otherwise cached by Cargo, two other opportunities exist for this feature to benefit incremental compile times and interactive (IDE) latency as well. First, precompiled macros are optimized builds, rather than the unoptimized native builds users get today. Even with a 50% overhead from a high-performance Wasm runtime compared to native code, complex macros like serde_derive will still expand faster than they do today. Second, architectural advancements that are prohibitively expensive in an unoptimized macro become viable, such as the "ultimate non-generic DeserializeSeed" concept for massively reducing monomorphization cost in Serde-generated code.

User choice: Design choices throughout the rest of this document are motivated by an understanding that many users and environments are uninterested in a precompiled artifact for legal or technological reasons, and satisfied with unisolated natively compiled macros. Both macro author and macro user must have opted-in for the sandboxed artifact to kick in; if not, the user will build the macro from source just like today. Furthermore, macro crates containing a precompiled implementation are 100% compatible with old versions of Cargo, with the embedded Wasm artifact simply being ignored and the library built from source.

Guide-level explanation

Guide-level explanation for macro authors:

The vast majority of procedural macros do nothing more than inspect input tokens and emit output tokens. No filesystem access, no subprocesses, no network I/O. For such macros, sandboxing their execution in a WebAssembly-based runtime offers a number of benefits (auditability, determinism, and 0-second compile time for users).

A crate author can precompile their procedural macro to an efficient, reproducible, sandboxable Wasm artifact as part of publishing it to crates.io by introducing the following field in Cargo.toml:

[package]
name = "serde_derive"
version = "..."

[lib]
proc-macro = true
sandboxed = true    # <---

Now cargo publish, which ordinarily would locally verify a build for the host platform before uploading, will instead (or additionally?) build for a target platform called wasm32-macro. The resulting Wasm artifact is uploaded to crates.io in the same .crate file as the crate's source code.

There will be a delay before your new release becomes publicly available on crates.io. The server will repeat the build, confirming that it produces precisely the same Wasm binary that your computer did. So, no use trying any funny business.

Restrictions apply to macros that use sandboxed = true:

  1. Your macro must not have any (enabled) transitive dependency which is a procedural macro.

  2. While your macro may have a build script or depend on crates that have a build script, those build scripts will only run if a user of your macro is rebuilding it from source. They will not run when you or crates.io builds your macro for the wasm32-macro target, nor when a user's build picks up the precompiled implementation.

  3. Your macro must build using the latest stable release of Rust. This is the version that you must publish with, and the version that crates.io will verify your build with.

  4. As a consequence of the previous, your macro must not use any unstable APIs within the proc_macro crate.

  5. cargo publish's --no-verify setting is incompatible with precompilation.

  6. While usage of sandbox-incompatible standard library APIs such as std::fs or std::thread will not cause your macro to fail to build, such APIs unconditionally return an error or panic during macro expansion. All you get to do in a precompiled macro is inspect tokens and produce tokens/diagnostics.

  7. Only one single precompiled build is produced. As part of publish, you control what features of your crate are enabled in that build, using publish's --features and --all-features and --no-default-features flags. Users who depend on any other feature set will not get the precompiled build. (In the future, we may allow specifying a collection of feature sets to precompile for.)

  8. Unwinding does not happen. To keep code size in check for precompiled artifacts, builds are equivalent to -Zbuild-std=panic_abort -Zbuild-std-features=panic_immediate_abort. Proc macros are never supposed to panic in the first place. If you observe panics, a generic error message will be shown to the user, instructing them that they can temporarily disable the proc macro sandbox and rebuild from source to get the message and line number at which the macro panicked, to report a bug.

Guide-level explanation for macro users:

By default, procedural macros you depend on are always built from source. Be aware that macros involve running arbitrary Rust code on your computer at compile time, so there is a large degree of trust in depending on a macro provided by somebody else. If you are sensitive to such issues, you may wish to dedicate extra scrutiny to auditing your proc macro dependencies each time they're added or upgraded.

Many procedural macros publish both source code and a sandboxable precompiled build to crates.io. Sandboxing mitigates the "arbitrary code" aspect of macros, making them easier to audit and trust. Sandboxed macros are limited to doing nothing more than inspecting tokens and producing tokens.

They also take 0 seconds to compile, since the compilation has already been done securely by crates.io.

To opt-in to using these sandboxed macros in your Cargo builds when one is provided by the macro's author, run rustup component add proc-macro-sandbox. To stop using precompiled macros and go back to building all macro dependencies from source without isolation, run rustup component remove proc-macro-sandbox.

Reference-level explanation

Reference-level explanation for compiler:

A new Tier 2 target platform is introduced, wasm32-macro. Almost everything about it is similar to wasm32-unknown-unknown, with the following exceptions:

  • Bin crates, including tests, cannot be built for this target.

  • A build of libproc_macro is available for this target.

When one builds a --crate-type=proc-macro --emit=link --target=wasm32-macro, the output is a .wasm artifact. The API exported by the Wasm artifact is closely related to the API exported by the natively compiled .so of a procedural macro.

Instead of the current --extern serde_derive=path/to/libserde_derive.so, rustc can be passed --extern serde_derive=path/to/libserde_derive.wasm. If any Wasm macro is passed to a rustc invocation, it must also be passed -Zproc-macro-sandbox= containing the path to a suitable Wasm runtime, normally installed by the user as a rustup component. A particular proc-macro-sandbox is specific to a particular version of rustc, just like proc-macro-srv is.
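Putting the flags above together, a sandbox-enabled invocation might look like the following sketch. The two flags are the ones proposed by this RFC; the runtime path is made up for illustration and does not reflect any real rustup layout:

```sh
# Hypothetical invocation; the proc-macro-sandbox path is illustrative only.
rustc --crate-type=lib src/lib.rs \
    --extern serde_derive=path/to/libserde_derive.wasm \
    -Zproc-macro-sandbox=/path/to/toolchain/libexec/proc-macro-sandbox
```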

Rather than dlopen-ing the macro to dynamically link it into the rustc process and communicate with it over serialization through the proc macro bridge, rustc will:

  1. Spawn a subprocess of the given proc-macro-sandbox executable (a single one reused throughout the duration of the rustc execution),

  2. Perform IPC to load it with the .wasm artifact and a set of Host Functions performing the function of the proc macro bridge (the subset of it corresponding to stable APIs),

  3. Expand macros by dispatching invocations to the subprocess.
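The three steps above can be sketched with an in-process stand-in for the sandbox subprocess. Everything here (the Request/Response message shapes, the fake expansion) is illustrative only; a real implementation would serialize such messages over pipes to the proc-macro-sandbox process:

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical wire messages; the real bridge would serialize these over
// stdin/stdout of the spawned proc-macro-sandbox subprocess.
enum Request {
    LoadModule { wasm: Vec<u8> },
    Expand { macro_name: String, input_tokens: String },
    Shutdown,
}

enum Response {
    Loaded,
    Expanded { output_tokens: String },
}

// Stand-in for the sandbox runtime: one long-lived worker reused for
// every expansion during a single rustc execution (step 1).
fn spawn_sandbox() -> (mpsc::Sender<Request>, mpsc::Receiver<Response>) {
    let (req_tx, req_rx) = mpsc::channel::<Request>();
    let (resp_tx, resp_rx) = mpsc::channel::<Response>();
    thread::spawn(move || {
        for req in req_rx {
            match req {
                Request::LoadModule { wasm: _ } => {
                    // Step 2: instantiate the .wasm and register host fns.
                    resp_tx.send(Response::Loaded).unwrap();
                }
                Request::Expand { macro_name, input_tokens } => {
                    // Step 3: run the macro inside the sandbox. Here we
                    // fake an expansion by echoing a derived impl header.
                    let out = format!("impl {} for ({})", macro_name, input_tokens);
                    resp_tx.send(Response::Expanded { output_tokens: out }).unwrap();
                }
                Request::Shutdown => break,
            }
        }
    });
    (req_tx, resp_rx)
}

fn main() {
    let (tx, rx) = spawn_sandbox();
    tx.send(Request::LoadModule { wasm: vec![] }).unwrap();
    assert!(matches!(rx.recv().unwrap(), Response::Loaded));
    tx.send(Request::Expand {
        macro_name: "Serialize".into(),
        input_tokens: "struct Point".into(),
    }).unwrap();
    if let Response::Expanded { output_tokens } = rx.recv().unwrap() {
        println!("{output_tokens}"); // prints "impl Serialize for (struct Point)"
    }
    tx.send(Request::Shutdown).unwrap();
}
```

The key property mirrored here is that one worker is loaded once and then serves every expansion request, so per-invocation overhead is a message round-trip rather than a process spawn.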

The proc-macro-sandbox is developed alongside libproc_macro. It implements a high-performance Wasm runtime based on Wasmtime, although this remains an implementation detail.

Reference-level explanation for Cargo:

Cargo recognizes the [lib] sandboxed = true setting described in the guide-level explanation, and enforces the applicable Cargo-related restrictions.

As part of cargo publish on a sandboxed proc macro, Cargo first communicates with crates.io to query the current compiler version being used for server-side verification of precompiled artifacts. This will almost always be the most recent stable Rust or, briefly, the one before. If the rustc currently configured for Cargo to use is not that one, Cargo fails with an informative message.

Building the precompiled artifact during publish, and during server-side verification, consists of an invocation similar to:

cargo +stable build \
    --release \
    --target wasm32-macro \
    -Z unstable-options \
    -Z build-std=std,panic_abort \
    -Z build-std-features=panic_immediate_abort

Rustc will produce a .wasm output which Cargo must include in the packaged .crate archive, inside of a directory called target/wasm32-macro.

When building a proc macro as a dependency, Cargo determines whether the crate's author has opted-in to sandboxing based on the Cargo.toml, and determines whether the local user has opted in to sandboxing by looking for a proc-macro-sandbox Rustup component for the current toolchain (the same way that Cargo currently knows to find the current toolchain's rustfmt for cargo fmt, clippy-driver for cargo clippy, etc).
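The user-side opt-in check can be sketched as follows. This is a minimal stand-in; the bin/proc-macro-sandbox location within the toolchain directory is my assumption, not a specified layout:

```rust
use std::path::{Path, PathBuf};

// Sketch of Cargo's user-side opt-in check: the user has opted in if and
// only if the active toolchain ships a proc-macro-sandbox executable.
// The bin/ location inside the toolchain directory is an assumption.
fn sandbox_runtime(toolchain_dir: &Path) -> Option<PathBuf> {
    let candidate = toolchain_dir.join("bin").join("proc-macro-sandbox");
    if candidate.is_file() { Some(candidate) } else { None }
}

fn main() {
    // Without the component installed, Cargo simply falls back to
    // building the macro and its dependencies from source.
    assert!(sandbox_runtime(Path::new("/nonexistent-toolchain")).is_none());
    println!("component not installed: building macros from source");
}
```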

If both opt-ins are present, Cargo passes -Zproc-macro-sandbox= to rustc, and --extern containing the .wasm artifact, rather than building the macro and its dependencies as .so from source.

Reference-level explanation for crates.io:

Proc macros published to crates.io containing a precompiled artifact do not immediately become available to users. There need not be any indication that the version has been published, except maybe to the crate's logged-in owners.

Asynchronously, a service that is conceptually similar to docs.rs will fetch these newly published pending proc macro releases, and build them using the latest stable Rust toolchain.

As described in restrictions #1 and #2 in the guide-level explanation, this will not involve arbitrary code execution on the server-side, as proc macro dependencies are not allowed and build scripts do not run. This greatly reduces the surface area for attacking this service. The only binaries running are a stable rustc and a stable Cargo.

If the server-side build does not reproduce a .wasm artifact with exactly the same content as the uploaded one, crates.io is notified and the release is placed in a permanent yanked-like status. It can be downloaded for forensic analysis using the usual download endpoint, but is impossible to pull into builds, and cannot be unyanked.

If server-side verification successfully reproduces the uploaded .wasm, crates.io is notified and the release instantly becomes available to users (and docs.rs), just as an ordinary upload would.
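The publish/yank decision described above amounts to a byte-for-byte comparison of the two artifacts. A minimal sketch, where the Verdict type is mine rather than real crates.io code:

```rust
// Sketch of the server-side decision rule: a release is published only if
// the rebuilt artifact is byte-for-byte identical to the uploaded one.
#[derive(Debug, PartialEq)]
enum Verdict {
    Published,          // identical bytes: release becomes available
    YankedForForensics, // mismatch: permanently yanked, download-only
}

fn verify(uploaded: &[u8], rebuilt: &[u8]) -> Verdict {
    if uploaded == rebuilt {
        Verdict::Published
    } else {
        Verdict::YankedForForensics
    }
}

fn main() {
    let uploaded = b"\0asm...".to_vec();
    assert_eq!(verify(&uploaded, &uploaded), Verdict::Published);
    assert_eq!(verify(&uploaded, b"tampered"), Verdict::YankedForForensics);
    println!("verification decision is a pure byte comparison");
}
```

Because the rule is a pure byte comparison, any nondeterminism anywhere in the toolchain (not just in the macro) would surface as a failed verification, which is why reproducible .wasm builds are a prerequisite.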

There is already a mechanism substantially similar to this in the crates.io/docs.rs integration, because "Documentation" links only appear in the crates.io UI after the docs.rs build of that crate has succeeded. If a documentation build failed, crates.io does not show a Documentation link.

Reference-level explanation for rustup:

Rustup distributes proc-macro-sandbox as a new component. It is not part of the default profile.


Drawbacks

Though the user-facing surface area is small, this is undeniably a complex feature with involvement across multiple Rust subteams.

  • The current, arbitrary unsandboxed native code execution of procedural macros is a good enough model for most Rust users, even those who never audit the source code of their dependencies.

  • Macro compile times are not painfully bad for most people. (The Motivation section suggests why this may be a misleading impression and comes at significant expense to the ecosystem.)

  • Non-human-readable package contents are anathema. (This sentiment appears to be elevated in the ecosystem and is not present to the same extent in other places, like Python's wheels.)

  • "Someone else is always auditing the code and will save me from anything bad in a macro before it would ever run on my machines." (At one point serde_derive ran an untrusted binary for over 4 weeks across 12 releases before almost anyone became aware. This was plain-as-day code in the crate root; I am confident that professionally obfuscated malicious code would be undetected for years.)

  • Procedural macros, including their transitive dependency graphs, are normally easy and pleasant to audit.

  • Reproducible builds are hard, and will never work as envisioned, or will be onerous to maintain support for.

  • High-profile crate publishers like dtolnay probably won't ever get their credentials hacked by state-sponsored actors, and even if he does, crates.io eventually will be able to do 2FA publishes, which will block any threat to anybody.

  • If build scripts won't be sandboxed eventually, sandboxing proc macros is worthless. (Typical macros contain 100× more code than typical build scripts.)

  • This will be hard to support in build systems other than Cargo. (They can literally do nothing and everything continues to work.)

Rationale and alternatives

Alternative: black-box vs white-box sandbox model

I think of what's proposed above as the black-box plan. A lot of mechanics are handled under the hood by Cargo and rustc. The interface surface area to users (both macro authors and consumers) is tiny.

I considered a white-box alternative. The only thing Rustup provides is a generic Wasm runtime with no specificity to macros. The whole thing is a single API to hand it your own Wasm blob and your own set of Host Functions. The only thing Cargo provides to crates is a cfg for whether this Wasm runtime is installed. Literally nothing else from rustc or crates.io or cargo.

Crates are responsible for producing their own Wasm artifact by whatever means, most likely using the existing wasm32-unknown-unknown target, adding conditional compilation in their procedural macro crate root to load the Wasm blob into the provided runtime, and defining their own host fns that plug everything together with proc_macro's API.

I think this loses a range of benefits to security and compile performance, but would be comparatively trivial to ship.

Alternative: pre- vs post-publish verification

In the RFC, I propose that the macro author runs cargo publish, and it returns successfully without the published release appearing on crates.io right away. Server-side verification runs asynchronously, just like today.

I know little about crates.io internals, but the following alternative might be easier to implement, depending:

The author runs cargo publish and it fails because the uploaded .crate has not already been verified server-side. The author runs some other command first. The server-side build occurs, and commits a checksum of the successfully verified .crate into a database. Only after this point may the author run cargo publish. The same .crate is produced again locally (or was cached) and this time the publish can succeed.

Alternative: no server-side verification

What's the point? It's sandboxed! Trust the ecosystem to audit, as they do with sources.

Is server-side verification prohibitively expensive in CPU time (unlikely, compared to docs.rs, which processes vastly more crates and traffic) or in maintenance cost (a lot more likely)?

Alternative: should any of the restrictions be lifted now or later?

How valuable is it really to deny build scripts and proc macro dependencies during server-side verification? Is it trivial to implement verification in a hardened way? docs.rs and crater do arbitrary remote code execution, but their blast radius may be small compared to smuggling through a malicious wasm.

Alternative: Cargo.toml syntax

This RFC proposes:

proc-macro = true
sandboxed = true  # <---

A more compact spelling would be:

proc-macro = "sandboxed"

with the downside of being instantly incompatible with any pre-existing stable Cargo.

Prior art

Compiler MCP: "Build-time execution sandboxing"

Watt: a 4-year-old working proof of concept that compiles procedural macros to WebAssembly and executes them in either a slow from-source interpreter or an optimized Wasmtime-based runtime, complete enough to pass serde_derive's test suite. Unfortunately we never got expansion performance good enough to ship it officially in serde_derive, for the following reasons:

  • Watt is not coupled with compiler version, so there end up being redundant IPCs: first between the Wasm code and the proc macro dylib, then between the dylib and rustc over proc macro bridge. This RFC proposes developing proc-macro-sandbox in concert with libproc_macro which enables Wasm to communicate directly with rustc over host fns, eliminating the function of the proc macro bridge that a native proc macro would use. I am confident that expansion performance will be on par with natively compiled debug-mode macros, or better.

  • The need for a build script to detect the presence of a suitable Wasmtime runtime on every macro call, and the need to compile all of the macro's dependencies (syn) unconditionally before Wasmtime detection applies, both eat away significantly at the benefit.

  • The developer experience cannot be nearly as seamless as proposed by this RFC. Watt involved juggling a [patch] of the proc-macro2 crate to swap out its libproc_macro-based implementation with Watt's stable Wasm-based one.

Unresolved questions

  • The optional feature situation. This RFC proposes precompiling exactly one feature combination, based on the feature flags passed to cargo publish; if someone's build uses the macro with some other choice of features enabled, they'll get a from-source build. Is this good enough? It is good enough for serde_derive.

Future possibilities

Future possibility: delete the whole thing!

If things go wrong, the Wasm spec becomes a disaster over time, stakeholders go bankrupt, Wasmtime goes unmaintained and no suitable replacement runtime emerges, ... what does Rust do?

This RFC has been designed to make this easy and inconsequential to deprecate and delete. cargo publish would inform you to remove sandboxed = true from your manifest, citing an explanatory blog post. Rustup would no longer ship the proc-macro-sandbox component and all macros would seamlessly resume building from source.

Future possibility: opt-in rejection of un-sandboxed macros

In the future, some users may want assurance that all macros in their dependency graph must be sandboxed.

Maybe there'd be a setting in ~/.cargo/config.toml that tells Cargo to refuse to build any unsandboxed macro (with exemption for local path dependencies).

Maybe we'd need a configurable allowlist of vetted unsandboxed macros, with all others being rejected.
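As a sketch, such a configuration might look like the following. Every key name here is invented for illustration; none of this is proposed syntax:

```toml
# Hypothetical ~/.cargo/config.toml keys; nothing here exists today.
[proc-macro-sandbox]
require = true                   # refuse to build any unsandboxed macro
exempt-path-dependencies = true  # local path dependencies stay buildable
allow = ["some-vetted-macro"]    # allowlist of unsandboxed exceptions
```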



I still remember how we spent weeks debugging mysterious crashes in rust-analyzer, which were heroically tracked down by edwin0cheng (Edwin Cheng) to be the result of a non-deterministic hash-map-iterating proc macro breaking salsa's assumptions. So, very much in favor!

Two specific comments:

Only one single precompiled build is produced. As part of publish, you control what features of your crate are enabled in that build, using publish's --features and --all-features and --no-default-features flags.

I think a more logical solution is to say that --all-features is used, period. This fits with Cargo's "additive features" model. We could further specify that this --all-features WebAssembly build can be used regardless of the set of actual features specified by the downstream user. I don't know if that's workable and if that's a good idea, but it certainly fits with the originally intended semantics of features.

One reason why it might not be workable is that the sandboxed and non-sandboxed builds could be different, but that's already the case, because a proc macro can use cfg() and just behave differently in wasm.

Sandboxing mitigates the "arbitrary code" aspect of macros, making them easier to audit and trust.

I think this might want to be a bit more nuanced. Even sandboxed proc macros can inject arbitrary code into the user's project. This is different from "normal" sandboxing security. So, auditing proc macros is still required. What we guarantee is that the WASM blob is built by the official compiler from the source code uploaded to crates.io, so it's enough to validate only the source code. And, frankly, the ecosystem could use a bit more auditing, as the recent large-scale study very valuably demonstrates :wink:


Also, I really love how this serves as a forcing function to have bit-for-bit reproducible builds for .wasm.


Is this to avoid issues with regard to nightly features, similar to how vendor prefixes on the web were unintentionally and de facto standardized in the past?

Am I correct in interpreting that this means they will get the macro built from source? How is a user to know which feature set it is precompiled for?

Also — why can't cfg! be evaluated at runtime (by, say, deferring to a function), which would still allow behavior to change as necessary? That behavior change could include emitting compile_error!, which would effectively be the same as gating the macro.

For time-macros, I use two feature gates (controlled entirely by the re-exporting crate). One of these simply enables/disables a macro, while the other controls whether certain inputs are valid or not.

I'd vastly prefer there be a way to control this at the project or even dependency-level, rather than globally. I have a ton of Rust projects on my computer, but would not want this enabled for all of them.

Is it necessary to guarantee anything here? I see no immediate reason proc macros couldn't be expanded in parallel.

This is really unfortunate. Why can't the release be rejected outright? This would effectively pollute the version numbering (admittedly that is a small issue).

Respectfully, it's almost certainly best to leave out any mentions of serde_derive. It's going to draw a lot of (further) negative attention for little gain.

More generally, I find the layout of the drawbacks section very confusing. What purpose do the parentheticals serve? A rebuttal? It's not clear to me — a native English speaker.

It's worth noting that the sandboxing doesn't necessarily have to be through a specific runtime, or even wasm at all.


"Someone else is always auditing the code and will save me from anything bad in a macro before it would ever run on my machines." (At one point serde_derive ran an untrusted binary for over 4 weeks across 12 releases before almost anyone became aware. This was plain-as-day code in the crate root; I am confident that professionally obfuscated malicious code would be undetected for years.)

The first comment on this was over 3 weeks ago. People had indeed noticed. The onslaught of remarks followed your rejection of their pleas to reconsider, which was regarded as insulting by some. I detect a hint of mockery in this comment, which is unwarranted as it is based fundamentally on a lie. You should not misinterpret community outrage with your refusal as synonymous with people "noticing".


This is extremely misleading IMO. Multiple issues were opened weeks ago about your usage of binaries. It was solely the community-at-large that was unaware. If there were reason to believe the binaries directly malicious, I personally believe the community-at-large would've been made aware much sooner.

According to pinkforest, you also did trip cackle, a tool for automatically checking if crates exceed their claimed scope.

If crates.io wants to precompile macros in a reproducible environment, and offer them as an opt-in feature, I'd support it despite my complete and continued objections to what happened with serde_derive on a professional level and the maintainer's actions on a personal level.

My sole notable objection to this RFC/pre-RFC has nothing to do with its content, yet rather the process of introducing what's widely considered a security issue into the ecosystem to then further justify changes to the toolchain, holding the security concerns over the ecosystem (RFC commentators, project members, implementers) in the process. It's effectively impossible to fairly review this on its merit now, nor to say it isn't being reviewed on an accelerated time span than it would otherwise have been.

For actual RFC feedback, I'd like to object to the permanent yanked-like status for releases that fail verification.

I believe this should be an error, not a yanked publication. There are several ways non-reproducible builds can be triggered. We shouldn't waste version numbers finding out a reproducible build isn't working when there's no benefit to the release existing in a yanked status, unless there's some intricacy to the backend I'm unaware of; I'll admit inexperience with it. jhpratt seems to have raised the same comment.

Then as a question, I'd like to ask how you plan to achieve reproducible wasm builds. From my understanding, depending on the platform built from, different wasm outputs will be created. Is there a proposed mechanism other than always building from x86_64 (requiring CPU emulation, and not just containerization)?

I'd, personally, insist crates.io does verification (not that my personal insistence means anything) in order to ensure publishers don't each set up their own build processes, each needing its own replication. I'd also like to note the value in rebuilding an uploaded artifact (not just performing the build) to ensure reproducible builds are possible (without multiple server-side runs).

As for watt, I do not believe it resolved reproducible builds from different host architectures.


Both macro author and macro user must have opted-in for the sandboxed artifact to kick in

The motivation for macro users to have to opt-in is well-explained and clear, but after reading the existing proposal I don't fully understand why macro authors would choose not to (perhaps security reasons for exceptionally sensitive crates?). Obviously not all proc macros can be sandboxed, but this seems like something that can be automatically tried. Swapping to an opt-out mechanism for authors (not users!) seems like a sensible choice to me.


What's the benefit of having the crate author provide a wasm file if crates.io will have to rebuild it anyways? Why not just publish the binary that crates.io generates? crates.io has the entire reverse dependency graph for crates, including the features they set for their direct dependencies. Providing precompilation only for the top N crate/feature-sets would keep the compute cost bounded while maximizing the effect of "reduction in compile time of 'Can't Believe It's Not Std!' crates", and crate authors wouldn't have to do anything special (beyond meeting the restrictions in the reference-level explanation for the compiler).


In order for crates.io to practically ensure builds are reproducible, it'd have to do multiple runs or use a run provided by another user (the publisher). While that other user could cheat, they can't cheat with security impact, only with respect to reproducibility (by submitting an artifact that fails to match when an honest build would have matched). The number of users who'd modify cargo to do this is negligible.


Multiple compilations (from different environments) merely double the cost of this check; they don't materially change the cost of having to build at all. If almost every proc-macro crate would opt in automatically, that would most likely be an even bigger set than the top-N one crates.io would select.


Verified identical builds as part of crates.io's duties sound possibly a bit difficult, but I would really like to see sandboxed proc macros even if they aren't precompiled and no performance benefits result.

It would reduce “opening a Rust project in an IDE can execute malware” to “opening a Rust project can execute malware if there's an exploitable bug in the compiler or if it includes a build script”. (Sandboxed build scripts are also desirable, but a much harder problem. Small steps forward.)


You can't check that an item is reproducible if you only produce it once. You need multiple compilations (at least 2). This proposal has crates.io perform one and the publisher perform one.


Building with a sandboxed, wasm32-wasi or wasm64-wasi stable release of rustc & Cargo & friends would likely be ideal.


I haven't understood the story of Rust versions from this RFC. Is the binary compatible with later Rust versions? Does this imply some sort of stable (Wasm) ABI for proc macros? Or will older crates eventually (silently?) fall back to from-source compilation? Surely crates.io won't go ahead and re-compile all existing macros for every version. I feel like this should be explained explicitly in the RFC.

Also, I don't like the failure conditions around using "the latest stable Rust version" around the time of a new Rust release. Rust releases are frequent; we don't want to break people's publishes for a few days every 6 weeks because either they or crates.io haven't updated yet (and the other one has), or because their crate was waiting in the queue when an update landed. Also, what if the latest stable Rust release has some regression that wasn't noticed? Now it may suddenly be impossible to properly publish your proc macro crate until that's fixed (unless you work around the regression). So: how about supporting the latest 2 stable versions instead?


This is very neat and I’ll let people with better perspective on the necessary issues than me speak to it…except for one editorial note:

There are “panics that terminate the app” and “panics that unwind for a bit and then get caught”. I’ve never heard that the latter aren’t supposed to be used in proc-macros. I think it’s fine to say “it’s a limitation of the sandbox that there’s no unwinding”, but, well, I don’t think we know who’s using catch_unwind in a macro, and I don’t think I would categorically say it’s wrong to do so.


I think the root of the problem is that compiling proc macros affects compile time and is not safe.

The wasm sandbox can solve compile time and security issues, but the source code is not visible.

If Rust could be used as a sandboxed interpreter, with proc macros run as scripts inside that sandbox, it might solve all the above problems perfectly, but the cost is that the implementation workload would be huge.

For many proc macros, panicking is the way that errors are reported.

Please also make the extern "C" ABI match the official C ABI for wasm, as on wasm32-unknown-emscripten and wasm32-wasi, rather than the weird ABI of wasm32-unknown-unknown.

Any reason to not embed the wasm runtime in rustc itself?

What about all the targets without wasmtime support?

Could the wasm file contain the exact rustc version it was built with as well as the Cargo.lock to allow reproduction by third parties?

proc_macro2 requires a build script, right? Wouldn't that make it impossible to use pretty much any proc macro with this?


From what I understand reading both reference-level sections (for authors and for crates.io), I think what's being suggested is that publishing a sandboxed proc macro stages it for verification by the async service, and only when it gets reproduced does the new version get committed and published (using version-control concepts). On failure, does an artifact download link then become available to only the author, for debugging purposes?

Or is the intent here that if reproducibility failed, it publishes regardless, perhaps with a "Reproducibility failed" marker?

I think the former is a reasonable approach: it avoids version pollution and allows crates.io to limit how long these debugging artifacts can exist (and thus keeps costs down). It might be better to mention that publish is really a staging command in this case.

The latter interpretation I feel has a couple unresolved issues:

  • Failed-reproducibility artifacts have less value over time. If we assume the ideal scenario, that the majority of users will opt in to reproducible versions only and that non-reproduction is a bug, then these artifacts are likely only useful for the author to assist in resolving failures. I agree with the sentiment that in this failure mode it's better to reject the package altogether.
  • If there's a bug in the service where it begins to reject certain (or all) builds, then we're going to see a widespread release of versions where reproducibility fails. This adds noise to any potential metrics and is likely to cause confusion or alarm among less informed users and the community. A failed publish would instead only notify the author, leading to a much smaller publicity blast radius.

Why not panic on build? I expect panic-on-build to be both better for the iterative development experience and to remove ambiguity about whether a function is permitted or not.

As a maybe interesting idea, would it be possible to add feature flags to core, or split core into even smaller pieces? It sounds like the best solution here would be to have a subset of the core API, e.g. a proc-macro-core.

Might be out of scope, but what is the messaging going to be for this feature? Once implemented, there's going to be tension between promoting sandboxed proc macros as strictly better than non-sandboxed macros while also conveying that there might be legitimate reasons not to sandbox. In other words, is this RFC also asserting the stance that there is no legitimate reason not to sandbox?


You mean splitting std? Splitting is impossible in many cases due to coherence preventing existing trait impls from being moved. I don't think anything in libstd other than the default allocator makes sense to use in sandboxed proc macros. File reading should be done through a new proc-macro-specific API, to allow rustc to declare to cargo that it needs to rebuild when the file changes. Intercepting std::fs wouldn't work when doing a non-sandboxed build.


The above proposal requires opt-in from both parties: the author and the user. Why do we need the opt-in from the author? As an analogy, I can attempt to compile a crate for a certain platform without the author opting in. Can we apply the same concept here and make it entirely a user decision whether the macro is pre-compiled?

The pre-compilation could be done by a remote service that is invoked by crates.io. Until that remote service commits back the WASM artifact, users will only download the source and compile it locally.

The remote service would compile the crate (at least twice, to ensure reproducibility) and notify crates.io of the result. From then on, crates.io can embed or link to the WASM artifact.

If it is the author's explicit intention to make the macro WASM compatible, they should have a CI job that verifies that, similar to compatibility with any other target.

This would reduce the surface area of this feature further. It might also be easier to experiment with because it could be built as a custom registry and a nightly flag in cargo to use the new WASM blobs.


Here's my two cents.

Rustup distributes precompiled stdlib

Why can't rustup also distribute a precompiled syn (for example)? The infrastructure to do so is already in place.