Pre-RFC: procmacros implemented in wasm

I'm primarily concerned with build determinism; any security benefits are secondary, but I think worth mentioning.

Well, that's a whole other concern - how can one tell whether a crate is (and remains) trustworthy?

1 Like

I would not want that either. Proc-macros emitting things in Hash(Map|Set) order is a source of non-determinism I'd want to squash. (I'm not concerned about hash collision attacks in proc-macros.)
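
To make that concrete, here's a hedged sketch (the helper and its signature are hypothetical, not from any real macro): iterating a `HashMap` while generating code ties the output to randomized hash state, while an ordered container keeps identical inputs producing identical tokens.

```rust
use std::collections::{BTreeMap, HashMap};

// Hypothetical codegen helper inside a proc macro: emit one accessor per
// field. HashMap iteration order depends on randomized hash state, so the
// emitted code can differ from one compiler invocation to the next.
fn emit_accessors_nondeterministic(fields: &HashMap<String, String>) -> String {
    fields
        .iter()
        .map(|(name, ty)| format!("pub fn {name}(&self) -> {ty} {{ self.{name} }}\n"))
        .collect()
}

// Deterministic variant: BTreeMap iterates in key order, so identical
// inputs always produce an identical token stream.
fn emit_accessors_deterministic(fields: &BTreeMap<String, String>) -> String {
    fields
        .iter()
        .map(|(name, ty)| format!("pub fn {name}(&self) -> {ty} {{ self.{name} }}\n"))
        .collect()
}
```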

Similarly, I wouldn't want any access to any kind of clock.

Access to things like compiler version, platform, cfg flags, etc wouldn't affect determinism, but they'd undermine being able to generate a single proc-macro wasm and distribute it via crates.io, so I'd probably want to disallow those as well.

3 Likes

Oh, yes, then indeed it's a completely different question.

If you're cross-compiling to WASM or CloudABI, you can have that, can't you? In which case build scripts and proc macros might actually be the biggest hole.

It would be nice, though, to be able to audit code by opening it in your normal IDE (which wants to run proc macros to hook up reliable code navigation), to be able to build the documentation (which needs to run the proc macros), etc.

3 Likes

I don't think it is security theater to exclude classes of attacks. Going from 3 wide open gates the attacker can go through to 1 is still helpful. It means more eyes can focus on that gate. And it helps for situations, as mentioned by others above, where the "still open" gates don't come into play -- like when you need to only run macro expansion but not the resulting binary.

Rust's type system guarantees can be easily circumvented in safe code by launching gdb with std::process, or by accessing /proc/self/mem. Should we thus stop fixing other soundness holes? No, we should not. By restricting attackers' options, we make it easier to find them.
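
That /proc/self/mem escape hatch is real; here's a minimal sketch of it (Linux-only, run as a debug build so the optimizer doesn't cache `x`, and with no `unsafe` anywhere):

```rust
// Safe Rust defeating memory safety through the kernel's side door:
// the type system cannot see writes made via /proc/self/mem.
use std::fs::OpenOptions;
use std::io::{Seek, SeekFrom, Write};

fn main() -> std::io::Result<()> {
    let x: u8 = 1;
    let addr = &x as *const u8 as u64;
    let mut mem = OpenOptions::new().write(true).open("/proc/self/mem")?;
    mem.seek(SeekFrom::Start(addr))?;
    mem.write_all(&[42])?; // mutates the immutable `x` behind the borrow checker's back
    println!("x = {}", x); // prints 42 in a debug build on Linux
    Ok(())
}
```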

So, to me it does indeed look like you are advocating "imperfect security is security theater", aka "perfect security or none at all".

Miri's isolated mode (currently the default, eventually probably not default any more but behind a flag) is 100% reproducible, including things like ptr-int-casts. No true randomness, no clock, nothing.

10 Likes

In this case I think a better way to view this is Raymond Chen's "airtight hatchway" analogy:

The Windows security team gets reports such as "if I run as Administrator, and modify my system, I can pwn myself" or "if I drop a malicious DLL on my system I can make it execute it in a very elaborate way", and he patiently explains that these aren't holes to be fixed, and trying to address them isn't a reduction in the attack surface — the attacker is already on the other side of the "airtight hatchway".

4 Likes

Your analogy is flawed.

  • The point of Administrator is to do any arbitrary thing.
  • The point of running a program you compiled is to do some arbitrary things.
  • The point of running a proc-macro is to compile a program.

A proc-macro isn't meant to do arbitrary things, so it shouldn't be allowed to do arbitrary things. Only things necessary to compile a program.

4 Likes

The whole point is that the complicated steps in between, and their stated purpose, ultimately doesn't matter. It all simplifies to "execution of arbitrary code leads to execution of arbitrary code". You can make proc macros jump through hoops, but you can't stop them from producing malware.

2 Likes

That is where we disagree. There aren't three wide-open gates: it is one and the same hole – foreign code that someone trusts and executes – and a "partial" "mitigation" doesn't close it. The syntax by which that code is defined and invoked is completely irrelevant.

No, but if you don't understand my point, please do not twist my words. I'm done arguing about this because it's getting pointless.

I think we can agree on that at least.

3 Likes

A sandboxed macro could infect the target binary, but it won't corrupt the machine where the compiler runs, which can be different from the one where the resulting binary is being executed. The sandbox thus closes that one door.

Where limited file I/O for macros is important, an explicit permission system could be devised for those crates (defaulting to no permissions).
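
As a sketch of the shape such a system might take (these manifest keys are purely hypothetical, not an existing or proposed Cargo feature):

```toml
# Hypothetical manifest keys, shown only to illustrate a deny-by-default
# permission system for sandboxed macros.
[package.metadata.macro-permissions]
read-files = ["migrations/", "schema.json"]  # read-only, listed paths only
network = false                              # no sockets
clock = false                                # no time sources
```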

8 Likes

I would kind of hate to see anything fancy; I'd rather just have the compiler open the files and have the wasm inherit the already-opened file descriptors, without the ability to open any new ones.

This would limit proc-macros to a priori known files, and subject them to the maximum file descriptor limits, but it avoids any attempt at a fancy mechanism to abstract over the various host security models.

1 Like

This is the kind of thing we're designing the WASI API for, using capability-based security. If stdin/stdout/stderr are enough, WASI can do it today -- applications can read and write on those streams, but they can't open anything else unless you give them additional capabilities to base an open on.

WASI also has a mechanism to inherit additional file descriptors. At the moment it only supports directory file descriptors; extending it to handle plain-file file descriptors is an obvious next step.
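
As a small guest-side illustration of the capability model (a sketch, compiled with `--target wasm32-wasi` and run under a WASI runtime; the path is arbitrary):

```rust
// Compiled to wasm32-wasi. Host paths are invisible unless the host
// explicitly preopens a directory and grants it to the module, so this
// open fails with a capability/permission error by default.
fn main() {
    match std::fs::read_to_string("/etc/hostname") {
        Ok(s) => println!("host granted a capability: {s}"),
        Err(e) => println!("denied by the capability model: {e}"),
    }
}
```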

9 Likes

One use case to consider is a public online binary cache, the dream I mentioned earlier in the thread – where crates.io (or another site) would automatically build all crates uploaded to it.

That use case obviously needs the ability to safely build untrusted code. If the crate is malware, sure, the cache will serve malware to anyone requesting a binary for that crate – but the malware won't infect the builder itself.

But it's already theoretically possible to achieve that using OS-level sandboxing and/or virtual machines. The question is whether a sandboxing guarantee provided by rustc itself would be a meaningful improvement.

At a minimum, build.rs would have to be sandboxed as well. Sandboxing just procedural macros without build.rs seems pretty useless from a security perspective. I guess it could help a hypothetical IDE that wants to automatically run procedural macros when the user opens a project, but doesn't want to automatically run build.rs... but that's kind of an iffy design, considering that build.rs can provide flags that are required to build the code.

On the other hand, there are legitimate use cases for accessing random external stuff from build.rs, so the sandboxing couldn't apply to all crates; it would have to be some kind of opt-out. Among other things, any crate that builds C code would probably have to opt out.

Even with both procedural macros and build.rs sandboxed, you'd have to worry about random things like #[link_args] being used to escape the sandbox. Who knows how many little features like that are lurking within rustc; it was never designed as a security boundary. And then there's the possibility of exploiting memory safety vulnerabilities in rustc itself, which contains a fair bit of unsafe code. All in all, it just doesn't seem trustworthy enough – again, from a security perspective.

But that's not the only perspective. Determinism is a real benefit, and itself an essential feature to make binary caching viable...

5 Likes

I actually think an IDE should work roughly that way. Specifically, the setup I imagine looks like this:

The IDE delegates to the build system the task of running build scripts and compiling proc macros. The IDE doesn't automatically run build.rs, but it might show a notification: "I've noticed that build.rs was modified, do you want to rebuild?". However, the IDE runs proc macros on the user's code continuously, and preferably in-process.

So the IDE doesn't need to sandbox build.rs, because it runs in a separate process. That's enough isolation for an IDE, because it's really not about security; it's about not letting a random segfault or loop {} take down the IDE process.

3 Likes

While this is nice from a security perspective, all mainstream WASM implementations (i.e. the browser-based ones) are currently terribly slow: easily up to 50% slower for real-world applications, in my experience. This makes a WASM-based solution for running proc macros rather undesirable from a performance perspective.

Of course I expect performance to improve in the future, but at this moment, if a wasm-based solution were adopted, I'd like to see it be opt-in, rather than opt-out or, worse, the only way to run proc macros.

That depends entirely on what the proc macro actually does :wink: As a trivial example, a proc macro could calculate (and include in the generated binary) all prime numbers below some threshold (which could be given as an argument). Setting aside the algorithm's asymptotic behavior, a 50% perf loss is quite significant.
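
Here's a sketch of that trivial example as a function-like macro (the name, input parsing, and output shape are all illustrative):

```rust
// In a proc-macro crate: `primes_below!(N)` expands, at macro-expansion
// time, to an array literal of every prime below N (sieve of Eratosthenes).
use proc_macro::TokenStream;

#[proc_macro]
pub fn primes_below(input: TokenStream) -> TokenStream {
    let limit: usize = input
        .to_string()
        .trim()
        .parse()
        .expect("primes_below! expects a single integer literal");
    let mut is_prime = vec![true; limit.max(2)];
    is_prime[0] = false;
    is_prime[1] = false;
    for i in 2..limit {
        if is_prime[i] {
            let mut multiple = i * i;
            while multiple < limit {
                is_prime[multiple] = false;
                multiple += i;
            }
        }
    }
    let primes: Vec<String> = (2..limit)
        .filter(|&i| is_prime[i])
        .map(|i| i.to_string())
        .collect();
    // e.g. primes_below!(10) expands to the tokens `[2, 3, 5, 7]`.
    format!("[{}]", primes.join(", ")).parse().unwrap()
}
```

A caller would then write something like `static PRIMES: &[u32] = &primes_below!(100);`, paying the sieve's cost once at expansion time rather than at runtime.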

Chrome's WASM implementation has thread support on the desktop.

While this may be true, there is an offsetting factor to be considered, especially for precompiled wasm.

Most compiles (that is, debug builds) will be running the proc macros in a debug build. That by itself can be very painful in some cases, such as phf (though when I used it, that was before it was a proc macro). A precompiled wasm would be built in release mode, and the resulting speedup would offset the VM cost of running it in wasm and shuttling data between the two.
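
For what it's worth, Cargo can already approximate that release-mode speedup for natively compiled macros today: profile build-overrides apply to build scripts and proc macros.

```toml
# In Cargo.toml: optimize proc macros and build scripts (and their
# dependencies) even when the rest of the build uses the dev profile.
[profile.dev.build-override]
opt-level = 3
```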

Aren't proc macros required to be in their own crates? Doesn't that mean that once a particular proc-macro crate is compiled locally (to native), it is cached locally anyway, with no need to rebuild it repeatedly (unless it changes)?

I've been following this thread, and the only advantage I can see for wasm proc macros is that an IDE could run them in-process, isolated from segfaults etc., and keep better control of what happens with the proc macro. Realistically, there doesn't seem to be any REAL security benefit of wasm proc macros over native code, and there would be a performance hit. The notion of "pre-compiled" proc macros sounds ludicrous from a security and effectiveness standpoint. This really does sound like a case of "Golden Hammer Syndrome" (i.e. because we have WASM, we should use it for everything, no matter how contorted the arguments for doing so).

Even the argument that the IDE would have better control and could run in-process proc macros with better isolation than native proc macros sounds a little contrived to me. Also, if IDEs found WASM an effective vehicle for proc macros, they could easily compile WASM versions of those crates as needed; there is no need for pre-built versions that would justify the additional indirection and security ramifications.

That's just how all this reads to me. Just another point of view so-to-speak.

6 Likes

Note that the IDE can't "just" compile wasm proc macros itself; rustc's proc-macro RPC server is "required" for the proc macro API, and it would have to grow support for running wasm proc macros before the IDE could use them. At that point, it might make sense to offer them outside the IDE as well.

And while running a proc-macro wasm binary downloaded from a hopefully trusted source is somewhat of a security nightmare, it's marginally better than doing so with native object code, as has been suggested in the past. I for one would love the ability to add dependencies to my proc macro with impunity, without worrying about the impact on downstream clean build times.

And yes, they are required to be separate crates and thus not recompiled between incremental builds. But clean build time matters as well.

Overall, I think wasm proc-macro capabilities are an incremental and marginal improvement, but should probably be both opt-in and an extra component download so shipping rustc doesn't include shipping a wasi runtime as well (yet).

3 Likes