Yes, it would be linked into rustc.
I didn't read everything written so far, but I wanted to mention one thing:
libproc_macro uses custom RPC over byte buffers right now, and there are two implementations of an ExecutionStrategy (IIRC) trait, the "separate thread per invocation" one being relevant here, I feel.
I suspect only the actual function pointer call parts of it (see bridge::Client) would need changing (so that it can call into/out of the wasm VM, or even miri); everything else is just byte buffers containing serialized data and handles (which are just integers).
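To make the "byte buffers plus integer handles" idea concrete, here is a minimal sketch. It is not the actual libproc_macro bridge: HandleStore, METHOD_LEN, encode_request, decode_request and dispatch are all hypothetical names invented for illustration. It only shows that when the sole thing crossing the boundary is a byte slice in and a byte vector out, the transport (a function pointer, a wasm VM call, miri) should be swappable.

```rust
use std::collections::HashMap;

/// Hypothetical server-side table: the client only ever sees the integer handle.
struct HandleStore<T> {
    next: u32,
    values: HashMap<u32, T>,
}

impl<T> HandleStore<T> {
    fn new() -> Self {
        HandleStore { next: 1, values: HashMap::new() }
    }

    /// Store a value and hand back an opaque integer handle for it.
    fn alloc(&mut self, value: T) -> u32 {
        let handle = self.next;
        self.next += 1;
        self.values.insert(handle, value);
        handle
    }

    fn get(&self, handle: u32) -> Option<&T> {
        self.values.get(&handle)
    }
}

/// Hypothetical method tag: "return the length of the string behind this handle".
const METHOD_LEN: u8 = 0;

/// Request layout (an assumption): one tag byte, then the handle as 4 little-endian bytes.
fn encode_request(method: u8, handle: u32) -> Vec<u8> {
    let mut buf = vec![method];
    buf.extend_from_slice(&handle.to_le_bytes());
    buf
}

fn decode_request(buf: &[u8]) -> Option<(u8, u32)> {
    let method = *buf.first()?;
    let mut raw = [0u8; 4];
    raw.copy_from_slice(buf.get(1..5)?);
    Some((method, u32::from_le_bytes(raw)))
}

/// The whole boundary is `&[u8] -> Vec<u8>`; an empty response means "error" in this sketch.
fn dispatch(store: &HandleStore<String>, request: &[u8]) -> Vec<u8> {
    match decode_request(request) {
        Some((METHOD_LEN, handle)) => match store.get(handle) {
            Some(s) => (s.len() as u32).to_le_bytes().to_vec(),
            None => Vec::new(),
        },
        _ => Vec::new(),
    }
}
```

Nothing in dispatch depends on pointers or on the caller's memory layout, which is the property that would matter for running the client side inside a wasm VM.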
I think that has a lot more complexity. Prebuilt proc macros have a very narrow interface for a fixed architecture and are not tied to a specific compiler version. The only real variation you'd need to deal with is what feature flags to set.
Arbitrary targets have a lot more constraints and variations, which means the likelihood of any individual cached artifact being useful for any specific build is much lower. If you're talking about infrastructure where crates.io (or some other trusted builder) is doing the actual building then that's a lot of required capacity, and it might not have any advantage over building locally.
Is that serialized data ABI-independent? I.e., it doesn't assume that both sides are running on approximately the same architecture (e.g., same pointer size or endianness)?
It used to use LEB128, which is a target-agnostic varint format, but I think we switched to fixed-width little endian.
Even if it's not target-agnostic yet in some way, that's a bug and should be fixed.
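For reference, here is a small sketch of the two encodings mentioned above. The function names are made up for illustration and this is not the code libproc_macro uses. Both formats give the same bytes regardless of the host's pointer size or endianness, which is what makes them target-agnostic.

```rust
/// Unsigned LEB128: 7 payload bits per byte, high bit set on every byte except the last.
fn encode_leb128(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Decode an unsigned LEB128 value; returns the value and the number of bytes consumed.
/// Returns None on truncated input or when the encoding is too long for a u64.
fn decode_leb128(input: &[u8]) -> Option<(u64, usize)> {
    let mut result = 0u64;
    for (i, &byte) in input.iter().enumerate() {
        let shift = 7 * i as u32;
        if shift >= 64 {
            return None;
        }
        result |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some((result, i + 1));
        }
    }
    None
}

/// Fixed-width little endian: always 4 bytes, least significant byte first.
fn encode_fixed_le(value: u32, out: &mut Vec<u8>) {
    out.extend_from_slice(&value.to_le_bytes());
}

/// Decode 4 little-endian bytes from the front of `input`.
fn decode_fixed_le(input: &[u8]) -> Option<u32> {
    let mut raw = [0u8; 4];
    raw.copy_from_slice(input.get(..4)?);
    Some(u32::from_le_bytes(raw))
}
```

LEB128 spends fewer bytes on small values (300 takes two bytes instead of four) at the cost of a loop per integer; the fixed-width form is simpler to decode. What has to be avoided in either case is writing a usize in host byte order and host width.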
WASM does support threads. WASM multithreading was a problem for browsers because shared memory multithreading lets you create high-precision timers which let you mount Spectre attacks against code in the same address space. So, browsers decided to delay WASM multithreading until "process per site" is implemented. Chrome is shipping WASM multithreading now and Mozilla is working furiously on "Fission" so they can enable it too.
For proc-macros, WASM multithreading is probably fine. Being able to mount Spectre attacks against rustc is not a big issue AFAICT.
One WASM limitation that's often overlooked: it's 32-bit only right now. That could be annoying.
It has a lot more complexity, but it's still worth doing in the long term.
If the crate author is doing the building, even with wasm, you have to trust them not to upload a malicious binary. (As previously mentioned, even if wasm execution itself is sandboxed, a malicious syntax extension can inject code into the output binary.)
I think the risk of that is pretty much the same as the risk that people upload code to crates.io that contains something malicious, but it would be good to have the ability to rustfmt and view the code that gets generated from proc macros. Especially when you're expecting something simple.
Do we have any sort of estimate for how much this would increase the size of rustc by (both on-disk and in download times)? If a WASM interpreter is as large as some modern JITs, I fear this could be significant.
If the publisher uploads the WASM then reproducible builds along with something like cargo-crev could be used to crowd-source verification that the published source corresponds to the published binary.
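A rough sketch of what that check amounts to, assuming builds really are bit-for-bit reproducible. This is not cargo-crev's API; Verification, verify_rebuilds, is_trusted and the quorum parameter are hypothetical. Each reviewer rebuilds the WASM from the published source and the results are compared against the uploaded artifact.

```rust
/// Hypothetical tally of independent rebuilds compared against the published artifact.
#[derive(Debug, PartialEq)]
struct Verification {
    matching: usize,
    mismatching: usize,
}

/// Count how many independent rebuilds are byte-for-byte identical to the published binary.
fn verify_rebuilds(published: &[u8], rebuilds: &[Vec<u8>]) -> Verification {
    let matching = rebuilds
        .iter()
        .filter(|rebuild| rebuild.as_slice() == published)
        .count();
    Verification {
        matching,
        mismatching: rebuilds.len() - matching,
    }
}

/// Assumed policy: trust the artifact only if nobody disagrees and at least
/// `quorum` reviewers reproduced it.
fn is_trusted(result: &Verification, quorum: usize) -> bool {
    result.mismatching == 0 && result.matching >= quorum
}
```

In practice reviewers would more likely publish a cryptographic digest of their rebuild than the full bytes, but the comparison logic would be the same.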
Just because we'll never be able to plug all the holes perfectly, that's not a reason to stop trying to plug some of them.
We hope to expand what is possible in non-sandboxed mode though. File system access and starting processes should be possible eventually.
10x? I wish. It's more around 1000x currently. CTFE probably has less overhead because it does not do all the UB checks, but I still expect it to be more than 100x slower than native code.
(Referring to uploading the wrong code.) Strongly disagreed. Code is much easier to verify than binaries, so it makes a huge difference whether the attacker has to upload code or can upload binaries.
I already find the lack of syncing/comparing between GH and crates.io code (for those crates hosted on GH) rather disturbing...
Yes, but that's not my reasoning. Restricting proc-macros doesn't plug any holes, because if someone can't run their malicious code from a proc-macro, they will run it from regular, non-macro code.
Sounds like plugging a hole to me. Or making it smaller, if you want to view it that way.
This is called "principle of least privilege": code that doesn't have to have permission to read my SSH key, really shouldn't. So wherever we can make sure that code has minimal privilege (build scripts, proc macros), we should.
Your argument seems to be along the lines of "if we cannot get perfect security we deserve no security". I disagree strongly with that sentiment. I think it is worth restricting what the attacker can do, even if that "just" moves the attacker's focus elsewhere.
I like the gist of the plan, but I would be wary of putting compiled code on crates.io, for security reasons: it is far easier to get away with sneaking nasty code into binaries than into source code. If the artifacts are built by crates.io, it paints a target on your back (compromising the crates.io compiler would let the attacker corrupt whatever package they want).
In both cases, given how central crates.io is to the Rust ecosystem, it could lead to "trusting trust" issues.
Again, that is not at all what I'm saying, as I already stated above. I'm all for security and safety – if I weren't, I wouldn't be an enthusiastic Rust user. But I just don't see how restricting proc-macros helps the principle of least privilege at all, when there are equally easy (or in the case of non-proc-macro crates, even easier) ways of injecting malice. It's not about not having "perfect security" – the problem is not that subtle at all, because even if you restrict proc-macros, the rest of the unrestricted code is a wide open hole, so to speak. I'm not trying to advocate for the lazy viewpoint of "every attempt at securing a system is completely pointless".
So now, this would have a point if there were a way of ensuring this kind of attack is prevented from non-proc-macro code as well. A general privilege or trust system would make a lot of sense, one that secured and sandboxed code no matter how you defined or generated it. Of course there could be small mistakes and security holes in such a system too, but that's not equivalent to the security theater solution of sandboxing one part of the code and declaring it safe, while leaving the rest trivially exploitable.
It reduces the attack surface. It means you don't have to audit the proc macro code for that very kind of hole.
I agree that uploading Wasm binaries wouldn't be ideal. What would be possible is to upload the source and have crates.io compile the Wasm file once, and make it available for download. That might require some restrictions on the kind of build system allowed, but should be fine.
Per the post you quoted, that would make the crates.io build system the target of attacks that could corrupt the whole ecosystem (edit: or strategically chosen crates, for maximal sneakiness).
If you also sandbox the tests (and solve the build script issues), you have secured CI, haven't you?
I would say a more useful and probably easier-to-implement improvement would be to add the ability to easily download and/or inspect source code directly on crates.io. Right now doing it is really inconvenient...