This kind of problem is as old as Makefiles, although LSP makes "building" more implicit.
This is well known to the Rust community. Over here it's "that's just how Rust works".
Procedural macros can execute arbitrary code at build time. Rust analyzer builds Rust projects it opens, and therefore executes arbitrary code.
There's a possibility of sandboxing the procedural macros via WebAssembly, but it's currently a prototype, and not part of Rust yet.
Rust also has
build.rs scripts which also execute arbitrary code at build time, and these can't be as easily sandboxed, because they may actually need to access the system to find and build C dependencies, generate code, etc.
I would caution against thinking about sandboxing X and Z. It seems that, if we want to do something meaningful here, we first should come up with an allow list of operations that is safe for untrusted code.
00:51:35|~/projects/cargo-pwned|HEAD⚡? λ cargo metadata warning: please specify `--format-version` flag explicitly to avoid compatibility problems error: `rustc -vV` didn't have a line for `host:`, got: PWNED 00:51:43|~/projects/cargo-pwned|HEAD⚡? λ rustc -v PWNED PWNED PWNED PWNED PWNED PWNED error: infinite recursion detected 00:51:46|~/projects/cargo-pwned|HEAD⚡? λ exa -T .cargo/ .cargo ├── bin │ └── rustc ├── config └── x 00:51:51|~/projects/cargo-pwned|HEAD⚡? λ bat -p rust-toolchain.toml [toolchain] path = "./.cargo/" 00:51:53|~/projects/cargo-pwned|HEAD⚡? λ bat -p .cargo/config [build] rustc = "./.cargo/x"
Any code that can affect your final binaries is a security risk. Even if you can make it so that no nefarious code can do harm to your machine, the final binaries are going to be distributed, no? You may even want to use them yourself!
To be clear, I'm not at all against doing more to protect developer machines even if it's not perfect. But we must also be aware of the limitations. At the end of the day you do have to trust the code you're using, whatever form it comes in.
I think the important distinction is between "I cloned this code to review it" and "I'm developing this code". The latter clearly has implicit trust, and that implicit trust cascades to its dependencies. But in the case you're auditing some unknown code -- the exact purpose is to (attempt to) detect bad actors like this.
cargo crev performs some mitigations to encourage IDEs to not automatically run code from a repository you're using the CLI to audit. If you want to set up a paranoia system, you could refuse to automatically run build scripts and proc macros defined by crates with a crev score below some threshold. This, of course, requires solving the social problem of building a web of trusted reviews.
But inherently, building code means that you trust that code's build process. You can make a good argument that opening an untrusted project in VSCode for the first time shouldn't be able to steal your secrets, but by the time you run
cargo check, you inherently trust the build system.
Why are you making a difference between "opening project in VSCode" and
cargo check (and
cargo build)? Of course by the time I do
cargo test or
cargo run, all is lost, but for the other operations I think it would be reasonable to expect that they cannot steal secrets from my system. Yes, figuring out the right sandboxing rules is not trivial, but that is not an excuse to not even try.
I agree that the choice of
cargo check as the point of trust is rather arbitrary. In a from-scratch, first-principles design, roughly
- IDE viewing conveys zero trust
checkconveys minimal trust; only that needed to generate APIs in dependencies, and implementation code for local workspace (dependencies are assumed to tyck cleanly)
buildconveys medium trust; it needs to be able to discover system libraries in order to link against them
runconveys full trust; you're unambiguously running arbitrary code at this point
But we don't have that, we have an existing design and backwards compatibility guarantees to uphold.
Mainly, I draw the line before
cargo check because there's already so much that we can (and would need to) do to make IDE intelligence work without requiring running unsandboxed arbitrary code practical. As such, it seems to make sense to focus on the clearly meaningful "I opened it in my IDE to look at it" vector before spending effort on the harder problem of retrofitting automatic sandboxing on top of
cargo check and
That said, serving the IDE use case almost necessarily means solving
check at the same time. You want the IDE to show you any and all warnings/errors that
check would generate, so if the IDE can do so without running untrusted code, so can
check by doing the equivalent process. I think the key insight here that we can abuse to make this somewhat practical is that
check can assume (registry) dependencies to (within a first order approximation) always compile cleanly, so some work can be skipped for less capable tools that don't require arbitrary code execution.
cargo build, though, I think is necessarily on the trusted side of the line. Cargo builds currently can and do rely on being able to perform arbitrary tasks in build scripts and proc macros, and backwards compatibility means we need to support doing so going forward. Even if in 20XX cargo builds are typically fully hermetic, all it takes is some edition 2015 crate in the dependency tree to bring in arbitrary code to the build process again.
That's not to say that making builds easier to trust and sandbox is a bad or completely losing initiative, though! Tools like cargo-deny could easily be used in 20XX to forbid any crates that use hermaticity breaking build features. I just think that focusing on getting the IDE use case working first (and by extension most if not all of
check) pays much stronger dividends.
That is my thinking, too.
Sure, but those arbitrary tasks don't need read permissions for my secrets and permission to access the internet (thereby having the ability to leak my secrets).
As a case in point, the
opam package manager for OCaml switched to sandboxed builds with its 2.0 release (based on
bubblewrap), and none of the packages I am following required any adjustment for this, including those that link against system libraries such as GTK.
opam is build-system agnostic, so packages perform an even larger variety of actions than we are used to from cargo. (It is even language-agnostic, e.g. the Coq ecosystem also uses
opam to manage Coq libraries.) Granted, OCaml is not Rust, but still -- I think there's useful sandboxing to be achieved here. (And arguably, with OCaml having done that move, Rust is now lacking behind the state-of-the-art in build isolation.) We might need opt-out for some particularly notorious edition 2015 crates but I expect that to be a very small minority (and thus I expect there to be some incentive to fix those crates to work with isolation, or an incentive to develop replacements that work with isolation).
In Racket, where we have both a long tradition of powerful macros, and an IDE that expands macros in the background, the behavior is to sandbox all automated expansion, preventing all file writes as well as network access, but not to sandbox the other aspects of running macros (the equivalent of
cargo check or any other explicit build steps).
"Implicit computation is sandboxed, explicit is not" has worked out well for us.
That's exactly the same restrictions that
opam imposes, so this seems like a reasonably established baseline.
(Some directories inside the build folder are writable though, similar to what
build.rs scripts would need to generate "OUT" files.)
Of course then the question arises, what is implicit, what is explicit?
run are clearly explicit, and we seem to agree that
cargo check is implicit, but we don't seem to have consensus on which side
cargo build sits on. (IMO it's implicit.)
EDIT: Re-reading your text, actually you seem to be saying that the equivalent of
cargo check is considered explicit? That is surprising to me.
Agreed – I think in this case, IDEs are also at least partly at fault. I don't like surprises, and IDEs are full of unpleasant surprises like this. Automatically executing code, even "only" build code, should be opt-in.
This post was flagged by the community and is temporarily hidden.
No sandbox can support all use cases. It just needs to cover enough of them so that if a user sees a pop-up prompting them to disable it, they will think twice before agreeing.
Right, the equivalent of
cargo check is explicit -- you're running a command (or clicking a button), which is explicit in a way that "opening a project in the IDE" isn't.
It's sort of like the rule in modern web browsers that some things can only happen in response to explicit user action on the page (like a click) instead of just by visiting the page.
I've written a number of proc-macros and benefit from using the great proc-macros written by the community. Adding friction to the common case of having a good code completion and code-hover experience is a bitter pill to swallow due to some bad actors.
Sandboxing the macros seems like a fairly painless step from the perspective of the IDE and
cargo check, since most of the macros I've encountered won't do anything that would run afoul of the sandbox. If a proc-macro does need those capabilities, could it be forced to declare that in the crate's
Cargo.toml, with a trust popup only shown in that case?
It seems like a good thing to do. Just another keyword like
unsafe. Also, how about sandboxing the entire compilation environment (unless explicitly wanted otherwise) which includes dependencies folders.
I think rust-analyzer works, in part, by calling
cargo check. So one way or another
cargo check needs to be sandboxed effectively.
oh... my bad
cargo check is easy enough though, using something like
> bwrap \ --ro-bind / / \ --tmpfs /tmp \ --dev /dev \ --tmpfs /run \ --ro-bind "$workspace_root" "$workspace_root" \ --bind "$target_dir" "$target_dir" \ --unshare-net \ cargo check
That just relies on
cargo metadata and
cargo fetch being safe to call in order to prepare everything that needs networking and disk-write-access before running the sandboxed