This kind of problem is as old as Makefiles, although LSP makes "building" more implicit.
This is well known to the Rust community. Over here it's "that's just how Rust works".
Procedural macros can execute arbitrary code at build time. rust-analyzer builds the Rust projects it opens, and therefore executes arbitrary code.
There's a possibility of sandboxing the procedural macros via WebAssembly, but it's currently a prototype, and not part of Rust yet.
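To make the attack surface concrete, here is a minimal sketch of a proc macro that runs arbitrary code the moment the compiler expands it (the crate setup is the standard `proc-macro = true` one; the macro name and the file it reads are made up for illustration):

```rust
// lib.rs of a crate with `proc-macro = true` in its Cargo.toml.
use proc_macro::TokenStream;

#[proc_macro]
pub fn innocent_looking(input: TokenStream) -> TokenStream {
    // This body runs inside the compiler process during expansion,
    // i.e. during `cargo check`, `cargo build`, or rust-analyzer's
    // background analysis. Nothing stops it from touching the system:
    let home = std::env::var("HOME").unwrap_or_default();
    let _secrets = std::fs::read_to_string(home + "/.ssh/id_rsa");
    // Pass the tokens through unchanged; callers see an ordinary macro.
    input
}
```

The WebAssembly prototype mentioned above works precisely because a macro's legitimate job is only to map tokens to tokens; the file and environment access is never part of it.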
Rust also has `build.rs` scripts, which likewise execute arbitrary code at build time, and these can't be as easily sandboxed, because they may actually need to access the system to find and build C dependencies, generate code, etc.
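For comparison, here is a sketch of a `build.rs` doing those legitimate things (code generation into `OUT_DIR`; building a C dependency via the commonly used `cc` crate, shown commented out to keep the sketch self-contained). Capability-wise it is indistinguishable from one that exfiltrates data, which is what makes sandboxing it hard:

```rust
// build.rs runs as a normal program on the host before compilation.
use std::{env, fs, path::PathBuf};

fn main() {
    // Legitimate use: generate code into the cargo-provided OUT_DIR.
    let out = PathBuf::from(env::var("OUT_DIR").unwrap());
    fs::write(out.join("generated.rs"), "pub const N: u32 = 42;\n").unwrap();

    // Legitimate use: probe the system and compile a C dependency.
    // cc::Build::new().file("src/shim.c").compile("shim");

    // But cargo grants this process no less than any other program,
    // so the same spot could read files or open network connections.
}
```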
I would caution against thinking about sandboxing X and Y. It seems that, if we want to do something meaningful here, we should first come up with an allow-list of operations that are safe for untrusted code.
```
00:51:35|~/projects/cargo-pwned|HEAD⚡?
λ cargo metadata
warning: please specify `--format-version` flag explicitly to avoid compatibility problems
error: `rustc -vV` didn't have a line for `host:`, got:
PWNED

00:51:43|~/projects/cargo-pwned|HEAD⚡?
λ rustc -v
PWNED
PWNED
PWNED
PWNED
PWNED
PWNED
error: infinite recursion detected

00:51:46|~/projects/cargo-pwned|HEAD⚡?
λ exa -T .cargo/
.cargo
├── bin
│ └── rustc
├── config
└── x

00:51:51|~/projects/cargo-pwned|HEAD⚡?
λ bat -p rust-toolchain.toml
[toolchain]
path = "./.cargo/"

00:51:53|~/projects/cargo-pwned|HEAD⚡?
λ bat -p .cargo/config
[build]
rustc = "./.cargo/x"
```
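What's happening above: the repository ships a `rust-toolchain.toml` that points rustup at the bundled `./.cargo/` directory, and a `.cargo/config` that redirects cargo's `rustc` to `./.cargo/x`. Cargo runs `rustc -vV` just to learn the host triple, so even the read-only-sounding `cargo metadata` already executes the attacker's binary (hence the missing `host:` line), and the rustup proxy resolving `rustc` through the bundled toolchain explains the "infinite recursion detected" loop. The actual `x` isn't shown in the transcript; a stand-in could be as small as:

```rust
// Hypothetical ./.cargo/x: the fake "compiler" cargo is told to run.
// It doesn't need to behave like rustc at all; by the time the caller
// notices the malformed output, the payload has already run.
fn main() {
    println!("PWNED");
    // ...arbitrary code here: read ~/.ssh, ~/.aws, env vars, etc.
}
```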
Any code that can affect your final binaries is a security risk. Even if you can make it so that no nefarious code can do harm to your machine, the final binaries are going to be distributed, no? You may even want to use them yourself!
To be clear, I'm not at all against doing more to protect developer machines even if it's not perfect. But we must also be aware of the limitations. At the end of the day you do have to trust the code you're using, whatever form it comes in.
I think the important distinction is between "I cloned this code to review it" and "I'm developing this code". The latter clearly involves implicit trust, and that implicit trust cascades to its dependencies. But when you're auditing some unknown code, the exact purpose is to (attempt to) detect bad actors like this.
I believe `cargo crev` performs some mitigations to encourage IDEs not to automatically run code from a repository you're using the CLI to audit. If you want to set up a paranoia system, you could refuse to automatically run build scripts and proc macros defined by crates with a crev score below some threshold. This, of course, requires solving the social problem of building a web of trusted reviews.
But inherently, building code means that you trust that code's build process. You can make a good argument that opening an untrusted project in VSCode for the first time shouldn't be able to steal your secrets, but by the time you run `cargo check`, you inherently trust the build system.
Why are you drawing a distinction between "opening a project in VSCode" and `cargo check` (and `cargo build`)? Of course by the time I do `cargo test` or `cargo run`, all is lost, but for the other operations I think it would be reasonable to expect that they cannot steal secrets from my system. Yes, figuring out the right sandboxing rules is not trivial, but that is not an excuse to not even try.
I agree that the choice of `cargo check` as the point of trust is rather arbitrary. In a from-scratch, first-principles design, roughly:

- IDE viewing conveys zero trust
- `check` conveys minimal trust; only that needed to generate APIs in dependencies, and implementation code for the local workspace (dependencies are assumed to typecheck cleanly)
- `build` conveys medium trust; it needs to be able to discover system libraries in order to link against them
- `test`/`run` conveys full trust; you're unambiguously running arbitrary code at this point

But we don't have that; we have an existing design and backwards-compatibility guarantees to uphold.
Mainly, I draw the line before `cargo check` because there's already so much that we can (and would need to) do to make IDE intelligence practical without running unsandboxed arbitrary code. As such, it seems to make sense to focus on the clearly meaningful "I opened it in my IDE to look at it" vector before spending effort on the harder problem of retrofitting automatic sandboxing on top of `cargo check` and `build`.
That said, serving the IDE use case almost necessarily means solving `check` at the same time. You want the IDE to show you any and all warnings/errors that `check` would generate, so if the IDE can do so without running untrusted code, so can `check`, by doing the equivalent process. I think the key insight here that we can abuse to make this somewhat practical is that `check` can assume (registry) dependencies to (within a first-order approximation) always compile cleanly, so some work can be skipped for less capable tools that don't require arbitrary code execution.
`cargo build`, though, I think is necessarily on the trusted side of the line. Cargo builds currently can and do rely on being able to perform arbitrary tasks in build scripts and proc macros, and backwards compatibility means we need to support doing so going forward. Even if in 20XX cargo builds are typically fully hermetic, all it takes is some edition 2015 crate in the dependency tree to bring arbitrary code back into the build process.
That's not to say that making builds easier to trust and sandbox is a bad or completely losing initiative, though! Tools like cargo-deny could easily be used in 20XX to forbid any crates that use hermeticity-breaking build features. I just think that focusing on getting the IDE use case working first (and by extension most if not all of `check`) pays much stronger dividends.
That is my thinking, too.
Sure, but those arbitrary tasks don't need read permissions for my secrets and permission to access the internet (thereby having the ability to leak my secrets).
As a case in point, the `opam` package manager for OCaml switched to sandboxed builds with its 2.0 release (based on `bubblewrap`), and none of the packages I am following required any adjustment for this, including those that link against system libraries such as GTK. `opam` is build-system-agnostic, so packages perform an even larger variety of actions than we are used to from cargo. (It is even language-agnostic, e.g. the Coq ecosystem also uses `opam` to manage Coq libraries.) Granted, OCaml is not Rust, but still -- I think there's useful sandboxing to be achieved here. (And arguably, with OCaml having made that move, Rust is now lagging behind the state of the art in build isolation.) We might need an opt-out for some particularly notorious edition 2015 crates, but I expect that to be a very small minority (and thus I expect there to be some incentive to fix those crates to work with isolation, or to develop replacements that do).
In Racket, where we have both a long tradition of powerful macros and an IDE that expands macros in the background, the behavior is to sandbox all automated expansion, preventing all file writes as well as network access, but not to sandbox the other aspects of running macros (the equivalent of `cargo check` or any other explicit build steps).
"Implicit computation is sandboxed, explicit is not" has worked out well for us.
Those are exactly the same restrictions that `opam` imposes, so this seems like a reasonably established baseline. (Some directories inside the build folder are writable, though, similar to what `build.rs` scripts would need to generate "OUT" files.)
Of course, then the question arises: what is implicit, and what is explicit? `cargo test`/`run` are clearly explicit, and we seem to agree that `cargo check` is implicit, but we don't seem to have consensus on which side `cargo build` sits on. (IMO it's implicit.)

EDIT: Re-reading your text, you actually seem to be saying that the equivalent of `cargo check` is considered explicit? That is surprising to me.
Agreed – I think in this case, IDEs are also at least partly at fault. I don't like surprises, and IDEs are full of unpleasant surprises like this. Automatically executing code, even "only" build code, should be opt-in.
No sandbox can support all use cases. It just needs to cover enough of them so that if a user sees a pop-up prompting them to disable it, they will think twice before agreeing.
Right, the equivalent of `cargo check` is explicit -- you're running a command (or clicking a button), which is explicit in a way that "opening a project in the IDE" isn't.
It's sort of like the rule in modern web browsers that some things can only happen in response to explicit user action on the page (like a click) instead of just by visiting the page.
I've written a number of proc macros and benefit from the great proc macros written by the community. Adding friction to the common case of a good code-completion and code-hover experience is a bitter pill to swallow because of a few bad actors.
Sandboxing the macros seems like a fairly painless step from the perspective of the IDE and `cargo check`, since most of the macros I've encountered won't do anything that would run afoul of the sandbox. If a proc macro does need those capabilities, could it be forced to declare that in the crate's `Cargo.toml`, with a trust popup only shown in that case?
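Something along these lines, perhaps; note that this syntax is entirely hypothetical, no such manifest keys exist in cargo today:

```toml
# Hypothetical Cargo.toml declaration for a proc-macro crate.
# Absent the table, the macro would get the empty capability set.
[proc-macro.capabilities]
filesystem = ["read:$CARGO_MANIFEST_DIR"]  # e.g. include-style macros
network = false
```

Tooling could then sandbox the silent majority of macros unconditionally and reserve the trust prompt for the declared exceptions.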
It seems like a good thing to do -- just another keyword like `unsafe`. Also, how about sandboxing the entire compilation environment (unless explicitly wanted otherwise), including the dependency folders?
I think rust-analyzer works, in part, by calling `cargo check`. So one way or another, `cargo check` needs to be sandboxed effectively.
oh... my bad
Externally sandboxing `cargo check` is easy enough, though, using something like:

```
bwrap \
    --ro-bind / / \
    --tmpfs /tmp \
    --dev /dev \
    --tmpfs /run \
    --ro-bind "$workspace_root" "$workspace_root" \
    --bind "$target_dir" "$target_dir" \
    --unshare-net \
    cargo check
```

That just relies on `cargo metadata` and `cargo fetch` being safe to call, in order to prepare everything that needs networking and disk write access before running the sandboxed `cargo check`.
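Concretely, the split might look like this (a sketch; `--offline` is a real cargo flag and makes anything that still tries to reach the network fail loudly inside the sandbox):

```
# Outside the sandbox: resolve and download dependencies.
# `cargo fetch` runs no build scripts or macros, only network/disk I/O.
cargo fetch

# Inside the sandbox: expand macros and run build scripts with no
# network and write access only to the target directory.
bwrap --ro-bind / / --tmpfs /tmp --dev /dev --tmpfs /run \
      --bind "$target_dir" "$target_dir" --unshare-net \
      cargo check --offline
```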
It would seem strange, though, for IDEs to support "sandboxed checking of code" while the CLI does not.
Likewise, one can use `firejail` to externally sandbox `vscode`, and that works very well for me, too. (So my `vscode` has network access but no access to my secrets or personal data.) But it'd be great to increase security-by-default.
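For reference, such a setup can be as simple as giving the editor a throwaway home directory (this assumes the `code` launcher and firejail's stock profiles; adjust paths to taste):

```
# VS Code keeps its network access, but ~/.ssh, ~/.aws, browser
# profiles, etc. are simply absent from the private home it sees.
mkdir -p "$HOME/vscode-home"
firejail --private="$HOME/vscode-home" code ~/projects/untrusted-repo
```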