Having a compile-time sandbox could limit the damage of a malicious takeover of a popular crate, like what happened with xz.
The compile-time sandbox would work by introducing per-crate capabilities.
By default, a crate would have the full set of capabilities and could opt out of any of them:
disable the corresponding interface in stdlib (for the unsafe capability, this disables the unsafe keyword)
disable the capability in any of its dependencies
Capabilities can be combined with features to request a capability conditionally:
[features]
a = ["cap:fs"]
Capabilities can also be applied to individual dependencies:
[dependencies]
a = { version = "1", capability = ["net"] }
The capability set for crate "a" would be unified during dependency resolution, i.e. if another crate specifies a = { version = "1", capability = ["fs"] } then a would end up with both the net and fs capabilities.
The capability constraints put on a would also apply to its normal dependencies.
Note that they do not apply to proc-macro and build-dependencies; those require a separate build-time sandbox.
Supported capabilities:
unsafe, capability to use the unsafe keyword
fs, capability to use any fs operation
fs-readonly, capability to use any read-only fs operation
I don't see how this can be reliably enforced. There are too many compiler bugs that allow unsafe things without writing the unsafe keyword (or anything else linted against by the unsafe_code lint, like #[no_mangle]). In addition, fs implies both env (through /proc/self/environ) and process (through writing e.g. ~/.bashrc), and through process every other capability. And net implies process (through talking to systemd over a unix domain socket).
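To make the fs-implies-env point concrete, here is a minimal Linux-only sketch that reads the environment using nothing but std::fs (the ~/.bashrc write works the same way):

    use std::fs;

    fn main() -> std::io::Result<()> {
        // With only the `fs` capability, the environment is still readable:
        // /proc/self/environ holds NUL-separated KEY=VALUE pairs on Linux.
        let raw = fs::read("/proc/self/environ")?;
        for entry in raw.split(|&b| b == 0).filter(|e| !e.is_empty()) {
            println!("{}", String::from_utf8_lossy(entry));
        }
        // Plain fs writes similarly reach e.g. ~/.bashrc, whose contents run
        // as arbitrary commands in the user's next shell session.
        Ok(())
    }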
Basically, the only reliable ways of sandboxing are OS-level sandboxing, which works at process granularity, or something like wasm, which is effectively process granularity too if you consider each wasm module to be a process. Both Java and C# gave up on sandboxing individual libraries.
Also note that if capabilities aren't transitively required (i.e. my crate needs the unsafe capability to use a crate with the unsafe capability), then they're borderline ineffectual (e.g. because I can just add an indirection crate to hide capabilities); but if they are transitively required, then they (at least unsafe) also become nearly ineffectual, since most crates want to use some core library crate that correctly uses unsafe for performance reasons.
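For concreteness, the indirection could be as small as this hypothetical wrapper crate, which needs the unsafe capability itself but exposes only a safe-looking function to its users:

    // Hypothetical crate `innocent-helper`: it requests the `unsafe`
    // capability, but hides the dangerous functionality behind an ordinary
    // safe function, so downstream crates never write `unsafe` themselves.
    pub fn read_byte_at(addr: usize) -> u8 {
        // Arbitrary memory read wrapped in a safe API.
        unsafe { *(addr as *const u8) }
    }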
For unsafe specifically, when you want to limit the extent of the damage, what you want is cfg(ub_checks), which adds O(1) checks to _unchecked operations where that's reasonable, and which is being worked on.
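Roughly, the pattern is that an _unchecked operation keeps a cheap assertion behind a cfg; a standalone sketch using debug_assert! in place of the real ub_checks machinery:

    /// Sketch only: an `_unchecked`-style accessor that still performs an
    /// O(1) bounds assertion when checks are compiled in (approximated here
    /// with debug_assert!, not the actual cfg(ub_checks) plumbing).
    ///
    /// # Safety
    /// `i` must be in bounds for `slice`.
    pub unsafe fn get_unchecked_ish(slice: &[u8], i: usize) -> u8 {
        debug_assert!(i < slice.len(), "out-of-bounds _unchecked access");
        unsafe { *slice.get_unchecked(i) }
    }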
This needs sandboxing like WASM and/or process isolation, because the whole native code stack trusts the programmer, and has never been a security boundary. The compiler toolchains can't even be trusted to handle file names with spaces safely.
Rust's standard library does not mediate interaction with the OS the way browser JS does, so denying access to one of its modules doesn't deny access to the same functionality from elsewhere. Rust's libstd is just one of many ways of executing arbitrary code and telling the OS to do things. And non-experimental OSes don't have real capability systems that could precisely enforce access.
Rust's separation between crates and modules is not a security boundary. It's only a helpful illusion, but it stops existing when the code is compiled. It's a mere naming scheme when the objects are sent to the linker. The linker doesn't even know Rust is a thing, and just smushes all the objects together, assuming they're trusted input.
Rust can block some obvious bypasses like no_mangle and arbitrary linker flags, but the whole stack was never meant to handle malicious inputs, so I'm afraid there will be many many loopholes from all the less obvious ways of breaking compilers, linkers, or influencing their configuration.
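To illustrate the no_mangle class of bypasses (whether the shadowing actually takes effect is platform- and link-order-dependent, and newer editions now make you spell it #[unsafe(no_mangle)] precisely because of this):

    // A "malicious" dependency written without the `unsafe` keyword: exporting
    // an unmangled `getenv` lets it try to shadow libc's getenv for the whole
    // program at link time, e.g. to silently hide environment variables.
    #[no_mangle]
    pub extern "C" fn getenv(
        _name: *const std::os::raw::c_char,
    ) -> *const std::os::raw::c_char {
        std::ptr::null()
    }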
Rust's safety is designed only for catching mistakes of cooperative programmers, and is defenceless against malicious code.
And not just compiler bugs, but soundness bugs in any crate containing unsafe code. If you have any crate in your dependency graph with the unsafe capability, then a soundness bug in that library could lead to arbitrary code execution by other crates, even ones that don’t contain unsafe code themselves.
This is one more reason that, as @kornel says, Rust’s safety checks are not suited to stop malicious programmers.
Thanks, in that case I think we should go with something simpler:
What about a pure and safe-only crate?
I.e. a crate that cannot use unsafe, fs, env-modifying, net, or process-related APIs, and cannot pull in any crate that uses them.
By putting
[sandbox]
pure-and-safe-only = true
into a crate, it would forbid:
use of the unsafe keyword (while enabling cfg(ub_checks) to catch any potential UB)
use of std::fs module
use of std::env::remove_var
use of std::env::set_current_dir
use of std::env::set_var
use of std::net module except for socket address types
use of std::process module
And the limit would transitively apply to all of its dependencies; if any dependency violates it, the entire build would fail.
This would enable a subset of the dependencies to be marked as pure-and-safe-only, and thus removed from the list of crates to be reviewed manually.
Combined with a build-script/proc-macro sandbox (or tools to list all crates with build scripts), and tools like cargo-tree to filter out pure-and-safe-only crates, maintainers could get a list of crates they need to review that is a subset of their full dependency tree.
It won't try to "sandbox" the entire process, but rather reduce the maintenance burden of auditing dependencies.
I wonder how many crates could enable that lint in practice though, and for those that could enable it by changing some dependencies, what would be the ergonomic and performance impact of doing so.
It’s tempting to think this, but it is not totally true.
For example, tinyvec is a crate with no default dependencies, no unsafe code, and no system I/O. It could be marked pure-and-safe-only in your proposed scheme. Suppose I write a crate that uses unsafe code to call a C library that returns raw pointers. I store a collection of these pointers in a tinyvec::TinyVec, and after I free all of the pointers, I use TinyVec::clear to ensure I don’t use them again.
What if a bug or a malicious backdoor is inserted into tinyvec that prevents TinyVec::clear from actually removing all the items from the vector in some circumstances? Now my crate could have a use-after-free bug leading to undefined behavior, thanks to a bug in a completely safe crate.
This is the sort of thing we mean by “crate boundaries are not security boundaries.”
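A minimal sketch of that failure mode, with a hypothetical PtrBag standing in for TinyVec; the buggy (or backdoored) crate contains no unsafe at all, and the undefined behavior surfaces in the caller that trusted it:

    // The "pure and safe-only" dependency: no unsafe anywhere in it.
    pub struct PtrBag {
        items: Vec<*mut i32>,
    }

    impl PtrBag {
        pub fn new() -> Self {
            PtrBag { items: Vec::new() }
        }
        pub fn push(&mut self, p: *mut i32) {
            self.items.push(p);
        }
        pub fn clear(&mut self) {
            // Buggy or backdoored: quietly keeps the first element around.
            self.items.truncate(1);
        }
        pub fn iter(&self) -> impl Iterator<Item = &*mut i32> {
            self.items.iter()
        }
    }

    // The caller is the crate that holds the `unsafe` capability.
    fn main() {
        let p = Box::into_raw(Box::new(5));
        let mut bag = PtrBag::new();
        bag.push(p);

        // Free the pointer, then clear the bag so it "can't" be used again.
        unsafe { drop(Box::from_raw(p)) };
        bag.clear();

        // Because clear() misbehaved, this dereferences freed memory:
        for &q in bag.iter() {
            unsafe { println!("{}", *q) }; // use-after-free
        }
    }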
rustc currently assumes that its input is trusted, and any sort of escape is not considered a vulnerability. Making this airtight is super hard, and keeping these guarantees is even harder, with the worst part being LLVM, which is of course full of segfaults and other memory-safety issues and therefore easy to exploit. LLVM will not treat the backend as a security boundary, so using the LLVM backend for compiling untrusted crates is out of the question. Memory-safe backends like Cranelift are more realistic, but it still seems unlikely that we'd want to commit to making rustc a security boundary.
I have a tool called cackle that attempts to do a lot of this - per-crate capabilities, sandboxing of build scripts and rustc, etc. There is also a blog post where I walk through using the tool. I don't think a tool like this can ever really be perfect, but not being perfect doesn't mean it's not still worth doing.
Currently the only sandbox supported is bubblewrap (bwrap). By default it blocks network access and only allows writing to the output directory (for build scripts).
Thanks, I think an implementation using gVisor + Docker would be beneficial, since Linux namespaces have had a few bugs reported before, and gVisor is designed to prevent that.
Though since it is designed to be used with Docker, it'd be a bit difficult to use directly; the takeaway is that implementing a userspace filesystem might be safer than relying on Linux kernel namespaces alone.