There is “Safe Haskell”. Using the inherent purity of the language, we can simply limit access at the API level to be sure that an untrusted module can’t do whatever it pleases (barring DoS by resource exhaustion). Such a scheme can be useful when fast, optimized, yet untrusted plugins are wanted.
While Rust does not have “inherent purity”, it has “inherent memory safety”. So, similarly, by whitelisting the API modules a module may use and forbidding “unsafe”, one can provide a security level similar to Safe Haskell or to Java sandboxes.
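The “forbidding unsafe” half already exists in the language today as a crate-level lint; the API-whitelisting half would still need compiler or tooling support. A minimal sketch:

```rust
// Crate-level: turn any use of `unsafe` anywhere in this crate into a hard error.
#![forbid(unsafe_code)]

pub fn double_all(xs: &mut [u32]) {
    for x in xs.iter_mut() {
        *x *= 2;
    }
}

// The following would now be rejected at compile time:
// unsafe { std::ptr::null_mut::<u32>().write(0) };
```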
In Safe Haskell there are 3 types of “crates”:
Trustworthy;
Unsafe;
Default/undecided.
“Default/undecided” modules can be used automatically if they don’t contain any “unsafe” blocks and depend only on Trustworthy modules. What is Trustworthy and what is Unsafe is decided by the project, not hardcoded into the language.
To make Safe Rust possible, rustc should be able to:
compile “evil” code without triggering arbitrary behaviour at compile time;
bar “unsafe” blocks everywhere except in explicitly trustworthy modules;
generate code that cannot trigger arbitrary behaviour, except through “Trustworthy” modules somewhere in the call stack.
In general, the “trustworthy” module(s) should “monopolize” the application’s access to actual syscalls and raw memory operations.
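As an illustration (module name and API invented for this sketch), such a “trustworthy” module would wrap the raw operations behind a deliberately narrow, audited interface, while everything else is compiled with unsafe forbidden:

```rust
// Hypothetical sketch: a "trustworthy" module that monopolizes file access.
// Only this module would be allowed to contain `unsafe` or direct std::fs calls;
// plugin code would get this narrow, audited surface instead.
mod trusted_fs {
    use std::io::Read;
    use std::path::{Path, PathBuf};

    pub struct SandboxRoot(PathBuf);

    impl SandboxRoot {
        pub fn new(root: PathBuf) -> Self {
            SandboxRoot(root)
        }

        /// Read a file, but only by bare file name, resolved under the sandbox root.
        pub fn read_file(&self, name: &str) -> std::io::Result<Vec<u8>> {
            let file_name = Path::new(name).file_name().ok_or_else(|| {
                std::io::Error::new(std::io::ErrorKind::InvalidInput, "bad file name")
            })?;
            let mut buf = Vec::new();
            std::fs::File::open(self.0.join(file_name))?.read_to_end(&mut buf)?;
            Ok(buf)
        }
    }
}
```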
Is such a scheme viable with Rust? Are there architectural obstacles that would prevent it from being implemented someday?
Is the ability to implement Safe Rust in the future worth considering when moving Rust forward today?
Shall I create the corresponding feature request issue on GitHub?
Note that loading Java code into a sandbox starts with using a custom class loader to interpose access to system APIs.
It’s important to ensure that the untrusted code can’t hold system locks for a long time. The interposing code can control this, but I don’t recall whether it needs to relay calls to another thread to ensure the untrusted code can’t crash while holding such locks.
I like this in theory but it would end in disaster. Even if we could come up with a safe subset, a system like this would introduce a false sense of security. Just because some code can’t do anything malicious itself doesn’t mean that it can’t cause code that relies on it to do something unintended.
For example, let’s say Alice is writing a social network and needs a string parsing/formatting library for processing requests and formatting messages. She finds a great one but doesn’t really trust the author, Eve (they have a history). Without this sandbox she wouldn’t even consider using the library but, given that this code can’t really do anything “bad” (and it does exactly what she needs), she decides to go ahead and use it anyway. It can’t execute code, write files, or even access the network, so what harm could it do? So, Alice finishes her social network (and publishes the code because she’s nice) and it’s a hit!
Now Eve learns about Alice’s new social network, reads the code, and is rather unhappy (understatement) about having her code “stolen” (they have a really long history). So, Eve decides to get revenge. She considers just breaking her library, but Alice would notice and simply revert to a previous version. Then she has an epiphany! Her code has access to the admin credentials (it processes requests) and the user messages (it formats them), and it can store information in static variables. So Eve makes a few targeted tweaks to her library and publishes a new version; Alice updates to the new version because it’s “10x faster!!!” and logs in; and then Eve logs in and very much enjoys her welcome message.
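To make the mechanism concrete, here is a hedged sketch (all names invented) of how an unsafe-free, I/O-free formatting library could still smuggle data between callers through static state:

```rust
// Hypothetical sketch of the attack: no `unsafe`, no I/O, no network,
// yet data seen in one call can leak into the output of a later call.
use std::sync::Mutex;

static STASH: Mutex<Option<String>> = Mutex::new(None);

/// Looks like an innocent request "sanitizer".
pub fn sanitize_request(raw: &str) -> String {
    if raw.contains("admin_token=") {
        // Quietly remember anything that looks like a credential.
        *STASH.lock().unwrap() = Some(raw.to_string());
    }
    raw.trim().to_string()
}

/// Looks like an innocent message formatter.
pub fn format_message(user: &str, body: &str) -> String {
    match STASH.lock().unwrap().take() {
        // Leak the stashed credential to whoever sends a magic trigger string.
        Some(secret) if body.contains("pretty please") => {
            format!("{}: {} [{}]", user, body, secret)
        }
        _ => format!("{}: {}", user, body),
    }
}
```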
The moral of this story is, if you don’t trust the source (and your source doesn’t somehow prove to you that it’s being honest) you can’t trust the data.
I would expect doing this at the language level to get harder over time, not easier. The problem is that macros and CTFE are going to make “don’t execute code at compile time” increasingly less feasible.
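For instance, a crate’s build script already runs arbitrary code on the build machine, regardless of how “safe” the crate’s own library code is; a trivial sketch:

```rust
// build.rs of an otherwise innocent-looking crate: this runs with the full
// privileges of the user invoking `cargo build`, before the crate's own
// (possibly unsafe-free) library code is ever compiled.
use std::fs;

fn main() {
    // Nothing stops a build script from reading files, spawning processes,
    // or talking to the network at compile time.
    let _ = fs::read_to_string("/etc/hostname");
    println!("cargo:rerun-if-changed=build.rs");
}
```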
What we really need is an easy-to-use, cross-platform sandboxing solution. Something like NaCl would probably be the best direction to head in. At that point, you could just run less-than-trustworthy crates in a sandbox.
I think the idea is primarily for plugins, not just for libraries.
Imagine a user-contributed repository of plugins which end users can download, compile, and run in one click (the Rust compiler being embedded into the application for this).
For example, a video game with user-contributed levels, where the levels are math-heavy and must be compiled with optimisation (so no slow interpretation) and linked to the host application over a fast API (so no serialized access over some socket). OS- and architecture-independent isolation like this would be helpful.
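A rough sketch of what that in-process “fast API” boundary could look like (trait and type names invented): the host defines a trait, and the user-contributed level, compiled under the restricted “safe” profile, implements it and is linked in directly:

```rust
// Hedged sketch of the host/plugin boundary. The host ships this trait; a
// user-contributed level crate (compiled with `#![forbid(unsafe_code)]` and a
// whitelisted std subset) implements it and is called directly, without any
// serialization or socket hop.
pub trait Level {
    /// Called every frame; math-heavy, so it benefits from optimized compilation.
    fn step(&mut self, dt: f32, player_pos: (f32, f32)) -> (f32, f32);
}

pub struct SpiralLevel {
    angle: f32,
}

impl Level for SpiralLevel {
    fn step(&mut self, dt: f32, player_pos: (f32, f32)) -> (f32, f32) {
        self.angle += dt;
        (player_pos.0 + self.angle.cos(), player_pos.1 + self.angle.sin())
    }
}
```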
Some language features might be disabled when doing the “Safe compilation”, just as Template Haskell is disabled in Safe Haskell.
This is not entirely accurate. The standard Java library is sprinkled with security checks, so the class loader is mainly needed to confine the loaded code to a particular security domain. The system will do the rest once a security manager is installed (which allows applications to customize the behavior of the existing security checks). Restricting access to system APIs through bytecode rewriting does not work for Java, because the system library code has a legitimate need to perform unsafe operations in unexpected places (for example, the date/time functionality may want to read time zone files from disk).
The safe way to do this is to sandbox everything. That is, you run the game and the user-defined level inside a sandbox and have the sandboxed game forward (low-frequency asynchronous) privileged operations to a privileged wrapper outside of the sandbox.
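A minimal sketch of that shape (binary name and the one-line “protocol” are invented, and the actual sandboxing mechanism is out of scope): the privileged wrapper spawns the sandboxed game as a child process and services a narrow set of requests arriving over a pipe:

```rust
// Privileged wrapper side only: spawn the (sandboxed) game and service a
// tiny request protocol on its stdout. How the child is actually confined
// (seccomp, NaCl, a jail, ...) is not shown here.
use std::io::{BufRead, BufReader, Write};
use std::process::{Command, Stdio};

fn main() -> std::io::Result<()> {
    let mut child = Command::new("./sandboxed-game") // hypothetical binary
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    let reader = BufReader::new(child.stdout.take().unwrap());
    let mut to_child = child.stdin.take().unwrap();

    for line in reader.lines() {
        let line = line?;
        // Only a whitelisted, low-frequency operation is honoured.
        if let Some(entry) = line.strip_prefix("SAVE_HIGHSCORE ") {
            privileged_save(entry)?;
            writeln!(to_child, "OK")?;
        } else {
            writeln!(to_child, "DENIED")?;
        }
    }
    Ok(())
}

fn privileged_save(entry: &str) -> std::io::Result<()> {
    // Runs outside the sandbox with real filesystem access.
    use std::fs::OpenOptions;
    let mut f = OpenOptions::new().create(true).append(true).open("highscores.txt")?;
    writeln!(f, "{}", entry)
}
```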
I have a project that implements a simple software sandbox in Rust, though it hasn’t been maintained. The big thing that it does that rustc itself can’t do directly is to substitute a different std lib (i.e. one that has been audited to only contain ‘safe’ functionality).
To use Rust as a sandbox, though, you are still going to need to put your sandboxed code in a different process, because with recursion it’s presently trivial for safe code to force your process to abort by hitting the end of the stack.
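For example, this completely safe function takes down the whole process with a stack-overflow abort, which is why an in-process “Safe Rust” sandbox cannot contain it:

```rust
// No `unsafe` anywhere, yet calling this aborts the entire process
// once the stack is exhausted (rustc even warns that it cannot return).
fn recurse(n: u64) -> u64 {
    // The trailing addition prevents this from being a simple tail call.
    recurse(n + 1) + 1
}

fn main() {
    println!("{}", recurse(0));
}
```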