Should `eval()` be marked as `unsafe fn`?

Suppose there is a Rust crate rustyfoo providing bindings for some scripting language foo. The

rustyfoo::eval( script: &str ) -> Value;

method will run script in foo's interpreter and retrieve the result.

The script is arbitrary and may invoke some function written in C because of the implementation of language foo.

extern "C" {
    pub unsafe fn some_code_called_by_foo_interpreter();
}

Should the eval() method be marked as unsafe?

If it doesn't need to be unsafe, the code

rusty::foo( "RunSomeCode" );

which is equivalent to

unsafe {
    some_code_called_by_foo_interpreter();
}

will have no `unsafe` to mark the potential risk of calling an FFI function.
1 Like

You don't mark things unsafe merely because they are written in C, but if they are known to be able to cause UB. (General FFI stuff is unsafe so often because that has to be the default assumption for unknown foreign functions.)

The eval function should be marked unsafe if, and only if, there is some program written in the scripting language that is able to cause UB in the process running the Rust code.

4 Likes

The foo language's function may be extended by some C code written by others, which is out of the rustyfoo's control and eventually invoked by rustyfoo::eval(). I guess it should be unsafe in the situation.

1 Like

Presumably, the author of rustyfoo doesn't know what the user is going to use it to run. If foo programs can, say, load precompiled modules implemented in C then they can't guarantee that the user won't load something unsound.

1 Like

You beat me to it :stuck_out_tongue:

I'm not sure it should be unsafe though. This kinda feels like how File::write can be used to cause undefined behaviour by writing to /proc/self/mem.

3 Likes

Then the function for loading that extension code must be unsafe. If the function for doing so is eval() (the loading is performed within the language) then eval() should be unsafe, but if the function is separate, then that function can be unsafe instead of eval().

2 Likes

Ehh..

Should running processes via std::process::Command be unsafe then? After all, you can use it to run arbitrary programs. And "arbitrary programs" includes programs which find the pid of their parent process and tamper with that process' memory.

1 Like

For comparison, the rlua crate has two methods for creating a context to execute lua code - one safe and one unsafe. The difference is that the unsafe version allows you to load the lua debug package which is used for breaking the language's abstractions and doing shenanigans. The safe version still gives you the ability to load external packages though, which could be implemented in C. This seems reasonable to me.

2 Likes

The rustyfoo::eval( "package require SomeCode" will load the extension named SomeCode.

Eventually I found SomeCode needs Mutex's protection when multiple threads are eval()ing it, or the program will end up in core dump.

1 Like

I think it really depends on whether foo packages are conventionally expected to be thread-safe. If foo is some ancient language from the dark ages when nobody cared about thread safety and it's typical for foo packages to be non-thread-safe then I'd say yes: eval should be unsafe. Otherwise this just seems like a bug in SomeCode and outside of rustyfoo's sphere of responsibility.

1 Like

You can invoke UB with pyo3's eval because you can evaluate arbitrary python code and it is not marked unsafe. I think rightly so, because it would be wildy unpractical and is somewhat out of scope for rust's type system.

6 Likes

Typically calling C functions is marked as "safe" when the author of the FFI bindings assert that the C code don't cause UB regardless of the inputs sent by the Rust side. Can you assert this for all C APIs that rustyfoo might call?

2 Likes

Another take, a function is marked unsafe when it is meaningful for the caller to check preconditions and promise that they are upheld.

assert!(!x.is_null());
unsafe { fun(x) };

Is there a meaningful check callers are going to be doing before calling eval? Or is there a chance the caller doesn't realize that eval is just going to execute arbitrary code? If not, unsafe is just added noise to the call site.

3 Likes

Is causing UB part of eval's API contract?

Or is it a transgression of [whatever the code calls]'s API contract?

For example, in Lua, loading bytecode is unsafe, and core interpreter functions (particularly load) are unsafe -> thus eval (or load) should be unsafe.

If you're only allowing trusted code to be loaded (e.g. only loading code from compile-time constants marked as such) then such an API could be considered safe even if the core interpreter functions are unsafe, and any UB that happens there is the responsibility of the trusted code in the compile-time constants, not of the caller.

(Fun fact, a lot of libraries don't use unsafe fn except in public APIs. This includes some parts of std (or at least did, not sure if those have been refactored yet)! And yet, calling those internal functions incorrectly would be UB. But since they're internal, it doesn't really matter except for maintenance.)

2 Likes

FWIW in recent versions of rlua, the safe constructor disables loading C libraries from Lua as well as disabling the debug library. It does still let you e.g. open /proc/self/mem from Lua though. :slight_smile:

3 Likes

You beat me to it too. I've never thought of that :joy:

1 Like

Technically this does fall under the definition of "unsafe", but I think this case should be an exception to the rule, because it beats the purpose of "unsafe".

The meaning of unsafe is not simply "this code may cause UB" - it's "this code may cause UB if some invariant is broken, and the compiler is not going to enforce that variant for you". The whole point of unsafe is to draw as much attention to the invariant as possible, which means the unsafe functions and types should have a Safety section in its documentation explaining what the invariant is and what may lead to breaking it, and the unsafe blocks where they are used should have a SAFETY comment that describes how the invariant is maintained.

What are you going to write in the Safety section of rustyfoo::eval's documentation? "This will result in undefined behavior if the foo code called does undefined behavior"? That's not very helpful.

And what are you going to write in the SAFETY comment where you call rustyfoo::eval? "The 5000 LoC something_something.foo script that this eval invokes is not doing anything suspicous"? That's far to broad to be assuring. Especially when that script can be provided by the user or some other third party plugin (which is one of the main reasons to use a scripting language)

The unsafe, in this case, is doing nothing to improve safety, and I think it is kind of pointless to require it.

EDIT

Actually, now that I think about it, rustyfoo::eval does not need to be unsafe even if we are strict with the definition.

The reason FFI functions are unsafe by default is not that we are snobs who think all other languages are unsafe (we are, but that's not the reason). Maybe the other language is using GC and other mechanisms to get safety at the cost of performance. Maybe that other language is actually Rust and it was using cdylib to produce a library we link against. It could be perfectly safe inside.

It doesn't matter.

The boundary is the thing that's unsafe.

FFI means using the C ABI, and doing anything non-trivial with it is probably going to involve pointers. Rust cannot enforce its invariants when it sends and receives pointers through the C ABI. The other language can't enforce its own invariants either (unless it's C in which case it has no language enforced invariants). And they certainly can't communicate these invariants through the C ABI. Hence - unsafe.

(even without pointers, the C ABI can be unsafe if you define the signature wrong. But that's something that's written in the extern block and once there the compiler can enforce it, so it's relatively fine)

rustyfoo::eval is not C ABI. The rustyfoo crate should have mappings for the primitive foo types, representations for its builtin data structures, and some general wrappers for accessing its user defined types. The boundary between Rust and foo is safe, so Rust should not consider it unsafe that foo can do UB internally.

5 Likes

This actually isn't quite true, and is a big reason why breaking ABI is so problematic when FFI is involved. Even when no pointers are involved, the only thing used to link two functions together is the (mangled) name. If the arguments don't match in any way, that's on you, and can't be caught by the tooling.

A tool like cxx marks declaration of the FFI unsafe but calling it safe, matching your interpretation here. But despite cxx doing a significant amount to ensure that the signatures you've declared match the FFI ones, declaration remains unsafe because the assertion that the other side is sound is still on you.

If the evaluated script language is intended to be sound (by Rust's definition of un/safe), then evaluating a script should be safe. If the script language allows doing things which Rust considers unsafe, then evaluating it should require an unsafe assertion that it doesn't.

3 Likes

Yes, the ABI itself is only the mangled symbol name, but virtually FFI in any language used today (excluding Assembly, for obvious reasons) requires that you declare the arguments and the return type of the foreign function. FFI for non-pointer primitives is solved at the declaration level, and for languages that can declare structs with C representation (like Rust) it's a solved problem for non-primitives as well - as long as no pointers are involved. My point is that the main problem of FFI, the main reason why Rust forces extern functions to be unsafe and demands to always have a wrapper, is when pointers are involved.

unsafe should refer to the interface level, not the implementation level. When I write unsafe { foo.unwrap_unchecked() } it does not mean "I've looked at the implementation of Option::unwrap_unchecked, read the Rust compiler code that builds it, studied the architecture of the processor that runs it, and researched the laws of physics that govern the whole thing, and I can personally vouch that this function does not do any undefined behavior". No. What it means is "I've read the documentation of Option::unwrap_unchecked, specifically the Safety part that specifies the conditions under which this function may result in undefined behavior, and I attest that in this specific case where I use it these conditions are not met".

There is no such thing as "unsafe assertion" because the unsafe keyword is not an assertion - it will not panic!() when called on code that's not actually safe, it'll just UB. This keyword is only useful to draw attention to a particular interface that needs a human programmer to check its invariances - because the compiler cannot. When I write unsafe { foo.unwrap_unchecked() } I promise (not assert!) that foo I've checked manually and foo is not going to be None. When I write unsafe { rustyfoo::eval("some_function()") } - what can I promise about the safety of some_function() that wouldn't be a lie?

There is a place for vetting and auditing code, but unsafe is not it.

And what happens if you get those wrong? Nothing will stop you, and you'll get UB if you attempt to call it. That's the point I'm making: the compiler doesn't and can't enforce that the signature matches. (And if a tool does validate signatures against a header, there's nothing preventing it from doing the same for the types behind pointers just the same it does for by-value arguments.)

...yes? Whatever language you're binding to will have the same documentation which documents when it is sound to call and when it isn't. eval should be unsafe if an arbitrary block of script could have a safety prerequisite, to indicate that you've read and upheld it, or safe if they can't.

I'm not asking you to prove that the evaluated code doesn't cause UB, just that it's not intended to not cause any UB.

Am I not allowed to use "assert" in the plain english fashion? When someone says "unsafe assertion" or "unchecked assertion," they're referring to exactly that: you're asserting/claiming/promising that some condition is true, but not checking at runtime that it holds, just assuming.

Exactly the same proof burden as writing unsafe { some_function() } -- that any documented prerequisites of some_function have been satisfied.

I'm just saying that you shouldn't be able to write rustyfoo::eval("unreachable_unchecked()") and cause UB without writing unsafe, if the scripting language allows you to do so. If the scripting language is at least as free of UB as safe Rust, then eval doesn't need to be unsafe.

1 Like