Should `eval()` be marked as `unsafe fn`?

The closest things to eval in Rust are include! and asm!.

include! is safe.

At the end of the day, a safety contract of "this is safe to use if the included code is safe to use" (which is what include!'s would be, if it had one) is isomorphic to "this is safe to call if the given code is safe to call".

But, other programming languages may not enforce safety contracts as strongly as Rust. On the other end of the spectrum, you have asm!, which is unsafe.

2 Likes

Is it? I thought that the compiler evaluates the unsafe-ness of code after macro expansion, which effectively makes include! unsafe-transparent: if the included file does unsafe things, then the include! must appear within an unsafe context.

eval(), on the other hand, can run code that didn't exist at compile-time, and therefore cannot have its invariants checked by the compiler.

1 Like

a safe eval can check its invariants at runtime.

that doesn't mean it can't have unsafe, either. just as include! can include, say,

unsafe { *ptr::null() }

one could likewise eval

unsafe { *ptr::null() }

but it's not eval breaking the contract.

1 Like

My point isn't about whether or not eval() should be unsafe, but rather that include! is a bad model for reasoning about it. If you include! this file, for example, the compiler will require you to place the macro invocation inside an unsafe block:

*ptr::null()

Thus, it's not correct to say that include! is safe. It's safe only if the contents of the included file are safe, and the compiler will check that for you. This ability to see through the include! to the generated code makes the situation materially different from the one with eval.

1 Like

No, it doesn't. Why can't eval return an error if you try to eval the following?

*ptr::null()

a la

assert!(eval("*ptr::null()").is_err());

I think that everyone here agrees that if an eval() function has effective runtime safeguards against UB, then it's ok for it to be a safe function. The question at issue is how robust those safeguards need to be before it's acceptable to remove the unsafe designator.
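As a rough illustration of that shape (an entirely hypothetical eval, Value, and EvalError; not any real interpreter's API), the safeguard lives inside the function rather than in its contract:

```rust
// Hypothetical sketch: a safe eval that refuses source it cannot run without
// risking UB, instead of pushing an unsafe contract onto the caller.
#[derive(Debug)]
enum Value {
    Unit,
}

#[derive(Debug)]
enum EvalError {
    Unsupported(&'static str),
}

fn eval(src: &str) -> Result<Value, EvalError> {
    // A real interpreter would parse and execute here; the only point is that
    // anything the runtime cannot guarantee to be UB-free comes back as Err
    // rather than being executed.
    if src.contains("ptr::null") {
        return Err(EvalError::Unsupported("raw pointer dereference"));
    }
    Ok(Value::Unit)
}

fn main() {
    assert!(eval("*ptr::null()").is_err());
}
```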

4 Likes

Because we are talking about FFI that happens inside the evaluated language. Let's say the evaluated language is Python. Python itself cannot dereference null pointers, but it can use FFI to call an already-compiled function that does - and nothing in that function is going to check for the null pointer and raise an error. It'll just be UB.

yes and why can't python have an unsafe keyword?

The fact of the matter is that Python, and lots of other languages, don't have anything like an unsafe keyword. Rust's safety model is quite novel, and it's unreasonable to ask all these other languages to change in order for Rust programs to interoperate with them.

1 Like

And just for clarity: a function exposed to Python that can cause UB is incorrect in the same way that a Rust function that can cause UB but isn't marked unsafe is incorrect. Evaluating Python from Rust should be considered safe, because Python cannot be used to cause UB without the presence of a soundness bug in the unsafe bindings layer.

Rust is in fact relatively unique in how strictly it demarcates the unsafe barrier between soundness-trusted and untrusted code. It's not the only language to use the "unsafe" descriptor, but most other primarily-safe languages with an unsafe escape hatch typically just have some "be careful" API namespace that otherwise looks like any other code, e.g. Haskell's System.IO.Unsafe or Swift's UnsafePointer.

The presence of an unsafe-style escape hatch does not itself mean that calling that language is always unsafe, of course. (If it did, calling any Rust code would necessarily be unsafe.) The point of unsafe is defaults: if the foreign language typically uses unsafe constructions everywhere, calling it should be unsafe by default; if the foreign language is safe by default and relegates unsafe constructions to a discouraged escape hatch used for optimization purposes, calling it should be safe by default.

There will always be some things out of scope for the unsafe guarantee of UB freedom; these are the axioms assumed by the system that cannot be enforced within the system (such as my memory not being externally accessible, e.g. via /proc/self/mem). Safe languages with less strictly controlled unsafe escape hatches are generally understood to be handled the same way; the unsafety should be marked in the way that fits the language and not exposed to consuming APIs which expect the standard UB-freedom.

Even if you do have a bridge library taking a stricter stance, that doesn't have to mean your code is littered with unnecessary unsafe. You could have something like C#'s unsafe, where, when setting up the compiler/CLR, you need to use the AllowUnsafeBlocks option to enable the use of unsafe functionality. Enabling this flag would probably be considered unsafe from Rust, but needn't make further use of C# code unsafe from Rust.

The "unsafe on entry" pattern where you're promising to not misuse the resulting object ever in the future isn't ideal, to be clear; it's much better if unsafe APIs have relatively clear and localizable preconditions with limited postconditions[1]. But for APIs where most of the surface should be considered safe, and it's just some theoretical edge case that could cause UB, it can be argued for. (But it does make passing any derived proof-carrying types across crate boundaries a huge footgun; since the other library didn't write the unsafe promise to not do so, them obtaining a proof of the promise is unsound if they can use it to cause UB.)

With well-designed script embedding APIs that provide security features for safely running user (i.e. untrusted) scripts, there are usually clever ways to keep the amount of unsafe required for a proper API to a minimum. Generally, it'll roughly take the shape of mounting any trusted code, calling a single unsafe function to mark the loaded script as trusted (thus allowing it to do potentially unsafe things), and then evaluating any user code. The user code can go through the trusted interface, but can't itself directly use any potentially unsafe functionality. If you're running potentially untrusted code (if you have plugins, you will be, since people will install and run plugins from the web with minimal vetting, if any), you already want to have some sort of sandboxing to limit access to not-unsafe but still abusable APIs like filesystem access (e.g. via cap-std or similar). If you have this set up for your scripting interface, adding unsafe should not be all that difficult.
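A rough sketch of that shape, with entirely hypothetical types and method names (not any particular embedding crate's API):

```rust
// Hypothetical embedding API: loading and evaluating scripts is safe; the one
// unsafe step is vouching for a script that is allowed to reach the
// potentially-UB-capable (FFI) interface.
struct Script {
    source: String,
    trusted: bool,
}

struct Interpreter {
    scripts: Vec<Script>,
}

impl Interpreter {
    fn new() -> Self {
        Interpreter { scripts: Vec::new() }
    }

    /// Safe: a freshly loaded script only sees the sandboxed, checked API.
    fn load(&mut self, source: &str) -> usize {
        self.scripts.push(Script { source: source.to_owned(), trusted: false });
        self.scripts.len() - 1
    }

    /// # Safety
    /// The caller asserts this script has been audited: anything it does
    /// through the trusted (FFI-capable) interface must be free of UB.
    unsafe fn mark_trusted(&mut self, id: usize) {
        self.scripts[id].trusted = true;
    }

    /// Safe: untrusted scripts may call into trusted ones, but cannot reach
    /// the unsafe facilities directly.
    fn eval(&mut self, id: usize) -> Result<(), String> {
        let script = &self.scripts[id];
        // A real implementation would run `script.source` here, exposing the
        // FFI-capable interface only when `script.trusted` is set.
        let _ = (&script.source, script.trusted);
        Ok(())
    }
}

fn main() {
    let mut interp = Interpreter::new();
    let bridge = interp.load("bridge code wrapping some native calls");
    // SAFETY: we wrote and audited the bridge script ourselves.
    unsafe { interp.mark_trusted(bridge) };
    let plugin = interp.load("third-party plugin code"); // stays sandboxed
    interp.eval(plugin).unwrap();
}
```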

But as a final note: if you can write an API such that it doesn't require unsafe, with just some small runtime assertion overhead sublinear in the amount of other work done: absolutely prefer doing that first, and only add unsafe *_unchecked versions if the checks show up as a performance bottleneck. They probably won't, and your software will be more resilient[2] to programming errors. The use of unsafe should ideally be encapsulated in core abstractions/collections and limited performance-critical subsections.
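For example, the usual shape of that trade-off (a hypothetical register file standing in for whatever state the interpreter exposes):

```rust
// Hypothetical sketch: a checked accessor by default, with an unsafe
// _unchecked variant added only if the check ever shows up in profiles.
struct Registers {
    regs: [u64; 16],
}

impl Registers {
    /// Safe: the bounds check is O(1), sublinear in whatever work the caller
    /// does with the value, and a bad index panics instead of causing UB.
    fn read(&self, idx: usize) -> u64 {
        self.regs[idx]
    }

    /// # Safety
    /// `idx` must be less than 16. Only worth reaching for if profiling shows
    /// the bounds check in `read` to be a real bottleneck.
    unsafe fn read_unchecked(&self, idx: usize) -> u64 {
        unsafe { *self.regs.get_unchecked(idx) }
    }
}

fn main() {
    let r = Registers { regs: [0; 16] };
    assert_eq!(r.read(3), 0);
}
```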


  1. Giving safe access to some resource which has an unsafe condition now attached to it is in fact a pattern used by the standard library in a limited fashion. There's of course Pin::get_unchecked_mut, which is oft maligned for being an unfortunate application of this, due to the large impact of the pinning guarantee on what you can safely do with the &mut; but also String::as_mut_vec, which has the much less maligned safety requirement of ensuring the Vec contains valid UTF-8 when the borrow lifetime is allowed to expire (but I have seen some people say you should prefer taking the string with mem::take, converting it into the Vec by owned value, and then replacing it at the end with from_utf8_unchecked to stick to clean preconditions; a rough sketch of that follows these footnotes). ↩︎

  2. Where resiliency here means panicking in a controlled manner instead of unpredictable misbehavior or even undebuggable misbehavior (UB). For persistent rather than oneshot programs where crashing wouldn't be considered resilient and recovery is possible, set up an unwind barrier between tasks and probably a process watchdog for aborts as well. ↩︎
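For footnote 1, a minimal sketch of that cleaner-preconditions alternative (hypothetical function; the SAFETY argument is only that ASCII uppercasing preserves valid UTF-8):

```rust
use std::mem;

// Hypothetical: edit a String's bytes without relying on String::as_mut_vec's
// non-local "must be UTF-8 again when the borrow ends" requirement.
fn uppercase_in_place(s: &mut String) {
    // Take the String out by value, leaving an empty one behind.
    let taken = mem::take(s);
    // Convert into the Vec<u8> by owned value; no unsafe contract attached yet.
    let mut bytes = taken.into_bytes();
    bytes.make_ascii_uppercase();
    // SAFETY: ASCII uppercasing maps valid UTF-8 to valid UTF-8.
    *s = unsafe { String::from_utf8_unchecked(bytes) };
}

fn main() {
    let mut s = String::from("héllo"); // the non-ASCII byte pair stays untouched
    uppercase_in_place(&mut s);
    assert_eq!(s, "HéLLO");
}
```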

2 Likes

It depends on whether the interpreter can fulfill the FFI contracts. If UB still occurs after the contracts are fulfilled (so we mark eval as safe), it is an FFI bug. On the other hand, if the FFI contracts cannot be met by the interpreter, we mark eval as unsafe and the caller of eval needs to fulfill those contracts.

In short, eval should be considered safe if the UB is not caused by Rust itself; we can only trust the external operations to behave correctly.

1 Like

An alternative is to require some other unsafe function or method to be called before it's possible to execute the function or method which actually causes UB. For example, if eval is a method of Interpreter, you could mark Interpreter::new as an unsafe fn and state the requirements that must hold in order to avoid UB later.
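A minimal sketch of that shape (hypothetical names):

```rust
// Hypothetical: the obligation is discharged once, up front, and eval itself
// stays a safe method.
struct Interpreter;

impl Interpreter {
    /// # Safety
    /// The caller promises that every native module reachable from scripts run
    /// on this interpreter has been vetted to be free of UB.
    unsafe fn new() -> Self {
        Interpreter
    }

    /// Safe to call: if UB occurs, the blame lands on the promise made at
    /// `Interpreter::new`, not on this call site.
    fn eval(&self, source: &str) -> Result<(), String> {
        let _ = source;
        Ok(())
    }
}

fn main() {
    // SAFETY: (in a real program) all reachable native modules were vetted.
    let interp = unsafe { Interpreter::new() };
    interp.eval("print('hi')").unwrap();
}
```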

My point is: the safety requirements aren't necessarily local.

1 Like

Unfair. Users who never call eval() are now forced to use an unsafe block.

The ideal unsafe function has local preconditions: stuff you can potentially check just before calling the function (and can elide the check when you can manually verify that violating it is impossible - that's the point of unsafe).

But sometimes the preconditions are "global". An example is use-after-free: you're not supposed to follow a raw pointer that was deallocated, but you can't check for this before calling the function (well, not without some expensive instrumentation). The memory might have been deallocated by some other function and is now invalid, so the call is no longer sound (through no fault of any of the function's parameters).
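To make that concrete, a hedged toy example (not tied to eval):

```rust
/// # Safety
/// `p` must point to a live, initialized i32. Nothing at the call site can
/// check this: whether the pointer is still valid depends on what the rest of
/// the program has done with the allocation.
unsafe fn read(p: *const i32) -> i32 {
    unsafe { *p }
}

fn main() {
    let boxed = Box::new(7_i32);
    let p: *const i32 = &*boxed;
    // SAFETY: `boxed` is still alive here.
    assert_eq!(unsafe { read(p) }, 7);
    drop(boxed);
    // Calling `read(p)` now would be use-after-free UB, and no check inside
    // `read` could have caught it.
}
```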

If you can load arbitrary C code in this scripting language environment, then the eval function could be unsafe. The unsafe precondition here is: by calling this, you assert that any C module loaded has been manually verified to not contain UB. So if UB happens during the execution of the scripting language, it can be reliably blamed on this eval call.

Another approach is to mark the loading of C modules themselves as unsafe. That way, eval can be safe (and if UB happens during eval, you can blame one of those C modules that were unsafely loaded).
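A sketch of that second placement (hypothetical API):

```rust
// Hypothetical: the unsafety sits where native code enters the system, so the
// eval call itself stays safe.
struct ScriptEnv;

impl ScriptEnv {
    fn new() -> Self {
        ScriptEnv
    }

    /// # Safety
    /// The caller asserts that the C module at `path` has been audited and
    /// cannot cause UB however scripts end up calling into it.
    unsafe fn load_c_module(&mut self, path: &str) {
        let _ = path;
    }

    /// Safe: if UB occurs while the script runs, it is attributable to an
    /// unsafely loaded C module, not to this call.
    fn eval(&mut self, source: &str) -> Result<(), String> {
        let _ = source;
        Ok(())
    }
}

fn main() {
    let mut env = ScriptEnv::new();
    // SAFETY: (in a real program) this module has been audited for UB-freedom.
    unsafe { env.load_c_module("libfoo.so") };
    env.eval("foo.do_something()").unwrap();
}
```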

I didn't want to say that it's wise to do this in this particular example. I merely wanted to comment on @kpreid's statement that:

Even if…

…it is possible to have non-local preconditions (as @dlight also explained). That means it is (from a soundness p.o.v.) perfectly fine for a call to a non-unsafe function to cause UB if you violate the preconditions of another unsafe item (before or afterwards). Anyway, these non-local preconditions are harder to reason about, of course.

I would say it depends on the scripting language. If scripts tend to have memory safety issues because of the way the scripting language is defined, mark it unsafe. If it only becomes unsafe when specific capabilities are abused (such as I/O or FFI), unsafe does not seem necessary. I see an analogy here with the I/O subsystem in the Rust standard library: writing to files can cause UB on standard Linux (via /proc/self/mem), yet those functions are not marked unsafe, either.