What does `unsafe` mean?

Ok, so. I spent a fair amount of time thinking about proof obligations in general and UB in particular, but I need to write a ramble before I can get much further. I wrote this up and figure it should probably go in the thread, even though a lot of the ideas aren’t especially new.

Why is UB special?

Proof obligations are closely related to API contracts. For instance, traits often specify how their implementations behave, and consumers of the trait only promise sensible behaviour if the rules are upheld. An example is that BTreeMap will do weird things (not explicitly provided for in the documentation) if the Ord instance on keys doesn’t meet the requirements. The consumer of the API, in order to get the benefit of BTreeMap's promised behaviour, must meet the requirements.
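To make the BTreeMap case concrete, here is a minimal sketch of a key type that breaks the Ord requirements using nothing but safe code. The AlwaysLess type and the build_map function are hypothetical names made up for illustration. Every comparison reports Less, even when a value is compared with itself, which contradicts the derived equality.

```rust
use std::cmp::Ordering;
use std::collections::BTreeMap;

/// Hypothetical key type whose ordering violates the `Ord` contract.
#[derive(Debug, PartialEq, Eq)]
pub struct AlwaysLess(pub u32);

impl PartialOrd for AlwaysLess {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for AlwaysLess {
    // Contract violation: `a.cmp(&a)` should be `Equal`, and the result
    // should agree with `==`. Nothing in the language stops us writing this.
    fn cmp(&self, _other: &Self) -> Ordering {
        Ordering::Less
    }
}

/// Builds a map keyed by the misbehaving type. This compiles without any
/// `unsafe`; what later lookups return is up to the map's implementation.
pub fn build_map() -> BTreeMap<AlwaysLess, &'static str> {
    let mut map = BTreeMap::new();
    map.insert(AlwaysLess(1), "one");
    map.insert(AlwaysLess(1), "uno");
    map
}
```

The sketch deliberately makes no claim about what the map contains afterwards. The only point is that the compiler accepts it, so the obligation to implement Ord sensibly rests entirely on the author of the key type.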

Undefined behaviour rules can be seen as the same kind of contract: a program must not perform any behaviour that is undefined if it wants to execute as specified. So what separates UB from API contracts? I realized it’s a lot less than you might intuitively think. Safe Rust is free to interact with the surrounding environment, and the stdlib documentation makes absolutely no promises about what happens if you use BTreeMap incorrectly (it simply says “it’s an error”). There are no promises on the behaviour, so BTreeMap could theoretically start NetHack, send an embarrassing email to your mother, or boot up an attached DeathStation 9000, all from safe Rust and without violating the API contract. The contract and safe Rust are not even, on their own, enough to say it won’t corrupt memory, because the aliasing rules don’t stop a process from writing to /proc/self/mem.

If this is the case, though, why is UB special? I couldn’t really think of a fundamental technical reason to draw a special distinction around it, given the above. So it needs to be philosophical instead. Eventually I realized that we are willing to place some level of trust in library authors that they will restrict the potential scope of “undefined behaviour” in their libraries to something reasonable (and I am purposefully vague here because it is ultimately subjective). For instance, BTreeMap is an in-memory data structure, so we trust that screwing it up will cause the in-memory data it manages to do weird things rather than do completely unrelated I/O. Safety guarantees allow us to be sure that unless it tries really hard (or violates the safety rules in unsafe code), it won’t cause action at a distance.

By contrast, we don’t give the same trust at all to the compiler and the hardware. They are both basically evil, and will potentially do all sorts of weird and very undesirable things if you aren’t careful; there are many wonderful stories out there. And unlike library authors, they cannot be trusted to keep things compartmentalized: we’ve all seen horror stories about the things compilers produce by assuming “undefined behaviour never happens” and working backwards, and even without that there are plenty of other ways to completely destroy program integrity with only a tiny mistake.

I think there’s also a lesser point which is that undefined behaviour is often very subtle, and Rust doesn’t try to avoid that fact. The contract requirements of an API are typically much less difficult to understand and apply than the rules about UB. So if we assume library actors are neither actively malicious nor mindbogglingly incompetent (an implementation of string sort that runs sort(1) on /proc/self/mem, say), UB rules are close to the most difficult and most dangerous that we can expect most coders to encounter (with one caveat, which I’ll address in a moment).

So this is why Rust chooses to draw a line around undefined behaviour: it’s almost at the top of the pack in both subtlety and (practical) danger, and I’ll avoid a digression as to the precise definition of it because it seems that people smarter than me picked a reasonable place to draw the line. The line could be drawn in a different place: atomics, for instance, are more subtle than some UB rules and can produce weird logic errors due to compiler and hardware optimizations if misused. But it’s drawn where it is and there is good reason to draw a line around undefined behaviour specifically.

Proof obligations and safe code

Now that we know why we want to draw the line, the question is how. Ultimately, the goal to me seems to be making it so that most Rust code can pretend that undefined behaviour doesn’t exist. In other words, we want to make it so that a standard user of Rust can prove trivially that they do not cause UB. Thankfully (and intentionally), a large subset of the language is, on its own, incapable of causing UB. So as long as you stick to the safe subset, you have no proof obligations. But if you use one of the language or library primitives that aren’t safe, then you need to be able to prove that your use of them is safe.

The proof obligation isn’t limited to the code actually containing the unsafe operations, as others have observed. The Vec example is a classic case: one must prove that Vec's internal invariants hold in order to prevent operations from causing UB. But there are others: for instance, unsafe code must prove that it never provides a &mut reference to read-only memory to unknown code. The existence of such a reference is legal, but assigning to its referent, which safe code can do, is not. Abstraction boundaries provide a tool to limit the scope of code that must be considered in proving that a given unsafe operation does not cause UB, however.
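A small sketch of the Vec-style situation, using a hypothetical SmallStack type invented for illustration: the only unsafe operation is in top, but the proof that it is sound depends on push, which is entirely safe code. The module boundary keeps the fields private, so nothing outside the module needs to be examined.

```rust
pub mod stack {
    /// Hypothetical fixed-capacity byte stack.
    /// Invariant relied on by `top`: `len <= buf.len()`.
    pub struct SmallStack {
        buf: [u8; 8],
        len: usize,
    }

    impl SmallStack {
        pub fn new() -> Self {
            SmallStack { buf: [0; 8], len: 0 }
        }

        /// Safe code, yet part of the proof: if this ever let `len`
        /// exceed the capacity, `top` would read out of bounds.
        pub fn push(&mut self, byte: u8) -> bool {
            if self.len == self.buf.len() {
                return false; // full, invariant preserved
            }
            self.buf[self.len] = byte;
            self.len += 1;
            true
        }

        pub fn top(&self) -> Option<u8> {
            if self.len == 0 {
                return None;
            }
            // SAFETY: `len > 0` was just checked and `len <= buf.len()`
            // is the type's invariant, so `len - 1` is in bounds.
            Some(unsafe { *self.buf.get_unchecked(self.len - 1) })
        }
    }
}
```

Because buf and len are private to the module, the proof only has to cover the handful of functions inside it. Code outside the module cannot set len directly, whatever it does.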

This lets us decide that unsafe {} is actually a useful construct. By cordoning off the unsafe primitives, we force an author who uses them to say “No, really, I understand the contract and I’m absolutely sure that this will not cause UB.” Abstraction boundaries provide a tool to make this useful; by limiting the scope of code that must be considered, a library author can actually provide a proof of safety without needing to understand the user’s code, and a consumer of the library is thus entitled to rely on that proof, making their own trivial.
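As a rough illustration of that “No, really” assertion, here is a sketch of a safe function whose author discharges the proof locally. The function name shout is made up for this example; String::from_utf8_unchecked and u8::to_ascii_uppercase are the standard library calls it leans on.

```rust
/// Hypothetical helper: upper-cases the ASCII letters of `s`.
pub fn shout(s: &str) -> String {
    let mut bytes = s.as_bytes().to_vec();
    for b in &mut bytes {
        // Only bytes in `a..=z` change; every non-ASCII byte is untouched.
        *b = b.to_ascii_uppercase();
    }
    // SAFETY: `bytes` started out as valid UTF-8 (it came from a `&str`),
    // and replacing one ASCII byte with another ASCII byte cannot make a
    // UTF-8 sequence invalid.
    unsafe { String::from_utf8_unchecked(bytes) }
}
```

A caller of shout has nothing to prove: the whole argument fits in the comment next to the block, and it does not depend on anything the caller does.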

So unsafe {} is fairly easy to understand, but what about the other three uses of unsafe? How do they arise and what obligations do they create?

unsafe fn

Suppose I am writing my own container and I want to offer an unchecked indexing function for speed, just like get_unchecked. Under the hood, it does something similar, using an unsafe {} block. This presents a problem, however, when I go to write my proof. I don’t want to prove that the call is safe; instead, I want my caller to prove it.

That’s where unsafe fn comes in: it allows me to say “I only promise to not cause undefined behaviour under certain conditions”, and carefully document them. My function then enters into the same class as unsafe primitives: it must be used carefully and may interact with other code in subtle ways. Since the consequence of violating my conditions is UB, it is just as potentially dangerous as unsafe primitives and using unsafe {} as the marker of proof obligations is just as useful, so we reuse it.
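Here is a minimal sketch of such a container. The Samples type is hypothetical; the shape (a documented Safety section on the unsafe fn, and an unsafe {} block at the call site stating why the condition holds) is the usual convention rather than anything the compiler enforces.

```rust
/// Hypothetical container used only for illustration.
pub struct Samples {
    data: Vec<i32>,
}

impl Samples {
    pub fn new(data: Vec<i32>) -> Self {
        Samples { data }
    }

    pub fn len(&self) -> usize {
        self.data.len()
    }

    /// Returns the element at `index` without a bounds check.
    ///
    /// # Safety
    ///
    /// `index` must be less than `self.len()`; otherwise the behaviour
    /// is undefined.
    pub unsafe fn get_unchecked(&self, index: usize) -> i32 {
        // SAFETY: the caller promised `index < self.len()`.
        // (Block written out explicitly for clarity.)
        unsafe { *self.data.get_unchecked(index) }
    }

    /// A safe caller: it takes on the proof obligation and discharges it.
    pub fn sum(&self) -> i32 {
        let mut total = 0;
        for i in 0..self.len() {
            // SAFETY: the loop bound guarantees `i < self.len()`.
            total += unsafe { self.get_unchecked(i) };
        }
        total
    }
}
```

Callers of sum have no obligations at all, while callers of get_unchecked must write their own unsafe {} and justify the index themselves.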

We could apply it generally to proof obligations that callers have, but that would likely dilute things somewhat, by encouraging all functions with preconditions to be unsafe so that the caller was responsible for them. Then all code would be unsafe {} and we haven’t gotten ahead. We could imagine a world where unsafe(no_ub) and unsafe(sort_actually_sorts) were a whole namespace of different obligations, but they aren’t, so tough.

We also have a rule that the body of an unsafe fn is implicitly wrapped in unsafe {}. There’s no inherent reason that this needs to be the case: unsafe fn is about a contract with the caller, while unsafe {} is about a contract with the callee. The reason for this implicit block is that a function with no unsafe {} cannot, on its own, cause UB. Of course, it can in conjunction with other functions, so it becomes a matter of style, as with the scope of unsafe {} around one expression or an entire function body. But I think this doesn’t change that the unsafe fn -> unsafe {} implication is one that should be viewed as purely syntactic sugar, and not a core part of the semantics of unsafe fn.
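For what it’s worth, the two roles can be pulled apart in practice: rustc has a lint named unsafe_op_in_unsafe_fn which, when enabled, asks for an explicit unsafe {} around each unsafe operation even inside an unsafe fn. The sketch below assumes a toolchain recent enough to know that lint; the function read_and_double is hypothetical.

```rust
/// Reads the pointee twice and adds the results.
///
/// # Safety
///
/// `ptr` must be non-null, properly aligned, and point to an
/// initialized `u32` that is valid for reads.
#[deny(unsafe_op_in_unsafe_fn)]
pub unsafe fn read_and_double(ptr: *const u32) -> u32 {
    // `unsafe fn` states the contract with the caller. With the lint
    // denied, each dereference still needs its own block, which is the
    // contract with the callee (here, the raw pointer read).
    // SAFETY: the caller guarantees `ptr` is valid for reads.
    let first = unsafe { *ptr };
    // SAFETY: same guarantee as above.
    let second = unsafe { *ptr };
    first.wrapping_add(second)
}
```

With the lint left at allow, the two inner blocks could be dropped and the body would mean the same thing, which is the sense in which the implicit block is sugar.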

unsafe trait and unsafe impl

Traits generally impose requirements on their implementations, but they are not binding. Most famous are Eq and Ord: you’re supposed to implement them a certain way, but you are not bound to do so and, consequently, no code may produce undefined behaviour solely because an implementation violates those conditions (and of course, getting them to work more nicely is part of the goal in understanding this all).

The built-in traits Send and Sync need those kinds of guarantees, though. They are relied on to handle multi-threading (and Sync is even a lang item, since the compiler needs to understand it for statics), and this means data races. So we have to have some kind of way to mark a trait as “implementing this trait risks UB if you don’t uphold its contract”. unsafe impl is to unsafe trait as unsafe {} is to unsafe fn: it is the assertion that you have met your proof obligation and so anyone can be absolutely certain, on pain of UB, that you do as required.
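A sketch of the same pattern with a user-defined trait, since Send and Sync are hard to demonstrate in a few lines. The ZeroValid trait and the zeroed function are hypothetical names for this example; std::mem::zeroed is the real standard library function underneath.

```rust
/// Hypothetical marker trait.
///
/// # Safety
///
/// Implementors promise that the all-zero bit pattern is a valid
/// value of `Self`.
pub unsafe trait ZeroValid: Sized {}

// SAFETY: all-zero bits is the integer 0.
unsafe impl ZeroValid for u32 {}

// SAFETY: every bit pattern is a valid `u8`, so an array of zeros is valid.
unsafe impl ZeroValid for [u8; 4] {}

/// A safe generic function that relies, on pain of UB, on the promise
/// made by each `unsafe impl`.
pub fn zeroed<T: ZeroValid>() -> T {
    // SAFETY: `T: ZeroValid` guarantees all-zero bits is a valid `T`.
    unsafe { std::mem::zeroed() }
}
```

Someone calling zeroed::<u32>() writes no unsafe and proves nothing. An implementation for a reference type such as &u32 would be a broken promise, and the unsafe impl keyword is what marks the place where that promise was made.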

unsafe fn/{} and unsafe trait/impl are otherwise unrelated. While both have a proof obligation and discharge marker, they are orthogonal, because there is no proof obligation imposed by unsafe trait on callers. Indeed, part of its point is to remove the proof obligations from callers and shift them to the implementors.