What does `unsafe` mean?

Sorry for the spam… I fat-fingered the post-button. I edited the previous post some more.

Well, sure? If you hide the unsafety then obviously you don’t have to think about it. I was focusing on the part where you do have to think about it because you’re interacting directly with the unsafe elements.

@Lokathor

Maybe I'm just being pedantic, but what I mean is that I would summarize unsafe as follows:

Any time unsafe is on a trait, a function, or a method, the implementer must verify that it upholds the invariants, and the user must also promise to uphold them by using an unsafe block or impl.

In other words, there is no distinction based on whether unsafe is on a trait or a fn; the implementor and the user both need to uphold invariants, and if they do, the unsafety is hidden from the "user of the user".

I’m not sure that having a compiler-tracked trusted effect is a good idea.

First, it is 100% OK for trusted code to call untrusted (but safe) Rust code, as long as it doesn’t actually trust any non-type invariants.

Second, trusted means what the documentation says it means, so tagging random functions as trusted does not actually mean anything (contrast with safe, which means that the function follows the Rust type-system).

There already is the possibility of putting unsafe methods into a trait, which has exactly the effect you describe.

So, first of all, you already said yourself above that only unsafe blocks and impls are places where someone promises to uphold invariants. Writing unsafe trait Foo is not a promise to uphold anything, making it very different from e.g. unsafe {...}.

Secondly, unsafe fn happens to permit unsafe code in its body -- so there is something for the implementor to verify -- but that's because the body of an unsafe fn is considered an unsafe block. That is actually not always desirable, and it is also the source of this confusion. Imagine a world in which unsafe fn does not make the body of the function an unsafe block. Now we may write:

// Safe to call if x is 0
unsafe fn foo(x: i32) {
  println!("{}", x); // safe, benign action. Nothing to verify here.
  if x != 0 {
    unsafe { *(0 as *mut i32) = 0; }
  }
}

The only place there is anything to verify here is the unsafe block! We have a proof obligation there to show that we will never deref a null-pointer. This is the role of unsafe blocks. Moreover, the unsafe at the function grants us the additional assumption that x is 0. This is the role of the unsafe fn. Together, these two make the function safe. In particular, since the println is outside the unsafe block, we don't have any obligation to show anything there.

We could also have the even more nonsensical

// Safe to call if x is 0
unsafe fn bar(x: i32) {
  println!("{}", x); // safe, benign action. Nothing to verify here.
}

where we have nothing to prove at all but still get to make the assumption that x is 0.

I hope this clarifies why we see unsafe fn as introducing assumptions in the callee and obligations in the caller, and not vice versa. The implementor of an unsafe fn does not have to verify it for upholding anything, unless they use unsafe features themselves. Practically speaking, that will usually be the case (otherwise the function may just as well be safe), but I still find it useful to separate the places where assumptions arise (unsafe fn) and the places where invariants have to be verified manually (unsafe {}). Also, many unsafe fn will (somewhat like foo above) contain some code that actually wouldn't need an unsafe block as the compiler can trivially verify its safety; that code does not need verification.
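
To make the caller’s side concrete, here is a minimal sketch (call_foo is made up) of where the obligation introduced by foo gets discharged:

fn call_foo() {
  // SAFETY: foo is documented as safe to call when x is 0, and we pass 0.
  unsafe { foo(0) };
}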

Yep, I think that’s basically the conclusion I’m coming to.

I wonder if it’s worthwhile to change the syntax in a future epoch. As @eternaleye points out, the fact that unsafe has multiple meanings is really confusing.

Of course, you can do that. The effect of T: unsafe Trait is that it takes an existing Trait and makes all of its methods unsafe, so that you can say T: unsafe Clone just as you might (hypothetically) be able to say T: total Clone.

Ok, going to jump into this. I think the double meaning of unsafe creates additional confusion in this discussion, because of the implications of propagation. Right now, the rule is, “Any function which calls an unsafe function must mask it with an unsafe block or itself be unsafe,” and this looks an awful lot like an effect. But if we separate out the two meanings, and dropped the implicit unsafe { } around unsafe functions, the rule becomes, “Any call to an unsafe function must be in an unsafe block,” which doesn’t seem very much like an effect.

I don't understand why this line of reasoning is OK, though. Why do you get to drop unsafe { .. }? If you don't have the masking of unsafe fn, what is even the point of unsafe fn, if it does not allow you to build safe interfaces atop of it?

I meant dropping the feature where unsafe fn implies unsafe { }, and requiring it to be explicit in all cases. This would mean that you must always mask a call to an unsafe fn.
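
For illustration, a rough sketch of what that would look like (copy_first is a made-up function; today the inner block is merely redundant, since the body already counts as an unsafe block):

// Under the proposed rule, even inside an unsafe fn, the unsafe operation
// needs its own explicit block.
unsafe fn copy_first(src: *const u8, dst: *mut u8) {
  // SAFETY: the caller promises both pointers are valid for a one-byte
  // read/write, per this function's documented contract.
  unsafe { *dst = *src; }
}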

We discussed this further at #rust-lang.

I want to summarize my thoughts from that conversation, but there is one clarification I’d like to get first because I feel like I’m missing something. @eternaleye talked here and on IRC about how the unsafe effect is a real effect because it allows access to unsafe primitives, but then that actually means that you can’t mask unsafety in the general case, as I understand it. Masking requires that the side effects take place effectively in an isolated environment, so that the surrounding code cannot witness the results. This is true for something like Vec, but it’s not true for places where unsafe code has extra mutability powers like mutable statics or interior mutability. Safe code can never cause these mutations, but unsafe code can, and even if you hide the unsafety behind a safe interface as RefCell and Mutex do, the side effects are not actually masked since they can be observed by safe code subsequently.

I wouldn’t say I know precisely what an “effect” means theoretically, but I noticed something different between the “async” and “try” effects and the “unsafe” one: for the former two, you can call a function that returns impl Future or Result then “apply the effect” later with await or ?, but one cannot call something unsafe from something safe then “apply the unsafety” later.

Is that a meaningful difference?
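
For what it’s worth, here is a small sketch of the difference I mean, using the “try” effect for comparison (parse, peek, and demo are made up):

use std::num::ParseIntError;

fn parse(s: &str) -> Result<u32, ParseIntError> {
  s.parse()
}

unsafe fn peek(p: *const u32) -> u32 {
  *p
}

fn demo(s: &str, p: *const u32) -> Result<u32, ParseIntError> {
  // The "try" effect can be deferred: hold on to the Result now,
  // apply `?` later.
  let r = parse(s);
  let n = r?;

  // There is no analogous deferral for unsafe: the obligation must be
  // discharged at the call site itself, inside an unsafe block.
  let m = unsafe { peek(p) };

  Ok(n + m)
}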

If you were to construct a trait impl that invoked deferred unsafety, then you’d have a similar situation to the “async” and “try” effects. unsafe simply means that the compiler can’t deduce all of the conditions that Rust requires for safe code, so the code author must ensure that Rust’s required conditions hold. I don’t see that the relative timing at which the potential UB occurs is generally an important consideration (though it might be in some circumstances).

You can, by marking the whole function unsafe. The orthogonality of unsafe breaks down for unsafe Traits, which aren't "invoked", so there's no effect. unsafe Traits are almost more compiler-enforced documentation than effects.

Ok, so. I spent a fair amount of time thinking about proof obligations in general and UB in particular but need to write a ramble before I think I can get much further. I wrote this up and figure that it should probably go in the thread, but a lot of the ideas aren’t especially new. So

Why is UB special?

Proof obligations are closely related to API contracts. For instance, traits often specify how their implementations behave, and consumers of the trait only promise sensible behaviour if the rules are upheld. An example is that BTreeMap will do weird things (not explicitly provided for in the documentation) if the Ord instance on keys doesn’t meet the requirements. The consumer of the API, in order to get the benefit of BTreeMap's promised behaviour, must meet the requirements.
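
As a contrived sketch of that kind of contract violation (Chaotic is made up): a lawless Ord makes BTreeMap misbehave, but entirely within safe Rust.

use std::cmp::Ordering;
use std::collections::BTreeMap;

#[derive(PartialEq, Eq)]
struct Chaotic(u32);

impl PartialOrd for Chaotic {
  fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
    Some(self.cmp(other))
  }
}

// Deliberately violates the Ord contract: this is not a total order at all.
impl Ord for Chaotic {
  fn cmp(&self, _other: &Self) -> Ordering {
    Ordering::Less
  }
}

fn main() {
  let mut map = BTreeMap::new();
  map.insert(Chaotic(1), "a");
  map.insert(Chaotic(1), "b");
  // Lookups may miss, "duplicate" keys may accumulate, iteration order is
  // nonsense -- but none of it is undefined behaviour.
  println!("{:?}", map.get(&Chaotic(1)));
}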

Undefined behaviour rules can be seen as the same kind of contract: a program must not perform any behaviour that is undefined if it wants to execute as specified. So what separates UB from API contracts? I realized it’s a lot less than you might intuitively think. Safe Rust is free to interact with the surrounding environment, and the stdlib documentation makes absolutely no promises about what happens if you use BTreeMap incorrectly (it simply says “it’s an error”). There are no promises on the behaviour, so BTreeMap could theoretically start NetHack, send an embarrassing email to your mother, or boot up an attached DeathStation 9000, all from safe Rust and without violating the API contract. The contract and safe Rust are not even, on their own, enough to say it won’t corrupt memory, because the aliasing rules don’t stop a process from writing to /proc/self/mem.

If this is the case, though, why is UB special? I couldn’t really think of a fundamental technical reason to draw a special distinction around it, given the above. So it needs to be philosophical instead. Eventually I realized that we are willing to extend some level of trust to library authors that they will restrict the potential scope of “undefined behaviour” in their libraries to something reasonable (and I am purposefully vague here because it is ultimately subjective). For instance, BTreeMap is an in-memory data structure, so we trust that screwing it up will cause the in-memory data it manages to do weird things rather than do completely unrelated I/O. Safety guarantees allow us to be sure that unless it tries really hard (or violates the safety rules in unsafe code), it won’t cause action at a distance.

In contrast, we don’t give the same trust at all to the compiler and the hardware. They are both basically evil, and will potentially do all sorts of weird and very undesirable things if you aren’t careful; there are many wonderful stories out there. And unlike library authors, they cannot be trusted to keep things compartmentalized: we’ve all seen horror stories about the things compilers produce by assuming “undefined behaviour never happens” and working backwards, and even without that there are plenty of other ways to completely destroy program integrity with only a tiny mistake.

I think there’s also a lesser point, which is that undefined behaviour is often very subtle, and Rust doesn’t try to avoid that fact. The contract requirements of an API are typically much less difficult to understand and apply than the rules about UB. So if we assume library authors are neither actively malicious nor mindbogglingly incompetent (an implementation of string sort that runs sort(1) on /proc/self/mem, say), UB rules are close to the most difficult and most dangerous that we can expect most coders to encounter (with one caveat, which I’ll address in a moment).

So this is why Rust chooses to draw a line around undefined behaviour: it’s almost at the top of the pack in both subtlety and (practical) danger, and I’ll avoid a digression as to the precise definition of it because it seems that people smarter than me picked a reasonable place to draw the line. The line could be drawn in a different place: atomics, for instance, are more subtle than some UB rules and can produce weird logic errors due to compiler and hardware optimizations if misused. But it’s drawn where it is and there is good reason to draw a line around undefined behaviour specifically.

Proof obligations and safe code

Now that we know why we want to draw the line, the question is how. Ultimately, the goal to me seems to be making it so that most Rust code can pretend that undefined behaviour doesn’t exist. In other words, we want to make it so that a standard user of Rust can prove trivially that they do not cause UB. Thankfully (and intentionally), a large subset of the language is, on its own, incapable of causing UB. So as long as you stick to the safe subset, you have no proof obligations. But if you use one of the language or library primitives that aren’t safe, then you need to be able to prove that your use of them is safe.
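
A tiny sketch of the difference in obligations (third and third_unchecked are made up):

fn third(xs: &[u8]) -> Option<u8> {
  // Entirely within the safe subset: no proof obligations, the bounds
  // check is done for us.
  xs.get(2).copied()
}

fn third_unchecked(xs: &[u8]) -> u8 {
  assert!(xs.len() > 2);
  // An unsafe primitive: we now owe a proof that the index is in bounds.
  // Here, the assert above is that proof.
  unsafe { *xs.get_unchecked(2) }
}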

The proof obligation isn’t limited to the code actually containing the unsafe operations, as others have observed. The Vec example is a classic case: one must prove that Vec's internal invariants hold in order to prevent its operations from causing UB. But there are others: for instance, unsafe code must prove that it never provides a &mut reference to read-only memory to unknown code -- the existence of such a reference is legal, but assigning to its referent, which safe code can do, is not. Abstraction boundaries provide a tool to limit the scope of code that must be considered in proving that a given unsafe operation cannot cause UB, however.
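
Here is a toy version of the Vec point (TinyVec is made up): the proof for the unsafe block depends on an invariant that perfectly safe code in the same module can break.

pub struct TinyVec {
  buf: Box<[i32]>,
  len: usize, // invariant: len <= buf.len()
}

impl TinyVec {
  pub fn get(&self, i: usize) -> Option<i32> {
    if i < self.len {
      // Proof: i < len <= buf.len(), so the access is in bounds -- but only
      // if every function in this module upholds the invariant on `len`.
      Some(unsafe { *self.buf.get_unchecked(i) })
    } else {
      None
    }
  }

  // No unsafe block here, yet this function invalidates the proof above.
  pub fn oops(&mut self) {
    self.len += 1;
  }
}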

This lets us decide that unsafe {} is actually a useful construct. By cordoning off the unsafe primitives, we force an author who uses them to say “No, really, I understand the contract and I’m absolutely sure that this will not cause UB.” Abstraction boundaries provide a tool to make this useful; by limiting the scope of code that must be considered, a library author can actually provide a proof of safety without needing to understand the user’s code, and a consumer of the library is thus entitled to rely on that proof, making their own proof trivial.

So unsafe {} is fairly easy to understand, but what about the other three uses of unsafe? How do they arise and what obligations do they create?

unsafe fn

Suppose I am writing my own container and I want to offer an unchecked indexing function for speed, just like get_unchecked. Under the hood, it does something similar, calling into unsafe {} code. This presents a problem, however, when I go to write my proof. I don’t want to prove that the call is safe; instead, I want my caller to prove it.

That’s where unsafe fn comes in: it allows me to say “I only promise to not cause undefined behaviour under certain conditions”, and carefully document them. My function then enters into the same class as unsafe primitives: they must be used carefully and may interact in subtle ways. Since the consequence of violating my conditions is UB, it is just as potentially dangerous as unsafe primitives and using unsafe {} as the marker of proof obligations is just as useful, so we reuse it.
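
A minimal sketch of what that looks like (MyVec is made up):

pub struct MyVec {
  data: Box<[u32]>,
}

impl MyVec {
  pub fn len(&self) -> usize {
    self.data.len()
  }

  /// Returns the element at `i` without a bounds check.
  ///
  /// # Safety
  /// The caller must guarantee that `i < self.len()`; otherwise the read
  /// is undefined behaviour.
  pub unsafe fn get_unchecked(&self, i: usize) -> u32 {
    // The in-bounds proof has been shifted to the caller via the documented
    // contract above; here we only rely on that contract.
    unsafe { *self.data.as_ptr().add(i) }
  }
}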

We could apply it generally to any proof obligation a callee imposes on its callers, but that would likely dilute things somewhat, by encouraging all functions with preconditions to be unsafe so that the caller was responsible for them. Then all code would be unsafe {} and we would be no better off. We could imagine a world where unsafe(no_ub) and unsafe(sort_actually_sorts) were a whole namespace of different obligations, but they aren’t, so tough.

We also have a rule that the body of an unsafe fn is implicitly wrapped in unsafe {}. There’s no inherent reason that this needs to be the case: unsafe fn is about a contract with the caller, while unsafe {} is about a contract with the callee. The reason for this implicit block is that a function with no unsafe {} cannot, on its own, cause UB. Of course, it can in conjunction with other functions, so it becomes a matter of style, as with the choice between scoping unsafe {} to one expression or to an entire function body. But I think this doesn’t change that the unsafe fn -> unsafe {} implication should be viewed as purely syntactic sugar, and not as a core part of the semantics of unsafe fn.

unsafe trait and unsafe impl

Traits generally impose requirements on their implementations, but those requirements are not binding. Most famous are Eq and Ord: you’re supposed to implement them a certain way, but you are not bound to do so and, consequently, no code is allowed to produce undefined behaviour merely because an implementation violates those conditions (and of course, getting them to work more nicely is part of the goal in understanding all of this).

The built-in traits Send and Sync need those kinds of guarantees, though. They are relied on to handle multi-threading (and Sync is even a lang item, since the compiler needs to understand it for statics), and this means data races. So we have to have some kind of way to mark a trait as “implementing this trait risks UB if you don’t uphold its contract”. unsafe impl is to unsafe trait as unsafe {} is to unsafe fn: it is the assertion that you have met your proof obligation and so anyone can be absolutely certain, on pain of UB, that you do as required.
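
A rough sketch of that pairing, with a made-up marker trait:

/// Types for which the all-zero bit pattern is a valid value.
///
/// # Safety
/// Implementors guarantee that `std::mem::zeroed::<Self>()` is a valid value;
/// unsafe code is allowed to rely on this.
unsafe trait Zeroable {}

// The unsafe impl is the discharge marker: the implementor asserts that the
// obligation above is met, on pain of UB if they lied.
unsafe impl Zeroable for u32 {}
unsafe impl Zeroable for [u8; 16] {}

// Downstream unsafe code relies on the contract without re-checking it.
fn zeroed<T: Zeroable>() -> T {
  unsafe { std::mem::zeroed() }
}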

unsafe fn/{} and unsafe trait/impl are otherwise unrelated. While both have a proof obligation and a discharge marker, they are orthogonal, because unsafe trait imposes no proof obligation on code that merely uses the trait. Indeed, part of its point is to remove the proof obligations from users of the trait and shift them to the implementors.

Thanks @alercah for these very interesting thoughts! Let me just comment on one thing. 🙂

I would say there is a technical point, and you have hinted at it later: UB is the part of the API contract that the compiler knows about. The compiler has been taught to assume that its input program is UB-free, and that is exploited all over the place for the purpose of optimizations. You acknowledge this later in your post, but I would say that this is a technical difference between UB and other kinds of API constraints.

From a formal methods perspective, "UB is what the compiler knows about" translates into "UB is baked into the very definition of the language semantics itself". That also makes it different from other API constraints, which only locally affect that particular API -- UB literally affects all code. If your language is defined operationally and comes e.g. with a reference interpreter for a "specification machine"/"VM" that is used to define program behavior, then UB must be defined that way as well. It is for this reason that I cannot just propose to use the RustBelt model of borrowing and lifetimes as the definition for aliasing-related UB in Rust -- that's not an operational model. It will (likely) never be immediate UB to use raw pointers to create two aliasing mutable references to a ZST, but we will still consider that illegal code.
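
For concreteness, a sketch of that ZST example:

fn main() {
  let mut z = ();
  let p = &mut z as *mut ();
  // Two aliasing mutable references to a zero-sized type, created through a
  // raw pointer. As noted above, this will likely never be immediate UB under
  // an operational model, but it is still considered illegal code.
  let a = unsafe { &mut *p };
  let b = unsafe { &mut *p };
  let _ = (a, b);
}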

Of course, "UB is what the compiler knows about" doesn't tell us (during language design) which API violations we should make UB and which should be "just" funny logic errors like a misbehaving BTreeMap, and maybe that's what you are referring to. The delineation between UB and other constraints is ultimately arbitrary, but once the choice is made, there is a qualitative, technical difference.

I may be retreading old ground here, but I’ve just been having a think about what it means for code to be “unsafe” or “trusted” related to the example of Vec.

tl;dr version: What if the fields of Vec should be marked unsafe because they must obey invariants to maintain memory safety?

Longer version:

Normally, a safe function promises “I don’t cause memory errors as long as my inputs are legitimate values of their types.” (And an unsafe function hopefully still promises “I don’t cause memory errors as long as my inputs obey some other documented invariants”.)

But, as people have discussed before, safe methods defined inside the vec module can make arbitrary changes to the capacity field. This allows code with no unsafe blocks in it to cause memory errors indirectly. Currently, most safe code is protected from that by the privacy rules – safe code outside the vec module is unable to cause memory errors because it can’t modify the capacity field without an unsafe block (or with one, for that matter). But it just doesn’t seem like the right model to have the privacy boundary be what enforces memory safety.

Going back to my definition of “safe function”, I notice that a Vec with the wrong capacity value is not a legitimate value of type Vec – but there’s nothing in the code that marks that fact. Thinking of the duality between things like unsafe fn which say “you can’t do this unless you obey extra invariants”, and things like unsafe {} which say “I promise I obeyed all the necessary invariants”, what if the fields of Vec were tagged “unsafe” in the sense of “unsafe fn” – you need an unsafe block to modify one of the fields, because you have to promise that you maintained the invariants. By tagging them this way, it becomes permissible for any code to rely on the invariants for memory safety. On the other hand, a safe field would always be permissible to modify in safe code, but it would never be permissible to rely on its value for memory safety.
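
The closest approximation I can think of today, as a sketch (Cap is a made-up type): wrap the invariant-bearing field in a type whose only setter is an unsafe fn, so that every write requires an unsafe block.

pub struct Cap(usize);

impl Cap {
  pub fn get(&self) -> usize {
    self.0
  }

  /// # Safety
  /// The new value must describe the actual size of the allocation, so that
  /// code relying on the capacity invariant stays sound.
  pub unsafe fn set(&mut self, value: usize) {
    self.0 = value;
  }
}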

Some prior art there:

This is proposed pretty frequently, but the problem is that it's not just the capacity which is unsafe; the other two fields are equally unsafe to modify, because the unsafety is a property of the entire data structure, not of any single field (we can't make len > cap, and we have to ensure you don't change buf to a buffer with a different capacity). unsafe would have to apply to every field. And in general, you would have to have any code touching unsafe fields verify the invariants itself, which would be bad for performance. That means that either every field is going to be unsafe, or you're better served by encapsulation, since it lets you put an unsafe boundary on the functions, not just on the field accesses.
