What does `unsafe` mean?


#28

Ok, going to jump into this. I think the double meaning of unsafe creates additional confusion in this discussion, because of the implications of propagation. Right now, the rule is, “Any function which calls an unsafe function must mask it with an unsafe block or itself be unsafe,” and this looks an awful lot like an effect. But if we separate out the two meanings, and dropped the implicit unsafe { } around unsafe functions, the rule becomes, “Any call to an unsafe function must be in an unsafe block,” which doesn’t seem very much like an effect.


#29

I don’t understand why this line of reasoning is OK tho. Why do you get to drop unsafe { .. }? If you don’t have the masking of unsafe fn, what is the point of unsafe fn even if it does not allow you to build safe interfaces atop of it.


#30

I meant dropping the feature where unsafe fn implies unsafe { }, and requiring it to be explicit in all cases. This would mean that you must always mask a call to an unsafe fn.


#31

We discussed this further at #rust-lang.


#32

I want to summarize my thoughts from that conversation first, but there is one clarification I’d like to get first because I feel like I’m missing something. @eternaleye talked here and on IRC about how the unsafe effect is a real effect because it allows access to unsafe primitives, but then that actually means that you can’t mask unsafety in the general case, as I understand it. Masking requires that the side effects take place effectively in an isolated environment, so that the surrounding code cannot witness the results. This is true for something like Vec, but it’s not true for places where unsafe code has extra mutability powers like mutable statics or interior mutability. Safe code can never cause these mutations, but unsafe code can, and even if you hide the unsafety behind a safe interface as RefCell and Mutex do, the side effects are not actually masked since they can be observed by safe code subsequently.


#33

I wouldn’t say I know precisely what an “effect” means theoretically, but I noticed something different between the “async” and “try” effects and the “unsafe” one: for the former two, you can call a function that returns impl Future or Result then “apply the effect” later with await or ?, but one cannot call something unsafe from something safe then “apply the unsafety” later.

Is that a meaningful difference?


Pre-RFC: flexible `try fn`
#34

If you were to construct a trait impl that invoked deferred unsafety, then you’d have a similar situation to the “async” and “try” effects. unsafe simply means that the compiler can’t deduce all of the conditions that Rust requires for safe code, so the code author must ensure that Rust’s required conditions hold. I don’t see that the relative timing at which the potential UB occurs is generally an important consideration (though it might be in some circumstances).


#35

You can by marking the whole function unsafe. The orthogonality of unsafe breaks down for unsafe Traits which aren’t “invoked” so there’s no effect. unsafe Traits are almost more compiler enforced documentation than effects.


#36

Ok, so. I spent a fair amount of time thinking about proof obligations in general and UB in particular but need to write a ramble before I think I can get much further. I wrote this up and figure that it should probably go in the thread, but a lot of the ideas aren’t especially new. So

Why is UB special?

Proof obligations are closely related to API contracts. For instance, traits often specify how their implementations behave, and consumers of the trait only promise sensible behaviour if the rules are upheld. An example is that BTreeMap will do weird things (not explicitly provided for in the documentation) if the Ord instance on keys doesn’t meet the requirements. The consumer of the API, in order to get the benefit of BTreeMap's promised behaviour, must meet the requirements.

Undefined behaviour rules can be seen as the same kind of a contract: a program must not perform any behaviour that is undefined if it wants to execute as specified. So what separates UB from API contracts? I realized it’s a lot less that you might intuitively think. Safe Rust is free to interact with the surrounding environment, and the stdlib documentation makes absolutely no promises about what happens if you use BTreeMap incorrectly (it simply says “it’s an error”). There are no promises on the behaviour, so BTreeMap could theoretically start NetHack, send an embarassing email to your mother, or boot up an attached DeathStation 9000, all from safe Rust and without violating the API contract. The contract and safe Rust are not even, on their own, enough to say it won’t corrupt memory because the aliasing rules don’t stop a process from writing to /proc/self/mem.

If this is the case, though, why is UB special? I couldn’t really think of a fundamental technical reason to draw a special distinction around it, given the above. So it needs to be philosophical instead. Eventually I realized that we are willing to impart some level of trust on library authors that they will restrict the potential scope of “undefined behaviour” in their libraries to something reasonable (and I am purposefully vague here because it is ultimately subjective). For instance, BTreeMap is an in-memory data structure, so we trust that screwing it up will cause the in-memory data it manages to do weird things rather than do completely unrelated I/O. Safety guarantees allow us to be sure that unless it tries really hard (or violates the safety rules in unsafe code), it won’t cause action at a distance.

On the contrary, we don’t give the same trust at all to the compiler and the hardware. They are both basically evil, and will potentially do all sorts of weird and very undesirable things if you aren’t careful; there are many wonderful stories out there. And unlike library authors, they cannot be trusted to keep things compartmentalized: we’ve all seen horror stories about the things compilers produce by assuming “undefined behaviour never happens” and working backwards, and even without that there are plenty of other ways to completely destroy program integrity with only a tiny mistake.

I think there’s also a lesser point which is that undefined behaviour is often very subtle, and Rust doesn’t try to avoid that fact. The contract requirements of an API are typically much less difficult to understand and apply than the rules about UB. So if we assume library actors are neither actively malicious nor mindbogglingly incompetent (an implementation of string sort that runs sort(1) on /proc/self/mem, say), UB rules are close to the most difficult and most dangerous that we can expect most coders to encounter (with one caveat, which I’ll address in a moment).

So this is why Rust chooses to draw a line around undefined behaviour: it’s almost at the top of the pack in both subtlety and (practical) danger, and I’ll avoid a digression as to the precise definition of it because it seems that people smarter than me picked a reasonable place to draw the line. The line could be drawn in a different place: atomics, for instance, are more subtle than some UB rules and can produce weird logic errors due to compiler and hardware optimizations if misused. But it’s drawn where it is and there is good reason to draw a line around undefined behaviour specifically.

Proof obligations and safe code

Now that we know why we want to draw the line, the question is how. Ultimately, the goal to me seems to be making it so that most Rust code can pretend that undefined behaviour doesn’t exist. In other words, we want to make it so that a standard user of Rust can prove trivially that they do not cause UB. Thankfully (and intentionally), a large subset of the language is, on its own, incapable of causing UB. So as long as you stick to the safe subset, you have no proof obligations. But if you use one of the language or library primitives that aren’t safe, then you need to be able to prove that they are safe.

The proof obligation isn’t limited to the code actually containing the unsafe operations, as others have observed. BThe Vec example is a classic case: one must prove that Vec's internal invariants hold in order to prevent operations from causing UB. But there are others: for instance, unsafe code must prove that it never provides a &mut reference to read-only memory to unknown code—the existence of such a reference is legal, but assigning to its referent, which safe code can do, is not. Abstraction boundaries provide a tool to limit the scope of code that must be considered in proving a given unsafe operation causes UB, however.

This lets us decide that unsafe {} is actually a useful construct. By cordoning off the unsafe primitives, we force an author who uses it to say “No, really, I understand the contract and I’m absolutely sure that this will not cause UB.” Abstraction boundaries provide a tool to make this useful; by limiting the scope of code that must be considered, a library author can actually provide a proof of safety without needing to understand the user’s code, and a consumer of the library is thus entitled to rely on that proof, making their own trivial.

So unsafe {} is fairly easy to understand, but what about the other three uses of unsafe? How do they arise and what obligations do they create?

unsafe fn

Suppose I am writing my own container and I want to offer an unchecked indexing function for speed, just like get_unchecked. Under the hood, it does something similar, calling to unsafe {} code. This presents a problem, however, when I go to write my proof. I don’t want to prove that the call is safe; instead, I want my callee to prove it.

That’s where unsafe fn comes in: it allows me to say “I only promise to not cause undefined behaviour under certain conditions”, and carefully document them. My function then enters into the same class as unsafe primitives: they must be used carefully and may interact in subtle ways. Since the consequence of violating my conditions is UB, it is just as potentially dangerous as unsafe primitives and using unsafe {} as the marker of proof obligations is just as useful, so we reuse it.

We could apply it generally to proof obligations that callees have, but that would likely dilute things somewhat, by encouraging all functions with preconditions to be unsafe so that the caller was responsible for them. Then all code would be unsafe {} and we haven’t gotten ahead. We could imagine a world where unsafe(no_ub) and unsafe(sort_actually_sorts) were a whole namespace of different obligations, but they aren’t, so tough.

We also have a rule that the body of an unsafe fn is implicitly wrapped in unsafe {}. There’s no inherent reason that this needs to be the case: unsafe fn is about a contract with the caller, while unsafe {} is about a contract with the callee. The reason for this implicit block is that a function with no unsafe {} cannot, on its own, cause UB. Of course, it can in conjunction with other functions, so it becomes a matter of style, as with the scope of unsafe {} around one expression or an entire function body. But I think this doesn’t change that the unsafe fn -> unsafe {} implication is one that should be viewed as purely a syntactic sugar, and not a core part of the semantics of unsafe fn.

unsafe trait and unsafe impl

Traits generally impose requests of their implementations, but they are not binding. Most famous are Eq and Ord: you’re supposed to implement them a certain way, but you are not bound to do so and, consequently, no code can produce undefined behaviour only because they violate those conditions (and of course, getting them to work more nicely is part of the goal in understanding this all).

The built-in traits Send and Sync need those kinds of guarantees, though. They are relied on to handle multi-threading (and Sync is even a lang item, since the compiler needs to understand it for statics), and this means data races. So we have to have some kind of way to mark a trait as “implementing this trait risks UB if you don’t uphold its contract”. unsafe impl is to unsafe trait as unsafe {} is to unsafe fn: it is the assertion that you have met your proof obligation and so anyone can be absolutely certain, on pain of UB, that you do as required.

unsafe fn/{} and unsafe trait/impl are otherwise unrelated. While both have a proof obligation and discharge marker, they are orthogonal, because there is no proof obligation imposed by unsafe trait on callees. Indeed, part of its point is to remove the proof obligations from callees and shift them to the implementors.


Proposal: reboot the Unsafe Code Guidelines team as a Working Group
#37

Thanks @alercah for these very interesting thoughts! Let me just comment on one thing. :slight_smile:

I would say there is a technical point, and you have hinted at it later: UB is the part of the API contract that the compiler knows about. The compiler has been taught to assume that its input program is UB-free, and that is exploited all over the place for the purpose of optimizations. You are admitting this later in your post, but I would say that this is a technical difference between UB and other kinds of API contraints.

From a formal methods perspective, “UB is what the compiler knows about” translates into “UB is baked into the very definition of the language semantics itself”. That also makes it different than other API constraints which only locally affect that particular API – UB literally affects all code. If your language is defined operationally and comes e.g. with a reference interpreter for a “specification machine”/“VM” that is used to define program behavior, then UB must be defined that way as well. It is for this reason that I cannot just propose to use the RustBelt model of borrowing and lifetimes as the definition for aliasing-related UB in Rust – that’s not an operational model. It will (likely) never be immediate UB to use raw pointer to have two aliasing mutable references to a ZST, but we will still consider that illegal code.

Of course, “UB is what the compiler knows about” doesn’t tell us (during language design) which API violations we should make UB and which “just” funny logic errors like a misbehaving BTreeMap, and maybe that’s what you are referring to. The delineation between UB and other constraints is ultimately arbitrary, but once the choice is made, there is a qualitative, technical difference.


Disabling 'unsafe' by default
#38

I may be retreading old ground here, but I’ve just been having a think about what it means for code to be “unsafe” or “trusted” related to the example of Vec.

tl;dr version: What if the fields of Vec should be marked unsafe because they must obey invariants to maintain memory safety?

Longer version:

Normally, a safe function promises “I don’t cause memory errors as long as my inputs are legitimate values of their types.” (And an unsafe function hopefully still promises “I don’t cause memory errors as long as my inputs obey some other documented invariants”.)

But, as people have discussed before, safe methods defined inside the vec module can make arbitrary changes to the capacity field. This allows code with no unsafe blocks in it to cause memory errors indirectly. Currently, most safe code is protected from that by the privacy rules – safe code outside the vec module is unable to cause memory errors because it can’t modify the capacity field without an unsafe block (or with one, for that matter). But it just doesn’t seem like the right model to have the privacy boundary be what enforces memory safety.

Going back to my definition of “safe function”, I notice that a Vec with the wrong capacity value is not a legitimate value of type Vec – but there’s nothing in the code that marks that fact. Thinking of the duality between things like unsafe fn which say “you can’t do this unless you obey extra invariants”, and things like unsafe {} which say “I promise I obeyed all the necessary invariants”, what if the fields of Vec were tagged “unsafe” in the sense of “unsafe fn” – you need an unsafe block to modify one of the fields, because you have to promise that you maintained the invariants. By tagging them this way, it becomes permissible for any code to rely on the invariants for memory safety. On the other hand, a safe field would always be permissible to modify in safe code, but it would never be permissible to rely on its value for memory safety.


#39

Some prior art there:


#40

This is proposed pretty frequently, but the problem is that the it’s not just the capacity which is unsafe; the other two fields are equally unsafe to modify because the unsafety is for the entire data structure, not any single field (we can’t make len > cap, and we have to ensure you don’t change the buf to a buffer with a different capacity). unsafe would have to apply to every field. And in general, you would have to have any code touching unsafe fields verify the invariants itself, which would be bad performance, which means that either every field is going to be unsafe or you’re better served by encapsulation since it lets you put an unsafe boundary on the functions, not just on the field accesses.


#41

@scottmcm: Thanks for the links!

@alercah: I was only using capacity as an example – naturally, all 3 fields would be marked unsafe.

No, this is exactly the opposite of my idea – marking them unsafe would be exactly what allows you to not verify the invariants at runtime.


#42

No, this is exactly the opposite of my idea – marking them unsafe would be exactly what allows you to not verify the invariants at runtime.

Err, yes, I was referring to the safe fields. Any interaction between safe and unsafe fields would need to have invariants checked. Only invariants relating to exclusively unsafe fields would be required to be checked.

Ultimately, I don’t think that there’s in principle a reason that we couldn’t introduce a proof obligation like this here. It’s just that experience has shown it probably wouldn’t be that valuable.


#43

Why not?

To me, most of what you’re saying is the standard argument why the privacy boundary must be a key part of enforcing memory safety. Whether a module with unsafe code is sound or not depends partially on what APIs it chooses to expose, and afaik that’s a fundamental inevitable fact, not something we could change by repositioning the unsafe keyword.

My understanding of past proposals for variations on unsafe fields or other unsafety syntax was that their motivation was either ergonomics (the easier it is to read/write unsafe code, the easier it is to get it right) or enabling the compiler to check for some subsets of unsound behavior, not to replace or modify the “base case” of privacy enforcing soundness. So I find it surprising and unusual that you think the idea of the privacy boundary enforcing unsafety is itself questionable, and I’d really like to dig into that if you can elaborate on what you meant by it.


#44

@alercah, @Ixrec: What both of you are saying makes sense. And the “unsafe fields” concept doesn’t take care of everything, like the Cell example from Pre-RFC: Unsafe Types or the fact that Vec doesn’t depend only on its fields, but also what memory has been allocated (although I suppose freeing the memory is also unsafe?).

I think I have 2 main reasons to look askance at using module boundaries for safety:

  • It means anything in the same module is also unsafe (in the sense of promising to uphold guarantees), even if it’s not marked that way – I would be more comfortable reading the code if it was all marked explicitly, even if that was more verbose.
  • Ergonomics – in order to limit what code falls in the same module as the unsafe stuff, I might want to make more submodules than usual, which could involve writing a lot of boilerplate (e.g. re-exports and accessor methods). I’ve always found splitting Rust code into submodules to be a bit inconvenient for my taste.

Thinking about it, neither of these problems seem insurmountable, but I’m not sure how I would make the overall system feel clean enough to me.


#45

I still think unsafe fields/structs are a good idea, but they should not replace the privacy boundary. They should put additional spotlight on the places where those fields are mutated. So I actually think only private fields should be allowed to be unsafe.

This could also have effects like, e.g., not deriving auto-traits once there is an unsafe field. See this post for why that might be a good idea. On the other hand this might make auto traits much less useful, so I am somewhat torn here – generally, the interaction between auto traits and private invariants is not nice.


#46

As a small point, it seems to me that public unsafe fields are entirely compatible with keeping the privacy boundary. If the privacy concept is “a module may not publicize a way to cause memory errors without using an unsafe block”, and modifying an unsafe field requires an unsafe block, there’s no contradiction. I don’t see a strong reason to allow public unsafe fields when you can just implement safe getter methods and unsafe setter methods, but I don’t see a reason not to, either.


#47

Ok, given some discussion on Discord, I think it’s time to look at the idea of a trusted impl T for Ord.

This idea is that a trusted implementation of Ord fulfills the extra invariants of Ord, but to the point where unsafe code can rely on it. Now, when writing your Ord implementation, you must be careful, but the result is that unsafe code is free to optimize in such a way that you get nose demons if your implementation breaks the invariants. This is in contrast to right now, where unsafe code has to assume that trait implementations can do absolutely anything safe Rust can do, which is basically anything other than cause memory unsafety directly. Notably it includes not following any of the invariants, not even the ones that default implementations meet (since they can always be overridden).

In principle, there is no reason this needs to be related to generic code. trusted fn in general could simply mean “there are additional invariants which this function promises to uphold, on pain of UB”. But on a simple standalone function, this isn’t actually very meaningful: it doesn’t inherently document which invariants are promised, so the caller trying to rely on them needs to look at the documentation to see what the invariants are. But the actual trust situation ends up being no different than for a regular function, because the author of the unsafe code still knows exactly what their calling and is responsible for it; trusted might as well just be documentation.

On a trait, however, there can be invariants defined by the trait itself, and then a trusted implementation is saying “I meet my trait’s invariants on pain of UB”. This includes ensuring that it behaves exactly as described, including every method and the relationship between them (e.g. for trusted PartialOrd, that the various operator functions correspond exactly with cmp). The key difference is that, in this case, the unsafe code does not know anything about the implementation by default, and trusted tells it that there are things it can rely on.

This made me think about higher-order functions: what if, rather than traits, we had the idea of a trusted closure, when a trait is too unwieldy. And this in turn made me realize that trusted isn’t really about code at all, because functions are just data, and there’s no real reason that you couldn’t apply trusted to other data as well, by encoding any invariants in a type that isn’t used in a bound (imagine trusted SortedVec<T>). The difference is merely whether the invariant is necessarily upheld for an entire type (the Ord case) or merely for individual values (the SortedVec case), but note that dyn trusted Ord already blurs these lines. And, importantly, both of these cases allow a proof obligation to be moved because safe code can talk about values meeting certain invariants without necessarily being unsafe itself.

While that example encodes the invariant into the type, it could apply to data meeting constraints specified by the function as well. For example, if a function wants to be passed a valid index or risk UB, we could write fn foo<T>(v: Vec<T>, ix: trusted usize) rather than unsafe fn foo<T>(v: Vec<T>, ix: usize). This doesn’t provide much benefit in terms of moving proof obligations, but it does reduce their scope: the first foo is clear directly from its type that it can accept any value of v, any T, and any global state, and that the only special thing that a caller needs to do is promise that ix is valid for whatever foo's definition of valid is. The second foo, on the other hand, could have arbitrary restrictions on any part of the program.

There’s more to come later, but this seems like a good stopping point and I need to get going for now. I will say that my instinct tells me that trusted SortedVec<T> is perhaps useful, but trusted usize is not likely to be.