Language vision regarding safety guarantees

Summary

There are regular statements that "safe code cannot cause UB", including from Rust team members. This statement is technically wrong. The correct statement is "safe code in clients cannot cause UB". I'm wondering why these statements are made (i.e. whether they are simplifications or have unstated assumptions). In particular I'm worried that it gives the wrong impression to people unfamiliar with Rust, who might take such statements at face value.

In scope for this thread:

  • Why does the language claim that "safe code cannot cause UB"?
  • Why doesn't the language guarantee that "safe code cannot cause UB"?
  • Does the language plan to guarantee that "safe code cannot cause UB" in the future?

Out of scope for this thread (please open a new thread and link it here if desired):

  • Why the language guarantees that "safe code in clients cannot cause UB"?
  • How well safety contracts are proven? (This thread is only about safety contracts, not about how they are verified.)
  • How formal safety contracts are. (This thread is only about the content of safety contracts, not about how they are written.)

Details

Terminology for unsafe (expand if unfamiliar)

In the context of unsafe, code can be classified as follows:

|------------unsafe-code-------------|-----------------safe-code------------------|
|-directly-unsafe-code-|             |--seemingly-safe-code--|
|------------------actually-unsafe-code----------------------|-actually-safe-code-|

|-----may-have-UB------|
|---------has-unsafe-keyword---------|
|-----------------------may-cause-UB-------------------------|
  • Directly-unsafe code means code that may trigger undefined behavior unless some property holds, which the type system cannot check and thus the programmer must check. Typical examples are dereferencing a raw pointer, accessing a mutable static variable, and accessing a field of a union.
  • Unsafe API means API annotated with the unsafe keyword. They are meant to propagate safety requirements when needed. Typical examples are unsafe functions and unsafe traits.
  • Unsafe code means code annotated with the unsafe keyword. This contains directly-unsafe code and code using an unsafe API. Typical examples (that are not directly-unsafe) are calling an unsafe function[1] and implementing an unsafe trait.
  • Safe code means code not annotated with the unsafe keyword. This is the complement of unsafe code.
  • Actually-unsafe code means code that may cause undefined behavior unless some property holds, which the type system cannot check and thus the programmer must check. This definition is equivalent to the scope of unsafe: code that has to be checked by the programmer for properties that directly-unsafe code relies on to not trigger undefined behavior. This contains code that the programmer must check to prevent undefined behavior somewhere. Typical examples (that are not unsafe) are the body of Vec::set_len() or the hypothetical Vec::evil() of the scope of unsafe article.
  • Seemingly-safe code means code that is safe and actually-unsafe, as introduced by the scope of unsafe article. This is the complement of unsafe code within actually-unsafe code.
  • Actually-safe code means code that is safe and not actually-unsafe. This is the complement of seemingly-safe code within safe code.

The mismatch between unsafe code and actually-unsafe code (and between safe code and actually-safe code) is not the subject of this thread. How to make those concepts match is described in the unsafe mental model. This thread is about the language convention regarding safety guarantees, in particular how much control users have to write actually-safe code (and if they can at all).

Terminology for contracts (expand if unfamiliar)
  • A unit of implementation is code whose parts cannot be independently updated and are thus tightly coupled. There is no unique objective choice. In particular, a unit of implementation can always be split in smaller units of implementation. The maximal example is a crate linked with its pinned dependencies (and so recursively). More common examples are a crate and a module. Minimal examples are a function definition and a trait implementation.
  • A client of a unit of implementation is another unit of implementation that uses it. A dependency of a unit of implementation is another unit of implementation that it uses. Those concepts are opposite of each other: if X is a client of Y then Y is a dependency of X.
  • A public API is how a unit of implementation interfaces with its client and dependencies. A unit of implementation chooses the public API with its clients and chooses its dependencies based on the public API they chose for their clients.
  • A public API defines (among other things) a logic contract (resp. safety contract) used to prove that programs execute their intended logic (resp. execute without undefined behavior). Note that the safety contract is only meaningful for unsafe code. Safe code only sees the logic contract.
  • An X contract has X requirements and X guarantees from one side to the other (where X is logic or safety).
  • A unit of implementation is correct for X, if it satisfies its side of the X contract of all the public APIs it is involved in, assuming the other sides also satisfy their side of the contract. That unit of implementation is said to trust the other sides to correctly implement the contract, when their implementation is unknown. In contrast, a unit of implementation does not need to trust another unit of implementation if they are both part of the same unit of implementation, they can simply use the other unit of implementation ignoring any possible internal contract between them.

When designing a public API, the logic contract can be freely chosen but the safety contract must follow these constraints:

  • The safe part of public APIs must not have safety requirements. Public APIs should avoid having safety requirements. This constraint prevents the scope of unsafe to leak from a unit of implementation to its clients through the safe part of its public API. This constraint however allows the scope of unsafe to leak through the unsafe part of that public API, which the client may choose to use or not. In particular, the client is in control of whether (and how) it ends up in the scope of unsafe of its dependencies.
  • The safety guarantees of public APIs must be their logic contract. Unsafe code should avoid relying on safety guarantees. This constraint allows the scope of unsafe to leak from a unit of implementation to its dependencies. There is no opt-out. In particular, a dependency is not in control of whether it ends up in the scope of unsafe of its clients.
Illustration for functions

Public APIs for functions must have the following shape:

#[logic_requires(log_req)]
#[logic_ensures(|res| log_ens)]
#[safety_requires(safe_req)] // trivial (or absent) for safe functions
#[safety_ensures(|res| old(log_req) => log_ens)]
// `X => Y` means `Y || !X` and `old(X)` means "X held before the call"

The first constraint says that safe functions cannot choose safe_req (it must be true). The second constraint says that safe_ens cannot be chosen and must be old(log_req) => log_ens, including for safe functions.

Let's consider the following alternative to the second constraint:

  • Public APIs must explicitly document their safety guarantees (they have none by default). Public APIs should avoid having safety guarantees. This constraint prevents the scope of unsafe to leak from a unit of implementation to its dependencies unless explicitly authorized, which the dependency may choose to which extent. In particular, a dependency is in control of whether (and how) it ends up in the scope of unsafe of its clients.

All crates are virtually in the scope of unsafe

All crates can cause undefined behavior, even if they use #![forbid(unsafe_code)].

Let's consider a crate author that doesn't want to deal with undefined behavior. They write a crate foo which uses only safe code and forbids unsafe code. Another crate author writes a crate bar which depends on foo. They write unsafe code which relies on the safety guarantees of foo (which is the logic contract of foo). It's all working.

The initial author refactors their crate foo and publish a patch or minor version. Sadly this new version is not correct for logic anymore (or maybe the one before wasn't either, but it didn't impact bar). A third user who wrote a program using bar now runs cargo update, then builds and deploys the binary. A few days later their bitcoins are stolen due to undefined behavior in their program. Who is at fault? Which crate is unsound (in the sense of incorrect for safety)?

From a theoretical point of view, the bug is in foo. But it's a logic bug, and absolutely no crate in the program handles bitcoins. A logic bug in the safe crate foo caused undefined behavior because it was in the scope of unsafe of the sound (in the sense of correct for safety) crate bar.

From a vulnerability point of view, one could say that bar is at fault, because it's the code that triggered undefined behavior. While it is permitted for bar to rely on foo being correct for logic (through its safety guarantees) for the purpose of safety, it is also not recommended to do so.

My opinion is that the language is at fault, because there is no way for foo to opt-out of the scope of unsafe of its clients, and bar thus didn't know foo's intentions and couldn't respect them.

For more context, the following claim suggests that foo is not at fault (regardless of whether it would have been fine providing safety guarantees):

If you do not use unsafe in your code, you know that any potential causes of memory unsafety are not your fault.

The following hypothesis suggests that bar is at fault by not being self-contained:

Unsafe code blocks should be straightforward and self-contained to minimise the amount of code that developers have to vouch for, e.g. through manual reviews.

Self-containedness: Is the behaviour of unsafe code dependent only on code in its own crate?

But the current convention says that bar is not at fault because unsafe code can rely on correctness of its dependencies, which on the one side rejects the preceding hypothesis and on the other fails to attribute the fault on any of the involved crates when taken together with the first claim.

With the alternative constraint, foo would have no safety guarantees by default, thus making sure it only contains actually-safe code, and preventing bar to rely on its logic contract. If it wanted to, it could document exactly which safety guarantees it is ready to support. Then bar would either use them if they would be sufficient, or need to find an alternative crate or implement the logic on their own.

Reviewing a crate for safety is unreasonably costly

Why review a crate for safety?

In the context of security, a crate:

  • Can be thoroughly audited for safety (since undefined behavior may provide arbitrary code execution, regardless of what the actual code is supposed to do).
  • Only skimmed for specific logic errors (insufficient user input validation, incompatible time-of-check and time-of-use, etc).

This strategy is a nice spot in the "security guarantees" versus "audit effort" space. Having to thoroughly review for logic would be a much worse trade-off.

Because the logic contract is embedded in the safety contract, proving that a unit of implementation is correct for safety essentially amounts to proving that it is correct for logic. Because logic contracts are usually orders of magnitude more complex than safety contracts, reviewing a crate for safety is much more costly than it would have been with the alternative constraint. From the example of the previous section, this means that it is not possible to review bar for safety without reviewing foo for logic (past, current, and future versions).

Note that this alternative already exists in some form today. Some crates document as "Safety-usable invariant" the safety guarantees that are used by their clients (these documentation are added while reviewing those clients, thus minimizing safety guarantees to what is necessary). But this is more of a mix (or trade-off) between the current and alternative constraints. Some crates are assumed to be correct for logic while others need safety guarantees be documented.

A logic bug is a safety bug

Because the logic contract is embedded in the safety contract, a logic bug is a safety bug. This blurs the distinction between safety and logic, which is one of the primary ways Rust is meant to differ from C. In particular, the type system is supposed to provide the safety contract for actually-safe functions.

In other words, in Rust (like in C) there's essentially only one contract, because the logic contract and the safety contract are equivalent. This equivalence follows from the fact that the logic requirements must imply the safety requirements and the fact that the safety guarantees are the logic contract.

With the alternative constraint the concepts of logic and safety are disjoint. You cannot have a safety bug because of a logic bug.

Asymmetry with safety requirements

There are 4 constraints regarding safety contracts depending on whether we look at safety requirements or guarantees and whether we look at whether the scope of unsafe should be encapsulated or leak:

Safety requirements Safety guarantees
Encapsulated scope of unsafe (Current) Safety requirements must be documented and should be avoided (Proposed) Safety guarantees must be documented and should be avoided
Leaky scope of unsafe (Discounted) Safety requirements must be the logic requirements and unsafe code should avoid relying on them (Current) Safety guarantees must be the logic contract and unsafe code should avoid relying on them

Today we choose differently between requirements and guarantees. The alternative constraint mimics the choice for requirements, making sure the scope of unsafe can be encapsulated (from both clients and dependencies, instead of just from dependencies).

The other option would be to change the constraint for safety requirements to mimic the one of guarantees. In that case we get exactly the situation in C, where there's only one contract (the logic contract) and code should avoid relying on those logic properties to not trigger undefined behavior (since those properties can easily break, due to their complexity).

The logic contract is not necessarily what unsafe code needs

The logic requirements of a dependency might be too strong compared to what a unit of implementation may be able to ensure based on the safety requirements it has with its clients. Similarly, the logic guarantees might be too strong compared to what it actually needs. For example if unsafe code needs a sort function to return a permutation if its comparison function is deterministic, it wouldn't be able to use the logic contract that says it returns the input sorted if the comparison function implements a total order.

With the alternative constraint, the sort function could document this safety contract (and others, including the logic contract).

The standard library is correct for logic

It seems that the alternative constraint would be very verbose for crates that actually want to guarantee their logic contract for safety, like the standard library. This could be overcome by stating the safety guarantees at crate-level rather than at each item: "This crate guarantees for safety that it is correct for logic (unless explicitly stated)."

The current constraint is not an issue in practice

As long as unsafe code only relies on the correctness of its dependencies in extremely rare and well-chosen cases (like the standard library or very well-written crates), there is no issue. This can be explained by those well-chosen crates upholding their logic contract to the highest level, in particular high enough for safety purposes.

Note that this hypothesis used to hold in 2020, and probably still holds in the vast majority of cases. Maybe a few security vulnerability from time to time is not worth the explicit documentation of safety guarantees (and letting crates opt out of the scope of unsafe).

Question: is the current constraint a deliberate choice?

When Rust team members state that "safe code cannot cause UB", I wonder how much nuance is implied and what the long-term language vision actually is. Assuming the current constraint, I can see two non-exclusive interpretations:

  • It's just a shorthand and the actual statement is more subtle. In particular, a safe binary crate cannot cause UB, but safe libraries can implicitly cause UB because they cannot control how unsafe clients rely on their logic.
  • It assumes that unsafe code will only rely on the correctness of dependencies that have a high probability of being correct (such as the standard library). A similar assumption applied to clients would be untenable since they are written by arbitrary authors (which can't all be held accountable to writing correct code).

But this could also hide a long-term vision based on the alternative constraint, allowing crates to write actually-safe code independently of their clients and dependencies. So I'm curious to know the official reasoning behind such statements.


  1. which is not an intrinsic ↩︎

2 Likes

I personally also encounter this problem. I found that one may easily get lost in this long post without examples, so I want to add some examples for others to understand the problem. @ia0 feel free to correct me if the examples don't represent the problem :slight_smile:

0. Basic

The documentation of Ord:

Implementations must be consistent with the PartialOrd implementation, and ensure max, min, and clamp are consistent with cmp:

  • partial_cmp(a, b) == Some(cmp(a, b)).
  • max(a, b) == max_by(a, b, cmp) (ensured by the default implementation).
  • min(a, b) == min_by(a, b, cmp) (ensured by the default implementation).
  • For a.clamp(min, max), see the method docs.

Violating these requirements is a logic error. The behavior resulting from a logic error is not specified, but users of the trait must ensure that such logic errors do not result in undefined behavior. This means that unsafe code must not rely on the correctness of these methods.

1. Contract for downstream caller

While in The Rustonomicon, it said:

The design of the safe/unsafe split means that there is an asymmetric trust relationship between Safe and Unsafe Rust. Safe Rust inherently has to trust that any Unsafe Rust it touches has been written correctly. On the other hand, Unsafe Rust cannot trust Safe Rust without care.

As an example, Rust has the PartialOrd and Ord traits to differentiate between types which can “just” be compared, and those that provide a “total” ordering (which basically means that comparison behaves reasonably).

BTreeMap doesn’t really make sense for partially-ordered types, and so it requires that its keys implement Ord. However, BTreeMap has Unsafe Rust code inside of its implementation. Because it would be unacceptable for a sloppy Ord implementation (which is Safe to write) to cause Undefined Behavior, the Unsafe code in BTreeMap must be written to be robust against Ord implementations which aren’t actually total — even though that’s the whole point of requiring Ord.

The Unsafe Rust code just can’t trust the Safe Rust code to be written correctly. That said, BTreeMapwill still behave completely erratically if you feed in values that don’t have a total ordering. It just won’t ever cause Undefined Behavior.

This means it is suggested that:

pub fn safe_api<T: Ord>() {
    check_t_implement_ord_logic_contract_correctly()?;
    unsafe { use_ord_logic_contract_for_unsafe_operation(); }
}

2. Contract for upstream callee?

So far so good. However, if the T does not come from downstream user, but rather from an upstream crate:

extern crate other_crate;
pub fn safe_api() {
    use other_crate::AnOrdImpl as T;
    unsafe { use_ord_logic_contract_for_unsafe_operation(); }
}

Should there be a check?

3. Contract for upstream std?

Moreover, if the T is not from a third-party crate, but from std. What should we do?

pub fn safe_api() {
    use std::AnOrdImpl as T;
    unsafe { use_ord_logic_contract_for_unsafe_operation(); }
}

4. Answer from The Rustonomicon

In the same section in the above Rustonomicon section, there is an "answer" to question 3:

One may wonder, if BTreeMap cannot trust Ord because it’s Safe, why can it trust any Safe code? For instance BTreeMap relies on integers and slices to be implemented correctly. Those are safe too, right?

The difference is one of scope. When BTreeMap relies on integers and slices, it’s relying on one very specific implementation. This is a measured risk that can be weighed against the benefit. In this case there’s basically zero risk; if integers and slices are broken, everyone is broken. Also, they’re maintained by the same people who maintain BTreeMap, so it’s easy to keep tabs on them.

On the other hand, BTreeMap’s key type is generic. Trusting its Ord implementation means trusting every Ord implementation in the past, present, and future. Here the risk is high: someone somewhere is going to make a mistake and mess up their Ord implementation, or even just straight up lie about providing a total ordering because “it seems to work”. When that happens, BTreeMapneeds to be prepared.

The same logic applies to trusting a closure that’s passed to you to behave correctly.

Which I don't agree and I think is somewhat superficial... What I want is a formal best practice, not something about "less" or "more" risk.

I also want to talk about my understanding about this problem.

Understand the contract

"Contract" is a mature thing in programming, which includes "requires" and "ensures" that pre and post the target function. The ultimate goal of contract is to modularize verification.

In theory, for any executable, we can verify the correctness from entrypoint to end, sentence-by-sentence, entering every sub-function call, forking at every conditional branch. However, obviously, this is not practical.

Function-level contract saves the world: for a single function, we write contract of what it takes ("requires", or requirements), and what it returns ("ensures", or guarantees). The verification will be done to check whether "requires" + function body can derive "ensures". When encountering a sub-function call, it will not enter, but directly apply the "requires" and "ensures" of that sub-function.

After understanding this, the "safety-" and "logic-" contracts are clear for "requires", as posted by @ia0 :

A public API defines (among other things) a logic contract (resp. safety contract) used to prove that programs execute their intended logic (resp. execute without undefined behavior).

However, what does it mean for "safety-" / "logic-" contracts for "ensures"? One can never image how the downstream users use this function's output contract. In my point of view, it is meaningless to talk about whether an "ensures" contract is for safety or for logic.

Use the contract

In the old world of "safe / unsafe" without contract, the responsibility is clear: The caller of unsafe callee is responsible to make sure the "Safety" section of callee is meet in caller. If an UB is occurred:

  • If the triggering condition is not described in "Safety" section of callee, then it's callee's fault
  • If not, it's caller's fault

However, the "make sure" part is not clear and more subjective. Rust has no mechanism to check whether the caller can make sure, which is for contract.

In the world of "contract", (if all related crates have well-written contracts), there is nothing subjective. The caller cannot "claim" to make sure, everything is verifiable. A safe and verified code will never cause UB.

And if we want this to be true, then answer to the question 2/3 I posted above is that, it's callee's responsibility to check every conditions that not covered in the "ensures" of upstream crate / std. (Hopefully, Rust compiler could optimize those checks if they are trivial)

So in short, if we add contract to Rust, then the question about "safe code never cause UB" can be resolved by adding "verified" condition. And by verifying, there is no room for subjective and everything is verifiable.

How is check_t_implement_ord_logic_contract_correctly() is supposed to work?

It's making a check at runtime, but we need a check at compile time. Probably using something like Creusot or flux (or the many other verification frameworks - not sure why there are so many), if they can verify contracts in traits, and give a proof of correctness. Since most people don't do that, the BTreeMap must not cause UB even if the Ord impl of the type of the key has bugs.

But if we had a proof of correctness of some code, we could rely on it for soundness. If we don't, we must manually inspect the code and personally assess whether it is correct. The unsafe { } block is a way of signaling you did all the work (including auditing all and every safe code you rely upon). That's the price of unsafe.

But we can't normally rely on the correctness of generic safe code, because we can't, at the point we wrote unsafe { }, verify all impls (unless it's a sealed trait), since it's unbounded. In particular we can't ever rely on the correctness of downstream safe code!

But the Rust solution of such unsafe problems is more unsafe. If we make a trait unsafe trait TrustedOrd: Ord { } with no methods, anyone that write unsafe impl TrustedOrd for MyType is signaling that they are asserting that specifically the Ord impl of MyType can be trusted by unsafe code. This is not particularly great because this is still not verified by the compiler - we need a someone to personally vouch for this impl. (Note, that's how TrustedLen in iterators work)

Well there is a better approach now that we have some verification tools. We can write a contract in, say, Creusot, and as long as we run the checks at compile time we can verify that some safe code is correct. Or, even better, verify that some unsafe code is correct! (and won't cause UB). Right now there's a project to verify the unsafe bits of the Rust stdlib (which is weaker than full verification of the stdlib APIs - which is what we need, since we generally trust that the stdlib is correct in most unsafe code)

1 Like

And nothing should happen, because if the safety of bar depends on the functional correctness of safe APIs in foo, bar should depend on a very specific version of foo, like =1.2.3. Or better yet, vendor foo (maybe republishing it to crates.io as bar-foo - somewhat common across the ecosystem) or, what's actually the best solution, just copy the code from foo and paste on some module in bar.

That's because if some unsafe code depends on some safe code to not trigger UB, then the author of the unsafe code should review the safe code thoroughly. This is the responsibility of bar, the author of foo has nothing to do with it, and can not ever be blamed by UB. This is the axiom of Rust safety: safe code can't be blamed for UB in unsafe code, no matter what.

But what about robustness?

As an aside, there's a perennial proposal that is sometimes reposted here on IRLO for Rust to have the dual concept of safety requirements, called robustness guarantees. It was posted here Simpler mental model for unsafe and in some other threads.

While a safety requirement is something an unsafe API demands when called, a robustness guarantee is something (often from a safe API) that can be relied upon even in unsafe code. This would encode exactly when we can blame or not blame foo for UB in bar - if bar depended on a robust guarantee of foo that turned out to be broken, then it's foo that is at fault. (by default, if foo guarantees nothing, then bar is at fault - same as today)

This would be a great fit if foo actually uses formal verification to assess whether the implementation does what the spec says. Or if for some other reason the author of foo deems appropriate to make this guarantee, going above and beyond most crates.

I happen to like this idea a lot but there was absolutely no traction for Rust to adopt this. So, robustness is not a thing. Which is a shame - robustness is the exact right tool to enable foo to signal that its functional correctness was so thoroughly vetted that, unlike the absolute majority of code written in safe Rust, it CAN be trusted in unsafe code. Right now there isn't a mechanism for that.

Anyway one can read here more about robustness. I hope it is adopted some day.

The best case scenario is if both pieces of code is written by the same author (and hopefully are in the same crate, or even better, in the same module). This is very common - the code that can trigger UB is usually restricted to a single module in a crate (perhaps with a number of submodules), and we should all aspire to make it possible to audit the unsafe code by reading just this module. If we depend on external crates, we should think about whether we depend on its correctness on matters of UB. If we do, that's probably dangerous, or at very least, it makes the lives of unsafe auditors very hard (they need to also read the code from the external library to assess whether your code triggers UB. Very bad). But if you do anyway, the code may become impossible to audit.

What about the stdlib?

Well all of this is kind of left aside when we are talking about depending on the functional correctness of code from the stdlib. That's out of practicality - there's a large number of APIs in the stdlib that unsafe code must rely upon.

Generally we assume that the stdlib has been audited, but this is not true actually. Or it is, but mainly for safety - but we rely on the correctness of the stlib in our own unsafe code, not just its safety. And the Rust stdlib is a library like any other and has plenty of bugs. And it's a moving target, just like crates in the crates.io ecosystem - every compiler version comes with its own stdlib. So, this situation is not great.

The only solace is that the stdlib actually has a decent stability policy and won't change its behavior willy nilly.

Depending on libraries with vague versions like 1.3 (or worse, 1) is like depending on generic code - you can't control exactly what code you depend upon. Cargo works in mysterious ways when deciding what specific code you depend upon (and this can change if we tweak things like the resolver or if some other crate that is completely unrelated to ours depends on foo with a different minor version, or some feature flag enabled!).

For safe code, this is merely a nuisance. A minor version bump may introduce some regression or whatever. For unsafe code, this is safety critical and can never, ever happen. Just like unsafe code can't rely on generic safe code to determine whether it causes UB, it can't rely on the Cargo resolver to not trigger UB either.

This all means that unsafe code can't enjoy the usual workflow for safe code in Rust. We often need to do something in our own crate because depending on random code from other crates won't fly. This is less ergonomic for sure.

1 Like

Sorry for the ambiguity in my post. Let's make things clear. Let's say you are author of library A, which has function foo. foo invokes bar in library B, and invokes unsafe function baz in library C. The safety requirement of baz comes from bar.

What I propose is that:

  • If author of library B writes verified contract of bar, and the "ensures" contract of bar contains the safety requirement of baz:

    Then author of library A does not need to do anything, just invoke bar and invoke baz. Everything is verified at compile time.

  • If bar does not provide "ensures" contract that covers baz's safety requirement:

    Then it's library A's author's responsibility to check it at runtime (which can make verification engine verify the safety requirement of baz).

If things work as I proposed above, then it's needless to pin the version or vender it:

  • If foo's author write verification contract, then if upgrading breaks, it is foo's author's responsibility
  • If foo's author does not write verification contract, then bar should check it at runtime. Then upgrading will never be a problem.

In other words, more straightforwardly, when contract comes into Rust, there will be no case where authors need to review whether upstream code meets safety requirements, since it is either verified statically or checked dynamically.

The talk of BTreeMap above got me interested: in theory BTreeMap should be implementable in safe code (by trusting the API of Vec, Box, etc.). So I was wondering what it was doing that needed the unsafe blocks.

I spotted three main uses of unsafe in BTreeMap's implementation. One of them was handling allocators (because it does the allocation and deallocation itself rather than relying on Vec or the like). This is probably because BTreeMap wants to be able to make multiple allocations, but store the allocator only once. This is a problem that should be fixable long-term (I have been working on scoped generics as a method of solving it), but it isn't fixed in current Rust, so unsafe it is. Fair enough.

The second is more interesting: BTreeMap internally works using a cursor API (which is accessible outside the module too, but unstable), and so it frequently uses cursor methods in order to do updates. As such, it is calling methods like btree_map::CursorMut::insert_before_unchecked, which adds a value to a specific point in a BTreeMap without checking that this is actually the right place in sorted order. The interesting thing about this method is that it's marked unsafe, even though abusing it shouldn't be able to cause undefined behaviour (only logic bugs): because BTreeMap is parameterised on Ord which is a safe trait, you could cause BTreeMap to insert values in the wrong place by making the Ord definition dependent on a global atomic variable, which you temporarily change just at the moment it tries to do the insertion (after the calling code has checked that the position is correct). As such, the unsafe here is basically just a lint rather than marking a potential memory safety issue (and in particular, if the code made any use of insert_before_unchecked's safety requirement for soundness, it would be unsound). It seems wrong to me to need an unsafe block in BTreeMap in order to call insert_before_unchecked, primarily because implementing BTreeMap entirely in safe code would be a sensible way to prove that it is memory-safe even in the face of broken Ord implementations, but the safety requirement on the function means that you have to use an unsafe block to call it and thus this proof is not available.

The third is caused by the Node type that the BTreeMap uses internally (implemented here). This uses an implementation technique that's very hard to implement in safe Rust, both because it tries to do a polymorphic recursion and because it's doubly-linked. This means both lots of unsafe internally, and also some unsafe within BTreeMap itself when it reborrows Nodes (because the way Node::reborrow_mut is defined means that the compiler can't check that the reborrow and original aren't used at the same time). I am wondering whether it would be worth rewriting Node to use more Rust-like data structures: in particular, the double-linking doesn't seem essential and may be counterproductive (it makes BTreeMap iterators/cursors O(1) rather than O(log n) size, but adds an O(n) overhead to the memory usage of BTreeMap and the allocations it references, which seems like a bad tradeoff to me. The polymorphic recursion does seem like it probably has a significant efficiency benefit, though.

It seems like an ideal safety vision would be able to easily prove that BTreeMap<T> is sound regardless of how broken T's Ord implementation is. For the Node-related unsafety, this is easy because Node<T> does not require T: Ord and thus (without specialisation) can't possibly do anything that would be broken by a malicious Ord implemetnation. For the allocator management, this likely shouldn't depend on the Ord implementation (the easiest way to prove this would be to write the code such that BTreeMap uses its allocator is used only with Node and never for any other purpose). insert_before_unchecked is an interesting case, though – I suspect the correct solution is something along the lines of "create an UnorderedBTreeMap that supports cursors, etc. but doesn't have any requirement on being ordered, then implement BTreeMap in terms of it using only safe code".

1 Like

Those examples are fine, but they are more complex than needed. There's no need to have traits involved. A more simple example (following the generic pattern of the original post) would be:

  • Some math crate provides a GCD (greatest common denominator) function.
  • Some game engine crate uses it in unsafe code (games are fast) for coordinate alignment.
  • A binary implements a game using the game engine (unaware of the math crate below or the unsafe code).

A crate author doesn't need to imagine anything. A crate author has an idea of a functionality they want to provide (e.g. sorting slices). They provide a first version with a postcondition they believe is useful. If it's not always useful, they're gonna get feature requests (e.g. the sort should be stable). So building a postcondition (or guarantees) is an iterative process. Actually building the whole contract is an iterative process, because requirements may be too strong for some users.

I'm not sure how you reach that conclusion. Is it because "ensures" are meaningless in general? (which I hope I addressed in the previous quote) Otherwise the distinction is what the crate author wants to guarantee for which purpose. To take the sort example again, the crate may choose to guarantee a stable sort for logic and a permutation for safety. Or anything else. This will be discussed with its users through feature requests (or pull requests if it's just about documenting something the implementation has always been doing).

For some reason the rest of your reply is on point, so I might just have misunderstood that initial segment. Although I consider the notion of verification out of topic for this thread. My problem is the lack of freedom in the safety contract, which prevents encapsulation of the scope of unsafe for dependencies.

I'll clarified in the original post that this thread is not about how safety contracts are verified, but what they should contain. Verification is a different topic.

In that case you're just making foo and bar in the same unit of implementation, which indeed removes the problem because there's no need for contracts. But that's just ignoring the problem when they are in different units of implementation (when bar only sees the contract of foo not its implementation).

This has been debunked in Conditions for unsafe code to rely on correctness, unless you know of an official language statement that says this is not the case. But I agree with you that this convention is dangerous and the core problem this thread originates from.

Whenever an API marked as unsafe causes only logic bugs when misused, rather than actual memory safety issues, I think it's best to ignore the logic thing and suppose that, even though currently it does not actually trigger UB, in the future, new semver-compatible versions of this API hold the right to trigger UB "as promised". So the API is allowed to trigger UB when misused (and allowed to do anything else, including having a safe logic bug), but not required to do so at any given moment.

So what the stdlib is saying is that it could, at some point in the future, actually trigger UB in this operation, even if it's not doing this today. And as such, it's UB in this sense (you do something, it might cause UB sometimes, you can't rely on it to not be UB, ergo it is UB in some abstract sense)

If it actually triggering UB is not a sensible future direction for this API, it seems much better to just remove the unsafe tag from it (existing callers won't be negatively affected, but they may drop the unsafe block if they wish)

I'm sure there must be some gaps between us :joy:

What is the difference for the library author when writing the code that guarantees a stable sort for logic or for safety? Maybe figuring out this could help me understand your opinion.

In my understanding, the difference of logic and safety makes sense for me when it stays for contract requirement: If user wants this contract to fit requirement of downstream unsafe function, then it is safety contract; otherwise it is logic contract. Anyway, the safety or logic is decided by consumer instead of producer.

If there is no difference in code-level to provide logic / safety "ensures" contract for author, then I would say that there could be a unified "ensures" contract, and let the consumer to choose whether it is for safety or logic.

1 Like

The difference is whether failure to uphold that guarantee is a logic bug or a safety bug.

If a crate author wants to only write actually-safe code to avoid dealing with safety bugs, they can avoid using unsafe code and avoid providing safety guarantees. They can still provide logic guarantees. But they know that if they fail to uphold those logic guarantees (because they are usually quite complex and not worth bothering to get them completely right) they won't cause undefined behavior. That author prevents the scope of unsafe of their clients to reach their crate, essentially encapsulating it.

This is UB (library UB). It's the same situation as for builtin (language) UB: language UB is not required to do anything particularly bad but it is allowed to do anything it wants. Same thing for library UB. Being allowed (but not required) to do arbitrary bad things is the very definition of UB.

So the difference is in whether you're careful enough in making sure the sorting algorithm is implemented correctly with no bugs? Such a threshold seems completely undefined and arbitrary, and thus meaningless.

I have more or less understood you. Is it equivalent to bug -- vulnerability, or incorrect -- unsound?

If so, I do think there is few necessity of logic contract. First of all, Rust cares about memory safety the most, and try to make logic correct with type system. Secondly, even if we have logic contract, we can''t guarantee that a verified program is logically correct, but only guarantee that it works as we expected within modelled logic contract.

And for safety contract, I still have my opinion:

If an unsafe operation requires post-condition of an upstream crate:

  • If the safety "ensures" contract from upstream crate covers such requirement, you don't need to do anything.
  • If not (either no safety contract or not cover), you must check it at runtime.

The above rules make it possible to fluently upgrade a dependency, and you don't need to review third-party code for contract, or worry about how unsafe downstream client uses your code (a library writer only need to care about the correctness of safety contract, and breaking changes of safety contracts across major versions).

What's the difference between calling a safe function versus and unsafe function? Do you call the safe function if you're not careful enough to satisfy its requirements?

Here's a thought experiment. You have a crate providing fn sort(...) and unsafe fn sort_unchecked(...) where the safety requirements are the logic requirements and which simply forwards to the safe function in the initial version, providing maximum flexibility:

  • The author can use the safety requirements of the unsafe function to improve performance in the future in a compatible way.
  • Users can choose to call a safe or performant version depending on whether they are careful enough.

For safety guarantees it's the same idea but where it's the author deciding whether they are careful instead of the client.

Today, people usually consider that they should call the safe sort to avoid discharging requirements, because it's frowned upon to have to be careful avoiding bugs. But at the same time, they should be careful avoiding bugs when they provide guarantees, because anyone can use them to write unsafe code. This asymmetry doesn't make a lot of sense in my opinion. Users should be able to choose when and how much they want to be careful avoiding bugs, with the default of being fine writing bugs (i.e. safe by default).

Again, this looks good to me. But I'm still confused by:

I agree. And the "try" is important. It's not enough. You can't encode your logic contract in the type system. So you can't make your logic contract a safety contract (since the type system is about safety, not logic).

What do you mean "we can't guarantee that a verified program is logically correct"? You can very well verify a program (and even crate) to be correct for logic. You'll need more than the type system, but you can.

I'm also not sure what you mean by "it works as we expected within modelled logic contract". Do you mean we can only do dynamic analysis and not static analysis? This is not true. There are sound static analyzers for parts of Rust already (like VeriFast), and there's no theoretical reasons there can't be once Rust is fully specified.

I'm always careful when writing code. Calling a safe sort that double-checks something redundantly usually doesn't hurt me, so I call that. That's not an indication that I'm not careful in my own code.

By that criterion, all the functions that are provided by my crates would have full robustness guarantees because I consider myself a careful programmer.

Exactly. And that's what expected of you today. But Rust is a language supposed to let people write logic bugs without them turning into UB. And it partially fails to do that. It only does at top-level of the dependency tree (e.g. if you write a safe program).

Rust has (or aims to have) a comprehensive memory-safety model, which is designed as no memory-safety issue will be found under this model: no UAF, no double-free, no OOB-read/write. However, I can't image there to be a comprehensive logic model. There are so many logic issues in the real-world: deleting other's account even if you are not admin, writing to disks 10TB data every 10s.

As a result, if there is a logic contract in your verified code, you can't say that there is no logic issue in your code. You can only say that, your logic contract deals with privilege, and there is no privilege escalation issue in your code. If this is the ultimate goal of logic contract, I think it may be too weak.

I know :slightly_smiling_face: It's out of scope. :grinning_face_with_smiling_eyes:

I don't understand what your proposal changes in this respect. I'd still do the same thing if every function had to have a Robustness section in its documentation. I'd write: Robustness: yes, 100% for all my functions.

I wouldn't trust a library that says Robustness: no, this code may contain bugs. It doesn't seem like a good advertisement.