Language vision regarding safety guarantees

Rust protects you from bugs, but I don't envision anybody wanting to deliberately advertise that their code may be buggy and users have to protect themselves from such bugs. That would just seem like a rubbish library.

Ok, I think that's where our hypotheses differ. I believe there is a comprehensive logic model, it's simply the semantic model of the language. It's low-level in the sense that you talk about observable events (FFI calls, ASM stories, volatile accesses), but for specific platforms you can have higher-level events (read a file, send a UDP packet, etc), on top of which you can build arbitrary user properties. But I agree this is out of scope for this thread. At least we agree on the safety contract part which is on topic :slight_smile:

1 Like

I guess the complexity here is something that @scottmcm brought up in another thread:

I think there's a core tension here between the desirable requirements "a BTreeMap<K,V> for which I know that K's Ord implementation is correct should always be ordered", and "BTreeMap<K,V> should be memory-safe regardless of how badly behaved K's Ord implementation is". Both of these requirements are potenitally important to avoid memory-safety vulnerabilitiies: the latter requirement is required if someone writes a broken Ord, and the former requirement is required if unsafe code relies on a BTreeMap<u32,V> to be correctly ordered.

For the first requirement to hold, btree_map::CursorMut::insert_before_unchecked needs to be unsafe, which effectively declares having an unordered BTreeMap<u32> to be library UB.

For the second requirement to hold, btree_map::CursorMut::insert_before_unchecked needs to be safe: otherwise, giving a BTreeMap a K with a broken Ord implementation is library UB, because it leads to an unsafe method being called without actually upholding its safety requirements.

As such, we basically have a weird case of specialisation here: the implication is that btree_map::<K,V>::CursorMut::insert_before_unchecked is safe whenever K is a downstream type (and that having an incorrectly sorted BTreeMap is library UB in this case), but unsafe whenever K is a standard library type (and that having an incorrectly sorted BTreeMap is not library UB in this case, just a logic error).

I'm not sure that there's any sensible way to cover this gap. (It almost seems as though the "safety level" or "trust level" should be some sort of generic parameter, but I can't think of a way to make that work.)

I don't think there's actually a gap here, because if there is it's the same gap as "well arbitrary safe code correctness is also relevant to soundness", which in the limit means that everything needs to be unsafe, which isn't helpful :person_shrugging:

This is wrong. It's OK for the function to exist and be unsafe. It's up to the caller to make sure they don't call it with invalid type / input combinations.

But the caller (BTreeMap) does call it with invalid type/input combinations today, according to its stated safety requirement, if the key type has a broken Ord implementation.

The safety comments in BTreeMap's source code for the calls to insert_before_unchecked assume that Ord is non-broken, which is not a sound assumption given that Ord is a safe trait.

Safety requirements are for external callers from outside of the abstraction boundary. When calling functions in your own module / crate you can ignore documented safety requirements.

1 Like

I would say that if the type's Ord implementation is incorrect, then insert_before_unchecked's safety requirement should (but is not currently written to) not apply. This resolves the conflict: in the case where nonsensical inserts can happen from safe code, nonsense is sound.

3 Likes

Possibly related to all this: part of the reason I was concerned about all the unsafe in BTreeMap is that it makes it harder to prove it to be sound, and I just came across a bug report demonstrating that BTreeMap does actually double-free with some Ord implementations. (In this case, the incorrect assumption made by Node is that Ord does not panic. an assumption that is wrong even for some standard library types like RefCell<u32>.)

I think the confusion about what unsafe actually means with respect to something like BTreeMap is probably responsible for this. The problem is that Root::split_off, which does not have any stated safety requirements, calls move_suffix, which also does not have any stated safety requirements, but move_suffix actually cause the BTreeMap to get into an invalid state (one in which consuming-iterating over it will cause a double free). This can cause unsoundness in safe code because Root::split_off continues by calling a user-provided trait method (which could panic, and then surrounding code could catch the panic in order to iterate over the invalid-state BTreeMap), but I would put the blame here on move_suffix for not being marked unsafe. (Suspiciously, move_suffix has a large unsafe block with no safety comment above it – but I think that isn't the problem that causes the unsoundness, rather the unsoundness is caused by the fact that it breaks invariants and does not document that it does so.)

The whole BTreeMap implementation seems to have been written using a principle of "let's not care about soundness requirements for our own use internally, and just mark the public APIs as safe or unsafe according to whether use from outside the module would be safe or unsafe", but this means that it isn't benefiting from the Rust compiler helping to catch soundness issues, and thus allows bugs like that to sneak through. I would have hoped for better from the standard library (i.e. using only a minimum of unsafe code that is carefully vetted to ensure soundness).

I would tend to disagree (if I understand you correctly): the main reason to use a BTreeMap rather than a HashMap is when you need sorted entries or even to look at what is before/after a given entry.

The tradeoff is between storing backreferences for every tree node persistently in the data structure (O(number of nodes)), and storing backreferences for every tree node on the route to the object currently being iterated over temporarily in each iterator (O(log(number of nodes) * number of iterators)).

My reasoning is that although you often do use iterators with BTreeMap, you don't normally use very many of them, compared to the number of elements you iterate over. Allocating a small amount of extra memory for each iterator does hurt, but it doesn't hurt as much as allocating extra memory for a backreference in every tree node does (especially given that if you are iterating over the whole tree, you would need to pay the cost of loading all the backreferences from main memory – loading them from the iterator instead is faster, because they would already be in cache).

The extra cost wouldn't be needed for "element before" / "element after" queries, only for iterators and cursors.

1 Like

This is perfectly fine and just a matter of style. This is why I wrote that the concept of unit of implementation is subjective. Within a unit of implementation, there are no contracts (see below for nested contracts). The contracts are at the boundaries of the unit of implementation. And you prove that this unit of implementation is correct with respect to those contracts.

The maximum unit of implementation is the union of a crate with all its pinned dependencies recursively. In large projects like the standard library, it makes sense to consider smaller units of implementation like modules, since it would be too costly to review all of the standard library for correctness when changing arbitrary small and isolated parts. One could obviously push this even further (in particular for large modules) and consider every item (like function definitions) as units of implementation.

Since units of implementation have a natural nesting behavior (a crate with its pinned dependencies is made of crates, which are made of modules, which are made of items), there is naturally a notion of "nested contracts". In particular, an item has an "item-level" contract (its contract when seen as a unit of implementation), but it also has a "module-level" contract (the part of its module contract when that module is seen as a unit of implementation) if it's an item exposed outside the module, and similarly for its "crate-level" contract if it's an item exposed outside the crate. Contracts can always be rewritten into equivalent contracts, such that nested contracts imply the contracts their nested in (item-level implies module-level which implies crate-level). This is a "contract implication" so variance needs to be taken into account between requirements and guarantees.

In the case of insert_before_unchecked, there's at least 2 contracts: the one when seen from outside the standard library and the one when seen from inside. But there could be more depending on how the authors want to split the units of implementation. The problem as you noted, is that the unsafe keyword is unique, so it cannot necessarily match all contracts. The contract it must match is the highest-level contract (highest in the sense of most public, so crate-level for public items). So it's possible that an internally safe function (no safety requirements when used within the module) is marked unsafe because it's publicly unsafe.

What I mean when I say that "safe code cannot cause UB" is that if there's UB, you can always attribute it to an incorrect unsafe block.

I know that you sometimes end up in a situation where the unsafe block looks like this:

let x = library::foo();
// SAFETY: Library is implemented correctly.
unsafe { ub_if_library_is_incorrect(x) };

If that safety comment turns out to be wrong, well shucks. The author of that unsafe block took a risk, and it didn't work out. So this is a scenario where:

  1. There is UB.
  2. The fix is to change the safe code, and not the unsafe block.

Given this, was the unsafe block wrong? Well, there is certainly one way in which it's wrong: It had a safety comment saying "library is implemented correctly" and, well, it wasn't implemented correctly, so it had an incorrect safety comment. That makes the unsafe block wrong.

Of course, there's also another way in which it's not wrong: When we fixed the bug, the unsafe code did not change. If it did not change and is now correct, it must have been correct all along?

So given that, I agree the situation is not 100% clear.

However, here's what I think is the right way to look at it: UB is often caused by several things going wrong at the same time. After all, we could fix the UB by changing the unsafe block too! Delete the unsafe { ub_if_library_is_incorrect(x) } line of code, and voilà, no UB. So I guess the UB was caused by that unsafe block after all!? Sure, the bug in library remains, but the UB is gone.


Enough about UB, let's talk about vulnerabilities.

CVE-2024-27308 is an interesting example. Mio documents a guarantee in its docs, and Tokio relied on it in unsafe code. It turns out that the windows implementation violated this contract in some circumstances.

At the time, it was actually argued to me that the CVE should be filed under Tokio because in mio it's just a bug, and it's Tokio that actually triggered UB as a result. However, my take is that this is Rust idealism. There is no other programming language where you would come to the conclusion that the CVE should be filed under Tokio. This conclusion stems purely from the concept of "unsafe" and how closely it's tied with UB in our community. Filing it under mio makes much more sense because, for instance, in a CVE you list which versions are affected and also the version in which the bug was fixed. Those questions cannot be answered if it's filed under Tokio because the fix was to change mio.

So did mio cause this CVE or did Tokio cause it? Well, on one hand the bug is in mio and the fix was to change mio, so it seems clear that mio is the cause. However, it's not quite that clear. As the CVE says, versions of Tokio prior to v1.30.0 were not actually vulnerable because prior to v1.30.0, Tokio would use the mio token as a key into a map, rather than cast it to/from a pointer. If we had never made this change in Tokio, then I do not think this mio bug would have been filed as a CVE. It would just have been a bug. So while the unsafe code in Tokio did not cause the bug in mio, it did cause the CVE.


Given an instance of UB, the list of causes that came together to trigger UB may include some causes that are entirely safe code. However, if so, then that cannot be the only cause, and one of the other causes will involve unsafe code.

I do not think wording this as "safe code cannot cause UB" is unreasonable.

3 Likes

I’d say that there is a narrow, pedantic sense in which the unsafe code should change, when viewed at package granularity: its dependency on the library with the bug should be changed to require the fixed version as the minimum version.