Simpler mental model for unsafe

I do think classifying during review by "UB risk category" is a good idea. But I'll note that the guidelines you've linked would classify a #![forbid(unsafe_code)] crate as ub-risk-0 even if a downstream crate relies on its correctness for soundness. An example could be scopeguard — absolutely no unsafe in its implementation, but often used as ad hoc drop glue for maintaining/restoring safety invariants.
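
For concreteness, a minimal sketch of the kind of dependency being described, using a hypothetical extend_with helper: the unsafe blocks below are only sound if scopeguard (a crate with no unsafe at all) really runs its closure when f panics.

```rust
use scopeguard::guard;

// Hypothetical helper: extends `v` with `n` values from `f`, writing through
// a raw pointer. The `unsafe` below is only sound if scopeguard really runs
// its closure on unwind, so that `v.len()` never claims uninitialized elements.
fn extend_with<T>(v: &mut Vec<T>, n: usize, mut f: impl FnMut() -> T) {
    v.reserve(n);
    let start = v.len();
    // The guard tracks how many elements have been initialized so far.
    let mut state = guard((v, start), |(v, len)| {
        // SAFETY: `len` only ever counts elements that have been written.
        unsafe { v.set_len(len) }
    });
    for i in 0..n {
        let value = f(); // may panic; the guard keeps the Vec valid if it does
        // SAFETY: `reserve(n)` guaranteed capacity for `start + n` elements.
        unsafe { state.0.as_mut_ptr().add(start + i).write(value) };
        state.1 = start + i + 1;
    }
    // On normal exit the guard also runs, setting the final length.
}
```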

Keeping unsafe reasoning "shallow" is assuredly desirable, as is calling out any trusted upstream properties. The objection is entirely to "assigning" the soundness burden to the safe upstream code, when it lies squarely on the module using unsafe to trust the relied-upon upstream code as if it were within the module (safety boundary).

Maybe that "blame game" is just due to your initial choice of notation (-> unsafe T), since you later specify that producing an additionally constrained "super-safe" value is safe (and that utilizing that constraint for soundness is unsafe). I just think it's important for the modularity of unsafe that the soundness burden is always clearly placed on the author of the unsafe block, who must trust any relied-upon upstream code as if they wrote it themselves (or sometimes even more so).

Aside

This suggests an interesting approach to unsafe blocks in unsafe fn — use internal unsafe blocks for any encapsulated safety concerns, but not for any that are propagated to the caller. "Safety: the caller promises this is safe" comments aren't useless (they show the author thought about the requirement), but they aren't particularly helpful either.
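
A small sketch of what that convention could look like, with a hypothetical prefix_array function (editions before 2024 allow unsafe operations directly in the body of an unsafe fn):

```rust
/// Interprets the first `N` bytes of `bytes` as a `[u8; N]`.
///
/// # Safety
/// The caller must guarantee `bytes.len() >= N`.
pub unsafe fn prefix_array<const N: usize>(bytes: &[u8]) -> [u8; N] {
    // Propagated concern: `bytes.len() >= N` is exactly what the caller
    // promised, so no inner `unsafe` block -- a "SAFETY: the caller promises
    // this" comment would add nothing.
    let head = bytes.get_unchecked(..N);

    // Encapsulated concern: discharged locally, so it gets its own block.
    // SAFETY: `head` has length exactly `N`, and `[u8; N]` has alignment 1,
    // so the pointer is valid for a read of `[u8; N]`.
    unsafe { *(head.as_ptr() as *const [u8; N]) }
}
```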

But also yes, if you rely on any properties of an upstream crate for soundness, you can consider there to be an entirely implicit crate-level unsafe impl for the crate API upholding said properties, capturing the fact that the crate boundary is an "input" the same way generics are, given that downstream can substitute any package to satisfy the dependency. This holds even if there's an unsafe trait impl involved, since the trait definition could be substituted as well.

2 Likes

Ok, so we agree on that. The only point where we disagree is thus how much benefit they actually get. I'm claiming they don't get as much as they could if they had additional language support (or that they might over-estimate the benefit they get with an improper mental model). Whether the benefit is worth the effort is orthogonal and up to those projects.

I agree with you that minimizing the burden of soundness is critical. But there is another dimension, which is the traceability of soundness (making the blame game explicit in the syntax). And there is a trade-off between the two: reducing the burden reduces traceability, and increasing traceability increases the burden. Different crate owners should be able to choose where they want to stand in this trade-off, and unsafe reviews would then grade those crates appropriately based on their position in it. There's no enforcement in the language, only policies adopted by some actors.

The problem I'm pointing out, and the core of our disagreement, is that the whole spectrum of the trade-off is currently not accessible (without introducing boilerplate with unsafe trait wrappers). If you care about complete traceability, the language doesn't help you, and you can't have tooling for it (there's an overkill hack though, see below).

Yes, that's one way to see it. But then this should count against the burden of soundness, right? Because now the upstream crate has a burden it didn't use to have. Not to mention that a crate-level unsafe impl is overkill: only the relevant functions need to be encoded using unsafe traits, thus minimizing the burden of soundness.
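
For illustration, a hedged sketch of that per-function encoding, with entirely hypothetical names: only the relied-upon property is lifted into an unsafe trait, and writing the unsafe impl is where the trust is explicitly taken on.

```rust
/// Implementors promise: `index_for(len)` returns a value strictly less than
/// `len` whenever `len > 0`.
pub unsafe trait InBoundsIndex {
    fn index_for(&self, len: usize) -> usize;
}

pub struct Modulo(pub usize);

// Writing this `unsafe impl` is where the trust is explicitly taken on: the
// implementation must be reviewed as if it sat inside the downstream module.
unsafe impl InBoundsIndex for Modulo {
    fn index_for(&self, len: usize) -> usize {
        self.0 % len
    }
}

pub fn pick<T>(slice: &[T], ix: &impl InBoundsIndex) -> Option<&T> {
    if slice.is_empty() {
        return None;
    }
    let i = ix.index_for(slice.len());
    // SAFETY: the `unsafe impl` promises `i < slice.len()` for non-empty slices.
    Some(unsafe { slice.get_unchecked(i) })
}
```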

Note that there's actually a way to "solve" or work around the problem. During unsafe review, if an unsafe block depends on the correctness of a safe function in a dependency crate, that dependency crate must be tagged for unsafe review each time it is modified (including the first time it's introduced in the review system). That's overkill (triggering more reviews than needed) and similar to your implicit crate-level unsafe impl, but it works to restore traceability in today's Rust.

EDIT: Thinking about it, that hack is probably quite reasonable, because there shouldn't be many crates whose soundness is depended on by unsafe blocks. So it should be quite tenable, even though not perfect. Also note how the burden of proof is on the unsafe reviewer, and thus on the actor that cares about traceability of soundness.

I absolutely do agree that tooling for tracking "soundness relevant dependencies" for review is a good thing that should ideally exist; that's essentially the "flood fill" algorithm for trust that I alluded to.

I think the disagreement comes down to this: I don't think (and feel reasonably comfortable saying Ralf agrees) that notating this trust in the upstream source is useful. The careful downstream should still be reviewing upstream patches regardless; if you trust your upstream to maintain "for downstream soundness consumption" properties, how is that any different from trusting them to maintain documented "correctness" properties of similar maintenance complexity?

Strictly speaking, the burden in the "unsafe crate" model isn't on the upstream library; it's on the binary author who links the crates together, i.e. who provides an instantiation of the "generic" encoded by the crate interface. Yes, the default resolution Cargo does is rarely overridden, but it can be, and doing so can be arbitrarily unsafe without writing unsafe. This is just like how some codegen flags are unsafe (e.g. running a binary compiled to use target features you don't have) in that they can cause runtime UB.

unsafe is "pay attention to the input; someone's (potentially) relying on it for soundness." For unsafe trait, the "input" is the unsafe impl. Rust doesn't have a way to say "pay attention to the output", and I don't think that's a meaningful barrier to soundness review. Flanderizing my position: annotation beyond the "unsafe input" call-out doesn't confer benefit until you eliminate the possibility of a "clever enough idiot" bypassing proof assumptions like "the entire source tree agrees on a single target cfg" (despite Cargo allowing arbitrary cfgs to be set and not prohibiting setting any of the "well-known" cfg keys).

Ensuring that soundness reasoning can remain reasonably minimal, shallow, and compartmentalized is a great goal. Knowing a somewhat tight bound on which locally safe code receives what downstream trust is part of that. But that's mostly a concern for in-the-large code review (thus belonging in code review tooling) and not for in-the-small code review/authorship (thus belonging in the source).

To be completely fair, the cost of adding code-annotation "side tables" is meaningful when a source diff, combined with tests and maybe a couple of automated PR comments, is usually plenty for diff review. But if you're going to the extra effort of explicitly auditing upstream soundness, you almost necessarily already have infrastructure capable of tracking those side tables.

2 Likes

I also think that's the disagreement. I do believe annotations in the upstream source can be useful (although I agree they're not critically useful, given the currently simple usage of unsafe and given that they could anyway be written outside the upstream source, in some review system, by those who care about the correctness of that upstream library).

Proving correctness is much more complicated than proving soundness. To prove soundness, you only need to prove the correctness of the parts of the code relevant to soundness (and we identified those parts as being very small). To prove correctness, you need to prove the correctness of all the code. In particular, I don't think those third-party reviews have a category for correctness (it's absolutely not realistic). They only care about malicious behavior, cryptography, and soundness (but I might be mistaken). And triggering an unsafe review is much more costly than other types of review, because it requires unsafe reviewers, who are difficult to find because of the complexity of the task.

Yes, that's a good point, in particular because it puts the burden of proof on the entity that actually cares about soundness. However, this may duplicate work, which cargo-vet could factor back out by essentially having one entity write the missing annotations for the upstream library (not necessarily in the code itself, but in some review system).

Yes, that's related to the point you made above and to the hack I mentioned in my previous reply. I agree this looks like a satisfying solution. I'm just not sure that's what's actually being done, though (the tooling seems to only mention triggering on the unsafe keyword); I'll have to check.

1 Like

While continuing to work on the next version of the mental model to incorporate the concept of robustness (the composable version of super-safe), which significantly improves the model, I realized that documenting robustness should not be recommended unless a significant amount of unsafe code depends on it. The reason is that it would force unsafe reviewers to review those properties even though they may not directly affect any unsafe code in the actual dependency tree of the project relying on this unsafe review, thus wasting effort from that project's point of view.

Even in a world where everybody uses and trusts a single cargo-vet registry (thus never wasting effort), if no unsafe code in the world relies on this robustness at the moment, then it's still unnecessary work. The robustness documentation should only be added once some unsafe code in the cargo-vet registry depends on it. In a world with multiple cargo-vet registries, this has to be decided per registry and becomes a bit messier. So I can see how preemptively documenting a function as robust (i.e. moving part of the correctness documentation to a # Robustness section, or doing the same in the cargo-vet review system so as not to burden the crate owner) is a bad idea. I still believe that doing it once there is a legitimate use would bring value (even if it's just in the review system and not in the crate, to transfer the burden from crate owners to unsafe reviewers).
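
To make that concrete, a hedged sketch of what such a section might look like, with a hypothetical function and section name, whether it lives in the crate's docs or only in a review system:

```rust
/// Returns the largest power of two that is less than or equal to `n`,
/// or 0 if `n == 0`.
///
/// # Robustness
///
/// The return value is always `<= n`. Downstream `unsafe` code may rely on
/// this property for soundness; weakening it is a breaking change for them.
pub fn prev_power_of_two(n: usize) -> usize {
    if n == 0 {
        0
    } else {
        1 << (usize::BITS - 1 - n.leading_zeros())
    }
}
```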

2 Likes

I finally finished the full rewrite of the mental model taking into account this discussion. The main differences are:

  • Making audience and non-goals explicit.
  • Adding an introduction to type theory (needed for subtyping and variance). This is probably an accessibility issue but it may be improved later (and is anyway useful as a programming concept).
  • Moving from the proof type (of kind Prop) to the update type (sigma type) and deriving the proof type back.
  • Distinguishing two predicate behaviors: adding unsafe values (a restriction on use) and removing safe values (a permission to use). This requires predicates to be on the validity invariant.
  • Introducing the adjective "robust" for types that are missing safe values, and explaining the duality between unsafe and robust through variance.
  • Pointing out that unsafe for builtin types actually comes from robust types in contravariant position.
  • Splitting mutable reference types between the temporarily owned type and the type returned to the lender at the end of the borrow, which I write &mut [T .. S], where [T .. S] symbolizes the lifetime and S is the type returned to the lender. This is critical to understanding why mutable references can be unsafe (their final type S is robust); see the sketch after this list.
  • Describing a lower bound on the validity invariant (replacing contravariant types with bottom).
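
As an illustration of the last two bullets, a hedged sketch with a hypothetical Cursor type: the unsafe block relies on the invariant pos <= buf.len(), so the Vec<u8> that any borrower would hand back must stay within a robust type, which is exactly the role of the final type S in the &mut [T .. S] notation.

```rust
pub struct Cursor {
    buf: Vec<u8>,
    // Invariant relied on by the unsafe block below: `pos <= buf.len()`.
    pos: usize,
}

impl Cursor {
    pub fn new(buf: Vec<u8>) -> Self {
        Cursor { buf, pos: 0 }
    }

    /// Advances the cursor, clamped to the end of the buffer, so the
    /// invariant `pos <= buf.len()` is preserved.
    pub fn advance(&mut self, n: usize) {
        self.pos = self.pos.saturating_add(n).min(self.buf.len());
    }

    pub fn remaining(&self) -> &[u8] {
        // SAFETY: every method of `Cursor` preserves `pos <= buf.len()`.
        unsafe { self.buf.get_unchecked(self.pos..) }
    }

    // This method would be unsound even though its body contains no `unsafe`:
    //
    //     pub fn buf_mut(&mut self) -> &mut Vec<u8> { &mut self.buf }
    //
    // The borrower could truncate the Vec, and at the end of the borrow the
    // lender would get back a `Vec<u8>` that no longer satisfies
    // `pos <= buf.len()`. In the `&mut [T .. S]` notation, the final type `S`
    // is robust here: not every safe `Vec<u8>` is acceptable to return.
}
```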

The discussion can also partially move to the issue tracker if there are concerns better handled on GitHub.