Feature idea: Defining custom variants of unsafe

I'm surprised the Unsafe Code Guidelines Working Group hasn't been mentioned in this thread yet. It's a work in progress, but they are hashing out what is and isn't undefined behavior (UB), what contracts must be honored at safe/unsafe boundaries and within unsafe blocks generally, etc.

For example, they've documented a distinction between validity invariants and safety invariants. The distinction is relevant to the discussion about transitivity: if you violate a safety invariant, such as by putting non-UTF-8 bytes into a String, it may not immediately cause UB, but it creates the possibility of safe code causing UB.
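A minimal sketch of where the String safety invariant gets upheld at an unsafe boundary. The helper name string_from_checked_bytes is hypothetical; std::str::from_utf8 and String::from_utf8_unchecked are real standard-library functions. In practice String::from_utf8 already does this in one call, so the point is only to show who is responsible for the invariant.

```rust
/// Hypothetical helper: builds a `String` only after checking the UTF-8
/// safety invariant ourselves, then uses the unchecked constructor.
fn string_from_checked_bytes(bytes: Vec<u8>) -> Option<String> {
    if std::str::from_utf8(&bytes).is_ok() {
        // SAFETY: the bytes were just validated as UTF-8, so the `String`
        // safety invariant holds. Skipping the check would not necessarily be
        // immediate UB, but later safe `str` operations could then go wrong.
        Some(unsafe { String::from_utf8_unchecked(bytes) })
    } else {
        None
    }
}
```

Invalid input never reaches the unchecked constructor; it is rejected with None instead.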

They call this library unsoundness:

[A] library (or an individual function) is sound if it is impossible for safe code to cause Undefined Behavior using its public API. Conversely, the library/function is unsound if safe code can cause Undefined Behavior.

The general idea of unsafe is that a program with no unsafe code is necessarily sound (has no UB), and that a program with unsafe code is also sound as long as every unsafe block maintains the proper invariants. (Formalizing those invariants is the work in progress.)
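To make the definition concrete, here is a small sketch (function names are hypothetical) of a sound function whose unsafe block relies on a check performed in ordinary safe code, with the unsound variant shown only in a comment.

```rust
/// Returns the first byte of `v`, if any.
///
/// Sound: no safe caller can make the `unsafe` block misbehave, because the
/// emptiness check runs before it on every path.
pub fn first_byte(v: &[u8]) -> Option<u8> {
    if v.is_empty() {
        return None;
    }
    // SAFETY: `v` is non-empty, so index 0 is in bounds.
    Some(unsafe { *v.get_unchecked(0) })
}

// An *unsound* variant would be:
//     pub fn first_byte_unsound(v: &[u8]) -> u8 { unsafe { *v.get_unchecked(0) } }
// Safe code could call it with an empty slice and trigger UB, so it would
// have to be an `unsafe fn` with a documented precondition instead.
```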

Small clarification: replace "every unsafe block" with "all code within the unsafe privacy barrier."

The standard example is Vec::set_len: its implementation is just a safe write to a field, yet it obviously has to be unsafe to call, because setting the length past the initialized elements would let safe code read uninitialized memory. It is perfectly possible for every unsafe block to be valid in isolation while safe code fails to maintain the invariants they rely on. All code within the privacy barrier needs to be trusted and validated, not just code in explicit unsafe blocks.
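A miniature version of the same situation, assuming a hypothetical fixed-capacity Buf type. The module boundary plays the role of the privacy barrier: set_len contains no unsafe operation, but it must still be an unsafe fn because as_slice depends on the invariant it could break.

```rust
mod buf {
    /// Capacity of the hypothetical buffer in this sketch.
    pub const CAP: usize = 8;

    /// Invariant: `len <= CAP`. Both fields are private, so only code inside
    /// this module (the privacy barrier) can break it.
    pub struct Buf {
        data: [u8; CAP],
        len: usize,
    }

    impl Buf {
        pub fn new() -> Self {
            Buf { data: [0; CAP], len: 0 }
        }

        /// Appends `x`; returns `false` if the buffer is full.
        pub fn push(&mut self, x: u8) -> bool {
            if self.len == CAP {
                return false;
            }
            self.data[self.len] = x;
            self.len += 1;
            true
        }

        pub fn as_slice(&self) -> &[u8] {
            // SAFETY: relies on `len <= CAP`, which every function in this
            // module must maintain, not just the ones containing `unsafe`.
            unsafe { self.data.get_unchecked(..self.len) }
        }

        /// The body is a plain safe field write, yet the function must be
        /// `unsafe`: otherwise safe callers could set `len > CAP` and make
        /// `as_slice` read out of bounds.
        ///
        /// # Safety
        /// `new_len` must be at most `CAP`.
        pub unsafe fn set_len(&mut self, new_len: usize) {
            self.len = new_len;
        }
    }
}
```

Unlike Vec, this sketch keeps every element initialized, so the only invariant is the bound on len.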

There's definitely something to this idea, @ChrisJefferson. I think Felix Klock has expressed interest in this area, if I remember right.

Ultimately it would be very exciting if these could act as "tokens" which prove safety, allowing unsafe to be traced exactly from where you uphold the invariant to where the unsafe operation occurs. This could dramatically change the whole notion of "unsafe scopes", so that invariants are properly tracked and the unsafe block really is the only place where unsafe operations occur. Essentially, the invariants would be typed!

(Of course, today we do a lot of that by just leveraging the actual Rust type system, so that for example a reference to T is an existence proof that T is still valid. But there's still a lot of things we can't prove with Rust's type system alone, which is why unsafe exists at all.)
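Today's Rust can already approximate such a token for simple cases with a type whose only constructor performs the check. The sketch below uses a hypothetical InBounds type; it is an existing library-level pattern, not the language feature being discussed. The check is where the invariant is upheld, and the unsafe access is justified purely by holding the token.

```rust
mod checked {
    /// Hypothetical proof token: the only way to get one is `InBounds::check`,
    /// so holding one is evidence that `index < slice.len()` was verified.
    /// It borrows the slice, so the proof cannot outlive the data it describes.
    pub struct InBounds<'a> {
        slice: &'a [u8],
        index: usize,
    }

    impl<'a> InBounds<'a> {
        /// Upholds the invariant: this is the one place the check happens.
        pub fn check(slice: &'a [u8], index: usize) -> Option<Self> {
            if index < slice.len() {
                Some(InBounds { slice, index })
            } else {
                None
            }
        }

        /// Uses the invariant: the unsafe operation is justified by `self`.
        pub fn get(&self) -> u8 {
            // SAFETY: `check` verified `index < slice.len()`, and the fields
            // are private, so code outside this module cannot forge a token.
            unsafe { *self.slice.get_unchecked(self.index) }
        }
    }
}
```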

Right now the way to move forward would probably be external static analysis tooling. If, through external experiments, someone could find a parsimonious, ergonomic set of primitives that would allow invariants to be tracked in a more automatically checkable way, that would be really exciting, and there would be interest in trying to incorporate it into the language proper.


This just isn't even close to true! Security vulnerabilities that have nothing to do with the invariants Rust guarantees happen all the time. There was one in mdBook - a tool maintained by the Rust project and written in Rust - yesterday.

I agree with those other developers that using unsafe only for things that can lead to unsoundness is critical. We already have trouble with people thinking "how bad can it be?" about things that actually can lead to unsoundness (UTF-8 in str being a common one), so the more unsafe gets used for things that can only cause logic errors, the bigger that problem can become. Remember that plenty of bad things are perfectly "safe" in the soundness sense. It's safe to delete all your photographs. It's safe to upload your financial records to pastebin. It's safe to submit limit sell orders at $0.01 for all your stock. Etc.

So generally the way to do "tricky but not unsound" is with names much longer than the alternatives. You might have a less-tricky fn get_a(&self) -> &T; and a fn get_a_solemnly_swear_to_do_good(&mut self) -> &mut T. (OK, maybe not quite that long of a name. And even the &T version isn't infallible if T has interior mutability, though of course usize doesn't. But I'm also unclear whether usize is your actual case, since for something small and Copy like that the &mut is rarely critical.)
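A sketch of that naming approach, taking T to be usize and assuming a != b is only a logic invariant, meaning no unsafe code anywhere relies on it. The long accessor name is the only warning; the method itself stays safe.

```rust
/// The fields are *meant* to differ, but nothing unsafe relies on that,
/// so breaking it is only a logic error.
pub struct DisjointPair {
    a: usize,
    b: usize,
}

impl DisjointPair {
    pub fn new(a: usize, b: usize) -> Option<Self> {
        if a != b { Some(DisjointPair { a, b }) } else { None }
    }

    pub fn get_a(&self) -> &usize {
        &self.a
    }

    pub fn get_b(&self) -> &usize {
        &self.b
    }

    /// Callers must keep `a != b`. Violating this is a logic error, not UB,
    /// so the method stays safe and the long name carries the warning.
    pub fn get_a_solemnly_swear_to_do_good(&mut self) -> &mut usize {
        &mut self.a
    }
}
```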

Now, all that said, are there any situations in which you could use DisjointPair such that you're relying on the two values being different for the soundness of unsafe code? Because it's acceptable to require unsafe for things that can break those invariants if that's needed for soundness. For your fake example, imagine you wanted to have

```rust
use std::num::NonZeroUsize;

struct DisjointPair {
    a: usize,
    b: usize,
}

impl DisjointPair {
    fn wrapping_distance(&self) -> NonZeroUsize {
        let d = self.a.wrapping_sub(self.b);
        // SAFETY: it's a soundness invariant of the type that `a` and `b` must be
        // different, so the wrapping subtraction can never yield zero
        unsafe { NonZeroUsize::new_unchecked(d) }
    }
}
```

Then it would be necessary for a &mut Self -> &mut usize method to be unsafe, as otherwise safe code could cause UB.

So you have to decide which kind of invariant you're looking for.
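If a != b is the soundness kind of invariant, a self-contained sketch of the type might look like this: a checked constructor, the wrapping_distance method from above, and a mutable accessor (a_mut, a hypothetical name) that has to be an unsafe fn with a # Safety section.

```rust
use std::num::NonZeroUsize;

/// Here `a != b` *is* a soundness invariant: `wrapping_distance` relies on it.
pub struct DisjointPair {
    a: usize,
    b: usize,
}

impl DisjointPair {
    /// The only safe way to build one: rejects equal values.
    pub fn new(a: usize, b: usize) -> Option<Self> {
        if a != b { Some(DisjointPair { a, b }) } else { None }
    }

    pub fn wrapping_distance(&self) -> NonZeroUsize {
        // SAFETY: `a != b` is a type invariant, so the wrapping difference
        // of the two values is never zero.
        unsafe { NonZeroUsize::new_unchecked(self.a.wrapping_sub(self.b)) }
    }

    /// # Safety
    /// The caller must not leave `a` equal to `b`; otherwise
    /// `wrapping_distance` would create a zero `NonZeroUsize`, which is UB.
    pub unsafe fn a_mut(&mut self) -> &mut usize {
        &mut self.a
    }
}
```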

Well, it also enforces, at the compiler level, that the documentation exists at the call site. To call the function, you have to check the "yes, I have read and agreed to the T&C of this function" box. And of course, anybody who later edits this code (your future self!) immediately notices that there are special T&Cs they need to take care of.

I also think that using "real" unsafe for this is problematic because it "cheapens" it, but having something like this is a good idea. The main concern is probably that it could get confused with "real" unsafe, which has good reason to be special: mistakes there make your program behave far more unpredictably than a logic error does. One could argue that unsafe(myInvariant) { is different enough from unsafe { (you can just grep for unsafe[^(]), but it's still a bit confusing. On the other hand, a new keyword is perhaps unjustified?

Either way, those "kinds of unsafe" would also need some kind of type safety, which is something else that has to be defined, along with the (syntactic) interaction between multiple invariants. Do we write unsafe(Inv1, Inv2) fn? I also think it's better for "real" unsafe not to imply any of the other kinds, but then how do you write a function or block that needs both?
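Until something like unsafe(Inv1, Inv2) exists, the "tick the T&C box" effect can be roughly emulated in today's Rust with zero-sized acknowledgement values. This is a hypothetical library-level sketch (Ack, Inv1, Inv2 and the function names are all made up): nothing checks the invariant, the call site is just forced to name it explicitly, multiple invariants are expressed as a tuple, and real unsafe neither implies nor is implied by any of them.

```rust
use std::marker::PhantomData;

/// Hypothetical zero-sized "I have read the T&C" value for one named invariant.
/// Nothing verifies the invariant; the type only forces an explicit,
/// greppable acknowledgement at every call site.
pub struct Ack<Inv>(PhantomData<Inv>);

impl<Inv> Ack<Inv> {
    /// Deliberately explicit, playing the role of `unsafe(Inv) { ... }`.
    pub fn i_uphold_this_invariant() -> Self {
        Ack(PhantomData)
    }
}

/// Hypothetical invariant names (uninhabited marker types).
pub enum Inv1 {}
pub enum Inv2 {}

/// Roughly `unsafe(Inv1) fn`.
pub fn needs_inv1(_ack: Ack<Inv1>, x: u32) -> u32 {
    x + 1
}

/// Roughly `unsafe(Inv1, Inv2) fn`: one acknowledgement per invariant.
pub fn needs_both(_acks: (Ack<Inv1>, Ack<Inv2>), x: u32) -> u32 {
    x * 2
}
```

At the call site this reads needs_inv1(Ack::<Inv1>::i_uphold_this_invariant(), 1), which is greppable but far noisier than dedicated syntax would be.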

We discussed something similar last summer in the chat rooms of the Linux Plumbers Conference. It would be nice to be able to mark certain functions as "unsafe" (for some definition of unsafe) for different purposes. For instance, unsafe(interrupt), unsafe(signal), etc., which could only be called from the same kind of context.

Done! See rust-lang/reference@a747328: Add "Logic errors" as behavior not considered unsafe.

Not going that far: a couple of days ago, I asked a question about using unsafe outside strict "memory manipulation" issues. See this post.

On one hand, defining custom variants of unsafe would be a nice idea since, as I show in that post, the function can lead to undefined behaviour (in the context of the library) if the input is not verified or the function is not used correctly.

On the other hand, I have also found that reviewing, reading, or simply building upon unsafe usages that are not related to memory management makes things more complicated: more unsafe blocks, and probably worse code as a result.

Instead of defining these unsafe variants at the compiler level, it might be better to do it within rustdoc. With rustdoc we could document all these variants and mark them in the HTML docs without forcing the compiler to deactivate some features in that code section.
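A sketch of the documentation-only approach with today's tooling. The # Preconditions heading is a hypothetical section name chosen for illustration: rustdoc renders any Markdown heading in a doc comment, but only # Safety on an unsafe fn is a widely followed convention (Clippy's missing_safety_doc lint checks for it). The apply_discount function is made up; its contract is visible in the rendered docs and backed by a debug_assert!, with no compiler enforcement.

```rust
/// Applies a percentage discount to a price given in cents.
///
/// # Preconditions
///
/// `percent` must be at most 100. This is a library-level contract, not a
/// memory-safety one: violating it cannot cause UB, so the function stays
/// safe and the contract lives in the rendered docs.
pub fn apply_discount(price_cents: u64, percent: u64) -> u64 {
    // Only checked in debug builds; release builds trust the caller.
    debug_assert!(percent <= 100, "precondition violated: percent > 100");
    // Widen to u128 so the multiplication cannot overflow; with
    // `percent <= 100` the result is at most `price_cents`, so it fits in u64.
    (u128::from(price_cents) * u128::from(100 - percent) / 100) as u64
}
```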