Types as Contracts

Over the last couple of weeks of my internship, I have been working on a proposal for the “Unsafe Code Guidelines”. I got some very encouraging feedback at the Mozilla All-Hands, and now I’d like to put this proposal out there and start having a discussion.

If you have any comments, please raise them – the entire purpose of this post is to get the proposal out there so that we can see if this approach can suit the needs of Rust programmers and compiler writers alike. If there are explanations you think are missing or confusing, please bring that up as well; I may then update the blog post accordingly.

https://www.ralfj.de/blog/2017/07/17/types-as-contracts.html

10 Likes

@Zoxc pointed out on IRC that I missed one way for values to flow from unsafe code back to safe code: &mut references. For example, the following code should be UB:

use std::mem;

fn bad(b: &mut bool) {
    // Transmuting the byte 5 into a bool produces an invalid boolean value.
    *b = unsafe { mem::transmute(5u8) };
}

fn main() {
    let mut b = true;
    bad(&mut b);
    // When the borrow ends, we should verify that the boolean is still valid.
}

Some other topics which I deliberately left as future work for now are global variables (these probably need validation at the beginning of every function?) and NonZero.

EDIT: Another clarification: UnsafeCell is only exempt from validation when checking in immutable mode. &mut UnsafeCell is not treated specially.
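
To illustrate that clarification with a made-up example (the function names below are invented, and the comments describe the proposal rather than anything rustc checks today):

use std::cell::UnsafeCell;

fn shared(_c: &UnsafeCell<bool>) {
    // Immutable-mode validation stops at the UnsafeCell: the bool inside may
    // hold an arbitrary bit pattern here without the entry check flagging UB.
}

fn exclusive(_c: &mut UnsafeCell<bool>) {
    // &mut UnsafeCell is not treated specially: validation descends into the
    // cell, so the bool inside must be a valid bool (0 or 1) on entry.
}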

EDIT 2: I did some refactoring of the post, reordering sections 2.2 and 2.3 to hopefully make it all flow better.

As I mentioned on IRC, I’ve written some unsafe “safe” functions that were protected by a module boundary, and so I would prefer that the unsafe keyword itself be free of semantic side effects if possible.

On multiple occasions I’ve needed to unsafely implement a “safe” trait, such as a visitor trait. Since the function signatures in the trait definition are “safe”, I can’t mark the functions as “unsafe”, even though the functions are, in general, unsafe to call. In these cases, I dealt with the situation by making the type that implemented the trait private so that code outside of the module couldn’t access the unsafe functions.

That would however be modules that contain unsafe code somewhere, right? So, something like "unsafe taints the module" would still work here? (Not saying that's a great solution, just trying to gauge the design space here.)

Yes, the module would contain some unsafe code, so having a notion of unsafe-tainted modules would work here. Note though that the unsafe code could be in a submodule, so it’s not clear where the unsafe boundary would end (i.e. would the crate root be tainted because some sub-submodule used unsafe?).

That seems rather dangerous. Presumably the safety of this whole thing depends partially on how, when and where other code invokes the trait methods. If that code is outside your control, then a logic bug in other people's fully safe code can trigger UB in your "safe" method. That's a very bad place to be in, and so I'm questioning whether this is a use case that should be supported.

I say presumably because I can't imagine a scenario where the methods would be unsafe to call in general, but the rest of your module could ensure it's safe regardless of what the third-party, trait-using code does (as long as it's safe/adheres to the contracts). Please elaborate on your scenario if that's actually the case.

1 Like

@hanna-kruppe

Presumably the safety of this whole thing depends partially on how, when and where other code invokes the trait methods.

The concrete example I'm thinking of looked something like this:

// Outside of my module:
pub trait Visitor {
    fn visit_thing1<T: Visitable>(&mut self, x: T::Thing1) { ... }
    fn visit_thing2<T: Visitable>(&mut self, x: T::Thing2) { ... }
    // ...
    fn visit_all<T: Visitable>(&mut self, x: T) { ... }
}

// Inside of my module:
pub struct MyVisitable { ... }
impl Visitable for MyVisitable { ... }

// The privacy of UnsafeVisitor means that we can't call the unsafe "safe"
// methods of <UnsafeVisitor as Visitor> outside the current module.
struct UnsafeVisitor { ... }
impl Visitor for UnsafeVisitor {
    fn visit_thing1<T: Visitable>(&mut self, x: T::Thing1) {
        // do some thing that is only safe conditioned upon precondition 1
        ...
    }
    // ... likewise for the other methods
}

pub fn do_the_thing(visitable: MyVisitable) {
    let mut visitor = UnsafeVisitor { ... };

    // Set up / confirm precondition 1 here
    ...

    visitable.visit(&mut visitor);

    // Use the `UnsafeVisitor`. After this point, precondition 1 could be false.
    ...
}

As you can see, it doesn't rely on the external logic being correct. It merely relies upon invariants that are managed within the module.

That said, you could easily imagine this code relying upon e.g. the fact that MyVisitable::visit will result in a call to UnsafeVisitor::visit_thing1 exactly once for every Thing1 in MyVisitable. I think that even in this case, there's a serious argument to be made that the code is valid.

Unsafe code has to be able to rely on the behavioral characteristics of at least some specific, known safe code, even if it's from a different module. As a somewhat hyperbolic example, consider that the implementation of Add for usize is "safe"-- Add isn't an unsafe trait, so it'd technically be "safe" for <usize as Add>::add to be redefined so that it always returned 0. I'm positive that such a change would break mountains of totally reasonable unsafe code and cause tons of illegal memory accesses. While it may be incorrect to assume that an arbitrary implementation of Add::add is e.g. commutative, associative, or distributive, it's certainly not wrong to assume those behaviors of a specific, concrete implementation of Add, such as the one for usize.
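
As a hedged, invented illustration of that kind of reliance (this is not code from the discussion): the bounds check below is only meaningful if `i + 1` really is ordinary usize addition.

// If a pathological `<usize as Add>::add` always returned 0, the check would
// pass for out-of-range indices and the unchecked accesses would read out of
// bounds. Trusting the concrete impl for usize is what makes this sound.
fn sum_pair(v: &[u8], i: usize) -> u8 {
    if i < v.len() && i + 1 < v.len() {
        // Both indices were verified above, so the unchecked accesses are in bounds.
        unsafe { v.get_unchecked(i).wrapping_add(*v.get_unchecked(i + 1)) }
    } else {
        0
    }
}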

Integer addition is indeed an extreme example, because it's essentially a language built in. If we distrust it, we might as well distrust the compiler's ability to make locals live as long as they're declared to live. You're right that unsafe code authors need to be able to rely on something, but it's just as impractical to not rely on (e.g.) integer addition as it is to rely on arbitrary third party code being logically correct. Let me put it this way: If a logic bug in third party safe code causes unsafe code to trigger UB, then with a few exceptions this UB is the "fault" of the unsafe code and the unsafe code should be fixed. The exact extent of those exceptions could be debated but it's certainly no larger than some subset of std plus safe code the unsafe author controls (and ideally, lives in the same module – if not, at the very least everyone performing an audit will curse you).

Otherwise, we'd be back in the C and C++ situation where perfectly innocent errors anywhere in the code base may have arbitrarily bad consequences anywhere else. That reasoning about safety is localized and composable is precisely what Rust (and Ralf's notion of contracts) is about.

It may of course be a pragmatic decision to bet the absence of UB on third party safe code being logically correct, but only in the same way it might be a pragmatic decision to do other "wrong" things in unsafe code that are unlikely to cause problems in your case.

@hanna-kruppe

The exact extent of those exceptions could be debated but it's certainly no larger than some subset of std plus safe code the author controls ...[otherwise]... perfectly innocent errors anywhere in the code base may have arbitrarily bad consequences anywhere else.

I think that's a really persuasive argument-- I agree with you that it's really valuable to limit unsafe code's reliance upon the behavior of safe code, especially safe code from a different module.

What did you think of the concrete example I gave? In that example, no assumptions are made about the behavior of safe code outside the module, but it's still necessary to use privacy to enforce a safety boundary.

Absolutely agreed. I see no fundamental problem with that. It sounds perfectly reasonable to me that e.g. unsafe code may use HashMap and rely on it actually implementing a hash map. I don't see any way around this. Fundamentally, unsafe code can make any code "critical" by relying on it being correct.

What is important (and you mentioned this) is that the safe code that we rely on be known and fixed. If you write a library that takes a closure, you don't get to rely on this closure doing anything specific. On the other hand, if you depend on a particular other library and call it, you know that third party code cannot just "swap out" that library's code for something else behind your back, so you get to rely on which code is executed.

The general rule is that a user of the safe interface of your library must not be able to break anything. A user cannot change which hash map you use.
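
As a hedged, invented sketch of what relying on a known, fixed dependency can look like (the function below is made up for illustration): unsafe code may trust that HashMap's len() agrees with what iteration yields, and elide per-element bounds checks on that basis.

use std::collections::HashMap;

// This trusts the concrete HashMap implementation to report a `len()` that is
// consistent with iteration. A "HashMap" that lied about its length could make
// the writes below go out of bounds or expose uninitialized memory -- but no
// user of this function's safe interface can swap out which hash map is used.
fn collect_values(map: &HashMap<u32, u8>) -> Vec<u8> {
    let mut out = Vec::with_capacity(map.len());
    let mut ptr = out.as_mut_ptr();
    for (_key, value) in map {
        unsafe {
            ptr.write(*value);
            ptr = ptr.add(1);
        }
    }
    // Safe only if exactly `map.len()` elements were written above.
    unsafe { out.set_len(map.len()) };
    out
}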

I don't think we can avoid unsafe code relying on the functional correctness of other, safe code. Avoiding that would be a big burden. Reasoning about safety is still composable in the sense that you can (and should) be aware and very explicit about the assumptions you are making about 3rd party code, and ideally check with their docs that indeed, this is something they guarantee to uphold. With that interface established, one can then separately verify that the 3rd party code indeed upholds the guarantee you want, and that your own interface (assuming the guarantee) safely encapsulates the unsafe code.

4 Likes

Yes, it's a nice example and seems safe. In the specific example you gave, MyVisitable and the impl Visitable are even in the same module, meaning nobody should have an issue with including it in UnsafeVisitor's trusted surroundings.

In general though, I see an issue with trait methods that would need to be unsafe. One can't always control the implementing type as tightly as you did, and a separate trait splits the ecosystem and duplicates code (not to mention the difficulty of respecting whatever conditions the unsafe method has in a generic context). A similar issue exists with const-ness: e.g. there's no clear way towards generic const math because Add::add and friends can't and won't be const fns.

Yes.

I'd rather shift more of this burden to the unsafe code authors. For example, they should probably audit the code w.r.t. the guarantee they rely on. I'm not saying nothing should be trusted, but I would like very strong (social) barriers on how much is assumed, to keep the trusted computing base relatively small. But these issues are far afield from this thread, so I suggest if someone wants to discuss this further they fork a new thread.

  • What about weird things like Unique<T>, used for the pointers inside Box, Vec, etc.? Should it be UB to mutate the pointee while the pointer is borrowed by someone else? (Probably; otherwise you’re giving up on significant optimization potential.) If so, there would need to be a way for user code to hook into the contract system, because the pointee may be something other than a single valid instance of T, e.g. for Vec it depends on the len field.

    • I suppose the hook would be a trait that asks a container to walk its contents and somehow indicate the types of, and pointers to, all contained values (see the sketch after this list). …That sounds a lot like the same functionality required for tracing GC!
    • But it also seems more intrusive than what should be strictly necessary. With noalias, you can say “if someone loads a pointer from this data structure, it’s UB for someone else to mutate it”, without actually having to know whether there is a pointer or where. But it’s hard to check something like that.
  • It seems like we haven’t gotten any further towards resolving the tension between performance and understandability illustrated in Niko’s original ‘Tootsie Pop’ proposal from last year. Your swap example is similar to Niko’s split-at-mut-via-duplication, which is why you propose making “functions containing unsafe blocks” the boundary for increased permissiveness - and thus reduced performance. But Niko goes on to provide more examples suggesting that the boundary should be the entire module containing unsafe blocks. Yet that, of course, would be a bitter pill for unsafe coders to swallow, since the whole point of unsafe code is usually to increase performance; and in the followup, he provides an example (get_unchecked) where deoptimizing even at the function boundary would feel unfortunate. And then there’s his third post which suggests maybe abandoning the idea of trusting Rust types, but without any real concrete alternative. Well, except the ‘asserting-conflicting-access model’, but your proposal doesn’t go in that direction.

    Not that I need to summarize the issue for you, since you read and responded to those posts :slight_smile: But I wanted to bring it up as context, since your progress makes it more urgent that we make a decision.

    I’m starting to suspect there may not be any truly satisfying resolution, other than picking the least bad option and supplementing it with ugly compiler knobs, like the function attribute you suggested. Which is unfortunate. But then, bad is relative. Whatever the definition of UB, if it can be reliably tested for, that’ll be a heck of a lot better than what you get with C (where people can’t even decide what the standard means)…
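
To make the "hook" idea from the first bullet a little more concrete, here is a purely hypothetical sketch; the trait name and signature are invented, and nothing like it exists in the proposal or the standard library.

// Hypothetical: a container reports which memory it owns, so that owned but
// not currently borrowed buffers (like Vec's) could be locked and validated
// along with the container itself.
unsafe trait ValidateContents {
    // Call `report` once per (pointer, length in bytes) region this value owns;
    // e.g. a Vec<T> would report (self.as_ptr(), self.len() * size_of::<T>()).
    fn for_each_owned_region(&self, report: &mut dyn FnMut(*const u8, usize));
}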

1 Like

For now, there's nothing special about Unique. And indeed making it special is hard in this framework as it lacks the type information to do so. I am not yet convinced that this is strictly needed for the optimizations I was shown so far, but I also did not look at them in full detail yet.

Well, one possible avenue that my approach opens is having programmers write explicit validation statements. Also, one idea for get_unchecked is to have different "unsafe levels", and get_unchecked would be marked as "unsafe but doesn't play games with types"; so validation -- and optimization -- could still happen.
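
To make the first of those ideas slightly more concrete, here is a hypothetical sketch; the explicit Validate statement (borrowed from the blog post's swap example) is written as a comment because none of this is real syntax today.

pub fn first(v: &[i32]) -> i32 {
    debug_assert!(!v.is_empty());
    unsafe {
        // A programmer-written validation statement could re-assert the usual
        // locks/validity for `v` here, signalling that this unsafe block does
        // not play games with types, so the usual optimizations still apply:
        // Validate(Acquire, [v]);
        *v.get_unchecked(0)
    }
}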

But yes, there are certainly unsolved problems around here. This seems to be pretty much the same as your first point though? Or do you have other optimizations in mind?

I guess it depends what you mean by 'strictly'? In this code:

let b: Box<SomeStruct> = ...;
foo(b.field);
// intermediate code not using 'b'
bar(b.field);

it would be desirable for the compiler to know b.field couldn't have changed, assuming it had previously inlined the two calls to Box's Deref impl. Not sure how else you could achieve that, except maybe automatically transforming the code to deref once and reuse the reference? But that transformation might not be safe for arbitrary Deref implementations... am I missing something? :slight_smile:

Not sure what you mean exactly... the points could both be classified as "unsolved problems", but I see my first point (Unique/containers) as relatively minor, something that could be deferred to a future extension, whereas the second goes to the core of what unsafe code is allowed to do.

It’s not obvious from the post whether you view unsafe code as being unsafe blocks or operations which are only permitted within unsafe blocks. I personally view them as operations which are only permitted within unsafe blocks and the blocks are only syntax which has no impact on code generation. For example, if I have the following code:

let x = 0;

Then changing that to:

let x = unsafe { 0 };

Should have no impact on the semantics of the program.

In the example below:

// Defined elsewhere: unsafe fn ptr::swap_nonoverlapping<T>(x: *mut T, y: *mut T, count: usize) { ... }

pub fn swap<T>(x: &mut T, y: &mut T) {
    Validate(Acquire, [x, y]);
    let x = x as *mut T;
    let y = y as *mut T;
    unsafe {
        Validate(Release, [x, y]);
        ptr::swap_nonoverlapping(x, y, 1);
    }
}

The Validate(Release, [x, y]); is placed there not because there’s an unsafe block, but because the subsequent operation is unsafe and uses x and y in ways that cannot be validated automatically.

At a higher level, one of the concerns I see is limiting optimizations too broadly because ‘we’ collectively let unsafe code ‘poison’ other code too liberally. I suspect that thinking about unsafe code as operations and not as blocks will make that eventuality less likely.

First of all, Boxes are special in that they incur a write lock just like mutable references do.

However, for your example, that wouldn't even be needed... if the code doesn't use b, then of course b is not modified. What am I missing? The interesting case IMHO is calling a function and not passing b to it. That is handled just like mutable references: If you do not pass a box to a function, you keep the write lock, so the box cannot be accessed by the other function.
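
A hedged, invented example of that point (the function below is made up for illustration): because the Box is not passed to the call, the caller keeps its write lock across the call.

// Under the proposal, `f` never receives `b`, so it cannot legally access the
// Box's contents; the compiler may assume `*b` is unchanged across the call
// and fold the second read.
fn example<F: Fn()>(b: Box<i32>, f: F) -> i32 {
    let before = *b;
    f(); // `b` is not passed: its write lock stays with this function
    before + *b // may be optimized to `before + before`
}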

From my understanding, Unique is what unsafe code authors (want to) use to get the optimizations that they otherwise couldn't because raw pointer types are too weak. That's why your two questions seem closely related to me.

Good point! My plan was to go for unsafe blocks, just because that's simpler. But everything should work just as well if "unnecessary" unsafe blocks do not matter. However, notice that the compiler already warns about unnecessary unsafe blocks.

I don’t like it; it feels too much like tootsie pop, and imo, tootsie pop is not a good way to go forward. I want types to have the exact same guarantees no matter where they are.

At least for the aliasing stuff, I prefer an access model based on a preorder (initial draft).

1 Like

By “intermediate code not using ‘b’” I meant potentially including function calls, or anything else the compiler can’t analyze, like writes through raw pointers that (if not for special casing) could potentially alias the pointer in the Box. (Anyway, in my example, foo() is executed between the reads.)

First of all, Boxes are special in that they incur a write lock just like mutable references do.

Okay, sounds like a good idea (I didn’t know because it’s not mentioned in your post). Of course, there are other types which could take advantage of the same optimizations, such as Vec, Rc, Arc, etc., and in general it’s best to avoid special-casing standard library types - so eventually it would be good to have a way to customize locking behavior in ‘userland’.

You are right. :slight_smile: I should probably mention this.

My model permits the current function to write to this memory, no matter which pointer is used. So the compiler would still have to prove that all writes that happen inside this function do not alias b. For safe writes that is easy; for raw pointers it is not -- my model is conservative here in the sense that an &mut and a *mut could actually alias if used in the same function.
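
Spelled out with an invented example: within a single function, the model does not let the compiler assume an &mut and a *mut are disjoint.

// The current function may write to the locked memory through any pointer it
// likes, so `p` might alias `r`; the compiler must therefore re-read `*r`
// instead of folding the result into `a`.
fn same_function(r: &mut i32, p: *mut i32) -> i32 {
    let a = *r;
    unsafe { *p = 42; } // might target the same memory as `r`
    a + *r // must be re-read
}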

So you are saying that multiple accesses to v[0] (for some v: &Vec<T>) should be known to return the same thing even when unknown functions are called in the meantime? That's tough: it requires teaching the compiler about the fact that Vec "owns" the heap-allocated buffer, so it can remain locked even when we don't hold any reference into it. This goes beyond controlling when validation intrinsics are added, and into actually controlling validation -- in some declarative way so that the optimizer can make use of this.

Yep, and I acknowledge it’s tough. Well, tough to implement in a way that’s coherent between validation and actual optimization. On the optimization side, the pointer field in a Vec<T> is already a Unique<T>; if loads through Unique pointers are given LLVM noalias metadata, I think that’s enough to let LLVM make the optimization in question. (Which AFAIK was part of the original motivation for Unique, given the comment about aliasing guarantees, but isn't currently implemented, as Unique isn't a lang item.) But it seems like validation would have to use a totally separate mechanism to indicate the range of addresses to lock, given that that depends on the separate len field.

Also, I guess there are weird semantic corner cases. Consider this code:

let vec: Vec<i32> = vec![1, 2, 3];
let mut foo: i32 = 1;
let out_of_range_index = vec.as_ptr().offset_to(&mut foo as *const i32).unwrap() as usize;
let read1 = unsafe { *vec.get_unchecked(out_of_range_index) }; // this reads `foo`
foo = 2;
let read2 = unsafe { *vec.get_unchecked(out_of_range_index) }; // ditto

As far as I can tell, under your model, this code is perfectly legal - at least if the Vec contents happen to be located before foo in memory, so that vec.get_unchecked(out_of_range_index) doesn't wrap around the address space. Even if the Vec contents were write locked, that wouldn't cause any conflict with accesses to a different memory region. But if LLVM optimized multiple accesses to the same index, it could coalesce read1 and read2 into one read, even though the value changed in between.

Then again, that isn't just an issue for Vec, but any kind of array, including fixed-length arrays, which I'd assume your proposal already covers. Yet it would be highly desirable for Rust's semantics to allow taking advantage of LLVM's existing noalias functionality. Perhaps that code would have aliasing assumptions disabled due to using unsafe? Or perhaps it should just be independently considered UB to index a slice out of bounds (even using get_unchecked), regardless of aliasing?