This I think is a very key question. I have not made up my mind to what extent we get to answer it. It's another area where I'd love to try and gather some empirical data about what code in the wild does, but it's going to be a bit tricky to be sure. I'm not sure how I'd test for this sort of pattern right now.
In any case, if we want to declare the Ref type as illegal, I think it opens the door to moving the abstraction boundary to a fn granularity by default. There are still further interesting questions we would want to examine, but this example is definitely an important one.
I think if you think of things in terms of permissions, what we are saying is that the "permissions" that you get from having a reference to a struct are always equal to the full set that you would get from the types of its fields, plus whatever other invariants you maintain through your API (i.e., that the length field is in scope). Using types like *const T (which have no associated permissions by default) are a way to let types define their own permissions.
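To sketch that last point, here is a hypothetical wrapper (the name `MyRef` and its API are invented for illustration, not taken from any real crate) that stores a `*const T` precisely because raw pointers carry no permissions by themselves; the safety invariant lives entirely in the API rather than in the field's type:

```rust
// Hypothetical sketch: a type that wants a pointer without the
// permissions a reference would imply uses *const T, and maintains
// its own invariant through its API.
struct MyRef<T> {
    ptr: *const T, // no aliasing or validity permissions implied by the type
}

impl<T> MyRef<T> {
    fn new(r: &T) -> MyRef<T> {
        MyRef { ptr: r as *const T }
    }

    // API-maintained invariant: `ptr` points to a live T whenever
    // `get` is called (upheld here by construction from a reference).
    fn get(&self) -> &T {
        unsafe { &*self.ptr }
    }
}

fn main() {
    let x = 42u32;
    let r = MyRef::new(&x);
    assert_eq!(*r.get(), 42);
}
```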
I have been thinking some about a model based on this approach and I definitely think it is a viable alternative. Ultimately, I'd like to see us pursue various options so that we can make an informed final choice. I will try and carve out some time to write out a more detailed blog post here, I think, that goes into a bit more depth.
and somehow manages to do well on benchmarks anyway.
One other difference between Rust and C -- though I'm not sure how much it matters; for all I know this is a non-issue: generics lead to a fair amount of unnecessary use of pointers. For example, if you iterate over &[u32], you get &u32, not u32. If you sort, cmp is called on a pointer; even == in generic code translates to PartialEq, which takes self and other as pointers. Of course, inlining is supposed to take care of the vast majority of these cases, but I think sometimes you do end up with non-inlined function bodies taking unnecessary pointers, in which case noalias may make a difference by reducing the number of times they have to be dereferenced. Also, MIR optimizations take place before LLVM inlining; I suppose the plan is to inline some things at the MIR level? but in any case where that doesn't happen, those optimizations will be hindered.
C doesn't have MIR optimizations, but C code wouldn't use unnecessary pointers in the first place.
And of course, C does have restrict; it's uncommonly used, but not unheard of.
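To make the generics point above concrete, a minimal sketch (function names are invented for illustration) of how Copy types end up behind pointers in generic code:

```rust
// Iterating over &[u32] yields &u32, not u32: every element arrives
// behind a pointer, which inlining usually (but not always) removes.
fn sum(xs: &[u32]) -> u32 {
    let mut total = 0;
    for x in xs {
        total += *x;
    }
    total
}

// `==` in generic code lowers to PartialEq::eq(&self, &other), so both
// operands are passed by reference even when T = u32.
fn generic_eq<T: PartialEq>(a: &T, b: &T) -> bool {
    a == b
}

fn main() {
    assert_eq!(sum(&[1, 2, 3]), 6);
    assert!(generic_eq(&5u32, &5u32));
}
```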
Safe code with assumptions: does not break the type-system in the expected case, but does operations that can't be proven safe. May have UB when called with bad input.
Heavy unsafe code - takes the type-system to the edge - uses raw pointers, creates structs ex nihilo, etc.
We (obviously) want to do all possible optimizations in purely safe code, and to be very conservative with optimizations in heavy unsafe code. Because heavy-unsafe code should be rare and manually well-optimized, this should not be much of a problem.
However, safe code with assumptions is pretty common - the "single unchecked_get for performance" pattern is quite common and completely safe, and some domains such as gamedev have all of their code safe with assumptions. We can, and want to, optimize these cases heavily.
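A minimal sketch of that "single unchecked_get for performance" pattern (the function name is invented for illustration): one up-front check justifies the per-element unchecked accesses, so the code is safe in the expected case but would have UB if the check were removed:

```rust
// Hoist one bounds check out of the loop, then use get_unchecked inside.
fn sum_first_n(xs: &[u32], n: usize) -> u32 {
    assert!(n <= xs.len()); // the single check that justifies the unsafety below
    let mut total = 0;
    for i in 0..n {
        // Safe given the assert above; UB only if a caller could reach
        // here with n > xs.len().
        total += unsafe { *xs.get_unchecked(i) };
    }
    total
}

fn main() {
    assert_eq!(sum_first_n(&[1, 2, 3, 4], 3), 6);
}
```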
So here's the heart of the matter. Both your definition---I'll call it "verb" aliasing because the emphasis is on the action of reading and writing the same memory through different pointers---and "noun" aliasing---the mere existence of pointers to the same address---are useful and needed. If either one knows there are no such reads and writes, but knows nothing about pointer values, or knows there are no pointers with overlapping destinations but nothing about reads and writes, one is sure that no aliasing problems will arise in practice. Indeed the &mut invariant is a hybrid of the "noun" and "verb"---there are no other overlapping pointers barring these ones we know we won't write with.
Speaking of the &mut invariant, one can interpret &mut (the type) as a pair of an address and a proof that that invariant holds. Types are also a "very local" analysis in that we are describing properties of a value that hold true regardless of what we do with that value. So when we say that a &mut has this invariant, we really aren't supposed to rely on what the "rest" of the program does. In particular, despite what I said above about this invariant being a mix of the "noun" and "verb" definitions, the "know we won't write with the borrowed pointers" part comes not from the fact that we know what the rest of the program does, but because, since those pointers are borrowed, there is no possible rest of the program that would use both them and the &mut in question simultaneously. In other words, the substructurality of the type system allows it to handle this invariant as another "very local" property.
So back to your example: while the programmer can know with confidence that there are no writes through the aliased pointer, what really matters is that the programmer is sure the compiler won't take advantage of the "proof of a lie" that the very existence of the &mut comes with, and that strikes me as less obvious--especially in the presence of things like MIR-level inlining, where what code came from what function ought (IMO) to be erasable (if we don't do this module-level unsafe boundary stuff).
This is an interesting statistic, but not necessarily directly applicable to us. In particular, in safe code, we can use TBAA (and indeed an even stronger variation). My impression is that TBAA helps very much in specific scenarios -- e.g., LICM, auto-vectorization -- where it enables stronger code motion because it lets you break (hopefully false) dependencies. I think that the majority of these scenarios will be in safe code, not within the body of an unsafe method. But again this is something we will want to empirically investigate. (And, naturally, it depends on just how we draw abstraction boundaries and so forth.)
I think my single biggest takeaway from this thread is the importance of the "unchecked" (e.g. get_unchecked or from_utf8_unchecked) use case -- basically a "drop-in" use of a function that is unsafe, but only barely. I agree we should make sure that this is easy to do and does not result in the surrounding code being considered unsafe. Still weighing the best way to do this in balance with other factors though. =)
I'm pretty sure Rust doesn't have TBAA because safe code doesn't gain anything from it, and unsafe code doesn't want it. References don't gain anything from TBAA because they're already incapable of mutable aliasing; you don't need to look at the types to determine that.
@notriddle well, it doesn't matter what the parameter of the reference type is, but it does matter whether the type is &mut _ or & _! It's TBAA done right!
FWIW, I've been under the impression that Rust's current lack of TBAA is a feature / guarantee, mostly due to this lack of utility. It seems that I'm not the only one - I found two crates on crates.io that try to provide a safe wrapper around transmuting between POD types:
I think capnproto-rust does it too, though it's too complicated for me to figure out what it's doing at a glance.
Thus, in theory, introducing TBAA to Rust, even limited to safe code, could cause breakage. Though, to be fair, probably not: the only case where TBAA in safe code could be of any use is with Cell-based types, and people probably aren't using those crates with those.
but it does matter whether the type is &mut _ or & _! It's TBAA done right!
The difference is a bit more subtle than that. As @comex mentioned, and I overlooked, cell types actually could benefit from TBAA.
But more importantly, Rust's current aliasing rules for references are neither a superset nor a subset of C's TBAA.
```rust
fn foo(bar: &mut T, baz: &mut T) {
    // In Rust, bar and baz cannot point at the same address.
    // In C, they can point at the same address, and they can be mutably aliased.
}

fn foo(bar: &T, baz: &T) {
    // In Rust, bar and baz can point at the same address, but they can't mutably alias.
    // In C, they can mutably alias.
}

fn foo(bar: &T, baz: &U) {
    // In Rust, bar and baz can point at the same address, but cannot mutably alias.
    // In C, they cannot point at the same address.
}

fn foo(bar: &Cell<T>, baz: &Cell<T>) {
    // In Rust, bar and baz can point at the same address, and can mutably alias.
    // In C, they can also mutably alias.
}

fn foo(bar: &Cell<T>, baz: &Cell<U>) {
    // In Rust, bar and baz can mutably alias.
    // In C, they cannot point at the same address at all.
}
```
I agree with this. Not using C-like rules for TBAA, at least, is one of the few things I feel pretty sure about in this whole discussion. =)
This is true, though it's also true for most restrictions we might add. This is, indeed, one of the points of my blog post, though this concern has not seen a lot of discussion on this thread.
As I wrote in the post, I'm less worried about backwards compatibility as I am about forwards -- I expect people will continue writing code very similar to what they write now, whatever we say. But it's a bit hard to say: right now, there are no clear rules. If we can make rules that are fairly simple, even if they break some existing code, it's possible that they will be read by enough people that future code won't break them (or won't break them much).
Aggressively optimizing would also help, of course, since it would make things stop working faster -- or else some way to run your code and give lints or warnings.
I assume whatever rules we come up with about what is and isn't defined behavior, MIRI should end up following those rules exactly, and report an error if and only if behavior isn't defined? That seems like a more reliable way to test than to hope that aggressive optimizations will have immediately obvious symptoms. But that would still only be able to test whether code has defined behavior for one set of inputs at a time, not all possible inputs (you'd need something like symbolic execution to do that?).
I wonder where a Rust interpreter precisely following the memory model would fall on the spectrum between an English-language specification and a fully-formalized specification in something like Coq, in terms of providing confidence that the specification isn't broken in some way.
Yes, this has certainly been suggested. How effective it will be as a general purpose mechanism for detecting potential problems remains to be seen, and will depend a bit upon the precise rules, I imagine.
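The "one set of inputs at a time" limitation can be sketched concretely (the function name is invented for illustration): the following function has defined behavior only for in-bounds indices, so an interpreter checking the rules would flag it only when a test actually passes a bad index:

```rust
// Defined behavior iff i < xs.len(); an interpreter following the rules
// can only report UB on executions where that condition is violated.
fn get(xs: &[u32], i: usize) -> u32 {
    unsafe { *xs.get_unchecked(i) }
}

fn main() {
    // In-bounds: no UB on this input, so an interpreter sees nothing wrong,
    // even though other inputs to `get` would be undefined.
    assert_eq!(get(&[10, 20, 30], 1), 20);
}
```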
A different approach could be to mark the variable required to drop a mutex as volatile, so that the compiler can inline drop and reorder before/after dropping the mutex, but not across the drop itself. Not that this is much better than the current state of things, since it effectively introduces a barrier.
I guess the keyword here is potentially, but which aliasing information are we feeding to LLVM? We used to pass noalias metadata to LLVM, which is "equivalent" to C's restrict annotation and is where the wins are, but we are not passing this metadata to LLVM anymore. Is there any other type of aliasing information we are passing to LLVM, or are we just inlining as much as possible and hoping that the LLVM optimizer is able to recognize Rust patterns?
That sounds more like a reason to implement Undefined Behavior Sanitizer.
While it would be awesome to catch some forms of undefined behavior in unsafe code at run-time this is a titanic task.
There are easy problems, like catching out-of-bounds memory accesses, that are solved in C++ with AddressSanitizer. When you go into reading uninitialized memory, the current solution is to use MemorySanitizer. This requires you to recompile the whole world with instrumentation down to libc, and to link against an instrumented libc and memory allocator (basically this means recompiling everything down to the OS kernel). To be able to instrument code that links dynamic libraries, MemorySanitizer users run Valgrind-like run-time instrumentation for those libraries and exchange the information at the interface at run-time. This gives an idea of the scope of the problem: just to sanitize reading uninitialized memory you need to instrument everything. And we haven't even reached the hard spots of undefined behavior, like accessing an inactive member of a union, which currently has no known solution. And if you add data races (ThreadSanitizer) and some other easier forms of undefined behavior (UBSan), you end up with 4-5 run-time instrumentation stacks, each of which makes the whole program "only" a factor of 2-5x slower, but which cannot work together at the same time (you basically have to sanitize your program for each of the possible types of UB independently of the others). Doing so for multiple types of UB in general is not yet a solved problem, and we are very far from doing so for most types of UB in C/C++.
With this I don't want to discourage anybody from trying to implement sanitizers for undefined behavior, but rather to put in perspective that sanitizers are not the ultimate solution to this problem yet, and won't be for the foreseeable future. A lot of projects use ASan, UBSan, and TSan in C/C++ to detect undefined behavior at run-time, but very few projects use MSan to detect reads of uninitialized memory, due to their inability to recompile the world (and they resort to best-effort slow tools like Valgrind as much as they can).
That's fair enough, but not really applicable to the reference aliasing rules, which are the only form of UB that anyone is discussing right now. Most other forms of UB we're either stuck with (reading uninitialized memory or the wrong variant of a union) or we just made them defined (arithmetic overflow).
I think we can handle sanitizing the aliasing rules, if we do a decent job of defining what they are, so making them schizophrenic seems unnecessary and would probably make writing a sanitizer harder.
Doesn't the BTreeMap iterator use raw pointers for the root and node? Even if the rule were that "no &mut reference can alias any other reference once one leaves the unsafe block", I don't see how that would affect BTreeMap. Indeed, I think that like in the Ref case, raw pointers are the correct solution when one wants to keep a pointer that does not meet the type-system requirements for & or &mut.
The more I think about this problem, the more I like the solution of "type-system invariants must hold outside of an unsafe block. Code within an unsafe block is free to break these invariants, but they must be restored before leaving the block." The easiest way to do this is to have any variables that violate type invariants scoped within the unsafe block. To me, this seems like a simple rule that is easy to remember, and the boundaries are simple and obvious (the unsafe block, itself). Indeed, it is what I would intuitively expect and try to uphold in my own code. It's true that the unsafe block would have to be careful not to expose broken invariants to any safe functions it might call (e.g., it couldn't pass two aliasing &muts to a safe function), but not exposing invariants to code outside the boundary is necessary for any boundary.
The split-at-mut-via-duplication example can trivially be made to follow this rule by expanding the unsafe block:
```rust
impl [T] {
    pub fn split_at_mut(&mut self, mid: usize) -> (&mut [T], &mut [T]) {
        unsafe {
            let copy: &mut [T] = &mut *(self as *mut _);
            let left = &mut self[0..mid];
            let right = &mut copy[mid..];
            (left, right)
        }
    }
}
```
Note that due to scoping, even in the event of a panic, no code outside of the unsafe block can observe aliasing.
I don't think an author failing to follow a rule that was not thought out at the time, and which we are just now trying to formulate, is a sign of doom. I agree with @arielb1 that this code appears to be pretty obviously invalid, as the lifetime on the reference is just plain wrong. As I have mentioned before, I think trying to make this case valid should be a non-goal. Further, I think that once we have the rules regarding type-system invariants and unsafe written down, including examples such as invalid lifetimes and aliasing &muts, it will be very unlikely that such code will be written or pass code review.