Blog post: Thoughts on trusting types and unsafe code

FYI, I wrote a blog post with some thoughts:

This is mostly me rambling, not reaching any firm conclusions. I will try to distill the most important nuggets into comments on the rust-memory-model repo in any case.


@nikomatsakis very interesting read. FWIW, as someone who is often writing “safe” code that unknowingly deals with references to static data, I am constantly making assumptions that the compiler is not optimizing things in the kinds of ways you describe as problematic. That seems brittle…

volatile_load, at least, saves us in many cases, but not in those you point out, where the caller passes in a non-static reference to static data.

@alevy that reminds me, we never finished our conversation on this topic – I was planning on writing a blog post specifically about aliasing and mutability to try and lay out my thoughts in full there.

That said, I am not sure if your code would actually be a problem. All of my examples concerned &i32 specifically; I didn't talk about data that is contained within an UnsafeCell. From what I understand, in those cases where you are mutating static state, that state is owned by a TakeCell (a variant on RefCell), no?

volatile loads have a similar story: if you need to require volatile loads (e.g., because of DMA or hardware changes), you should definitely be using a wrapper that ensures that only volatile load intrinsics are used (because of privacy).

(As an aside, ensuring that we can handle “volatile only” loads with grace is something I’ve been trying to keep in mind. That is, I think we do want the model to avoid allowing the compiler to insert “spurious” reads of pointers when it can’t prove that a read would have been done. In particular we would want to preserve the invariant that if a pointer is read using only volatile loads, we will never introduce other random loads (probably…?). It might be that a volatile wrapper should use UnsafeCell to inform the compiler that it can’t trust the data to be accessible from a shared ref.)
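A minimal sketch of what such a wrapper might look like (the name echoes the VolatileCell discussed elsewhere in this thread, but this is an illustrative sketch, not any crate's actual API):

```rust
use std::cell::UnsafeCell;
use std::ptr;

// Illustrative wrapper: the field is private, so the only way to touch
// the value from outside this module is through `get`/`set`, which
// always use volatile intrinsics. The UnsafeCell tells the compiler it
// cannot assume the contents are stable behind a shared reference.
#[repr(transparent)]
pub struct VolatileCell<T: Copy> {
    value: UnsafeCell<T>,
}

impl<T: Copy> VolatileCell<T> {
    pub const fn new(value: T) -> Self {
        VolatileCell { value: UnsafeCell::new(value) }
    }

    pub fn get(&self) -> T {
        // Safety: the pointer comes from a live UnsafeCell that we own.
        unsafe { ptr::read_volatile(self.value.get()) }
    }

    pub fn set(&self, value: T) {
        // Safety: as above; a volatile write is never elided or
        // reordered relative to other volatile accesses.
        unsafe { ptr::write_volatile(self.value.get(), value) }
    }
}
```

Whether the UnsafeCell is strictly required is exactly the open question in the aside above.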

Put another way, the validity of some unsafe code ought not to be determined by the results of lifetime inference, since mere mortals (including its authors) cannot always predict what it will do.

If mere mortals could predict what lifetime inference would do, then we could all write C safely. :stuck_out_tongue:

we are trying to enable intraprocedural optimization, and hence the fn boundary is the boundary beyond which we cannot analyze.

More generally, I think you are onto something, as your conclusions look obvious once I have read them, but I would not have thought of them in advance.

It leaves me hopeful that when all is said and done you will have a model even I can understand!


One takeaway from this meandering walk is that, if we want to make it easy to optimize Rust code aggressively, there is something special about the fn boundary.

Rust generally guarantees that a lifetime that is passed live as an argument to a function will outlive that function (there is the dropck weirdness, where you can pass a lifetime dead, but hopefully we can handle that somehow). I think that this guarantee is easy for people to understand, and I can't find a good reason to ignore it even in unsafe code.


@nikomatsakis that blog post would (also) be a good read. It would also probably help me decide if my arguments have a leg to stand on.

I agree, I believe the kind of code that we write is not a problem (at least not the problem you lay out in this post). Just confirming your argument that expecting the programmer to use their mental model to get this right is unreasonable.

I’m often the same person that writes the unsafe and safe code, and I still find it necessary to be able to treat most of the code as though the compiler won’t inline in ways I don’t expect. This, indeed, means being diligent about using constructs like VolatileCell inside the statically allocated structs such that volatility is enforced at the edges of a data structure.

(This, unfortunately, often means there are some loads that could be elided but aren’t, although I haven’t seen a case where it’s clear to me how even an ideal language could communicate that to the compiler.)

If volatile load intrinsics are not treated as loads for optimization purposes, references to volatile memory act the same as references to freed memory: in correct code the compiler will not be able to prove that a later load occurs, and things might blow up if the compiler inserts any loads. Either both should be allowed, or neither should.

The only caveat I can think of is that, with normal memory, anything which the compiler can prove (perhaps even with a dynamic check) is on the same page as something that gets read - depending on the platform’s minimum page size - must be safe to load as well, though whether the value of that load is useful depends on other parts of the memory model. For example, when compiling for x86 and optimizing for size, an 8-bit load into %al of an address which the compiler knows is 4-byte aligned could be changed to a slightly more succinct 32-bit load into %eax, if the compiler doesn’t care about the high parts (%ah and above). More commonly (but a bit less generally), multiple loads of parts of an aligned struct can be optimized into fewer, larger loads plus bit shifts.
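To make the struct case concrete, here is a sketch (the type and its layout are invented for illustration) of the load-combining transformation described above, written out by hand:

```rust
// Two adjacent u16 fields; align(4) makes the whole-word load legal.
#[repr(C, align(4))]
struct Pair {
    lo: u16,
    hi: u16,
}

// Straightforward version: two 16-bit loads.
fn load_parts(p: &Pair) -> (u16, u16) {
    (p.lo, p.hi)
}

// What an optimizer might emit instead: one 32-bit load, then byte
// extraction. Fine for normal memory, but observable behavior changes
// if `p` pointed at picky device registers.
fn load_combined(p: &Pair) -> (u16, u16) {
    // Safety: Pair is 4 bytes, 4-aligned, and fully initialized.
    let word = unsafe { *(p as *const Pair as *const u32) };
    let b = word.to_ne_bytes(); // bytes in memory order, endian-neutral
    (
        u16::from_ne_bytes([b[0], b[1]]),
        u16::from_ne_bytes([b[2], b[3]]),
    )
}
```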

With normal memory, this is fine even if the extra memory loaded is technically unallocated. But if someone creates a reference to a hardware register and dereferences it as normal memory, under the theory that extraneous loads to that particular register are harmless, then the compiler adding loads that affect other registers could cause unexpected behavior.

In practice, that’s a really dangerous assumption to make anyway, since prohibiting the compiler from splitting up loads, or combining loads where all of the covered memory actually belongs to one of the loads, is inconceivable, yet registers are often picky about access size and alignment. Or even if not, I think current optimizers already do the kinds of optimizations I said could be dangerous, so the assumption is bogus regardless. There’s no reason to want to allow this kind of thing; just use volatile intrinsics for all I/O memory. I only mention it because it distinguishes freed and truly volatile memory.


This makes some sense to me. It is partly why I said it seems like it would make sense to type "volatile" memory as being within an UnsafeCell and thus interact with it through *mut references. (That said, we often make (transient) references using the & operator that are immediately coerced to raw pointers, so we have to account for that.)

interesting nonetheless :slight_smile:

Could not agree more with your conclusion.

One more thing, related to the question of how much benefit can actually be derived from compiler optimizations depending on various aspects of the memory model.

Something about Rust that’s long bothered me is that you sometimes end up using ‘useless’ types like &i32 or &&T or even &Box<T> or &&[T]. Sometimes this happens because of generics, other times because of encapsulation (i.e. you don’t really have &Box<T> but &U, where U is an opaque struct whose private data currently consists of a Box). By useless I mean that unless (a) you care about pointer identity or (b) you’re using the kind of unsafe mutate-behind-the-compiler’s-back shenanigans alluded to in the blog post, there is no runtime benefit to using a pointer rather than passing a copy of the underlying data. So it would be nice if the compiler could automatically transform the former to the latter. Sometimes this can already happen as a side effect of inlining, but not always. Even with mutable references, in some cases it would be advantageous to transform f(&mut x) into x = f(x), keeping the value in registers.
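A small illustration of the kind of rewrite meant here (the functions are invented for the example): the by-reference version forces the value to have a memory address, while the by-value version can keep it entirely in registers.

```rust
// By-reference: the compiler must assume `*x` lives in memory and may
// be observed through the pointer.
fn double_ref(x: &mut u32) {
    *x *= 2;
}

// "De-pointerized" equivalent: f(&mut x) becomes x = f(x), so the
// value never needs a stack slot at all.
fn double_val(x: u32) -> u32 {
    x * 2
}
```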

In Niko’s example, being able to defer the load until later in the function would be nice, but eliminating the load altogether would be nicer. Indeed, in practice there would often be little or no actual benefit to doing the former, depending on how much other register pressure existed at various points. In other cases, such as if something later in the function did its own load of *v and the compiler could merge the two, there’d be more of a clear win, but then some potential memory model rules exist that would allow that but not the original transformation. Fully de-pointerizing v only works if the reference is forbidden to change value behind the compiler’s back at any point during its existence - at least during its actual runtime lifetime - even long before any code actually loads from it. (It can’t work at all for cells, but the compiler already special-cases those.) And I think it’s at least possible that it could provide a significant practical win.

But wait, what about pointer identity? If you have the value, you could always make a new temporary pointer if required, but you can’t get the original pointer. I once thought perhaps references shouldn’t have pointer identity guarantees for this reason… but that ship has sailed; those guarantees are unambiguously provided by stable Rust. And there are other things that don’t work without the original pointer, like a fn foo(t: &T) -> usize that’s secretly implemented as { t as *const T as usize } - can’t use a new temporary if it escapes.

Well, one possibility is providing both the pointer and the value, which is expensive but might still be advantageous in some cases. But more useful is basing it on interprocedural optimization. All you need is for the compiler to calculate a flag per reference function argument that means “I never cast this to a raw pointer, pass it along to another function with the flag, or do anything else unusual, so just give me the value”.

Significantly, this can even work across crates due to Rust’s compilation model: the flag could be stored in crate metadata. C can’t even do this across source files in a single project without LTO, and it fundamentally can’t do it across dynamically linked libraries because it’s expected that the library implementation can change without recompiling clients. In the future Rust might want to make similar guarantees about a stable ABI, but there are so many performance caveats w.r.t. inlining and generics that I expect stable-ABI items or crates will be explicitly marked somehow, preserving the ability to optimize in the common case.

By the way, I was wondering whether similar flags could be used to replace memory model assumptions: to accept a need for interprocedural optimization (yuck, I know, but UB is also yuck), but make it relatively ubiquitous. But I think this doesn’t really work (even without de-pointerization), since with a conservative memory model it’s hard to predict what kinds of operations could be dangerous in the first place. Also, it’s probably too brittle for core optimizations: trait objects and function pointers break all such static reasoning, for example.


Regarding the idea of

One thing that I find intuitively appealing is the idea of trusting Rust types everywhere.

the example concludes with

If this seems like a bad idea, it is. The idea that writing unsafe Rust code might be even more subtle than writing C seems like a non-starter to me. =)

I'm not entirely convinced it's more subtle than C. My impression is that one of the difficulties with C is that the mental model people use has varied over time as compiler writers have teased optimisations out of implications of the spec (strict aliasing being a popular example). By contrast, we can't say how subtle the rules in Rust are until those rules are written down! Rust also has the advantage that any rules can only be violated in unsafe blocks, as opposed to everywhere in C.

As an experiment, let's say the rules (very informally) for an immutable reference are "for the lifetime of the reference it is a) valid to dereference and observe the value and b) not permitted for the value to be mutated", and for a mutable reference are "while the reference is active (not reborrowed) (a subset of its lifetime) it is a) valid to dereference and observe and/or mutate the value via this reference and b) not permitted for the value to be observed or mutated by any other means" (so a & and &mut pointing to the same place with overlapping lifetimes would be disallowed, because it's not valid to dereference and observe the & during the overlap).

Given this, the alloc_free example given in the blog post isn't unexpected - the user has materialised a type that we are expecting to be able to trust, but it's easy to see that a lexical lifetime would cover a region of invalid dereferencing, violating the rules.

Going back to the tootsie pop blog post, the "split-at-mut-via-duplication" example is also easy to identify as invalid under these rules - it materialises an active &mut that, if dereferenced, would be observing or mutating a value observable from another &mut. Instead, you would need to make the argument &mut inactive first (through reborrowing just the first half of the slice) and then materialise a &mut to the second half of the slice.
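For contrast, a sketch of the raw-pointer approach (essentially what the standard library's split_at_mut does): the incoming &mut is deactivated by going through a raw pointer, so neither resulting half is derived from a still-active &mut covering the other half.

```rust
use std::slice;

// Split a mutable slice into two disjoint mutable halves. The `unsafe`
// block is sound because the two ranges never overlap and the original
// `&mut` is no longer used once we have the raw pointer.
fn split_at_mut(v: &mut [u32], mid: usize) -> (&mut [u32], &mut [u32]) {
    let len = v.len();
    assert!(mid <= len);
    let ptr = v.as_mut_ptr();
    unsafe {
        (
            slice::from_raw_parts_mut(ptr, mid),
            slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}
```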

The "refcell-ref" is a little trickier (though it's a slightly more tricky example anyway) because it needs the realisation that, by imposing additional rules that determine whether the reference is valid to use, this is no longer a reference that Rust can trust - it needs to be a pointer, since you can't currently model the behaviour in the Rust ownership system.

Overall, I'd say that these informal rules (assuming they were formalised in some way) can be applied to code without being unreasonably subtle. Instead of 'subtlety', I'd say that the downside of this model is that it's probably not really "human friendly" (a term used in the tootsie pop blog post) - I've taken three examples that @nikomatsakis says 'should probably be ok' and rejected all three. Not a promising start! There's also the curious interaction that non-lexical lifetimes could actually make these rules harder to reason about.

However, perhaps there's value in 'baseline' models like this as a first step in releasing unsafe guidelines that can be relaxed later? If I were told that following these rules would result in unsafe code that would never be optimised incorrectly, I'd go and make sure all my unsafe code complied. And perhaps 'baseline' models would narrow down the model search space, though there may be a risk of painting Rust into a corner that way.

[W]e are trying to enable intraprocedural optimization [...] without doing any interprocedural analysis.

The above is two quotes from the blog post, quoted out of order and combined. I think reducing the idea to this shows that the premise may (not is, but may) be flawed. Having read everything that's been posted on this topic so far, I don't understand why it is so important to avoid interprocedural analysis when doing interprocedural optimizations.

Here are some clarifying questions:

  • Do C or C++ compilers do these kinds of optimizations without interprocedural analysis? That is, is this an issue of parity with existing high-performance compilers, or is this an attempt to leapfrog them by exploiting specific properties of Rust? Is there an implicit goal that these optimizations can be implemented prior to inlining and in particular solely in MIR, instead of in LLVM? (It seems obvious to me that such optimizations should be done primarily after inlining, but maybe there is some benefit to also doing some before inlining.)

  • Let's say that avoiding interprocedural analysis is really a goal. Does giving special meaning to function boundaries actually achieve this? In fact, it seems to me that it instead just moves the interprocedural analysis step from the compiler to the programmer. In particular, consider the "extract function" refactoring and its inverse. In order to implement such refactorings when the function boundary is treated specially as hinted in the blog post, the refactoring engine (which is probably a human) would need to do interprocedural analysis to ensure the refactoring is safe, right? And, in fact, when function boundaries are treated specially, it may not be possible to implement such refactorings safely at all, right?

  • When a function is inlined, presumably LLVM is doing load/store optimization using interprocedural analysis naturally. When a function is NOT inlined, how does the function call overhead (including pushing parameters onto the stack and whatnot) compare to the savings from load/store optimization? Is it really common to have cases where a non-inline function gets called and where such load/store optimization makes a significant difference?

  • Are the benefits of such optimizations so significant that it is worth making it very easy (it seems too easy from my understanding of the proposals) to write bad code? The Tootsie-Pop model seems quite analogous to how C violated the principle of least astonishment with its undefined behavior stuff. I would expect that, if such a model were implemented, compiling Rust code with a -fno-tootsie-pop flag would become the stated best practice, much like -fno-strict-aliasing is recommended today.

One takeaway from this meandering walk is that, if we want to make it easy to optimize Rust code aggressively, there is something special about the fn boundary.

  • I assume that when you say "optimize Rust code aggressively" you are talking specifically about eliminating or moving loads and stores. It isn't clear, though, how much such optimizations matter. Could high-priority optimizations like eliminating unnecessary bounds checks, unnecessary copies during moves, unnecessary zeroing be implemented without doing this?

the fn boundary is the boundary beyond which we cannot analyze – within the fn body we can see more.

This isn't strictly true, at least because of inlining, which enables the function body to be analyzed in context. And, even outside of inlining, many of these optimizations are done using full interprocedural analysis when link-time-optimization is enabled, right?


I think it’s quite important to consider that the memory model not only impacts what transformations can a compiler do, but also what transformations can a programmer do. Rust is often advertised as a language in which a junior programmer can code and not break the whole project. Let’s keep it that way.

That’s why I think that “trusting types at the function boundary” is a good idea. Consider a programmer trying to refactor a long no-unsafe function such as patsy. And imagine that this refactor results, among other things, in moving an access of the reference from before to after the sketchy collaborator function. I just think it would be ridiculous to forbid that, and even more ridiculous to make it legal or illegal based on the mere existence of the unsafe keyword in the same module. (Note that the case of the unsafe keyword contaminating the vec module (discussed in the previous blog post about the TPM) is a little bit different, as safe modification of the len variable affects unsafe blocks; in this case, the presence of unsafe affects safe functions.) There could be a middle ground of unsafe contaminating only a struct and its methods, but this seems too complicated and ad-hoc to me.

We have a precedent for treating the function boundary specially – it needs to be explicit and acts as a type inference boundary. I don’t see why we shouldn’t make it special for the memory model too. (There’s also the one nasty thing called unsafe fn; I’m not sure whether to trust the types of those functions’ signatures.)

Speaking of interprocedural optimizations without interprocedural analysis – we could enable (in the memory model) only such transformations that a programmer could also make. Consider the argpromotion (passing references as values) optimization that @comex talks about – this definitely requires storing some flags per function, information that a programmer looking at the function alone can’t know. So this is a compiler optimization. It doesn’t need changes in the memory model, and it can be done in a simple way using some interprocedural analysis (it’s even done today for private functions). So we can set the threshold for things allowed by the memory model at things that enable ‘reasonable programmer’s transformations’ or ‘beneficial compiler optimizations which are not possible with simple interprocedural analysis’. Apologies if this paragraph is not written clearly, but I hope it conveys the idea.

Returning to the examples from the blog post, I think that the common source of possible optimisation unsafety there is the &*raw_ptr pattern, which can create &'static references out of thin air. Maybe we could treat the lifetime of these references as 'unsafe until they cross a safe function’s boundary? The &'unsafe references would be treated by an optimizer as raw pointers. (I’m not proposing to add an 'unsafe lifetime to the language (I think it was proposed, though); the notation is just to mark a reference whose lifetime the compiler knows nothing about.)
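To make the pattern concrete, here is a sketch (the function is invented for the example) of how &*raw_ptr lets a 'static reference appear out of thin air:

```rust
// Nothing in the body constrains the inferred lifetime of `&*p`, so
// lifetime inference happily picks 'static. An optimizer that trusted
// that lifetime would be misled; treating such a reference like a raw
// pointer ('unsafe, in the notation above) avoids that.
fn launder(p: *const u32) -> &'static u32 {
    unsafe { &*p }
}
```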


I like this idea.

By the way, to quickly repeat something that's been pointed out in the past: UB isn't nearly as scary if it can be tested for. If the rule is to trust types, rustc could have a special unsafe-checking mode that inserted a read at the beginning and end of every reference variable's life and checked for equality; combine with scribbling over dead variables, malloc scribbling, and maybe valgrind to help with corner cases, and it should be able to catch most errors. (This won't work if volatile values are allowed to live in ordinary references, incidentally.)


This would cause different behaviour if you took some safe code from an unsafe-tainted function and split it out into a different function - no unsafe code has been touched, but you've been affected by it. I'd argue that a more natural boundary for where you can trust rust types is the safe<->unsafe boundary itself, constraining both rule-breaking and the results of rule breaking in the same way. But this is just "Tweaking the concept of a boundary", as it's referred to in the TPM post.

However, just to explicitly point it out, trusting types at either the function or safe<->unsafe boundary means that the "refcell-ref" example in the TPM post is invalid and the & in Ref must be a pointer.

Regardless of the rules, I suspect something like this is necessary - no matter how forgiving Rust is (short of any unsafe code disabling a bunch of optimisations everywhere), some people are going to run afoul of the rules. Perhaps Rust can differentiate itself from C/C++ by having both a 'good-enough' model that people can rely on and really solid tools to help you out (like the Go race detector - if it complains, you know you have a problem).


Lots of reasons. First off, you don't always have all the code available. If you call code from another crate, or through a trait object, I'd like to be able to optimize well without having to know just what code you are calling.

In the general case, no. But there are ways to enable it. For example, C99 added the restrict keyword precisely for this purpose.

Both. Consider the case of Fortran vs C. Fortran is generally held to be the faster choice for numerical computing, precisely because of its strong aliasing guarantees:

[Fortran and C] have similar feature-set. The performance difference comes from the fact that Fortran says aliasing is not allowed. Any code that has aliasing is not valid Fortran but it is up to the programmer and not the compiler to detect these errors.

To a large extent what I've been advocating for the TPM is that we optimize unsafe code like C compilers, but we do better with safe Rust code. But the pushback is all about doing better than C compilers with unsafe code. Which I am definitely sympathetic to! =)

To be clear, this would only be in unsafe code, but yes -- this is one thing I am worried about!

Certainly every model has its hazards. In the case of the TPM, the hazard is moving code that used to live inside an unsafe boundary outside of an unsafe boundary. In other models, the hazard might be outlining code into a separate function. I agree it's important to minimize these hazards -- ideally not at the cost of making things complicated everywhere. =)

I am intrigued by this idea, but I can't say I understand it. The goal of the Tootsie Pop model, indeed, is basically to emulate the effect of -fno-strict-aliasing. I'm not even sure what "opting out" of it would mean -- forgoing optimizations on safe code?

Actually, nothing in my post talked about eliminating loads or stores. That, indeed, is somewhat easier -- if you have two loads, you have some evidence that the pointer is valid between them. What I talked about was code reordering -- specifically, moving a load without evidence of another load to come. And I'm not sure how important it is: maybe it's not. Mostly I'm currently in a mode of trying to push on the boundaries of what the safe type system ought to allow: I focused on this test case because it's one where, if you only look at the actual loads, you wouldn't be able to do the reordering.

I'm not sure where your list of "high priority" optimizations came from. I don't know that this particular question (can we move a load later past unknown code) will affect the answers much. But I promise you that, whatever the case, the less we have to know about the surrounding code, the better. =)

As an aside, from conversations I've had with people working on the C specification and the like, there is very little data on which instances of UB help the most with optimizing and how. I'd love to see citations and references on this!

I am wrestling with two things a bit.

First, the Rust type system is ultimately a static approximation over some more fundamental concepts. You can see this in the discussion about non-lexical lifetimes -- if we were to change lifetimes from lexical to non-lexical, a lot of code would now be accepted. And we claim it would still be safe. So clearly there is a more fundamental underlying concept.

To some extent this is what people are referring to when they talk about UB. UB defines the contours of execution traces that are acceptable. The Rust type system guarantees you stay within those contours, but there is a large space of behavior that it can't accept which would also be valid. So we want unsafe code to be able to explore that area with as few rules and complexity as possible. This was sort of the underlying idea behind the TPM.

But, it's not that unsafe code is unrelated to the Rust type system. For one thing, a lot of unsafe code relies on other code following the Rust type system in order to be safe. Consider something like this:

fn foo<'a>(&'a mut self) -> &'a u32 { unsafe { ... } }

Here we can rely on the fact that we have exclusive access to *self for the entirety of 'a. But what does that really mean? Currently, it means "so long as the return value is in scope" -- if we switch to NLL, it means "so long as the return value is in use". Note that there is a difference there, though I know of no way to write unsafe code that exploits that difference.
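The scope-vs-use difference shows up even in safe code. A sketch with an invented Wrapper type (under NLL the final write compiles, because the borrow ends at the last use of the returned reference rather than at the end of its lexical scope):

```rust
struct Wrapper {
    value: u32,
}

impl Wrapper {
    // Like `foo` above: the returned reference borrows *self for 'a.
    fn get<'a>(&'a mut self) -> &'a u32 {
        &self.value
    }
}

fn main() {
    let mut w = Wrapper { value: 1 };
    let r = w.get();
    let snapshot = *r; // last use of `r`: under NLL the borrow ends here
    w.value = 2;       // OK under NLL; rejected under lexical lifetimes
    assert_eq!(snapshot, 1);
    assert_eq!(w.value, 2);
}
```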

But in any case we are relying on our callers to obey the Rust type system's narrower contours so that we can safely explore this area that lies between. This is precisely why people say that "unsafe code doesn't compose well". And of course given that we are using unsafe code to define some basic properties of the Rust language -- e.g., unwinding, threading etc -- it's essential for us to set some broader limits on what unsafe code can do that go beyond the core type system.

(For example, I think that longjmp is just fundamentally incompatible with Rust code, since it can cause a local variable's destructor to be skipped; even though we say that values may be leaked, we still intend for people to be able to rely on RAII if they own the value in question.)

Anyway, add to all this the fact that I think unsafe code authors have enough to think about without burdening them with thinking about the lifetimes of transient references that they create and use. This feels perilously close to C's TBAA rules. I feel like I'm pretty "in the know" when it comes to aliasing and so forth, but whenever I write low-level code in C I definitely get an itchy, nervous feeling trying to decide if I am properly conforming with TBAA rules. I don't think Rust's unsafe rules should feel the same.


Yes, so this is the direction I've been going in a bit (continuing the train of thought from that post). Basically trying to express the "type system guarantees" a bit more concretely -- the following properties must hold, at these points.

Another way to think of it would be a kind of "operational semantics" that includes safety checking. e.g., imagine every bit of memory has a version number. When I borrow &foo, I get back a very fat reference that also has the version numbers for all the owned memory (this is just a mathematical construct, obviously, since that could be an arbitrarily large amount of version numbers). Then whenever I load I can check that they still are valid. Or something like that. (Extending this to threading is kind of ... fun.)
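A toy sketch of that construct (purely illustrative; real memory would need a version per byte, and threading is the "fun" part left out):

```rust
// Each cell carries a version; a borrow snapshots it; every load
// through the borrow checks that the memory has not been mutated since.
struct VersionedCell {
    value: u32,
    version: u64,
}

struct BorrowToken {
    version: u64,
}

impl VersionedCell {
    fn new(value: u32) -> Self {
        VersionedCell { value, version: 0 }
    }

    fn borrow(&self) -> BorrowToken {
        BorrowToken { version: self.version }
    }

    fn load(&self, token: &BorrowToken) -> u32 {
        // The safety check: any mutation since the borrow is an error.
        assert_eq!(self.version, token.version, "mutated during borrow");
        self.value
    }

    fn store(&mut self, value: u32) {
        self.value = value;
        self.version += 1;
    }
}
```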

Not sure if this will lead anywhere productive yet. =)

Yes, I like this idea. I've often thought that we can try to "bubble up" optional information this way. Though I'd prefer to get what we can without needing it. (And -- yes, it'd be nice to replace &u32 with just u32.)

I am intrigued by this. (Edit: I guess this concern is partly what @briansmith was getting at?) Imagine that the declaration were different. For example, imagine that one tagged the module as containing unsafe code (the module name below is made up; #[unsafe_scope] is the attribute being imagined):

#[unsafe_scope]
mod my_module {
    // unsafe blocks in here would be scoped to this module
}

In that case, the "scope" of an unsafe block could be the innermost enclosing #[unsafe_scope]. If there is none, then it is the entire crate.

This would mean you could confine the scope of an unsafe block to something within the function, to (maybe?) address the unchecked get use case:

fn do_stuff(x: &[u32]) {
  let mut sum = 0;
  for i in 0 .. x.len() {
    unsafe {
      sum += *x.unchecked_get(i);
    }
  }
}
I'm not really saying I like this idea, just "tiling the space" a bit.

I have to admit I am still eager to find a way to trust types more, even in unsafe code. I like being able to use Rust reference types to express sharing and uniqueness, since I think that is a super common thing to want to talk about. I just want to find the right balance that gives us a lot of room to optimize without limiting us or making the rules too complex.