Stacked Borrows: An Aliasing Model For Rust


#21

How does the model cover conditionality?

fn example(x: &mut i32, y: &mut i32, flag: bool) {
    let mut ptr = x as *mut i32;
    if flag { ptr = y as *mut i32; }
    // ...
    if flag {
        *x = 1;
        unsafe { *ptr = 2; }
    }
}

#22

@jonh This is a dynamic model, it just looks at whatever code is executed. So assuming flag did not change, that code is fine.


#23

I just spent quite a while disagreeing with you and attempting to come up with a counterexample, and upon repeated failures I’m now convinced you’re right that the models are basically the same. So here is my proof sketch that, in fact, you are correct (at least with the interpretation of the partial order that I was mentally using). I think this helps clarify what the different points in the design space are, by looking at where the proof sketch might fall apart.

First of all, note that since the total order is compatible with the partial order, anything accepted by the partial order is accepted by the total order, so it suffices to show that anything accepted by the total order is also accepted by the partial order. The only place where timestamp ordering is relevant is in checking the validity of shared references.

In a case accepted by a total order, there are three relevant timestamps: t_freeze, when the location was frozen; t_create, when the shared reference was created; and t_use, when the value is read from under the shared reference. For this to be accepted, t_freeze <= t_create in the total order, and no writes to the location are scheduled between t_freeze and t_use. In addition, because of the data dependency, t_create must happen-before t_use. (N.B. This may not hold in the presence of weak memory, since the reference could have been read out of a promise.)

Then, in the partially ordered setting, at the moment of t_create, the creation of the reference would trigger freezing the already-frozen memory location. In my mental model the freeze time was actually an antichain frontier, so this would add t_create to the freeze time. Then, upon t_use, there have been no writes, so the location is still frozen, and the ordering confirms that t_create (in the freeze time antichain) happens-before t_use, allowing the read. Q.E.D.ish.
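To make the three timestamps concrete, here is a single-threaded sketch (the variable names are mine, and in the single-threaded case the two orders trivially agree):

fn timestamps() {
    let mut x = 0;
    let raw = &mut x as *mut i32;
    let s1: &i32 = unsafe { &*raw }; // t_freeze: the location becomes frozen here
    let s2: &i32 = &*s1;             // t_create: a shared reference is created
    let _use = *s2;                  // t_use: the value is read under that reference
    // Accepted because t_freeze <= t_create and no write to x happens
    // between t_freeze and t_use.
}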

There are 2 big ways that I can see this proof failing:

  1. If t_create does not happen before t_use. This may be totally possible with weak memory orders. Essentially, the question is whether the following should be allowed:
    unsafe fn thread1(x: &Atomic<&'static u32>, ptr: *mut u32) {
        x.store(&*ptr, Ordering::Relaxed);
    }
    fn thread2(x: &Atomic<&'static u32>) {
        let value = *x.load(Ordering::Relaxed);
        println!("{}", value);
    }
    
    Note that the retagging at the return of load has the same effect of requiring validity as the actual read.
  2. If the freeze time is modeled as a single time instead of some frontier. This could make sense, but rules out
    unsafe fn thread1(ptr: *mut u32) {
        println!("{}", *(&*ptr));
    }
    unsafe fn thread2(ptr: *mut u32) {
        println!("{}", *(&*ptr));
    }
    
    as only the first freeze would “win” and the second reference would be invalid.

In both those cases, I’d actually like for the code to be accepted, so… color me convinced that, to the extent that the models are different from each other, the simple operational total order is better. Anyway, as we both said, this is not a question that is super relevant right now.

Yeah, my instinct is that this will be way too stringent. Especially since we say that &mut and *mut have the same layout, people will expect at least round-trip conversion to be valid, and as you point out, we can’t even detect transmutes in general, which makes it nigh impossible to make Miri check for this condition.
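Something like this is the kind of round trip I mean (just a sketch; the point is that the intermediate transmute is exactly the thing we cannot detect):

fn round_trip(x: &mut i32) -> &mut i32 {
    // Relies only on &mut i32 and *mut i32 having the same layout.
    let raw: *mut i32 = unsafe { std::mem::transmute(x) };
    unsafe { std::mem::transmute::<*mut i32, &mut i32>(raw) }
}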

I at least am relatively happy with this as a compromise position (and it’s one of the things I proposed way back in the Types as Contracts thread). The lack of Raw on the stack just means that the new pointer is not dereferenceable, but it still allows comparison, hashing, etc. It also allows for a two-piece desugaring of the as *mut _ cast: a side-effecting “broadcast” operation followed by a transmute, which I think is a reasonable model to use. The actual transmute can then be subject to, e.g., DCE, and the broadcast can be handled separately and will compile to no IR.

That said, I can definitely see this being another confusing way to shoot yourself in the foot.

Oh, sure. I just wanted to see if we could extend the model to make it legal. But since I finished writing that idea up and have been thinking about it more, I’m getting less and less convinced that it is useful, at least for now.

Ooh, FFI is an interesting aspect of this topic that I hadn’t thought about. If the Rust code casts to a raw pointer and passes it to C, that certainly would be fine, but I assume you’re talking about an extern fn that directly takes &Cell<i32> as an argument? I’m not sure how this would cause more problems than just passing a reference directly to FFI. If I declare

extern "C" {
    fn foo(x: &mut i32);
}

then the C code will receive a reference, but will interpret it (which seems like a transmute to me) as a raw pointer. Then it will modify the memory, which would be invalid if there isn’t some implicit “all pointers immediately push Raw if crossing the FFI boundary.” This doesn’t seem to be a problem unique to Cell.
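For contrast, here is the version that seems clearly fine, where the cast to a raw pointer happens on the Rust side before the pointer crosses the boundary (bump is a made-up C function that writes through its argument):

extern "C" {
    fn bump(p: *mut i32);
}

fn call_bump(x: &mut i32) {
    let p = x as *mut i32; // pushes Raw before handing the pointer to C
    unsafe { bump(p) };
}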

Honestly, while I do like making everything reorderable (and this particular case seems simple enough that I could totally see wanting a Rust pass to do something like this), my biggest motivation for wanting separate tags and timestamps is that it feels mentally simpler to me. Technically more complicated, but intuitively simpler.

That seems mostly reasonable, but I can see a few places where transmutes might throw a wrench in things. What if someone transmutes a &mut into a shared reference?

Actually, that’s an interesting question independent of the problem of timestamp tag ordering. I could see an argument for it just being UB, but it seems so innocuous, and more convincingly, I’d like to be able to transmute from &&mut T to &&T (mostly because I want more useful examples like transmuting &[&mut T] into &[&T]). It’s not clear what the right behaviour is. My instinct is that it interacts with retagging to make everything work out - when you actually load/receive the transmuted reference, it gets retagged. This activates the Uniq element on the stack (because despite the type, you actually got a mutable reference) and then tops off the stack and freezes it. But maybe that is too delayed.
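Concretely, the kind of transmute I mean is the following (with u32 standing in for T; whether this is allowed is exactly the question):

fn freeze_view<'a>(xs: &'a [&'a mut u32]) -> &'a [&'a u32] {
    // Both slice types have the same layout; the question is what tags the
    // inner references carry afterwards, and when retagging kicks in.
    unsafe { std::mem::transmute(xs) }
}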

:grin:

Not in general, certainly, but if you have &mut/noalias dereferenceable, it seems like that should lock things down enough for it to be valid (EDIT: I just noticed #53105, which is exactly this). Those attributes assert exactly that there is no other code that can observe the location (and that you won’t trap by trying to write). Obviously you need to only write back the value that was already there, but I don’t see a problem with that.

That said, upon experimentation, it looks like LLVM still won’t do an unconditional write, even in that highly restricted situation.
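For reference, the transformation I mean is roughly this (function names are made up):

fn maybe_set(x: &mut i32, cond: bool) {
    if cond { *x = 1; }
}

// With &mut/noalias dereferenceable, the store could in principle be made
// unconditional by writing back the old value on the other branch:
fn maybe_set_speculative(x: &mut i32, cond: bool) {
    *x = if cond { 1 } else { *x };
}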

This mostly feels right to me. However, it occurs to me that it would be possible to support the intuition that when you explicitly drop a mutable reference, you are giving up the promise of exclusive access. There would be a “drop glue” (in quotes because it is purely at the instrumented VM level) for &mut that pops off the Uniq and any barrier that came with it.
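A sketch of what that could permit (names are made up; ptr is assumed to alias x at the call site):

fn write_both(x: &mut i32, ptr: *mut i32) {
    drop(x); // the hypothetical "drop glue" pops x's Uniq and its barrier...
    unsafe { *ptr = 1; } // ...so this write through the alias would be fine
}

fn caller() {
    let mut v = 0;
    let ptr = &mut v as *mut i32;
    write_both(unsafe { &mut *ptr }, ptr);
}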

The main case I can see for justifying them is if you have many references (e.g. in a data structure) and you want to convert all of them in one go. Iterating through and casting each one is expensive and possibly impossible, since we don’t really have a way to change the type of a data structure partway through (that would require true linear types to make it work).

This (generalized) was the motivation behind my RFCs #91 and #223, for example.


#24

Could you maybe write a small Rust-only test case demonstrating what you think should be possible?

Here’s a short example that just creates a cyclic linked list, and a longer one that actually deserializes a graph.


#25

Hmm, that makes sense.

Some other notes / nitpicks:

Dereferenceable

Given a function argument of type &T, I think your model already requires that the argument is a valid pointer (aligned, not trapping, not null), even if the function body never dereferences it: otherwise, the retagging and barrier operations wouldn’t make sense. Thus, rustc is justified in applying LLVM dereferenceable, as it currently does. However, it might be worth stating this more explicitly.

On the other hand, given a function argument of type &&T, if the function body never dereferences it, the inner pointer could be invalid. So if LLVM hypothetically offered a “dereferenceable N times” attribute, rustc couldn’t use it.

De-reference-ification optimization

Another hypothetical optimization I’ve talked about in the past is: among function arguments that are immutable references to word-sized values (e.g. &usize, &&T), automatically detect cases where the function implementation doesn’t care about pointer identity for that argument, and transform the ABI-level signature to take the value directly (usize, &T). Of course, it isn’t always possible to statically determine whether the function cares about identity. But in many cases it is possible; in particular, if the function body passes the reference as an argument to another function, rustc can propagate the “doesn’t care” flag from the callee to the caller.
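As a sketch, the rewrite would turn something like the first function into the second at the ABI level (both names are made up):

fn takes_ref(x: &usize) -> usize {
    *x + 1 // never looks at the address, only the value
}

fn takes_value(x: usize) -> usize {
    x + 1
}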

If &&T can point to an invalid pointer, there would be some restrictions on that optimization:

  • It would still be okay to do the transformation to pass &T directly, but the new argument couldn’t be marked dereferenceable.
  • If T is itself word-sized, it would not be okay to do the transformation twice and pass T by value.
  • On the other hand, the compiler could assume validity if it additionally detected that the argument is unconditionally dereferenced (again by the function body or its callees). But that isn’t always the case; e.g. if the function has an early exit (perhaps using the ? operator) before it happens to touch that argument, it’s not an unconditional dereference.
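For the last point, a minimal made-up example where the dereference is not unconditional:

fn maybe_read(flag: Option<()>, x: &&u32) -> Option<u32> {
    flag?;      // early exit that never touches x
    Some(**x)   // the inner pointer is only dereferenced on this path
}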

Probably not the end of the world.

Unsized types

You mentioned "mem::size_of<T>() many bytes"; I guess that for unsized T, that should be mem::size_of_val(t) many bytes? …I can’t actually think of any interesting or problematic subtleties related to that, but I’m mentioning it anyway just for the record.

Reinterpreting memory

Unfortunately… I agree with gereeter that transmuting between pointers and references, or transmuting &&mut T to &&T as they suggested, or doing the equivalent of a transmute by copying bytes, etc. should be legal. I suspect there’s a fair amount of existing code that uses such transmutes, and it would be surprising (in the “surprise! UB” sense :wink: ) if they didn’t work – especially considering that they’re legal in C.

For this to be accommodated, I guess it would have to be possible for arbitrary values to carry tags, not just reference-typed ones – including raw pointers, usize, and even u8. I suppose this contradicts your desire to make non-reference types as “non-magical” as possible. But I think the alternative is worse.

Local variables

Given:

fn foo() -> i32 {
    let mut x = 2;
    let _ = &mut x as *mut i32; // does not escape
    bar();
    x
}

Can the compiler optimize this to always return 2?

If x were immutable, that optimization would arguably be justified in your model, under the assumption that an immutable local variable acts like an immutable reference. But not if x is mutable.

In some sense this is logical. If the raw pointer were passed to bar() rather than thrown away, the optimization would clearly be invalid; but you want raw pointers to be non-magical and untracked, so the fact that it’s thrown away shouldn’t make a difference and it should be invalid here as well.

However, from a compiler perspective, I think that would be quite surprising. The raw pointer cast seems like dead code, but removing it would change semantics. And LLVM is definitely going to want to remove it, so you’d have to use some special mechanism to trick LLVM into thinking the pointer value escaped.

I think it would be better to make this optimization valid by introducing an additional rule, something about values that are completely non-escaping.

Such a rule would also allow something like

fn no_op(p: &mut i32) -> &mut i32 {
    unsafe { &mut *(p as *mut i32) }
}

to be optimized into a true no-op.


#26

This really is an interesting example, and I think it does in some sense “feel” like it, or something like it, ought to be allowed. I think it is generally fine to say that your specific code is UB (to my mind at least, it pretty flagrantly mutates something that has had a shared reference taken, which is invalid) as long as there is a clear guideline for what should be done instead. Specifically, regardless of how ugly the resultant code is, it should definitely be possible to create a value of that specific Node type (no changes like requiring it to be more generic or giving it repr(C)) using unsafe code that is cyclic in the way you describe.

This is the best I was able to come up with. As long as UnfinishedNode and Node are layout-compatible (which is a guarantee I think we can choose to provide, though we don’t right now), this should do the same thing as your code without undefined behaviour.

I think I would be in favor of a strong enough guarantee about struct layout to make this sort of code work (possibly with some extra annotations). I don’t think this would be at all onerous, though it might get in the way of struct layout randomization; I don’t actually think it will though. As a specific strawman proposal designed to be extra conservative:

Two monomorphised types S and T are layout-compatible if any of the following hold.

  • S = T
  • S is a repr(transparent) wrapper around some S2, and S2 and T are layout-compatible.
  • T is a repr(transparent) wrapper around some T2, and T2 and S are layout-compatible.
  • S and T are both built-in pointer types (&mut, &, *mut, *const) and the pointer payloads are either
    • Both Sized,
    • Both slices,
    • The same trait
  • S and T are both structs with the same number of fields and, when paired up in declaration order, every pair of fields
    • Contain layout-compatible types
    • Have the same name (this is a bit extreme, but I wanted to be conservative)
  • S and T are compatible according to repr(C) or other specific representation rules (e.g. packed, repr on enums, etc.)

If we made those guarantees, I think the cyclic data-structure use case is not too heinous to deal with.
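For instance, under the strawman rules above, a pair like the following would count as layout-compatible (this is just an illustration, not the code linked earlier): the field names match, u32 matches itself, and the pointer fields are both built-in pointers to Sized payloads.

struct Node<'a> {
    next: &'a Node<'a>,
    value: u32,
}

struct UnfinishedNode<'a> {
    next: *const UnfinishedNode<'a>,
    value: u32,
}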

The primary difficulty is that it is really hard to maintain data dependency chains when you don’t track things the way they were at the source level. Yes, it is painful to not be able to remove the dead cast, but that will be easily cleaned up later, and without it, you have to be sure not to break a different part of the chain. The traditional (well, I mean the one I’ve seen in multiple places; this isn’t really old enough to have tradition) example is

fn foo() -> i32 {
    let mut x = 2;
    let ptr = &mut x as *mut i32 as usize; // does not escape
    bar(ptr - ptr);
    x
}

Now bar in some sense has been passed the pointer, and unless you want to deal with some complicated questions of what information content certain values leak about the pointer, you need to let bar access x. But then you can’t rewrite ptr - ptr to 0. Complicating pointer-to-integer casts is annoying, but it seems to me like the least bad option.

See also the sort-of-tangent to the Types as Contracts discussion which I think did a great job of demonstrating the tradeoffs.

Additionally, I don’t think it would cause any confusion past the AST, as I’m imagining that the lowering from AST to MIR would lower a reference-to-pointer cast as two statements in the MIR: a (brand-new) PushRaw operation and then a transmute to do the actual cast. Then, the compiler would be free to remove the cast, just not the PushRaw which has side effects. The PushRaw in some sense acts like a compiler barrier - it prevents certain movement of operations, but when MIR gets lowered to LLVM, it would just disappear.
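In other words, something like this (pseudo-MIR; PushRaw is the brand-new statement, so the syntax is made up):

// surface Rust
let p = r as *mut i32;

// proposed MIR lowering
PushRaw(r);                              // side effect only: pushes Raw
p = transmute::<&mut i32, *mut i32>(r);  // pure value conversion, freely removable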

This is just a MIR-level model, so it is fine if LLVM removes the cast - we just can’t remove it (or, well, the push of Raw, which is a bit different) before lowering to LLVM. We do need to be careful to compile to valid LLVM, but as long as MIR undefined behaviour is a superset of LLVM undefined behaviour, which it should be since we throw away a bunch of extra information, then LLVM will generate valid code.


#27

Indeed, but as specified it is not a superset. LLVM will assume that alloca’ed variables whose addresses do not escape are not mutated externally; in the example I gave, it actually does optimize the function to always return 2. That’s why I said it would be necessary to trick LLVM into thinking the address did escape.


#28

As long as UnfinishedNode and Node are layout-compatible … I think I would be in favor of a strong enough guarantee about struct layout to make this sort of code work (possibly with some extra annotations). I don’t think this would be at all onerous

I tried that approach too, in the real problem this example was derived from. But it turned out to be extremely burdensome at the source code level, because in fact Node is not a simple type at all; it’s a collection of fairly complex types. (I’m not actually deserializing a graph; I’m making immutable, thread-safe copies of the heap in a scripting language implementation.) But it is possible.

to my mind at least, it pretty flagrantly mutates something that has had a shared reference taken, which is invalid

I’m not sure that’s true (in the sense that I think there’s a reasonable aliasing model for which it’s OK). From this example: you could argue that the lifetime (in the “define what’s UB” sense, not the static-checker sense) of the reference created on line 31 ends immediately, since it’s never actually used. It is reachable from another reference, but it’s never actually reached. (And we can’t outlaw the mere existence of a chain of references that would point to mutating memory if someone were to follow them, because there are plenty of totally safe ways to do that.)

Then the ‘launder’ on line 37 discards all the old references and creates a fresh one, so the compiler can’t assume anything about (transitive) loads through it from before line 37.

The model that fits my intuition is analogous to data dependent reads in a processor memory model, and (I think) a close fit for ‘noalias’ in clang. For shared refs:

  • Assume a global clock, and timestamps of last write to each address (stealing your idea here)

  • When a new ref is taken (or a ref is cast from a pointer, or from an &mut ref), it gets a timestamp tR. If any load through that ref observes a memory timestamp tM > tR, it’s UB.

  • When a ref is created by loading through another ref, it gets the parent ref’s timestamp.

  • Thus the compiler can assume that, given some shared root references, nothing loaded through any transitive ref path will mutate, and any load can be moved to as early as the initial ref creation or as late as the last load. But it can’t make any assumptions about data that is never loaded.

(I don’t know how mutable refs would fit in this model, but hopefully your stacked borrows would adapt.)
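To make that precise, here is a tiny sketch of the check I have in mind (all names are hypothetical):

struct Memory {
    last_write: Vec<u64>, // per-location timestamp of the most recent write
    clock: u64,           // the global clock
}

#[derive(Clone, Copy)]
struct SharedRef {
    addr: usize,
    created_at: u64, // tR: when this ref (or the ref it was loaded through) was created
}

impl Memory {
    fn write(&mut self, addr: usize) {
        self.clock += 1;
        self.last_write[addr] = self.clock;
    }

    // Loading through a shared ref is UB if the location was written after tR.
    fn load_through(&self, r: SharedRef) -> Result<(), &'static str> {
        if self.last_write[r.addr] > r.created_at {
            Err("UB: load observed a write newer than the reference")
        } else {
            Ok(())
        }
    }

    // A ref created by loading through another ref inherits the parent's timestamp.
    fn derive(&self, parent: SharedRef, addr: usize) -> SharedRef {
        SharedRef { addr, created_at: parent.created_at }
    }
}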

An unrelated observation (I made it earlier but now realize that it applies more broadly). For owning types like Rc<T>, Vec<T> and especially Box<T>: if you start with an immutable root, the whole graph is immutable, and the compiler should be able to take advantage of that just like with plain refs. But the model falls back to the free-for-all ‘raw pointer’ case because these types use pointers in their implementation. Does this inhibit possible optimizations or sanitizer checks? (If so, maybe special rules for std::ptr::Unique would fix that?)