Stacked Borrows: An Aliasing Model For Rust

How does the model cover conditionality?

fn example(x: &mut i32, y: &mut i32, flag: bool) {
    let mut ptr = x as *mut i32;
    if flag { ptr = y as *mut i32; }
    // ...
    if flag {
        *x = 1;
        unsafe { *ptr = 2; }
    }
}

@jonh This is a dynamic model; it just looks at whatever code is actually executed. So assuming flag did not change, that code is fine.

I just spent quite a while disagreeing with you and attempting to come up with a counterexample, and upon repeated failures I'm now convinced you're right that the models are basically the same. So here is my proof sketch that, in fact, you are correct (at least with the interpretation of the partial order that I was mentally using). I think this helps clarify what the different points in the design space are, by looking at where the proof sketch might fall apart.

First of all, note that since the total order is compatible with the partial order, anything accepted by the partial order is accepted by the total order, so it suffices to show that anything accepted by the total order is also accepted by the partial order. The only place where timestamp ordering is relevant is in checking the validity of shared references.

In a case accepted by a total order, there are three relevant timestamps: t_freeze, when the location was frozen; t_create, when the shared reference was created; and t_use, when the value is read from under the shared reference. For this to be accepted, t_freeze <= t_create in the total order, and no writes to the location are scheduled between t_freeze and t_use. In addition, because of the data dependency, t_create must happen-before t_use. (N.B. This may not hold in the presence of weak memory, since the reference could have been read out of a promise.)

Then, in the partially ordered setting, at the moment of t_create, the creation of the reference would trigger freezing the already-frozen memory location. In my mental model the freeze time was actually an antichain frontier, so this would add t_create to the freeze time. Then, upon t_use, there have been no writes, so the location is still frozen, and the ordering confirms that t_create (in the freeze time antichain) happens-before t_use, allowing the read. Q.E.D.ish.

There are 2 big ways that I can see this proof failing:

  1. If t_create does not happen before t_use. This may be totally possible with weak memory orders. Essentially, the question is whether the following should be allowed:
    unsafe fn thread1(x: &Atomic<&'static u32>, ptr: *mut u32) {
        x.store(&*ptr, Ordering::Relaxed);
    }
    fn thread2(x: &Atomic<&'static u32>) {
        let value = *x.load(Ordering::Relaxed);
        println!("{}", value);
    }
    
    Note that the retagging at the return of load has the same effect as the actual read: it requires validity.
  2. If the freeze time is modeled as a single time instead of some frontier. This could make sense, but rules out
    unsafe fn thread1(ptr: *mut u32) {
        println!("{}", *(&*ptr));
    }
    unsafe fn thread2(ptr: *mut u32) {
        println!("{}", *(&*ptr));
    }
    
    as only the first freeze would "win" and the second reference would be invalid.

In both those cases, I'd actually like for the code to be accepted, so.. color me convinced that, to the extent that the models are different from each other, the simple operational total order is better. Anyway, as we both said, this is not a question that is super relevant right now.

Yeah, my instinct is that this will be way too stringent. Especially since we say that &mut and *mut have the same layout, people will expect at least round-trip conversion to be valid, and as you point out, we can't even detect transmutes in general, which makes it nigh impossible to make miri check for this condition.

I at least am relatively happy with this as a compromise position (and it's one of the things I proposed way back in the Types as Contracts thread). The lack of Raw on the stack just means that the new pointer is not dereferenceable, but it still allows comparison, hashing, etc. It also allows for a two-piece desugaring of the as *mut _ cast, as a side-effecting "broadcast" operation followed by a transmute, which I think is a reasonable model to use. The actual transmute can then be subject to, e.g., DCE, and the broadcast can be handled separately and will compile to no IR.
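To make that concrete, here's a rough sketch of the two-piece desugaring; `push_raw` is a made-up stand-in for the side-effecting "broadcast" step and does nothing at runtime:

// `push_raw` is hypothetical: in the instrumented VM it would push Raw onto the
// borrow stack of every location the reference covers; at runtime it is a no-op
// and compiles to no IR.
fn push_raw<T>(_r: &mut T) {}

fn main() {
    let mut x = 5i32;
    let r = &mut x;
    // Source: `let p = r as *mut i32;`
    // Conceptual desugaring into two pieces:
    push_raw(&mut *r);     // the side-effecting "broadcast" (must not be removed)
    let p = r as *mut i32; // the remaining pure cast/transmute (fair game for DCE)
    unsafe { *p = 6; }     // the raw pointer is dereferenceable because Raw was pushed
    assert_eq!(x, 6);
}

In a case where p went unused, eliminating the cast would be ordinary dead-code elimination, while eliminating the broadcast would change which pointers are dereferenceable.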

That said, I can definitely see this being another confusing way to shoot yourself in the foot.

Oh, sure. I just wanted to see if we could extend the model to make it legal. But since I finished writing that idea up and have been thinking about it more, I'm getting less and less convinced that it is useful, at least for now.

Ooh, FFI is an interesting aspect of this topic that I hadn't thought about. If the Rust code casts to a raw pointer and passes it to C, that certainly would be fine, but I assume you're talking about an extern fn that directly takes &Cell<i32> as an argument? I'm not sure how this would cause more problems than just passing a reference directly to FFI. If I declare

extern "C" {
    fn foo(&mut i32);
}

then the C code will receive a reference, but will interpret it (which seems like a transmute to me) as a raw pointer. Then it will modify the memory, which would be invalid if there isn't some implicit "all pointers immediately push Raw if crossing the FFI boundary." This doesn't seem to be a problem unique to Cell.

Honestly, while I do like making everything reorderable (and this particular case seems simple enough that I could totally see wanting a Rust pass to do something like this), my biggest motivation for wanting separate tags and timestamps is that it feels mentally simpler to me. Technically more complicated, but intuitively simpler.

That seems mostly reasonable, but I can see a few places where transmutes might throw a wrench in things. What if someone transmutes a &mut into a shared reference?

Actually, that's an interesting question independent of the problem of timestamp tag ordering. I could see an argument for it just being UB, but it seems so innocuous, and more convincingly, I'd like to be able to transmute from &&mut T to &&T (mostly because I want more useful examples like transmuting &[&mut T] into &[&T]). It's not clear what the right behaviour is. My instinct is that it interacts with retagging to make everything work out - when you actually load/receive the transmuted reference, it gets retagged. This activates the Uniq element on the stack (because despite the type, you actually got a mutable reference) and then tops off the stack and freezes it. But maybe that is too delayed.
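For concreteness, this is the flavour of transmute in question (specialized to i32 and to slices here purely for illustration; whether the model should permit it is exactly what's being discussed):

use std::mem;

// Reinterpret a slice of mutable references as a slice of shared references.
// &mut i32 and &i32 have the same layout, so the transmute itself compiles;
// the question is what the aliasing model says about using the result.
fn share_all<'a>(xs: &'a [&'a mut i32]) -> &'a [&'a i32] {
    unsafe { mem::transmute(xs) }
}

fn main() {
    let (mut a, mut b) = (1, 2);
    let muts: Vec<&mut i32> = vec![&mut a, &mut b];
    let shared = share_all(muts.as_slice());
    println!("{} {}", shared[0], shared[1]);
}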

:grin:

Not in general, certainly, but if you have &mut/noalias dereferenceable, it seems like that should lock things down enough for it to be valid (EDIT: I just noticed #53105, which is exactly this). Those assert exactly that there is no other code that can observe the location (and you won't trap by trying to write). Obviously you need to only write back the value that was already there, but I don't see a problem with that.

That said, upon experimentation, it looks like LLVM still won't do an unconditional write, even in that highly restricted situation.

This mostly feels right to me. However, it occurs to me that it would be possible to support the intuition that when you explicitly drop a mutable reference, you are giving up the promise of exclusive access. There would be a "drop glue" (in quotes because it is purely at the instrumented VM level) for &mut that pops off the Uniq and any barrier that came with it.

The main case I can see for justifying them is if you have many references (e.g. in a data structure) and you want to convert all of them in one go. Iterating through and casting each one is expensive and possibly impossible, since we don't really have a way to change the type of a data structure partway through (that would require true linear types to make it work).

This (generalized) was the motivation behind my RFCs #91 and #223, for example.

Could you maybe write a small Rust-only test case demonstrating what you think should be possible?

Here's a short example that just creates a cyclic linked list, and a longer one that actually deserializes a graph.


Hmm, that makes sense.

Some other notes / nitpicks:

Dereferenceable

Given a function argument of type &T, I think your model already requires that the argument is a valid pointer (aligned, not trapping, not null), even if the function body never dereferences it: otherwise, the retagging and barrier operations wouldn't make sense. Thus, rustc is justified in applying LLVM dereferenceable, as it currently does. However, it might be worth stating this more explicitly.

On the other hand, given a function argument of type &&T, if the function body never dereferences it, the inner pointer could be invalid. So if LLVM hypothetically offered a "dereferenceable N times" attribute, rustc couldn't use it.

De-reference-ification optimization

Another hypothetical optimization I've talked about in the past is: among function arguments that are immutable references to word-sized values (e.g. &usize, &&T), automatically detect cases where the function implementation doesn't care about pointer identity for that argument, and transform the ABI-level signature to take the value directly (usize, &T). Of course, it isn't always possible to statically determine whether the function cares about identity. But in many cases it is possible; in particular, if the function body passes the reference as an argument to another function, rustc can propagate the "doesn't care" flag from the callee to the caller.
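An invented before/after pair, just to make the transformation concrete (assuming the analysis decided the function never cares about the argument's address):

// The function as written takes a reference, but never compares or stores the address:
fn as_written(x: &usize) -> usize {
    *x + 1
}

// The ABI-level shape the hypothetical optimization would lower it to:
fn transformed(x: usize) -> usize {
    x + 1
}

fn main() {
    let v = 41;
    assert_eq!(as_written(&v), transformed(v));
}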

If &&T can point to an invalid pointer, there would be some restrictions on that optimization:

  • It would still be okay to do the transformation to pass &T directly, but the new argument couldn't be marked dereferenceable.
  • If T is itself word-sized, it would not be okay to do the transformation twice and pass T by value.
  • On the other hand, the compiler could assume validity if it additionally detected that the argument is unconditionally dereferenced (again by the function body or its callees). But that isn't always the case; e.g. if the function has an early exit (perhaps using the ? operator) before it happens to touch that argument, it's not an unconditional dereference.

Probably not the end of the world.

Unsized types

You mentioned "mem::size_of<T>() many bytes"; I guess that for unsized T, that should be mem::size_of_val(t) many bytes? ...I can't actually think of any interesting or problematic subtleties related to that, but I'm mentioning it anyway just for the record.

Reinterpreting memory

Unfortunately... I agree with gereeter that transmuting between pointers and references, or transmuting &&mut T to &&T as they suggested, or doing the equivalent of a transmute by copying bytes, etc. should be legal. I suspect there's a fair amount of existing code that uses such transmutes, and it would be surprising (in the "surprise! UB" sense :wink: ) if they didn't work – especially considering that they're legal in C.

For this to be accommodated, I guess it would have to be possible for arbitrary values to carry tags, not just reference-typed ones – including raw pointers, usize, and even u8. I suppose this contradicts your desire to make non-reference types as "non-magical" as possible. But I think the alternative is worse.

Local variables

Given:

fn foo() -> i32 {
    let mut x = 2;
    let _ = &mut x as *mut i32; // does not escape
    bar();
    x
}

Can the compiler optimize this to always return 2?

If x were immutable, that optimization would arguably be justified in your model, under the assumption that an immutable local variable acts like an immutable reference. But not if x is mutable.

In some sense this is logical. If the raw pointer were passed to bar() rather than thrown away, the optimization would clearly be invalid; but you want raw pointers to be non-magical and untracked, so the fact that it's thrown away shouldn't make a difference and it should be invalid here as well.

However, from a compiler perspective, I think that would be quite surprising. The raw pointer cast seems like dead code, but removing it would change semantics. And LLVM is definitely going to want to remove it, so you'd have to use some special mechanism to trick LLVM into thinking the pointer value escaped.

I think it would be better to make this optimization valid by introducing an additional rule, something about values that are completely non-escaping.

Such a rule would also allow something like

fn no_op(p: &mut i32) -> &mut i32 {
    unsafe { &mut *(p as *mut i32) }
}

to be optimized into a true no-op.

This really is an interesting example, and I think it does in some sense "feel" like it or something like it ought to be allowed. I think it is generally fine to say that your specific code is UB (in my mind at least it pretty flagrantly mutates something that has had a shared reference taken, which is invalid) as long as there is a clear guideline for what should be done instead. Specifically, regardless of how ugly the resultant code is, it should definitely be possible to create a value of that specific Node type (no changes like requiring it to be more generic or giving it repr(C)) using unsafe code that is cyclic in the way you describe.

This is the best I was able to come up with. As long as UnfinishedNode and Node are layout-compatible (which is a guarantee I think we can choose to provide, though we don't right now), this should do the same thing as your code without undefined behaviour.

I think I would be in favor of a strong enough guarantee about struct layout to make this sort of code work (possibly with some extra annotations). I don't think this would be at all onerous, though it might get in the way of struct layout randomization; I don't actually think it will though. As a specific strawman proposal designed to be extra conservative:

Two monomorphised types S and T are layout-compatible if any of the following hold.

  • S = T
  • S is a repr(transparent) wrapper around S2 and S2 and T are layout-compatible.
  • T is a repr(transparent) wrapper around T2 and T2 and S are layout-compatible.
  • S and T are both built-in pointer types (&mut, &, *mut, *const) and the pointer payloads are either
    • Both Sized,
    • Both slices,
    • The same trait
  • S and T are both structs with the same number of fields and, when paired up in declaration order, every pair of fields
    • Contain layout-compatible types
    • Have the same name (this is a bit extreme, but I wanted to be conservative)
  • S and T are compatible according to repr(C) or other specific representation rules (e.g. packed, repr on enums, etc.)

If we made those guarantees, I think the cyclic data-structure use case is not too heinous to deal with.
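As an invented illustration (the field layout is my guess at the shape being discussed, not the original code): under the strawman above, the two structs below would count as layout-compatible.

struct Node {
    name: &'static str,
    next: &'static Node,
}

// Same field count, names, and order; the `next` fields are both built-in
// pointer types with the same Sized payload `Node`, so each field pair -- and
// hence the two structs -- would be layout-compatible under the strawman.
struct UnfinishedNode {
    name: &'static str,
    next: *const Node,
}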

The primary difficulty is that it is really hard to maintain data dependency chains if you don't track things the way they were at the source level. Yes, it is painful to not be able to remove the dead cast, but that will be easily cleaned up later, and without it, you have to be sure to not break a different part of the chain. The traditional (well, I mean the one I've seen in multiple places; this isn't really old enough to have tradition) example is

fn foo() -> i32 {
    let mut x = 2;
    let ptr = &mut x as *mut i32 as usize; // does not escape
    bar(ptr - ptr);
    x
}

Now bar in some sense has been passed the pointer, and unless you want to deal with some complicated questions of what information content certain values leak about the pointer, you need to let bar access x. But then you can't rewrite ptr - ptr to 0. Complicating pointer-to-integer casts is annoying, but it seems to me like the least bad option.

See also the sort-of-tangent to the Types as Contracts discussion which I think did a great job of demonstrating the tradeoffs.

Additionally, I don't think it would cause any confusion past the AST, as I'm imagining that the lowering from AST to MIR would lower a reference-to-pointer cast as two statements in the MIR: a (brand-new) PushRaw operation and then a transmute to do the actual cast. Then, the compiler would be free to remove the cast, just not the PushRaw which has side effects. The PushRaw in some sense acts like a compiler barrier - it prevents certain movement of operations, but when MIR gets lowered to LLVM, it would just disappear.

This is just a MIR-level model, so it is fine if LLVM removes the cast - we just can't remove it (or, well, the push of Raw, which is a bit different) before lowering to LLVM. We do need to be careful to compile to valid LLVM, but as long as MIR undefined behaviour is a superset of LLVM undefined behaviour, which it should be since we throw away a bunch of extra information, then LLVM will generate valid code.

Indeed, but as specified it is not a superset. LLVM will assume that alloca'ed variables whose addresses do not escape are not mutated externally; in the example I gave, it actually does optimize the function to always return 2. That's why I said it would be necessary to trick LLVM into thinking the address did escape.

As long as UnfinishedNode and Node are layout-compatible ... I think I would be in favor of a strong enough guarantee about struct layout to make this sort of code work (possibly with some extra annotations). I don’t think this would be at all onerous

I tried that approach too, in the real problem this example was derived from. But it turned out to be extremely burdensome at the source code level, because in fact Node is not a simple type at all; it's a collection of fairly complex types. (I'm not actually deserializing a graph; I'm making immutable, thread-safe copies of the heap in a scripting language implementation.) But it is possible.

in my mind at least it pretty flagrantly mutates something that has had a shared reference taken, which is invalid

I'm not sure that's true (in the sense that I think there's a reasonable aliasing model for which it's ok). From this example: you could argue that the lifetime (in the "define what's UB" sense, not the static checker sense) of the reference created on line 31 ends immediately, since it's never actually used. It is reachable from another reference, but it's never actually reached. (And we can't outlaw the mere existence of a chain of references that would point to mutating memory if someone were to follow them, because there are plenty of totally safe ways to do that.)

Then the 'launder' on line 37 discards all the old references and creates a fresh one, so the compiler can't assume anything about (transitive) loads through it from before line 37.

The model that fits my intuition is analogous to data dependent reads in a processor memory model, and (I think) a close fit for 'noalias' in clang. For shared refs:

  • Assume a global clock, and timestamps of last write to each address (stealing your idea here)

  • When a new ref is taken (or a ref is cast from a pointer, or from an &mut ref), it gets a timestamp tR. If any load through that ref observes a memory timestamp tM > tR, it's UB.

  • When a ref is created by loading through another ref, it gets the parent ref's timestamp.

  • Thus the compiler can assume that, given some shared root references, nothing loaded through any transitive ref path will mutate, and any load can be moved to as early as the initial ref creation or as late as the last load. But it can't make any assumptions about data that is never loaded.

(I don't know how mutable refs would fit in this model, but hopefully your stacked borrows would adapt.)
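To make the consequence concrete, here's a minimal sketch (function names invented) of the reordering such a model licenses: both loads go through the same shared ref, so the compiler may forward the first value to the second.

fn unknown_code() {
    // Stands in for arbitrary code the compiler cannot see through.
    println!("arbitrary side effects, but no legal way to mutate *r");
}

fn double_read(r: &i32) -> i32 {
    let a = *r;     // load through r, which carries timestamp tR
    unknown_code(); // any write to *r here would make the next load UB under the sketch above
    let b = *r;     // so the compiler may assume b == a
    a + b
}

fn main() {
    let x = 7;
    assert_eq!(double_read(&x), 14);
}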

...

An unrelated observation (I made it earlier but now realize that it applies more broadly). For owning types like Rc<T>, Vec<T> and especially Box<T> - if you start with an immutable root, the whole graph is immutable, and the compiler should be able to take advantage of that just like with plain refs. But the model falls back to the free-for-all 'raw pointer' case because these types use pointers in their implementation. Does this inhibit possible optimizations or sanitizer checks? (If so, maybe special rules for std::ptr::Unique would fix that?)


I think weak memory is of little concern here because all potentially racing accesses must happen through &UnsafeCell, so there is no freezing and no checking going on -- everything is Raw.

Why do you think this is ruled out? The second reference would re-freeze, but that's okay.

Uh, good question. The return value would be activated, and then that would be UB because the location is not frozen.

Aha! I agree. I want that, too, and I hadn't thought about what happens.

If we would do things like freezing recursively through pointer indirections, things would work out as the inner T would be frozen anyway. But with the model as I described it, I think this cast would actually not be legal because, well, it's a transmute of &mut T to &T, and then see above.

We could say that a shared reference, upon retagging, starts a freeze if things are not already frozen, but I am worried that this might weaken shared references because now we can no longer know that stuff has already been frozen.

Or we could make creating a shared reference (from an &mut) special in the sense that it is the only operation which proceeds recursively and freezes everything. But that also seems surprising, when nothing else recurses through references.

Note that, at least in my mind, the tag is just a timestamp; it does not know whether it is Uniq or Frz. So transmuting a &mut T to a &T actually "reinterprets" the tag. Maybe the tag should come with that extra information as you seem to assume it does, and maybe that helps -- retagging would activate the Uniq, but then freeze. I think that solution sounds best, from the ones I have seen so far.

I do not follow. The inserted write is visible to whoever passed you the &mut -- someone else will have access again when you are done. Spurious writes are only okay if the reference is never used again by anyone else in the future. Am I missing something?

However, this means that we can only add noalias when we can show that the corresponding reference is never dropped.


Correct. I was focusing on aliasing in this post; the next post will be about which party can assume which invariants on types to hold when, and that's when I am going to come to this point.

Agreed.

Ah, that is an interesting one -- and a good case for doing some kind of validation recursively through references.

On the other hand, we have plenty of unsafe code that hands around references to not-completely-initialized data. So I am a bit worried about how such code might interact with this kind of optimization -- not very well, probably.

Yes.

Making integers carry tags invalidates all sorts of arithmetic optimizations, so that is a no-go.

Raw pointers carrying tags that are being ignored would be fine though. However, that may or may not be enough for this situation, I would have to look at some examples.

This is a hard question in C, and it is just as hard in Rust. We recently wrote a paper about it. I do not plan to solve those questions with this model.

The intent is that the compiler may optimize, but that has nothing to do with Rust's reference types any more, this is the usual reasoning that you would also do in C about not being able to guess addresses.

Thanks, this helps a lot!

As @gereeter wrote, this code is UB:

    let a: &'static mut Node = alloc(Node {
        name: "a",
        next: &EMPTY,
    });
    let b: &'static mut Node = alloc(Node {
        name: "b",
        next: unsafe { &*(a as *mut Node) },
    });
    a.next = b;

You are creating a shared reference to a, and then mutating it. I think allowing this would inhibit most of the optimizations we want to do around shared references.

UnfinishedNode with an UnsafeCell is one possibility to get around this. Another one might be to use raw pointers, though transmuting a raw pointer to a shared reference is also somewhat subtle...

The launder does not affect the data behind the reference though -- it launders the outer & in &'static Node, but it does not have any effect on the references stored inside that struct.

And if it did, the a as *const _ -- casting a shared ref to a raw ptr -- will activate the source reference, so it is like a use.

Oh, data dependency models are usually pretty much impossible to use for a source language. (E.g.: does x * 0 depend on x?)

This is an interesting suggestion. I find it a bit odd, TBH -- I'd expect it to maintain its own timestamp. But I think one could adapt my model just a tiny bit to have a very similar effect (otherwise it already seems to have a lot in common with yours): When a shared ref is loaded, we don't activate it -- we just retag (give it a new timestamp and initiate that). You would probably want to do the same for mutable references.

My intuition was that you would want the ref you load to already "have been valid".

However, this might not interact too well with the barrier... not activating the reference being loaded means that it might actually not be derived from the parameter... for example

fn foo(x: &mut i32, z: usize) -> i32 {
  *x = 12;
  let y: &mut i32 = bar(z); // imagine this aliases x
  *y = 13;
  *x // can we optimize this to 12?
}

Under a model that doesn't activate incoming references, y might actually push its own Uniq on top of x, and then later when x is used again that gets popped. So the above code would be allowed. That does not seem right to me.

So, I am not sure I like "loading a ref inherits the parent timestamp". This makes it completely irrelevant where the loaded ref is coming from, i.e., how it was computed; that's a lot of information that would go missing. Maybe it works if we do this only for shared references, I would have to think more about that.

The compiler should understand that things are immutable once the & surfaces. Before that, I find it hard to communicate this intent -- but yeah, maybe there is a way to make Unique mean something. I am not sure what it would mean, though.

Question for my own edification: as far as I gather, this model does not deal very well with “transmutes”. What about the “types as contracts” model - is it any better at handling them?

You are creating a shared reference to a, and then mutating it. I think allowing this would inhibit most of the optimizations we want to do around shared references.

I thought your model was being careful to work only in terms of actual loads and stores, and not to say anything about data which was reachable through the type system but never actually reached. In this example the mutation to some data reachable from 'a' is never observed.

The launder does not affect the data behind the reference though – it launders the outer & in &'static Node, but it does not have any effect on the references stored inside that struct.

I guess I'm confused about the precise terminology in use here. Immediately after let p = some_address as *const (&'a i32, &'a i32); let r = &*p;, is there one reference or three? I was assuming the former--i.e. references that are part of some data structure sitting in memory aren't part of the formal model until they're loaded--but apparently that was wrong?

I think my first example had unnecessarily long ref lifetimes which confused the issue. Here is a cleaner version, where the work-in-progress nodes are stored in pointers instead of refs. Is there UB in this version?

Edit: yes, I was mixing up a half-baked model in my head with your model. I think your model does define what happens for all of my examples. Bits of the data-dependency-like model were stuck in my head because that's the only way I can see to reasonably deal with transmutes. Cases like what to do with pointer * 0 fortunately aren't actually a problem if you're only tracking references.

The trouble with transmutes comes from having extra data stored at pointers, so there is a representation difference between mutable references, shared references and raw pointers. "Types as Contracts" had the explicit goal to not do that, so it did not have this problem.

However, it made lifetimes matter. I am not sure which is worse...

Then you are misunderstanding. My model explicitly performs "activation" of all references that enter or leave a function -- passed in as argument, passed out as argument, loaded from memory, stored to memory, passed in as return value, passed out via return. This gives us some nice invariants to work with -- it is needed, for example, for the dereferenceable annotation. Also we do retagging on references coming in; we can't do that without looking at the data. The model is mostly access based, but not 100% so.

However, when doing "activation", I propose to not recurse through references. So for example, when a function receives (as incoming argument, incoming return value or loaded from memory) a (&mut i32, &mut i32), then these references are both activated. However, when it receives a &mut &mut i32, the inner reference is not touched at all. IOW, we look at the data coming in (two references in case of the pair, one reference in the second example), but we do not go into memory to perform some recursive action.
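A small invented example of that "shallow" treatment: the pair's two references are both retagged on entry, while the reference behind &mut &mut i32 is only touched once the body actually reaches through it.

fn takes_pair(p: (&mut i32, &mut i32)) {
    // Both components arrive directly as arguments, so both are activated/retagged on entry.
    *p.0 += 1;
    *p.1 += 1;
}

fn takes_nested(p: &mut &mut i32) {
    // Only the outer reference is activated on entry; the inner one is only
    // involved once this dereference actually loads it from memory.
    **p += 1;
}

fn main() {
    let (mut a, mut b) = (1, 2);
    takes_pair((&mut a, &mut b));
    let mut c = 3;
    let mut r = &mut c;
    takes_nested(&mut r);
    println!("{} {} {}", a, b, c);
}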

I do not understand how you are counting references. The second line is a raw-ptr-to-shared-ref, so it will do the following to all locations "covered" by the ref -- the ref has type &(&i32, &i32), so that would be 16 locations (with 8 bytes pointer size):

  1. Activate a Raw (so some_address must be frozen or have a Raw in the stack)
  2. Bump the clock and assign the newly created shared ref the current timestamp.
  3. Initiate Frz(current_time), which will freeze the location.

The values stored in these locations (which happen to be references again in this case) do not matter at all here.

Yes. Let's look at the most important steps happening:

    unsafe {
        (*a).next = &*b; // <--
        (*b).next = &*a;
        &*(a as *const Node)
    }

At the marked line, we create a shared ref covering whatever memory b is stored in. This follows the steps I described above, so in the end, that memory is frozen. In the next line, b.next is mutated again, which is okay because there is a Raw in the stack -- but it will end the freezing for that part of memory (the last 8 bytes of b), because you cannot mutate something frozen.
Finally, later a.next.next aka b.next is accessed. This will activate that shared reference, asserting that memory is still frozen -- triggering UB because memory is in fact not frozen any more.

Taking a step back, at (*a).next = &*b; you are creating a shared reference. That is a promise that the memory pointed to by this reference will never be mutated while this reference is valid. The reference points at wherever b sits in memory. The next thing you do is mutate b. The only way now to maintain your promise is if we say that this invalidates the reference you just created. (With types as contracts, btw, this would already be UB -- types as contracts takes lifetimes into account, so the moment you create an &'static pointing at something, that memory is marked as read-only forever. Which makes sense, because that's literally what &'static is.)

We have to find a way to make this work without mutating memory pointed to by a shared reference. One way is to opt-out of the promise I mentioned above by using Cell or UnsafeCell. So if you replace &'static Node by Cell<&'static Node> or UnsafeCell<&'static Node>, you should be good. Would that be an acceptable solution? It changes the type throughout the program, but it also reflects better the fact that the memory here is being mutated while there is a shared ref pointing at it.
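A rough sketch of what the Cell-based variant could look like (the field layout, the Option, and the use of Box::leak are my adaptation for a self-contained example, not the original code):

use std::cell::Cell;

struct Node {
    name: &'static str,
    next: Cell<Option<&'static Node>>,
}

fn main() {
    // Leak the allocations to get 'static references; the cycle is then tied up
    // by mutating through the Cell, which is fine behind a shared reference.
    let a: &'static Node = Box::leak(Box::new(Node { name: "a", next: Cell::new(None) }));
    let b: &'static Node = Box::leak(Box::new(Node { name: "b", next: Cell::new(Some(a)) }));
    a.next.set(Some(b));
    println!("{}", a.next.get().unwrap().name); // prints "b"
}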


I thought your model was being careful to work only in terms of actual loads and stores

... However, when it receives a &mut &mut i32 , the inner reference is not touched at all. IOW, we look at the data coming in (two references in case of the pair, one reference in the second example), but we do not go into memory to perform some recursive action.

That's what I meant to say - the model is "shallow", so reference-typed values that are just sitting out in some data structure in memory don't have any special properties in this model until they're loaded. (As opposed to the "types as contracts" model, where they do have special properties and my example is obviously UB.)

But your description of how the example would trigger UB doesn't seem to be following this "shallow" model:

Finally, later a.next.next aka b.next is accessed. This will activate that shared reference, asserting that memory is still frozen – triggering UB because memory is in fact not frozen any more.

This will activate what shared reference? No reference is used more than once in the example, and all references are gone by the end of the unsafe block; a: *mut Node is the only live variable of any type at that point. New references are later derived from it (to access a.b.a), but those will have later timestamps. If that triggers UB, then so would let mut x = Box::new(5); &*x; *x = 6; *x. It's the same sequence of steps: Start with a root raw pointer; derive a shared ref, freezing the memory; mutate the memory, unfreezing it; later, derive a new shared ref from the root pointer and use it to read the memory.

BTW I'm not arguing that my example should necessarily be legal, as long as there is some way to create cycles of references (and there seems to be general consensus that that's desirable). But I think this model actually allows quite a lot of questionable stuff. E.g.:

    // of course lifetimes won't let this compile, but assume the same structure was created
    // by unsafe pointer casts or transmutes or whatever.
    let mut i = 0;
    let r1 = &mut &mut &mut i;
    let r2 = &mut &mut &mut i;
    ***r1 = 1;
    ***r2 = 2;
    println!("{}", ***r1);

This clearly should be UB. But it appears to be allowed by the stacked borrows model, because ***r1 loads a chain of new references from memory, so they get fresh timestamps and pushed onto the stack.

This is fixable for shared refs, by having the derived ref take its ancestor's timestamp instead of a fresh timestamp (so the memory must have been immutable since the initial root shared ref was created), but I'm not sure how to fix it for unique refs without tracking the whole graph of which ref was created from which.

b.next

The model is shallow, but when the code goes "down into the data", the model follows along.

No, why would they? The timestamp is stored together with the reference in memory and fetched again when the reference is loaded from memory.

No, this is fine. There is no shared reference that lives across the mutation. It is a very different sequence of steps.

Your code matches let mut x = Box::new(5); let r = &*x; *x = 6; *r. Or, slightly closer:

let x = Box::into_raw(Box::new(5));
let shr = Box::new(unsafe { &*x }); // put a shared ref into memory
unsafe { *x = 6 };
unsafe { **shr } // load the shared ref again and then deref it

That is UB.

It is not allowed, and (I think I said this above but it seems that got missed :smiley: ) you do not just get fresh timestamps when loading from memory. Instead, you first load the old timestamp, activate it -- meaning make it the top of the stack, or cause UB if that does not work -- and then retag (assign a fresh timestamp).

It was you who first talked about "just" assigning a fresh timestamp when loading. I replied that I find this dubious for &mut (search for "interesting suggestion" in this post). Thanks for providing an example :slight_smile:

So, to repeat -- reference identity, including the timestamp, is generally preserved by a memory round-trip. I did not say that explicitly in the post because I would never have thought of doing anything else.^^ Many steps in the model would not make any sense otherwise. Also we probably want to optimize a store immediately followed by a load into just re-using the value; if memory round-trips changed the value that would not be correct.

Ah. I see what you meant now. Sorry for the misunderstanding. As @RalfJung mentioned, this is justified on the LLVM side by citing the nondeterminism of allocation (in their Twin Memory Allocation paper). We could theoretically port this argument over directly to Rust, but it adds a line of reasoning that I thought we were trying to avoid. In essence (if I understand correctly), that paper argues that if there is a set of nondeterministic choices that result in undefined behaviour, then the code is undefined code, and so we can assume that this doesn't happen. To quote the paper (section 3.2):

Since there is at least one execution where the program would trigger UB, the compiler can assume q cannot alias with p (or with any object at all).

This is an explicit departure from the notion that UB should be machine checkable, since we are declaring the code undefined even if the particular execution it takes is perfectly fine. Now, I might be willing to make this small concession, since the only type of nondeterminism relevant to this question is where allocations get located; things like I/O, pseudorandom number generation, etc. don't have to count. However, it is a big leap from where we are today.

That might simplify things? But that guarantee can only really be made with safe code, I think (yes, it is UB even outside of safe code, but since we're trying to define UB, things feel like they get trickier). Besides, many of the problems show up when working purely immutably with data. The problem is that the freezes and such from creating references act as writes in the abstract, possibly creating (handle-able) races from something that isn't writing at all.

But if the two threads have incomparable timestamps, then thread 1 could freeze, have its freeze overwritten by thread 2's freeze, and then try to access the data. The access would be undefined since thread 2's freeze would not have come before the creation of thread 1's reference. Perhaps for clarity I should have written the creation of the references and the access of the references as separate steps.

Oof. That seems quite undesirable. I would instinctively want to say that since there is an implicit coercion that doesn't even change the runtime representation of the references going from &mut to &, it should be perfectly safe.

Yeah, I think this gets at why I wanted unique pointers to have an ID separate from the timestamp - it completely dodges all ugly questions about reinterpreting things. Also, yes, I think I was imagining that there was some sort of (abstract) runtime tagging of whether a pointer was unique or shared. As long as we have tags, we might as well add more information to them, I guess.

The spurious writes I'm thinking about are writing the value that was already present, i.e.

*ptr = *ptr;

These definitely access the memory location, but there should be no problem if there are no aliasing/racing accesses and Uniq(ptr) is at the top of the memory's stack. It does nothing to the value of memory and nothing to the stack.

I don't actually think this is a good direction to follow, but I don't think the "never dropped" analysis would be that hard in most cases - mutable references passed to functions are reborrowed, generally, not moved, so they just aren't dropped until the end of the function.

Yes, that's where I was misunderstanding. I understood "shallow" to mean that references sitting out in memory had no special properties at all (in particular, no identity). The only per-memory-address metadata was your struct MemoryByte, with the stack of borrows / freeze state of that address, but no special metadata for a reference that might be stored there. So loading a value of type &i32 would have to get a new timestamp from somewhere. Hence my talk about "root pointers" - if all you have is an &&&i32, there is just one reference in the model until you start following the chain.

I'm pretty sure this model would actually work fine for shared references. But yeah, it's not at all the same as your model. Thanks, now it makes much more sense.

This is a much stricter model, in some ways more strict than "types as contracts", since transmuting or casting a pointer won't let you change the restrictions on indirectly reachable references. So how the model handles transmutes is important.

(That's another reason I was confused for so long--I assumed the model had to let you fill in a chunk of memory by some unspecified means, then declare "this a *mut T with lifetime starting now" and use it as such, as long as the bits in that memory are valid for a T.)

I hope we can avoid such a limitation. Rust's current support for transmutes and working with the bit representation of types is really useful, and a big selling point for low-level programming. (In contrast to the C/C++ model, where the standard effectively refuses to admit that objects are made of bits, and so makes things that should be trivial either UB or impossible unless you go the Linux kernel route and tell the compiler to follow different rules for your codebase.)


I have wondered about this optimization too. It seems though that if we are analyzing to see whether the &&usize escapes the function, we could also analyze to see whether the usize value is actually used, no? In that case, we could do the optimization without caveats.

Well, I suppose the concern is that there would be cases where it is "maybe used"...? I suppose that without termination analysis that is eminently possible. So never mind perhaps. =)

My first thought was – what if integers carry tags but arithmetic does not propagate them, so only straightforward copies were allowed?

But that's not enough, since, as you know, there's still the problematic optimization of "if we know that a == b, then replace b with a".

Still, at the risk of repeating myself, I really think that a model that doesn't allow copying values through memcpy is unacceptably surprising.

I'm curious whether and how LLVM intends to fix the aforementioned issue for C, and whether the resulting model will be sane/predictable enough that it could be used here.

From the LLVM bug thread:

Nuno Lopes 2017-11-07 11:44:34 PST

Ralf is part of the memory model team btw. We will share a document + patches starting late next week.

Do you know if that document ever got shared, and if so do you have a link? :slight_smile:

There is no such thing as undefined code. Undefined behavior is a property of an execution. If a program has multiple executions because of non-determinism, then if any execution has UB, that is sufficient license for the compiler to optimize. This is the same for Rust, C and LLVM. I do not see any other way to define this -- we cannot require all executions to exhibit UB, that would be prohibitive for optimizations. For example:

fn main() {
  let x = Box::into_raw(Box::new(0)) as usize;
  let y = 0x42440usize as *mut i32;
  println!("{}", unsafe { *y });
}

This could be defined behavior if x is really allocated at that address. Still we say the program has UB because it can have UB.

Yes and no. It is, in principle, still checkable by exploring all executions. Again, this is not new -- for example, once concurrency enters the picture, this is also the case even if you entirely ignore integer-pointer-casts.

Now, the question is whether this is still feasible to check. And if the number of executions is large, it is not. Which is why I am trying to reduce the extent to which we rely on non-determinism for this. However, for the kind of reasoning LLVM does around pointer-integer-casts, I do not see any alternative. You do not even have to go all the way to twin memory allocations to see this; the following example is sufficient:

fn main() {
  let x = Box::into_raw(Box::new(0)) as usize;
  let y = Box::into_raw(Box::new(1)) as usize;
  if x < y { /* cause UB */ }
}

Essentially, the moment the actual integer address of an allocation matters, you have a huge amount of non-determinism and no way to explore it exhaustively.

And, again, I do not follow your terminology. What is "undefined code"? UB is a per-execution property.


This was talking about a transmute from &mut to &. These do have different runtime representations in my model -- and anyway coercions do not imply "same runtime representation", consider e.g. unsizing.

Oh, then I misunderstood. What you are asking for is for the thing tagging & and the thing tagging &mut to be distinguishable. If one was a timestamp and one an ID, they'd still both be 64 bits of data. OTOH, if we sacrifice a bit or two to introduce a distinction, we can do that and still use timestamps for &mut.

Essentially, currently I plan to use the Borrow enum from my post as tag.

I see. Yes, inserting those for &mut should be fine; not sure why you'd want to do that. :wink: (We'd have to take care that this does not require the value at that location to be valid, but that should be doable.)


The reference metadata is part of that reference. There is no reference without the metadata. So this is not per-memory-address metadata, it is just part of the data that is stored in memory.


Oh sure, I think one should be able to copy through memcpy. What does not work -- and is, in fact, broken in LLVM (though I do not know of a miscompilation) -- is doing that while using an integer type in memcpy. You want a type that can hold arbitrary data, including (part of) a pointer with its metadata -- a type that you cannot do anything with, no arithmetic, no deref, just write it somewhere else. C intends to make char that type, but that does not actually work because C also allows arithmetic on char. Unfortunately, LLVM has no type suitable for this purpose either -- and without a concrete miscompilation, it will be hard to convince them to change that. :confused:

That document is the twin allocation paper that has been mentioned here multiple times :slight_smile: