I just spent quite a while disagreeing with you and attempting to come up with a counterexample, and upon repeated failures I’m now convinced you’re right that the models are basically the same. So here is my proof sketch that, in fact, you are correct (at least with the interpretation of the partial order that I was mentally using). I think this helps clarify what the different points in the design space are, by looking at where the proof sketch might fall apart.
First of all, note that since the total order is compatible with the partial order, anything accepted by the partial order is accepted by the total order, so it suffices to show that anything accepted by the total order is also accepted by the partial order. The only place where timestamp ordering is relevant is in checking the validity of shared references. In a case accepted by a total order, there are three relevant timestamps: t_freeze, when the location was frozen; t_create, when the shared reference was created; and t_use, when the value is read from under the shared reference. For this to be accepted, t_freeze <= t_create in the total order, and no writes to the location are scheduled between t_freeze and t_use. In addition, because of the data dependency, t_create must happen-before t_use. (N.B. This may not hold in the presence of weak memory, since the reference could have been read out of a promise.)

Then, in the partially ordered setting, at the moment of t_create, the creation of the reference would trigger freezing the already-frozen memory location. In my mental model the freeze time was actually an antichain frontier, so this would add t_create to the freeze time. Then, upon t_use, there have been no writes, so the location is still frozen, and the ordering confirms that t_create (in the freeze-time antichain) happens-before t_use, allowing the read. Q.E.D.-ish.
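To make the sketch concrete, here is a minimal single-threaded instance of those three timestamps (in this simple setting the total and partial orders trivially agree; the interesting divergence needs weak memory, as discussed below):

```rust
fn main() {
    let x = 42u32;
    let first = &x;          // t_freeze: the location becomes frozen
    let second = &*first;    // t_create: a new shared reference to the
                             // already-frozen location; in the frontier
                             // model this adds t_create to the freeze
                             // time antichain
    println!("{}", *second); // t_use: no write since t_freeze, and
                             // t_create happens-before t_use by data
                             // dependency, so both models accept this
}
```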
There are two big ways that I can see this proof failing:
- If t_create does not happen-before t_use. This may be totally possible with weak memory orders. Essentially, the question is whether the following should be allowed:

```rust
// Atomic<T> stands in for a hypothetical generic atomic cell
// (std only has concrete atomics like AtomicPtr).
unsafe fn thread1(x: &Atomic<&'static u32>, ptr: *mut u32) {
    x.store(&*ptr, Ordering::Relaxed);
}

fn thread2(x: &Atomic<&'static u32>) {
    let value = *x.load(Ordering::Relaxed);
    println!("{}", value);
}
```
Note that the retagging at the return of `load` has the same effect as the actual read in requiring validity.
- If the freeze time is modeled as a single time instead of some frontier. This could make sense, but rules out

```rust
unsafe fn thread1(ptr: *mut u32) {
    println!("{}", *(&*ptr));
}

unsafe fn thread2(ptr: *mut u32) {
    println!("{}", *(&*ptr));
}
```

as only the first freeze would “win” and the second reference would be invalid.
In both those cases, I’d actually like for the code to be accepted, so… color me convinced that, to the extent that the models are different from each other, the simple operational total order is better. Anyway, as we both said, this is not a question that is super relevant right now.
Yeah, my instinct is that this will be way too stringent. Especially since we say that &mut and *mut are the same layout, people will expect at least round trip conversion to be valid, and as you point out, we can’t even detect transmutes in general, which makes it nigh impossible to make miri check for this condition.
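For example, this is the kind of round trip I'd expect people to rely on, sketched here with transmutes so that no cast is visible to the checker (illustrative only):

```rust
use std::mem;

// Round-tripping &mut u32 through *mut u32 via transmute. Since
// transmutes can't be detected in general, a rule that requires an
// explicit cast here would be uncheckable.
fn round_trip(x: &mut u32) -> &mut u32 {
    let raw: *mut u32 = unsafe { mem::transmute(x) };
    unsafe { mem::transmute::<*mut u32, &mut u32>(raw) }
}
```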
I at least am relatively happy with this as a compromise position (and it’s one of the things I proposed way back in the Types as Contracts thread). The lack of Raw on the stack just means that the new pointer is not dereferenceable, but it still allows comparison, hashing, etc. It also allows for a two-piece desugaring of the `as *mut _` cast, as a side-effecting “broadcast” operation followed by a transmute, which I think is a reasonable model to use. The actual transmute can then be subject to, e.g., DCE, and the broadcast can be handled separately and will compile to no IR.
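Roughly, the desugaring I have in mind looks like this (a sketch; `vm_broadcast` is a purely hypothetical name for the instrumented-VM operation, not a real function):

```rust
// Sketch: `r as *mut u32` splits into a broadcast plus a transmute.
fn desugared_cast(r: &mut u32) -> *mut u32 {
    // 1. Side-effecting broadcast: pushes Raw onto the location's
    //    stack so raw pointers may use it. Exists only in the
    //    instrumented VM and compiles to no IR:
    // vm_broadcast(r);

    // 2. Pure change of type, subject to DCE like any other pure op.
    unsafe { std::mem::transmute::<&mut u32, *mut u32>(r) }
}
```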
That said, I can definitely see this being another confusing way to shoot yourself in the foot.
Oh, sure. I just wanted to see if we could extend the model to make it legal. But since I finished writing that idea up and have been thinking about it more, I’m getting less and less convinced that it is useful, at least for now.
Ooh, FFI is an interesting aspect of this topic that I hadn’t thought about. If the rust code casts to a raw pointer and passes it to C, that certainly would be fine, but I assume you’re talking about an extern fn that directly takes &Cell<i32> as an argument? I’m not sure how this would cause more problems than just passing a reference directly to FFI. If I declare
extern "C" {
fn foo(&mut i32);
}
then the C code will receive a reference, but will interpret it (which seems like a transmute to me) as a raw pointer. Then it will modify the memory, which would be invalid unless there is some implicit “all pointers immediately push Raw when crossing the FFI boundary” rule. This doesn’t seem to be a problem unique to Cell.
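Concretely, the contrast I mean, with `via_raw`/`via_ref` as stand-ins for hypothetical C functions that write through their argument:

```rust
extern "C" {
    fn via_raw(p: *mut i32); // Rust casts first: the cast pushes Raw,
                             // so writes from C are clearly fine
    fn via_ref(p: &mut i32); // C sees a plain pointer either way, so
                             // this is effectively a transmute at the
                             // boundary
}

fn call_both(x: &mut i32) {
    unsafe {
        via_raw(&mut *x as *mut i32);
        via_ref(&mut *x); // only OK if the boundary implicitly pushes Raw
    }
}
```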
Honestly, while I do like making everything reorderable (and this particular case seems simple enough that I could totally see wanting a Rust pass to do something like this), my biggest motivation for wanting separate tags and timestamps is that it feels mentally simpler to me. Technically more complicated, but intuitively simpler.
That seems mostly reasonable, but I can see a few places where transmutes might throw a wrench in things. What if someone transmutes a &mut into a shared reference?
Actually, that’s an interesting question independent of the problem of timestamp tag ordering. I could see an argument for it just being UB, but it seems so innocuous, and more convincingly, I’d like to be able to transmute from &&mut T to &&T (mostly because I want more useful examples like transmuting &[&mut T] into &[&T]). It’s not clear what the right behaviour is. My instinct is that it interacts with retagging to make everything work out - when you actually load/receive the transmuted reference, it gets retagged. This activates the Uniq element on the stack (because despite the type, you actually got a mutable reference) and then tops off the stack and freezes it. But maybe that is too delayed.
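As a sketch, these are the transmutes in question (whether they are UB is exactly the open question; the slice version uses a concrete element type just to keep the example simple):

```rust
use std::mem;

// &&mut T and &&T are layout-identical (a single thin pointer), but
// whether the aliasing model accepts this is what's under discussion.
fn demote<'a, T>(r: &'a &'a mut T) -> &'a &'a T {
    unsafe { mem::transmute(r) }
}

// The same question one level up: &[&mut _] to &[&_].
fn demote_slice<'a>(xs: &'a [&'a mut u32]) -> &'a [&'a u32] {
    unsafe { mem::transmute(xs) }
}
```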

Not in general, certainly, but if you have &mut/noalias dereferenceable, it seems like that should lock things down enough for it to be valid (EDIT: I just noticed #53105, which is exactly this). Those assert exactly that there is no other code that can observe the location (and that you won’t trap by trying to write). Obviously you need to only write back the value that was already there, but I don’t see a problem with that.
That said, upon experimentation, it looks like LLVM still won’t do an unconditional write, even in that highly restricted situation.
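For reference, this is the shape of the transformation I was testing (`maybe_store` is just an illustrative name):

```rust
// With x being &mut (noalias + dereferenceable), one might hope the
// conditional store compiles to an unconditional one:
fn maybe_store(x: &mut u32, cond: bool) {
    if cond {
        *x = 7;
    }
    // Hoped-for rewrite, writing back the old value in the else case:
    //     *x = if cond { 7 } else { *x };
    // but in my experiments LLVM does not currently perform this.
}
```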
This mostly feels right to me. However, it occurs to me that it would be possible to support the intuition that when you explicitly drop a mutable reference, you are giving up the promise of exclusive access. There would be a “drop glue” (in quotes because it is purely at the instrumented VM level) for &mut that pops off the Uniq and any barrier that came with it.
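In user-facing terms, the intuition would look something like this sketch:

```rust
fn example(x: &mut u32) {
    let r = &mut *x;
    *r = 1;
    drop(r); // under this idea, the VM-level "drop glue" pops r's Uniq
             // (and any barrier), explicitly ending the exclusivity
             // promise at this point
    *x = 2;  // fine either way, but now the hand-off is an explicit event
}
```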
The main case I can see for justifying them is if you have many references (e.g., in a data structure) and you want to convert all of them in one go. Iterating through and casting each one is expensive and possibly impossible, since we don’t really have a way to change the type of a data structure partway through (that would require true linear types to make it work).
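For contrast, even in the easy slice case the element-by-element version forces an allocation (u32 just for concreteness), and for richer data structures there may be no equivalent at all:

```rust
fn demote_each<'a>(xs: &'a [&'a mut u32]) -> Vec<&'a u32> {
    // Reborrow each &mut immutably; this builds a whole new Vec
    // rather than reusing the existing storage.
    xs.iter().map(|r| &**r).collect()
}
```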
This (generalized) was the motivation behind my RFCs #91 and #223, for example.