There is no such thing as undefined code. Undefined behavior is a property of an execution. If a program has multiple executions because of non-determinism, then if any execution has UB, that is sufficient license for the compiler to optimize. This is the same for Rust, C and LLVM. I do not see any other way to define this – we cannot require all executions to exhibit UB, that would be prohibitive for optimizations. For example:
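fn main() {
    // Leak a fresh allocation and observe its address as an integer.
    let x = Box::into_raw(Box::new(0)) as usize;
    // Conjure a pointer from a hard-coded address.
    let y = 0x42440usize as *mut i32;
    // Defined only in executions where the allocation landed at 0x42440.
    println!("{}", unsafe { *y });
}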
This could be defined behavior if the allocation behind x really ends up at that address. Still, we say the program has UB because some execution of it has UB.
Yes and no. It is, in principle, still checkable by exploring all executions. Again, this is not new – for example, once concurrency enters the picture, this is also the case even if you entirely ignore integer-pointer-casts.
Now, the question is whether this is still feasible to check. And if the number of executions is large, it is not. Which is why I am trying to reduce the extent to which we rely on non-determinism for this. However, for the kind of reasoning LLVM does around pointer-integer-casts, I do not see any alternative. You do not even have to go all the way to twin memory allocations to see this; the following example is sufficient:
fn main() {
    let x = Box::into_raw(Box::new(0)) as usize;
    let y = Box::into_raw(Box::new(1)) as usize;
    // Which branch runs depends on where the allocator placed the two boxes.
    if x < y { /* cause UB */ }
}
Essentially, the moment the actual integer address of an allocation matters, you have a huge amount of non-determinism and no way to explore it exhaustively.
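To make the non-determinism concrete, here is a safe variant of the example above. It is only a sketch: the function name `fresh_addresses` is made up for illustration, and nothing here causes UB. It merely observes the two integer addresses, so a caller can see that the outcome of `x < y` is decided by the allocator and not by the program text.

```rust
/// Allocates two boxes that are live at the same time and returns
/// their addresses as integers. (Hypothetical helper, not from the post.)
fn fresh_addresses() -> (usize, usize) {
    let a = Box::new(0i32);
    let b = Box::new(1i32);
    // Casting a pointer to usize exposes the concrete address chosen
    // by the allocator for this particular execution.
    let x = &*a as *const i32 as usize;
    let y = &*b as *const i32 as usize;
    (x, y)
}

/// Reports which of the two orderings this execution happened to pick.
fn observed_order() -> &'static str {
    let (x, y) = fresh_addresses();
    if x < y { "first below second" } else { "second below first" }
}
```

All the program can rely on is that the two addresses differ and are suitably aligned. Either string may come back, and a checker that wants to rule out UB in the original example would have to consider both.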
And, again, I do not follow your terminology. What is “undefined code”? UB is a per-execution property.
This was talking about a transmute from &mut to &. These do have different runtime representations in my model – and anyway coercions do not imply “same runtime representation”, consider e.g. unsizing.
Oh, then I misunderstood. What you are asking for is for the thing tagging & and the thing tagging &mut to be distinguishable. If one was a timestamp and one an ID, they’d still both be 64 bits of data. OTOH, if we sacrifice a bit or two to introduce a distinction, we can do that and still use timestamps for &mut.
Essentially, currently I plan to use the Borrow enum from my post as tag.
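For illustration, the "sacrifice a bit" idea could look roughly like the sketch below. This is not the Borrow enum from the post. The names `Tag`, `Kind`, `KIND_BIT` and `PAYLOAD_MASK` are all made up here, and the choice of the top bit is an arbitrary assumption.

```rust
/// Hypothetical 64-bit tag: the top bit records whether the tag belongs
/// to a shared or a unique reference, the low 63 bits carry the payload
/// (an ID or a timestamp).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Tag(u64);

const KIND_BIT: u64 = 1 << 63;
const PAYLOAD_MASK: u64 = KIND_BIT - 1;

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Kind {
    Shared,
    Unique,
}

impl Tag {
    /// Tag for a shared reference; the kind bit stays clear.
    fn shared(id: u64) -> Tag {
        Tag(id & PAYLOAD_MASK)
    }

    /// Tag for a unique reference; the kind bit is set and the
    /// timestamp is truncated to 63 bits.
    fn unique(timestamp: u64) -> Tag {
        Tag(KIND_BIT | (timestamp & PAYLOAD_MASK))
    }

    fn kind(self) -> Kind {
        if self.0 & KIND_BIT != 0 { Kind::Unique } else { Kind::Shared }
    }

    fn payload(self) -> u64 {
        self.0 & PAYLOAD_MASK
    }
}
```

With this layout, `Tag::shared(5)` and `Tag::unique(5)` compare unequal even though they carry the same payload, and the tag still fits in 64 bits. An enum expresses the same distinction more readably, at the possible cost of a larger representation.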
I see. Yes, inserting those for &mut should be fine, though I am not sure why you’d want to do that.
(We’d have to take care that this does not require the value at that location to be valid, but that should be doable.)
The reference metadata is part of that reference. There is no reference without the metadata. So this is not per-memory-address metadata, it is just part of the data that is stored in memory.
Oh sure, I think one should be able to copy through memcpy. What does not work – and is, in fact, broken in LLVM (though I do not know of a miscompilation) – is doing that while using an integer type in memcpy. You want a type that can hold arbitrary data, including (part of) a pointer with its metadata – a type that you cannot do anything with, no arithmetic, no deref, just write it somewhere else. C intends to make char that type, but that does not actually work because C also allows arithmetic on char. Unfortunately, LLVM has no type suitable for this purpose either – and without a concrete miscompilation, it will be hard to convince them to change that.
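As a rough sketch of what "copy through memcpy" means on the Rust side: the helper below moves the bytes of a reference from one location to another with `std::ptr::copy_nonoverlapping`, which the standard library documents as an untyped copy. The function name `copy_ref_bytewise` is made up, and the example assumes a Rust version where `MaybeUninit` is available. At no point are the bytes loaded as integer values or used in arithmetic; the `u8` in the pointer casts only sets the granularity of the copy.

```rust
use std::mem::{size_of, MaybeUninit};
use std::ptr;

/// Copies a shared reference by moving its raw bytes to a new location.
/// (Hypothetical helper for illustration.)
fn copy_ref_bytewise<'a>(src: &&'a i32) -> &'a i32 {
    // Destination slot; starts out uninitialized.
    let mut dst = MaybeUninit::<&'a i32>::uninit();
    unsafe {
        // Untyped byte copy: every byte of the reference is transferred
        // as-is, without being interpreted as an integer.
        ptr::copy_nonoverlapping(
            src as *const &'a i32 as *const u8,
            dst.as_mut_ptr() as *mut u8,
            size_of::<&'a i32>(),
        );
        // All bytes of the reference have been written, so the slot
        // now holds a valid `&i32`.
        dst.assume_init()
    }
}
```

Calling `copy_ref_bytewise(&r)` for some `r: &i32` yields a reference to the same value. The contrast with the problematic case is that a loop loading each byte as a plain integer and storing it back would go through exactly the kind of type that also permits arithmetic.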
That document is the twin allocation paper that has been mentioned here multiple times.