Oh, right, I completely forgot that optimizing away a “dead” ptr-to-int is not justifiable in intptrcast. And same for dead loads, because a load from memory may happen to load a pointer at integer type and thus also perform a task. That is clearly not desirable. I think I did not realize until during this discussion how many optimizations are lost in intptrcast.
So, it doesn’t seem like we currently have any formal/checkable model that would actually be sound wrt. to LLVM and support the full range of integer operations on pointers. The best proposal I can come up with right now is two implement two modes in miri:
- A sound, but incomplete mode. “sound” means that (as far as we know) this detects all UB. “incomplete” means that it may flag programs as having UB that do not actually have UB. As far as the pointer model goes, the model currently implemented by miri should suffice (see below). However, soundly modelling noalias annotations remains an open problem, and there are probably other subtle sources of UB in LLVM – so soundness may actually not be possible, but it would be the goal.
- A complete, but unsound mode. “complete” means that if the checker detects UB, then we really see this as being UB under the Rust rules. “unsound” means that some UB might fail to be detected. miri+intptrcast would be the starting point here.
This would provide an “overapproximation” and an “underapproximation” of the UB we really want, and hopefully, over time, we can close that gap. It is not very satisfying though…
Currently, miri treats memory locations as being abstract (block ID + offset). This can model the fact that locations are unforgeable, and that offsetting a pointer cannot cross from block to block. However it only supports very limited arithmetic on integers obtained from pointer: You can add/subtract an “actual” integer, and you can do bit-ops on the bits that are “below” the alignment (this makes HashMap work). However, you cannot even add/subtract two pointers. You cannot currently compare pointers to different locations for inequalities (less-than etc.), but that could be fixed.
Also, there are some open questions here around overflow: Since the actual base address of the allocation is left undetermined, we cannot tell if (ptr as usize) + 50000
overflows or not, which is something that safe code can observe. Currently, the overflow behavior is as if the base address was actually 0, for any pointer. (ptr as usize) - 1
overflows, but (ptr as usize) + x
never overflows for any x: usize
.