This does look like a miscompilation. Removing the bounds check on Y[i] (i.e. assuming i < 2) gives us this (asm annotated by me, with assistance from godbolt):
Rust
extern "C" {
    pub static mut X: [u16; 2]; // = [0,0];
    pub static mut Y: [u16; 2]; // = [0,0];
}

pub unsafe fn test(i: usize) -> u16 {
    let px: usize = X.as_ptr().add(2) as *const u16 as usize;
    let py: usize = Y.as_ptr().add(i) as *const u16 as usize;
    let flag = (px ^ 123) == (py ^ 123);
    Y[0] = 3;
    if flag {
        *Y.get_unchecked_mut(i) = 4
    }
    return Y[0]
}
LLVM-IR (optimized)
define i16 @_ZN7example4test17h716b2f3df427f079E(i64 %i) unnamed_addr #0 !dbg !6 {
  %0 = getelementptr inbounds [2 x i16], [2 x i16]* @Y, i64 0, i64 %i, !dbg !10
  %flag = icmp eq i16* %0, getelementptr inbounds ([2 x i16], [2 x i16]* @X, i64 1, i64 0), !dbg !20
  store i16 3, i16* getelementptr inbounds ([2 x i16], [2 x i16]* @Y, i64 0, i64 0), align 2, !dbg !21
  br i1 %flag, label %bb6, label %bb8, !dbg !22

bb6:                                              ; preds = %start
  store i16 4, i16* getelementptr inbounds ([2 x i16], [2 x i16]* @X, i64 1, i64 0), align 2, !dbg !23
  br label %bb8, !dbg !22

bb8:                                              ; preds = %start, %bb6
  ret i16 3, !dbg !24
}
ASM
example::test:
    ; rcx <- address of Y
    mov rcx, qword ptr [rip + Y@GOTPCREL]
    ; rdx <- address of Y, offset rdi u16 ; rdx is py
    lea rdx, [rcx + 2*rdi]
    ; rax <- address of X
    mov rax, qword ptr [rip + X@GOTPCREL]
    ; rsi <- address of X, offset 2 u16 ; rsi is px
    lea rsi, [rax + 4]
    ; Y[0] <- 3
    mov word ptr [rcx], 3
    ; compare rdx (py) and rsi (px)
    cmp rdx, rsi
    ; if equal, jump to .FLAG_IS_TRUE
    je .FLAG_IS_TRUE
    ; return 3
    mov ax, 3
    ret
.FLAG_IS_TRUE:
    ; X[2] <- 4
    mov word ptr [rax + 4], 4
    ; return 3
    mov ax, 3
    ret
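The fact that it's storing to X[2] instead of Y[i] is fine, because the check shows that the two addresses are identical. The issue is that we then return 3 unconditionally, even though that store may have just overwritten Y[0] (which is exactly what happens when i is 0 and Y sits directly after X in memory).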
This is, if I'm not mistaken, the exact same issue that Ralf pointed out in the original blog post: a write through one pointer is exchanged for a write through another (because the two are shown to be equal), and that write is then assumed not to touch memory accessed through the original pointer. Obviously this combination of optimization passes is incorrect.
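To make the mismatch concrete without relying on any real linker layout, here is a minimal sketch in safe Rust that models memory as a flat array. Everything in it is made up for illustration: the 4-slot memory, the `X_BASE`/`Y_BASE` constants, and the two function names are my assumptions, not anything LLVM or rustc produces. It only mirrors the shape of the source program and of the asm above, under the assumption that Y is placed directly after X.

```rust
// Hypothetical flat-memory model (an assumption for illustration only):
// X occupies slots 0..2 and Y occupies slots 2..4, so the one-past-the-end
// address of X is the same slot as Y[0].
const X_BASE: usize = 0;
const Y_BASE: usize = 2;

/// What the source program asks for: the conditional store goes through
/// Y + i, and the final read of Y[0] observes it.
pub fn source_semantics(mem: &mut [u16; 4], i: usize) -> u16 {
    let px = X_BASE + 2;
    let py = Y_BASE + i;
    let flag = px == py;
    mem[Y_BASE] = 3;
    if flag {
        mem[py] = 4;
    }
    mem[Y_BASE]
}

/// What the emitted code does: the store address has been rewritten to
/// X + 2 (same slot when flag is true), and the read of Y[0] has been
/// folded to the constant 3 as if that store could not touch Y.
pub fn optimized_behavior(mem: &mut [u16; 4], i: usize) -> u16 {
    let px = X_BASE + 2;
    let py = Y_BASE + i;
    let flag = px == py;
    mem[Y_BASE] = 3;
    if flag {
        mem[px] = 4;
    }
    3
}
```

Calling both with `i = 0` on zeroed memory leaves the two memories in the same final state, but `source_semantics` returns 4 while `optimized_behavior` returns 3. Swapping the store address alone changes nothing observable in this model; the disagreement only appears once the load of Y[0] is also folded away.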
This is just another edge-case miscompilation bug, though. LLVM is a complicated beast, and because the semantics of LLVM IR have largely been informal to this point, it's not surprising that some optimization passes combine to cause miscompilations. All software has bugs.
Cases like this, where individually plausible optimizations combine into an end-to-end miscompilation, are why Ralf makes his point in the blog post: we need to actually decide what the semantics of LLVM IR are, so that we have a hope of showing each optimization pass correct on its own.