I don't quite follow. The original definition is certainly not fine. It may be that the ABI itself is permissible (both extern "C"
in core
#[inline] "rust"
assign registers here, that's hopefully right) but that is only a necessary and not a sufficient condition. Further it already leaks the values out of the registers as in initially containing allocas.
The value semantics of Rust don't allow denoting places which are allocated in registers, as places need addresses. As soon as execution gets into the intrinsic implementation, the argument may be moved onto the stack as none of the semantics require the value passed in the arguments to stay in registers. This level of guarantee is only good enough if there's a reasonable way to enforce the optimization or check whether it has take place .. and to quote:
And it's not like there's a good way to produce that more-straightforward IR from Rust.
The full unoptimized IR for the exposed intrinsic is as follows: godbolt
use std::arch::x86_64::*;
pub unsafe fn foo(x: __m256i, bbox: __m256i) -> __m256i {
_mm256_cmpgt_epi32(x, bbox)
}
define internal void @core::core_arch::x86::avx2::_mm256_cmpgt_epi32(ptr sret(<4 x i64>) align 32 %_0, ptr align 32 %a, ptr align 32 %b) unnamed_addr {
start:
%0 = alloca <8 x i32>, align 32
%_5 = alloca <8 x i32>, align 32
%_4 = alloca <8 x i32>, align 32
%1 = load <4 x i64>, ptr %a, align 32
store <4 x i64> %1, ptr %_4, align 32
%2 = load <4 x i64>, ptr %b, align 32
store <4 x i64> %2, ptr %_5, align 32
%3 = load <8 x i32>, ptr %_4, align 32
%4 = load <8 x i32>, ptr %_5, align 32
%5 = icmp sgt <8 x i32> %3, %4
%6 = sext <8 x i1> %5 to <8 x i32>
store <8 x i32> %6, ptr %0, align 32
%_3 = load <8 x i32>, ptr %0, align 32
store <8 x i32> %_3, ptr %_0, align 32
ret void
}
Many stack moves, a type change from 4 × i64
to 8 × i32
and all those addresses involved which demonstrate a lack of enforcement of register use. The original function arguments are even defined as pointers to values and not as registers in that level of ABI. These are all obviously wrong for the sake of any semblence of leak-freedom. Imho the implementation of these intrinsics would need to utilize inline assembly itself to truly promise absence of stack spillage / a direct translation to the intrinsic. And even that will be wrong on the caller side where no value used in arguments can be certified to keep being allocated within a register without spillage.
None of this is to say that rustc
or core
is at some kind of fault here. If anything, I think C is in the exact same predicament regarding actual guarantees around intrinsics; and its silent saving grace is the lack of wide-spread dereferencable
attributes that ensure no improper optimizations involving 'deduplication' of a loop constant to the stack for 'efficiency' as here.