It appears that LLVM doesn't have intrinsics for unaligned loads and stores, so the user of a high-level language can't talk directly to LLVM to request them via something like the `link_llvm_intrinsics` feature. Instead, the compiler for the high-level language needs to provide a way to generate the kind of LLVM IR that eventually compiles to unaligned load/store instructions.
Looking at `emmintrin.h`, the Intel-defined `_mm_loadu_si128` SSE2 intrinsic, for example, doesn't map to a `__builtin` call but to a dereference of a pointer to a single-member struct annotated with `__attribute__((__packed__, __may_alias__))`.
The `simd` crate uses the same pattern for the same purpose with a single-member `#[repr(packed)]` Rust struct. In debug builds, this works if the result of the dereference is assigned to a local variable before extracting the single member; using an expression without the intermediate variable fails, though. Furthermore, AFAICT, debug mode doesn't actually emit a `MOVDQU` instruction but accomplishes the result of the computation by other means.
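To illustrate, here's a minimal sketch of that packed-struct pattern (the names are hypothetical, and I'm using `u64` instead of a SIMD type for brevity):

```rust
// Single-member packed wrapper: its alignment is 1, so reading
// through it tells the compiler not to assume the usual alignment.
#[repr(packed)]
struct Unaligned(u64);

unsafe fn read_u64_unaligned(p: *const u64) -> u64 {
    // Assign the dereference to a local first; extracting the member
    // directly from the dereference expression fails in debug builds,
    // as noted above.
    let wrapper = *(p as *const Unaligned);
    wrapper.0
}
```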
At present (did it work pre-MIR?), that pattern fails in release mode: the load is emitted as `MOVDQA`, which requires 16-byte alignment.
Given the past and the clang approach, the obvious way forward would be to make the `#[repr(packed)]` pattern work with MIR. However, making things work for packed structs in general seems over-complex considering the narrower goal of accomplishing unaligned SIMD loads/stores, and the pattern is too much of an obscure incantation from the language user's perspective.
From the language user's perspective, it seems to me that having `read_unaligned()` on `*const` and `*mut`, and `write_unaligned()` on `*mut`, would be more obvious and would be consistent with `read_volatile()` and `write_volatile()`.
Looking at the LLVM IR clang generates for `_mm_loadu_si128` vs. `_mm_load_si128`, it seems that the difference between eventual `MOVDQU` vs. `MOVDQA` instruction generation is annotating the LLVM `load` and `store` instructions with `align 1` instead of `align 16`. It seems to me that it should be possible to add `rust-intrinsic`s `unaligned_load` and `unaligned_store` next to `volatile_load` and `volatile_store` and make the new intrinsics generate LLVM `load` and `store` with `align 1`. Then these could be exposed on `*const` and `*mut` in the same manner as the volatile variants.
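Concretely, the intrinsic declarations could mirror the volatile ones in `core::intrinsics` (a hypothetical sketch; the exact signatures would follow whatever `volatile_load`/`volatile_store` use, and this form requires the nightly `intrinsics` feature):

```rust
extern "rust-intrinsic" {
    /// Would lower to `load ... align 1` regardless of `T`'s alignment.
    fn unaligned_load<T>(src: *const T) -> T;
    /// Would lower to `store ... align 1` regardless of `T`'s alignment.
    fn unaligned_store<T>(dst: *mut T, val: T);
}
```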
Does this seem like an OK way forward?