It appears that LLVM doesn’t have intrinsics for unaligned loads and stores, so the user of a high-level language can’t ask LLVM for unaligned loads and stores directly via something like the link_llvm_intrinsics feature. Instead, the compiler for the high-level language needs to provide a way to generate the kind of LLVM IR that eventually compiles to unaligned load/store instructions.
Looking at emmintrin.h, e.g. the Intel-defined _mm_loadu_si128 SSE2 intrinsic doesn’t map to a __builtin call but to a dereference of a pointer to a single-member struct annotated with __attribute__((__packed__, __may_alias__)).
The simd crate uses the same pattern for the same purpose with a single-member #[repr(packed)] Rust struct. In debug builds, this works if the result of the dereference is assigned to a local variable before extracting the single member; using the expression directly without the intermediate variable fails, though. Furthermore, AFAICT, debug mode doesn’t actually emit a MOVDQU instruction but accomplishes the result of the computation by other means.
At present (did it work pre-MIR?), that pattern fails in release mode: the load is emitted as MOVDQA, which requires 16-byte alignment and faults on a misaligned address.
Given that precedent and the clang approach, the obvious way forward would be to make the #[repr(packed)] pattern work with MIR. However, making things work for packed structs in general seems over-complex relative to the narrower goal of accomplishing unaligned SIMD loads/stores, and too much of an obscure incantation from the language user’s perspective.
From the language user perspective, it seems to me that having read_unaligned() on *const and *mut and write_unaligned() on *mut would be more obvious and would be consistent with read_volatile() and write_volatile().
Looking at the LLVM IR clang generates for _mm_loadu_si128 vs. _mm_load_si128, it seems that the difference between eventual MOVDQU vs. MOVDQA instruction generation comes down to annotating the LLVM load and store instructions with align 1 instead of align 16. It seems to me that it should be possible to add rust-intrinsics unaligned_load and unaligned_store next to volatile_load and volatile_store and make the new intrinsics generate LLVM load and store with align 1. These could then be exposed on *const and *mut in the same manner as the volatile variants.
Does this seem like an OK way forward?