Unaligned SIMD (SSE2 in particular) loads/stores

hsivonen · November 7, 2016, 9:18am

It appears that LLVM doesn’t have intrinsics for unaligned loads and stores, so the user of a high-level language can’t talk directly to LLVM to request unaligned loads and stores via something like the link_llvm_intrinsics feature. Instead, the compiler for the high-level language needs to provide the means to have the kind of LLVM IR generated that eventually compiles to unaligned load/store instructions.

Looking at clang's emmintrin.h, the Intel-defined SSE2 intrinsic _mm_loadu_si128, for example, doesn't map to a __builtin call but to a dereference of a pointer to a single-member struct annotated with __attribute__((__packed__, __may_alias__)).

The simd crate uses the same pattern for the same purpose with a single-member #[repr(packed)] Rust struct. In debug builds, this works if the result of the dereference is assigned to a local variable before extracting the single member. Using an expression without the intermediate variable fails, though. Furthermore, AFAICT, debug mode doesn't actually emit a MOVDQU instruction but computes the same result by other means.
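A minimal sketch of the single-member #[repr(packed)] pattern, written against current Rust rather than the 2016 compiler discussed here. The names `Unaligned` and `load_u128_via_packed` are hypothetical (not the simd crate's actual code), and `u128` stands in for a 128-bit SSE2 value. Because the wrapper has alignment 1, the field read must not assume 16-byte alignment; which instruction it actually lowers to depends on the compiler version and optimization level.

```rust
/// Hypothetical single-member wrapper: `#[repr(packed)]` lowers its alignment
/// to 1, so a load through it may not assume `T`'s natural alignment.
#[repr(packed)]
struct Unaligned<T>(T);

/// Reads 16 bytes starting at `offset` as a `u128`, regardless of alignment.
fn load_u128_via_packed(bytes: &[u8], offset: usize) -> u128 {
    let src = &bytes[offset..offset + 16]; // panics if out of bounds
    let p = src.as_ptr() as *const Unaligned<u128>;
    // SAFETY: `src` has 16 readable bytes, `Unaligned<u128>` only needs 1-byte
    // alignment, and every bit pattern is a valid `u128`. Reading the field by
    // value copies it out without creating a (misaligned) reference.
    unsafe { (*p).0 }
}
```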

At present (did it work pre-MIR?), that pattern fails in release mode. The load is emitted as MOVDQA, which requires 16-byte alignment and faults when the address isn't 16-byte aligned.

Given the past and the clang approach, the obvious way forward would be to make the #[repr(packed)] pattern work with MIR. However, making packed structs work in general seems over-complex for the narrower goal of unaligned SIMD loads/stores, and the pattern is too much of an obscure incantation from the language user's perspective.

From the language user perspective, it seems to me that having read_unaligned() on *const and *mut and write_unaligned() on *mut would be more obvious and would be consistent with read_volatile() and write_volatile().
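For reference, std did later gain these: `std::ptr::read_unaligned` and `std::ptr::write_unaligned` (stable since Rust 1.17), plus method forms on raw pointers. A minimal sketch of a 16-byte unaligned load/store using the method forms; the helper names `load16` and `store16` are hypothetical, and `u128` again stands in for a 128-bit SIMD value.

```rust
/// Loads 16 bytes starting at `offset` as a `u128`, regardless of alignment.
fn load16(bytes: &[u8], offset: usize) -> u128 {
    let src = &bytes[offset..offset + 16]; // panics if out of bounds
    // SAFETY: `src` covers exactly 16 readable bytes, and `read_unaligned`
    // places no alignment requirement on the pointer.
    unsafe { src.as_ptr().cast::<u128>().read_unaligned() }
}

/// Stores `value` as 16 bytes starting at `offset`, regardless of alignment.
fn store16(bytes: &mut [u8], offset: usize, value: u128) {
    let dst = &mut bytes[offset..offset + 16]; // panics if out of bounds
    // SAFETY: `dst` covers exactly 16 writable bytes, and `write_unaligned`
    // places no alignment requirement on the pointer.
    unsafe { dst.as_mut_ptr().cast::<u128>().write_unaligned(value) }
}
```

Like `read_volatile`/`write_volatile`, these are plain functions on pointers, so no wrapper type is needed at the call site.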

Looking at the LLVM IR clang generates for _mm_load_si128 vs. _mm_loadu_si128, it seems that what decides whether MOVDQU or MOVDQA is eventually generated is annotating the LLVM load and store instructions with align 1 instead of align 16. It seems to me that it should be possible to add Rust intrinsics unaligned_load and unaligned_store next to volatile_load and volatile_store and make the new intrinsics generate LLVM load and store with align 1. Then these could be exposed on *const and *mut in the same manner as the volatile variants.
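For comparison, Rust's `std::arch::x86_64` module (stabilized later) exposes `_mm_loadu_si128` and `_mm_storeu_si128` directly, next to the aligned `_mm_load_si128`. A minimal sketch with a hypothetical helper `copy16_unaligned`, plus a portable fallback so it also builds on other targets. It assumes SSE2 needs no runtime detection because it is part of the x86_64 baseline.

```rust
/// Copies 16 bytes from `src` to `dst` using SSE2 unaligned load/store.
#[cfg(target_arch = "x86_64")]
fn copy16_unaligned(src: &[u8], dst: &mut [u8]) {
    use std::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_storeu_si128};
    assert!(src.len() >= 16 && dst.len() >= 16);
    // SAFETY: SSE2 is always available on x86_64, both slices hold at least
    // 16 bytes, and the `u` (unaligned) variants accept any address.
    unsafe {
        let v = _mm_loadu_si128(src.as_ptr() as *const __m128i);
        _mm_storeu_si128(dst.as_mut_ptr() as *mut __m128i, v);
    }
}

/// Portable fallback for targets without SSE2.
#[cfg(not(target_arch = "x86_64"))]
fn copy16_unaligned(src: &[u8], dst: &mut [u8]) {
    dst[..16].copy_from_slice(&src[..16]);
}
```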

Does this seem like an OK way forward?

jneem · November 7, 2016, 1:54pm

Having read_unaligned() and write_unaligned() would be a nice convenience. It seems, though, that you can currently get unaligned loads by casting the source pointer to *const u8 and the target pointer to *mut u8 and copying with copy_nonoverlapping. For example, see this.
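A minimal sketch of the byte-pointer approach: copy the bytes from a possibly misaligned source into an aligned local with `std::ptr::copy_nonoverlapping`, passing `*const u8`/`*mut u8` so the copy only assumes 1-byte alignment. The helper name `load_u128_bytewise` is hypothetical, and this is not necessarily the code behind the linked example.

```rust
use std::mem::size_of;
use std::ptr;

/// Reads a `u128` from a possibly misaligned byte position by copying its
/// bytes into an aligned local through `u8` pointers.
fn load_u128_bytewise(bytes: &[u8], offset: usize) -> u128 {
    let src = &bytes[offset..offset + size_of::<u128>()]; // panics if out of bounds
    let mut out: u128 = 0; // naturally aligned destination
    // SAFETY: `src` has exactly size_of::<u128>() readable bytes, `out` is a
    // distinct local of the same size, `u8` pointers only need 1-byte
    // alignment, and every bit pattern is a valid `u128`.
    unsafe {
        ptr::copy_nonoverlapping(
            src.as_ptr(),
            &mut out as *mut u128 as *mut u8,
            size_of::<u128>(),
        );
    }
    out
}
```

Calling `copy_nonoverlapping` with `*const u128` and a count of 1 would instead require both pointers to be 16-byte aligned (on targets where that is `u128`'s alignment), which is why the cast to `u8` matters.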

Amanieu · November 7, 2016, 5:40pm

I have already opened an RFC for read_unaligned and write_unaligned.

hsivonen · November 8, 2016, 7:55am

Thank you @jneem and @Amanieu. copy_nonoverlapping addresses my use case.

system · March 25, 2019, 8:27am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
