Getting explicit SIMD on stable Rust


@nagisa Could you unpack that? I don’t really grok what you’re saying. Examples would be super helpful.


@nagisa Additionally, why doesn’t something like this satisfy your desire for unaligned vector loads? Or what about the _mm_loadu_si128 intrinsic from SSE2?


Oh, I guess I was wrong and pcmpeqb does not permit an unaligned pointer, my bad.

Additionally, why doesn’t something like this satisfy your desire for unaligned vector loads?

Terrible codegen in debug mode. _mm_loadu_si128 may be okay, but it is architecture-specific (i.e. it’s better to just write assembly then).


So, in response to the proposal to use [T; n] for representing SIMD: that forces our hand to only ever support uniform vectors, whereas there may be systems (esp. in the future: see RISC-V) which support non-uniform vectors just fine.

Additionally, why doesn’t something like this satisfy your desire for unaligned vector loads?

So I tried using this, and with optimisations LLVM fell back to using both unaligned loads and stores, rather than just unaligned loads as I was intending. Explicitly crafting the IR so that LLVM sees correspondingly aligned loads and stores resulted in the intended instructions being emitted. That suggests that we might at least want to introduce some intrinsic to do arbitrarily aligned loads and stores, where it’s possible to specify alignment independently of the data type (or we might eventually gain #[repr(align(N))], which would probably help with this).
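To make the #[repr(align(N))] idea concrete, here is a minimal sketch. It assumes the attribute in the form it later took in stable Rust; `Aligned16`, `aligned16_layout` and `is_aligned_to` are hypothetical names for illustration only, and nothing here says what instructions LLVM will actually emit.

```rust
use std::mem;

/// 16 bytes whose alignment is raised to 16, independently of the element
/// type (`[u8; 16]` on its own only requires alignment 1).
#[repr(align(16))]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Aligned16(pub [u8; 16]);

/// Returns `(size, alignment)` of the wrapper type in bytes.
pub fn aligned16_layout() -> (usize, usize) {
    (mem::size_of::<Aligned16>(), mem::align_of::<Aligned16>())
}

/// Returns true if the address of `ptr` is a multiple of `align` bytes.
pub fn is_aligned_to<T>(ptr: *const T, align: usize) -> bool {
    (ptr as usize) % align == 0
}
```

With a wrapper like this the alignment travels with the type, so loads and stores through it can be assumed aligned without a dedicated intrinsic.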


Note that there is already std::ptr::read_unaligned as of a few months ago (albeit not stable); the current implementation uses memcpy, but changing it to be intrinsic-based would be ‘just’ an implementation patch. That doesn’t address custom alignments other than 1, though.
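For reference, a small sketch of what using std::ptr::read_unaligned looks like (the function has since been stabilized). The helper name `load_u128_unaligned` is made up for this example, and a `u128` merely stands in for a 128-bit vector here.

```rust
use std::ptr;

/// Loads 16 bytes starting at `offset` as a `u128`, with no alignment
/// requirement on the source address. Returns `None` if the read would
/// run past the end of `bytes`.
pub fn load_u128_unaligned(bytes: &[u8], offset: usize) -> Option<u128> {
    let end = offset.checked_add(16)?;
    if end > bytes.len() {
        return None;
    }
    // SAFETY: the range `offset..end` was bounds-checked above, and
    // `read_unaligned` accepts a pointer with any alignment.
    Some(unsafe { ptr::read_unaligned(bytes.as_ptr().add(offset) as *const u128) })
}
```

The result uses the native byte order, so it matches `u128::from_ne_bytes` applied to the same 16 bytes.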


For those interested, there is a Final Comment Period ongoing to stabilize cfg_target_feature in this issue:

Since this has been discussed here a lot, I thought some of you might be interested.
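For anyone who hasn’t used it, here is a rough sketch of what cfg_target_feature enables, assuming the `cfg!(target_feature = "...")` and `#[cfg(target_feature = "...")]` forms. The function names are hypothetical, and the result depends entirely on the flags the crate is compiled with.

```rust
/// Reports which code path was selected at compile time, based on the
/// target features enabled for this compilation.
pub fn selected_path() -> &'static str {
    if cfg!(target_feature = "avx2") {
        "avx2"
    } else if cfg!(target_feature = "sse2") {
        "sse2"
    } else {
        "scalar"
    }
}

/// Only one of these two definitions is compiled, depending on whether
/// SSE2 is enabled for the target.
#[cfg(target_feature = "sse2")]
pub fn has_sse2_at_compile_time() -> bool {
    true
}

#[cfg(not(target_feature = "sse2"))]
pub fn has_sse2_at_compile_time() -> bool {
    false
}
```

Note this is a compile-time decision only; it does not detect what the CPU running the binary supports.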


Also, I’ve been slowly but steadily working on the SIMD RFC. I recently took a detour to prove that the simd crate could compile on top of our proposal:


Can we use this somehow?


Seems like that might be useful if we were implementing a code generator?


I implemented a basic one to retrofit into rustc, but it broke everything fairly horribly (it uses GCC and LLVMINT).


Since there was interest in stabilizing cfg_target_feature/target_feature to allow other RFCs to advance, I wrote a pre-RFC to fix their semantics on stable and nightly Rust before stabilization:

Since target feature is required for the SIMD RFCs, it would be great if the pre-RFC could get some eyes from the interested parties to make sure that the proposed semantics don’t hinder any SIMD-related work.


It might be worth looking into how SIMD is being done for WebAssembly. They’ve chosen the approach of vector types and ‘general’ operations, rather than directly exposed intrinsics. That might only be a good choice in the browser environment, but still, it’d be nice if Rust SIMD could compile to wasm SIMD.

Here’s the design discussion: and Chromium implementation in progress:


Great work on the RFC so far, @burntsushi. I found it very easy to read and understand. :slightly_smiling_face:


Now that target_feature has been stabilized and exposing intrinsics is well under way, we can start thinking about the higher-level API.

A few engineers at Google have been working on a SIMD library for C++ that shares similar goals:

  • Support multiple architectures.
  • Don’t hide the cost of expensive operations.
  • Get 80–90% of the performance of hand-written platform-specific code, at 10–20% of the cost.

In particular there is an overview of operations that are efficient on all platforms.

The current API is the result of insights gathered over many years, and I think we should take inspiration from it.


@ruudva Here are my brief thoughts:

  1. Probably should start a new thread. Stable SIMD still has a long way to go, and it’d be good to not start mixing them up.
  2. At this point, I’d suggest that folks just try and start building a library. I imagine this will produce valuable feedback. :slight_smile:


I’m happy to learn that stdsimd has progressed to the point where non-x86 support has started.

I see that unlike simd, stdsimd doesn’t have distinct types for boolean vectors and instead operations that return lane-wise booleans return a vector of signed integers of the same lane width and lane number as in the input type.

(I also observe that since the start of this thread, WebAssembly SIMD has lost distinct boolean vector types.)

Yet, I don’t see stdsimd analogs for the all() and any() methods that boolean vectors had in simd and that WebAssembly SIMD still has. What’s the plan for those operations in stdsimd?

(Arguably, those operations hide cost a bit, since they involve more instructions on ARMv7 than on x86/x86_64/AArch64. Still, they are the sort of operations that the library should provide instead of everyone having to figure them out individually, especially on ARMv7.)
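To illustrate the kind of thing I mean, here is a sketch of any()/all() over the result of a lane-wise byte comparison. It is not stdsimd’s API: `any_all_eq` is a made-up helper, and the x86_64 path assumes the SSE2 intrinsics as they are exposed under std::arch::x86_64 in current Rust. Other targets get a plain scalar fallback so the example compiles everywhere.

```rust
/// Compares two 16-byte vectors lane by lane and reduces the result to
/// `(any lane equal, all lanes equal)`.
#[cfg(target_arch = "x86_64")]
pub fn any_all_eq(a: &[u8; 16], b: &[u8; 16]) -> (bool, bool) {
    use std::arch::x86_64::{__m128i, _mm_cmpeq_epi8, _mm_loadu_si128, _mm_movemask_epi8};
    // SAFETY: SSE2 is part of the x86_64 baseline, both pointers reference
    // 16 readable bytes, and `_mm_loadu_si128` has no alignment requirement.
    let bits = unsafe {
        let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
        // Each equal lane becomes 0xFF, each unequal lane 0x00.
        let eq = _mm_cmpeq_epi8(va, vb);
        // Gather the top bit of every lane into the low 16 bits.
        _mm_movemask_epi8(eq)
    };
    (bits != 0, bits == 0xFFFF)
}

/// Scalar fallback with the same result on every other architecture.
#[cfg(not(target_arch = "x86_64"))]
pub fn any_all_eq(a: &[u8; 16], b: &[u8; 16]) -> (bool, bool) {
    let any = a.iter().zip(b.iter()).any(|(x, y)| x == y);
    let all = a.iter().zip(b.iter()).all(|(x, y)| x == y);
    (any, all)
}
```

On x86 the reduction is a single movemask plus a compare; ARMv7 has no direct equivalent, which is exactly why it would be nice for the library to own this.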