Getting explicit SIMD on stable Rust

There are a lot of details to work out. Suppose you’re writing impl Mul for i32x4. It would go something like this:

  • If MIPS MSA is available, use __msa_mulv_w().
  • If ARM NEON is available, use vmulq_s32().
  • If SSE 4.1 is available, use _mm_mullo_epi32().
  • If SSE2 is available, try to cobble something together out of 16-bit multiplications using _mm_mulhi_epi16() and _mm_mullo_epi16(), or maybe _mm_mul_epu32() combined with some shuffling.
  • Otherwise, expand into a lane-wise scalar multiplication.

In particular, trying to construct an i32x4 multiplication out of existing SSE2 intrinsics requires some work and knowledge. Work and knowledge that has already been put into LLVM.
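To make the list above concrete, here is a rough sketch of what the hand-written version might look like on stable Rust with std::arch. The i32x4 struct and the mul_lanes helper are hypothetical names made up for this example, and the dispatch is done at compile time with cfg, which is only one possible design (runtime feature detection is another). The MIPS MSA branch is left out because, as far as I know, the MIPS intrinsics are not available on stable. The SSE2 branch uses the _mm_mul_epu32() plus shuffling approach.

```rust
use std::ops::Mul;

/// Hypothetical four-lane i32 vector, stored as a plain array for this sketch.
#[allow(non_camel_case_types)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct i32x4(pub [i32; 4]);

// SSE 4.1: one instruction does the whole job.
#[cfg(all(target_arch = "x86_64", target_feature = "sse4.1"))]
fn mul_lanes(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    use std::arch::x86_64::*;
    let mut out = [0i32; 4];
    // SAFETY: only compiled when SSE 4.1 is enabled; the unaligned
    // load/store touch exactly 16 bytes of valid arrays.
    unsafe {
        let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, _mm_mullo_epi32(va, vb));
    }
    out
}

// SSE2 only: two 32x32->64 multiplies plus shuffling.
#[cfg(all(
    target_arch = "x86_64",
    target_feature = "sse2",
    not(target_feature = "sse4.1")
))]
fn mul_lanes(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    use std::arch::x86_64::*;
    let mut out = [0i32; 4];
    // SAFETY: only compiled when SSE2 is enabled; the unaligned
    // load/store touch exactly 16 bytes of valid arrays.
    unsafe {
        let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
        // 64-bit products of lanes 0 and 2.
        let even = _mm_mul_epu32(va, vb);
        // Shift both inputs down by one lane, giving products of lanes 1 and 3.
        let odd = _mm_mul_epu32(_mm_srli_si128::<4>(va), _mm_srli_si128::<4>(vb));
        // Gather the low 32 bits of each product into lanes 0 and 1.
        let even = _mm_shuffle_epi32::<0b00_00_10_00>(even);
        let odd = _mm_shuffle_epi32::<0b00_00_10_00>(odd);
        // Interleave back into lane order 0, 1, 2, 3.
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, _mm_unpacklo_epi32(even, odd));
    }
    out
}

// ARM NEON on AArch64.
#[cfg(all(target_arch = "aarch64", target_feature = "neon"))]
fn mul_lanes(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    use std::arch::aarch64::*;
    let mut out = [0i32; 4];
    // SAFETY: only compiled when NEON is enabled; the pointers refer to
    // four valid i32 values each.
    unsafe {
        let prod = vmulq_s32(vld1q_s32(a.as_ptr()), vld1q_s32(b.as_ptr()));
        vst1q_s32(out.as_mut_ptr(), prod);
    }
    out
}

// Everything else: lane-wise scalar multiplication, wrapping like the
// SIMD instructions do.
#[cfg(not(any(
    all(target_arch = "x86_64", target_feature = "sse2"),
    all(target_arch = "aarch64", target_feature = "neon")
)))]
fn mul_lanes(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    std::array::from_fn(|i| a[i].wrapping_mul(b[i]))
}

impl Mul for i32x4 {
    type Output = i32x4;

    fn mul(self, rhs: i32x4) -> i32x4 {
        i32x4(mul_lanes(self.0, rhs.0))
    }
}
```

With this in place, i32x4([1, 2, 3, 4]) * i32x4([5, 6, 7, 8]) should give the same lanes on every branch, which is exactly the property you would have to unit test on each target.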

This is just one operation for one type. There are about 150 of those to go through. Then you would need to write individual unit tests for all of them, since you’re guaranteed to have picked the wrong intrinsic by mistake at least once. Then find a MIPS machine to run your unit tests on. No, not that one. One with SIMD instructions available.
