Getting explicit SIMD on stable Rust

Actually, it looks like that's no longer a problem. Per the LLVM LangRef entry for the getelementptr instruction:

When indexing into an array, pointer or vector, integers of any width are allowed, and they are not required to be constant. [...] The first type indexed into must be a pointer value, subsequent types can be arrays, vectors, and structs.

Nightly will even generate such a getelementptr if you, for example, write to a SIMD type through a borrow transmuted to &mut [f32; 4]:

  %8 = tail call float @llvm.fma.f32(float %5, float %6, float %7) #5
  %9 = getelementptr inbounds <4 x float>, <4 x float>* %r, i64 0, i64 0
  store float %8, float* %9, align 16
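
For anyone who wants to try the pattern, here is a minimal sketch of the kind of source that writes one lane through an array view. Everything in it is an assumption on my part: `F32x4`, `as_array_mut` and `fma_into_lane0` are hypothetical names, the struct is a plain `#[repr(C)]` stand-in rather than a real `#[repr(simd)]` type (so it builds on stable), and it uses a pointer cast instead of a literal transmute. I have not checked that it produces the same IR as above.

```rust
/// Hypothetical stand-in for a 4-lane SIMD type. A real experiment would
/// presumably use a nightly-only `#[repr(simd)]` struct instead.
#[repr(C, align(16))]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F32x4(pub f32, pub f32, pub f32, pub f32);

/// View the four lanes as a mutable array.
pub fn as_array_mut(v: &mut F32x4) -> &mut [f32; 4] {
    // SAFETY: `F32x4` is `repr(C)` with four `f32` fields and no padding, so
    // its size and field offsets match `[f32; 4]`, and its alignment (16) is
    // at least the array's alignment (4).
    unsafe { &mut *(v as *mut F32x4 as *mut [f32; 4]) }
}

/// Store a fused multiply-add result (a * b + c) into lane 0 only.
pub fn fma_into_lane0(r: &mut F32x4, a: f32, b: f32, c: f32) {
    as_array_mut(r)[0] = a.mul_add(b, c);
}
```

Calling `fma_into_lane0(&mut r, 2.0, 3.0, 4.0)` overwrites only the first lane of `r` and leaves the other three untouched.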

I noticed this while experimenting further with what happens if I try to write the intrinsics in Rust. It turns out that although "You can use llvm.fma on any floating point or vector of floating point type", the optimizer doesn't currently manage to turn a per-lane extract, fma, insert sequence into a single @llvm.fma.v4f32 call.

Rust and the resulting LLVM IR for the fma experiment:
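// `Simd4x` is not defined in this snippet; given the `<4 x float>` signature
// in the IR, it is presumably a 4-lane `#[repr(simd)]` tuple struct.
#[no_mangle]
pub fn _mm_fmadd_ps(a: Simd4x<f32>, b: Simd4x<f32>, c: Simd4x<f32>) -> Simd4x<f32> {
    // One scalar fused multiply-add (a * b + c) per lane.
    Simd4x(
        a.0.mul_add(b.0, c.0),
        a.1.mul_add(b.1, c.1),
        a.2.mul_add(b.2, c.2),
        a.3.mul_add(b.3, c.3),
    )
}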

; Function Attrs: nounwind readnone uwtable
define <4 x float> @_mm_fmadd_ps(<4 x float>, <4 x float>, <4 x float>) unnamed_addr #0 {
entry-block:
  %a.0.vec.extract = extractelement <4 x float> %0, i32 0
  %b.0.vec.extract = extractelement <4 x float> %1, i32 0
  %c.0.vec.extract = extractelement <4 x float> %2, i32 0
  %3 = tail call float @llvm.fma.f32(float %a.0.vec.extract, float %b.0.vec.extract, float %c.0.vec.extract) #5
  %a.4.vec.extract = extractelement <4 x float> %0, i32 1
  %b.4.vec.extract = extractelement <4 x float> %1, i32 1
  %c.4.vec.extract = extractelement <4 x float> %2, i32 1
  %4 = tail call float @llvm.fma.f32(float %a.4.vec.extract, float %b.4.vec.extract, float %c.4.vec.extract) #5
  %a.8.vec.extract = extractelement <4 x float> %0, i32 2
  %b.8.vec.extract = extractelement <4 x float> %1, i32 2
  %c.8.vec.extract = extractelement <4 x float> %2, i32 2
  %5 = tail call float @llvm.fma.f32(float %a.8.vec.extract, float %b.8.vec.extract, float %c.8.vec.extract) #5
  %a.12.vec.extract = extractelement <4 x float> %0, i32 3
  %b.12.vec.extract = extractelement <4 x float> %1, i32 3
  %c.12.vec.extract = extractelement <4 x float> %2, i32 3
  %6 = tail call float @llvm.fma.f32(float %a.12.vec.extract, float %b.12.vec.extract, float %c.12.vec.extract) #5
  %_0.0.vec.insert = insertelement <4 x float> undef, float %3, i32 0
  %_0.4.vec.insert = insertelement <4 x float> %_0.0.vec.insert, float %4, i32 1
  %_0.8.vec.insert = insertelement <4 x float> %_0.4.vec.insert, float %5, i32 2
  %_0.12.vec.insert = insertelement <4 x float> %_0.8.vec.insert, float %6, i32 3
  ret <4 x float> %_0.12.vec.insert
}
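
If you want to reproduce the lane-wise behaviour without nightly, a rough stable-Rust sketch follows. It is only an approximation of the experiment: this `Simd4x` is a hypothetical plain tuple struct (not `#[repr(simd)]`), `fmadd_ps` is a made-up name, and the compiler will not treat the struct as a `<4 x float>` value, so expect different IR. It is useful mainly for checking the per-lane results that a single vector fma would have to match.

```rust
/// Hypothetical plain-struct stand-in for the post's `Simd4x`; the real type
/// is presumably `#[repr(simd)]`, which needs nightly and is not used here.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Simd4x<T>(pub T, pub T, pub T, pub T);

/// Lane-wise fused multiply-add: each lane computes a * b + c with a single
/// rounding step, via `f32::mul_add`.
pub fn fmadd_ps(a: Simd4x<f32>, b: Simd4x<f32>, c: Simd4x<f32>) -> Simd4x<f32> {
    Simd4x(
        a.0.mul_add(b.0, c.0),
        a.1.mul_add(b.1, c.1),
        a.2.mul_add(b.2, c.2),
        a.3.mul_add(b.3, c.3),
    )
}
```

For example, with a = (1, 2, 3, 4), b = (5, 6, 7, 8) and c = (0.5, 0.5, 0.5, 0.5), the lanes come out as (5.5, 12.5, 21.5, 32.5).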