Getting explicit SIMD on stable Rust

scottmcm · December 1, 2016, 8:50am

Actually, it looks like that's no longer a problem. Per the LLVM LangRef entry for the getelementptr instruction:

When indexing into an array, pointer or vector, integers of any width are allowed, and they are not required to be constant. [...] The first type indexed into must be a pointer value, subsequent types can be arrays, vectors, and structs.
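The "not required to be constant" part matters for lane access. Below is a small sketch of reading a lane whose index is known only at run time, through an array view of a vector-like value. `F32x4` and `lane` are hypothetical names used only for illustration. The type is a plain `#[repr(C, align(16))]` struct, not a real `#[repr(simd)]` vector, and the IR rustc emits for it is not shown here.

```rust
/// Hypothetical stand-in for a 4-lane f32 SIMD type (not the type used in this post).
#[repr(C, align(16))]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F32x4(pub f32, pub f32, pub f32, pub f32);

/// Reads lane `i`, where `i` is only known at run time.
pub fn lane(v: &F32x4, i: usize) -> f32 {
    // SAFETY: repr(C) places the four f32 fields contiguously with no padding,
    // so F32x4 has the same size as [f32; 4], and its 16-byte alignment
    // satisfies the 4-byte alignment that [f32; 4] requires.
    let arr: &[f32; 4] = unsafe { &*(v as *const F32x4 as *const [f32; 4]) };
    // Runtime (non-constant) index; panics if i >= 4.
    arr[i]
}
```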

Nightly even generates such a getelementptr when you, for example, write to a SIMD type through a borrow transmuted to &mut [f32; 4]:

  %8 = tail call float @llvm.fma.f32(float %5, float %6, float %7) #5
  %9 = getelementptr inbounds <4 x float>, <4 x float>* %r, i64 0, i64 0
  store float %8, float* %9, align 16
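A minimal stable-Rust sketch of the same pattern, assuming a hypothetical 16-byte-aligned `F32x4` struct in place of a real SIMD type: compute a fused multiply-add with `f32::mul_add` and store it into lane 0 through a borrow transmuted to `&mut [f32; 4]`. This mirrors the shape of the IR above (an fma followed by a store to element 0); the IR rustc emits for this exact code is not shown here.

```rust
use std::mem::transmute;

/// Hypothetical 16-byte-aligned stand-in for a 4-lane f32 SIMD type.
#[repr(C, align(16))]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F32x4(pub f32, pub f32, pub f32, pub f32);

/// Writes the fused a * b + c into lane 0 of `r` through an array-typed borrow.
pub fn fma_into_lane0(r: &mut F32x4, a: f32, b: f32, c: f32) {
    // SAFETY: both types are 16 bytes with identical f32 layout (repr(C)),
    // and [f32; 4] needs only 4-byte alignment, which F32x4 exceeds.
    let arr: &mut [f32; 4] = unsafe { transmute::<&mut F32x4, &mut [f32; 4]>(r) };
    // Single rounding: a * b + c computed as one fused operation.
    arr[0] = a.mul_add(b, c);
}
```

Lanes 1 through 3 are left untouched; only element 0 is stored.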

I noticed this while experimenting further with writing intrinsics in Rust itself. It turns out that although the LangRef says "You can use llvm.fma on any floating point or vector of floating point type", LLVM doesn't currently manage to turn a per-lane extract/fma/insert sequence into a single @llvm.fma.v4f32 call.

Rust + LLVM IR for the fma experiment

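// Simd4x<T> is assumed to be a four-lane #[repr(simd)] tuple struct defined
// elsewhere (not shown); the IR below receives it as <4 x float>.
// Each lane is computed separately with f32::mul_add (@llvm.fma.f32).
#[no_mangle]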
pub fn _mm_fmadd_ps(a: Simd4x<f32>, b: Simd4x<f32>, c: Simd4x<f32>) -> Simd4x<f32> {
    Simd4x(
        a.0.mul_add(b.0, c.0),
        a.1.mul_add(b.1, c.1),
        a.2.mul_add(b.2, c.2),
        a.3.mul_add(b.3, c.3),
    )
}

; Function Attrs: nounwind readnone uwtable
define <4 x float> @_mm_fmadd_ps(<4 x float>, <4 x float>, <4 x float>) unnamed_addr #0 {
entry-block:
  %a.0.vec.extract = extractelement <4 x float> %0, i32 0
  %b.0.vec.extract = extractelement <4 x float> %1, i32 0
  %c.0.vec.extract = extractelement <4 x float> %2, i32 0
  %3 = tail call float @llvm.fma.f32(float %a.0.vec.extract, float %b.0.vec.extract, float %c.0.vec.extract) #5
  %a.4.vec.extract = extractelement <4 x float> %0, i32 1
  %b.4.vec.extract = extractelement <4 x float> %1, i32 1
  %c.4.vec.extract = extractelement <4 x float> %2, i32 1
  %4 = tail call float @llvm.fma.f32(float %a.4.vec.extract, float %b.4.vec.extract, float %c.4.vec.extract) #5
  %a.8.vec.extract = extractelement <4 x float> %0, i32 2
  %b.8.vec.extract = extractelement <4 x float> %1, i32 2
  %c.8.vec.extract = extractelement <4 x float> %2, i32 2
  %5 = tail call float @llvm.fma.f32(float %a.8.vec.extract, float %b.8.vec.extract, float %c.8.vec.extract) #5
  %a.12.vec.extract = extractelement <4 x float> %0, i32 3
  %b.12.vec.extract = extractelement <4 x float> %1, i32 3
  %c.12.vec.extract = extractelement <4 x float> %2, i32 3
  %6 = tail call float @llvm.fma.f32(float %a.12.vec.extract, float %b.12.vec.extract, float %c.12.vec.extract) #5
  %_0.0.vec.insert = insertelement <4 x float> undef, float %3, i32 0
  %_0.4.vec.insert = insertelement <4 x float> %_0.0.vec.insert, float %4, i32 1
  %_0.8.vec.insert = insertelement <4 x float> %_0.4.vec.insert, float %5, i32 2
  %_0.12.vec.insert = insertelement <4 x float> %_0.8.vec.insert, float %6, i32 3
  ret <4 x float> %_0.12.vec.insert
}
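For comparison, present-day stable Rust exposes the Intel intrinsic that the function above is named after, `std::arch::x86_64::_mm_fmadd_ps` (stabilized with `std::arch` in Rust 1.27, well after this post). The sketch below assumes x86_64 with runtime FMA detection: it calls the vector intrinsic when the CPU supports it and otherwise falls back to the same per-lane `mul_add` approach. `fmadd4`, `fmadd4_scalar`, and `fmadd4_fma` are hypothetical names, and plain arrays stand in for the post's `Simd4x` type.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{_mm_fmadd_ps, _mm_loadu_ps, _mm_storeu_ps};

/// Portable fallback: one fused multiply-add per lane.
pub fn fmadd4_scalar(a: [f32; 4], b: [f32; 4], c: [f32; 4]) -> [f32; 4] {
    std::array::from_fn(|i| a[i].mul_add(b[i], c[i]))
}

/// x86_64 path: a single 4-lane vector FMA.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "fma")]
unsafe fn fmadd4_fma(a: [f32; 4], b: [f32; 4], c: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    // SAFETY: the caller guarantees FMA support; the unaligned loads and store
    // each touch exactly four f32 values of valid local arrays.
    unsafe {
        let r = _mm_fmadd_ps(
            _mm_loadu_ps(a.as_ptr()),
            _mm_loadu_ps(b.as_ptr()),
            _mm_loadu_ps(c.as_ptr()),
        );
        _mm_storeu_ps(out.as_mut_ptr(), r);
    }
    out
}

/// Uses the vector FMA when detected at run time, else the scalar fallback.
pub fn fmadd4(a: [f32; 4], b: [f32; 4], c: [f32; 4]) -> [f32; 4] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("fma") {
            // SAFETY: FMA support was just detected on this CPU.
            return unsafe { fmadd4_fma(a, b, c) };
        }
    }
    fmadd4_scalar(a, b, c)
}
```

For example, `fmadd4([1.0, 2.0, 3.0, 4.0], [2.0; 4], [1.0; 4])` gives `[3.0, 5.0, 7.0, 9.0]` on either path, since both compute each lane as a fused operation with a single rounding.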
