Getting explicit SIMD on stable Rust

Thanks for the pointer! There's so much good stuff in that thread :trophy:

I read three requirements from the types section of RFC 1199:

  1. Primitive type, repeated 1<<N times
  2. No padding
  3. Appropriate alignment

"Something with the same layout but different type-level attributes" (3) makes me think newtype. And according to the repr(Rust) section of the Nomicon, existing Rust arrays meet (1) & (2):

However with the exception of arrays (which are densely packed and in-order), the layout of data is not by default specified in Rust.

Combining those on stable, with nothing special at all, even seems to compile to LLVM vector instructions:

pub struct Simd<T>(T);

#[no_mangle]
pub fn demo(a: &mut Simd<[i32; 4]>, b: &Simd<[i32; 4]>) {
    a.0[0] += b.0[0];
    a.0[1] += b.0[1];
    a.0[2] += b.0[2];
    a.0[3] += b.0[3];
}

produces

define void @demo(%"Simd<[i32; 4]>"* nocapture dereferenceable(16), %"Simd<[i32; 4]>"* noalias nocapture readonly dereferenceable(16)) unnamed_addr #0 {
entry-block:
  %2 = bitcast %"Simd<[i32; 4]>"* %1 to <4 x i32>*
  %3 = load <4 x i32>, <4 x i32>* %2, align 4
  %4 = bitcast %"Simd<[i32; 4]>"* %0 to <4 x i32>*
  %5 = load <4 x i32>, <4 x i32>* %4, align 4
  %6 = add <4 x i32> %5, %3
  %7 = bitcast %"Simd<[i32; 4]>"* %0 to <4 x i32>*
  store <4 x i32> %6, <4 x i32>* %7, align 4
  ret void
}

and then

demo:
	.cfi_startproc
	movdqu	(%rsi), %xmm0
	movdqu	(%rdi), %xmm1
	paddd	%xmm0, %xmm1
	movdqu	%xmm1, (%rdi)
	retq

(The alignment RFC isn't in nightly, right? I couldn't find any way to force those align 4s up to align 16s, i.e. 128-bit alignment, to see what would happen.)

So I'd tweak part 1 of stoklund's proposal to something like this:

  1. Add pub struct Simd<T>(T); to the library. Don't stabilize #[repr(simd)] for now, but use it internally to require T to be [ (i|u|f)N; 1 << M ] and to add the "appropriate" alignment to the monomorphized type. (It could also continue to do what it does today, for people opted-in in unstable.) Derive the obvious things, but otherwise add the absolute minimal set of trait implementations: I'm thinking just Index and IndexMut, plus maybe letting it coerce to a slice.
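To make that surface area concrete, here is a rough sketch of what the minimal set of impls might look like. It is only an illustration: it is written for the single concrete case Simd<[i32; 4]>, it uses Deref as a stand-in for "coerce to a slice" (my assumption about how that would be spelled), and it does nothing about alignment, which would need compiler support.

```rust
use std::ops::{Deref, Index, IndexMut};

// Sketch of the proposed wrapper. The derives are a guess at "the obvious
// things"; the alignment part of the proposal is NOT modelled here.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Simd<T>(T);

// Lane access by index, forwarding to the inner array.
impl Index<usize> for Simd<[i32; 4]> {
    type Output = i32;
    fn index(&self, lane: usize) -> &i32 {
        &self.0[lane]
    }
}

impl IndexMut<usize> for Simd<[i32; 4]> {
    fn index_mut(&mut self, lane: usize) -> &mut i32 {
        &mut self.0[lane]
    }
}

// Assumption: "coerce to a slice" is expressed as Deref<Target = [i32]>,
// so that &Simd<[i32; 4]> can be used where &[i32] is expected.
impl Deref for Simd<[i32; 4]> {
    type Target = [i32];
    fn deref(&self) -> &[i32] {
        &self.0
    }
}
```

With this, v[2] += 10 works on a mutable Simd<[i32; 4]>, and a &Simd<[i32; 4]> can be passed anywhere a &[i32] is wanted. Note there is deliberately no arithmetic here.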

Miscellaneous thoughts and justifications:

  • I kept trying to come up with a bikeshed syntax for what a new primitive type would look like for this. Given the similarity, I figured it should be close to arrays, but everything seemed either awkward or likely to collide with associated constants, const fn, value generics, etc. Simd<[T; N]> is surprisingly good.
  • Anyone who wants them can add type aliases for f32x4, i16x8, etc.
  • The layout is predictable enough that unsafe will give you reasonable conversions, so safe conversions are not needed for stabilization.
  • There's no std trait for wrapping_add, so crates can write some trait you import to get it. Not implementing Add is intentional, as it should panic for overflow like normal [ui]N, and it feels like that would generate fundamentally not-SIMD-like code. (It could also be implemented later if that statement turns out to be incorrect.)
  • Layout-compatible conversions (so I can use .r .g .b .a instead of [0] [1] [2] [3]) are something that can be figured out later. I also think they should be discussed more broadly than just SIMD types, since I see no reason I shouldn't always be allowed to convert my &[T;7] into &(T,T,T,T,T,T,T). (With generics I figure it's harder to allow &(A,B,C) to convert to anything.) And if tuples are just structs with implied field names, maybe I can opt-in to the same stuff with something like #[repr(array)] struct RGBA { ... } ...
  • This doesn't decide one way or another on whether intrinsics should be stabilized, or what their names should be. But it does say what their types should be.
  • I just realized I never provided anything to let you create such a thing. Maybe it should contain pub T? Describing broadcast as Simd([x; 4]) looks great, and even without alignment gives reasonable-seeming insertelement instructions in LLVM.
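A few of the points above (type aliases, a crate-provided wrapping_add, and broadcast via Simd([x; 4])) can be sketched together. Treat this as a sketch under assumptions rather than a proposal for exact names: the pub field is the "maybe" from the last bullet, and LanewiseWrappingAdd and splat are hypothetical names I made up for illustration.

```rust
// Sketch of the wrapper with the public field floated in the last bullet.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Simd<T>(pub T);

// The kind of alias anyone who wants one could add in their own crate.
#[allow(non_camel_case_types)]
pub type i32x4 = Simd<[i32; 4]>;

// Hypothetical crate-level trait; nothing with this name exists in std.
pub trait LanewiseWrappingAdd {
    fn wrapping_add(self, rhs: Self) -> Self;
}

impl LanewiseWrappingAdd for i32x4 {
    // Per-lane wrapping addition; never panics on overflow.
    fn wrapping_add(self, rhs: Self) -> Self {
        let (a, b) = (self.0, rhs.0);
        Simd([
            a[0].wrapping_add(b[0]),
            a[1].wrapping_add(b[1]),
            a[2].wrapping_add(b[2]),
            a[3].wrapping_add(b[3]),
        ])
    }
}

// Broadcast one value to every lane (hypothetical helper name).
pub fn splat(x: i32) -> i32x4 {
    Simd([x; 4])
}
```

A caller imports the trait and writes splat(i32::MAX).wrapping_add(Simd([1, 0, -1, 2])). Whether this lowers to a single vector add would need to be checked in the generated IR, the same way as the demo function earlier.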