Thanks for the pointer! There's so much good stuff in that thread.
I read three requirements from the types section of RFC 1199:
- Primitive type, repeated 1<<N times
- No padding
- Appropriate alignment
"Something with the same layout but different type-level attributes" (3) makes me think newtype. And according to repr(Rust) in the nomicon, existing Rust arrays meet (1) & (2):
However with the exception of arrays (which are densely packed and in-order), the layout of data is not by default specified in Rust.
Combining those in stable, without any special anything, even seems to compile to the LLVM vector instructions:
pub struct Simd<T>(T);

#[no_mangle]
pub fn demo(a: &mut Simd<[i32; 4]>, b: &Simd<[i32; 4]>) {
    a.0[0] += b.0[0];
    a.0[1] += b.0[1];
    a.0[2] += b.0[2];
    a.0[3] += b.0[3];
}
produces
define void @demo(%"Simd<[i32; 4]>"* nocapture dereferenceable(16), %"Simd<[i32; 4]>"* noalias nocapture readonly dereferenceable(16)) unnamed_addr #0 {
entry-block:
  %2 = bitcast %"Simd<[i32; 4]>"* %1 to <4 x i32>*
  %3 = load <4 x i32>, <4 x i32>* %2, align 4
  %4 = bitcast %"Simd<[i32; 4]>"* %0 to <4 x i32>*
  %5 = load <4 x i32>, <4 x i32>* %4, align 4
  %6 = add <4 x i32> %5, %3
  %7 = bitcast %"Simd<[i32; 4]>"* %0 to <4 x i32>*
  store <4 x i32> %6, <4 x i32>* %7, align 4
  ret void
}
and then
demo:
    .cfi_startproc
    movdqu (%rsi), %xmm0
    movdqu (%rdi), %xmm1
    paddd %xmm0, %xmm1
    movdqu %xmm1, (%rdi)
    retq
(The alignment RFC isn't in nightly, right? I couldn't find any way to force those align 4s to align 128s to see what would happen.)
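For anyone who wants to check the layout side of this without reading IR, here is a minimal sketch. It assumes a toolchain where the alignment attribute exists as #[repr(align(N))], which was not available to me when writing the above; Aligned4 and layout_of are hypothetical names for illustration only. Note that the attribute counts bytes, so a 128-bit vector would be align(16). I have not checked what instructions the over-aligned version produces.

```rust
use std::mem::{align_of, size_of};

/// The newtype from the post. The field is `pub` here only to keep the
/// sketch warning-free; the layout question is the same either way.
pub struct Simd<T>(pub T);

/// Hypothetical over-aligned variant, not part of the proposal.
/// Assumes `#[repr(align(N))]` is available; N is in bytes.
#[repr(align(16))]
pub struct Aligned4(pub [i32; 4]);

/// Returns (size, alignment) of `T`, both in bytes.
pub fn layout_of<T>() -> (usize, usize) {
    (size_of::<T>(), align_of::<T>())
}
```

On x86-64 I would expect layout_of::<Simd<[i32; 4]>>() to match the bare array (same size, element alignment), and Aligned4 to keep the size while raising the alignment to 16.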
So I'd tweak part 1 of stoklund's proposal to something like this:
- Add pub struct Simd<T>(T); to the library. Don't stabilize #[repr(simd)] for now, but use it internally to require T to be [ (i|u|f)N; 1 << M ] and to add the "appropriate" alignment to the monomorphized type. (It could also continue to do what it does today, for people opted-in in unstable.) Derive the obvious things, but otherwise add the absolute minimal set of Trait implementations: I'm thinking just Index and IndexMut, plus maybe letting it coerce to a slice.
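To make the "minimal set" concrete, here is a rough sketch of what that surface could look like. It is written with const generics, which is an assumption on my part (they are not something the proposal depends on), and it does not enforce the primitive-element or power-of-two restrictions or the extra alignment, since those would need compiler support. The new constructor and as_slice method are hypothetical stand-ins, the latter for "coerce to a slice".

```rust
use std::ops::{Index, IndexMut};

/// The proposed wrapper, with "the obvious things" derived.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Simd<T>(T);

impl<T, const N: usize> Simd<[T; N]> {
    /// Hypothetical constructor, present only so the sketch is usable.
    pub fn new(lanes: [T; N]) -> Self {
        Simd(lanes)
    }

    /// Hypothetical stand-in for letting the type coerce to a slice.
    pub fn as_slice(&self) -> &[T] {
        &self.0
    }
}

impl<T, const N: usize> Index<usize> for Simd<[T; N]> {
    type Output = T;

    /// Lane access; panics on an out-of-range lane, like array indexing.
    fn index(&self, lane: usize) -> &T {
        &self.0[lane]
    }
}

impl<T, const N: usize> IndexMut<usize> for Simd<[T; N]> {
    fn index_mut(&mut self, lane: usize) -> &mut T {
        &mut self.0[lane]
    }
}
```

With that, v[2] += 10 works on a Simd<[i32; 4]> without exposing the inner array, and everything else (arithmetic, conversions) is left to crates.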
Miscellaneous thoughts and justifications:
- I kept trying to come up with a bikeshed syntax for what a new primitive type would look like for this. Given the similarity, I figured it should be close to arrays, but everything seemed either awkward or likely to collide with associated constants, const fn, value generics, etc. Simd<[T; N]> is surprisingly good.
- Anyone who wants them can add type aliases for f32x4, i16x8, etc.
- The layout is predictable enough that unsafe will give you reasonable conversions, so safe conversions are not needed for stabilization.
- There's no std trait for wrapping_add, so crates can write some trait you import to get it. Not implementing Add is intentional, as it should panic for overflow like normal [ui]N, and it feels like that would generate fundamentally not-SIMD-like code. (It could also be implemented later if that statement turns out to be incorrect.)
- Layout-compatible conversions (so I can use .r .g .b .a instead of [0] [1] [2] [3]) are something that can be figured out later. I also think they should be discussed more broadly than just SIMD types, since I see no reason I shouldn't always be allowed to convert my &[T;7] into &(T,T,T,T,T,T,T). (With generics I figure it's harder to allow &(A,B,C) to convert to anything.) And if tuples are just structs with implied field names, maybe I can opt in to the same stuff with something like #[repr(array)] struct RGBA { ... }...
- This doesn't decide one way or another on whether intrinsics should be stabilized, or what their names should be. But it does say what their types should be.
- I just realized I never provided anything to let you create such a thing. Maybe it should contain pub T? Describing broadcast as Simd([x; 4]) looks great, and even without alignment gives reasonable-seeming insertelement instructions in LLVM.
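A small sketch of how a few of those points might fit together in user code, assuming the pub T variant. The WrappingAdd trait and splat function are hypothetical names a crate might choose, not anything in std, and the lane-by-lane loop is just the simplest thing that type-checks; I have not verified that this particular form vectorizes the way the demo above did.

```rust
/// Wrapper with a public field, per the "maybe it should contain pub T" idea.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Simd<T>(pub T);

/// User-defined alias; the library does not need to provide it.
#[allow(non_camel_case_types)]
pub type i32x4 = Simd<[i32; 4]>;

/// Hypothetical crate-level trait, since std has no trait for wrapping_add.
pub trait WrappingAdd {
    fn wrapping_add(self, rhs: Self) -> Self;
}

impl WrappingAdd for i32x4 {
    /// Lane-wise wrapping addition; never panics on overflow.
    fn wrapping_add(self, rhs: Self) -> Self {
        let mut out = self.0;
        for (o, r) in out.iter_mut().zip(rhs.0.iter()) {
            *o = i32::wrapping_add(*o, *r);
        }
        Simd(out)
    }
}

/// Broadcast one value to every lane, written as Simd([x; 4]).
pub fn splat(x: i32) -> i32x4 {
    Simd([x; 4])
}
```

Because the trait is opt-in, importing it is what makes a.wrapping_add(b) available on the vector type, which keeps Add free to be decided later.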