I think it would help me most if we used concrete examples. It's really hard for me to understand otherwise.
For example, in Clang's emmintrin.h file, consider its definition of _mm_add_epi64:
static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm_add_epi64(__m128i __a, __m128i __b)
{
return (__m128i)((__v2du)__a + (__v2du)__b);
}
In Rust, I'm assuming we'd use simd_add here? (I guess in an ideal world, we'd impl Add on vector types, but that impl would use simd_add.)
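To make the "impl Add on top of simd_add" idea concrete, here is a minimal sketch on stable Rust. Everything in it is an assumption for illustration: U64x2 is a hypothetical array-backed stand-in for a #[repr(simd)] type, and simd_add_fallback is a hypothetical placeholder for the simd_add intrinsic, written as a lane-wise wrapping add (which matches what the paddq instruction does, but is not the real intrinsic).

```rust
use std::ops::Add;

/// Hypothetical stand-in for a 128-bit vector of two unsigned 64-bit lanes.
/// A real definition would presumably be `#[repr(simd)]`, which needs nightly.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct U64x2(pub [u64; 2]);

/// Hypothetical placeholder for the `simd_add` intrinsic:
/// adds lane by lane, wrapping on overflow.
pub fn simd_add_fallback(a: U64x2, b: U64x2) -> U64x2 {
    U64x2([
        a.0[0].wrapping_add(b.0[0]),
        a.0[1].wrapping_add(b.0[1]),
    ])
}

/// The operator impl simply forwards to the intrinsic-like helper.
impl Add for U64x2 {
    type Output = U64x2;

    fn add(self, rhs: U64x2) -> U64x2 {
        simd_add_fallback(self, rhs)
    }
}
```

With this in place, U64x2([1, u64::MAX]) + U64x2([2, 1]) evaluates lane by lane, and the second lane wraps around to 0 rather than panicking.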
The other bit I find interesting here is the use of a cast to the __v2du type. There are others:
/* from mmintrin.h */
typedef long long __m64 __attribute__((__vector_size__(8)));
typedef long long __v1di __attribute__((__vector_size__(8)));
typedef int __v2si __attribute__((__vector_size__(8)));
typedef short __v4hi __attribute__((__vector_size__(8)));
typedef char __v8qi __attribute__((__vector_size__(8)));
/* from xmmintrin.h */
typedef int __v4si __attribute__((__vector_size__(16)));
typedef float __v4sf __attribute__((__vector_size__(16)));
typedef float __m128 __attribute__((__vector_size__(16)));
/* Unsigned types */
typedef unsigned int __v4su __attribute__((__vector_size__(16)));
/* from emmintrin.h */
typedef double __m128d __attribute__((__vector_size__(16)));
typedef long long __m128i __attribute__((__vector_size__(16)));
/* Type defines. */
typedef double __v2df __attribute__ ((__vector_size__ (16)));
typedef long long __v2di __attribute__ ((__vector_size__ (16)));
typedef short __v8hi __attribute__((__vector_size__(16)));
typedef char __v16qi __attribute__((__vector_size__(16)));
/* Unsigned types */
typedef unsigned long long __v2du __attribute__ ((__vector_size__ (16)));
typedef unsigned short __v8hu __attribute__((__vector_size__(16)));
typedef unsigned char __v16qu __attribute__((__vector_size__(16)));
/* We need an explicitly signed variant for char. Note that this shouldn't
* appear in the interface though. */
typedef signed char __v16qs __attribute__((__vector_size__(16)));
I guess some of these types indicate the lane size to LLVM? Can we try to convert these to Rust types to make sure we're all on the same page? I'll take the first crack.
#[repr(simd)]
struct __m128(f32, f32, f32, f32);
#[repr(simd)]
struct __m128d(f64, f64);
#[repr(simd)]
struct __m128i(i64, i64);
#[repr(simd)]
struct __v2df(f64, f64);
#[repr(simd)]
struct __v2di(i64, i64);
#[repr(simd)]
struct __v8hi(i16, i16, i16, i16, i16, i16, i16, i16);
#[repr(simd)]
struct __v16qi(i8, i8, i8, i8, i8, i8, i8, i8, i8, i8, i8, i8, i8, i8, i8, i8);
#[repr(simd)]
struct __v2du(u64, u64);
#[repr(simd)]
struct __v8hu(u16, u16, u16, u16, u16, u16, u16, u16);
#[repr(simd)]
struct __v16qu(u8, u8, u8, u8, u8, u8, u8, u8, u8, u8, u8, u8, u8, u8, u8, u8);
#[repr(simd)]
struct __v16qs(i8, i8, i8, i8, i8, i8, i8, i8, i8, i8, i8, i8, i8, i8, i8, i8);
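One cheap sanity check on a translation like this is that each Rust type should occupy exactly as many bytes as the __vector_size__ of the C typedef it mirrors. A minimal sketch on stable Rust, assuming plain array newtypes as stand-ins for the #[repr(simd)] structs (those need nightly); it only checks sizes, not alignment or calling convention:

```rust
use std::mem::size_of;

// Array-backed stand-ins for a few of the 16-byte vector types.
// These are illustrative only; they are not real SIMD types.
#[allow(non_camel_case_types)]
#[repr(C)]
pub struct __m128(pub [f32; 4]);

#[allow(non_camel_case_types)]
#[repr(C)]
pub struct __m128d(pub [f64; 2]);

#[allow(non_camel_case_types)]
#[repr(C)]
pub struct __m128i(pub [i64; 2]);

#[allow(non_camel_case_types)]
#[repr(C)]
pub struct __v2di(pub [i64; 2]);

#[allow(non_camel_case_types)]
#[repr(C)]
pub struct __v8hi(pub [i16; 8]);

#[allow(non_camel_case_types)]
#[repr(C)]
pub struct __v16qi(pub [i8; 16]);

// Compile-time checks: lane count times lane width must be 16 bytes,
// matching __vector_size__(16) in the C headers.
const _: () = assert!(size_of::<__m128>() == 16);
const _: () = assert!(size_of::<__m128d>() == 16);
const _: () = assert!(size_of::<__m128i>() == 16);
const _: () = assert!(size_of::<__v2di>() == 16);
const _: () = assert!(size_of::<__v8hi>() == 16);
const _: () = assert!(size_of::<__v16qi>() == 16);
```

A definition with the wrong lane type (say, two i16 lanes for a type meant to hold two long long values) would fail to compile under a check like this.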
If I have this right, then I have some questions. To start:
- Why is there both a __m128d and a __v2df? They appear identical?
- Similarly, why is there both a __v16qi and a __v16qs?
- The implementation of _mm_add_epi64 in Clang asks for two __m128i values (which I've surmised to be signed), but internally, it casts them each to __v2du (which I've surmised to be unsigned). What is going on here? Is the interface this way simply because Intel didn't spec a __m128u type? How does one know to cast to unsigned or not? (I guess you need to look at the intrinsic name, e.g., _mm_adds_epi16.)
- Presumably, we'd export __m128, __m128d and __m128i, but not __v2di, __v8hu, etc.?
- Other things I'm missing?
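On the signed versus unsigned question, one possible reading (an assumption on my part, not something confirmed against Clang's sources) is that the cast to __v2du is there because signed overflow is undefined behavior in C while unsigned overflow wraps, and that for a plain wrapping add the resulting bits are the same either way. Signedness only changes the answer for operations like saturating add, which would be why the intrinsic name matters. A small scalar sketch of both cases, using only standard integer methods; the function names are hypothetical:

```rust
/// Add two i64 values by round-tripping through u64,
/// mirroring the cast to an unsigned vector type in the C code.
pub fn add_via_unsigned(a: i64, b: i64) -> i64 {
    (a as u64).wrapping_add(b as u64) as i64
}

/// Add two i64 values directly, with explicit wrapping on overflow.
pub fn add_signed_wrapping(a: i64, b: i64) -> i64 {
    a.wrapping_add(b)
}

/// Saturating add on i16, treating the bits as signed.
pub fn adds_signed(a: i16, b: i16) -> i16 {
    a.saturating_add(b)
}

/// Saturating add on the same bits, but treating them as unsigned.
pub fn adds_unsigned_bits(a: i16, b: i16) -> i16 {
    (a as u16).saturating_add(b as u16) as i16
}
```

For wrapping addition the two routes agree on every input, for example add_via_unsigned(i64::MAX, 1) and add_signed_wrapping(i64::MAX, 1) both give i64::MIN. For saturating addition they do not: adds_signed(i16::MAX, 1) stays at i16::MAX, while adds_unsigned_bits(i16::MAX, 1) gives i16::MIN because 32768 fits in a u16.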