Getting explicit SIMD on stable Rust

We seem to be converging on a consensus, so I’ve fleshed out my straw man proposal and added rationales. The proposal is divided into steps that reflect my sense of priority.

Step 1: Common opaque SIMD types

Define a set of opaque SIMD types in std::simd for the commonly supported vector types:

64-bit vectors:

  • i32x2, u32x2, f32x2,
  • i16x4, u16x4,
  • i8x8, u8x8.

128-bit vectors:

  • i64x2, u64x2, f64x2,
  • i32x4, u32x4, f32x4,
  • i16x8, u16x8,
  • i8x16, u8x16.

256-bit vectors:

  • i64x4, u64x4, f64x4,
  • i32x8, u32x8, f32x8,
  • i16x16, u16x16,
  • i8x32, u8x32.

These types are opaque except for the following traits:

  • Copy and Clone.
  • Default. The default vector is the all-zeros bit pattern.
  • From<[T; N]> and Into<[T; N]>, where T is the lane type and N is the number of lanes, so f32x4 implements From<[f32; 4]>, for example.
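
To make the proposed surface concrete, here is a sketch of one type using a stand-in newtype (the real `std::simd::f32x4` would be opaque and compiler-provided; this mock only illustrates which trait impls exist):

```rust
// Stand-in for the proposed opaque std::simd::f32x4; the only public
// surface is Copy, Clone, Default, From<[T; N]>, and Into<[T; N]>.
#[allow(non_camel_case_types)]
#[derive(Copy, Clone, Default, Debug, PartialEq)]
struct f32x4([f32; 4]);

impl From<[f32; 4]> for f32x4 {
    fn from(lanes: [f32; 4]) -> Self {
        f32x4(lanes)
    }
}

impl From<f32x4> for [f32; 4] {
    fn from(v: f32x4) -> Self {
        v.0
    }
}

fn main() {
    // Construction goes through the trait impls only: no inherent methods.
    let v = f32x4::from([1.0, 2.0, 3.0, 4.0]);
    let d = f32x4::default(); // the all-zeros bit pattern
    let lanes: [f32; 4] = v.into();
    println!("{:?} {:?} {:?}", v, d, lanes);
}
```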

The SIMD types differ from a user-defined newtype like struct Foo([T;N]) in the following ways:

Alignment

The alignment of SIMD types can be specified by the target ABI, but not all SIMD vector sizes are supported for all targets.

  1. If the target ABI specifies the alignment of a SIMD type, use that. Otherwise,
  2. If a smaller SIMD type with the same lane type exists, use the alignment of the smaller type. Otherwise,
  3. Use the alignment of the lane type.
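
The three fallback rules can be walked through with a worked example. The numbers below model an ARM-like target that supports 64-bit and 128-bit vectors but not 256-bit ones; the table values are hypothetical, not taken from any real ABI document:

```rust
// Hypothetical worked example of the three alignment rules, for f32 lanes
// on a target with 64-bit and 128-bit vectors but no 256-bit support.
fn alignment_of(ty: &str) -> usize {
    // Rule 1: the target ABI specifies alignments for supported vector sizes.
    let abi_specified = |t: &str| match t {
        "f32x2" => Some(8),  // 64-bit vector
        "f32x4" => Some(16), // 128-bit vector
        _ => None,           // 256-bit vectors are not in this target's ABI
    };
    if let Some(a) = abi_specified(ty) {
        return a;
    }
    // Rule 2: fall back to a smaller vector with the same lane type.
    if ty == "f32x8" {
        return alignment_of("f32x4");
    }
    // Rule 3: fall back to the alignment of the lane type itself.
    std::mem::align_of::<f32>()
}

fn main() {
    println!("f32x4: {}", alignment_of("f32x4")); // rule 1: ABI-specified
    println!("f32x8: {}", alignment_of("f32x8")); // rule 2: same as f32x4
}
```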

The alignment of a SIMD type that is not supported on a target today is subject to change if that target adds support for the type. For example, if ARM decides to add support for 256-bit SIMD, the alignment of the 256-bit types may have to change on that platform.

FFI

Some ABIs specify alternative behavior of SIMD types in function call parameters and return types. The SIMD types here behave like C SIMD types when used in FFI calls.

If SIMD types that are not covered by the ABI are used in FFI function calls, they behave the same way as a user-defined struct Foo([T;N]) newtype would.

Rationale

The goal here is not to provide general, ergonomic language support for SIMD programming. The goal is to:

  1. Establish standard, well-known names for the SIMD types that are used in practice in order to prevent per-vendor nominal types.
  2. Provide a minimal basis for the implementation of vendor intrinsics.
  3. Provide a minimal basis for portable SIMD programming.
  4. Be forwards compatible with future language support for SIMD types.

The intention is that when full language support for SIMD types is added, these names can be replaced with type aliases (whatever that future syntax may be):

type f32x4 = Simd<f32, 4>;
...

We should make sure today that such a substitution won’t break code tomorrow.

In the spirit of minimalism, I even removed constructors from these types, so they have no methods outside the trait impls. Constructors can be provided externally via the Default and From<[T;N]> implementations.

Step 2: Vendor intrinsics

Provide a complete mapping of the intrinsics in vendor header files like <arm_neon.h>, but using the standard SIMD types. All of these are exposed as functions that are guarded by target feature detection.

The exposed names of the intrinsics should match the vendor names so they can be searched for easily.

Intel integers

The mapping of vendor types to Rust SIMD types is trivial except for the Intel integer vector types. They will be mapped as follows:

  • __m128i becomes i64x2, i32x4, i16x8, or i8x16.
  • __m256i becomes i64x4, i32x8, i16x16, or i8x32.

For most intrinsics, there is only one obvious choice, which can be derived from Clang’s corresponding builtin signature. Some intrinsics will need to be provided in multiple per-type versions.
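
As an illustration, here is how two real Intel intrinsics that both take __m128i would pick up distinct typed signatures. The `_mm_add_epi32` and `_mm_add_epi16` names are Intel's; the newtypes and the scalar bodies are stand-ins, since a real implementation would lower to the PADDD/PADDW instructions:

```rust
// Stand-ins for the proposed standard SIMD types.
#[allow(non_camel_case_types)]
#[derive(Copy, Clone, Debug, PartialEq)]
struct i32x4([i32; 4]);
#[allow(non_camel_case_types)]
#[derive(Copy, Clone, Debug, PartialEq)]
struct i16x8([i16; 8]);

// __m128i _mm_add_epi32(__m128i, __m128i) maps to i32x4 (32-bit lanes).
fn _mm_add_epi32(a: i32x4, b: i32x4) -> i32x4 {
    let mut r = [0i32; 4];
    for i in 0..4 {
        r[i] = a.0[i].wrapping_add(b.0[i]);
    }
    i32x4(r)
}

// __m128i _mm_add_epi16(__m128i, __m128i) maps to i16x8 (16-bit lanes).
fn _mm_add_epi16(a: i16x8, b: i16x8) -> i16x8 {
    let mut r = [0i16; 8];
    for i in 0..8 {
        r[i] = a.0[i].wrapping_add(b.0[i]);
    }
    i16x8(r)
}

fn main() {
    let x = _mm_add_epi32(i32x4([1, 2, 3, 4]), i32x4([10, 20, 30, 40]));
    let y = _mm_add_epi16(i16x8([1; 8]), i16x8([2; 8]));
    println!("{:?} {:?}", x, y);
}
```

Mixing lane geometries now requires an explicit cast, which is the small type-safety win over Intel's single __m128i type.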

Rationale

Since we’re also adding portable SIMD arithmetic operations, many vendor intrinsics will be redundant. However, since there are thousands of vendor intrinsics, the redundant fraction is tiny. It is beneficial for somebody porting SIMD code written in C to be able to find everything in one place.

The mapping of the Intel integer types is a compromise that:

  • Avoids the creation of a separate nominal vendor-specific SIMD type like x86::__m128i.
  • Provides a small improvement in type safety over Intel’s approach by forcing explicit casts when switching lane geometry.
  • Avoids the duplication of a large number of intrinsic names into signed/unsigned variants.
  • Avoids the mistakes we would inevitably make if we attempted to manually pick the correct signed or unsigned types for some 3,000 intrinsics.

Step 3: Portable SIMD operations

Provide a basic set of SIMD operations that are available unconditionally on all target platforms.

For all SIMD types, implement:

  • BitAnd,
  • BitOr,
  • BitXor, and
  • Not.

For all integer SIMD types, add methods:

  • wrapping_neg(),
  • wrapping_add(),
  • wrapping_sub(), and
  • wrapping_mul().

For all floating point SIMD types, implement:

  • Neg,
  • Add,
  • Sub,
  • Mul, and
  • Div.

Add methods:

  • abs() and
  • sqrt().
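
Lane-wise, the proposed operations would behave like the following sketch on stand-in newtypes (in the proposal proper, these loops would instead compile to single vector instructions via LLVM):

```rust
use std::ops::Add;

// Stand-ins for the proposed standard SIMD types.
#[allow(non_camel_case_types)]
#[derive(Copy, Clone, Debug, PartialEq)]
struct f32x4([f32; 4]);
#[allow(non_camel_case_types)]
#[derive(Copy, Clone, Debug, PartialEq)]
struct i32x4([i32; 4]);

// Floating point types get the operator traits directly.
impl Add for f32x4 {
    type Output = f32x4;
    fn add(self, rhs: f32x4) -> f32x4 {
        let mut r = [0.0; 4];
        for i in 0..4 {
            r[i] = self.0[i] + rhs.0[i];
        }
        f32x4(r)
    }
}

impl i32x4 {
    // Integer arithmetic is wrapping: lane overflow is well-defined.
    fn wrapping_add(self, rhs: i32x4) -> i32x4 {
        let mut r = [0; 4];
        for i in 0..4 {
            r[i] = self.0[i].wrapping_add(rhs.0[i]);
        }
        i32x4(r)
    }
}

fn main() {
    let s = f32x4([1.0, 2.0, 3.0, 4.0]) + f32x4([0.5; 4]);
    let w = i32x4([i32::MAX, 0, 1, 2]).wrapping_add(i32x4([1; 4]));
    println!("{:?} {:?}", s, w); // the first lane of w wraps to i32::MIN
}
```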

Rationale

It would be possible, but very complicated, to implement portable SIMD operations in terms of the vendor intrinsics, which are basically a 1-1 mapping of the instruction sets. There are a lot of strange holes in the complicated availability matrix, and picking the right instructions is equivalent to writing a code generator. Rust already has a code generator that encodes all of that information: LLVM.

I omitted wrapping_div on purpose because it is not supported by any current architecture.

Step 4: Bitcasts

Provide methods which make it easy to reinterpret the bits in the lanes of a vector as a different type.

  • For floating point and unsigned integer SIMD types, add a method to_ibits() which produces a vector with the same lane geometry, but with signed integer lanes, so f32x4 -> i32x4, u8x16 -> i8x16, etc.
  • For floating point and signed integer SIMD types, add a method to_ubits() which produces a vector with the same lane geometry, but with unsigned integer lanes, so f32x4 -> u32x4, i8x16 -> u8x16, etc.
  • For integer SIMD types with 32-bit or 64-bit lanes, add a to_fbits() method which reinterprets the lanes as floating point. u32x4 -> f32x4, etc.
  • For all integer SIMD types T1 and T2 of the same size, implement From<T2> for T1.
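
A sketch of one of these bitcasts on a stand-in newtype; the proposed real version would be a free reinterpretation, and `f32::to_bits` produces the same result lane by lane:

```rust
// Stand-ins for the proposed standard SIMD types.
#[allow(non_camel_case_types)]
#[derive(Copy, Clone, Debug, PartialEq)]
struct f32x4([f32; 4]);
#[allow(non_camel_case_types)]
#[derive(Copy, Clone, Debug, PartialEq)]
struct u32x4([u32; 4]);

impl f32x4 {
    // Same lane geometry, unsigned integer lanes: f32x4 -> u32x4.
    fn to_ubits(self) -> u32x4 {
        let mut r = [0u32; 4];
        for i in 0..4 {
            r[i] = self.0[i].to_bits();
        }
        u32x4(r)
    }
}

fn main() {
    let v = f32x4([1.0, -1.0, 0.0, 2.0]).to_ubits();
    // 1.0f32 is 0x3f80_0000; the sign bit flips it to 0xbf80_0000.
    println!("{:x?}", v);
}
```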

Rationale

Bitcasts are much more common in SIMD programming than when using regular scalar variables. They should be easy to use.

Our compromise in mapping the Intel intrinsics requires some amount of signed/unsigned flipping.

By providing bitcasts that don’t change the number of lanes, we are able to preserve some of the benefits of type checking, since it is more common to change lane types than to change lane geometry.

Lane geometry changes also happen often enough that it makes sense to provide them with the From trait.
