Getting explicit SIMD on stable Rust

@comex One difference between intrinsics and inline assembly is that while the optimizer understands intrinsics and can optimize them, it does currently not optimize inline assembly (it just uses the one provided). Inline assembly is, as a consequence, a worse solutiont than using the backend intrinsics.

@burntsushi Would it be possible to allow mixing crates that require nightly internally (but not on their API or some other constraints) with crates that require stable? That could be an alternative.

@stoklund I still think we need to provide a way to call into the intrinsics directly, not just an abstraction.

I think that whether std has something like std::arch::x86/x86_64/AArch64/... with intrinsics available in the different architectures is orthogonal to a std::simd module that provides a cross-platform abstraction over simd intrinsics. Both things are worth pursuing.

Note that SIMD are not the only types of intrinsics available. Another example of intrinsics would be the bitwise manipulation instruction sets (BMI1, BMI2, ABM, and TBM) or the AES encryption intrinsics, which are also architecture dependent, and also orthogonal to a cross-platform library for bitwise manipulation or for doing encryption. All of these are worth pursuing.

Still, there has been too little experimentation in Rust with building cross-platform libraries for exposing subsets of platform intrinsics. I think that this could be improved if these libraries could be written and used in stable rust. The features that would enable this are mainly:

  • exposing architecture dependent intrinsics in rust stable, probably through std::arch::_ or similar,
  • a stable way to branch on the architecture (scenarios, target_feature, …),
  • [repr(simd)] or similar.

Once we are there we might not need to stabilize the simd crate anymore, and if somebody wants to implement a different way of doing simd, they will be able to write their own crate for that which uses intrinsics directly.

To support run-time architecture dependent code we would need a lot of compiler support. The binary has to include code for a set of architectures, detect the type of architecture on initialization, and be able to modify itself at run-time. AFAIK OpenMP pragma simd does this but is not trivial.

2 Likes