Last week, the libs team discussed SIMD stabilization. I’d like to write up some of the problems we discussed and some possible avenues to move forward on getting explicit use of SIMD on stable Rust. (Explicit use of SIMD means that the programmer takes some explicit action to vectorize their code, as opposed to relying on the compiler to vectorize it for them.)
Disclaimer: I personally have very little experience with SIMD, and my compiler backend knowledge is relatively limited. I have no doubt committed serious errors and omissions. I welcome fixes. I’m hoping the compiler team can chime in!
Prior work on this topic:
In the current state of the world, using explicit SIMD instructions (whether they are intrinsics exposed by LLVM directly or a convenient abstraction as defined in the `simd` crate) requires unstable Rust. There are a number of features required:
- `cfg_target_feature` - AFAIK, this feature permits instructing the compiler to actually emit SIMD instructions. For example, much of the `simd` crate uses `target_feature` for conditional compilation.
- `repr_simd` - This is used to annotate structs such that they can be used as parameters to SIMD intrinsics. There are some limitations on where `repr(simd)` can be used (for example, it can’t be used with generics?), but I don’t know the details here.
- `platform_intrinsics` - This makes various LLVM intrinsics available for use with an explicit `extern` block. (A sketch combining all three features follows this list.)
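To make that concrete, here is a rough sketch of how the three features fit together on today’s nightly compiler. The vector type and the `simd_add` intrinsic are just illustrative choices, not a complete program:

```rust
#![feature(cfg_target_feature, repr_simd, platform_intrinsics)]

// repr(simd) is what allows this struct to be passed to SIMD intrinsics.
#[repr(simd)]
#[derive(Clone, Copy, Debug)]
struct u32x4(u32, u32, u32, u32);

// platform_intrinsics: intrinsics are declared in a special extern block.
extern "platform-intrinsic" {
    fn simd_add<T>(x: T, y: T) -> T;
}

// cfg_target_feature: only compile this when the compiler has been told
// SSE2 is available (e.g., via -C target-feature=+sse2).
#[cfg(target_feature = "sse2")]
fn add(a: u32x4, b: u32x4) -> u32x4 {
    unsafe { simd_add(a, b) }
}
```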
If the above features were stabilized, then, for example, the `simd` crate could be made to work on stable Rust. With that said, the path to stabilizing them isn’t clear. There are numerous problems. I’ll try to outline them below:
`-C target-feature=foo` is hard to use
In today’s Rust, actually using `target-feature` is pretty inconvenient. I’ve at least been telling people to use `RUSTFLAGS`. For example, to compile `ripgrep` with SIMD support, one needs to do this:
RUSTFLAGS="-C target-feature=+ssse3" cargo build --release --features simd-accel
One could also use `target-cpu=native`, but the advantage of the above command is that binaries can be distributed to most x86_64 platforms (but not all).
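For comparison, the `target-cpu=native` variant of the same command looks like this (it enables the features of the machine doing the compiling, which is why the resulting binary may not run elsewhere):

RUSTFLAGS="-C target-cpu=native" cargo build --release --features simd-accel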
It is possible that this specific problem could be worked out with scenarios, but most folks will probably want to eschew this anyway in favor of runtime detection, which brings us to the next concern.
How does runtime detection using cpuid work?
The libs team didn’t quite seem to know how this would work. Here’s an example problem that I think should be solvable to help motivate this:
- I’d like to compile a single binary that works on all Linux x86_64 platforms.
- I’d like for that binary to make use of SIMD instructions such as those introduced in SSE 4.2 only if they are available. If they aren’t available, then the program should be capable of using a fallback implementation that doesn’t use SSE 4.2 instructions.
A key thing to note here is that the current system is subtly insufficient. In particular, while said binary might be capable of using SSE 4.2 instructions in places, the compiler probably shouldn’t be using any SSE 4.2 instructions for autovectorization optimizations, since that could preclude running on a platform without SSE 4.2! (N.B. I’m using SSE 4.2 just as an example here.)
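To make the shape of a solution concrete, here is a minimal sketch of the kind of runtime dispatch the above implies. The `cpu_supports_sse42` helper is a hypothetical placeholder for a real CPUID query; the point is that only the SSE 4.2 variant would be compiled with SSE 4.2 enabled, so everything else (including autovectorized code) stays runnable on older CPUs:

```rust
// Hypothetical placeholder: a real implementation would execute the CPUID
// instruction, test the SSE 4.2 feature bit and cache the result.
fn cpu_supports_sse42() -> bool {
    false
}

// In a real program, only this function would be compiled with SSE 4.2
// enabled; expressing that is exactly what's missing on stable Rust today.
fn count_matches_sse42(haystack: &[u8], needle: u8) -> usize {
    haystack.iter().filter(|&&b| b == needle).count()
}

// Portable fallback that must run on any x86_64 CPU.
fn count_matches_fallback(haystack: &[u8], needle: u8) -> usize {
    haystack.iter().filter(|&&b| b == needle).count()
}

// Single entry point: detection happens at runtime, not at compile time.
fn count_matches(haystack: &[u8], needle: u8) -> usize {
    if cpu_supports_sse42() {
        count_matches_sse42(haystack, needle)
    } else {
        count_matches_fallback(haystack, needle)
    }
}

fn main() {
    println!("{}", count_matches(b"abracadabra", b'a'));
}
```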
Intrinsics
It’s my understanding that there are thousands of intrinsics, and all of them follow LLVM-specific naming conventions. (What else is LLVM-specific?) Stabilizing these directly seems potentially ill-advised for a couple of important reasons:
- The API surface area is huge and platform dependent. If LLVM decided to change or remove one of these intrinsics, we would be beholden to that decision on the next LLVM upgrade, possibly sacrificing our stability story.
- If, one day, someone wanted to write a Rust compiler that didn’t use LLVM, would it be feasible for that compiler to provide exactly the same set of intrinsics as LLVM? (Probably not.)
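To give a flavor of how tightly these names are tied to LLVM, here is roughly what binding a single vendor intrinsic by its LLVM name looks like today (the intrinsic chosen is just an example, and building this requires nightly plus `-C target-feature=+sse4.2`):

```rust
#![feature(link_llvm_intrinsics)]

extern "C" {
    // The link_name is an LLVM-internal identifier; a Rust compiler with a
    // different backend would have no reason to recognize it.
    #[link_name = "llvm.x86.sse42.crc32.32.32"]
    fn crc32_32_32(crc: u32, v: u32) -> u32;
}

fn main() {
    // Unsafe, and assumes the CPU actually supports SSE 4.2.
    let crc = unsafe { crc32_32_32(!0, 0xdead_beef) };
    println!("{:08x}", crc);
}
```

There are declarations like this for each of the thousands of vendor intrinsics, which is what makes the surface area so large.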
An alternative to stabilizing intrinsics directly
One thing that has been tossed around is the ability to stabilize an abstraction around SIMD instructions without exposing the intrinsics directly. For example, we could, in theory, move the existing `simd` crate into `std` and stabilize that without tackling the problems with stabilizing intrinsics directly.
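For reference, the kind of abstraction in question looks roughly like this (the exact API is from memory, so treat it as approximate):

```rust
// Nightly-only today: fixed-width vector types with lane-wise operators,
// no raw intrinsics in sight.
extern crate simd;

use simd::f32x4;

fn main() {
    let a = f32x4::new(1.0, 2.0, 3.0, 4.0);
    let b = f32x4::splat(10.0);
    // Lane-wise addition via operator overloading.
    let c = a + b;
    println!("{:?}", c);
}
```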
My personal take on this is that we really need to provide a way to use intrinsics on stable Rust. There are so many of them that it would be a herculean task to build an abstraction around all of them that met everyone’s use cases. Moreover, my understanding of the current feel of things is that the `simd` crate’s abstraction is controversial pending potential future language changes (like integer generics?).
The libs team discussed this particular issue, and one possibility that came up was building a special `libstd-llvm` crate that would ship with stable Rust and provide access to LLVM’s intrinsics, with the caveat that it would exist outside of Rust’s stability story. How do folks feel about that?