One thing I think is important to bring up: both ARM SVE and the RISC-V V (Vector) extension (old slides, old video, related work, and a new presentation from a couple of days ago whose media is not up yet) are on the immediate horizon.
Their key distinguishing feature is that they do not have a fixed vector length: the same code is meant to run on implementations with different register widths.
In addition, there are three main categories of instructions that use SIMD registers:
- Iterated instructions
  - Parallel add/sub/and/or/xor/etc.
  - These are actually quite poorly served by fixed-width SIMD (see the RISC-V slides for a summary, or “related work” for detail). Reasons:
    - The boundary cases (lengths that are not a multiple of the register width) need separate handling
    - Code bloat from handling the different register widths of every generation
    - Source changes are required to handle new generations
    - Code written for new generations is not backwards compatible
    - The strip-mining loop is boilerplate, ripe for a zero-overhead abstraction
  - For these instructions, we might be far better served by intrinsic (or library) functions that take slices of primitive types and handle the strip-mining for the programmer.
  - Such functions would then also work on ARM SVE or RISC-V with the V extension.
- Permutative/combinatorial instructions
  - PSHUFB and friends; reductions
  - This is a category that is very SIMD-friendly, and does not generalize well to arbitrary width.
  - However, such instructions may see less use than “iterated instructions” outside of crypto/compression.
  - EDIT: According to someone who was present and watched the new RISC-V talk:
    - “I asked krste whether permute-heavy code like crypto and codecs fits into the model at all, he said that they had permutes but there wasn’t time to discuss. I also asked about reductions, response was ‘recursive halving or something’”
    - Editor’s note: If you have permutes, then I think they can be used to recover any reduction order under recursive halving.
- Scalar instructions with large bit-width
  - This covers cases like the AES or SHA acceleration instructions on x86.
  - Heavily architecture-specific and heavily purpose-specific.
  - Probably worth waiting a while before stabilizing these.
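To make the first category concrete, here is a minimal sketch of the kind of slice-taking library function suggested in the list. Everything named in it is hypothetical: `add_slices` and `LANES` are invented for illustration, and the fixed-size inner loop only stands in for one vector instruction. A real implementation would use target intrinsics, or let a variable-length ISA pick the strip size.

```rust
/// Hypothetical lane count, standing in for however many `u32` lanes the
/// target's vector registers hold. Not part of any real API.
const LANES: usize = 8;

/// Element-wise wrapping add of `a` and `b` into `out`.
/// Hypothetical library function: name and signature are illustrative only.
pub fn add_slices(a: &[u32], b: &[u32], out: &mut [u32]) {
    assert_eq!(a.len(), b.len());
    assert_eq!(a.len(), out.len());

    let mut a_chunks = a.chunks_exact(LANES);
    let mut b_chunks = b.chunks_exact(LANES);
    let mut out_chunks = out.chunks_exact_mut(LANES);

    // Strip-mined main loop: each iteration handles one full strip of
    // LANES elements, the part a single vector instruction would cover.
    for ((ca, cb), co) in (&mut a_chunks).zip(&mut b_chunks).zip(&mut out_chunks) {
        for i in 0..LANES {
            co[i] = ca[i].wrapping_add(cb[i]);
        }
    }

    // Boundary case: the tail that does not fill a whole strip.
    let ra = a_chunks.remainder();
    let rb = b_chunks.remainder();
    let ro = out_chunks.into_remainder();
    for ((x, y), o) in ra.iter().zip(rb).zip(ro.iter_mut()) {
        *o = x.wrapping_add(*y);
    }
}
```

The caller passes slices of any length and never sees the strip width. With 19 elements and `LANES = 8`, the function runs two full strips and a 3-element tail. Hiding the width behind the call is what would let the same call site be retargeted to a wider register generation, or to SVE or RISC-V V, without source changes.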
I think talking about “SIMD intrinsics” as a single unitary thing is a huge mistake: these categories may very well merit being handled differently.
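As a footnote to the editor’s note on reductions, here is a rough sketch of what I take “recursive halving” to mean, and of how a permute in front of it changes which elements get combined. This is my own reading, not something taken from the RISC-V talk. `reduce_halving` and `permute` are hypothetical helpers, and a `Vec` stands in for a vector register.

```rust
/// Recursive-halving reduction: at each step, combine the upper half of the
/// active lanes into the lower half, until a single lane remains.
/// Hypothetical helper; the lane count must be a power of two.
pub fn reduce_halving<T: Clone>(mut lanes: Vec<T>, op: impl Fn(&T, &T) -> T) -> T {
    assert!(lanes.len().is_power_of_two());
    while lanes.len() > 1 {
        let half = lanes.len() / 2;
        for i in 0..half {
            // Lane i is paired with lane i + half.
            lanes[i] = op(&lanes[i], &lanes[i + half]);
        }
        lanes.truncate(half);
    }
    lanes.pop().unwrap()
}

/// Gather-style permute: out[i] = src[idx[i]].
/// Hypothetical helper standing in for a hardware permute instruction.
pub fn permute<T: Clone>(src: &[T], idx: &[usize]) -> Vec<T> {
    idx.iter().map(|&i| src[i].clone()).collect()
}
```

Using string concatenation as the operator makes the pairing visible. Reducing the lanes `a, b, c, d` directly gives `((a+c)+(b+d))`, while permuting with indices `[0, 2, 1, 3]` first gives `((a+b)+(c+d))`. The tree stays balanced either way; the permute only chooses which elements meet at each level.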