This is one of the things I wish Rust had, because it could potentially make code quite a bit more performant:
FMV (function multi-versioning, as in GCC 6) is a technique to generate vectorized code for newer targets as well as fallback versions for machines that don’t support a certain instruction set. Picking a single instruction set is already possible with the compiler today: if I compile something with -C target-feature=+avx2, I get very good assembly. If I compile the same code with -C target-feature=+sse2 or with the default flags (no features specified), I get worse assembly. That’s simply because LLVM can see that it has guaranteed access to these instructions, so it uses them.
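To make that concrete, here is a toy example (the function and its name are made up, but the flags are the actual rustc spellings of what I mean):

```rust
// A trivial function that LLVM will happily auto-vectorize; how well it does
// depends entirely on which target features the compiler may assume.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}

// The same source produces different assembly depending on the flags:
//   rustc -O -C target-feature=+avx2 lib.rs   // may use 256-bit AVX2 instructions
//   rustc -O -C target-feature=+sse2 lib.rs   // limited to 128-bit SSE2
//   rustc -O lib.rs                           // only what the target's baseline allows
```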
The idea of FMV is to compile the same function multiple times, for whichever instruction sets you wish - e.g. once with no SIMD, once for SSE2+, once for AVX2+. At the callsites of these functions a small runtime check is inserted (probably via cpuid), or a global variable is set when the program starts that records which instruction sets are available. If the CPU supports AVX2, the AVX2 version is chosen; otherwise it falls back, ultimately to the no-SIMD version.
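For reference, here is roughly what I imagine the generated dispatch would look like if you wrote it by hand. This is only a sketch: it assumes an x86_64 target, uses std’s runtime feature detection and #[target_feature] (which need a recent compiler), does the check on every call instead of caching it, and the sum function is just a stand-in for real performance-critical code.

```rust
// Hand-written approximation of the dispatch FMV would generate (x86_64 only).

#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(data: &[f32]) -> f32 {
    // Inside this function LLVM may assume AVX2 and vectorize accordingly.
    data.iter().sum()
}

fn sum_fallback(data: &[f32]) -> f32 {
    data.iter().sum()
}

pub fn sum(data: &[f32]) -> f32 {
    // Runtime check (cpuid under the hood). A real implementation would do this
    // once, e.g. by caching a function pointer or setting a global at program
    // start, instead of checking on every call.
    if is_x86_feature_detected!("avx2") {
        // Safe because we just verified the CPU supports AVX2.
        unsafe { sum_avx2(data) }
    } else {
        sum_fallback(data)
    }
}
```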
Obviously, duplicating functions leads to code bloat, so it should be used sparingly. I know that Rust’s SIMD story isn’t stabilized yet, but I wanted to ask whether something like this would be possible for the Rust compiler and whether anyone has thought about it.
Something like this:
```rust
#[fmv(auto("+sse2", "+avx2"))]
fn auto_vectorized() {
    /* very performance critical code */
}

#[fmv(manual("+sse2"))]
fn manually_vectorized() { }
```
This can’t be done with a library; it would have to be built into the compiler. I think LLVM has limited support for FMV, but I don’t know for sure.
Another thing I wanted to ask: would it be possible to auto-vectorize iterators? For example, you could loop through elements in blocks of 4 or 8 with SIMD instead of one element at a time, as the default iterator does. I’m not sure if this is already done - a hand-rolled version of what I mean is sketched below.
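For comparison, here is what doing the blocking by hand might look like - only a sketch, with an arbitrary block size of 8 and chunks_exact from std (the function name is made up):

```rust
// Manual "iterate in blocks" version of a sum; the hope is that the compiler
// turns the fixed-size inner loop into SIMD instructions.
pub fn sum_blocked(data: &[f32]) -> f32 {
    let mut chunks = data.chunks_exact(8);
    let mut acc = [0.0f32; 8];

    // Process 8 elements per iteration; `acc` can live in a vector register.
    for chunk in &mut chunks {
        for i in 0..8 {
            acc[i] += chunk[i];
        }
    }

    // Handle the tail that didn't fill a whole block, then reduce.
    let tail: f32 = chunks.remainder().iter().sum();
    acc.iter().sum::<f32>() + tail
}
```

Note that for floats this changes the order of additions, which is one reason the compiler can’t do this transformation automatically for a plain sequential sum.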