Thanks to awesome work by the Portable SIMD project group, portable explicit SIMD programming is now available on nightly. This is an exciting opportunity to better handle the many kinds of SIMD work that autovectorization performs poorly for various reasons, including floating-point reductions, without tying oneself to a particular hardware instruction set.
One thing that is currently lost relative to autovectorization, however, is the ability to work around the hardware quirk of inefficient wide-vector emulation (AVX-512 on low-end Intel chips, 256-bit AVX on older AMD chips...) by deliberately using a narrower vector width at compile time.
For autovectorization, this is configured through LLVM's `prefer-vector-width` setting, which rustc does not currently expose (even though it could arguably be implemented as a non-binding optimizer hint that is a no-op on compiler backends that don't support it), but which LLVM automatically sets to a sensible value for the target CPU.
For explicit vectorization, however, we have the reverse problem: it would be desirable to query the preferred vector width for a given SIMD element type in order to match autovectorization performance.
This could be implemented as a `const fn preferred_simd_width<T>() -> Option<usize>` API that returns `None` on compiler backends that do not provide this information (or where support for it is not implemented).
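A minimal sketch of what such a query could fall back to today, using only stable `cfg!` checks on x86 target features (the function name and the width values are hypothetical illustrations, not an existing API):

```rust
/// Hypothetical fallback for the proposed query: guess a preferred
/// SIMD width in bytes from compile-time target features. A real
/// implementation would ask the compiler backend instead.
const fn preferred_simd_width_bytes() -> Option<usize> {
    if cfg!(target_feature = "avx512f") {
        Some(64)
    } else if cfg!(target_feature = "avx2") {
        Some(32)
    } else if cfg!(target_feature = "sse2") {
        Some(16)
    } else {
        // No information available for this target.
        None
    }
}
```

Note that `cfg!` expands to a plain boolean literal, so this compiles as a `const fn` on stable Rust; the backend-query part is the piece that does not exist yet.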
`UNICODE_VERSION` is constant for a given rustc version, which is fine. `preferred_simd_width()` would have to take a different value depending on the `-Ctarget-feature` and `-Ctarget-cpu` arguments given to rustc. If two crates are compiled with different arguments, they could not be safely linked together once `preferred_simd_width()` exists, as a constant function evaluating to two different values at different times would make it possible to violate type safety.
I'm thinking about generic binaries that are expected to run on a variety of CPUs, possibly with different SIMD capabilities, so `-C target-cpu` doesn't help. Yes, this is a harder problem -- in the worst case you may need multiple versions of the code.
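The "multiple versions" worst case can be sketched as compiling one width-generic kernel at several lane counts and choosing between them at runtime. The kernel, helper names, and width choices below are illustrative, not an established API:

```rust
/// Width-generic kernel: a stand-in for a `std::simd` kernel
/// instantiated at `LANES` lanes.
fn sum_width<const LANES: usize>(data: &[f32]) -> f32 {
    let mut acc = [0.0f32; LANES];
    let mut chunks = data.chunks_exact(LANES);
    for chunk in chunks.by_ref() {
        for (a, x) in acc.iter_mut().zip(chunk) {
            *a += x;
        }
    }
    // Fold the partial sums and handle the leftover tail.
    acc.iter().sum::<f32>() + chunks.remainder().iter().sum::<f32>()
}

#[cfg(target_arch = "x86_64")]
fn wide_vectors_available() -> bool {
    std::is_x86_feature_detected!("avx2")
}

#[cfg(not(target_arch = "x86_64"))]
fn wide_vectors_available() -> bool {
    false
}

/// Runtime dispatch between the compiled widths.
fn sum(data: &[f32]) -> f32 {
    if wide_vectors_available() {
        sum_width::<8>(data) // 256-bit path
    } else {
        sum_width::<4>(data) // 128-bit / portable fallback
    }
}
```

This is exactly the cost being discussed: both instantiations end up in the binary, and the dispatch check runs at every call unless hoisted.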
I think there is a place for `preferred_simd_width` in projects that cannot afford, or do not need, the costs of runtime CPU detection (compile-time code duplication, incompatibility with `no_std`, detection and dispatch overhead...).
Even in projects with runtime dispatch, a compile-time preferred SIMD width can act as a secondary source of information when runtime detection of the native SIMD width is unavailable for some reason.
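That fallback pattern might look like the following sketch; both helpers are hypothetical placeholders, with the runtime query stubbed out to stand for a platform where the information is unavailable:

```rust
/// Hypothetical runtime query for the native SIMD width in bytes.
/// Returns `None` here to model a platform where it is unavailable.
fn runtime_simd_width_bytes() -> Option<usize> {
    None
}

/// Compile-time guess from target features, used as the secondary
/// source of information. The width values are illustrative.
fn compile_time_simd_width_bytes() -> usize {
    if cfg!(target_feature = "avx2") {
        32
    } else {
        16
    }
}

/// Prefer the runtime answer, fall back to the compile-time one.
fn simd_width_bytes() -> usize {
    runtime_simd_width_bytes().unwrap_or_else(compile_time_simd_width_bytes)
}
```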
This is indeed a problem. It is, however, a problem that we already have: binaries compiled for different target CPUs cannot be safely linked together because the ABI for passing SIMD types by value changes depending on available SIMD instruction sets.