Querying the preferred SIMD width

Thanks to the awesome work of the Portable SIMD project group, portable explicit SIMD programming is now available on nightly. This is an exciting opportunity to better handle SIMD workloads that autovectorization does not handle well for various reasons (floating-point reductions, for example), without tying oneself to a particular hardware instruction set.

One thing that is currently lost relative to autovectorization, however, is the ability to work around a hardware quirk: on chips that emulate wide vectors inefficiently (AVX-512 on low-end Intel chips, 256-bit AVX on older AMD chips...), it can pay off to deliberately choose a narrower vector width at compile time.

For autovectorization, this is configured through LLVM's prefer-vector-width setting. rustc does not currently expose it (even though it could arguably be implemented as a non-binding optimizer hint that is a no-op on compiler backends that don't support it), but LLVM automatically sets it to a sensible value for the target CPU.

For explicit vectorization, we have the reverse problem: it would be desirable to query the preferred vector width for a given SIMD element type in order to match autovectorization performance.

This could be implemented as a const fn preferred_simd_width<T>() -> Option<usize> API that returns None on compiler backends that do not provide this information (or where support for it is not implemented).

What do you think?

This reminds me of https://github.com/rust-lang/rfcs/pull/2545#issuecomment-661305546 -- it'd be a great place to have a numeric const where the actual value isn't semver-fixed.

We sorta already have that with UNICODE_VERSION in std::char. I guess this would be... significantly easier to accidentally depend on, though.

UNICODE_VERSION is constant for a given rustc version, which is fine. preferred_simd_width() would have to take a different value depending on the -Ctarget-feature and -Ctarget-cpu arguments given to rustc. If preferred_simd_width() existed, two crates compiled with different arguments could not safely be linked together, as otherwise a constant function would evaluate to two different values at different times, making it possible to violate type safety.

Isn't this something that should be checked at runtime? The CPU running the compiler is not necessarily the same as the CPU that will run the compiled program.

That's true, but there's -C target-cpu=_____ for that. I assumed this would use that, same as LLVM does.

(I'd also love a way to be able to write simd that's magically flexible to different sizes at runtime, but that's a way harder problem.)

I'm thinking about generic binaries that are expected to run on a variety of CPUs, possibly with different SIMD capabilities, so -C target-cpu doesn't help. Yes, this is a harder problem -- in the worst case you may need multiple versions of the code.

This is the topic of a separate proposal of mine, complementary to this one: "Dispatch to native vector width" (calebzulawski/multiversion issue #33 on GitHub).

I think there is a place for preferred_simd_width in projects that cannot afford or do not need the costs of runtime CPU detection (compile time code duplication, incompatibility with no_std, detection and dispatch overhead...).

Even in projects with runtime dispatch, a compile-time preferred SIMD width can also act as a secondary source of information if runtime native SIMD width detection is unavailable for some reason.

This is indeed a problem. It is, however, a problem that we already have: binaries compiled for different target CPUs cannot be safely linked together because the ABI for passing SIMD types by value changes depending on available SIMD instruction sets.

That only applies to the C ABI. The Rust ABI passes all vector types by reference precisely for this reason.
