Querying the preferred SIMD width

Thanks to the awesome work of the Portable SIMD project group, portable explicit SIMD programming is now available on nightly. This is an exciting opportunity to better handle SIMD workloads that autovectorization does not handle well for various reasons (floating-point reductions, for example), without tying oneself to a particular hardware instruction set.

One thing that is currently lost relative to autovectorization, however, is the ability to work around a hardware quirk: on chips that emulate wide vectors inefficiently (AVX-512 on low-end Intel chips, 256-bit AVX on older AMD chips...), it can pay off to deliberately choose a narrower vector width at compile time.

For autovectorization, this is configured through LLVM's prefer-vector-width setting. rustc does not currently expose it (even though it could arguably be implemented as a non-binding optimizer hint that is a no-op on compiler backends that don't support it), but LLVM automatically sets it to a sensible value for the target CPU.

For explicit vectorization, we have the reverse problem: it would be desirable to query the preferred vector width for a given SIMD element type in order to match autovectorization performance.

This could be implemented as a const fn preferred_simd_width<T>() -> Option<usize> API that returns None on compiler backends that do not provide this information (or where support for it is not implemented).

What do you think?

This reminds me of https://github.com/rust-lang/rfcs/pull/2545#issuecomment-661305546 -- it'd be a great place to have a numeric const where the actual value isn't semver-fixed.

We sorta already have that with UNICODE_VERSION in std::char. I guess this would be... significantly easier to accidentally depend on, though.

UNICODE_VERSION is constant for a given rustc version, which is fine. preferred_simd_width() would have to take a different value depending on the -Ctarget-feature and -Ctarget-cpu arguments given to rustc. If preferred_simd_width() existed, two crates compiled with different arguments could not safely be linked together, as otherwise a constant function would evaluate to two different values at different times, making it possible to violate type safety.

Isn't this something that should be checked at runtime? The CPU running the compiler is not necessarily the same as the CPU that will run the compiled program.

That's true, but there's -C target-cpu=_____ for that. I assumed this would use that, same as LLVM does.

(I'd also love a way to be able to write simd that's magically flexible to different sizes at runtime, but that's a way harder problem.)

I'm thinking about generic binaries that are expected to run on a variety of CPUs, possibly with different SIMD capabilities, so -C target-cpu doesn't help. Yes, this is a harder problem -- in the worst case you may need multiple versions of the code.

This is the topic of a separate proposal of mine, complementary to this one: "Dispatch to native vector width" (calebzulawski/multiversion issue #33 on GitHub).

I think there is a place for preferred_simd_width in projects that cannot afford or do not need the costs of runtime CPU detection (compile time code duplication, incompatibility with no_std, detection and dispatch overhead...).

Even in projects with runtime dispatch, a compile-time preferred SIMD width can also act as a secondary source of information if runtime native SIMD width detection is unavailable for some reason.

This is indeed a problem. It is, however, a problem that we already have: binaries compiled for different target CPUs cannot be safely linked together because the ABI for passing SIMD types by value changes depending on available SIMD instruction sets.

That only applies to the C ABI. The Rust ABI passes all vector types by reference precisely for this reason.
