Packed_simd: `cfg(target_feature)` does not play well with `#[target_feature]`


The packed_simd crate does not play well with #[target_feature]. Given:

extern crate packed_simd;
use packed_simd::m32x8;

#[target_feature(enable = "sse4.2")]
unsafe fn foo(m: m32x8) -> bool { m.all() }

#[target_feature(enable = "avx2")]
unsafe fn bar(m: m32x8) -> bool { m.all() }

the functions foo and bar are potentially compiled with different target features. Assume the binary is compiled for one of the x86_64 targets. SSE2 is then enabled globally, while foo extends that feature set with SSE4.2 and bar extends it with AVX2.

The method all on masks, like all methods in packed_simd, is #[inline]. The packed_simd crate is compiled with only SSE2 enabled, but because foo and bar extend the feature set that packed_simd was compiled with, the functions defined in packed_simd can be inlined into foo and bar, and LLVM will then generate code for them using SSE4.2 and AVX2 respectively.

Now comes the problem. The packed_simd crate has many workarounds for LLVM bugs.

All these workarounds are currently implemented using cfg(target_feature), like this:

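impl m32x8 {
    cfg_if! {
        if #[cfg(target_feature = "sse4.2")] {
            // selected only if SSE4.2 is enabled for the whole crate
            #[inline] fn all(self) -> bool { ... }
        } else if #[cfg(target_feature = "avx2")] {
            // selected only if AVX2 is enabled for the whole crate
            #[inline] fn all(self) -> bool { ... }
        } else {
            // fallback used when neither feature is enabled globally
            #[inline] fn all(self) -> bool { ... }
        }
    }
}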

and that shows the problem. Because the packed_simd crate was compiled with the global feature set, and neither SSE4.2 nor AVX2 was enabled globally, the fallback (worst) implementation of every workaround is selected, regardless of the features that the calling context, foo or bar, supports.

So that is the problem in practice. Here is a reduced version of it:

// interaction within a single function
// (macros won't solve packed_simd's problem)
#[target_feature(enable = "avx2")]
unsafe fn foo() -> bool {
    // returns false unless AVX2 is also enabled globally:
    if cfg!(target_feature = "avx2") { true } else { false }
}

// interaction across functions
#[target_feature(enable = "avx2")]
unsafe fn bar() -> bool { foo() }
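
The reduced case can be checked directly. The following is a minimal sketch, assuming an x86_64 host for the interesting branch; all function names are hypothetical and not part of packed_simd. It compares what cfg! reports inside an annotated function with what it reports in an ordinary function of the same crate.

```rust
/// What `cfg!` reports inside a function annotated with
/// `#[target_feature(enable = "avx2")]`. Hypothetical name, for illustration.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn avx2_seen_inside_annotated_fn() -> bool {
    // `cfg!` is expanded against the crate-wide feature set,
    // not against the attribute on the enclosing function.
    cfg!(target_feature = "avx2")
}

/// What `cfg!` reports in an ordinary function of the same crate.
pub fn avx2_enabled_globally() -> bool {
    cfg!(target_feature = "avx2")
}

/// Returns `None` when the annotated function cannot be called safely
/// (non-x86_64 target, or a CPU without AVX2).
pub fn avx2_seen_by_annotated_fn() -> Option<bool> {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just confirmed at run time.
            return Some(unsafe { avx2_seen_inside_annotated_fn() });
        }
    }
    None
}
```

Whenever avx2_seen_by_annotated_fn() returns Some, the value should match avx2_enabled_globally(): the attribute changes code generation for the function, but not the outcome of cfg!.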

How do we improve / fix this? I have no good answer.


I’m very curious if there’s an update to this. In experimenting, it seems that functions in packed_simd are getting inlined as hoped for. However, when I try to write my own code (work in progress) I seem to get the fallback implementation.

Did you find a workaround? Is the problem basically that using an LLVM intrinsic (such as llvm.round.v8f32) preserves the conditional compilation by target, but issuing SIMD instructions directly, as I’m trying to do with my SimdRounded, doesn’t?

Presumably this question is the subject for the documentation stub on inlining. I for one would be very interested!


See "Using run-time feature detection in core" for the most promising path forward, since we have this problem in libcore and libstd as well.
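
For readers unfamiliar with the idea, run-time dispatch can be sketched roughly as follows. This is only an illustration under stated assumptions: it relies on std's is_x86_feature_detected! macro (which is not available in core), it uses a plain [bool; 8] in place of a real mask type, and the function names are hypothetical rather than packed_simd's API.

```rust
/// Portable implementation. Being `#[inline]`, it can be inlined into a
/// caller that enables more target features than this crate was built with.
#[inline]
fn all_true_generic(mask: &[bool; 8]) -> bool {
    mask.iter().all(|&lane| lane)
}

/// Same logic, but compiled with AVX2 enabled for this function, so LLVM is
/// free to use AVX2 instructions for the inlined body.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn all_true_avx2(mask: &[bool; 8]) -> bool {
    all_true_generic(mask)
}

/// Safe entry point: picks an implementation at run time, not via `cfg`.
pub fn all_true(mask: &[bool; 8]) -> bool {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just confirmed at run time.
            return unsafe { all_true_avx2(mask) };
        }
    }
    all_true_generic(mask)
}
```

The choice is made by a run-time check, so it does not depend on the feature set the defining crate was compiled with. The cost is a branch per call unless the result is cached.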

> However, when I try to write my own code (work in progress) I seem to get the fallback implementation.

You are doing compile-time feature detection there, so as long as you compile with -C target-feature=+sse4.2 you should be getting the correct intrinsics.
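
A quick way to see which branch compile-time detection picked is sketched below. The function name is hypothetical and only reports the selected branch; it stands in for whatever implementation the cfg would normally choose.

```rust
/// Chosen when SSE4.2 is enabled for the whole compilation,
/// e.g. through `-C target-feature=+sse4.2`.
#[cfg(target_feature = "sse4.2")]
pub fn selected_impl() -> &'static str {
    "sse4.2"
}

/// Chosen otherwise, even if the calling function carries
/// `#[target_feature(enable = "sse4.2")]`.
#[cfg(not(target_feature = "sse4.2"))]
pub fn selected_impl() -> &'static str {
    "fallback"
}
```

Building with RUSTFLAGS="-C target-feature=+sse4.2" should select the first definition; a default x86_64 build, which only enables SSE2, should select the second.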

If you want to see round and the other math intrinsics in packed_simd, open an issue or send a PR. Chances are that the LLVM intrinsics are not going to be good enough for you anyway: LLVM will scalarize them, whereas packed_simd supports, up to a certain degree, vectorized libm libraries that are much faster (~8-10x).