The packed_simd crate (https://github.com/rust-lang-nursery/packed_simd) does not play well with #[target_feature]. Given:
extern crate packed_simd;
use packed_simd::m32x8;
#[target_feature(enable = "sse4.2")]
unsafe fn foo(m: m32x8) -> bool { m.all() }
#[target_feature(enable = "avx2")]
unsafe fn bar(m: m32x8) -> bool { m.all() }
the functions foo and bar are potentially compiled with different target features. Assuming the binary is compiled for one of the x86_64 targets, SSE2 is enabled globally, while foo and bar extend SSE2 with SSE4.2 and AVX2, respectively.
The method all on masks, like all methods in packed_simd, is #[inline]. The packed_simd crate itself is compiled with only SSE2 enabled, but because foo and bar extend the feature set that packed_simd was compiled with, the functions defined in packed_simd can be inlined into foo and bar, and LLVM will then generate code for them using SSE4.2 and AVX2, respectively.
Now comes the problem. The packed_simd crate has many workarounds for LLVM bugs: https://github.com/rust-lang-nursery/packed_simd/tree/master/src/codegen
All these workarounds are currently implemented using cfg(target_feature), like this:
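impl m32x8 {
    cfg_if! {
        // avx2 is checked first: enabling avx2 also enables sse4.2,
        // so the opposite order would never reach the avx2 branch
        if #[cfg(target_feature = "avx2")] {
            #[inline] fn all(self) -> bool { ... }
        } else if #[cfg(target_feature = "sse4.2")] {
            #[inline] fn all(self) -> bool { ... }
        } else {
            #[inline] fn all(self) -> bool { ... }
        }
    }
}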
and that shows the problem. Because the packed_simd crate was compiled with the global feature set, and neither SSE4.2 nor AVX2 was enabled globally, the fallback (worst) implementation of every workaround is selected, regardless of the features that the calling contexts, foo and bar, support.
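To make the selection visible without pulling in packed_simd, here is a minimal sketch of the same pattern. The module simd_like and the functions selected_impl and caller_with_avx2 are hypothetical stand-ins, not part of packed_simd, and plain #[cfg] attributes are used instead of cfg_if! so that it builds without dependencies. The is_x86_feature_detected! guard is only there so the call is sound on CPUs without AVX2.

```rust
// Hypothetical stand-in for a packed_simd-style crate; all names are illustrative.
mod simd_like {
    // Exactly one of these three definitions survives cfg expansion,
    // decided by the features the *crate* is compiled with.
    #[cfg(target_feature = "avx2")]
    #[inline]
    pub fn selected_impl() -> &'static str { "avx2" }

    #[cfg(all(target_feature = "sse4.2", not(target_feature = "avx2")))]
    #[inline]
    pub fn selected_impl() -> &'static str { "sse4.2" }

    #[cfg(not(any(target_feature = "sse4.2", target_feature = "avx2")))]
    #[inline]
    pub fn selected_impl() -> &'static str { "fallback" }
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn caller_with_avx2() -> &'static str {
    // AVX2 is enabled for this function, but the choice above was already
    // made by #[cfg], so the caller's features do not influence it.
    simd_like::selected_impl()
}

fn main() {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Sound to call: the running CPU supports AVX2.
            let picked = unsafe { caller_with_avx2() };
            println!("caller with avx2 enabled got the {picked} implementation");
        }
    }
}
```

Built for a plain x86_64 target without extra -C target-feature flags, I would expect the caller to get the fallback implementation, which is exactly the behaviour described above.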
That is the problem in practice. Here is a reduced version of it:
// interaction within a single function
// (macros won't solve packed_simd's problem)
#[target_feature(enable = "avx2")]
unsafe fn foo() -> bool {
    // returns false unless avx2 is also enabled globally:
    if cfg!(target_feature = "avx2") { true } else { false }
}
// interaction across functions
#[target_feature(enable = "avx2")]
unsafe fn bar() -> bool { foo() }
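For anyone who wants to reproduce this, a self-contained variant of the reduced example might look like the sketch below. The names avx2_seen_by_cfg, avx2_enabled_globally and check are made up for illustration, and the is_x86_feature_detected! guard is an addition so that the target-feature function is only called on CPUs that actually support AVX2.

```rust
// Self-contained variant of the reduced example; all names are illustrative.

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn avx2_seen_by_cfg() -> bool {
    // cfg! is expanded against the crate-wide feature set,
    // not against the features enabled by the attribute above.
    cfg!(target_feature = "avx2")
}

/// What cfg! reports outside of any #[target_feature] function.
fn avx2_enabled_globally() -> bool {
    cfg!(target_feature = "avx2")
}

/// Returns (value seen inside the avx2 function, value seen globally),
/// or None when the function cannot be called soundly on this machine.
#[cfg(target_arch = "x86_64")]
fn check() -> Option<(bool, bool)> {
    if is_x86_feature_detected!("avx2") {
        // Sound to call: the running CPU supports AVX2.
        Some((unsafe { avx2_seen_by_cfg() }, avx2_enabled_globally()))
    } else {
        None
    }
}

#[cfg(not(target_arch = "x86_64"))]
fn check() -> Option<(bool, bool)> {
    None
}

fn main() {
    match check() {
        Some((inside, global)) => {
            println!("cfg! inside #[target_feature(enable = \"avx2\")]: {inside}");
            println!("cfg! at crate level: {global}");
        }
        None => println!("AVX2 not available here, nothing to compare"),
    }
}
```

The two values always agree, which is the point: the attribute changes what LLVM may generate for the function, but not what cfg! sees inside it.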
How do we improve / fix this? I have no good answer.