Packed_simd: `cfg(target_feature)` does not play well with `#[target_feature]`

See "Using run-time feature detection in core" for the most promising path forward, since we have this problem in libcore and libstd as well.

However, when I try to write my own code (work in progress) I seem to get the fallback implementation.

You are doing compile-time feature detection there, so as long as you compile with `-C target-feature=+sse4.2` you should get the correct intrinsics.
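To make the distinction concrete, here is a minimal sketch using only `std`, not packed_simd itself. The function names are made up for illustration. It contrasts `cfg!(target_feature = ...)`, which only reflects the flags the crate was compiled with, with `is_x86_feature_detected!`, which asks the CPU at run time. It also shows that `cfg!` inside a `#[target_feature]` function still sees the crate-wide flags, which is the mismatch in the title.

```rust
// Compile-time detection: resolved when the crate is compiled, from the
// features enabled via `-C target-feature` / `-C target-cpu`.
fn compiled_with_sse42() -> bool {
    cfg!(target_feature = "sse4.2")
}

// Run-time detection: queries the CPU the binary is actually running on.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn cpu_has_sse42() -> bool {
    std::is_x86_feature_detected!("sse4.2")
}

#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn cpu_has_sse42() -> bool {
    false
}

// Hypothetical helper: reports what `cfg!` evaluates to inside a function
// annotated with `#[target_feature(enable = "sse4.2")]`.
// Returns `None` if the function cannot be called safely on this CPU.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn cfg_seen_inside_target_feature_fn() -> Option<bool> {
    #[target_feature(enable = "sse4.2")]
    unsafe fn body() -> bool {
        // Still the crate-wide answer: the attribute changes code
        // generation for this function, not what `cfg!` expands to.
        cfg!(target_feature = "sse4.2")
    }

    if std::is_x86_feature_detected!("sse4.2") {
        // Sound: the CPU was just checked for SSE4.2.
        Some(unsafe { body() })
    } else {
        None
    }
}

#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn cfg_seen_inside_target_feature_fn() -> Option<bool> {
    None
}

fn main() {
    println!("compile-time sse4.2: {}", compiled_with_sse42());
    println!("run-time sse4.2:     {}", cpu_has_sse42());
    println!(
        "cfg! inside #[target_feature] fn: {:?}",
        cfg_seen_inside_target_feature_fn()
    );
}
```

Built with a plain `rustc` or `cargo build`, the first line typically reports `false` on x86_64 even on a CPU that has SSE4.2; adding `-C target-feature=+sse4.2` (for example through `RUSTFLAGS`) flips it, and that is what selects the non-fallback path in code guarded by `cfg(target_feature)`.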

If you want to see `round` and the other math intrinsics in packed_simd, open an issue or send a PR. Chances are that the LLVM intrinsics are not going to be good enough for you anyway: LLVM is going to scalarize them, whereas packed_simd supports, up to a certain degree, vectorized libm libraries that are much faster (~8-10x).