I was playing with simd intrinsics today and I noticed the following behavior that was hard for me to debug.
If I have a function that uses x86 SIMD intrinsics (say, avx512bw) but isn't inside a call chain that declares the appropriate target_feature, those intrinsics are still emitted, but as out-of-line calls. So the code still works if I'm running on an avx512bw machine; it's just very slow.
I think I understand why you're doing it this way (the idea is that ultimately the whole function gets called by something which does have this target_feature, and then everything gets inlined). But even as a compiler hacker myself I was very confused, and I initially thought this was a bug in rust/llvm and that I needed to force it to inline harder.
It would be nice if rust warned me that I'm screwing this up.
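For anyone landing here with the same symptom, here is a minimal sketch of the two cases. It uses AVX2 as a stand-in for avx512bw so it builds on more toolchains and machines, and the names `add8_unannotated`, `add8_annotated` and `demo` are made up for illustration. Both functions compute the same result; the only difference is the attribute, which is what should let the intrinsics inline.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// No `#[target_feature]` here: per the report above, the intrinsics can end
/// up as out-of-line calls, since they cannot be inlined into a function
/// that does not enable the feature.
#[cfg(target_arch = "x86_64")]
unsafe fn add8_unannotated(a: &[i32; 8], b: &[i32; 8]) -> [i32; 8] {
    let mut out = [0i32; 8];
    unsafe {
        let va = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
        let vb = _mm256_loadu_si256(b.as_ptr() as *const __m256i);
        _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, _mm256_add_epi32(va, vb));
    }
    out
}

/// Same body, but the function itself enables AVX2, so the intrinsics are
/// eligible for inlining into it.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn add8_annotated(a: &[i32; 8], b: &[i32; 8]) -> [i32; 8] {
    let mut out = [0i32; 8];
    unsafe {
        let va = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
        let vb = _mm256_loadu_si256(b.as_ptr() as *const __m256i);
        _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, _mm256_add_epi32(va, vb));
    }
    out
}

/// Runs both variants if the CPU supports AVX2; returns `None` otherwise.
fn demo() -> Option<([i32; 8], [i32; 8])> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            let a = [1, 2, 3, 4, 5, 6, 7, 8];
            let b = [10; 8];
            // SAFETY: AVX2 support was checked at runtime just above.
            return Some(unsafe { (add8_unannotated(&a, &b), add8_annotated(&a, &b)) });
        }
    }
    None
}
```

Comparing the optimized assembly of the two functions is probably the quickest way to confirm whether the calls were inlined in a given build.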
Ideally, I don't think you should have to declare a target feature on the whole function in order to use an intrinsic in one branch of it. It seems reasonable for a function to, for instance, detect a CPU feature and then have a loop over calls to an intrinsic available with that CPU feature, with the intrinsic being inlined.
I don't know how feasible that is in the compiler. But it seems preferable to do that if possible.
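Until something like that exists, the workaround I'm aware of is to move the loop into its own helper that carries the attribute and dispatch to it after a runtime check. A rough sketch, again assuming AVX2 as a stand-in feature and with hypothetical names (`sum`, `sum_avx2`, `sum_scalar`):

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Portable fallback: wrapping sum of all elements.
fn sum_scalar(xs: &[i32]) -> i32 {
    xs.iter().fold(0i32, |acc, &x| acc.wrapping_add(x))
}

/// The whole loop lives in a function that enables AVX2, so the intrinsic
/// calls inside the loop can be inlined.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(xs: &[i32]) -> i32 {
    let chunks = xs.chunks_exact(8);
    let tail = chunks.remainder();
    let mut lanes = [0i32; 8];
    unsafe {
        let mut acc = _mm256_setzero_si256();
        for c in chunks {
            // Unaligned load of 8 lanes, then lane-wise wrapping add.
            acc = _mm256_add_epi32(acc, _mm256_loadu_si256(c.as_ptr() as *const __m256i));
        }
        _mm256_storeu_si256(lanes.as_mut_ptr() as *mut __m256i, acc);
    }
    // Fold the 8 partial sums and the leftover elements.
    sum_scalar(&lanes).wrapping_add(sum_scalar(tail))
}

/// Safe entry point: detect the feature once, then run the matching loop.
pub fn sum(xs: &[i32]) -> i32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was checked at runtime just above.
            return unsafe { sum_avx2(xs) };
        }
    }
    sum_scalar(xs)
}
```

The downside is exactly the one raised above: the branch body has to be split out into a separate `unsafe` function just to carry the attribute, rather than living inline next to the feature check.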
Yeah, the current handling of target features is disappointingly "not Rusty" (i.e. instead of proper compile-time checks we have unsafe everywhere). A long time ago I wrote the following proposal, but unfortunately the amount of development activity in this area is very small: