(tags: intrinsics, safety, compile time error)
i've recently used rust's core::simd and core::arch modules for the first time, and i ran into quite a few walls.
one of them being the use of intrinsics that your "target" doesn't support: instead of getting compile time errors, as you might expect, you get undefined behavior.
as far as i'm aware, this is a solved problem in clang. so i'll describe the problem in rust and then propose clang's solution.
here is an example of an SSE3 intrinsic from core::arch::x86:
#[inline]
#[target_feature(enable = "sse3")]
pub unsafe fn _mm_hadd_ps(a: __m128, b: __m128) -> __m128 {
    haddps(a, b)
}
- as you can see, it is #[inline], not #[inline(always)], as you might expect. that's because #[inline(always)] and #[target_feature] are incompatible. the combination would implicitly make the caller #[target_feature] too, which is unsound.
- if you do happen to call this intrinsic from a non-sse3 fn, llvm generates a call to the above wrapper function. that's because the caller isn't sse3, so llvm isn't allowed to inline the wrapper.
- both of these mean that your code will work, if you run it on a compatible machine, but the performance will be terrible. that's the beauty of undefined behavior.
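to make the trap concrete, here is a minimal sketch of what this looks like in user code today. the names hadd_plain and hadd_guarded are made up for illustration, and the scalar fallback is my own addition so the snippet also builds on non-x86 targets. the point is that hadd_plain has neither #[target_feature] nor #[cfg], and it still compiles without any diagnostic:

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{_mm_hadd_ps, _mm_loadu_ps, _mm_storeu_ps};

/// hypothetical caller: no #[target_feature], no #[cfg].
/// this compiles without any error, even when the build does not enable sse3.
/// safety: the caller has to make sure the cpu actually supports sse3.
#[cfg(target_arch = "x86_64")]
unsafe fn hadd_plain(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    unsafe {
        // the enclosing fn is not sse3, so per the post this is expected to
        // stay an out-of-line call to the intrinsic wrapper
        let r = _mm_hadd_ps(_mm_loadu_ps(a.as_ptr()), _mm_loadu_ps(b.as_ptr()));
        _mm_storeu_ps(out.as_mut_ptr(), r);
    }
    out
}

/// hypothetical safe entry point: checks for sse3 at runtime before calling
/// the unchecked version, and falls back to scalar code otherwise.
pub fn hadd_guarded(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("sse3") {
            // sound only because of the runtime check above
            return unsafe { hadd_plain(a, b) };
        }
    }
    // scalar fallback with the same result layout as haddps
    [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]
}
```

nothing in hadd_plain tells the compiler (or the reader) that sse3 is required. the only thing keeping this sound is the runtime check in the wrapper, which the compiler does not verify.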
i found all of those things to be very unintuitive. (and the docs didn't sufficiently warn me about this behavior: the "safety" warning is in a section labeled "Overview")
so here's clang's solution: simply allow the #[inline(always)] + #[target_feature] combination, but require the caller to also have all target features enabled. and then make all intrinsics #[inline(always)] (and non-unsafe where applicable).
clang generates errors like this one, when using an incompatible intrinsic: "error: always_inline function '_mm256_add_ps' requires target feature 'avx', but would be inlined into function 'foo' that is compiled without support for 'avx'"
"require the caller to also have all target features enabled" can be achieved in 3 ways:
- the use of #[target_feature] in the user code, making the caller unsafe. but you're less likely to forget to compile with the correct settings, as you've had to make your fn #[target_feature].
  -- this effectively pushes the "unsafe to call" back a level into user code, which means dynamic feature detection is still possible.
  -- target_feature is only unsafe when not combined with inline-always (or the fn body is unsafe of course).
- the use of #[cfg] for conditional compilation. this makes most intrinsics safe to use!
- crate level target features. effectively equivalent to putting #[cfg] on everything.
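here is a rough sketch of how the first two options look in user code, written against today's rust (so the unsafe blocks are still required; under the proposal the #[cfg] variant could drop them). the names hadd_sse3 and hadd_dispatch are hypothetical, and the scalar fallback is an assumption of mine to keep the snippet portable:

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{_mm_hadd_ps, _mm_loadu_ps, _mm_storeu_ps};

/// option 1: the caller is #[target_feature] itself, so the intrinsic is
/// allowed to inline into it. the "unsafe to call" moves to this fn.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse3")]
unsafe fn hadd_sse3(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    unsafe {
        let r = _mm_hadd_ps(_mm_loadu_ps(a.as_ptr()), _mm_loadu_ps(b.as_ptr()));
        _mm_storeu_ps(out.as_mut_ptr(), r);
    }
    out
}

/// option 2: #[cfg] on a statically enabled feature (e.g. building with
/// -C target-feature=+sse3). this version only exists when sse3 is known
/// at compile time, so the call cannot hit an unsupported cpu.
#[cfg(all(target_arch = "x86_64", target_feature = "sse3"))]
pub fn hadd_dispatch(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    unsafe { hadd_sse3(a, b) }
}

/// otherwise: dynamic feature detection in front of option 1, plus a
/// scalar fallback for cpus (or architectures) without sse3.
#[cfg(not(all(target_arch = "x86_64", target_feature = "sse3")))]
pub fn hadd_dispatch(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("sse3") {
            return unsafe { hadd_sse3(a, b) };
        }
    }
    [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]
}
```

in both variants the sse3 requirement is now spelled out somewhere the compiler can see it, either as #[target_feature] on hadd_sse3 or as #[cfg(target_feature = "sse3")] on the caller, which is exactly what the proposed check would look for.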
doing it the clang way has several benefits, which massively improve intrinsics UX:
- using intrinsics, which are not statically known to be supported, becomes a compile time error. (or requires the user to explicitly label their fns as #[target_feature], making them aware that their code is unsafe to call on non-supported targets.)
- the safe_arch crate is no longer necessary for safe intrinsics code. (and in fact the clang solution is more general, because you can still use #[target_feature] for dynamic dispatch, which you can't do with safe_arch, and have the compiler check that you in fact used #[target_feature]).
- performance becomes more reliable, because code, that happens to work on your machine (the UB thing), but actually calls the intrinsics wrapper functions (with significant overhead), no longer compiles.