How to make core::arch simd intrinsics safe:

(tags: intrinsics, safety, compile time error)

i've recently used rust's core::simd and core::arch modules for the first time. and i ran into quite a few walls.

one of them being the use of intrinsics that your "target" doesn't support: instead of getting compile time errors, as you might expect, you get undefined behavior.

as far as i'm aware, this is a solved problem in clang. so i'll describe the problem in rust and then propose clang's solution.

here is an example of an SSE3 intrinsic from core::arch::x86:

#[inline]
#[target_feature(enable = "sse3")]
pub unsafe fn _mm_hadd_ps(a: __m128, b: __m128) -> __m128 {
    haddps(a, b)
}
  • as you can see, it is #[inline], not #[inline(always)], as you might expect. that's because #[inline(always)] and #[target_feature] are incompatible. the combination would implicitly make the caller #[target_feature] too, which is unsound.
  • if you do happen to call this intrinsic from a non-sse3 fn, llvm generates a call to the above wrapper function. that's because the caller isn't sse3, so llvm isn't allowed to inline the wrapper.
  • both of these mean that your code will work, if you run it on a compatible machine, but the performance will be terrible. that's the beauty of undefined behavior.

i found all of those things to be very unintuitive. (and the docs didn't sufficiently warn me about this behavior: the "safety" warning is in a section labeled "Overview")

so here's clang's solution: simply allow the #[inline(always)] + #[target_feature] combination, but require the caller to also have all target features enabled. and then make all intrinsics #[inline(always)] (and non-unsafe where applicable).

clang generates errors like this one, when using an incompatible intrinsic: "error: always_inline function '_mm256_add_ps' requires target feature 'avx', but would be inlined into function 'foo' that is compiled without support for 'avx'"

"require the caller to also have all target features enabled" can be achieved in 3 ways:

  • the use of #[target_feature] in the user code, making the caller unsafe. but you're more likely to not forget to compile with the correct settings, as you've had to make your fn #[target_feature]. -- this effectively pushes the "unsafe to call" back a level into user code, which means dynamic feature detection is still possible. -- target_feature is only unsafe when not combined with inline-always (or the fn body is unsafe of course).
  • the use of #[cfg] for conditional compilation. this makes most intrinsics safe to use!
  • crate level target features. effectively equivalent to putting #[cfg] on everything.

doing it the clang way has several benefits, which massively improve intrinsics UX:

  • using intrinsics, which are not statically known to be supported, becomes a compile time error. (or requires the user to explicitly label their fns as #[target_feature], making them aware that their code is unsafe to call on non-supported targets.)
  • the safe_arch crate is no longer necessary for safe intrinsics code. (and in fact the clang solution is more general, because you can still use #[target_feature] for dynamic dispatch, which you can't do with safe_arch, and have the compiler check that you in fact used #[target_feature]).
  • performance becomes more reliable, because code, that happens to work on your machine (the UB thing), but actually calls the intrinsics wrapper functions (with significant overhead), no longer compiles.
5 Likes

here is a code example that demonstrates compile time errors, static & dynamic dispatch:

in the core::arch::x86 module:

// this is now safe, because inline(always) and target_feature ensure the caller has avx.
#[inline(always)]
#[target_feature(enable = "avx")]
pub fn _mm256_sqrt_ps(a: __m256) -> __m256 {
    unsafe { sqrtps256(a) }
}

// also inline(always), but still unsafe, because it dereferences a raw pointer.
#[inline(always)]
#[target_feature(enable = "avx")]
pub unsafe fn _mm256_load_ps(mem_addr: *const f32) -> __m256 {
      *(mem_addr as *const __m256)
}

// this are just a helpers for the code down below. safe because types are compatible, no pointers.
#[inline(always)]
#[target_feature(enable = "avx")]
pub fn _mm256_cvtarr_ps(arr: [f32; 8]) -> __m256 { unsafe { core::mem::transmute(arr) } }
#[inline(always)]
#[target_feature(enable = "avx")]
pub fn _mm256_cvtps_arr(ps: __m256) -> [f32; 8] { unsafe { core::mem::transmute(arr) } }

static dispatch:

// this is safe, because it only calls safe intrinsics and has the "avx" target feature.
#[cfg(target_feature = "avx")]
fn sqrt(arr: [f32; 8]) -> [f32; 8] {
    let ps = _mm256_cvtarr_ps(arr);
    let sqrt = _mm256_sqrt_ps(ps);
    _mm256_cvtps_arr(sqrt)
}

// this is just the scalar fallback.
#[cfg(not(target_feature = "avx"))]
fn sqrt(arr: [f32; 8]) -> [f32; 8] {
    // a scalar or sse2 implementation.
}

// this doesn't compile, because it uses intrinsics,
// which are not supported, but would have to be inlined.
// aka: the clang error.
fn sqrt_err(arr: [f32; 8]) -> [f32; 8] {
    let ps = _mm256_cvtarr_ps(arr);
    let sqrt = _mm256_sqrt_ps(ps);
    _mm256_cvtps_arr(sqrt)
}

dynamic dispatch:

// unsafe because it uses target-feature, but isn't inline(always) -> unsafe to call.
// (note that the implementation isn't actually unsafe,
//  so the unsafe on the fn is actually overly restrictive. 
//  there is a difference between "unsafe implementation" and "unsafe to call",
//  which rust can't express -- without wrapper functions.
//  it could be solved by a `safe` block, which you can put into an unsafe block,
//  to make things safe again.)
#[target_feature(enable = "avx")]
unsafe fn sqrt_avx(arr: [f32; 8]) -> [f32; 8] { // safe {
    let ps = _mm256_cvtarr_ps(arr);
    let sqrt = _mm256_sqrt_ps(ps);
    _mm256_cvtps_arr(sqrt)
} // }

fn sqrt_scalar(arr: [f32; 8]) -> [f32; 8] {
    // a scalar or sse2 implementation.
}

fn sqrt(arr: [f32; 8]) -> [f32; 8] {
    if target_supports_avx {
        unsafe { sqrt_avx(arr) }
    }
    else {
        sqrt_scalar(arr)
    }
}

(that safe block idea would of course be useful independently, to improve the granularity of the unsafe feature.)

Have you seen target_feature 1.1 by gnzlbg · Pull Request #2396 · rust-lang/rfcs · GitHub?

no, i didn't really know how to search for this. that approach seems interesting too!

i'm not yet sure what the consequences of the differences are.

well, it looks like that got accepted, so yay!

let's see how long it takes until that becomes stable :^)

That RFC got accepted a long time ago and AIUI, it doesn't currently have anyone championing it. So there is no path to stability right now.

But yes, the way things work right now is not ideal and was done to get a working system in place and was also the fastest path to avoid the compiler yielding errors at monophorization time. (I'm not a compiler expert, but I understand it is something to avoid.)