@gnzlbg oh thanks for clearing up! Sorry if I was getting a little forceful as well, I forgot that we hadn’t really spelled this out much yet!
I figure though it may be worth writing down more of this issue in detail. A lot of this boils down to the way LLVM generates functions and works with arguments. Let’s say you’ve got a function like so:
```rust
pub fn foo(arg: u64x4) {
    // ...
}
```
Here we just have a simple function which is taking a 256-bit vector argument. This function importantly works on all platforms, even those that don’t have support for 256-bit vectors. Essentially LLVM has an emulation layer where it’ll recreate operations with other primitives, depending on what’s available.
So for now I’ll focus on x86_64 for ease, and we can be in one of three situations:
- We’ve got AVX2 instructions and access to 256-bit vectors natively
- We’ve only got SSE2, which gives us 128-bit vectors
- We have nothing, no SIMD registers
In each of these three situations LLVM will codegen the above function differently. It’s important to note that LLVM’s IR representation is identical regardless of enabled target features. What’s happening here is that LLVM is taking a function and then creating machine code based on the enabled target features for that function. Each function can then have a different set of target features enabled.
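To make the per-function part concrete, here’s a hedged sketch (the names are mine, and I’m using today’s `#[target_feature(enable = ...)]` syntax rather than the one discussed here): the two functions have identical bodies, but LLVM codegens each according to that function’s own enabled features.

```rust
// Sketch: same Rust source, different per-function target features.
// The LLVM IR is the same; only the emitted machine code differs.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(v: &[u64; 4]) -> u64 {
    // LLVM is free to use 256-bit ymm registers in here.
    v.iter().sum()
}

fn sum_baseline(v: &[u64; 4]) -> u64 {
    // Only the target's baseline features (e.g. SSE2 on x86_64) are used.
    v.iter().sum()
}

fn main() {
    let v = [1u64, 2, 3, 4];
    let mut total = sum_baseline(&v);
    #[cfg(target_arch = "x86_64")]
    {
        // Calling a #[target_feature] fn is unsafe, so gate it on a
        // runtime feature check.
        if std::is_x86_feature_detected!("avx2") {
            total = unsafe { sum_avx2(&v) };
        }
    }
    assert_eq!(total, 10);
    println!("total = {total}");
}
```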
So, for example, in each of the three situations:

- If we have AVX2, `arg` is passed to the function in `%ymm0`
- If we only have SSE2, `arg` is passed in `%xmm0` and `%xmm1`, two 128-bit registers
- If we don’t have anything, `arg` is passed in four registers: `%rdi`, `%rsi`, `%rdx`, `%rcx` (I think these are the first four parameter registers on x86_64)
So given that LLVM will codegen all of these differently, there’s a very big problem if we mismatch them! As far as I know LLVM will not catch this problem for us. I think it’s taking a “naive” view of the world and simply generating code for each function in isolation, without checking for cross-function mismatches.
This means, for example, that this code has a mismatched ABI:
```rust
#[target_feature = "+avx2"]
fn bar() {
    foo(u64x4::splat(0))
}

#[target_feature = "-sse2"]
fn foo(arg: u64x4) {
}
```
In this situation `bar` will call `foo` by passing the parameter in the `%ymm0` register, but `foo` will expect the first four argument registers to hold `arg` instead. While this isn’t memory unsafe yet, it’s not hard to imagine it becoming memory unsafe quickly! Also note that inlining only helps to some degree; due to separate compilation and the `#[target_feature]` attribute, we’ll always have this scenario in one way or another.
So this all sounds clearly bad; what can we do about it? My thinking is that `rustc` is the one generating all these call instructions. Namely, we can know:

- What types differ in ABI passing depending on CPU features (e.g. `u64x4` is a “simd type”)
- What CPU features are enabled for the caller, which in this case is `bar`
- What CPU features are enabled for the callee, which in this case is `foo`

Given all this information, the compiler can detect that the function `foo` called by `bar` uses a type `u64x4` whose ABI changes, and hence will not work. The compiler can take one of two options here. First, it could emit an error saying that this is invalid. Alternatively, it could insert a shim like so:
```rust
#[target_feature = "+avx2"]
fn bar() {
    foo_shim(&u64x4::splat(0))
}

#[target_feature = "-sse2"]
fn foo_shim(arg: &u64x4) {
    foo(*arg)
}

#[target_feature = "-sse2"]
fn foo(arg: u64x4) {
}
```
Notably the shim, `foo_shim`, has the same ABI as the callee, in this case `foo`. Additionally, none of its arguments rely on SIMD registers; for example, in this case the argument is passed by reference (it’ll be stored on the stack).
So while this works it’s obviously a performance hit, and probably not intended at all for SIMD usage. I’d personally be in favor of starting off with a hard error here and then taking the shim route if we really need it. This strategy has a few consequences, however:
- Errors show up during monomorphization. This is very rare for Rust, where we avoid this as much as possible. This means that the crate at fault could be way upstream in your dependency graph, and you have little-to-no recourse to fix it yourself. The saving grace here, however, is that SIMD types as arguments tend to be very local to a crate and don’t propagate much. Functions that do actually return or take SIMD types are often `#[inline]`, which means the compiler has more leeway as well. Again, though, I also think the errors here can be a stopgap to a solution if we need it. You’re almost for sure doing something wrong if you hit this error, and likely want to be notified of it anyway.
- I think this means that all SIMD types are basically banned from FFI. We won’t know how the callee is generated, so we don’t know what ABI it’s using to expect the SIMD argument (or to return it). This is sort of like how many types just aren’t FFI safe today though, so there may not be much to worry about here.
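On the FFI point, a hedged sketch of the workaround pattern (the type and function names here are hypothetical, not from any real API): keep vector types out of `extern "C"` signatures entirely and pass a pointer to plain data instead, so no SIMD registers participate in the call ABI:

```rust
// Hypothetical sketch: an FFI-safe signature for 256 bits of data.
// No SIMD type appears in the ABI; the callee loads through the pointer.
#[repr(C)]
pub struct U64x4Buf(pub [u64; 4]);

pub extern "C" fn consume(buf: *const U64x4Buf) -> u64 {
    // The callee can reassemble a vector internally however it likes;
    // the argument itself travels as an ordinary pointer.
    let data = unsafe { &(*buf).0 };
    data.iter().sum()
}

fn main() {
    let buf = U64x4Buf([1, 2, 3, 4]);
    assert_eq!(consume(&buf), 10);
    println!("sum = {}", consume(&buf));
}
```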
The overall tl;dr though is that this issue (a) originates in LLVM translation, (b) should be detectable by rustc, and (c) can be statically ruled out from ever happening. So I believe faults or “ABI passing problems” with functions that don’t match on `#[target_feature]` are eliminated. This only leaves us with the “is it safe to execute an unknown instruction” question, which I detailed more above.
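To sketch what “detectable by rustc” might look like (this is entirely my simplification, not real compiler code): at each call site the compiler has the three pieces of information listed earlier, and the check is just a feature-set comparison gated on whether the signature involves a simd type:

```rust
use std::collections::HashSet;

// My simplification of the per-call-site check. Which features count as
// "ABI-relevant" is an assumption; here it's the ones that change vector
// argument passing on x86_64.
const ABI_RELEVANT: &[&str] = &["sse2", "avx", "avx2"];

fn abi_mismatch(has_simd_args: bool, caller: &HashSet<&str>, callee: &HashSet<&str>) -> bool {
    has_simd_args
        && ABI_RELEVANT
            .iter()
            .any(|f| caller.contains(f) != callee.contains(f))
}

fn main() {
    // bar is +avx2, foo is -sse2: a u64x4 argument would be mis-passed.
    let bar: HashSet<&str> = ["sse2", "avx2"].into_iter().collect();
    let foo: HashSet<&str> = HashSet::new();
    assert!(abi_mismatch(true, &bar, &foo)); // simd arg + differing features: error
    assert!(!abi_mismatch(false, &bar, &foo)); // scalar-only signature: fine
    println!("check ok");
}
```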
Let me know though if any of that doesn’t make sense!