Getting explicit SIMD on stable Rust

All right folks, strap yourselves in, because I think it’s going to get a little bumpy!

I think that, for now, we’ve mostly settled on the least offensive way to expose (in std) a set of low level vendor intrinsics and a very tiny cross platform API based on defining some platform independent SIMD types. The low level vendor intrinsics are not exposed as compiler intrinsics. Instead, we define normal Rust function stubs implemented by LLVM intrinsics and export those. There are undoubtedly details on which we’ll disagree, but I think we can punt on those until the RFC. (I do intend to write a pre-RFC before that point though.)

So… time to mush on to the next problem: dealing with functions whose type signatures permit SIMD types to either be passed as parameters or returned. I’d like to describe this problem from first principles so that we can get other folks participating in this thread without having to read all of it.

A key part of the SIMD proposal thus far is the existence of a new attribute called #[target_feature]. This feature maps precisely to the __attribute__(((target("..."))) annotation found in both gcc and Clang. An initial form of #[target_feature] was recently merged into Rust proper. In short, #[target_feature] is an attribute that one can apply to a function that specifies target options specifically for that function that may be different than ones specified on the command line (e.g., with rustc -C target-feature=...). This means that one can, for example, call AVX2 intrinsics like _mm256_slli_epi64 in a function labeled with #[target_feature = "+avx2"] without having to use rustc -C target-feature=+avx2.

There are two really really important problems that this solves:

  1. This permits Rust programmers to do runtime detection with CPUID to determine which subtarget features are enabled on the host running the program. By comparison, achieving something similar with a cfg! oriented system would require producing a binary for every interesting combination of target features, and then requiring the end user to download the right one. With #[target_feature], we can tell LLVM to enable a specific target feature for codegen of a specific function. If the caller calls that function on a platform without that support, then they could wind up executing a CPU instruction that doesn’t exist. i.e., On Linux, a SIGILL is raised.
  2. This permits the maintainers of the Rust distribution to distribute a single std for a particular target triple. For a similar line of reasoning as (1), if std used a cfg! oriented system, then you’d need to re-compile std for any combination of target features that you’d want to enable. With #[target_feature], we can ship everything in one compiled version of std, and it will be the responsibility of a Rust programmer to ensure they are called on platforms that support it. This opens the door for common pitfalls (like calling _mm256_slli_epi64 on a host without avx2 support), but, on the same token, these vendor intrinsics are intended to be a very low level interface.

If we decided not to go through with #[target_feature], we’d be missing out on (1) completely, and I don’t think there’s any way around it. We’d also have to completely rethink how we stabilize these intrinsics, since a cfg! oriented system in std doesn’t quite work as explained in (2).

Let’s make this concrete with an example. This is how one might define the _mm256_slli_epi64 AVX2 intrinsic:

#[inline(always)]
#[target_feature = "+avx2"]
fn _mm256_slli_epi64(a: i64x4, imm8: i32) -> i64x4 {
    // I think this extern block may be offensive.
    // Presuppose some other way to access the
    // correct LLVM intrinsic from *inside std*. :-)
    #[allow(improper_ctypes)]
    extern {
        #[link_name = "llvm.x86.avx2.pslli.q"]
        fn pslliq(a: i64x4, imm8: i32) -> i64x4;
    }
    unsafe { pslliq(a, imm8) }
}

And an example use inside of a larger program:

#![feature(link_llvm_intrinsics, repr_simd, simd_ffi, target_feature)]

#[repr(simd)]
#[derive(Debug)]
#[allow(non_camel_case_types)]
struct i64x4(i64, i64, i64, i64);

#[inline(always)]
#[target_feature = "+avx2"]
fn _mm256_slli_epi64(a: i64x4, imm8: i32) -> i64x4 {
    #[allow(improper_ctypes)]
    extern {
        #[link_name = "llvm.x86.avx2.pslli.q"]
        fn pslliq(a: i64x4, imm8: i32) -> i64x4;
    }
    unsafe { pslliq(a, imm8) }
}

#[inline(always)]
#[target_feature = "+avx2"]
fn testfoo(x: i64, y: i64, shift: i32) -> i64 {
    let a = i64x4(x, x, y, y);
    _mm256_slli_epi64(a, shift).0
}

#[target_feature = "+avx2"]
fn main() {
    let x: i64 = ::std::env::args().nth(1).unwrap().parse().unwrap();
    let y: i64 = ::std::env::args().nth(2).unwrap().parse().unwrap();
    let shift: i32 = ::std::env::args().nth(3).unwrap().parse().unwrap();
    println!("{:?}", testfoo(x, y, shift));
}

Compile/run with (remember this uses #[target_feature] which I don’t think has hit nightly yet):

$ rustc -O test1.rs  # no -C target-feature required
$ ./test1 15 5 4
240

The generated ASM is correct. Namely, I see a vpsllq instruction emitted and everything appears to get inlined. We’ve made it to the promised land! Errmm, not quite…

There are some really interesting failure modes here. First up, removing #[target_feature = "+avx2"] on testfoo is A-OK. This makes sense, I think, because the caller is asserting that they know the platform has AVX2 support. Indeed, compiling and running the program produces the expected output. However, keeping the #[target_feature = "+avx2"] removed from testfoo and also changing inline(always) to inline(never) produces an LLVM codegen error:

#[inline(never)]
fn testfoo(x: i64, y: i64, shift: i32) -> i64 {
    let a = i64x4(x, x, y, y);
    _mm256_slli_epi64(a, shift).0
}

Compiling yields:

$ rustc -O test1.rs
LLVM ERROR: Do not know how to split the result of this operator!

AFAIK, this cannot be part of stable SIMD on Rust. We must prevent all LLVM codegen errors. How? What exactly is going on here?

OK, let’s move on to another interesting problem. Instead of testfoo calling a SIMD intrinsic internally, it will try to return a SIMD vector. Here is the full program. Notice the missing inline annotation on testfoo:

#![feature(link_llvm_intrinsics, repr_simd, simd_ffi, target_feature)]

#[repr(simd)]
#[derive(Debug)]
#[allow(non_camel_case_types)]
struct i64x4(i64, i64, i64, i64);

#[inline(always)]
#[target_feature = "+avx2"]
fn _mm256_slli_epi64(a: i64x4, imm8: i32) -> i64x4 {
    #[allow(improper_ctypes)]
    extern {
        #[link_name = "llvm.x86.avx2.pslli.q"]
        fn pslliq(a: i64x4, imm8: i32) -> i64x4;
    }
    unsafe { pslliq(a, imm8) }
}

#[target_feature = "+avx2"]
fn testfoo(x: i64, y: i64, shift: i32) -> i64x4 {
    let a = i64x4(x, x, y, y);
    _mm256_slli_epi64(a, shift)
}

fn main() {
    let x: i64 = ::std::env::args().nth(1).unwrap().parse().unwrap();
    let y: i64 = ::std::env::args().nth(2).unwrap().parse().unwrap();
    let shift: i32 = ::std::env::args().nth(3).unwrap().parse().unwrap();
    println!("{:?}", testfoo(x, y, shift));
}

Compiling is successful, but running the program leads to interesting results:

$ rustc -O --emit asm test1.rs^C
$ ./test 15 5 4
i64x4(240, 240, 4, 0)

The correct output, AIUI, should be:

$ ./test 15 5 4
i64x4(240, 240, 80, 80)

Interestingly, if we apply an inline(always) annotation to testfoo, then we get an LLVM error again:

$ rustc -O test.rs 
LLVM ERROR: Do not know how to split the result of this operator!

If I force the issue and instruct rustc to enable avx2 explicitly, then things are golden:

$ rustc -C target-feature=+avx2 -O test.rs 
$ ./test 15 5 4
i64x4(240, 240, 80, 80)

The above works regardless of the inline annotations used on testfoo.

I am pretty flummoxed by the above. I don’t actually know what’s going on at the lowest level here, although I can make a high level guess that applying #[target_feature = "..."] to a function changes something about its ABI, and that this can interact weirdly with SIMD vectors. Since I don’t quite understand the problem, it’s even harder for me to understand the solution space.


@alexcrichton and I talked about the above briefly yesterday, and we brainstormed a few things. The solutions we were tossing out were things along the lines of “always passing SIMD vectors by-ref to LLVM” or “banning SIMD vectors from appearing in the type signature of a safe function” or something similar. These all have additional complications, such as what to do in the presence of generics or when defining type signatures for other ABIs like extern "C" ....


Overall, this seems pretty hairy. I briefly tried to play with something analogous in C and compared the behaviors of gcc and Clang. It seems like they suffer from similarish problems, although gcc appears to warn you. For example, consider this C program:

#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>

__attribute__((target("avx2")))
__m256i test_avx2() {
    __m256i a = _mm256_set_epi64x(1, 2, 3, 4);
    __m256i b = _mm256_set_epi64x(5, 6, 7, 8);
    return _mm256_add_epi64(a, b);
}

int main() {
    __m256i result = test_avx2();
    int64_t *x = (int64_t *) &result;
    printf("%ld %ld %ld %ld\n", x[0], x[1], x[2], x[3]);
    return 0;
}

Compiled/run with Clang:

$ clang test.c -O
$ ./a.out 
12 10 3399988123389603631 3399988123389603631

And now with gcc:

$ gcc test.c -O
test.c: In function ‘main’:
test.c:13:13: warning: AVX vector return without AVX enabled changes the ABI [-Wpsabi]
     __m256i result = test_avx2();
             ^~~~~~
$ ./a.out 
140724298882158 140120498842949 1 4195757
$ ./a.out 
1 4195757 0 0
$ ./a.out 
140736787806926 139757628374341 1 4195757

Now, of course, this isn’t to say that I expected this C program to work. Rather, I’m showing this as a comparison point to demonstrate different failure modes of other compilers. In Rust, we have to answer questions like whether an analogous Rust program should be rejected by the compiler and/or whether the behavior in the above C program exhibits memory unsafety.


Can anyone help me untangle this?

cc @alexcrichton, @nikomatsakis, @withoutboats, @aatch, @aturon, @eddyb (please CC other relevant folks that I’ve missed :-))

4 Likes