All right folks, strap yourselves in, because I think it's going to get a little bumpy!
I think that, for now, we've mostly settled on the least offensive way to expose (in std) a set of low level vendor intrinsics and a very tiny cross platform API based on defining some platform independent SIMD types. The low level vendor intrinsics are not exposed as compiler intrinsics. Instead, we define normal Rust function stubs implemented by LLVM intrinsics and export those. There are undoubtedly details on which we'll disagree, but I think we can punt on those until the RFC. (I do intend to write a pre-RFC before that point though.)
So… time to mush on to the next problem: dealing with functions whose type signatures permit SIMD types to either be passed as parameters or returned. I'd like to describe this problem from first principles so that other folks can participate in this thread without having to read all of it.
A key part of the SIMD proposal thus far is the existence of a new attribute called #[target_feature]. This attribute maps precisely to the __attribute__((target("..."))) annotation found in both gcc and Clang. An initial form of #[target_feature] was recently merged into Rust proper. In short, #[target_feature] is an attribute that one can apply to a function to specify target options for that function that may differ from the ones specified on the command line (e.g., with rustc -C target-feature=...). This means that one can, for example, call AVX2 intrinsics like _mm256_slli_epi64 in a function labeled with #[target_feature = "+avx2"] without having to compile with rustc -C target-feature=+avx2.
There are two really really important problems that this solves:
- This permits Rust programmers to do runtime detection with CPUID to determine which subtarget features are enabled on the host running the program. By comparison, achieving something similar with a cfg! oriented system would require producing a binary for every interesting combination of target features, and then requiring the end user to download the right one. With #[target_feature], we can tell LLVM to enable a specific target feature for codegen of a specific function. If a caller calls that function on a platform without that support, they could wind up executing a CPU instruction that doesn't exist (i.e., on Linux, a SIGILL is raised). There's a sketch of the runtime-dispatch pattern this enables just after this list.
- This permits the maintainers of the Rust distribution to distribute a single std for a particular target triple. By a similar line of reasoning to (1), if std used a cfg! oriented system, then you'd need to re-compile std for every combination of target features that you'd want to enable. With #[target_feature], we can ship everything in one compiled version of std, and it becomes the responsibility of the Rust programmer to ensure such functions are only called on platforms that support them. This opens the door to common pitfalls (like calling _mm256_slli_epi64 on a host without avx2 support), but, by the same token, these vendor intrinsics are intended to be a very low level interface.
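To make (1) concrete, here's a rough sketch of the runtime-dispatch pattern it enables. It reuses the i64x4 and _mm256_slli_epi64 definitions from the example program further down, and has_avx2 is purely hypothetical: imagine it executes CPUID and inspects the AVX2 feature bit (there's no such helper in std today).

fn has_avx2() -> bool {
    // Hypothetical: pretend this runs CPUID and checks the AVX2 feature bit.
    unimplemented!()
}

#[target_feature = "+avx2"]
fn shift_left_avx2(x: i64, amount: i32) -> i64 {
    // Codegen'd with AVX2 enabled, regardless of -C target-feature.
    let a = i64x4(x, x, x, x);
    _mm256_slli_epi64(a, amount).0
}

fn shift_left_fallback(x: i64, amount: i32) -> i64 {
    x << amount
}

fn shift_left(x: i64, amount: i32) -> i64 {
    // The programmer, not the compiler, guarantees that the AVX2 path only
    // runs on hosts that actually support AVX2; on any other host, it could
    // execute an instruction that doesn't exist (SIGILL on Linux).
    if has_avx2() { shift_left_avx2(x, amount) } else { shift_left_fallback(x, amount) }
}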
If we decided not to go through with #[target_feature], we'd be missing out on (1) completely, and I don't think there's any way around it. We'd also have to completely rethink how we stabilize these intrinsics, since a cfg! oriented system in std doesn't quite work, as explained in (2).
Let's make this concrete with an example. This is how one might define the _mm256_slli_epi64 AVX2 intrinsic:
#[inline(always)]
#[target_feature = "+avx2"]
fn _mm256_slli_epi64(a: i64x4, imm8: i32) -> i64x4 {
    // I think this extern block may be offensive.
    // Presuppose some other way to access the
    // correct LLVM intrinsic from *inside std*. :-)
    #[allow(improper_ctypes)]
    extern {
        #[link_name = "llvm.x86.avx2.pslli.q"]
        fn pslliq(a: i64x4, imm8: i32) -> i64x4;
    }
    unsafe { pslliq(a, imm8) }
}
And an example use inside of a larger program:
#![feature(link_llvm_intrinsics, repr_simd, simd_ffi, target_feature)]

#[repr(simd)]
#[derive(Debug)]
#[allow(non_camel_case_types)]
struct i64x4(i64, i64, i64, i64);

#[inline(always)]
#[target_feature = "+avx2"]
fn _mm256_slli_epi64(a: i64x4, imm8: i32) -> i64x4 {
    #[allow(improper_ctypes)]
    extern {
        #[link_name = "llvm.x86.avx2.pslli.q"]
        fn pslliq(a: i64x4, imm8: i32) -> i64x4;
    }
    unsafe { pslliq(a, imm8) }
}

#[inline(always)]
#[target_feature = "+avx2"]
fn testfoo(x: i64, y: i64, shift: i32) -> i64 {
    let a = i64x4(x, x, y, y);
    _mm256_slli_epi64(a, shift).0
}

#[target_feature = "+avx2"]
fn main() {
    let x: i64 = ::std::env::args().nth(1).unwrap().parse().unwrap();
    let y: i64 = ::std::env::args().nth(2).unwrap().parse().unwrap();
    let shift: i32 = ::std::env::args().nth(3).unwrap().parse().unwrap();
    println!("{:?}", testfoo(x, y, shift));
}
Compile and run with (remember this uses #[target_feature], which I don't think has hit nightly yet):
$ rustc -O test1.rs # no -C target-feature required
$ ./test1 15 5 4
240
The generated ASM is correct. Namely, I see a vpsllq instruction emitted and everything appears to get inlined. We've made it to the promised land! Errmm, not quite…
There are some really interesting failure modes here. First up, removing #[target_feature = "+avx2"] from testfoo is A-OK. This makes sense, I think, because the caller is asserting that they know the platform has AVX2 support. Indeed, compiling and running the program produces the expected output. However, keeping #[target_feature = "+avx2"] removed from testfoo and also changing inline(always) to inline(never) produces an LLVM codegen error:
#[inline(never)]
fn testfoo(x: i64, y: i64, shift: i32) -> i64 {
    let a = i64x4(x, x, y, y);
    _mm256_slli_epi64(a, shift).0
}
Compiling yields:
$ rustc -O test1.rs
LLVM ERROR: Do not know how to split the result of this operator!
AFAIK, this cannot be part of stable SIMD in Rust; we must prevent all LLVM codegen errors. How? What exactly is going on here?
OK, let's move on to another interesting problem. Instead of testfoo calling a SIMD intrinsic internally, it will try to return a SIMD vector. Here is the full program. Notice the missing inline annotation on testfoo:
#![feature(link_llvm_intrinsics, repr_simd, simd_ffi, target_feature)]

#[repr(simd)]
#[derive(Debug)]
#[allow(non_camel_case_types)]
struct i64x4(i64, i64, i64, i64);

#[inline(always)]
#[target_feature = "+avx2"]
fn _mm256_slli_epi64(a: i64x4, imm8: i32) -> i64x4 {
    #[allow(improper_ctypes)]
    extern {
        #[link_name = "llvm.x86.avx2.pslli.q"]
        fn pslliq(a: i64x4, imm8: i32) -> i64x4;
    }
    unsafe { pslliq(a, imm8) }
}

#[target_feature = "+avx2"]
fn testfoo(x: i64, y: i64, shift: i32) -> i64x4 {
    let a = i64x4(x, x, y, y);
    _mm256_slli_epi64(a, shift)
}

fn main() {
    let x: i64 = ::std::env::args().nth(1).unwrap().parse().unwrap();
    let y: i64 = ::std::env::args().nth(2).unwrap().parse().unwrap();
    let shift: i32 = ::std::env::args().nth(3).unwrap().parse().unwrap();
    println!("{:?}", testfoo(x, y, shift));
}
Compiling is successful, but running the program leads to interesting results:
$ rustc -O test.rs
$ ./test 15 5 4
i64x4(240, 240, 4, 0)
The correct output, AIUI, should be (each lane of i64x4(15, 15, 5, 5) shifted left by 4):
$ ./test 15 5 4
i64x4(240, 240, 80, 80)
Interestingly, if we apply an inline(always) annotation to testfoo, then we get an LLVM error again:
$ rustc -O test.rs
LLVM ERROR: Do not know how to split the result of this operator!
If I force the issue and instruct rustc to enable avx2 explicitly, then things are golden:
$ rustc -C target-feature=+avx2 -O test.rs
$ ./test 15 5 4
i64x4(240, 240, 80, 80)
The above works regardless of the inline annotations used on testfoo.
I am pretty flummoxed by the above. I don't actually know what's going on at the lowest level here, although I can make a high level guess that applying #[target_feature = "..."] to a function changes something about its ABI, and that this can interact weirdly with SIMD vectors. Since I don't quite understand the problem, it's even harder for me to understand the solution space.
@alexcrichton and I talked about the above briefly yesterday, and we brainstormed a few things. The solutions we were tossing out were along the lines of "always passing SIMD vectors by-ref to LLVM" or "banning SIMD vectors from appearing in the type signature of a safe function" or something similar. These all have additional complications, such as what to do in the presence of generics or when defining type signatures for other ABIs like extern "C".
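To convey the shape of the first idea, here's a rough source-level analogue of "always pass SIMD vectors by-ref" (the actual idea is about how rustc would lower such signatures to LLVM, so this is only a sketch, and whether it truly sidesteps the codegen problem is part of what I don't yet understand). It reuses the i64x4 and _mm256_slli_epi64 definitions from the programs above; testfoo_by_ref and caller are hypothetical names.

// Hypothetical by-reference variant of testfoo: the result is written through
// &mut rather than returned by value, so no SIMD vector appears by value in the
// signature and the caller and callee can't disagree about which registers it
// lives in.
#[target_feature = "+avx2"]
fn testfoo_by_ref(x: i64, y: i64, shift: i32, out: &mut i64x4) {
    let a = i64x4(x, x, y, y);
    *out = _mm256_slli_epi64(a, shift);
}

fn caller(x: i64, y: i64, shift: i32) {
    let mut result = i64x4(0, 0, 0, 0);
    testfoo_by_ref(x, y, shift, &mut result);
    println!("{:?}", result);
}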
Overall, this seems pretty hairy. I briefly tried to play with something analogous in C and compared the behaviors of gcc and Clang. It seems like they suffer from similarish problems, although gcc appears to warn you. For example, consider this C program:
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>

__attribute__((target("avx2")))
__m256i test_avx2() {
    __m256i a = _mm256_set_epi64x(1, 2, 3, 4);
    __m256i b = _mm256_set_epi64x(5, 6, 7, 8);
    return _mm256_add_epi64(a, b);
}

int main() {
    __m256i result = test_avx2();
    int64_t *x = (int64_t *) &result;
    printf("%ld %ld %ld %ld\n", x[0], x[1], x[2], x[3]);
    return 0;
}
Compiled/run with Clang:
$ clang test.c -O
$ ./a.out
12 10 3399988123389603631 3399988123389603631
And now with gcc:
$ gcc test.c -O
test.c: In function 'main':
test.c:13:13: warning: AVX vector return without AVX enabled changes the ABI [-Wpsabi]
__m256i result = test_avx2();
^~~~~~
$ ./a.out
140724298882158 140120498842949 1 4195757
$ ./a.out
1 4195757 0 0
$ ./a.out
140736787806926 139757628374341 1 4195757
Now, of course, this isn't to say that I expected this C program to work. Rather, I'm showing it as a comparison point to demonstrate the different failure modes of other compilers. In Rust, we have to answer questions like whether an analogous Rust program should be rejected by the compiler and/or whether the behavior in the above C program exhibits memory unsafety.
Can anyone help me untangle this?
cc @alexcrichton, @nikomatsakis, @withoutboats, @aatch, @aturon, @eddyb (please CC other relevant folks that I've missed :-))