Summary
Add #[target_feature] attributes to structs, and enable the corresponding
target features for functions taking those structs as parameters.
Motivation
Currently, the only way to tell the compiler it can assume the availability of
hardware features is by annotating a function with the corresponding
#[target_feature] attribute. This requires that the annotated function be
marked as unsafe, as the caller must check whether the features are available
at runtime.
This also makes the attribute difficult for library authors to use in certain
situations, as they may not know which features the library user wants to
detect, or at what level the dynamic dispatch should be done.
Assume we want to implement a library function that multiplies a slice of f64
values by 2.0.
pub fn times_two(v: &mut [f64]) {
    for v in v {
        *v *= 2.0;
    }
}
Generally speaking, during code generation, the compiler will only assume the
availability of globally enabled target features (e.g., sse2 on x86-64 unless
additional feature flags are passed to the compiler).
This means that if the code is run on a machine with more efficient features
such as avx2, the function will not be able to make good use of them.
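The difference between globally enabled features and features present on the running machine can be observed directly. The following is a minimal sketch for illustration only: it uses the standard cfg! and std::arch::is_x86_feature_detected! macros, while the two helper function names are hypothetical and not part of this RFC.

```rust
/// True only if `avx2` was enabled for the whole crate at compile time,
/// for example with `-C target-feature=+avx2` or `-C target-cpu=native`.
pub fn avx2_enabled_at_compile_time() -> bool {
    cfg!(target_feature = "avx2")
}

/// True if the CPU executing the program reports `avx2` support.
#[cfg(target_arch = "x86_64")]
pub fn avx2_detected_at_runtime() -> bool {
    std::arch::is_x86_feature_detected!("avx2")
}

/// On other architectures `avx2` never exists, so detection always fails.
#[cfg(not(target_arch = "x86_64"))]
pub fn avx2_detected_at_runtime() -> bool {
    false
}
```

With a default x86-64 build, the first function typically returns false even on hardware where the second returns true, which is exactly the situation in which the plain times_two cannot be compiled to AVX2 instructions.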
To fix this, the library author may decide to add runtime feature detection to their implementation, choosing some set of features to detect.
#[inline(always)]
fn times_two_generic(v: &mut [f64]) {
    for v in v {
        *v *= 2.0;
    }
}

#[target_feature(enable = "avx")]
unsafe fn times_two_avx(v: &mut [f64]) {
    times_two_generic(v);
}

#[target_feature(enable = "avx512f")]
unsafe fn times_two_avx512f(v: &mut [f64]) {
    times_two_generic(v);
}

pub fn times_two(v: &mut [f64]) {
    if is_x86_feature_detected!("avx512f") {
        // SAFETY: `avx512f` was just detected at runtime.
        unsafe { times_two_avx512f(v) }
    } else if is_x86_feature_detected!("avx") {
        // SAFETY: `avx` was just detected at runtime.
        unsafe { times_two_avx(v) }
    } else {
        times_two_generic(v);
    }
}
This decision, however, comes with a few drawbacks:
- The runtime dispatch now implies that the code has some additional overhead to detect the hardware features, which can harm performance for small slices.
- The addition of more code paths increases binary size.
- The dispatch acts as a barrier that prevents inlining, which can prevent compiler optimizations at the call-site.
- This requires adding unsafe code to library code, which adds an unnecessary risk.
The proposed alternative offers solutions for these issues.
Guide-level explanation
This RFC does not propose any additions to libcore or libstd. But let us
assume for the sake of simplicity that core::arch::x86_64 includes the
following structs.
#[target_feature(enable = "avx")]
#[derive(Clone, Copy, Debug)]
pub struct Avx;

#[target_feature(enable = "avx512f")]
#[derive(Clone, Copy, Debug)]
pub struct Avx512f;
The #[target_feature(enable = "avx")] annotation tells the compiler that
instances of this struct can only be created if the avx target feature is
available, and allows it to optimize code based on that assumption.
Note that this makes the creation of instances of type Avx unsafe.
Now assume that the following functions are added to std::arch::x86_64.
#[inline]
pub fn try_new_avx() -> Option<Avx> {
    if is_x86_feature_detected!("avx") {
        // SAFETY: `avx` was just detected at runtime.
        Some(unsafe { Avx })
    } else {
        None
    }
}

#[inline]
pub fn try_new_avx512f() -> Option<Avx512f> {
    if is_x86_feature_detected!("avx512f") {
        // SAFETY: `avx512f` was just detected at runtime.
        Some(unsafe { Avx512f })
    } else {
        None
    }
}
The library code can then be written as follows.
// `simd` is never read; its type alone carries the target-feature information.
pub fn times_two<S>(simd: S, v: &mut [f64]) {
    for v in v {
        *v *= 2.0;
    }
}
Now the user can call this function in this manner.
fn main() {
    let mut v = [1.0; 1024];
    if let Some(simd) = std::arch::x86_64::try_new_avx512f() {
        times_two(simd, &mut v); // 1
    } else if let Some(simd) = std::arch::x86_64::try_new_avx() {
        times_two(simd, &mut v); // 2
    } else {
        times_two((), &mut v); // 3
    }
}
In the first branch, the compiler instantiates and calls the function
times_two::<Avx512f>, which has the signature fn(Avx512f, &mut [f64]).
Since the function takes Avx512f as an input parameter, calling it implies
that the avx512f feature is available, which allows the compiler to perform
optimizations that wouldn't otherwise be possible (in this case,
automatically vectorizing the code with AVX512 instructions).
In the second branch, the same logic applies but for the Avx struct and the
avx feature.
In the third branch, the called function has the signature
fn((), &mut [f64]). None of its parameters have types that were annotated
with the #[target_feature] attribute, so the compiler can't assume the
availability of features other than those that are enabled at the global
scope.
Moving the dispatch responsibility to the caller allows more control over how the dispatch is performed, whether to optimize for code size or performance.
Additionally, the process no longer requires any unsafe code.
Reference-level explanation
This RFC proposes that structs, tuple structs, and unit structs be allowed to
have one or several #[target_feature(enable = "...")] attributes.
Structs with such annotations are unsafe to construct. Creating an instance of
such a struct has the same safety requirements as calling a function marked with
the same #[target_feature] attribute.
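The construction rule can be approximated in today's Rust with a private field and an unsafe constructor. The sketch below is an emulation under stated assumptions, not the proposed language feature: the module name token, the methods new_unchecked and try_new, and the helper avx_available are all hypothetical, and the compiler derives no target-feature information from this type.

```rust
/// Hypothetical emulation of an "unsafe to construct" token in current Rust.
pub mod token {
    /// Stand-in for the RFC's `Avx` struct. The private field prevents code
    /// outside this module from writing `Avx { .. }` directly.
    #[derive(Clone, Copy, Debug)]
    pub struct Avx {
        _private: (),
    }

    impl Avx {
        /// # Safety
        /// The caller must ensure that `avx` is available on the running CPU,
        /// the same obligation as for calling a function marked with
        /// `#[target_feature(enable = "avx")]`.
        pub unsafe fn new_unchecked() -> Self {
            Avx { _private: () }
        }

        /// Safe constructor that performs the runtime check itself.
        pub fn try_new() -> Option<Self> {
            if avx_available() {
                Some(Avx { _private: () })
            } else {
                None
            }
        }
    }

    /// Runtime detection on x86-64.
    #[cfg(target_arch = "x86_64")]
    pub fn avx_available() -> bool {
        std::arch::is_x86_feature_detected!("avx")
    }

    /// `avx` does not exist on other architectures.
    #[cfg(not(target_arch = "x86_64"))]
    pub fn avx_available() -> bool {
        false
    }
}
```

Under the RFC, the private field and the hand-written unsafe constructor would be unnecessary, since the struct expression itself would require an unsafe block.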
References to that struct, as well as tuples, structs, and tuple structs
containing it, implicitly inherit the #[target_feature] attribute. But unlike
the explicitly annotated struct, they remain safe to construct.
Note that PhantomData<T> must not inherit target feature attributes, as it is
always safe to construct.
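The reason for the PhantomData<T> exception can be seen in a short sketch. The Avx type here is a hypothetical stand-in with no safe constructor, and Tagged and make_tagged are illustrative names; the point is only that safe code can name the type inside PhantomData without ever holding a value of it.

```rust
use std::marker::PhantomData;

/// Hypothetical feature token. No safe constructor is provided, so safe code
/// outside this scope cannot obtain a value of this type.
pub struct Avx {
    _private: (),
}

/// A wrapper that mentions `T` only through `PhantomData`.
pub struct Tagged<T> {
    pub len: usize,
    pub marker: PhantomData<T>,
}

/// Entirely safe code: no `Avx` value is needed to build `Tagged<Avx>`.
pub fn make_tagged() -> Tagged<Avx> {
    Tagged {
        len: 0,
        marker: PhantomData,
    }
}
```

If PhantomData<Avx> inherited the attribute, a function taking Tagged<Avx> would be compiled with avx enabled even though nothing ever proved that the feature is available.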
This RFC additionally proposes that functions taking parameters with a type
that has been annotated with a #[target_feature] attribute (or that has
implicitly inherited one) also behave as if they had been annotated with the
corresponding #[target_feature(enable = "...")], except that this doesn't
impose on them the requirement of having to be marked unsafe.
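For comparison, a similar effect can be approximated in current Rust by routing the work through a closure that is called inside a #[target_feature] function. This is a rough sketch under stated assumptions rather than the mechanism proposed here: Avx, try_new, vectorize, and times_two_avx are hypothetical names, and the approach relies on the optimizer inlining the closure into the feature-enabled function, which is not guaranteed.

```rust
/// Hypothetical token type; a value exists only if `avx` was detected.
#[derive(Clone, Copy, Debug)]
pub struct Avx {
    _private: (),
}

impl Avx {
    #[cfg(target_arch = "x86_64")]
    pub fn try_new() -> Option<Self> {
        if std::arch::is_x86_feature_detected!("avx") {
            Some(Avx { _private: () })
        } else {
            None
        }
    }

    #[cfg(not(target_arch = "x86_64"))]
    pub fn try_new() -> Option<Self> {
        None
    }

    /// Runs `f` inside a function compiled with `avx` enabled.
    #[cfg(target_arch = "x86_64")]
    pub fn vectorize<R>(self, f: impl FnOnce() -> R) -> R {
        #[target_feature(enable = "avx")]
        unsafe fn inner<R>(f: impl FnOnce() -> R) -> R {
            f()
        }
        // SAFETY: an `Avx` value can only be obtained after detecting `avx`.
        unsafe { inner(f) }
    }

    /// Fallback so that generic callers still compile on other targets.
    #[cfg(not(target_arch = "x86_64"))]
    pub fn vectorize<R>(self, f: impl FnOnce() -> R) -> R {
        f()
    }
}

#[inline(always)]
fn times_two_generic(v: &mut [f64]) {
    for x in v.iter_mut() {
        *x *= 2.0;
    }
}

/// Safe function that takes the token, as a function under this RFC would.
pub fn times_two_avx(simd: Avx, v: &mut [f64]) {
    simd.vectorize(|| times_two_generic(v));
}

/// Dispatches once, then runs whichever version is available.
pub fn times_two(v: &mut [f64]) {
    match Avx::try_new() {
        Some(simd) => times_two_avx(simd, v),
        None => times_two_generic(v),
    }
}
```

The proposal removes the need for the vectorize indirection and its internal unsafe block: taking the token as a parameter would be enough for the function body itself to be compiled with the feature enabled.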
Drawbacks
Implicitly annotating the functions with the #[target_feature] attribute may
cause them to be uninlined in certain situations, which may pessimize
performance.
Since the proposed API is opt-in, this has no effect on existing code.
Rationale and alternatives
?
Prior art
?
Unresolved questions
?
Future possibilities
None that I can think of.