Getting explicit SIMD on stable Rust

gnzlbg · November 16, 2016, 5:29pm

alexcrichton:

Telling you that AVX can't be used when SSE4 is enabled is much harder, though. For example consider:
let answer = if avx_enabled() {
    _my_avx_intrinsic(arg)
} else {
    fallback(arg)
};
Here the compiler would have to know that in the first expression of the if you can use avx intrinsics, but not in the latter. In general this seems like a very very hard to solve problem which has very odd repercussions on the language itself (weird changes to resolve). I'd prefer to consider this as "would be nice to solve" but not necessary for stabilizing SIMD access.

The way I was thinking about solving this is the following. Your example would fail to compile:

let answer = if avx_enabled() {
    _my_avx_intrinsic(arg)
    //^^^^ Error: tried to use AVX intrinsic but none in scope.
} else {
    fallback(arg)
};

but the following example would succeed:

let answer = if avx_enabled() {
   // User opts into target features explicitly:
   #[use_target_feature(SSE4, AVX)] {
     _my_avx_intrinsic(arg)
   }
} else {
    fallback(arg)
};

and the following would fail as well:

let answer = if avx_enabled() {
   #[use_target_feature(SSE4)] {
     _my_avx_intrinsic(arg)
     //^^^^ Error: tried to use AVX intrinsic but none in scope.
   }
} else {
    fallback(arg)
};
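For comparison, Rust as it was later stabilized attaches the opt-in to a whole function instead of a scoped block, and checks the CPU with a run-time macro. The following is a minimal sketch of the same if/else shape under that model, not the block syntax proposed above. The names `add8_avx`, `add8_fallback`, and `add8` are hypothetical, and the AVX path is assumed to exist only on x86_64.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{_mm256_add_ps, _mm256_loadu_ps, _mm256_storeu_ps};

/// AVX path (hypothetical name). The attribute enables AVX for this one
/// function only, so callers must confirm AVX support before calling it.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn add8_avx(a: &[f32; 8], b: &[f32; 8]) -> [f32; 8] {
    let mut out = [0.0f32; 8];
    // SAFETY: both inputs and the output hold exactly 8 f32 values, and the
    // unaligned load/store intrinsics have no alignment requirement.
    unsafe {
        let sum = _mm256_add_ps(_mm256_loadu_ps(a.as_ptr()), _mm256_loadu_ps(b.as_ptr()));
        _mm256_storeu_ps(out.as_mut_ptr(), sum);
    }
    out
}

/// Portable path (hypothetical name), compiled on every architecture.
fn add8_fallback(a: &[f32; 8], b: &[f32; 8]) -> [f32; 8] {
    let mut out = [0.0f32; 8];
    for i in 0..8 {
        out[i] = a[i] + b[i];
    }
    out
}

/// Safe entry point: the run-time check plays the role of `avx_enabled()`.
pub fn add8(a: &[f32; 8], b: &[f32; 8]) -> [f32; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was confirmed on the line above.
            return unsafe { add8_avx(a, b) };
        }
    }
    add8_fallback(a, b)
}
```

In this model the compiler does not prove that the check guards the call; the `unsafe` block marks the spot where the programmer takes on that obligation.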

Libraries like liboil and OpenMP typically use something like the following pattern.

They have a static function pointer that is initialized to the implementation to be used. We can have a macro for conditional compilation on incompatible architectures. I just called it target_architecture, but that is a strawman:

// Conditional compilation for x86
#[cfg(target_architecture(x86))] {

// Detect the features at run-time and initialize a static function pointer
// with the appropriate algorithm implementation:
lazy_static! {
    static ref SOME_ALGORITHM_IMPL: fn(...) -> ... =
      if avx_enabled() {
        some_algorithm_avx_impl
      } else if sse42_enabled() {
        some_algorithm_sse42_impl
      } else {
        some_algorithm_fallback_impl
      };
}

}

Note how this code doesn't have any target_feature flags, since it is not doing anything "feature" specific, it is just setting a function pointer.
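The pattern can be written without the strawman syntax. Here is a minimal sketch, assuming `std::sync::OnceLock` in place of `lazy_static!` and the standard `is_x86_feature_detected!` macro in place of `avx_enabled()` and `sse42_enabled()`. The `SumFn` alias and the summing bodies are hypothetical stand-ins, so all three implementations compute the same thing here.

```rust
use std::sync::OnceLock;

/// Hypothetical signature standing in for `fn(...) -> ...`.
type SumFn = fn(&[u32]) -> u32;

// Stand-in bodies: real versions would differ in how they are compiled
// or written, not in what they return.
#[cfg(target_arch = "x86_64")]
fn some_algorithm_avx_impl(xs: &[u32]) -> u32 {
    xs.iter().fold(0, |acc, x| acc.wrapping_add(*x))
}

#[cfg(target_arch = "x86_64")]
fn some_algorithm_sse42_impl(xs: &[u32]) -> u32 {
    xs.iter().fold(0, |acc, x| acc.wrapping_add(*x))
}

fn some_algorithm_fallback_impl(xs: &[u32]) -> u32 {
    xs.iter().fold(0, |acc, x| acc.wrapping_add(*x))
}

/// Picks an implementation from the features the running CPU reports.
fn select_impl() -> SumFn {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            return some_algorithm_avx_impl;
        } else if is_x86_feature_detected!("sse4.2") {
            return some_algorithm_sse42_impl;
        }
    }
    some_algorithm_fallback_impl
}

/// The function pointer, filled in on first use.
static SOME_ALGORITHM_IMPL: OnceLock<SumFn> = OnceLock::new();

/// Public entry point: forwards to whichever implementation was selected.
pub fn some_algorithm(xs: &[u32]) -> u32 {
    let f = *SOME_ALGORITHM_IMPL.get_or_init(select_impl);
    f(xs)
}
```

Detection runs once, on the first call to `some_algorithm`; later calls only load the stored pointer.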

In the same way, we can add the code for ARM:

// conditional compilation for ARM
#[cfg(target_architecture(ARM))] {

lazy_static! {
    static ref SOME_ALGORITHM_IMPL: fn(...) -> ... =
      if neon_enabled() {
        some_algorithm_neon_impl
      } else {
        some_algorithm_fallback_impl
      };
}

}

and the code for other architectures:

// conditional compilation for neither x86 nor ARM
#[cfg(!target_architecture(x86), !target_architecture(ARM))] {

// no need to use lazy static here:
static SOME_ALGORITHM_IMPL:  fn(...) -> ... = some_algorithm_fallback_impl;

}

Now we implement the algorithm for all architectures; it just forwards to the function pointer:

// The algorithm just uses the function pointer
fn some_algorithm(args...) -> ... {
  SOME_ALGORITHM_IMPL(args...)
}

And now we use the target_feature macros combined with the target_architecture macros to generate the code of the different implementations:

// For X86
#[cfg(target_architecture(x86))] { 

// Different implementations of the functions are generated by the compiler

#[target_feature(AVX)]
fn some_algorithm_avx_impl(args...) -> ... {
  // Might use AVX features (and probably SSE42, since AVX is a strict superset)
}

#[target_feature(SSE42)]
fn some_algorithm_sse42_impl(args...) -> ... {
  // Might use SSE42 features, cannot use AVX features (compiler error)
}
} 

// For ARM
#[cfg(target_architecture(ARM))] { 

#[target_feature(NEON)]
fn some_algorithm_neon_impl(args...) -> ... { }

}

// The fallback is generated for all architectures
fn some_algorithm_fallback_impl(args...) -> ... {
  // Compiler should error if user tries to use any target features here
}
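The per-architecture layout above maps fairly directly onto attributes that exist in Rust today: `cfg(target_arch = ...)` for the architecture split and `#[target_feature(enable = ...)]` on each implementation. The sketch below is a rough illustration only. The `dot_*` names are hypothetical, the bodies are plain loops that the compiler may or may not vectorize, and the aarch64 branch assumes `std::arch::is_aarch64_feature_detected!` is available on that target.

```rust
/// Fallback, generated for all architectures.
fn dot_fallback(a: &[f32], b: &[f32]) -> f32 {
    let mut acc = 0.0f32;
    for (x, y) in a.iter().zip(b) {
        acc += x * y;
    }
    acc
}

/// x86_64 only: the compiler may use AVX inside this function.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn dot_avx(a: &[f32], b: &[f32]) -> f32 {
    let mut acc = 0.0f32;
    for (x, y) in a.iter().zip(b) {
        acc += x * y;
    }
    acc
}

/// aarch64 only: the compiler may use NEON inside this function.
#[cfg(target_arch = "aarch64")]
#[target_feature(enable = "neon")]
unsafe fn dot_neon(a: &[f32], b: &[f32]) -> f32 {
    let mut acc = 0.0f32;
    for (x, y) in a.iter().zip(b) {
        acc += x * y;
    }
    acc
}

/// Safe wrapper: only the branch for the current architecture is compiled.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was confirmed at run time.
            return unsafe { dot_avx(a, b) };
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            // SAFETY: NEON support was confirmed at run time.
            return unsafe { dot_neon(a, b) };
        }
    }
    dot_fallback(a, b)
}
```

One difference from the sketch in the post: the feature-specific functions are `unsafe fn`, so they cannot be stored directly in a plain `fn(...)` pointer, and a safe wrapper like `dot` is one way to bridge that.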

Note how one must use #[target_feature(...)] on the functions to enable the features for the whole function. That should be just sugar for:

fn name(...) -> ... {
 #[target_feature(...)] {
  // body
 }
}

This should work very similarly to the current way in which code is conditionally included depending on enabled target features:

// This works in Rust today (in nightly)
pub fn pext<T: IntF32T64>(x: T, mask_: T) -> T {
    if cfg!(target_feature = "bmi2") {  // compile-time condition
        unsafe { intrinsics::pext(x, mask_) }
    } else {
        alg::bmi2::pext(x, mask_)
    }
}
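A self-contained variant of the same compile-time selection is sketched below, assuming 64-bit operands and the `_pext_u64` intrinsic from `std::arch::x86_64`. The function names are hypothetical. It uses `#[cfg(...)]` on items rather than `cfg!` in an `if`, because `cfg!` keeps both branches in the program and the intrinsic path would not resolve on other architectures.

```rust
/// Portable software fallback: gathers the bits of `x` selected by `mask`
/// into the low bits of the result, lowest mask bit first.
pub fn pext_fallback(x: u64, mut mask: u64) -> u64 {
    let mut result = 0u64;
    let mut out_bit = 0u32;
    while mask != 0 {
        // Isolate the lowest set bit of the mask.
        let lowest = mask & mask.wrapping_neg();
        if x & lowest != 0 {
            result |= 1u64 << out_bit;
        }
        out_bit += 1;
        // Clear the lowest set bit and move on to the next one.
        mask &= mask - 1;
    }
    result
}

/// Chosen at compile time when the whole crate is built with BMI2 enabled,
/// for example via `-C target-feature=+bmi2`.
#[cfg(all(target_arch = "x86_64", target_feature = "bmi2"))]
pub fn pext(x: u64, mask: u64) -> u64 {
    // SAFETY: this item only exists when BMI2 is enabled for the build.
    unsafe { std::arch::x86_64::_pext_u64(x, mask) }
}

/// Chosen at compile time on every other configuration.
#[cfg(not(all(target_arch = "x86_64", target_feature = "bmi2")))]
pub fn pext(x: u64, mask: u64) -> u64 {
    pext_fallback(x, mask)
}
```

For example, `pext(0b1011_0110, 0b1111_0000)` extracts the upper four bits and returns `0b1011` on either path.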

I said before that inside a feature block the compiler should not use features that the block does not enable, even if the binary target is set to use those features, but I now think that does not make sense. The compiler will use those features everywhere else, so the binary cannot run on targets that lack them anyway.
