The way I was thinking about solving this is the following. Your example would fail to compile:
let answer = if avx_enabled() {
    _my_avx_intrinsic(arg)
    //^^^^ Error: tried to use AVX intrinsic but none in scope.
} else {
    fallback(arg)
};
but the following example would succeed:
let answer = if avx_enabled() {
    // The user opts into target features explicitly:
    #[use_target_feature(SSE4, AVX)] {
        _my_avx_intrinsic(arg)
    }
} else {
    fallback(arg)
};
and the following would fail as well:
let answer = if avx_enabled() {
    #[use_target_feature(SSE4)] {
        _my_avx_intrinsic(arg)
        //^^^^ Error: tried to use AVX intrinsic but none in scope.
    }
} else {
    fallback(arg)
};
Libraries like liboil and OpenMP typically use something like the following pattern.
They have a static function pointer that is initialized to the implementation to be used.
We can have a macro for conditional compilation on incompatible architectures. I just called it target_architecture, but that is a strawman:
// Conditional compilation for x86
#[cfg(target_architecture(x86))] {
    // Detect the features at run-time and initialize a static function pointer
    // with the appropriate algorithm implementation:
    lazy_static! {
        static ref SOME_ALGORITHM_IMPL: fn(...) -> ... =
            if avx_enabled() {
                some_algorithm_avx_impl
            } else if sse42_enabled() {
                some_algorithm_sse42_impl
            } else {
                some_algorithm_fallback_impl
            }
    };
}
Note how this code doesn't have any target_feature flags: it is not doing anything "feature" specific, it is just setting a function pointer.
In the same way, we can add the code for ARM:
// Conditional compilation for ARM
#[cfg(target_architecture(ARM))] {
    lazy_static! {
        static ref SOME_ALGORITHM_IMPL: fn(...) -> ... =
            if neon_enabled() {
                some_algorithm_neon_impl
            } else {
                some_algorithm_fallback_impl
            }
    };
}
and the code for other architectures:
// Conditional compilation for anything that is neither x86 nor ARM
#[cfg(!target_architecture(x86), !target_architecture(ARM))] {
    // No need to use lazy_static here:
    static SOME_ALGORITHM_IMPL: fn(...) -> ... = some_algorithm_fallback_impl;
}
Now we implement the algorithm for all architectures. It just forwards to the function pointer:
// The algorithm just uses the function pointer
fn some_algorithm(args...) -> ... {
    SOME_ALGORITHM_IMPL(args...)
}
And now we use the target_feature macros combined with the target_architecture macros to generate the code of the different implementations:
// For x86
#[cfg(target_architecture(x86))] {
    // Different implementations of the functions are generated by the compiler
    #[target_feature(AVX)]
    fn some_algorithm_avx_impl(args...) -> ... {
        // Might use AVX features (and probably SSE42, since AVX is a strict superset)
    }
    #[target_feature(SSE42)]
    fn some_algorithm_sse42_impl(args...) -> ... {
        // Might use SSE42 features, cannot use AVX features (compiler error)
    }
}
// For ARM
#[cfg(target_architecture(ARM))] {
    #[target_feature(NEON)]
    fn some_algorithm_neon_impl(args...) -> ... { }
}
// The fallback is generated for all architectures
fn some_algorithm_fallback_impl(args...) -> ... {
    // Compiler should error if user tries to use any target features here
}
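For comparison, the same dispatch pattern can be approximated in Rust without any of the strawman syntax. This is only a sketch under assumptions: the `sum_*` functions, `SumFn`, and `SUM_IMPL` are hypothetical names made up for illustration, `std::sync::OnceLock` stands in for `lazy_static!`, and the run-time check uses the standard library's `is_x86_feature_detected!` macro instead of the `avx_enabled()` helper above. It does not provide the scoped compile errors proposed here; it only shows the function-pointer part.

```rust
use std::sync::OnceLock;

// Hypothetical algorithm used for illustration: sum a slice of f32 values.
type SumFn = fn(&[f32]) -> f32;

// The fallback is compiled for all architectures.
fn sum_fallback(xs: &[f32]) -> f32 {
    xs.iter().sum()
}

// Compiled only on x86/x86_64, with AVX code generation enabled for this body.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn sum_avx_inner(xs: &[f32]) -> f32 {
    xs.iter().sum()
}

// Safe wrapper so the AVX version has the same type as the fallback.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn sum_avx(xs: &[f32]) -> f32 {
    // Safety: `select_impl` only returns this function after a run-time AVX check.
    unsafe { sum_avx_inner(xs) }
}

// Detect features at run time and pick an implementation.
fn select_impl() -> SumFn {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("avx") {
            return sum_avx;
        }
    }
    sum_fallback
}

// The static function pointer, initialized on first use.
static SUM_IMPL: OnceLock<SumFn> = OnceLock::new();

// The public algorithm just forwards to the selected implementation.
pub fn sum(xs: &[f32]) -> f32 {
    let f = *SUM_IMPL.get_or_init(select_impl);
    f(xs)
}
```

Calling `sum(&[1.0, 2.0, 3.0, 4.0])` goes through whichever implementation was selected on the first call; the caller never sees any feature-specific code.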
Note how one must use #[target_feature(...)] on the functions to enable the features for the whole function. That should be just sugar for:
fn name(...) -> ... {
    #[target_feature(...)] {
        // body
    }
}
This should work very similarly to the current way in which code is conditionally included depending on enabled target features:
// This works in Rust today (in nightly)
pub fn pext<T: IntF32T64>(x: T, mask_: T) -> T {
    if cfg!(target_feature = "bmi2") { // compile-time condition
        unsafe { intrinsics::pext(x, mask_) }
    } else {
        alg::bmi2::pext(x, mask_)
    }
}
I said before that inside the feature blocks the compiler should not use features the block does not list, even if the binary target is set to use those features, but I now think that does not make sense. The compiler will use those features everywhere else, so the binary cannot run on targets that don't support them anyway.