Hi @pedorcr, here is the procedural macro I mentioned I have been working on. The macro itself is more or less working, but the runtime library it calls still needs to be implemented. It works like this:
- The original function is replaced by one that loads a static function pointer and calls that.
- The function pointer initially points to a setup function that checks the hardware capabilities and replaces the function pointer with the optimal version of the function for subsequent calls.
Basically it turns
#[runtime_target_feature("+avx")]
pub fn sum(input: &[u32]) -> u32 {
    input.iter().sum()
}
into
pub fn sum(input: &[u32]) -> u32 {
    pub extern crate runtime_target_feature_rt as rt;

    static PTR: rt::atomic::Atomic<fn(&[u32]) -> u32> = rt::atomic::Atomic::new(setup);

    fn setup(input: &[u32]) -> u32 {
        let chosen_function = if rt::have_avx() {
            enable_avx
        } else {
            default
        };
        PTR.store(chosen_function, rt::atomic::Ordering::Relaxed);
        chosen_function(input)
    }

    fn default(input: &[u32]) -> u32 {
        input.iter().sum()
    }

    #[target_feature = "+avx"]
    fn enable_avx(input: &[u32]) -> u32 {
        input.iter().sum()
    }

    PTR.load(rt::atomic::Ordering::Relaxed)(input)
}
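Since the rt crate is not implemented yet, here is a minimal std-only sketch of the same dispatch pattern for anyone who wants to try it. It is an approximation, not the macro's actual output: `AtomicPtr<()>` with a transmute stands in for `rt::atomic::Atomic`, `is_x86_feature_detected!` stands in for `rt::have_avx`, the AVX variant uses the `#[target_feature(enable = "avx")]` form on an `unsafe fn` behind a safe wrapper, and the names `SumFn`, `choose`, `sum_default`, `sum_avx` and `sum_avx_impl` are made up for illustration.

```rust
use std::mem;
use std::sync::atomic::{AtomicPtr, Ordering};

// Hypothetical alias for the signature being dispatched.
type SumFn = fn(&[u32]) -> u32;

// std has no atomic type for function pointers, so the pointer is stored
// type-erased as `*mut ()` and converted back on every call.
// It starts out pointing at `setup`.
static PTR: AtomicPtr<()> = AtomicPtr::new(setup as SumFn as *mut ());

// Portable fallback, compiled with the crate's default target features.
fn sum_default(input: &[u32]) -> u32 {
    input.iter().sum()
}

// Same body, but the compiler may use AVX instructions when generating it.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn sum_avx_impl(input: &[u32]) -> u32 {
    input.iter().sum()
}

// Safe wrapper so the AVX variant has the same type as the fallback.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn sum_avx(input: &[u32]) -> u32 {
    // SAFETY: `choose` only returns this function after AVX was detected.
    unsafe { sum_avx_impl(input) }
}

// Picks the best implementation for the CPU we are running on.
fn choose() -> SumFn {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx") {
            return sum_avx;
        }
    }
    sum_default
}

// First-call target: detect, overwrite the pointer, then forward the call.
fn setup(input: &[u32]) -> u32 {
    let chosen = choose();
    PTR.store(chosen as *mut (), Ordering::Relaxed);
    chosen(input)
}

// Public entry point: load whatever the pointer currently holds and call it.
pub fn sum(input: &[u32]) -> u32 {
    let raw = PTR.load(Ordering::Relaxed);
    // SAFETY: PTR only ever holds values that were cast from a `SumFn`.
    let f: SumFn = unsafe { mem::transmute::<*mut (), SumFn>(raw) };
    f(input)
}
```

The first call to `sum` goes through `setup`, and later calls jump straight to the chosen implementation. `Relaxed` ordering should be enough here because every value the pointer can hold is a valid function, so a thread that races with the store at worst runs `setup` one more time.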
Your #[makefast] attribute could just be a wrapper around runtime_target_feature with some appropriately hardcoded features.
Links (note that this is the first proper Rust I’ve written, so any code review is welcome): the procedural macro attribute source, the rt crate, and the test crate.
The GNU ifunc you mention does have some advantages, but it is quite fragile and not very portable, so I wouldn’t recommend implementing that. https://sourceware.org/ml/libc-alpha/2015-11/msg00108.html