Suggestion for a low-effort way to take advantage of SIMD and other architecture specific tricks LLVM knows about

parched · June 5, 2017, 10:49pm

Hi @pedorcr, here is the procedural macro I mentioned I have been working on. The macro itself more or less works, but the runtime library it calls still needs to be implemented. It works like this:

The original function is replaced by one that loads a static function pointer and calls it.
The function pointer initially points to a setup function that checks the hardware capabilities and replaces the pointer with the optimal version of the function for subsequent calls.

Basically it turns

#[runtime_target_feature("+avx")]
pub fn sum(input: &[u32]) -> u32 {
    input.iter().sum()
}

into

pub fn sum(input: &[u32]) -> u32 {
    pub extern crate runtime_target_feature_rt as rt;

    static PTR: rt::atomic::Atomic<fn (&[u32]) -> u32> = rt::atomic::Atomic::new(setup);

    fn setup(input: &[u32]) -> u32 {
        let chosen_function = if rt::have_avx() {
            enable_avx
        } else {
            default
        };
        PTR.store(chosen_function, rt::atomic::Ordering::Relaxed);
        chosen_function(input)
    }

    fn default(input: &[u32]) -> u32 {
        input.iter().sum()
    }

    #[target_feature = "+avx"]
    fn enable_avx(input: &[u32]) -> u32 {
        input.iter().sum()
    }

    PTR.load(rt::atomic::Ordering::Relaxed)(input)
}

Your #[makefast] attribute could just be a wrapper around runtime_target_feature with some appropriately hardcoded features.
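
For anyone who wants to try the dispatch pattern without the rt crate, here is a rough std-only sketch of the same idea. It is not what the macro generates, and it rests on a few assumptions: `AtomicPtr<()>` plus a transmute stands in for `rt::atomic::Atomic`, the `is_x86_feature_detected!` macro from current std stands in for `rt::have_avx`, and the names `SumFn`, `choose`, `sum_default` and `sum_avx` are made up for illustration. It also uses the `#[target_feature(enable = "avx")]` spelling rather than the `#[target_feature = "+avx"]` form shown above.

```rust
use std::mem;
use std::sync::atomic::{AtomicPtr, Ordering};

// Hypothetical alias for the signature being dispatched.
type SumFn = fn(&[u32]) -> u32;

// Starts out pointing at `setup`; overwritten with the chosen version on first call.
static PTR: AtomicPtr<()> = AtomicPtr::new(setup as *mut ());

pub fn sum(input: &[u32]) -> u32 {
    let raw = PTR.load(Ordering::Relaxed);
    // SAFETY: PTR only ever holds pointers that were cast from a `SumFn`.
    let f: SumFn = unsafe { mem::transmute::<*mut (), SumFn>(raw) };
    f(input)
}

// First-call path: pick a version, publish it, then run it.
// Relaxed is enough here because every value PTR can hold is a valid function.
fn setup(input: &[u32]) -> u32 {
    let chosen = choose();
    PTR.store(chosen as *mut (), Ordering::Relaxed);
    chosen(input)
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn choose() -> SumFn {
    // Stand-in for rt::have_avx(): runtime CPU feature detection from std.
    if is_x86_feature_detected!("avx") {
        sum_avx
    } else {
        sum_default
    }
}

#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn choose() -> SumFn {
    sum_default
}

fn sum_default(input: &[u32]) -> u32 {
    input.iter().sum()
}

// Safe wrapper so the AVX version has the same `fn` type as the default one.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn sum_avx(input: &[u32]) -> u32 {
    #[target_feature(enable = "avx")]
    unsafe fn inner(input: &[u32]) -> u32 {
        input.iter().sum()
    }
    // SAFETY: `choose` only returns `sum_avx` after AVX was detected at runtime.
    unsafe { inner(input) }
}
```

Calling `sum(&[1, 2, 3])` goes through `setup` the first time and straight to the selected version afterwards. The safe wrapper around `inner` is one way to keep the stored pointer a plain `fn`; a real implementation might prefer to store an `unsafe fn` pointer instead.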

Links (note that this is the first proper Rust I’ve written, so any code review is welcome): the procedural macro attribute source, the rt crate, and the test crate.

The GNU ifunc you mention does have some advantages, but it is quite fragile and not very portable, so I wouldn’t recommend implementing that. See https://sourceware.org/ml/libc-alpha/2015-11/msg00108.html
