Pre-RFC: Struct target feature

sarah · November 17, 2022, 11:58am

Summary

Add #[target_feature] attributes to structs, and enable the corresponding target features to functions taking those structs as parameters.

Motivation

Currently, the only way to tell the compiler it can assume the availability of hardware features is by annotating a function with the corresponding #[target_feature] attribute. This requires that the annotated function be marked as unsafe as the caller must check whether the features are available at runtime.
This also makes it difficult for library authors to use in certain situations, as they may not know which features the library user wants to detect, and at what level the dynamic dispatch should be done.

Assume we want to implement a library function that multiplies a slice of f64 values by 2.0.

pub fn times_two(v: &mut [f64]) {
    for v in v {
        *v *= 2.0;
    }
}

Generally speaking, during code generation, the compiler will only assume the availability of globally enabled target features (e.g., sse2 on x86-64 unless additional feature flags are passed to the compiler).

This means that if the code is run on a machine with more efficient features such as avx2, the function will not be able to make good use of them.

To fix this, the library author may decide to add runtime feature detection to their implementation, choosing some set of features to detect.

#[inline(always)]
fn times_two_generic(v: &mut [f64]) {
    for v in v {
        *v *= 2.0;
    }
}

#[target_feature(enable = "avx")]
unsafe fn times_two_avx(v: &mut [f64]) {
    times_two_generic(v);
}

#[target_feature(enable = "avx512f")]
unsafe fn times_two_avx512f(v: &mut [f64]) {
    times_two_generic(v);
}

pub fn times_two(v: &mut[f64]) {
    if is_x86_feature_detected!("avx512f") {
        times_two_avx512f(v);
    } else if is_x86_feature_detected!("avx") {
        times_two_avx(v);
    } else {
        times_two_generic(v);
    }
}

This decision, however, comes with a few drawbacks:

The runtime dispatch now implies that the code has some additional overhead to detect the hardware features, which can harm performance for small slices.
The addition of more code paths increases binary size.
The dispatch acts as a barrier that prevents inlining, which can prevent compiler optimizations at the call-site.
This requires adding unsafe code to library code, which adds an unnecessary risk.

The proposed alternative offers solutions for these issues.

Guide-level explanation

This RFC does not propose any additions to libcore or libstd. But let us assume for the sake of simplicity that core::arch::x86_64 includes the following structs.

#[target_feature(enable = "avx")]
#[derive(Clone, Copy, Debug)]
pub struct Avx;

#[target_feature(enable = "avx512f")]
#[derive(Clone, Copy, Debug)]
pub struct Avx512f;

The #[target_feature(enable = "avx")] annotation tells the compiler that instances of this struct can only be created if the avx target feature is available, and allows it to optimize code based on that assumption.

Note that this makes the creation of instances of type Avx unsafe.

Now assume that the following functions are added to std::arch::x86_64.

#[inline]
pub fn try_new_avx() -> Option<Avx> {
    if is_x86_feature_detected!("avx") {
        Some(unsafe { Avx })
    } else {
        None
    }
}

#[inline]
pub fn try_new_avx512f() -> Option<Avx512f> {
    if is_x86_feature_detected!("avxf") {
        Some(unsafe { Avx512f })
    } else {
        None
    }
}

Then the library code can now be written as

pub fn times_two<S>(simd: S, v: &mut [f64]) {
    for v in v {
        *v *= 2.0;
    }
}

Now the user can call this function in this manner.

fn main() {
    let mut v = [1.0; 1024];

    if let Some(simd) = std::arch::x86_64::try_new_avx512f() {
        times_two(simd, &mut v); // 1
    } else if let Some(simd) = std::arch::x86_64::try_new_avx() {
        times_two(simd, &mut v); // 2
    } else {
        times_two((), &mut v); // 3
    }
}

In the first branch, the compiler instantiates and calls the function times_two::<Avx512f>, which has the signature fn(Avx512f, &mut [f64]). Since the function takes as an input parameter Avx512f, that means that calling this function implies that the avx512f feature is available, which allows the compiler to perform optimizations that wouldn't otherwise be possible (in this case, automatically vectorizing the code with AVX512 instructions).

In the second branch, the same logic applies but for the Avx struct and the avx feature.

In the third branch, the called function has the signature fn((), &mut [f64]). None of its parameters have types that were annotated with the #[target_feature] attribute, so the compiler can't assume the availability of features other than those that are enabled at the global scope.

Moving the dispatch responsibility to the caller allows more control over how the dispatch is performed, whether to optimize for code size or performance.

Additionally, the process no longer requires any unsafe code.

Reference-level explanation

This RFC proposes that structs, tuple structs, and units be allowed to have one or several #[target_feature(enable = "...")] attributes.

Structs with such annotations are unsafe to construct. Creating an instance of such a struct has the same safety requirements as calling a function marked with the same #[target_feature] attribute.

References to that struct, tuples/other structs/tuple structs containing that struct implicitly inherit the #[target_feature] attribute. But unlike the explicitly annotated struct, they remain safe to construct.

Note that PhantomData<T> must not inherit target feature attributes, as it is always safe to construct.

This RFC additionally proposes that functions taking parameters with a type that has been annotated with a #[target_feature] (or implicitly inherited the attribute), also behave as if they have been annotated with the corresponding #[target_feature(enable = "...")], except that this doesn't impose on them the requirement of having to be marked unsafe.

Drawbacks

Implicitly annotating the functions with the #[target_feature] attribute may cause them to be uninlined in certain situations, which may pessimize performance.

Since the proposed API is opt-in, this has no effect on existing code.

Rationale and alternatives

?

Prior art

?

Unresolved questions

?

Future possibilities

None that I can think of.

HerrJemine · November 17, 2022, 9:42pm

I very much like this idea for using a proof token for a target feature and letting the compiler (or other unsafe code) reason with this for function optimisation.

What I'm not quite sure about is the notion of structs/tuples inheriting the annotation from their fields. Is that necessary for the MVP? It seems not to be needed for the basic use case of passing a ZST proof token around.

And it feels a bit too implicit to me and raises some questions:

Would it show up in RustDoc?

Does it also apply for private fields?

Is it then a back-compat hazard?

But in general I'm in favor!

sarah · November 17, 2022, 9:49pm

the idea behind tuples and structs inheriting it is that you want token proofs to be composable, to some extent.

for example maybe you want both avx2 and fma, and instead of having to make a brand new proof token for it. you could just use an instance of (Avx2, Fma), since the existence of that proves that both features are available.

i don't think this needs to show up in rustdoc, since it's effectively an invisible optimization. it should also apply to private fields but i don't think this makes it a back-compat hazard, since this doesn't affect the observable behavior of the program.

Nugine · November 19, 2022, 1:41pm

You can define a container function with the same signature and make it multiversioned. The library function is likely to be inlined into the container function.

pub mod library {
    #![forbid(unsafe_code)]

    #[inline]
    pub fn times_two(v: &mut [f64]) {
        for v in v {
            *v *= 2.0;
        }
    }
}

pub mod caller {
    #![forbid(unsafe_code)]

    use super::library;
    use multiversion::multiversion;

    #[multiversion]
    #[clone(target = "[x86|x86_64]+avx")]
    #[clone(target = "[x86|x86_64]+sse2")]
    fn times_two(v: &mut [f64]) {
        library::times_two(v)
    }

    fn main() {
        let mut v = [1.0; 1024];
        times_two(&mut v);
    }
}

sarah · November 19, 2022, 1:44pm

this only works if everything (that benefits from vectorization) inside the callee has been inlined. if the library code calls other library code, and that other code doesn't get inlined (which is a pretty common case if the code is reused in multiple places), then you only get the vectorization in the surface part of the library code

Nugine · November 19, 2022, 1:57pm

I think such a way for transitive target features is intrusive for upstream libraries.

If the library intently provides some functions which can be auto-vectorized, then they are very likely to be marked as inlineable. So the performance may not be a problem.

HeroicKatora · November 19, 2022, 1:59pm

I like the idea of representing runtime target features in safe-ish code. It would definitely help the ergonomics of mixing different target features in the same binary, which Rust makes easier to write and compile than most other languages.

That said, it's not necessary to have a new language feature to make construction of structs unsafe. Adding a zero-sized, non-public, field with a safe and unsafe const fn constructor should be enough for the correct semantics.

The second point, using those tokens in the type system for invoking unsafe feature methods, is much more specific. However, another pattern of using such exists when there is no ABI difference created by the feature. e.g. jpeg vectorization

/// Arch-specific implementation of YCbCr conversion. Returns the number of pixels that were
/// converted.
pub fn get_color_convert_line_ycbcr() -> Option<unsafe fn(&[u8], &[u8], &[u8], &mut [u8]) -> usize>
{
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    #[allow(unsafe_code)]
    {
        if is_x86_feature_detected!("ssse3") {
            return Some(ssse3::color_convert_line_ycbcr);
        }
    }
    // Runtime detection is not needed on aarch64.
    #[cfg(all(feature = "nightly_aarch64_neon", target_arch = "aarch64"))]
    {
        return Some(neon::color_convert_line_ycbcr);
    }
    #[allow(unreachable_code)]
    None
}

In essence, I would rather see the conversion to a safe function as something that the Avx2 struct or similar do and therefore would prefer having these as methods on such struct. In total, without respect to bikeshedding any of the names:

if let Some(simd) = Avx2::try_new() {
    simd.safe_fn(times_two)(&mut v)
}

sarah · November 19, 2022, 2:17pm

i'm not sure i understand the first point about making this possible without new language features. could you please elaborate on that?

for the second point, i really like your idea about being able to create function pointers. but i'm not sure how the implementation for this could work. simd.safe_fn would need the function to be known as compile time, since it generates a new function. so does it only take function items or an arbitrary impl FnOnce?

and what does it return? if it only takes and returns function items, then the output can be coercible to a function pointer, which is what we want here. but then im not sure how to express that in the language.

not to mention that safe_fn removes the first parameter (Avx2) from the function, so that we don't need to pass it anymore.

that means the signature for it would have to look like something similar to this

fn safe_fn<Args: Tuple>(self, f: impl FnItem(Self, Args) -> R) -> impl FnItem(Args) -> R;

i hope i'm not misunderstanding anything, but i feel like this raises a lot of questions regarding the implementation

sarah · November 19, 2022, 2:19pm

it is intrusive, yes. but inlining everything is sometimes not an option. it could result in significant code bloat and much longer compile times.

Nugine · November 19, 2022, 2:43pm

Yes. Monomorphization has the same problem. rustc needs to compile every generic function in the call stack for every target feature token. SIMD always causes some kind of code bloat.

sarah · November 19, 2022, 2:49pm

not to this degree. imagine the following scenario

pub mod library {
    fn foo(v: &mut[f64]) { /* assume the implementation is big */ }

    pub fn bar(v: &mut [f64]) {
        foo(v);
        // some other code
    }

    pub fn baz(v: &mut [f64]) {
        foo(v);
        // some other code
    }
}

using the multiversion approach, we need to guarantee that foo will be inlined inside both bar and baz. and the generated code will be duplicated inside both, so in the final binary you get the contents of bar, baz and twice the contents of foo

using the simd token approach, the function can be generated once with the vectorization applied to it and then the compiler can choose not to inline it if it thinks that the call overhead is negligible compared to the work that needs to be done, and so in the final binary you get the contents of bar, baz and only once the contents of foo

HeroicKatora · November 19, 2022, 3:19pm

The language feature does a lot at the same time. That's not generally how Rust language features are designed. Simple features are preferred because they are easier to implement, stabilize and reason about. My point was trying to factor the different indivudal effects of the feature (unsafe construction, it's intended effect as a generic parameter, the function marker) to check which are required. And in particular, unsafe construction is nothing new to the language. It already exists, the feature would need to motivate why those eixsting mechanisms wouldn't be applicable if that's one of the intended effects of the target_feature marker attribute.

I'm not certain why such a parameter is required to be passed in the first place. Somehow this type seems to modifies the surrounding function parameter. During monorphization? The mechanism by which the compiler would modify the function type isn't explained in detail. Does it check all types of a function? Usually, marker traits—such as Copy, Unpin, or internally Freeze—are used when a property of a type is supposed to be picked up and acted upon by the compiler. It also runs counter to intuition that there isn't any bound on such a parameter. What happens if you pass a type that isn't one of the simd markers?

It seems that the parameter value could be entirely orthogonal to the generic type parameter that affects the function type. It's used more like the 'constant' parameter of some x86 intrinsics such as _mm_i32gather_pd's scale, which are passed as parameters but actually should be encoded as constants in the instruction stream. In that case, the std/rustc implementation relies on llvm to optimize out unused match-arms that encode all different variants, but discussion of turning it into a proper const parameter are underway. At least to me this sounds similar enough to warrant discussion if the feature would work via a similar approach, or if not, why not, and which thoughts about proper constification apply to the target feature parameter as well.

sarah · November 19, 2022, 3:39pm

i'm not aware of unsafe construction already being in the language. the only thing i can find about it online is other IRLO threads Making struct construction unsafe, or warn in safe context, Type construction safety: unsafe struct / unsafe drop , or closed RFCs

the reason the parameter needs to be passed to the function is because its existence acts as a ZST proof that the features are available, which is why the compiler can safely apply the target feature to the function.

the target feature (as described in my post) has no effect on the struct, other than making its construction unsafe. its actual effect comes into play when that struct appears as the type of a parameter of a function, in which case the target features are automatically applied to the function.

so if you pass a type that isn't a simd marker, nothing happens. the function just doesn't get any target features automatically applied to it.

HeroicKatora · November 19, 2022, 4:11pm

There's nothing to find in RFCS because unsafe construction doesn't need any language intervention:

#[derive(Clone, Copy)]
struct Bad { inner_: () }

impl Bad {
    /// # Safety
    /// Only call this if *environment conditions*, such as when you 
    /// the user owns a cat named Godzilla.
    unsafe fn this_is_the_only_constructor_besides_clone_and_copy() {
        Bad { inner_: () }
    }
}

That's already enough to soundly rely on the construction preconditions for unsafe methods using values of that type. Note that i've written somewhat extensively on using zst-proof types with existing semantics.

You need the ZST proof for the invocation, true, but there are alternatives to using a parameter or even true runtime arguments to achieve that effect. Such mechanism would be new in any case, meaning there's little upside to mashing it into existing, semantical features such as function arguments. Choosing a parameter has downsides, parameters already have a wide array of semantics that are not needed for the type-safety of a proof type and where interaction with them will be heavily complex to manage. I'm not saying that parameters are the wrong mechanism necessarily but the RFC should identify and discuss these downsides.

By your post it has the effect of propagating the property from a field. That is an effect. How does this work with generic type parameters on the struct, that become fields, anyways? We just can't error post-monomorphisation on this. That comes back to the marker trait point I made about designing properties observed by the compiler. Since they also propgate to references how does this work with & dyn_? That ties in with the main point: I think the rfc tries to do too much at once that it doesn't need to which causes all kinds of interactions with other language features, that would need to be considered if such that design for target-feature-invocation is chosen.

So you can use the feature completely incorrectly without any indication of that happening? That certainly doesn't sound like a Rust feature.

sarah · November 19, 2022, 4:56pm

thanks for the link! that was a very enlightening blog post. in retrospect, i agree that the rfc tries to do too much, and that the behavior is too implicit. i think the implicit propagation stuff can be removed as it is not essential, and goes against the principles of rust (and if it does end up being desirable, it can be introduced in a later RFC in a backwards compatible fashion). i'll try to think of alternatives based on your suggestions and try to come up with a better design

if you also have any more ideas or advice to share, i'd be very happy to hear them

HeroicKatora · November 22, 2022, 2:31pm

Regarding ideas, the most pressing unsolved question is indeed how to integrate target-feature dependent functions into the type system which also is in need of either choosing existing mechanisms or adding some terminology (e.g. a new kind of generic) to talk about.

The idea of introducing values for such function types with an attribute, #[target_feature(enable = …), seems quite alright. Locally, that is for the concrete, anonymous ZST representing that exact function, this is somewhat analogous to #[repr(packed)] even if we modifyied semantic rules of how to interact with that type/value. Just a bit ad-hoc. Maybe if the attribute was attached to the supposed ZST argument (and generic) instead?

The portion of having core-integrated proof types for particular cpu-features seems also valuable, independent from the function type changes.

The true puzzle for me is figuring out how to explain how the interaction with the normal function type works. Currently, there's safe normal functions, fn(&mut [i32]), and unsafe functions that may carry any unknown caller requirements, unsafe fn (&mut [i32]). Expanding that type lattice of fn with arbitrary feature combinations would, imho, overcomplicate the fn-type entirely and not even address the truly language-internal semantics of blessing one particular kind of technically ABI-incompatible fn-ptr-cast.

Just spitballing here, maybe that could also be solved better by introducing a wrapper around a function type? I don't quite know.

// No public constructor, constructing using the attribute above.
struct FeatureFn<F: Copy, Req> {
    safe_fn_ptr: F,
    required: PhantomData<Req>,
}

impl<F: SealedFnFeaturePtrTrait> FeatureFn<F, Req> {
    // Proof that the features are enabled by providing witness
    pub fn get(self, _: Req) -> F {
        self.safe_fn_ptr
    }

    // Delay the proof to the caller of the unsafe fn pointer.
    pub fn get_unchecked(self) -> Self::UnsafeFnPtr {
        // Some magic to return `unsafe fn` equivalent. E.g. blessing:
        unsafe { std::transmute(self.safe_fn_ptr) }
    }
}

sarah · November 22, 2022, 11:24pm

i think the pattern of getting the function pointer could addressed independently of the structural target feature, for example by allowing (unsafe) casting between function pointers, as long as their signatures only differ by parameters. we'd have to guarantee that their ABI is the same

that way, you can have a safe api over it that looks like this

fn my_fn<S: Copy>(simd: S, v: &mut [f64]) {}
fn my_fn_ptr<S: Copy>(simd: S) -> fn(&mut [f64]) {
    unsafe { my_fn::<S> as _ }
}

HeroicKatora · November 28, 2022, 1:45am

Maybe. It could work. The cast feels slightly off (changing the number of arguments, mostly) and more people with ABI experience will have to weight pros and cons there but the principle would generalize somewhat beyond purely target feature structs, which could be a plus.

What mechanism do you intend to encapculate the unsafe cast for the user?

sarah · November 28, 2022, 9:59am

we can check that the types we're removing from the signature (in this case S) have size 0, and are Clone which means that we can essentially create them from nothing. them having size 0, means that the abi of the function doesn't change, and them being Clone means that we can pretend to be passing them into the function, as long as we've proven the existence of at least one instance of that type, which we know is true here, since we've received it as a function parameter.

HeroicKatora · November 28, 2022, 7:58pm

Permitting universally that function cast would commit to a specific Rust-ABI for zero-sized type for all architectures–namely one where such parameters are invisible at binary level. Is that desirable? I'll assume you meant Copy in your latest comment? (Both !core::mem::needs_drop::<S>() and trivial clone is necessary for your line of reasoning).

What's less clear to me is how the programmer would check, i.e. write code they can trust and rely upon.

unsafe { my_fn::<S> as _ }

This suggests the existence of some safety requirement to that particular as _ coercion. Where would it be documented—and how? (Note: the existence of as has received some criticism in favor of more intention methods such as pointer::expose_addr, TryFrom). Can the cast be misused? The major motivation was removing some unsafe code in favor of sound abstractions. Which sound abstractions would be enabled with that new unsafe cast primitive? Which safe abstractions would this unsafe cast enable that are otherwise not expressible?

Topic		Replies	Views
Pre-RFC: contextual target feature detection libs	16	795	September 10, 2023
Pre-RFC: #[target_feature(...)] in trait methods language design	4	844	March 25, 2021
Raise a compile warning if you use an x86 intrinsic that's not supported by the target_feature	3	176	October 27, 2024
Allow #[cfg(target_feature = "-feature")]	13	1643	April 21, 2023
Pre-proposal: `#[arm]`/`#[thumb]` function annotations for ARM targets language design	10	1143	March 25, 2019