Pre-RFC: stabilization of target_feature

Indeed, these three features are enough to do everything. My only two points are:

  1. Having the run-time detection be mandatory for new features that are standardized would be important so that we don't end up with a mix of different levels of compatibility for each feature.
  2. Having to roll your own feature detection with #[unconditional_target_feature] and the runtime lib instead of having that be the thing that's standardized sounds a bit like standardizing the low-level pieces and leaving it up to the user to put it together, but maybe that's fine.

@alexcrichton I think the tl;dr is simply that if you expose in the language something that turns on extra features without runtime checks, then at least on some architectures you are allowing programs to be written that will cause memory corruption or other unsafety when the feature-enabled code ends up running on a CPU without the feature. I don't think this is much of an issue though. First, that unsafety already exists today on those architectures if you just compile with the feature enabled for the whole program. Second, the RFC is designed to be fairly low level, so normal users will end up interacting with it only through crates that abstract away the unsafe with some nicely usable macros.

@parched it also seems to me that having cfg!(target_feature = "avx") == true inside a #[target_feature = "avx"] function makes the most sense.

Why do you feel this way? To use SIMD in Rust you never need to use target_feature. If you were writing the simd crate, then yes, but if you are just using it, then no. If you are not using the simd crate but the rustc intrinsics directly, all of them are unsafe, so I don't know why you would care either.


I still don't think this is that critical, since we have been able to use all this functionality without standard feature detection just fine. However, if you feel strongly about this, it is possible to write a small RFC on top of mine motivating it. If enough people feel the same way as you do, we can merge those RFCs and be done with it :slight_smile:

Thanks for explaining @pedrocr and @gnzlbg. I remain unconvinced, however. Are there links to concrete examples of where this causes memory unsafety? What platforms does this happen on?

@gnzlbg the decision of safety here is much larger than intrinsics, it's a definition of SIMD as-a-whole. If we declare avx2 features as "fundamentally unsafe" then the unsafety must propagate outwards to the actual point where you do a runtime dispatch based on available CPU features. We won't be able to hide this under the covers with intrinsics.

I responded to this before, but maybe you missed it. The vast majority of vendor intrinsics are not unsafe at all.

@burntsushi I see that your stdsimd crate uses the SIMD types and #[target_feature = "..."] to define the intrinsic functions, as opposed to how the simd crate in the nursery does it (which is to call the LLVM intrinsics directly, which is unsafe): simd/src/x86/avx.rs at master · hsivonen/simd · GitHub . Since I've only used the crate in the nursery, I was talking only about that way of doing things.

  1. Are you testing the generated assembly? #[target_feature = "..."] allows the compiler to use those instructions, but the compiler doesn't need to use them. For example, if you don't call the intrinsic directly, LLVM often doesn't recognize the code, the intrinsic won't be emitted, and the wrong instructions are generated (e.g. rbit, but pdep, pext, bzhi, and many other algorithms have this problem). At the same time, even if LLVM emits the correct intrinsic, sometimes it will generate "far from optimal code", so at least in my bitintr crate I had to start testing the generated assembly and filing LLVM bugs (some of which were answered on IRC with "you need to use the intrinsic directly, LLVM will probably never be able to recognize the algorithms").

  2. To me, both ways of implementing a wrapper around compiler intrinsics look functionally identical. Could you comment on the trade-offs of doing it one way or the other? At first look, the main differences are that one requires less unsafe than the other, and that because you are relying on LLVM's portable vector intrinsics (which is the right thing for SIMD), your code is nicer to read. Anyhow, IIRC calling an intrinsic function on a platform that doesn't support it is undefined behavior per LLVM (some intrinsics are portable though). Since using #[target_feature] allows LLVM to introduce these calls, the old approach was doing the right thing (maybe not in the right way) in assuming that using these intrinsics was unsafe. Note also that just because #[target_feature] could require unsafe fn doesn't mean that your users will need to write unsafe code, e.g., the nursery/simd crate deals with this by wrapping the unsafe functions in safe ones, so that users of the crate never need to deal with unsafe.

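To make the last point concrete, here is a minimal sketch (all names hypothetical, with scalar stand-in bodies instead of real intrinsics) of how a crate can hide the unsafe feature-dependent path behind a safe API via a run-time check, in the style of nursery/simd:

```rust
// Hypothetical unsafe feature-dependent implementation: callers must
// guarantee that the CPU supports the required feature. The body here
// is a scalar stand-in, not a real intrinsic.
unsafe fn sum_accelerated(xs: &[f32]) -> f32 {
    xs.iter().sum()
}

// Portable fallback that works on every target.
fn sum_fallback(xs: &[f32]) -> f32 {
    xs.iter().sum()
}

// Hypothetical run-time feature check; a real crate would query CPUID.
fn cpu_has_feature() -> bool {
    false
}

// Safe public API: the unsafety never reaches the user.
pub fn sum(xs: &[f32]) -> f32 {
    if cpu_has_feature() {
        // Safe: we just verified at run time that the feature exists.
        unsafe { sum_accelerated(xs) }
    } else {
        sum_fallback(xs)
    }
}
```

The unsafe stays an implementation detail of the wrapper crate, which is the point being argued above.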

@alexcrichton

@gnzlbg the decision of safety here is much larger than intrinsics, it's a definition of SIMD as-a-whole. If we declare avx2 features as "fundamentally unsafe" then the unsafety must propagate outwards to the actual point where you do a runtime dispatch based on available CPU features. We won't be able to hide this under the covers with intrinsics.

This is only true if your SIMD wrapper exposes this unsafety to the users. The ("old?") nursery/simd crate uses the unsafe SIMD intrinsics everywhere, but it doesn't expose this unsafety to the users. This is how the SIMD approach is supposed to look with this RFC:


// Raw intrinsic function: dispatches to LLVM directly.
// Calling this on an unsupported target is
// undefined behavior.
extern "C" { fn raw_intrinsic_function(a: f64, b: f64) -> f64; }

// Software emulation of the intrinsic,
// works on all architectures.
fn software_emulation_of_raw_intrinsic_function(a: f64, b: f64) -> f64;

// Calling this function is always safe and has zero cost
// (can be inlined, etc.); see below.
fn my_intrinsic(a: f64, b: f64) -> f64 {
  // Note: if the intrinsic is portable, just call the intrinsic directly.
  #[cfg(target_feature = "some_feature")] {
    // If "some_feature" is enabled, it is safe to call the raw intrinsic function.
    unsafe { raw_intrinsic_function(a, b) }
  }
  #[cfg(not(target_feature = "some_feature"))] {
     // If "some_feature" is disabled, calling
     // the raw intrinsic function is
     // undefined behavior (per LLVM), so we call the software emulation.
     software_emulation_of_raw_intrinsic_function(a, b)
  }
}

// Provide run-time dispatch to the best implementation:
// ifunc! is a procedural macro that generates copies
// of `my_intrinsic` with different target features,
// does run-time feature detection on binary initialization,
// and sets my_intrinsic_rt
// to dispatch to the appropriate implementation.
static my_intrinsic_rt = ifunc!(my_intrinsic, ["default_target", "some_feature"]);
// Note that because `#[target_feature = "some_feature"] fn foo;` sets
// cfg!(target_feature = "some_feature") to true, the right
// branch within `my_intrinsic` will be taken.

// This procedural macro expands to the following:

fn initialize_my_intrinsic_fn_ptr() -> typeof(my_intrinsic) {
  if std::cpuid::has_feature("some_feature") {
    // Using my_intrinsic_some_feature_wrapper is unsafe in general,
    // but the run-time feature test we just made makes it safe here:
    unsafe { my_intrinsic_some_feature_wrapper /* do we need a cast here?: as typeof(my_intrinsic) */ }
  } else {
    my_intrinsic
  }
}

// The macro copies the tokens of `my_intrinsic` 
// and annotates the function with #[target_feature]
#[target_feature = "some_feature"]
unsafe fn my_intrinsic_some_feature_wrapper(a: f64, b: f64) -> f64
{
  #[cfg(target_feature = "some_feature")] {
    // This branch will always be taken because
    // `#[target_feature = "..."]` defines `cfg!(target_feature = "...") == true`.
    unsafe { raw_intrinsic_function(a, b) }
  }
  #[cfg(not(target_feature = "some_feature"))] {
     // dead code
     software_emulation_of_raw_intrinsic_function(a, b)
  }
}

Maybe it would make sense to compare this with the approach that @burntsushi is proposing, and weigh the pros and cons. But I really still don't understand how #[target_feature = "..."] unsafe fn kills the SIMD story, given that we already have a SIMD crate (nursery/simd) for which this is not an issue at all.


EDIT: an updated version of this example can be found on this comment of the RFC PR.

They aren't as different as you might imagine. Look at _mm_adds_epi16, for example. You'll notice that it is calling the LLVM intrinsic. The simd crate is effectively doing the same thing, but through the platform-intrinsics machinery in rustc. stdsimd bypasses platform-intrinsics completely. platform-intrinsics are insufficient for exposing vendor intrinsics.

The plan is to add these tests later because rustc already has some infrastructure for doing this, and I don't understand it. Thus far, I've been ad hoc looking at the generated assembly.

Note that this is how Clang exposes vendor intrinsics. I am not taking a novel path here. This is well trodden ground.

They aren't? The simd crate is exposing a high level interface to SIMD functionality. stdsimd, on the other hand, is explicitly providing an implementation of the vendor intrinsic interfaces defined by CPU vendors. stdsimd is modeled on how Clang provides the same interface. I'd encourage you to compare and contrast: clang/lib/Headers/emmintrin.h at master · llvm-mirror/clang · GitHub

This really isn't a question of whether or not a high level SIMD library should be unsafe or not, and I'm really confused as to why you're focusing on that. We are specifically talking about explicit use of #[target_feature], which I imagine will be commonly used in conjunction with specific vendor intrinsics. Vendor intrinsics must be defined with #[target_feature] in order for anything to work at all.

I see, thanks for chiming in.

The plan is to add these tests later because rustc already has some infrastructure for doing this, and I don't understand it. Thus far, I've been ad hoc looking at the generated assembly.

If you ever get to it let me know. I did not understand those either and quickly hacked my own tiny system for it that just does a 1:1 comparison of the generated assembly. If you ever use this for the stdsimd crate let me know, I'll be glad to dump mine and use what rustc uses.

Note that this is how Clang exposes vendor intrinsics. I am not taking a novel path here. This is well trodden ground.

Note that calling a function with __target__ on a CPU that doesn't support its feature set seems to be undefined behavior (in LLVM, the assembler, and probably the CPU as well). See this comment on the RFC PR.

Basically, not even x86 guarantees that the illegal instruction will be reached at all, nor that the decoder will detect that it is an illegal instruction, nor that the assembler didn't insert other target-specific code before the instruction that, on a different CPU, might lead to memory unsafety.

This really isn't a question of whether or not a high level SIMD library should be unsafe or not, and I'm really confused as to why you're focusing on that.

I thought that stdsimd was a safe wrapper for SIMD intrinsics but IIUC it is only a proposal for the low-level intrinsics that std should expose, such that high-level wrappers can be built.

Vendor intrinsics must be defined with #[target_feature] in order for anything to work at all.

Not all of them. Maybe this is common for SIMD, but for the bit manipulation instruction sets (TBM, ABM, BMI1, and BMI2) this is not required. Calling the LLVM intrinsic directly, like here for rbit or here for pdep, is enough.
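For reference, the kind of portable fallback a crate like bitintr dispatches to when the hardware instruction is unavailable can be written in plain Rust. Here is a sketch (the name `pdep_u64` is illustrative) of a software emulation of BMI2's pdep:

```rust
// Software emulation of the BMI2 `pdep` (parallel bit deposit)
// operation: the low-order bits of `value` are deposited, in order,
// at the positions of the set bits of `mask`.
fn pdep_u64(value: u64, mut mask: u64) -> u64 {
    let mut result = 0u64;
    let mut bit = 1u64; // next bit of `value` to deposit
    while mask != 0 {
        if value & bit != 0 {
            // Isolate the lowest set bit of the mask and deposit there.
            result |= mask & mask.wrapping_neg();
        }
        mask &= mask - 1; // clear the lowest set bit of the mask
        bit <<= 1;
    }
    result
}
```

A crate can then select between this fallback and the real instruction at run time, exactly as in the dispatch example earlier in the thread.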


@alexcrichton I think after this explanation from @burntsushi I finally understand what you mean by making #[target_feature] unsafe would significantly impact our SIMD story. I think this is an issue, but I am still not convinced that it is a big / critical one. Here is why:

The purpose of putting the lowest-level-possible SIMD intrinsics in std:: is to allow SIMD libraries to be written in stable Rust out of tree. If the std:: intrinsics for SIMD are unsafe, those writing a wrapper around them will need to deal with this unsafety. These intrinsics can be wrapped in "low-level" libraries outside std:: that provide a safe API without introducing an extra cost.

So unless one is writing a "lowest-level safe SIMD wrapper", one shouldn't need to deal with this. Even those writing slightly higher-level libraries should build on top of safe low-level wrappers.

Also, we can always make these intrinsics safe later in a backwards-compatible way.

What we cannot do in a backwards-compatible way is make these intrinsics unsafe after we have made them safe.

I really hope that I am missing something, and that @alexcrichton or @burntsushi can convince me that either the impact on the ecosystem will be significant or that it is 100% safe to call these intrinsics on targets that do not support them. After talking with some LLVM devs about how decoders on Intel CPUs cannot be relied upon at all to raise SIGILL exceptions, and how the assembler might insert filler code before that happens anyway that might modify memory, I still lean towards the safest solution, which is to make #[target_feature] unsafe.


Yeah, I saw that. That may make this quibbling moot. :-/

It's two things I guess. The goal is to expose a small platform independent API consisting of SIMD vector types, while also providing a means to expose platform dependent vendor APIs that interoperate with aforementioned SIMD vector types. Both things are, IMO, required for a SIMD ecosystem to grow. Safety for the platform dependent APIs isn't the primary objective. Making the intrinsics available at all is the primary objective. If they have to be unsafe, then so be it, but up until your recent post on the RFC, my understanding was that most of the vendor intrinsics would be entirely safe to call (assuming we straighten out the ABI issues).

That is interesting. I guess that only works if the LLVM intrinsic doesn't require any SIMD vector types? Otherwise I think LLVM will complain. It is also something that might stop working over time if LLVM ever removes the explicit intrinsic. In particular, LLVM does not have intrinsics for every vendor intrinsic and instead relies on code generation. In that case, I think you will need #[target_feature] even if you aren't using any SIMD vectors.


Without the attribute, how would LLVM know which ABI to use for the function when using vector types as arguments? Maybe this is why it is required for the SIMD functions? If this is the case, maybe this is also the key to fixing or detecting ABI incompatibilities: we would just need to make #[target_feature] part of the function type, or somehow carry it as extra metadata in the generated code.

A different issue is how dispatching through a function pointer works in practice if the different function versions have different ABIs.

EDIT: I've filed a bug in the LLVM Bugzilla asking for clarification on what the semantics of __attribute__((__target__("feature"))) are in clang/LLVM. It is a waste of our time to go 1000 times around this if LLVM just says that this is undefined behavior. I've tried to look everywhere for how LLVM defines this and haven't been able to find anything. So... if it is not defined, then it is undefined... but maybe the bug is easy to fix. For the RFC we need a definitive answer to this question, because otherwise we might be introducing something unimplementable (or a soundness issue).

@burntsushi This is an example of an ABI mismatch using portable vector types:

typedef int v8si __attribute__ ((vector_size (32)));

__attribute__((target("sse")))  v8si bar(v8si a); 
__attribute__((target("avx2")))  int foo(v8si a);

__attribute__((target("sse"))) 
void baz(v8si a) { foo(bar(a)); }

GCC warns about this, clang 4.0 doesn’t warn about this, but both seem to emit code to convert between ABIs:

baz(int __vector(8)):                            # @baz(int __vector(8))
        push    rbp
        mov     rbp, rsp
        and     rsp, -32
        sub     rsp, 192
        lea     rax, [rbp + 16]
        lea     rcx, [rsp + 64]
        movaps  xmm0, xmmword ptr [rax]
        movaps  xmm1, xmmword ptr [rax + 16]
        movaps  xmmword ptr [rsp + 144], xmm1
        movaps  xmmword ptr [rsp + 128], xmm0
        movaps  xmm0, xmmword ptr [rsp + 128]
        movaps  xmm1, xmmword ptr [rsp + 144]
        movaps  xmmword ptr [rsp + 112], xmm1
        movaps  xmmword ptr [rsp + 96], xmm0
        movaps  xmm0, xmmword ptr [rsp + 96]
        movaps  xmm1, xmmword ptr [rsp + 112]
        mov     rax, rsp
        movaps  xmmword ptr [rax + 16], xmm1
        movaps  xmmword ptr [rax], xmm0
        mov     qword ptr [rsp + 56], rcx # 8-byte Spill
        call    bar(int __vector(8))
        movaps  xmmword ptr [rsp + 80], xmm1
        movaps  xmmword ptr [rsp + 64], xmm0
        mov     rax, qword ptr [rsp + 56] # 8-byte Reload
        mov     rcx, qword ptr [rax]
        mov     qword ptr [rsp], rcx
        mov     rcx, qword ptr [rax + 8]
        mov     qword ptr [rsp + 8], rcx
        mov     rcx, qword ptr [rax + 16]
        mov     qword ptr [rsp + 16], rcx
        mov     rcx, qword ptr [rax + 24]
        mov     qword ptr [rsp + 24], rcx
        call    foo(int __vector(8))
        mov     dword ptr [rsp + 52], eax # 4-byte Spill
        mov     rsp, rbp
        pop     rbp
        ret

@zackw after talking with some LLVM devs, they all agree that this would be a compiler bug. Currently, the inliner doesn't (or shouldn't) act across mismatching features, so we don't even need to mark the functions as inline(never) (that would, however, be the safe thing to do if we encounter LLVM bugs).

Even if the function is marked with always_inline it won't be inlined across mismatching features, for example, for this code:

#include <cstdio>
__attribute__((target("avx"), always_inline))
void foo_avx() { printf("avx"); }

bool has_feature() { return false; }

__attribute__((target("sse"))) 
void bar() {
  if (has_feature()) {
    foo_avx();
  }
}

clang errors with:

<source>:10:5: error: always_inline function 'foo_avx' requires target feature 'sse4.2', but would be inlined into function 'bar' that is compiled without support for 'sse4.2'
    foo_avx();
    ^
1 error generated.
Compiler exited with result code 1

Rust should IMO do the same. I need to add the interaction of target_feature with always_inline (and inlining in general) to the RFC.

There are two things you could mean by this, with very different implications. One possibility is that you are just saying that LLVM guarantees not to inline a function that is annotated with "this function uses feature X" into a function that doesn't have this annotation. This would still mean that we are on our own for detecting features and we cannot have a feature-dependent block in the middle of a feature-independent function. The other possibility is that LLVM actually does have a "detect runtime presence of feature" intrinsic, tracks feature-dependency at the level of individual machine instructions, and won't hoist a feature-dependent instruction past a detection intrinsic. That would be great if it were true, and would mean that an awful lot of the complexity of this RFC was unnecessary, but it would entail so much complexity inside LLVM that I suspect it's not true.

Some concreteness would probably help here, and since I don't know anything about the guts of LLVM, I am going to express said concreteness with x86(-64) assembly language. The AVX-specific variant of memchr in GNU libc (sysdeps/x86_64/multiarch/memchr-avx2.S) begins like this (after stripping out a bunch of macro goo that's not relevant):

__memchr_avx2:
        /* Check for zero length.  */
        testq   %rdx, %rdx
        jz      .Lnull
        movl    %edi, %ecx
        /* Broadcast CHAR to YMM0.  */
        vmovd   %esi, %xmm0
        vpbroadcastb %xmm0, %ymm0

And the version that assumes only SSE2, which is always available on a 64-bit x86 (sysdeps/x86_64/memchr.S), begins like this:

__memchr_sse2:
        movd    %esi, %xmm1
        mov     %edi, %ecx
        punpcklbw %xmm1, %xmm1
        test    %rdx, %rdx
        jz      .Lreturn_null
        punpcklbw %xmm1, %xmm1

(Yes, there are two punpcklbw instructions in a row. No, that's not a mistake, as far as I can tell.)

Now, imagine that x86 had a test_feature_set instruction that let you directly branch on whether or not AVX is available, then you could choose between these two implementations using it:

memchr:
        test_feature_set #avx2
        beq .Lmemchr_sse2  /* if not available */
        /* Check for zero length.  */
        testq   %rdx, %rdx
        jz      .Lnull
        movl    %edi, %ecx
        /* Broadcast CHAR to YMM0.  */
        vmovd   %esi, %xmm0
        vpbroadcastb %xmm0, %ymm0
        ...
.Lmemchr_sse2:
        movd    %esi, %xmm1
        mov     %edi, %ecx
        punpcklbw %xmm1, %xmm1
        test    %rdx, %rdx
        jz      .Lreturn_null
        punpcklbw %xmm1, %xmm1
        ...

Let us further imagine that this is late-stage LLVM intermediate representation for a Rust memchr compiled using the runtime if cfg!(avx2) { ... } else { ... } thing, and the compiler is now looking for shared code that it can factor out. With adjustments to the register allocation choices, it could do this:

memchr:
        /* Check for zero length.  */
        testq   %rdx, %rdx
        jz      .Lreturn_null
        movl    %edi, %ecx
        vmovd   %esi, %xmm0
        test_feature_set #avx2
        beq .Lno_avx2  /* AVX not available */
        /* Broadcast CHAR to YMM0.  */
        vpbroadcastb %xmm0, %ymm0
        ...
.Lno_avx2:
        punpcklbw %xmm0, %xmm0
        punpcklbw %xmm0, %xmm0
        ...

But the vpbroadcastb instruction must not be moved up any further, because it's only known to be available after the test_feature_set / beq pair. And that's the key question: Does LLVM, in some way, know that? My suspicion is that it doesn't — that it would cheerfully move the vpbroadcastb up and render the code unsafe to run on non-AVX processors.

Incidentally, cpuid is slow and clobbers an awful lot of integer registers. You notice neither version of memchr needs a stack frame? That wouldn't be true if it were invoking cpuid every time it's called, and that's a big point in favor of the ifunc (i.e. "function pointers set up at the beginning of main") approach to architecture-specific code.
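In Rust, without compiler or linker support for ifunc, a crate can approximate the same "pay for detection once" behavior by caching a function pointer in an atomic. A rough sketch, where the detection and the accelerated body are hypothetical scalar stand-ins (a real crate would query CPUID once and have a real #[target_feature] AVX2 version):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

type MemchrFn = fn(&[u8], u8) -> Option<usize>;

// Stand-in implementations: a real crate would have an AVX2 version
// annotated with #[target_feature] and an SSE2 fallback.
fn memchr_avx2(haystack: &[u8], needle: u8) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}
fn memchr_sse2(haystack: &[u8], needle: u8) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}

// Hypothetical run-time detection; a real crate would execute CPUID here.
fn cpu_has_avx2() -> bool {
    false
}

// 0 means "not yet detected"; otherwise holds the chosen fn pointer.
static MEMCHR_PTR: AtomicUsize = AtomicUsize::new(0);

pub fn memchr(haystack: &[u8], needle: u8) -> Option<usize> {
    let cached = MEMCHR_PTR.load(Ordering::Relaxed);
    let f: MemchrFn = if cached == 0 {
        // First call: run detection once and cache the result, so the
        // slow CPUID sequence is not re-executed on every call.
        let chosen: MemchrFn = if cpu_has_avx2() { memchr_avx2 } else { memchr_sse2 };
        MEMCHR_PTR.store(chosen as usize, Ordering::Relaxed);
        chosen
    } else {
        // Fn pointers and usize have the same size on supported targets.
        unsafe { std::mem::transmute::<usize, MemchrFn>(cached) }
    };
    f(haystack, needle)
}
```

Unlike a true ifunc, each call still pays one atomic load and an indirect call, but the expensive detection runs only once per process.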


… I don’t understand why that hyperlink isn’t getting HTMLified. Sorry.

So I've pushed a v2 version of the RFC to the RFC repo; it would be great if we could keep discussing this there, to avoid having to follow two different discussions. That version of the RFC resolves the unsafety issue, ABI issues, inlining issues, and run-time detection issues, so if you can re-read the RFC and give me feedback, that would be greatly appreciated.

Here is the link: RFC: target_feature by gnzlbg · Pull Request #2045 · rust-lang/rfcs · GitHub


@zackw

One possibility is that you are just saying that LLVM guarantees not to inline a function that is annotated with "this function uses feature X" into a function that doesn't have this annotation.

Not necessarily: if foo is annotated with sse4 and you call it from bar, which is annotated with avx, then LLVM can safely inline foo into bar just fine, because sse4 is a subset of avx. If the functions are annotated the other way around, then you are correct, and the function won't be inlined by LLVM. LLVM doesn't do any more complex analysis than that: the target features are not "compatible", so the function is not inlined, period.

This would still mean that we are on our own for detecting features and we cannot have a feature-dependent block in the middle of a feature-independent function.

This is correct. It is explained in the #[target_feature] section of the new RFC, and possible ways to allow this are discussed in the Alternatives section of the RFC (relaxing inlining).

The other possibility is that LLVM actually does have a "detect runtime presence of feature" intrinsic, tracks feature-dependency at the level of individual machine instructions, and won't hoist a feature-dependent instruction past a detection intrinsic.

No, your first guess was correct, this is not true. For run-time feature detection we are on our own (need our own library, or need to link to a C library, etc.).

Does LLVM, in some way, know that?

Once LLVM or the assembler sees code for a particular target feature, it assumes that this code can only be reached on a host that supports the feature (because reaching that code on a host that does not support the feature is undefined behavior in LLVM, the assembler, and possibly the hardware).

The LLVM inliner just doesn't inline across mismatching features. Because LLVM does not support __target__ on blocks, but only on functions, this is enough to guarantee correctness. Anything more complicated probably requires improving LLVM. We might be able to work around this limitation in rustc, but this hasn't been explored in much depth.


Again, it would be great if we can move the discussion to the RFC repo for the time being; here is the link: RFC: target_feature by gnzlbg · Pull Request #2045 · rust-lang/rfcs · GitHub

Note that LLVM already has the code somewhere, as it needs it to implement -march=native. So maybe it would be useful to ask the LLVM developers if that code can be turned into something that's easy to embed in compiled output? It would be really nice if the output of #[target_feature] was guaranteed to be the same as that of cfg!(target_feature) when -C target-cpu=native is used.

@pedrocr

Note that LLVM already has the code somewhere, as it needs it to implement -march=native. So maybe it would be useful to ask the LLVM developers if that code can be turned into something that's easy to embed in compiled output?

Note that LLVM can cross-compile to many targets, but LLVM itself only runs on a few of them, so I would expect the -march=native code to, at best, only handle those targets that LLVM itself runs on. GCC has custom code for doing run-time feature detection in binaries when doing function multi-versioning.

So yeah, it would be nice, but 1) doing run-time feature detection isn't that hard either, and 2) the run-time detection footprint in a binary must be as small as possible (e.g. if you are targeting ARM you don't want to include run-time detection code for x86).
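Point 2) follows naturally from conditional compilation: the detection code for each architecture can be cfg-gated so it never appears in binaries for other targets. A trivial sketch (the function name is illustrative, and the CPUID query is stubbed out):

```rust
// Only compiled into x86-64 binaries; a real implementation would
// execute the CPUID instruction here.
#[cfg(target_arch = "x86_64")]
fn runtime_has_feature(_name: &str) -> bool {
    // ... query CPUID and check the corresponding feature bit ...
    false // stub: assume the feature is absent
}

// Other architectures get their own detection (or none at all), so
// e.g. an ARM binary carries no x86 CPUID code whatsoever.
#[cfg(not(target_arch = "x86_64"))]
fn runtime_has_feature(_name: &str) -> bool {
    false // stub: no detection implemented for this target
}
```

Only one of the two definitions exists in any given binary, which keeps the run-time footprint minimal.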


EDIT: @pedrocr see this comment: https://github.com/rust-lang/rfcs/pull/2045#issuecomment-311643572

Would it work to instead define vendor intrinsics with a #[require_target_feature] attribute, with the same effect as #[target_feature], but without the "unsafe" restriction and also only being callable from functions marked with a matching #[target_feature]?

Such a distinction could possibly also help with ABI issues: let's say that #[target_feature] functions get the standard ABI, while #[require_target_feature] functions are allowed to use fancy calling conventions and such, and taking addresses to #[require_target_feature] functions could be forbidden as well. Or whatever is useful.

No. The whole point of #[target_feature] is being able to call those functions from functions that aren't marked with #[target_feature]. (Think about what you need to do for runtime dispatching.)

Sure, we’d keep #[target_feature], and let such functions be called from anywhere (with unsafe, as it looks right now). We’d just restrict the ability to call intrinsics functions to #[target_feature] functions, meaning you might need to write a #[target_feature] wrapper function.

At least right now one doesn't need to use #[target_feature] to call some target-specific intrinsics that invoke undefined behavior on the wrong targets.

I think that one needs to use #[target_feature] for SIMD intrinsics to:

  • deal with the ABI of LLVM's portable vector types,
  • allow LLVM to emit non-portable code when using portable arithmetic expressions like a + b on them.

So while I don't think that what you propose is a bad idea (it looks like better type-checking/errors to me), I also don't know how it fits with these other types of intrinsics that do not require #[target_feature], nor how it fits with portable vector intrinsics, which do not require #[target_feature] at all (they are portable). Basically, these other intrinsics are implicitly annotated with #[target_feature] by LLVM.