Pre-RFC: stabilization of target_feature

Yes, this is pretty much it. The only difference, I think, is that I'm suggesting that the syntax work on if/else blocks so it's easy to combine multiple features (e.g., the SIMD levels). That could still be implemented by having each block's contents be a function, with the selection of which block will be run decided at startup. So something like:

fn main() {
  if cfg!(runtime_target_feature = "avx") {
    println!("foo");
  } else {
    println!("bar");
  }
}

would get turned into:

fn block1() { println!("foo"); }
fn block2() { println!("bar"); }
static mut BLOCK: fn() = block2;

fn main() {
  // this would be more complicated logic if there were several if/else blocks
  if cpu_features::detect_avx() { unsafe { BLOCK = block1; } }

  // reading a `static mut` also requires unsafe
  unsafe { BLOCK() };
}
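For readers following along on current Rust: the hypothetical cpu_features::detect_avx() check exists on stable x86_64 as the is_x86_feature_detected! macro, so a runnable version of this desugaring sketch (block1/block2 are the same placeholder bodies, select is an illustrative name) might look like:

```rust
fn block1() -> &'static str { "foo" } // the would-be AVX-compiled block
fn block2() -> &'static str { "bar" }

// Pick an implementation once, based on a runtime CPU check.
fn select() -> fn() -> &'static str {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx") {
            return block1;
        }
    }
    block2
}
```

Calling `select()()` then runs whichever block matches the CPU, which is the same shape as the BLOCK function pointer above, just without the mutable static.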

Can you give a more concrete example of what common usage looks like to you? When you write:

... some code here that uses avx2, through the simd crate,

I really don't know what you are implying. Is that ... supposed to call some function from the SIMD crate? What does that function look like? Should that function fail to compile when called in the other branch of the if? Why do you need cfg! to call a function from the SIMD crate? (Note: you don't need it to call any function in the SIMD crate.)

... intrinsics

Calling intrinsics is unsafe and can be done independently of cfg!(target_feature). So, really, you need to be more concrete here. Are you implying that whether an intrinsic can be called or not should be checked via target features?
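To make that concrete, here is a minimal sketch on current x86_64 Rust using real std::arch intrinsics (the function name add is illustrative): the intrinsic calls are unsafe either way; the runtime check is what justifies the unsafe block.

```rust
// Calling an AVX intrinsic is unsafe regardless of any cfg! check;
// a runtime CPU check is what makes the unsafe block sound.
#[cfg(target_arch = "x86_64")]
fn add(a: f32, b: f32) -> f32 {
    if std::arch::is_x86_feature_detected!("avx") {
        use std::arch::x86_64::*;
        // Sound only because we just verified the CPU supports AVX.
        unsafe {
            let va = _mm256_set1_ps(a);
            let vb = _mm256_set1_ps(b);
            // Extract lane 0 of the vector sum.
            _mm256_cvtss_f32(_mm256_add_ps(va, vb))
        }
    } else {
        a + b
    }
}
```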


@zackw

So does this hypothetical binary, which I believe is what @pedrocr is proposing:

There is a difference: in my example foo can be inlined because it is not behind a function pointer; in your example it cannot be inlined in general (unless the compiler inserts a run-time branch).

IMO what you want is a language feature for ifunc, while this is a language feature for C's __attribute__((target("..."))). They are different and serve different purposes. Just because we can emulate ifunc on top of target_feature doesn't mean we should.

OTOH ifunc only works on Linux, while an emulation on top of target feature works on all platforms that LLVM supports.

Any code will work. I mentioned the SIMD crate because if you use things like f32x4 with AVX enabled, AVX instructions get generated. For my use, both branches would have the same code; it's just that one is compiled with AVX and the other is not. The only difference between the branches is that in one LLVM is allowed to use AVX instructions and in the other it is not. Everything else is the same.

No, just saying that if you know that you're in a branch with AVX enabled you can now use an intrinsic and know it won't crash at runtime with a SIGILL.

Ok, so my question is the following: why does the code in the two branches need to differ? Why can't it be the same?

Example: you write:

if cfg!(runtime_target_feature = "avx2") {
  ... some code here that uses avx2, through the simd crate, intrinsics or just letting the compiler optimize ...
} else {
  ... some equivalent code that doesn't use avx2...
}

but this is a run-time check, not a compile-time one, IIUC.

If this were a compile-time check, I can understand how you want to generate different code for different branches:

fn foo() {
  if cfg!(compile_time_target_feature = "avx2") {
    ... do something with avx2 unsafe intrinsics ...
  } else {
    ... do something else without avx2 and safe code ...
  }
}

// somewhere else call different versions of foo at run-time:
if cfg!(run_time_target_feature = "avx2") {
  foo()  // calls version of foo compiled with avx2 enabled
} else {
  foo() // calls version of foo compiled without avx2
}

But then my questions are: what if foo comes from a third-party library that doesn't want to give you the code? You cannot recompile it with avx2 enabled, so how can this library expose, in its ABI, functions specialized for different target features? (That's what #[target_feature = ...] as a function attribute does: it instructs the compiler to generate these functions so that they can be used at ABI boundaries.)
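For reference, this is roughly the shape the attribute enables; a hedged sketch on current x86_64 Rust, with sum_avx2 and sum_fallback as illustrative names and is_x86_feature_detected! assumed for the run-time check:

```rust
// One algorithm compiled twice: the attribute lets LLVM use AVX2
// (e.g. for auto-vectorizing the loop) in this copy only.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(xs: &[f32]) -> f32 {
    xs.iter().sum()
}

fn sum_fallback(xs: &[f32]) -> f32 {
    xs.iter().sum()
}

// Run-time dispatch at the ABI boundary: callers just call `sum`.
fn sum(xs: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            // Sound: the CPU was just verified to support AVX2.
            return unsafe { sum_avx2(xs) };
        }
    }
    sum_fallback(xs)
}
```

A library can export only `sum` and keep the specialized copies private, which is one answer to the ABI-boundary question above.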

Second, why do you need that run-time if at all?

Can’t you just say:

#[ifunc("default; sse4; avx; avx2")] // compiles 4 versions of foo
fn foo() {
  if cfg!(compile_time_target_feature = "default") {
    // provide a default implementation without vectorization
  } else if ... {
    // provide some other versions without the SIMD crate
  }
}

// somewhere else:
foo(); // calls best implementation at run-time

The compiler could then, on binary initialization, set some function pointers based on target features, so that you just need to call foo() and the best implementation is always chosen at run-time.
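In the absence of a language-level ifunc, this resolver pattern can be approximated in library code. A minimal sketch using a lazily initialized function pointer; foo_avx2 and foo_default are placeholder implementations (in real code the first would be a feature-specialized copy):

```rust
use std::sync::OnceLock;

// Placeholder implementations; in real code the first one would be
// compiled with AVX2 enabled.
fn foo_avx2(x: u32) -> u32 { x + 1 }
fn foo_default(x: u32) -> u32 { x + 1 }

// ifunc-style dispatch: resolve once, then every call goes through
// the cached function pointer.
fn foo(x: u32) -> u32 {
    static IMPL: OnceLock<fn(u32) -> u32> = OnceLock::new();
    let f = IMPL.get_or_init(|| {
        #[cfg(target_arch = "x86_64")]
        {
            if std::arch::is_x86_feature_detected!("avx2") {
                return foo_avx2;
            }
        }
        foo_default
    });
    f(x)
}
```

Unlike a real ifunc this resolves lazily rather than at load time, but every call after the first is a plain indirect call, which is the cost model discussed below.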

I agree that this is useful, but this is ifunc, not target_feature, and the main difference is that this call to foo doesn’t have the same cost as one call to foo using target feature.

Inlining a CPU-feature-dependent function into a CPU-feature-independent one is very, very dangerous, because there is then nothing to prevent the compiler from hoisting feature-dependent instructions (e.g. vector constant loads) out of the control-flow subgraph guarded by the runtime feature check. I do not know anything about the guts of LLVM, but I would expect it to have specific logic in its inliner to prevent it from doing this.

For the same reason, I don't think the runtime version of if cfg!(feature) is feasible.

It can but it doesn't have to. You may want to write code that's more amenable to one feature or another.

Simple enough, just use the same construct inside the library, so that when you call `somecrate::foo()` it will also runtime-dispatch between avx or no avx.

I think you're trying to replicate the exact same features as exist in C, when those don't even work very well. My suggestion was for something more ergonomic and consistent between runtime and compile-time feature selection. ifunc is just an implementation detail.

Indeed, @burntsushi I guess this is what you meant with whether the "unsafety" of a #[target_feature] function can leak out of the function.

Oh, why not? In my example it gets converted to function calls and those guarantee the proper boundaries, no?

Unless we write our own compiler backend, we are very much restricted to the functionality and semantics that LLVM supports, which sadly means what clang supports, which in turn emerged from an effort to be compatible with GCC.


@pedrocr I think it would be great if you could re-formulate your proposed solution with the exact semantics on code generation, run-time execution, type-checking, etc. that you want for each case, since the one you posted above was a start but has crystallized further in the discussion. (If you are not sure about the level of detail required, maybe mention which Rust code your abstractions desugar to.)

@gnzlbg isn’t the block1/block2 case I wrote enough? What else are you looking for or what doubts does that give you?

I think this would also depend on whether the generated LLVM IR explicitly uses platform-specific intrinsics (or even asm! directly), or LLVM's generic intrinsics. LLVM can hoist its own intrinsics out of these ifs without problems (although this probably isn't what the user wanted).

The only fix I can think of is making these functions #[inline(never)], but this is a huge hammer.

Yes, if you always outline such blocks into #[inline(never)] functions, it should be safe.
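A sketch of that outlining approach, assuming x86_64; the loop bodies here are identical placeholders, since the point is only that each branch sits behind a non-inlinable call so feature-specific instructions cannot be hoisted above the check:

```rust
// Each branch is outlined into its own #[inline(never)] function, so
// the compiler cannot move feature-dependent instructions above the
// runtime feature check in `process`.
#[inline(never)]
fn double_avx(data: &mut [f32]) {
    // In real code this copy would be compiled with AVX enabled.
    for x in data.iter_mut() {
        *x *= 2.0;
    }
}

#[inline(never)]
fn double_scalar(data: &mut [f32]) {
    for x in data.iter_mut() {
        *x *= 2.0;
    }
}

fn process(data: &mut [f32]) {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx") {
            return double_avx(data);
        }
    }
    double_scalar(data);
}
```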

All of the above can and will be hoisted, because the compiler is not aware that the feature-requiring instruction has a "dependency" (in the compiler-jargon sense) on the feature check.

I'm afraid it is the only hammer in town.

That seems fine to me. To get good speedups, just make sure to do the checks sufficiently far outside the hotter bits of code that you don't pay a bunch of function-call overhead.

@pedrocr you wrote:

This is clear: if cfg!(target_feature = "some feature") { is a macro that expands at compile-time to either true or false. This enables conditional code generation. Here the question of whether the compiler can hoist target-specific instructions out of the if remains open. The compiler shouldn't do this. It has the same semantics as cfg!(target_feature = "some feature") in the RFC.

IIUC this macro expands to some runtime cpuid code, but it doesn't have any semantics beyond that, right? That is, the value of cfg!(target_feature) is the same in both branches. Is this correct? Also, you can call whatever code you like in both branches; nothing is checked at compile-time, and at worst you get a segfault. Is this correct? This is basically the same as calling a function from an external crate, like in the cpuid examples of the RFC.

This has the same semantics of #[target_feature] in the RFC.

So IIUC the only change you want is for some sort of run-time detection to be added to the RFC, whether it happens via cfg!(runtime_target_feature = "feature") or std::cpu::has_feature("feature") is just a minor syntactic detail right?

It does. In the if branch the feature is enabled for code generation, in the else branch it's not.

So what happens when I call a function in the branch where the feature is enabled? Is the feature enabled for that function (e.g. is a copy of the function generated with the feature enabled)? If yes, what does this do when the source code is not available, e.g. at ABI boundaries? Can it somehow select a function based on #[unconditional_target_feature]?

If it cannot "see" through functions, and it doesn't enable cfg!(target_feature), there is no way for the code inside the branches to know about the feature and implement feature-dependent behavior. Doesn't the code look identical in both branches then? If yes, how do you avoid this code repetition? If not, what is the difference?

Nothing, it's a function call. Unless the feature that you've enabled changes something about how to do function calls it will be just the same call.

Sometimes it can be, sometimes it won't be. As a programmer you know that the first block has the feature enabled and the second doesn't, and you can take advantage of that. Maybe that's writing a different algorithm that's more amenable to the feature. Maybe that's using an unsafe intrinsic that you now know will be available...

I see what you mean there, and maybe enabling that flag would be a good idea. Since you're in a block that has the feature enabled, maybe this makes sense:

if cfg!(runtime_target_feature="avx") {
  if cfg!(target_feature="avx") {
    ... your code here ...
  } else {
    unreachable!();
  }
} else {
  ... some other code here...
}

Now obviously this case makes no sense, as you already know the check works; but if the target_feature check is somewhere inside a macro that gets expanded out, it makes sense that it would evaluate to true when it's inside the runtime_target_feature true branch.

How do you abstract this code between run-time and compile-time detection? You want to write this code once, but be able to select the proper branch at either compile-time or run-time.


Can you maybe check the "How do we teach this?" section of the RFC and let me know what you think?

The only difference AFAIK is that instead of using if cfg!(runtime_target_feature ...) it uses if std::cpuid::has_target_feature, but semantically it should be very similar to what you are getting at.

It's basically a three layered approach:

  • compile-time detection is used to implement the algorithms

  • unconditional_target_feature is used to force code generation using particular features

  • some run-time detection library is used to dispatch on these unconditional_target_features

And then any kind of sugar that makes this easier can be provided as procedural macros.

@gnzlbg, Nice work, just a few comments from me.

Does that mean all features that are exposed have to be additive? I.e., a subtractive feature like the ARM LLVM feature d16 would have to be exposed by Rust as +d32 = -d16 (I'd be fine with that renaming). What about mutually exclusive features like +/-thumb-mode on ARM? I guess mutually exclusive ones could be their own attributes, like #[thumb_mode] and #[arm_mode] as GCC does in this case; however, there is already a precedent for mutually exclusive target features on the command line with crt-static. Also, if - isn't allowed, then we can remove the + as it's redundant.

+1 from me here.

One thing I think this RFC needs to address is how #[target_feature] and cfg(target_feature) should interact. I.e., should cfg(target_feature) know about the extra features if it is used in a #[target_feature] function? IMO, yes. See "`cfg_target_feature` and `target_feature` don't interact properly" · Issue #42515 · rust-lang/rust · GitHub for some discussion.

Sorry, a huge amount of discussion has happened on this thread since I last got a chance to read it over. Is there perhaps a tl;dr of why the RFC recommends requiring unsafe for #[target_feature] usage? I continue to personally feel that'll essentially kill SIMD usability in Rust.