Pre-RFC: stabilization of target_feature

Oh, I'm well aware. My point is that the compiler will already have to include the runtime code to determine feature availability in order to implement target-cpu=native, so we might as well have a single, well-defined, and consistent implementation of it. Otherwise you'll have things that target-cpu=native will enable but the equivalent #[target_feature] will not, and vice versa, which is not ideal from a platform-consistency standpoint. Doing it also solves the SIGILL problem by having that detection handled in a well-defined way by Rust instead of in a poorly understood, platform-specific way by the OS/CPU that may or may not work.

All the more reason to have a single well-implemented and platform-consistent way of doing it, instead of having the compiler and the runtime implementation do different things. If you're implementing the feature, having to also implement a consistent way to detect it would be a great guarantee for the platform to provide. If you standardize avx2, why not also standardize the runtime way to detect avx2, so that all the bits of the compiler and platform work consistently?

The compiler uses the LLVM target components for target-feature=..., #[target_feature], and cfg!(target_feature), so I think this part is done consistently.

Third party crates are allowed to use LLVM for run-time feature detection as well (just like rustc does). As such crates appear, mature, and we get more experience, we will be able to tell whether using LLVM for run-time feature detection is a good idea or not. Right now, those crates don't exist, so we don't know.

Or in other words, yes! It would be nice to have an official crate for run-time feature detection that is lightweight, accurate, fast, and consistent with rustc. This crate currently doesn't exist, and there is no reason why it cannot evolve out-of-tree as a third-party crate [*], like many others do. Since we have zero experience with this in Rust, it is also unclear what the API should look like and what the trade-offs are. So I would say, let's wait and see what happens. We can always move a fundamental crate into the nursery later, or even use it from within the compiler.

[*] Maybe a reason could be that this third-party crate wants to use the exact version of LLVM that rustc is using, but the solution then is not to move the crate into rustc, but rather to expose LLVM somehow through, e.g., cargo, so that third-party crates can use it.

Okay, that convinced me. I now think it's a bad idea to require all #[target_feature] to be unsafe, and we should do something else instead.

My main motivating issue for that demand was in fact not the potential memory unsafety, but the issue that you now somehow have to do the dispatch yourself, and the potential footgun of omitting the check. Of course, this wouldn't be the first place where we unwrap or abort if there is some error (like when the OS gives us no more memory, or when we recurse too deeply).

So the reasoning behind requiring it to be unsafe is that it makes it harder to use the feature without a wrapping library that does the dispatch for you, or similar means.

That being said, unsafe is, as you outlined, not really the means to do that. I'll think a little and maybe I'll come up with some nice proposal.

I recently ran my lewton benchmarks (one of my performance checks) with RUSTFLAGS="-Ctarget-cpu=native" and to my surprise found a 20% speedup solely from enabling that flag. This got me thinking: what about taking the main performance-critical components of my library and compiling them with different #[target_feature] annotations, and then dispatching on them in very cold code that gets called very seldom?

In order to do this, it would be really cool if the #[target_feature] attribute worked on modules as well, inferring #[target_feature] for all the functions inside.

I don't know if target_feature is going to apply to the avr-rust effort, but: on most avr devices (like the Atmega128 introduced in 2002), opcodes 0xa000-0xafff decode to LDD/STD instructions; however, there is a "reduced core" instruction set, used in Atmega ATTiny10-type devices (introduced in 2009), where 0xa000-0xafff decode to the similarly-named but differently functional LDS/STS instructions. (the LDS opcode fetches from a 7-bit offset into SRAM; LDD takes a 6-bit offset which is added to the Y register to address the SRAM)

(Also, in general, AVR CPUs do not have any kind of "illegal instruction" signal.)


My point is that if you make target_feature enforce the run-time detection then every time you stabilize a new feature you also require stabilization of the runtime check. Whoever codes the feature also needs to take that into account. If you leave that to external crates the runtime detection will lag the features and generally be less well defined. It may also allow stabilization of features that are just impossible to runtime detect. It can become a mess quite quickly.

My suggestion was to not have new API at all. Instead change the semantics of target_feature to be "generate this block of code with the target feature enabled and only run it if it's available at runtime". It would be the runtime equivalent of cfg_target_feature that also does feature detection. In my view it would be a much more consistent way of handling it. Both features would mean "run this code with feature X enabled if feature X is available", one at compile time, one at runtime. The runtime detection would also be exactly the same making for a much more consistent developer experience.

@pedrocr How do you propose that this is implemented?

The reason nobody has done this is that we don’t have a good way to implement this. Rustc uses LLVM for feature detection. Embedding LLVM in the runtime of Rust’s binaries is not an option (since it would significantly increase binary size). Also, there is no way to check for this without incurring a run-time cost when calling functions that use target feature, but the whole point of target feature is to improve performance. These things are in tension. Even if it can be done, there would need to be a way to use target feature without these checks. So I don’t see why this cannot be done later, as a QoI issue for debug builds.

@gnzlbg what I’m suggesting is that for someone to submit a feature to rust two bits of code need to be provided:

  1. Something that enables that feature for a section of code when LLVM is called for code generation
  2. A test to see if the current machine has the feature (that doesn’t depend on LLVM)

Then with that do the following:

  • Use 1) to enable the feature in code generation both for cfg_target_feature and target_feature
  • To implement cfg_target_feature run 2) at compile time to check the feature
  • To implement target_feature embed 2) in the compiled executable, running it before main() and putting the result in a global variable, then emit the code around a big if GLOBAL_FEATURE_X {...} block (sketched below).
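
A conceptual sketch of what that lowering could look like (runtime_detect_avx2 is a hypothetical stand-in for the per-target detection routine from item 2; this is an illustration, not an implemented scheme):

// Conceptual sketch only: the compiler would run the detection before main()
// and cache the result in a global, then gate the feature-enabled body on it.
static mut GLOBAL_FEATURE_AVX2: bool = false;

fn runtime_detect_avx2() -> bool {
    false // placeholder: a real check would query the CPU, e.g. via cpuid on x86
}

// imagined to run before main(), e.g. from the program's startup code
fn detect_features() {
    unsafe { GLOBAL_FEATURE_AVX2 = runtime_detect_avx2() };
}

// the body of a #[target_feature = "+avx2"] function would be wrapped like this:
fn foo() {
    if unsafe { GLOBAL_FEATURE_AVX2 } {
        println!("running the AVX2-enabled code path");
    } else {
        // with the placeholder above this path is taken: a clean panic instead of a SIGILL
        panic!("avx2 is not available on this CPU");
    }
}

fn main() {
    detect_features();
    foo();
}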

As for the performance reasons, they are the same whether this is in a crate or in rustc. And it seems like a much safer setup to have target_feature always imply that large if block, so that you don't have to depend on SIGILL to crash safely instead of doing weird things.

As for doing target_feature without the check, I don't really see why you'd want to use it without some form of runtime detection. If you are fine with your program crashing with illegal instructions, just compile it with the feature enabled and be done with it. But it would be trivial to have a third version that has neither compile-time nor runtime checks and just means "enable this feature for this section of code". Maybe someone wants fine-grained control over which sections use SSE and which don't, even though the instructions are always available?

A couple of remarks:

  • this assumes that such functionality exists for all targets but it does not; some targets do not allow querying of features at run-time.

  • we would need such framework for each target / OS that LLVM supports, since how to query this information is target dependent. There are a couple of frameworks like libcpuid (the cpuid Rust crate) available, but they at most support x86 and ARM; libcpuid only supports x86 and x86_64.

  • such a framework must be kept in sync with rustc backends updates (LLVM, Cretonne) since they can add new targets that the framework doesn't support.

  • the existing frameworks are not guaranteed to produce consistent results with LLVM. Only LLVM is guaranteed to produce consistent results with itself; I don't see a way around that. For the Cretonne backend we would need whatever feature detection Cretonne uses.

So... given that this cannot be done for all targets, and that it is very unclear to me if it is even possible to do this consistently with rustc backends without incurring a huge cost (more on the cost below), I think that the best solution is to try to provide this in debug builds (or some other configuration) on a best effort basis.

As for doing target_feature without the check I don't really see why you'd want to use it without some form of runtime detection.

I don't think this makes much sense either.

As for the performance reasons they are the same if this is in a crate or in rustc.

Not really. Users must do explicit run-time feature detection anyways, but they can do it in ways that the compiler cannot. If we insert automatic checks, all calls to functions using #[target_feature] look like this:

// (pseudocode) the detection runs only once, but the check runs on every call:
static has_feature: bool = detect_feature_xyz_at_runtime();  // executed only once
if has_feature { function_with_feature_xyz(); } else { panic!("feature xyz not available"); }

Doing this on every function call is unacceptable. A user can do way better, for example, by performing a global initialization for its own library that does the check once, sets some function pointers, and then just dereferences those function pointers without doing any kind of check. The compiler cannot do this because it doesn't know which functions will actually be called, so it must do the tests before each call.
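
For illustration, a minimal sketch of that pattern: pay for the detection once, then only an indirect call. detect_avx2 and the sum_* functions are hypothetical stand-ins; a real version would put #[target_feature] on the accelerated function.

fn sum_fallback(xs: &[f32]) -> f32 {
    xs.iter().sum()
}

fn sum_avx2(xs: &[f32]) -> f32 {
    // stand-in for an AVX2-accelerated implementation
    xs.iter().sum()
}

fn detect_avx2() -> bool {
    false // placeholder for a real cpuid-based run-time query
}

// done once, e.g. during library initialization
fn select_sum() -> fn(&[f32]) -> f32 {
    if detect_avx2() {
        sum_avx2 as fn(&[f32]) -> f32
    } else {
        sum_fallback
    }
}

fn main() {
    let sum = select_sum(); // run-time detection happens exactly once
    let data = [1.0f32, 2.0, 3.0, 4.0];
    println!("{}", sum(&data[..])); // later calls carry no feature check at all
}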

I think that it would be great if there was some good library for detecting run-time features. As it is now, first we'll get some libraries that only work on some platforms (like the cpuid crate), and maybe in the future those will be wrapped by other libraries that try to provide a cross-platform API. Maybe then one of those can be moved into the nursery, and even used by rustc in some kind of hardened build that provides better diagnostics on logic errors like calling the wrong function on the wrong target. But given that doing a 100% perfect job is not possible, we really cannot guarantee that users will get a nice error message (or even an error message at all) if they call the wrong #[target_feature] function on some platform. As long as program termination is ensured, we can make #[target_feature] safe; but if on some platform (1) it is impossible to detect target features at runtime, and (2) using an illegal instruction leads to undefined behavior in the CPU (e.g. on AVR), which might lead to memory corruption, I really do not see an alternative to making #[target_feature] require unsafe.

I understand that detection may not be trivial for all features, which is exactly why I think this needs to be done by rustc and not left up to the user. If a feature can't be detected, it shouldn't be available. If a feature can't be detected on a specific target, it needs to be disabled for that target. And that doesn't seem hard at all: have the feature detection code default to false, and only return true if someone provides working runtime detection for that feature on that platform.

Keeping in sync with LLVM doesn't seem to be an actual problem either. You only need LLVM to generate code with the feature enabled. If LLVM detects a feature that your code doesn't, you're just missing an opportunity to speed up; if LLVM doesn't detect a feature but you do, and that's not a bug, then more power to you. If you have a bug and are detecting features that don't actually exist, that bug needs fixing.

Maybe I'm not reading this right, but it seems to me the compiler can and should do exactly this: do the test at startup and redirect the relevant functions. In the GNU toolchain that's done with ifunc:

http://www.agner.org/optimize/blog/read.php?i=167

Doing a 100% perfect job is only impossible if features are allowed to be stabilized without proper detection. I would simply reject features that don't provide a proper runtime check.

I don't think this is an issue, because these platforms are already broken today. If you just compile normally and run that binary on a machine that exhibits that undefined behavior, you already get this problem without any unsafe usage.


So here goes v2 of the RFC, which hopefully includes all the feedback (let me know if it doesn’t). I went with the #[target_feature] unsafe fn variant due to the issues mentioned in the alternative section. We can always make it safe later.

Pre-RFC v2: target_feature

Note, both cfg!(target_feature) and #[target_feature] are already available on nightly, but no RFC was ever submitted for them. This RFC attempts to lay down their behavior so that they can be stabilized in the near future.

Summary

Platforms like x86_64 and ARM provide a "default" set of operations that each CPU supports. Extra operations, like AVX vector instructions, are only available on a subset of x86_64 CPUs. This RFC proposes extending Rust to allow conditionally generating and executing code for a particular architecture depending on which "features" the target supports:

  • Language constructs:

    • conditional compilation: cfg!(target_feature = "feature_name"): the result of the cfg! macro is true if the target CPU supports the target feature feature_name (e.g., cfg!(target_feature = "sse4") returns true if the target supports the SSE4 instruction set)
    • code generation: #[target_feature = "+sse2"] unsafe fn foo(...): the function attribute #[target_feature] instructs the compiler to use SSE2 instructions when generating code for the function foo
  • Backend options:

    • rustc -C --target-feature=+sse2: instructs the compiler to use SSE2 instructions when generating code for a crate
    • rustc -C --target-cpu=native: instructs the compiler to use all available features of the host target when generating code for a crate

Detailed design

The default set of target features enabled for a particular architecture is implementation defined. This RFC proposes adding the following two constructs to the language:

  • the cfg!(target_feature = "feature_name") macro, and
  • the #[target_feature = "+feature_name"] function attribute for unsafe functions.

Only stabilized feature_names can be used with these constructs. This RFC proposes that each feature name is gated behind its own feature gate: target_feature_feature_name (that is, to use a target feature on nightly, crates must write #![feature(target_feature_avx2)]).

1. cfg!(target_feature)

The cfg!(target_feature = "feature_name") macro allows querying specific hardware features of the target at compile-time. This information can be used for conditional compilation or conditional code execution:

// Conditional compilation:
#[cfg(target_feature = "bmi2")] {
    // if the target has the BMI2 instruction set, use a BMI2 instruction:
    unsafe { intrinsic::bmi2::bzhi(x, bit_position) }
}
#[cfg(not(target_feature = "bmi2"))] {
    // otherwise call an algorithm that emulates the instruction:
    software_fallback::bzhi(x, bit_position)
}

// Conditional code execution:
if cfg!(target_feature = "bmi2") {
    // if target has the BMI2 instruction set, use a BMI2 instruction:
    unsafe { intrinsic::bmi2::bzhi(x, bit_position) }
} else {
    // otherwise call an algorithm that emulates the instruction:
    software_fallback::bzhi(x, bit_position)
}

The macro cfg!(target_feature = "feature_name") returns:

  • true if the feature is enabled,
  • false if the feature is disabled.

2. #[target_feature]

The unsafe function attribute #[target_feature = "+feature_name"] extends the feature set of a function; that is, the implementation is allowed to use more target features when generating code for that function. Removing features from the default feature set using #[target_feature = "-feature_name"] is not allowed. Calling a function on a platform that does not support the feature set of the function is undefined behavior.

#[target_feature = "+avx"] 
unsafe fn foo_avx(...) { ... } 

#[target_feature = "+sse4"] 
unsafe fn foo_sse4(...) { ... } 

// Check run-time features on initialization 
// and set up safe to call fn ptr:
let global_foo_ptr = initialize_foo();
fn initialize_foo() -> fn (...) -> ... {
    let info = cpuid::identify().unwrap();
    match info {
        info.has_feature(cpuid::CpuFeature::AVX) => unsafe { foo_avx }, 
        info.has_feature(cpuid::CpuFeature::SSE4) => unsafe { foo_sse4 }, 
        _ => unreachable!()
    }
}

// Dispatch on function call:
fn foo() -> ... {
    let info = cpuid::identify().unwrap();
    match info {
        info.has_feature(cpuid::CpuFeature::AVX) => unsafe { foo_avx(...) }, 
        info.has_feature(cpuid::CpuFeature::SSE4) => unsafe { foo_sse4(...) }, 
        _ => unreachable!()
    }
}

ABI mismatches with #[target_feature]

Functions with different #[target_feature]s might have different ABIs for passing and returning function arguments (depending on the feature, the registers used for this might differ).

The implementation must ensure that this cannot invoke undefined behavior; that is, it must detect such mismatches at compile time and translate arguments from one ABI to the other. Since this hasn't been implemented yet, it is unclear how to do this, or whether it can be done in all cases. Hence it is listed as an unresolved question below.

3. Backend compilation options

The implementation must provide the following two ways of talking with its code generation backend:

  • -C --target-feature=+/-backend_target_feature_name: where +/- add/remove features from the default feature set of the platform for the whole crate. The behavior for non-stabilized features is implementation defined. The behavior for stabilized features is:
    • to implicitly mark all functions in the crate with #[target_feature = "+/-feature_name"] (where - is still a hard error)
    • cfg!(target_feature = "feature_name") returns true if the feature is enabled. If the backend does not support the feature, the feature might be disabled even if the user explicitly enabled it, in which case cfg! returns false. A soft diagnostic is encouraged.
  • -C --target-cpu=backend_cpu_name, which changes the default feature set of the crate to that of backend_cpu_name.

These two options are already "de facto stabilized", that is, they work on stable Rust today. This RFC just specifies their semantics with respect to stable target features. When stabilizing a feature, a different name or a deprecation period (with a warning) should be used to avoid breaking backwards compatibility.

Drawbacks

The classic reason is that extra features increase language complexity.

If we don't solve the problem that these features solve, lots of libraries cannot be implemented efficiently outside of rustc (e.g., SIMD, bitwise manipulation instructions, AES, …). These libraries are still implementable outside of rustc by directly using asm!, but this is less efficient than using LLVM directly or indirectly to generate the intrinsics, because while LLVM can reason about its intrinsics to perform optimizations, it cannot reason about asm! (e.g., if asm! is used to perform SIMD, LLVM won't fuse vector operations when possible).

A reason not to do this as proposed in this RFC is that we don’t know how to do this safely. OTOH, the motivation to investigate safe alternatives is low if it cannot be done anyways.

Alternative designs / Unresolved questions

Make #[target_feature] safe

To make #[target_feature] safe we would need to ensure that calling functions on targets that do not support their feature set cannot result in memory unsafety.

There are two main problems:

  • platforms like AVR invoke undefined behavior in hardware when an illegal instruction is encountered (as opposed to, e.g., x86, which raises SIGILL, crashing the process and preventing memory unsafety).
  • not all platforms support run-time feature detection: x86 has pretty good run-time feature detection; other platforms like PowerPC allow querying the CPU model, but the user must know which features that model implies; other platforms do not allow querying anything at all.

This RFC conservatively makes #[target_feature] "unsafe", that is, only functions that are unsafe to call can use this attribute. If we find a solution that guarantees a lack of memory unsafety on all platforms, ideally without introducing a big run-time cost or significantly increasing binary sizes, we can always make it safe later.

An implementation might, on a best-effort basis, provide run-time instrumentation to catch calls to functions on platforms that do not support their target feature set (undefined behavior permits this).

To see why this can introduce a run-time cost, consider the following code:

use std::io;

#[target_feature = "+avx2"]
unsafe fn foo(); 

fn main() {
    let mut s = String::new();
    io::stdin().read_line(&mut s).unwrap();

    match s.trim_right().parse::<i32>() {
        Ok(i) => { 
          if i == 1337 { unsafe { foo(); } }
        },
        Err(_) => println!("Invalid number."),
    }
}

This code works fine on all x86 CPUs, unless the user passes 1337 as input, in which case it only works on AVX2 CPUs. The implementation can check whether the CPU supports AVX2 on initialization, but even if it doesn't, the program is correct unless the user inputs 1337, so the fatal check must be done when calling any function that uses #[target_feature]. Now replace 1337 with the result of some system cpuid library: there is no way the compiler can know that a result value from a third-party library corresponds to a target feature, unless we decide to encode these in the type system (which is an option not proposed here).

Never segfault on #[target_feature]

This is an extension of making #[target_feature] safe. Even if we can guarantee a lack of memory unsafety, users will still get potentially hard-to-debug crashes. It would be better if we could guarantee that they always get a panic! instead. For this we need reliable run-time feature detection, which is not possible in all cases.

Feature-dependent unsafety

Whether #[target_feature = "feature_name"] requires unsafe fn or not could depend on feature_name. For example, this would allow making all uses of #[target_feature] safe on x86 / ARM / PowerPC / MIPS /… since executing an illegal instruction is guaranteed to crash the application and hence cannot introduce memory unsafety.

Some other features like d16, which reduces the number of FP registers so the fallback path can use more of them, are inherently safe.

This extension can be pursued in a backwards compatible way.

Allow removing features from the default feature set in #[target_feature]

Allowing #[target_feature = "-feature_name"] can result in non-sensical behavior. The backend might always assume that at least the default feature set for the current platform is available. In these cases it might be better to globally choose a different default using -C --target-cpu / -C --target-feature.

Make --target-feature/--target-cpu unsafe

Rename them to --unsafe-target-feature/--unsafe-target-cpu. The rationale would be that a binary for an architecture compiled with the wrong flags would still be able to run in that architecture, but because using illegal instructions on some architectures might lead to memory unsafety, then these operations are unsafe.

Path towards stabilization

  1. Enable a warning on nightly on usage of target features without the corresponding feature(target_feature_xxx) feature flag.
  2. After a transition period (e.g. 1 release cycle) make this a hard error on nightly.
  3. After all unresolved questions are resolved, stabilize cfg! and #[target_feature].
  4. …N: stabilize those target_feature_xxxs that we want to have on stable, following either a mini-RFC in the rust-lang issues for these features or the normal RFC process.

How do we teach this?

First, these features are low-level features. We don’t expect many users to use them directly. We expect that for the most commonly used platforms, an ecosystem of cpuid crates will emerge, that will allow querying features at run-time. Some of these crates already exist, like the cpuid crate for doing this on x86. In C/C++ similar crates exist for ARM and other platforms. These libraries would also use cfg!(target_feature = "...") internally to avoid run-time checks when the binary is compiled with --target-feature/--target-cpu and the results of cpuid are known at compile-time.

Then we expect that some higher-level libraries will emerge, that will make it easy to safely generate and dispatch on different implementations of some function using different target features. For example, given:

fn foo() -> ... {}

let foo_ptr = ifunc!(foo, ["sse4", "avx", "avx2"]); 

where ifunc! is a procedural macro that creates foo_sse4/avx/avx2 implementations annotated with the corresponding #[target_feature = "+xxx"]s, and on binary initialization at run-time uses one of the cpuid crates to query the CPU for target features and sets the function pointer foo_ptr to the best implementation. Performing the check once on initialization means that calling the function pointer foo_ptr is safe and will not result in, for example, a SIGILL. These libraries could also use cfg!(target_feature = "...") internally to avoid run-time checks when the binary is compiled with --target-feature/--target-cpu and the results of cpuid are known at compile time, or rely on their cpuid libraries doing these optimizations.

Similar procedural macros could emerge for performing dispatch inline, where the cpuid flags are tested the first time a function is called. Some of these crates already exist, like the runtime-target-feature-rs crate.
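
As a rough sketch of the "check on the first call, then cache" pattern such macros might generate (the names and the detection helper below are hypothetical):

use std::sync::Once;
use std::sync::atomic::{AtomicBool, Ordering};

static INIT: Once = Once::new();
static HAS_AVX2: AtomicBool = AtomicBool::new(false);

fn detect_avx2() -> bool {
    false // placeholder for a real cpuid-based run-time query
}

fn foo_avx2(xs: &[f32]) -> f32 {
    // stand-in for an implementation that would carry #[target_feature = "+avx2"]
    xs.iter().sum()
}

fn foo_fallback(xs: &[f32]) -> f32 {
    xs.iter().sum()
}

// the public entry point: detection runs only on the first call
pub fn foo(xs: &[f32]) -> f32 {
    INIT.call_once(|| HAS_AVX2.store(detect_avx2(), Ordering::Relaxed));
    if HAS_AVX2.load(Ordering::Relaxed) {
        foo_avx2(xs)
    } else {
        foo_fallback(xs)
    }
}

fn main() {
    println!("{}", foo(&[1.0, 2.0, 3.0]));
}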

So given these expectations, we would need to teach both the low-level way in which cfg! and target_feature work, and also the easier / safer ways that exists for the most common cases.

For the low-level documentation we need:

  • a section in the book listing the stabilized target features, and including an example application that uses both cfg! and #[target_feature] to explain how they work, and
  • extend the section on cfg! with target_feature

For the higher-level documentation it would be great if we could get a couple of fundamental libraries (cpuid for x86, ARM, … and some procedural macros for the common cases) close to their 1.0 releases before these features are stabilized.

Unresolved questions

Before stabilization the following issues must be resolved:

  • Is it possible to make #[target_feature] both safe and zero-cost?

On the most widely used platforms the answer is yes, this is possible. On other platforms like AVR the answer is probably not, but we are not sure.

  • Is it possible to provide good error messages on misuses of #[target_feature] at zero cost?

The answer is probably not: just because a binary includes a function does not mean that the function will be called. Hence the only way to check whether an invalid call happens is to add a check before each function call, which incurs a potentially significant cost (e.g. in the form of missed optimizations).

  • Is it possible to automatically detect and correct ABI mismatches between target features?

Acknowledgements

@parched @burntsushi @alexcrichton @est31 @pedrocr


A binary containing functions for targets the current CPU doesn't support can run on that CPU perfectly fine as long as it doesn't call those functions. Or in other words, just because a function is in the binary doesn't mean it must be called.

The only place where the check can be reliably done is when the user attempts to call the function. The compiler might be able to hoist these checks out of loops, functions, ... up to the crate root, or it might not (e.g. if the call is behind an inlining barrier).

A user, on the other hand, knows a bit more than the compiler. E.g. the user might state in the readme of their crate that it requires AVX and AVX2, so the crate might detect this on initialization and print a nice error message if neither AVX nor AVX2 is available.
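
For example, a minimal sketch of such a crate-level check, with detect_avx2 as a hypothetical stand-in for the real run-time query:

fn detect_avx2() -> bool {
    false // placeholder for a real run-time feature query
}

// called once by users of the crate before anything else
pub fn init() {
    if !detect_avx2() {
        eprintln!("this crate requires AVX2, which this CPU does not support");
        std::process::abort();
    }
}

fn main() {
    init();
    println!("AVX2 available, continuing");
}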

If you attempt to run a binary or a function on a target that doesn't support its target features, today, on all targets, you are guaranteed undefined behavior without any unsafe rust code.

You are trying to improve this situation by always producing 100% reliable behavior when a user attempts to call a function on a target that doesn't support it (e.g. in the form of a nice panic!).

Some targets do not support this, so unless somebody finds a solution this doesn't produce 100% reliable behavior, but rather, sometimes defined / sometimes undefined behavior, which is worse than where we started.

We could provide better diagnostics on targets that support this, but I don't think we can do this on all targets.

EDIT: something like ifunc might be nice as well, but given that Rust doesn't support function overloading, and that one can easily emulate it using lazy_static! or a global static variable, I think it will probably be relegated to libraries. E.g. a library takes a list of pairs of features and functions and, using a plugin, checks the features and generates the boilerplate code for you. For example:

let global_ptr = overload_target_feature![("avx2", foo_avx2), 
    ("avx", foo_avx), ("sse4;sse3;sse2", foo_sse2)];

expands to the global initialization using unsafe from above, but also queries the compiler for the extra features enabled for foo_avx2, foo_avx, and foo_sse2, and checks at compile time that those and the ones in the strings match (emitting a compiler error otherwise). A similar macro could even allow you to provide a single implementation of foo, copy-paste it into foo_xxx versions enabling different target features on each, and do the function pointer initialization for you.

What makes you say this? ifunc is just that, check on startup, redirect the functions and don't do any checks on function call.

If that's what you need you don't need these features. Just compile with the features enabled and do the check on startup to bail out early.

In this case I was just commenting on your adding of unsafe to your RFC. The situation for which you are adding unsafe to #[target_feature] is already easy to trigger without any unsafe so there's no point to the unsafe.

My suggestion is simple. Don't enable a feature unless that specific feature on that specific target can be detected. That's 100% reliable.

ifunc doesn't have to have anything to do with the language. It's just the way to represent the feature branching in the compiled binary if your target supports that.

This is what @parched has created:

This binary works fine on all x86 platforms as long as the user doesn't input 1337 on the terminal in which case it only works on targets with AVX2:

use std::io;

#[target_feature = "+avx2"]
unsafe fn foo(); 

fn main() {
    let mut s = String::new();
    io::stdin().read_line(&mut s).unwrap();

    match s.trim_right().parse::<i32>() {
        Ok(i) => { 
          if i == 1337 { unsafe { foo(); } }
        },
        Err(_) => println!("Invalid number."),
    }
}

If you perform the check for AVX2 on binary initialization, the binary fails on all x86 CPUs that don't have AVX2. If you perform the check when foo() is called, you need to perform at least some check before each call to a function carrying a target_feature annotation.

If that's what you need you don't need these features. Just compile with the features enabled and do the check on startup to bail out early.

How do you then tell the compiler that in some function it is allowed to use AVX2 instead of just AVX without using #[target_feature = "+avx2"]? The only solution in Rust that comes to mind would be putting the function in a different crate, using global features for that crate, and then linking it back.

The situation for which you are adding unsafe to #[target_feature] is already easy to trigger without any unsafe so there's no point to the unsafe.

Until the ABI issues are resolved, memory corruption can also happen on targets with perfectly defined behavior for illegal instructions.

Anyhow, if it can violate memory safety, it must be unsafe. If you want it to be safe, please propose a solution that doesn't violate memory safety, has zero run-time cost, and allows working with all features that users might need to use.

My suggestion is simple. Don't enable a feature unless that specific feature on that specific target can be detected. That's 100% reliable.

What do you do then for features that cannot be detected? The only door you are leaving open for this is "use a different programming language". Or am I missing your suggestion for this?

Even if the feature can be detected, how do you do this run-time detection without incurring a run-time cost?

Just consider the example above: if you want to emulate something like ifunc as a library, you must be able to call #[target_feature] functions without overhead, but even if you precompute which function to dispatch to, the compiler cannot know that your choice is correct, so it must insert a second check.

ifunc doesn't have to have anything to do with the language.

ifunc is a different feature from #[target_feature]; #[target_feature] is "lower-level" and akin to GCC's and clang's __attribute__ ((__target__ ("feature"))); ifunc only works on Linux and needs linker support, while target_feature does not. A portable ifunc solution would need to be a library anyway, and whether this library is used by the compiler or the user really doesn't make a difference.

Also, #[target_feature] allows building libraries that make it trivially easy to use correctly and safely, and even to emulate ifunc (and this emulation can get better if rustc ever gets native ifunc support). But this doesn't work both ways: if you sacrifice functionality to make target_feature safe, the only way to get that functionality back is via yet another language feature (e.g. unsafe_target_feature).

So you want to allow cases where the user knows the feature is available by some other means? Maybe by passing in a configuration file to the program? Seems like too much of a corner case to bother about.

I didn't consider this a use case people really care about. Do people really want to be able to tell the compiler "even though the feature is enabled I don't want you to use it"? My assumption is that the only real use case for specific features in parts of code is to build cross-platform binaries that degrade gracefully.

Sure, but the point still stands. You can create broken binaries today without any unsafe, yet under this proposal you would need unsafe to create binaries that actually have fewer unsafe bits in them.

I don't think this is an actual problem. Any features people actually do want to use will almost surely be detectable. But based on the discussion above, here's an integrated proposal:

  • Compile time detection is done with target_feature like this:
if cfg!(target_feature = "some feature") {
  ... code with feature ...
} else {
  ... code without ...
}
  • Runtime detection is done the same way:
if cfg!(runtime_target_feature = "some feature") {
  ... code with feature ...
} else {
  ... code without ...
}
  • Unconditional feature enablement is also supported but is unsafe. This is your current target_feature and should be a corner case most people will never use.
#[unconditional_target_feature("some feature")]
fn myfunc() {
  ... code with feature ...
}

This would seem to address all the use cases discussed while giving the programmer a more consistent and safe interface for the most useful cases.

I've updated the RFC v2 with an example, but replace 1337 with the result of some system-specific cpuid library that, instead of executing instructions on the CPU to query the cpuid flags, reads them from some cache. Unless we encode target features in the type system (e.g. by providing an enum in the standard library), there is no way that rustc can know what 1337 means.

I didn't consider this a use case people really care about. Do people really want to be able to tell the compiler "even though the feature is enabled I don't want you to use it"?

I want to compile a binary for AVX, and have some function dispatch to AVX2 when available at run-time. The question remains the same: how do you do this without #[target_feature]?

My assumption is that the only real use case for specific features in parts of code is to build cross-platform binaries that degrade gracefully.

That's the whole point of #[target_feature]. The fact that one can emulate ifunc on top of it is just a nice side-effect.

But based on the discussion above here's an integrated proposal:

What you propose is almost exactly what the RFC proposes.

Runtime detection is done the same way:

Note that cfg! is a macro, so it returns a result at compile-time.

The way run-time detection should be done as of the RFC (which doesn't mention it, because that's the job of some other RFC), would be to use a cpuid crate that you can query for features, match on them, and then decide how to dispatch. Once you have such a crate for some common architectures (x86, ARM), you can have a parent crate that works for all of them. Then you can use procedural macros to cover the most common patterns, like generating multiple copies of functions for different target features, setting a global function pointer that can be safely called, or just dispatching based on target features where the cpuid check is done only the first time some function is called.

But all of this builds on platform-specific cpuid libraries, macros 2.0, and #[target_feature] + cfg!(target_feature); e.g., a library should omit run-time checks for cases that are already known to be true at compile time, and it can use cfg! for that. And there are many possibilities for how all these APIs could work out exactly; if for whatever reason none of them fits your use case, you can still call the fundamental cpuid libraries directly, write your own, or use #[target_feature] unsafely and directly.
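
For example, a minimal sketch of that cfg!-based short-circuit, with detect_avx2_at_runtime as a hypothetical placeholder for the real query:

fn detect_avx2_at_runtime() -> bool {
    false // placeholder for a real cpuid-based query
}

// returns whether AVX2 may be used, skipping the run-time query entirely
// when the crate is already compiled with +avx2
fn has_avx2() -> bool {
    if cfg!(target_feature = "avx2") {
        true // enabled at compile time: no run-time check needed
    } else {
        detect_avx2_at_runtime()
    }
}

fn main() {
    println!("avx2 usable: {}", has_avx2());
}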

In particular, if you want to target something weird like AVR and enable some features, then you are on your own. That doesn't mean that you cannot get it done, but rather that you will have to write some unsafe code and use #[target_feature] directly instead of through some nice procedural macro.

I think I'll write an overview in the RFC of how we expect users to use these things, maybe in the "How do we teach this?" section, since this question might come up often during the review.

There are two differences. Your #[target_feature] is my unsafe and almost never to be used #[unconditional_target_feature]. And I introduce a runtime_target_feature that uses the same syntax as the compile time one but runs at runtime.

Does it have to? It's a normal if block so I assume if you emit a condition instead of just setting it as true or false at runtime it will work fine.

I know that's what you're proposing, I just think it's a bad idea. It requires setting target_feature as unsafe because you're not doing checks and makes for an inconsistent platform where you provide the compile time checks but not the runtime ones when they're exactly the same piece of code.

I added the #[unconditional_target_feature] for that but it seems to me that's the least common case.

Yes, it has to: cfg! is a macro; you can use it for conditional compilation of a whole function, a module, etc. only because the result is available at compile time. You can use it in an if cfg!(...) { ... } only as a side-effect of it being expanded at compile time to either true or false.

We could make it something like if std::cpuid::has_feature("avx2") { ... }. Would that be ok? (Note that just because that looks like a function doesn't mean we couldn't make it "magic" and give it extra meaning within the compiler).

I just think it's a bad idea. It requires setting target_feature as unsafe because you're not doing checks and makes for an inconsistent platform where you provide the compile time checks but not the runtime ones when they're exactly the same piece of code.

Wait, are you proposing that in

fn foo() {
  if cfg!(target_feature = "avx2") {
     bar()
  } else {
     bar()
  }
}

in one branch bar is compiled with #[target_feature = "avx2"] and in the other branch it isn't? Or are you suggesting something more in the spirit that the following should generate a compilation error?:


#[unconditional_target_feature = "+avx2"]
fn bar_avx2();

fn foo() {
  if cfg!(target_feature = "avx2") {
     bar_avx2()  // compiles OK
  } else {
     bar_avx2() // error: bar_avx2 uses target features that are not available here
  }
}

The first approach doesn't make much sense to me because the code within the if cfg! { branches is the same; you are just telling the compiler that it can use some features during code generation. So I guess you must mean the second, but I am not sure (this approach still requires you to use #[unconditional_target_feature] a lot, but you imply that this is not what you mean).


I've updated the RFC v2 section "How do we teach this" with more information about the ecosystem that we expect to grow around this (some pieces are already in place).


@pedrocr I just pinged you in IRC in case you can chat over there. We might be able to make progress faster if you can show me what you mean (I think you already understood what I mean), and afterwards just summarize things here.

I just meant expanding cfg!(runtime_target_feature = "avx2") to std::cpuid::has_feature("avx2") and then everything works, no?

That would be quite strange indeed. What I mean is: compile one branch with AVX and the other without. But function calls, unless they've been inlined, are just the call, so normal usage would be something like:

if cfg!(runtime_target_feature = "avx2") {
  ... some code here that uses avx2, through the simd crate, intrinsics, or just letting the compiler optimize ...
} else {
  ... some equivalent code that doesn't use avx2...
}

Then it would be easy to use that in a macro like @parched's to annotate a function, so that the exact same code gets injected on both sides of the if when all you want is the same code compiled by LLVM with and without avx2.

ifunc in glibc currently performs the check upon the first call to the function, but this complicates the dynamic linker and defeats a number of security hardening measures (e.g. read-only PLT). I would like to see it changed so all ifuncs are resolved during startup or dlopen, and it's on my long todo list to try to persuade the other glibc people of that.

So does this hypothetical binary, which I believe is what @pedrocr is proposing:

use std::io;
use std::cpu_features;
fn foo_avx2() { /* avx-dependent stuff */ }
fn foo_fallback() { unreachable!(); }
static mut foo: fn() -> () = foo_fallback;
fn main() {
    // in real life this line would actually be in lang_start,
    // and foo would appear immutable
    if cpu_features::detect_avx() { unsafe { foo = foo_avx2; } }
    let mut s = String::new();
    io::stdin().read_line(&mut s).unwrap();
    match s.trim_right().parse::<i32>() {
        Ok(i) => { if i == 1337 { foo(); } },
        Err(_) => println!("Invalid number."),
    }
}

I see where you're coming from with the desire to push runtime feature detection out of the standard library, but I think it's a mistake for essentially the same reason that I think ifunc resolution should happen immediately upon shlib load: you're making the overall runtime environment significantly more complicated and harder to reason about, for the sake of some corner cases.