So here goes v2 of the RFC, which hopefully includes all the feedback (let me know if it doesn't). I went with the #[target_feature] unsafe fn variant due to the issues mentioned in the alternative section. We can always make it safe later.
Pre-RFC v2: target_feature
Note, both cfg!(target_feature) and #[target_feature] are already available on nightly, but no RFC was ever submitted for them. This RFC attempts to lay down their behavior so that they can be stabilized in the near future.
Summary
Platforms like x86_64 and ARM provide a "default" set of operations that each CPU supports. Extra operations like AVX vector instructions are only available within a subset of the x86_64 CPUs. This RFC proposes extending Rust to allow conditionally generating and executing code for a particular architecture depending on which "features" the target supports:
- Language constructs:
  - conditional compilation: cfg!(target_feature = "feature_name"): the cfg! macro returns true if the target CPU supports the target feature feature_name (e.g., cfg!(target_feature = "sse4") returns true if the target supports the SSE4 instruction set)
  - code generation: #[target_feature = "+sse2"] unsafe fn foo(...): the function attribute #[target_feature] instructs the compiler to use SSE2 instructions when generating code for the function foo
- Backend options:
  - rustc -C --target-feature=+sse2: instructs the compiler to use SSE2 instructions when generating code for a crate
  - rustc -C --target-cpu=native: instructs the compiler to use all available features of the host target when generating code for a crate
Detailed design
The default set of target features enabled for a particular architecture is implementation defined. This RFC proposes adding the following two constructs to the language:
- the cfg!(target_feature = "feature_name") macro, and
- the #[target_feature = "+feature_name"] function attribute for unsafe functions.
Only stabilized feature_names can be used with these constructs. This RFC proposes that each feature name is gated behind its own feature gate, target_feature_feature_name (that is, to use a target feature on nightly, crates must write #![feature(target_feature_avx2)]).
1. cfg!(target_feature)
The cfg!(target_feature = "feature_name") macro allows querying specific hardware features of the target at compile-time. This information can be used for conditional compilation or conditional code execution:
// Conditional compilation:
#[cfg(target_feature = "bmi2")] {
    // if target has the BMI2 instruction set, use a BMI2 instruction:
    unsafe { intrinsic::bmi2::bzhi(x, bit_position) }
}
#[cfg(not(target_feature = "bmi2"))] {
    // otherwise call an algorithm that emulates the instruction:
    software_fallback::bzhi(x, bit_position)
}
// Conditional code execution:
if cfg!(target_feature = "bmi2") {
    // if target has the BMI2 instruction set, use a BMI2 instruction:
    unsafe { intrinsic::bmi2::bzhi(x, bit_position) }
} else {
    // otherwise call an algorithm that emulates the instruction:
    software_fallback::bzhi(x, bit_position)
}
The macro cfg!(target_feature = "feature_name") returns:
- true if the feature is enabled,
- false if the feature is disabled.
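As a rough illustration of the compile-time selection described above, here is a minimal sketch that builds on a current stable toolchain. It assumes an x86_64 target for the intrinsic path and uses core::arch::x86_64::_bzhi_u32 in place of the hypothetical intrinsic::bmi2::bzhi from the earlier snippet; software_bzhi and bzhi are illustrative names, not part of this proposal.

```rust
/// Portable fallback: keep the low `bit_position` bits of `x`, zero the rest.
pub fn software_bzhi(x: u32, bit_position: u32) -> u32 {
    if bit_position >= 32 {
        x
    } else {
        x & ((1u32 << bit_position) - 1)
    }
}

/// Only compiled when BMI2 is enabled for the whole crate at compile time.
#[cfg(all(target_arch = "x86_64", target_feature = "bmi2"))]
#[allow(unused_unsafe)]
pub fn bzhi(x: u32, bit_position: u32) -> u32 {
    // SAFETY: this function only exists when the build already assumes BMI2.
    // The index is clamped because the instruction only reads its low 8 bits.
    unsafe { core::arch::x86_64::_bzhi_u32(x, bit_position.min(32)) }
}

/// Compiled in every other configuration.
#[cfg(not(all(target_arch = "x86_64", target_feature = "bmi2")))]
pub fn bzhi(x: u32, bit_position: u32) -> u32 {
    software_bzhi(x, bit_position)
}

fn main() {
    // `cfg!` expands to a boolean constant; both branches are still type-checked.
    if cfg!(target_feature = "bmi2") {
        println!("BMI2 enabled at compile time");
    } else {
        println!("BMI2 not enabled at compile time, using the fallback");
    }
    println!("{:#b}", bzhi(0b1111_1111, 4));
}
```

The attribute form removes the unused function from the build entirely, which is why the intrinsic path can name an item that does not exist on other targets; the cfg! form could not do that, because both of its branches must compile.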
2. #[target_feature]
The unsafe function attribute #[target_feature = "+feature_name"] extends the feature set of a function, that is, the implementation is allowed to use more target features when generating code for a function. Removing features from the default_feature_set using #[target_feature = "-feature_name"] is not allowed. Calling a function on a platform that does not support the feature set of the function is undefined behavior.
#[target_feature = "+avx"]
unsafe fn foo_avx(...) { ... }
#[target_feature = "+sse4"]
unsafe fn foo_sse4(...) { ... }
// Check run-time features on initialization
// and set up a safe-to-call fn ptr:
let global_foo_ptr = initialize_foo();
fn initialize_foo() -> fn (...) -> ... {
    let info = cpuid::identify().unwrap();
    // match patterns cannot call methods, so test the features with if/else:
    if info.has_feature(cpuid::CpuFeature::AVX) {
        unsafe { foo_avx }
    } else if info.has_feature(cpuid::CpuFeature::SSE4) {
        unsafe { foo_sse4 }
    } else {
        unreachable!()
    }
}
// Dispatch on function call:
fn foo() -> ... {
    let info = cpuid::identify().unwrap();
    if info.has_feature(cpuid::CpuFeature::AVX) {
        unsafe { foo_avx(...) }
    } else if info.has_feature(cpuid::CpuFeature::SSE4) {
        unsafe { foo_sse4(...) }
    } else {
        unreachable!()
    }
}
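For comparison, the dispatch-on-call pattern can be sketched so that it compiles on current stable Rust. Two assumptions to keep in mind: stable Rust spells the attribute #[target_feature(enable = "avx2")] rather than the "+avx2" form proposed here, and the run-time query uses the standard library's is_x86_feature_detected! macro instead of the cpuid crate. The functions sum, sum_avx2 and sum_fallback are hypothetical names chosen for the example.

```rust
/// AVX2-enabled variant. The body is plain Rust; the attribute only allows
/// the compiler to use AVX2 instructions when generating code for it.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(xs: &[i32]) -> i32 {
    xs.iter().fold(0i32, |acc, &x| acc.wrapping_add(x))
}

/// Baseline variant that only relies on the default feature set.
fn sum_fallback(xs: &[i32]) -> i32 {
    xs.iter().fold(0i32, |acc, &x| acc.wrapping_add(x))
}

/// Safe entry point: checks the CPU on every call and picks a variant.
pub fn sum(xs: &[i32]) -> i32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just verified at run time.
            return unsafe { sum_avx2(xs) };
        }
    }
    sum_fallback(xs)
}

fn main() {
    println!("{}", sum(&[1, 2, 3, 4]));
}
```

Unlike the snippet above, this sketch falls back to a baseline implementation instead of reaching unreachable!(), so the safe wrapper is sound on CPUs without either feature.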
ABI mismatches with #[target_feature]
Functions with different #[target_feature]s might have a different ABI for, e.g., passing and returning function arguments (e.g. depending on the feature the registers for this might differ).
The implementation must ensure that this cannot invoke undefined behavior. That is, the implementation must detect this at compile-time, and translate arguments from one ABI to another. Since this hasn't been implemented yet, it is unclear how to do this, or if it can be done for all cases. Hence it is mentioned as an unresolved question below.
3. Backend compilation options
The implementation must provide the following two ways of talking with its code generation backend:
- -C --target-feature=+/-backend_target_feature_name: where +/- add/remove features from the default feature set of the platform for the whole crate. The behavior for non-stabilized features is implementation defined. The behavior for stabilized features is:
  - to implicitly mark all functions in the crate with #[target_feature = "+/-feature_name"] (where - is still a hard error)
  - cfg!(target_feature = "feature_name") returns true if the feature is enabled. If the backend does not support the feature, the feature might be disabled even if the user explicitly enabled it, in which case cfg! returns false. A soft diagnostic is encouraged.
- -C --target-cpu=backend_cpu_name, which changes the default feature set of the crate to be that of backend_cpu_name.
These two options are already "de facto stabilized", that is, they work on stable Rust today. This RFC just specifies their semantics with respect to stable target features. When stabilizing a feature, a different name or a deprecation period (with a warning) should be used to avoid breaking backwards compatibility.
Drawbacks
The classic reason is that extra features increase language complexity.
If we don't solve the problem that these features do solve, lots of libraries cannot be implemented efficiently outside of rustc (e.g., SIMD, bitwise manipulation instructions, AES, …). These libraries are still implementable outside of rustc by directly using asm!, but this is less efficient than using LLVM directly or indirectly to generate the intrinsics because while LLVM can reason about its intrinsics to perform optimizations, it cannot reason about asm! (e.g. if asm! is used to perform SIMD, LLVM won't fuse vector operations when possible).
A reason not to do this as proposed in this RFC is that we don't know how to do this safely. OTOH, the motivation to investigate safe alternatives is low if it cannot be done anyway.
Alternative designs / Unresolved questions
Make #[target_feature] safe
To make #[target_feature] safe we would need to ensure that calling functions on targets that do not support their feature set cannot result in memory unsafety.
There are two main problems:
- platforms like AVR invoke undefined behavior in hardware when an illegal instruction is encountered (as opposed to, e.g., x86, which raises SIGILL, crashing the process, and preventing memory unsafety).
- not all platforms support run-time feature detection: x86 has pretty good run-time feature detection, other platforms like PowerPC allow querying the cpu model, but the user must know which features the model implies, other platforms do not allow querying anything at all.
This RFC conservatively makes #[target_feature] "unsafe", that is, only functions that are unsafe to call can use this attribute. If we find a solution that guarantees a lack of memory unsafety on all platforms, ideally without introducing a big run-time cost or significantly increasing binary sizes, we can always make it safe later.
An implementation might, on a best-effort basis, provide run-time instrumentation to catch calls to functions on platforms that do not support their target feature set (this is allowed by undefined behavior).
To see why this can introduce a run-time cost, consider the following code:
use std::io;
#[target_feature = "+avx2"]
unsafe fn foo() { /* code that uses AVX2 instructions */ }
fn main() {
    let mut s = String::new();
    io::stdin().read_line(&mut s).unwrap();
    match s.trim_end().parse::<i32>() {
        Ok(i) => {
            // foo is only reached for one specific input:
            if i == 1337 { unsafe { foo(); } }
        },
        Err(_) => println!("Invalid number."),
    }
}
This code works fine on all x86 CPUs, unless the user passes 1337 as input, in which case it only works on CPUs that support AVX2. The implementation can check whether the CPU supports AVX2 on initialization, but even if it doesn't, the program is correct unless the user inputs 1337, so the fatal check must be done when calling any function that uses #[target_feature]. Now replace 1337 with the result of some system cpuid library: there is no way the compiler can know that a value returned by a third-party library corresponds to a target feature unless we decide to encode these in the type system (which is an option not proposed here).
Never segfault on #[target_feature]
This is an extension of making #[target_feature] safe. Even if we can guarantee lack of memory unsafety, users will still be getting potentially hard-to-debug crashes. It would be better if we could guarantee that they always get a panic! instead. For this we need reliable run-time feature detection, which is not possible in all cases.
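To make the cost of such a guarantee concrete, here is a hedged sketch of what a "panic instead of crash" wrapper could look like on a platform that does have reliable detection. It assumes an x86 target, the attribute spelling accepted by current stable Rust (#[target_feature(enable = "avx2")]), and the standard library's is_x86_feature_detected! macro; foo_avx2 and foo_checked are hypothetical names. The feature check runs on every call, which is the kind of overhead discussed above.

```rust
use std::panic;

/// Stand-in for a function whose code may use AVX2 instructions.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn foo_avx2() -> &'static str {
    "avx2 path"
}

/// Safe wrapper: checks the feature on every call, then either runs the
/// AVX2 code or panics with a readable message.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn foo_checked() -> &'static str {
    assert!(
        is_x86_feature_detected!("avx2"),
        "foo requires AVX2, which this CPU does not report"
    );
    // SAFETY: the assertion above verified AVX2 support at run time.
    unsafe { foo_avx2() }
}

/// On other architectures the feature can never be present.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn foo_checked() -> &'static str {
    panic!("foo requires AVX2, which is an x86-only feature")
}

fn main() {
    // Catch the panic here only to show both outcomes without aborting.
    match panic::catch_unwind(foo_checked) {
        Ok(path) => println!("ran the {}", path),
        Err(_) => println!("AVX2 unavailable: got a panic, not an illegal instruction"),
    }
}
```

This only works because x86 can answer the question at run time; on platforms without such a query there is nothing for the assertion to test.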
Feature-dependent unsafety
Whether #[target_feature = "feature_name"] requires unsafe fn or not could depend on feature_name. For example, this would allow making all uses of #[target_feature] safe on x86 / ARM / PowerPC / MIPS / … since executing an illegal instruction is guaranteed to crash the application and hence cannot introduce memory unsafety.
Some other features like d16, which reduces the number of FP registers so the fallback path can use more of them, are inherently safe.
This extension can be pursued in a backwards compatible way.
Allow removing features from the default feature set in #[target_feature]
Allowing #[target_feature = "-feature_name"] can result in nonsensical behavior. The backend might always assume that at least the default feature set for the current platform is available. In these cases it might be better to globally choose a different default using -C --target-cpu / -C --target-feature.
Make --target-feature/--target-cpu unsafe
Rename them to --unsafe-target-feature/--unsafe-target-cpu. The rationale would be that a binary compiled for an architecture with the wrong flags would still be able to run on that architecture, but because using illegal instructions on some architectures might lead to memory unsafety, these options are unsafe.
Path towards stabilization
- Enable a warning on nightly on usage of target features without the corresponding feature(target_feature_xxx) feature flag.
- After a transition period (e.g. 1 release cycle) make this a hard error on nightly.
- After all unresolved questions are resolved, stabilize cfg! and #[target_feature]
- …N: stabilize those target_feature_xxxs that we want to have on stable following either a mini-RFC in the rust-lang issues for these features or the normal RFC process.
How do we teach this?
First, these features are low-level features. We don't expect many users to use them directly. We expect that for the most commonly used platforms, an ecosystem of cpuid crates will emerge, that will allow querying features at run-time. Some of these crates already exist, like the cpuid crate for doing this on x86. In C/C++ similar libraries exist for ARM and other platforms. These libraries would also use cfg!(target_feature = "...") internally to avoid run-time checks when the binary is compiled with --target-feature/--target-cpu and the results of cpuid are known at compile-time.
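The compile-time shortcut mentioned above could look roughly like the following inside such a detection library. This is a sketch under assumptions: has_avx2 and runtime_has_avx2 are hypothetical names, and the run-time query is delegated to the standard library's is_x86_feature_detected! macro rather than to a cpuid crate.

```rust
/// Returns true if AVX2 can be used. When the crate is compiled with AVX2
/// enabled, `cfg!` is a constant `true` and the run-time query is dead code
/// that the optimizer can drop.
pub fn has_avx2() -> bool {
    if cfg!(target_feature = "avx2") {
        true
    } else {
        runtime_has_avx2()
    }
}

/// Run-time query, only meaningful on x86 targets.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn runtime_has_avx2() -> bool {
    is_x86_feature_detected!("avx2")
}

/// Other architectures never have AVX2.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn runtime_has_avx2() -> bool {
    false
}

fn main() {
    println!("AVX2 usable: {}", has_avx2());
}
```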
Then we expect that some higher-level libraries will emerge, that will make it easy to safely generate and dispatch on different implementations of some function using different target features. For example, given:
fn foo() -> ... {}
let foo_ptr = ifunc!(foo, ["sse4", "avx", "avx2"]);
where ifunc! is a procedural macro that creates foo_sse4/avx/avx2 implementations annotated with the corresponding #[target_feature = "+xxx"]s, and on binary initialization at run-time uses one of the cpuid crates to query the cpu for target features, and sets the function pointer foo_ptr to the best implementation. Performing the check once on initialization means that calling the function pointer foo_ptr is safe and will not result in any SIGILL, for example. These libraries could also use cfg!(target_feature = "...") internally to avoid run-time checks when the binary is compiled with --target-feature/--target-cpu and the results of cpuid are known at compile-time, or rely on their cpuid libraries doing these optimizations.
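No ifunc! macro is defined by this proposal, but the code it would expand to can be approximated by hand. The sketch below is one possible shape, assuming current stable Rust (std::sync::OnceLock, #[target_feature(enable = "...")] and is_x86_feature_detected!); every name in it (SumFn, select_sum, sum, the x86 module) is hypothetical. The feature check runs once, and later calls go through the cached function pointer.

```rust
use std::sync::OnceLock;

/// Signature shared by all implementations.
type SumFn = fn(&[i32]) -> i32;

/// Baseline implementation using only the default feature set.
fn sum_fallback(xs: &[i32]) -> i32 {
    xs.iter().fold(0i32, |acc, &x| acc.wrapping_add(x))
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
mod x86 {
    #[target_feature(enable = "avx2")]
    unsafe fn sum_avx2_impl(xs: &[i32]) -> i32 {
        xs.iter().fold(0i32, |acc, &x| acc.wrapping_add(x))
    }

    /// Private safe wrapper; it is only handed out by `select` below.
    fn sum_avx2(xs: &[i32]) -> i32 {
        // SAFETY: `select` returns this function only after detecting AVX2.
        unsafe { sum_avx2_impl(xs) }
    }

    /// Returns the AVX2 implementation if the CPU supports it.
    pub fn select() -> Option<super::SumFn> {
        if is_x86_feature_detected!("avx2") {
            Some(sum_avx2)
        } else {
            None
        }
    }
}

/// Picks the best implementation available on the running CPU.
fn select_sum() -> SumFn {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if let Some(f) = x86::select() {
            return f;
        }
    }
    sum_fallback
}

/// Safe entry point: selection happens on the first call only.
pub fn sum(xs: &[i32]) -> i32 {
    static SUM: OnceLock<SumFn> = OnceLock::new();
    let f = *SUM.get_or_init(select_sum);
    f(xs)
}

fn main() {
    println!("{}", sum(&[1, 2, 3, 4]));
}
```

Keeping the safe wrapper private to its module is what makes the pointer safe to call: the only way to obtain it is through the function that performed the check.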
Similar procedural macros could emerge for performing dispatch inline, where the cpuid flags are tested the first time a function is called. Some of these crates already exist, like the runtime-target-feature-rs crate.
So given these expectations, we would need to teach both the low-level way in which cfg! and target_feature work, and also the easier / safer ways that exist for the most common cases.
For the low-level documentation we need:
- a section in the book listing the stabilized target features, and including an example application that uses both cfg! and #[target_feature] to explain how they work, and
- extend the section on cfg! with target_feature
For the higher-level documentation it would be great if we could get a couple of fundamental libraries (cpuid for x86, ARM, … and some procedural macros for the common cases) close to their 1.0 releases before these features are stabilized.
Unresolved questions
Before stabilization the following issues must be resolved:
- Is it possible to make #[target_feature] both safe and zero-cost?
  On the most widely used platforms the answer is yes, this is possible. On other platforms like AVR the answer is probably not, but we are not sure.
- Is it possible to provide good error messages on misuses of #[target_feature] at zero cost?
  The answer is probably not: just because a binary includes a function does not mean that the function will be called. Hence the only way to check whether an invalid call happens is to add a check before each function call, which incurs a potentially significant cost (e.g. in the form of missed optimizations).
- Is it possible to automatically detect and correct ABI mismatches between target features?
Acknowledgements
@parched @burntsushi @alexcrichton @est31 @pedrocr