Motivation
libcore cannot use SIMD intrinsics that are only known to be available at run-time. This is bad, because str, [T], and Iterator are provided in libcore, and some of their methods could be much faster (>10x faster) if they could use the SIMD intrinsics that the CPU turns out to support at run-time.
Understanding the problems of different solutions
First: run-time feature detection requires operating system support. libcore is operating-system independent. That is, we can’t just move the whole run-time feature detection system into libcore and do everything there: without an operating system, libcore has no way to know how to perform the detection.
Second: we could hack our way out of this. We have extension traits for slices, so we could probably provide different methods for str and [T], depending on whether libstd is linked. We already do something like this for f32/f64, which are types defined in libcore, but where libstd adds inherent methods to them. Doing this is painful, and while it would be possible, it would require a significant amount of work and hacks.
Third: Iterator is a trait provided in libcore, which also provides the default implementations of its methods. How could libstd provide different default implementations of the Iterator methods? AFAICT this would require providing a different Iterator trait in libstd, and “somehow” hacking our way into making the two identical, so that a function bounded by libcore::Iterator becomes bounded by libstd::Iterator if libstd is linked. Possible? Probably. Hacky? Definitely.
Proposal
We add yet another lang-item, in the same spirit as #[global_allocator], #[panic_handler], etc:
#[is_target_feature_detected]
fn is_target_feature_detected(x: &'static str) -> bool;
There can only be one definition of this item in the whole binary. Otherwise, compilation should fail, #[global_allocator]-#[panic_handler]-style.
libstd would provide such an item that would just call the std::detect module:
#[is_target_feature_detected]
fn is_target_feature_detected(x: &'static str) -> bool {
    std::detect::check_for(x)
}
If #![no_std] binaries do not provide this item, it will be automatically polyfilled as:
#[is_target_feature_detected]
fn is_target_feature_detected(x: &'static str) -> bool {
    false
}
That is, when this item is not provided, this API always returns false: “feature x is not available”. Note, however, that the feature detection macros will not call this API if the feature was enabled at compile-time. So if the feature is enabled at compile-time, you’ll correctly get that it is also available at run-time (if it isn’t, the behavior was already undefined from the moment the binary started to run).
We will then move the is_{target_arch}_feature_detected! macros from the std::detect module into libcore, and implement them to call the is_target_feature_detected lang item.
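To make the intended control flow concrete, here is a minimal sketch of what such a macro could expand to. It is only an illustration under assumptions: the lang item does not exist today, so an ordinary function that behaves like the #![no_std] polyfill stands in for it, and the names sketch_is_x86_feature_detected! and avx2_available are made up for this example.

```rust
// Stand-in for the proposed lang item. Under the proposal the compiler would
// resolve this to the single definition in the binary; here it is an ordinary
// function that mimics the `#![no_std]` polyfill and always returns `false`.
fn is_target_feature_detected(_feature: &'static str) -> bool {
    false
}

// Hypothetical, simplified version of what a libcore-hosted
// `is_x86_feature_detected!` could look like.
macro_rules! sketch_is_x86_feature_detected {
    ($feature:tt) => {
        // Compile-time answer first: if the feature was enabled for the whole
        // crate, `cfg!` is `true` and the run-time hook is never called.
        cfg!(target_feature = $feature) || is_target_feature_detected($feature)
    };
}

// Hypothetical caller: with the always-false stand-in above, the result is
// decided purely by the compile-time configuration.
fn avx2_available() -> bool {
    sketch_is_x86_feature_detected!("avx2")
}
```

With the always-false hook, avx2_available() is true only when the crate was compiled with AVX2 enabled, which matches the polyfill behavior described above.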
Alternatives
The proposal above is what felt like the best option to me, since the feature-detection run-time does not change while the program is running.
An alternative could be to, e.g., use an AtomicPtr in libcore that points to a fn(&'static str) -> bool, which users can reconfigure to point somewhere else. This would allow users to swap the detection run-time while the program is running, as often as they want, which I don’t think makes sense. A problem with this approach is that one of the things one often wants to check is whether the CPU supports atomics, so that one can, e.g., use a Mutex if it does not. With an AtomicPtr, one would need atomics to be able to check whether atomics are available. So, AFAICT, we would need to use a Mutex here. That might, however, require operating system support, which we don’t have in libcore.
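For comparison, the AtomicPtr alternative could look roughly like the sketch below. This is not part of the proposal; DetectFn, DETECT_HOOK, set_detect_hook, and hooked_is_target_feature_detected are hypothetical names, and a null pointer is used to mean "no hook installed".

```rust
use core::mem;
use core::ptr;
use core::sync::atomic::{AtomicPtr, Ordering};

// Hypothetical signature of a user-provided detection routine.
type DetectFn = fn(&'static str) -> bool;

// Null means "no hook installed", which is treated as "nothing detected".
static DETECT_HOOK: AtomicPtr<()> = AtomicPtr::new(ptr::null_mut());

// Installs (or replaces) the detection routine. It can be called any number
// of times, which is exactly the flexibility questioned above.
pub fn set_detect_hook(f: DetectFn) {
    DETECT_HOOK.store(f as *mut (), Ordering::Release);
}

pub fn hooked_is_target_feature_detected(feature: &'static str) -> bool {
    let raw = DETECT_HOOK.load(Ordering::Acquire);
    if raw.is_null() {
        return false;
    }
    // SAFETY: the only non-null values ever stored come from
    // `set_detect_hook`, which stores a valid `DetectFn`.
    let f: DetectFn = unsafe { mem::transmute::<*mut (), DetectFn>(raw) };
    f(feature)
}
```

Note that the sketch itself relies on pointer-sized atomic loads and stores, so it would not even compile for a target without them, which is the chicken-and-egg problem described above.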
I can’t think of any other alternatives, but please, this is the brainstorming phase, so maybe you do?
How do you use this
In general, users don’t need to do anything. They just call an Iterator method, and if their CPU supports AVX2, the method might just use it internally.
Today, is_x86_feature_detected! is only exported from libstd, so #![no_std] libraries cannot use it. With this RFC, that would change, and #![no_std] libraries would be able to use the run-time feature detection macros.
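As a rough illustration of the dispatch pattern a library would write, here is a small sketch. It uses the macro as it exists in libstd today, since the libcore version is only proposed; the functions sum, sum_avx2, and sum_fallback are hypothetical and exist only for this example.

```rust
/// Sums a slice with wrapping arithmetic, choosing an implementation at
/// run-time.
pub fn sum(xs: &[u32]) -> u32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just confirmed at run-time.
            return unsafe { sum_avx2(xs) };
        }
    }
    sum_fallback(xs)
}

// Same loop as the fallback, but compiled with AVX2 enabled, so the compiler
// is allowed (not required) to vectorize it with AVX2 instructions.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(xs: &[u32]) -> u32 {
    xs.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}

// Portable version used when AVX2 is not detected or on other architectures.
fn sum_fallback(xs: &[u32]) -> u32 {
    xs.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}
```

Callers only ever see sum; whether the AVX2 path runs is an internal detail, which is the experience described above for Iterator methods.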
Users implementing #![no_std] binaries don’t need to do anything either. The compiler polyfills a sound is_target_feature_detected lang item, and they can just use the run-time detection macros, and if the features are enabled at compile-time, they will benefit from them.
These users can, however, add their own feature-detection run-time. If their app is user-space-like, they can probably just add the std_detect crate as a dependency, use its cargo features to tailor it to their application, and that’s it: they get quite good run-time feature detection.
If their application is more OS-kernel-like, then they can implement their own run-time, maybe submitting PRs to std_detect to allow it to support kernel-like platforms, or maybe we can maintain a second std_detect crate tailored for OS kernels that all kernel devs can reuse. This type of experimentation can happen on crates.io. This proposal just enables it.
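A hand-rolled run-time for a kernel-like x86_64 binary might start out like the sketch below. This is an assumption-laden illustration, not a complete detector: kernel_feature_detected is a hypothetical name, the proposed #[is_target_feature_detected] attribute is omitted because it does not exist today, and only a few SSE-family bits from CPUID leaf 1 are handled. Features such as AVX additionally require checking that the OS (here, the kernel itself) has enabled the extended register state, which this sketch does not attempt.

```rust
// Hypothetical hand-written detection routine for x86_64.
#[cfg(target_arch = "x86_64")]
pub fn kernel_feature_detected(feature: &'static str) -> bool {
    use core::arch::x86_64::__cpuid;

    // CPUID leaf 1 reports basic feature bits in ECX and EDX.
    // The `unsafe` block is needed on toolchains where `__cpuid` is an
    // unsafe fn; the `allow` keeps newer toolchains quiet.
    #[allow(unused_unsafe)]
    let leaf1 = unsafe { __cpuid(1) };

    match feature {
        "sse3" => (leaf1.ecx & (1 << 0)) != 0,
        "ssse3" => (leaf1.ecx & (1 << 9)) != 0,
        "sse4.1" => (leaf1.ecx & (1 << 19)) != 0,
        "sse4.2" => (leaf1.ecx & (1 << 20)) != 0,
        // Anything this run-time does not know about is reported as absent,
        // which is always the safe answer.
        _ => false,
    }
}

// On other architectures this sketch behaves like the polyfill.
#[cfg(not(target_arch = "x86_64"))]
pub fn kernel_feature_detected(_feature: &'static str) -> bool {
    false
}
```

Answering false for unknown names keeps the routine sound: callers then simply take their fallback path.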
Drawback
Yet another lang item. We could remove this drawback with extern existentials, but those have been postponed indefinitely.