Motivation
libcore cannot use SIMD instructions whose availability is only known at run-time. This is bad, because str, [T], and Iterator are provided in libcore, and some of their methods could be much faster (>10x) if they could use the SIMD instructions that the CPU turns out to support at run-time.
Understanding the problems of different solutions
First: run-time feature detection requires operating-system support, and libcore is operating-system independent. That is, we can’t just move the whole run-time feature detection system into libcore and do it all there: libcore does not know which operating system (if any) it is running on, so it cannot know how detection should be performed.
Second: we could hack our way out of this. We have extension traits for slices, so we could probably provide different methods for str and [T] depending on whether libstd is linked. We already do something like this for f32/f64, which are types defined in libcore to which libstd adds inherent methods. This would be possible, but painful: it would require a significant amount of work and hacks.
Third: Iterator is a trait provided in libcore, which also provides default implementations of its methods. How could libstd provide different default implementations of the Iterator methods? AFAICT this would require providing a different Iterator trait in libstd and “somehow” hacking our way into making the two identical, so that a function bounded by libcore::Iterator becomes bounded by libstd::Iterator if libstd is linked. Possible? Probably. Hacky? Definitely.
Proposal
We add yet another lang item, in the same spirit as #[global_allocator], #[panic_handler], etc.:
```rust
#[is_target_feature_detected]
fn is_target_feature_detected(x: &'static str) -> bool;
```
There can only be one definition of this item in the whole binary; otherwise, compilation should fail, just as with #[global_allocator] and #[panic_handler].
libstd would provide such an item that simply calls into the std::detect module:
```rust
#[is_target_feature_detected]
fn is_target_feature_detected(x: &'static str) -> bool {
    std::detect::check_for(x)
}
```
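To make the libstd side more concrete, here is a minimal sketch of a provider written against today’s stable is_x86_feature_detected! macro instead of the internal std::detect API. The string-to-macro mapping and the small set of features covered are illustrative assumptions. Because the #[is_target_feature_detected] attribute does not exist today, the sketch is just a plain function.

```rust
// Hypothetical stand-in for libstd's provider of the proposed lang item.
// The real one would call into std::detect; this version maps a few
// feature names onto today's stable detection macro instead.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn is_target_feature_detected(feature: &'static str) -> bool {
    match feature {
        "sse4.2" => is_x86_feature_detected!("sse4.2"),
        "popcnt" => is_x86_feature_detected!("popcnt"),
        "avx2" => is_x86_feature_detected!("avx2"),
        // Unknown or unmapped names are conservatively reported as unavailable.
        _ => false,
    }
}

// On other architectures this sketch detects nothing at run-time.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn is_target_feature_detected(_feature: &'static str) -> bool {
    false
}

fn main() {
    for feature in ["sse4.2", "popcnt", "avx2", "not-a-feature"] {
        println!("{feature}: {}", is_target_feature_detected(feature));
    }
}
```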
If #![no_std] binaries do not provide this item, it will be automatically polyfilled as:
```rust
#[is_target_feature_detected]
fn is_target_feature_detected(x: &'static str) -> bool {
    false
}
```
That is, when this item is not provided, this API always returns false: “feature x is not available”. Note, however, that the feature detection macros will not call this API if the feature was enabled at compile time. A feature enabled at compile time is therefore correctly reported as available at run-time. (If the CPU does not actually support it, the binary has had undefined behavior from the moment it started running.)
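The interaction between compile-time features and the polyfill can be modelled in today’s Rust with a plain function and a macro_rules! macro. This is a minimal sketch, assuming that the libcore macro expands to a cfg!(target_feature = ...) check followed by a call to the hook. is_feature_detected_sketch! is a hypothetical name, and the hook below mirrors the polyfilled definition.

```rust
// Mirrors the polyfilled lang item: with no runtime installed,
// every feature is reported as unavailable.
fn is_target_feature_detected(_feature: &'static str) -> bool {
    false
}

// Hypothetical sketch of a libcore detection macro. A feature enabled at
// compile time short-circuits to `true` and never reaches the hook.
macro_rules! is_feature_detected_sketch {
    ($feature:tt) => {
        cfg!(target_feature = $feature) || is_target_feature_detected($feature)
    };
}

fn main() {
    // On x86_64, "sse2" is part of the baseline, so this is true at compile
    // time even though the polyfilled hook returns false.
    println!("sse2: {}", is_feature_detected_sketch!("sse2"));
    // Unless the crate is built with -C target-feature=+avx2, this falls
    // through to the polyfill and reports false.
    println!("avx2: {}", is_feature_detected_sketch!("avx2"));
}
```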
We will then move the is_{target_arch}_feature_detected! macros from the std::detect module into libcore, and implement them in terms of the is_target_feature_detected lang item.
Alternatives
A lang item felt like the best option to me, since the feature-detection runtime does not change while the program is running.
An alternative could be, e.g., to use an AtomicPtr in libcore that points to a fn(&'static str) -> bool, which users can configure to point somewhere else. This would allow users to swap the runtime during run-time as often as they want, which I don’t think makes sense. Another problem with this approach is that one of the things one often wants to check is whether the CPU supports atomics, and, e.g., fall back to a Mutex if it does not. With an AtomicPtr, one would need atomics to be able to check whether atomics are available. So, AFAICT, we would need to protect the pointer with a Mutex instead. That, however, might require operating-system support, which we don’t have in libcore.
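For comparison, the AtomicPtr alternative could look roughly like the following sketch. The names set_detector and detect are hypothetical. The pointer is stored as *mut () because AtomicPtr cannot hold a function pointer directly, and a null pointer stands for “no runtime installed”. The sketch itself relies on atomic loads and stores, which is exactly the circularity described above.

```rust
use core::sync::atomic::{AtomicPtr, Ordering};

// Hypothetical signature of a user-installable detection runtime.
type DetectFn = fn(&'static str) -> bool;

// Null means "no runtime installed"; every feature is then unavailable.
static DETECT: AtomicPtr<()> = AtomicPtr::new(core::ptr::null_mut());

// Installs (or replaces) the runtime; it can be called any number of times.
fn set_detector(f: DetectFn) {
    DETECT.store(f as *mut (), Ordering::Release);
}

fn detect(feature: &'static str) -> bool {
    let p = DETECT.load(Ordering::Acquire);
    if p.is_null() {
        return false;
    }
    // SAFETY: the only non-null values ever stored are `DetectFn` pointers.
    let f: DetectFn = unsafe { core::mem::transmute::<*mut (), DetectFn>(p) };
    f(feature)
}

fn main() {
    println!("{}", detect("avx2")); // false: nothing installed yet
    set_detector(|feature| feature == "avx2");
    println!("{}", detect("avx2")); // true
}
```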
I can’t think of any other alternatives, but this is the brainstorming phase, so maybe you can?
How do you use this
In general, users don’t need to do anything. They just call an Iterator method, and if their CPU supports AVX2, the method might use it internally.
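The following sketch shows the kind of dispatch a libcore method could do internally. The detection macro is not in libcore yet, so the sketch uses today’s std::arch::is_x86_feature_detected!; under this proposal the same pattern could live in libcore. count_byte is a hypothetical example function. Its AVX2 variant does not use hand-written intrinsics. It recompiles the scalar loop with AVX2 enabled so that the optimizer may vectorize it.

```rust
// Portable fallback: counts the bytes in `haystack` equal to `needle`.
fn count_byte_scalar(haystack: &[u8], needle: u8) -> usize {
    haystack.iter().filter(|&&b| b == needle).count()
}

// The same loop, compiled with AVX2 enabled so the optimizer may use it.
// Safety: callers must ensure the CPU supports AVX2.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn count_byte_avx2(haystack: &[u8], needle: u8) -> usize {
    haystack.iter().filter(|&&b| b == needle).count()
}

// Public entry point: picks the best implementation at run-time.
pub fn count_byte(haystack: &[u8], needle: u8) -> usize {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just verified at run-time.
            return unsafe { count_byte_avx2(haystack, needle) };
        }
    }
    count_byte_scalar(haystack, needle)
}

fn main() {
    println!("{}", count_byte(b"hello, world", b'o')); // prints 2
}
```

Callers only see count_byte. Which implementation runs is decided per call, and the result is the same either way.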
Today, is_x86_feature_detected! is only exported from libstd, so #![no_std] libraries cannot use it. With this RFC, that would change: #![no_std] libraries would be able to use the run-time feature detection macros.
Users implementing #![no_std] binaries don’t need to do anything either. The compiler polyfills a sound is_target_feature_detected lang item, so they can just use the run-time detection macros, and they still benefit from any features enabled at compile time.
These users can, however, add their own feature-detection runtime. If their application is user-space-like, they can probably just add the std_detect crate as a dependency, use its cargo features to tailor it to their application, and that’s it: they get quite good run-time feature detection.
If their application is more OS-kernel-like, they can implement their own runtime. They might submit PRs to std_detect so that it supports kernel-like platforms, or we could maintain a second std_detect-like crate tailored for OS kernels that all kernel developers can reuse. This kind of experimentation can happen on crates.io; this proposal just enables it.
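As an illustration of the kernel-like case, a custom runtime on x86_64 might query the CPUID instruction directly through core::arch::x86_64::__cpuid. This is a hypothetical sketch. kernel_detect is an invented name; under the proposal it would carry the #[is_target_feature_detected] attribute, which is not valid today. The sketch only handles features that need nothing beyond a CPUID bit. Features such as AVX also require the OS (or the kernel itself) to enable the extended register state, so a real runtime would have to check that too.

```rust
// Hypothetical kernel-side detection runtime: reads CPUID leaf 1 directly.
#[cfg(target_arch = "x86_64")]
fn kernel_detect(feature: &'static str) -> bool {
    use core::arch::x86_64::__cpuid;
    // SAFETY: the CPUID instruction is available on every x86_64 CPU.
    let leaf1 = unsafe { __cpuid(1) };
    match feature {
        "sse4.2" => leaf1.ecx & (1 << 20) != 0, // CPUID.01H:ECX bit 20
        "popcnt" => leaf1.ecx & (1 << 23) != 0, // CPUID.01H:ECX bit 23
        // Anything else (including features needing OS support) is
        // conservatively reported as unavailable.
        _ => false,
    }
}

#[cfg(not(target_arch = "x86_64"))]
fn kernel_detect(_feature: &'static str) -> bool {
    false
}

fn main() {
    for feature in ["sse4.2", "popcnt", "avx512f"] {
        println!("{feature}: {}", kernel_detect(feature));
    }
}
```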
Drawback
Yet another lang item. We could remove this drawback with extern existentials, but those have been postponed indefinitely.