Some thoughts on target features

In the cpuid-bool v0.2 announcement I presented several loose thoughts on language features, which I think are worth repeating here:

  • It could be worth revisiting the "life before main" problem. Yes, various crates provide "good enough" solutions, but it feels inherently wasteful that we have to check everywhere whether values are initialized instead of relying on guaranteed initialization provided by "life before main" code. Plus, as I have heard, it could map quite well to embedded applications. Granted, we should learn from the mistakes of the past and properly restrict such a feature (similarly to how Rust adds unsized rvalues instead of a general alloca).
  • Instead of the current two states, "target feature (TF) is enabled" and "TF is disabled", we need three: TF is enabled (runtime checks of this TF can be replaced with true at compile time), TF is disabled (runtime checks can be replaced with false), and TF may be enabled or disabled (runtime checks stay in place). Using a traits analogy, those states are Trait, !Trait, and ?Trait respectively. Currently we have to add force-soft features to our crates for users who would like to remove branches dependent on CPU extensions. In other words, they can remove AES software fallbacks by enabling the necessary target features (+aes) via RUSTFLAGS, but using -aes will not result in removal of the AES-NI implementation. Having those three states would also make it easier to test different backends.
  • We really need a target feature runtime as soon as possible; without it we have to rely on platform-dependent solutions like this. With the M1 release ARM got even more attention, and since it does not have a CPUID-like instruction accessible from user space, a target feature runtime has become even more important. And even on x86 not all targets have access to CPUID, e.g. we had to make an exception for SGX targets.
  • It could be worth extending cpuid-bool with "CPUID enums". For example, we currently have 3 backends for ChaCha: software, SSE2, and AVX2. Instead of having two "CPUID booleans" it would be more efficient to keep them together (we use AtomicU8 for caching, so we have space for 254 variants; one value is used for the "uninitialized" state and another one for the software fallback). Maybe eventually a similar macro could find its way into std? While the target feature runtime is a reasonable foundation (I think is_x86_feature_detected and similar macros should be deprecated in its favor), in my opinion user code should prefer target feature detection which caches a boolean or an enum, since it's a bit more efficient runtime-wise (space-wise it will use a bit more memory in total, but negligibly so) and probably cache-predictor friendlier.
  • While runtime checks at the crate level result in good enough performance, ideally we still need to push runtime detection as high as possible. In the RustCrypto case we assemble high-level algorithms from several "primitive" crates, e.g. aes-gcm uses aes and ghash, and both of those dependencies have backends dependent on certain extensions (AES-NI and CLMUL respectively). Crate-level runtime switches hinder certain compiler optimizations (e.g. if the CPU has both the AES-NI and CLMUL extensions, data can stay in SSE registers instead of spilling to the stack) and can add more branches than necessary (e.g. if the CPU has AVX2 we can assume availability of SSE3). So ideally we need a way to compile several versions of the same crate with different target features and use them as different dependencies.
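
To make the "CPUID enum" idea a bit more concrete, here is a rough sketch of what such a cached enum could look like. This is not the cpuid-bool API; the names Backend, CACHE, UNINIT, detect, and backend are hypothetical and chosen only for illustration. It assumes std is available and uses is_x86_feature_detected for the actual probing on x86 targets, falling back to software everywhere else.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Hypothetical set of backends for a ChaCha-like crate.
/// Discriminants start at 1 so that 0 can mean "not detected yet".
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum Backend {
    Software = 1,
    Sse2 = 2,
    Avx2 = 3,
}

/// Reserved value for the "uninitialized" state.
const UNINIT: u8 = 0;

/// One byte caches the whole detection result.
static CACHE: AtomicU8 = AtomicU8::new(UNINIT);

/// Probe the CPU once, preferring the widest available extension.
fn detect() -> Backend {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("avx2") {
            return Backend::Avx2;
        }
        if std::is_x86_feature_detected!("sse2") {
            return Backend::Sse2;
        }
    }
    // Non-x86 targets (or x86 without SSE2) use the software fallback.
    Backend::Software
}

/// Return the cached backend, running detection on first use.
/// Racing threads may both run `detect`, but they store the same value.
pub fn backend() -> Backend {
    match CACHE.load(Ordering::Relaxed) {
        1 => Backend::Software,
        2 => Backend::Sse2,
        3 => Backend::Avx2,
        _ => {
            let b = detect();
            CACHE.store(b as u8, Ordering::Relaxed);
            b
        }
    }
}
```

A caller would then write a single match on backend() instead of chaining two boolean checks, so the AVX2, SSE2, and software paths are selected with one atomic load.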