[Pre-RFC] Make some feature-detected function-to-fn-pointer casts safe through ZST token types

Summary

Casting a target-feature-enabled function into a safe, callable Rust-ABI function pointer necessarily requires an intermediate unsafe step. Make this step safe in scopes where a token value certifies that the appropriate CPU feature is available to the running program. This is an alternative to, or an augmentation of, struct_target_feature.

Motivation

In image-related crates we write many SIMD implementations of core routines to guarantee a level of performance on matching targets. This happens via runtime dispatch: we query the feature set once, and from that point onwards the appropriate function should be used. Crucially, this does not happen at the call sites of the function. Those sit in loops tight enough that redoing feature detection there would be too costly. Instead, a custom fn-pointer table is built at startup, and the routines are then called indirectly through it.
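A minimal sketch of the startup-table pattern as it has to be written today follows. It assumes a single hypothetical `sum` kernel. The "SSE4.1" body is kept scalar so that only the dispatch is shown. The `transmute` from `unsafe fn` to `fn` is the intermediate unsafe step that this proposal wants to make safe.

```rust
use std::sync::OnceLock;

/// Hypothetical fn-pointer table: one field per kernel, all with plain signatures.
pub struct Kernels {
    pub sum: fn(&[u8]) -> u32,
}

fn sum_scalar(data: &[u8]) -> u32 {
    data.iter().map(|&b| u32::from(b)).sum()
}

// Stand-in for a real SSE4.1 kernel; the body is scalar to keep the sketch short.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse4.1")]
unsafe fn sum_sse41(data: &[u8]) -> u32 {
    data.iter().map(|&b| u32::from(b)).sum()
}

#[cfg(target_arch = "x86_64")]
fn pick_sum() -> fn(&[u8]) -> u32 {
    if is_x86_feature_detected!("sse4.1") {
        let f: unsafe fn(&[u8]) -> u32 = sum_sse41;
        // SAFETY: SSE4.1 was detected just above, and the two pointer types
        // differ only in `unsafe`, so they share the same ABI.
        unsafe { std::mem::transmute::<unsafe fn(&[u8]) -> u32, fn(&[u8]) -> u32>(f) }
    } else {
        sum_scalar
    }
}

#[cfg(not(target_arch = "x86_64"))]
fn pick_sum() -> fn(&[u8]) -> u32 {
    sum_scalar
}

/// Detection runs once, on first use. Afterwards each call is a load plus an
/// indirect call, with no per-call feature check.
pub fn kernels() -> &'static Kernels {
    static TABLE: OnceLock<Kernels> = OnceLock::new();
    TABLE.get_or_init(|| Kernels { sum: pick_sum() })
}
```

In a hot loop, callers would hoist `kernels().sum` once and call it repeatedly.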

While building the table, functions with required target feature sets (tagged with #[target_feature(enable = …)]) must be made type-compatible with the other functions. We do not use a trampoline (for performance reasons). Instead, we constrain ourselves to signatures that do not mention any SIMD-specific types and are therefore ABI-compatible. This allows function pointers of different feature sets to be stored in the same fn-pointer table field. The only requirement this imposes on us is that each function is scoped so that no SIMD value has to be expensively spilled from another block in order to pass it through the more primitive ABI.
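The following is a minimal sketch of a kernel written under that constraint. The names `sum_scalar`, `sum_sse41` and `select` are illustrative. Only `&[u8]` and `u32` cross the function boundary, and every `__m128i` value lives and dies inside the body. Because of this, the SIMD kernel and the scalar fallback fit the same pointer type.

```rust
/// Portable fallback with exactly the same signature as the SIMD kernel.
pub fn sum_scalar(data: &[u8]) -> u32 {
    data.iter().map(|&b| u32::from(b)).sum()
}

/// SSE4.1 kernel: SIMD values never appear in the signature.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse4.1")]
pub unsafe fn sum_sse41(data: &[u8]) -> u32 {
    use std::arch::x86_64::*;
    let mut acc = _mm_setzero_si128();
    let mut chunks = data.chunks_exact(4);
    for c in &mut chunks {
        let word = i32::from_le_bytes([c[0], c[1], c[2], c[3]]);
        // pmovzxbd (SSE4.1): zero-extend four u8 lanes into four 32-bit lanes.
        acc = _mm_add_epi32(acc, _mm_cvtepu8_epi32(_mm_cvtsi32_si128(word)));
    }
    // Horizontal reduction through memory, then add the leftover bytes.
    let mut lanes = [0u32; 4];
    _mm_storeu_si128(lanes.as_mut_ptr().cast::<__m128i>(), acc);
    let tail: u32 = chunks.remainder().iter().map(|&b| u32::from(b)).sum();
    lanes.iter().sum::<u32>() + tail
}

/// Both kernels coerce to the same `unsafe fn(&[u8]) -> u32` field type.
#[cfg(target_arch = "x86_64")]
pub fn select() -> unsafe fn(&[u8]) -> u32 {
    if is_x86_feature_detected!("sse4.1") {
        return sum_sse41;
    }
    sum_scalar
}

#[cfg(not(target_arch = "x86_64"))]
pub fn select() -> unsafe fn(&[u8]) -> u32 {
    sum_scalar
}
```

Note that the field type here is still `unsafe fn`. Turning it into a plain `fn` is the step under discussion.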

With safe arch intrinsics, and with helper crates that confine unsafe intrinsics behind a safe interface, some of these can be entirely safe implementations. However, the function cast is very hard to abstract away properly. There is no way to query or represent the enabled feature set from a type. Nor can the gained information be represented in a const way that would allow verifying and discharging the obligations at the conversion site. With macros it is possible to wrap those functions, or to create a conversion utility at the definition site. This still suffers from gaps in ABI-compatibility checking, though. It also does not allow more advanced selection patterns (e.g. choosing among the functions whose features are available based on runtime performance information).
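A definition-site macro of the kind mentioned above might look roughly like this minimal sketch. The macro `feature_kernel!` and the generated `*_checked` function are hypothetical names. The unsafe `transmute` does not go away; it is only hidden. The macro also cannot verify that an arbitrary `$ty` signature is safe to transmute.

```rust
fn sum_scalar(data: &[u8]) -> u32 {
    data.iter().map(|&b| u32::from(b)).sum()
}

/// Hypothetical helper: emits the `#[target_feature]` kernel plus a `$checked()`
/// function that hands out a safe fn pointer only after runtime detection.
macro_rules! feature_kernel {
    ($feat:tt, $name:ident, $checked:ident,
     fn($($arg:ident: $ty:ty),*) -> $ret:ty $body:block) => {
        #[cfg(target_arch = "x86_64")]
        #[target_feature(enable = $feat)]
        unsafe fn $name($($arg: $ty),*) -> $ret $body

        fn $checked() -> Option<fn($($ty),*) -> $ret> {
            #[cfg(target_arch = "x86_64")]
            {
                if is_x86_feature_detected!($feat) {
                    let f: unsafe fn($($ty),*) -> $ret = $name;
                    // SAFETY: the feature was just detected. The macro can NOT
                    // check that the signature is ABI-safe; the author must.
                    return Some(unsafe { std::mem::transmute(f) });
                }
            }
            None
        }
    };
}

// Body is scalar for brevity; a real kernel would use SSE4.1 intrinsics.
feature_kernel!("sse4.1", sum_sse41, sum_sse41_checked,
    fn(data: &[u8]) -> u32 {
        data.iter().map(|&b| u32::from(b)).sum()
    });
```

Because the selection logic is baked into each generated `*_checked` function, choosing among several available implementations at runtime still needs extra hand-written code.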

Guide-level explanation

The instruction-set-specific modules in core::arch define a set of zero-sized types. Each one is a token certifying that a CPU feature is available to the currently running program. Each has a non-const constructor returning Option<Self>. It returns Some(_) if the feature was enabled at compile time, or if it can be detected at runtime as with `is_X_feature_detected!`.
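Such a token can be approximated in user code today, which may help pin down the intended semantics. The following is a minimal sketch in which `HaveSse41` is an ordinary user type rather than the proposed `core::arch` item. The private field stops code outside the defining module from constructing it.

```rust
mod arch_tokens {
    /// Hypothetical zero-sized token certifying SSE4.1 availability.
    #[derive(Clone, Copy, Debug)]
    pub struct HaveSse41 {
        _private: (),
    }

    impl HaveSse41 {
        /// Deliberately not `const`: the answer depends on the CPU at runtime.
        pub fn new() -> Option<Self> {
            if cfg!(target_feature = "sse4.1") || runtime_detected() {
                Some(HaveSse41 { _private: () })
            } else {
                None
            }
        }
    }

    #[cfg(target_arch = "x86_64")]
    fn runtime_detected() -> bool {
        is_x86_feature_detected!("sse4.1")
    }

    #[cfg(not(target_arch = "x86_64"))]
    fn runtime_detected() -> bool {
        false
    }
}
```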

Wherever a local of such a zero-sized type is in scope and definitely initialized, casting a function item into a function pointer is safe for a subset of simple function signatures:

```rust
#[target_feature(enable = "sse4.1")]
fn crazy() {
    …
}

fn boring_scalar_implementation() {
    …
}

fn choose_implementation(has_sse: Option<core::arch::HaveSse41>) -> fn() {
    if let Some(_value) = has_sse {
        // Safe under this proposal: `_value` certifies SSE4.1 availability.
        crazy as fn()
    } else {
        boring_scalar_implementation as fn()
    }
}
```

Allowed signatures are checked against the target fn-pointer type. That type must be concrete enough to be matched against the function item's zero-sized type. The target function pointer must use the Rust ABI. Allowed argument and return types are a subset of those documented in the ABI-compatibility section of the fn documentation:

  • Each primitive type and reference type is compatible with itself and with any of its supertypes.
  • References and Box<T> are compatible with NonNull and raw pointers of the same metadata but not the other way around.
  • Option<NonZero*> is compatible with the corresponding plain integer type, and vice versa.
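The rules above can be illustrated with today's unsafe `transmute`. The following is a minimal sketch in which `first_byte` and `as_ref_fn` are hypothetical names. The implementation takes `*const u8` and returns `Option<NonZeroU32>`, and it is exposed as `fn(&u8) -> u32`. Every `&u8` is a valid, non-null `*const u8`, and every `Option<NonZeroU32>` is a valid `u32` (`None` reads as 0). Under our reading of the ABI-compatibility list in the `fn` primitive's documentation, this direction is sound for values and not merely ABI-compatible. The reverse pointer direction (an implementation taking `&u8` behind a `fn(*const u8)`) would also be ABI-compatible, but it would let callers pass null.

```rust
use std::num::NonZeroU32;

/// # Safety
/// `p` must be valid for a one-byte read.
unsafe fn first_byte(p: *const u8) -> Option<NonZeroU32> {
    NonZeroU32::new(u32::from(unsafe { *p }))
}

/// Exposes `first_byte` through a pointer type with "stronger" arguments.
fn as_ref_fn() -> fn(&u8) -> u32 {
    let f: unsafe fn(*const u8) -> Option<NonZeroU32> = first_byte;
    // SAFETY (sketch): a `&u8` argument always satisfies `first_byte`'s contract,
    // and each `Option<NonZeroU32>` bit pattern is a valid `u32`.
    unsafe { std::mem::transmute::<unsafe fn(*const u8) -> Option<NonZeroU32>, fn(&u8) -> u32>(f) }
}
```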

Reference-level explanation

When a cast from a function item into a function pointer is attempted, query the local typing context before determining whether an unsafe block is required. When a local (including a function parameter) in scope unifies with the corresponding language marker type, all basic blocks dominated by its initialization are augmented with this additional information in their typing context. Different features of the same architecture unify.

The important difference from plain ABI compatibility is that we assert soundness for the values, not only ABI compatibility. We have therefore additionally ensured that every argument value passed is also correct for the parameter (both validity and safety invariants). Consequently, distinct zero-sized, align-1 types are not compatible with one another, other than through a subtyping relationship.

Drawbacks

Code verification passes become more complicated, and this check necessarily moves after type unification. Some behavior may be surprising to users if the pass's notion of dominated basic blocks, or its unification behavior, disagrees with the surface-level Rust code. We cannot express this behavior behind a generic type, even though the mechanism uses the type system.

Rationale and alternatives

The lack of support for generics has precedent. In a union, all fields are required to statically show that they have no Drop glue. For concrete types this can be determined, but for generics there are restrictive rules: ManuallyDrop specifically can be used as the top-level wrapper type for fields that would otherwise not be provably drop-free.
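To make the precedent concrete, here is a minimal sketch; `Slot` and `Bits` are illustrative names. A generic union field must be wrapped in `ManuallyDrop`, because the compiler cannot prove that an arbitrary `T` has no drop glue. Concrete `Copy` fields need no wrapper. `Slot` has the same shape as the standard library's `MaybeUninit`.

```rust
use std::mem::ManuallyDrop;

/// A bare `value: T` field would be rejected here; `ManuallyDrop<T>` is accepted.
pub union Slot<T> {
    empty: (),
    value: ManuallyDrop<T>,
}

impl<T> Slot<T> {
    pub fn new(v: T) -> Self {
        Slot { value: ManuallyDrop::new(v) }
    }

    /// # Safety
    /// `value` must be the field that was last written.
    pub unsafe fn into_inner(self) -> T {
        ManuallyDrop::into_inner(unsafe { self.value })
    }
}

/// Concrete `Copy` field types are accepted as-is.
pub union Bits {
    pub f: f32,
    pub u: u32,
}
```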

Alternative: do not do this. Function pointer casts remain unsafe, so SIMD libraries require a little bit of unsafe code and cannot annotate themselves with forbid(unsafe_code).

Or, only dispatch at call sites. To avoid the biggest performance problems, provide a dynamic representation of the CPU feature set that records the valid features, and switch on it. To avoid all unsafety we still need some amount of language integration: this type must dispatch into a set of given functions based on its value. Fn pointers carry no type information that would allow this. So we also still need concrete ZSTs, plus compiler magic to query them despite the lack of a trait bound.
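Call-site dispatch on a runtime feature value could look like this minimal sketch, in which `Features`, `detect` and `sum` are hypothetical names. The kernel bodies are scalar stand-ins. The private field means a `Features` value can only come from `detect`, which is what keeps the `unsafe` call sound. That `unsafe` still has to be written by hand, once per kernel and per feature.

```rust
mod dispatch {
    /// Hypothetical runtime representation of the usable feature set.
    #[derive(Clone, Copy, Debug)]
    pub struct Features {
        sse41: bool,
    }

    impl Features {
        pub fn detect() -> Features {
            #[cfg(target_arch = "x86_64")]
            let sse41 = is_x86_feature_detected!("sse4.1");
            #[cfg(not(target_arch = "x86_64"))]
            let sse41 = false;
            Features { sse41 }
        }

        /// Dispatch at the call site: one branch on the stored value per call.
        pub fn sum(self, data: &[u8]) -> u32 {
            #[cfg(target_arch = "x86_64")]
            {
                if self.sse41 {
                    // SAFETY: `sse41` is only true if runtime detection succeeded.
                    return unsafe { sum_sse41(data) };
                }
            }
            sum_scalar(data)
        }
    }

    fn sum_scalar(data: &[u8]) -> u32 {
        data.iter().map(|&b| u32::from(b)).sum()
    }

    // Stand-in for a real SSE4.1 kernel (scalar body for brevity).
    #[cfg(target_arch = "x86_64")]
    #[target_feature(enable = "sse4.1")]
    unsafe fn sum_sse41(data: &[u8]) -> u32 {
        data.iter().map(|&b| u32::from(b)).sum()
    }
}
```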

Prior art

Besides the union precedent:

struct_target_feature (RFC #3525) contains the ZST types but not a built-in dispatch mechanism. Instead, the function signatures are explicitly incompatible because of the differing struct/ZST parameters.

Instead of overloading the existing function pointer cast, we could have a macro that only allows the safe pointer casts. This macro would consume the token and would probably be implemented as a compiler built-in. (A method on the type would require magic bounds and instantiations; I'm not sure that would be a good idea.)

Unresolved questions

Which ZST marker types should be introduced, and where should they live?

We could also write the types as FeatureAssertion<T> for various instantiations of T. This might simplify querying and be more robust against future directions. Judging by the history and internals of NonZero, this may not need to be decided right away.
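The generic spelling could be prototyped as in this minimal sketch. The trait `Feature`, the marker types `Sse41`/`Avx2` and `FeatureAssertion<F>` are all hypothetical names. The point is that a query over all tokens becomes a single bound, `F: Feature`, instead of a list of concrete types.

```rust
use std::marker::PhantomData;

/// Hypothetical trait implemented by one uninhabited marker type per feature.
pub trait Feature {
    fn available() -> bool;
}

pub enum Sse41 {}
pub enum Avx2 {}

impl Feature for Sse41 {
    fn available() -> bool {
        detect::sse41()
    }
}

impl Feature for Avx2 {
    fn available() -> bool {
        detect::avx2()
    }
}

/// Zero-sized proof that feature `F` is available.
pub struct FeatureAssertion<F: Feature> {
    _marker: PhantomData<F>,
}

impl<F: Feature> FeatureAssertion<F> {
    pub fn new() -> Option<Self> {
        F::available().then(|| FeatureAssertion { _marker: PhantomData })
    }
}

#[cfg(target_arch = "x86_64")]
mod detect {
    pub fn sse41() -> bool { is_x86_feature_detected!("sse4.1") }
    pub fn avx2() -> bool { is_x86_feature_detected!("avx2") }
}

#[cfg(not(target_arch = "x86_64"))]
mod detect {
    pub fn sse41() -> bool { false }
    pub fn avx2() -> bool { false }
}
```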

Should there be more constructors on the ZST marker types?

Should there be an unsafe constructor with documented invariants? Currently, there are a number of alternative CPU-feature-detection implementations in the crates.io ecosystem. It is to be expected that these would otherwise transmute the ZST values out of thin air anyway, so as to match the standard library integration. That is just as soundness-critical, but undocumented on std's side.
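An unsafe constructor with a documented contract might look like the following minimal sketch on a hypothetical `HaveSse41` token. Here `third_party_detect` stands in for an alternative detection crate. For portability it only consults compile-time target features; a real crate would typically do its own runtime probing. Either way, it discharges a written invariant instead of transmuting a token out of thin air.

```rust
mod tokens {
    #[derive(Clone, Copy, Debug)]
    pub struct HaveSse41 {
        _private: (),
    }

    impl HaveSse41 {
        /// Creates the token without performing any check.
        ///
        /// # Safety
        ///
        /// The caller must guarantee that SSE4.1 is available on every CPU this
        /// process may execute on, for as long as the token is in use.
        pub unsafe fn new_unchecked() -> Self {
            HaveSse41 { _private: () }
        }
    }
}

/// Hypothetical stand-in for an alternative detection crate.
fn third_party_detect() -> Option<tokens::HaveSse41> {
    if cfg!(target_feature = "sse4.1") {
        // SAFETY: the binary was compiled with SSE4.1 enabled, so any CPU it
        // can legitimately run on supports the feature.
        Some(unsafe { tokens::HaveSse41::new_unchecked() })
    } else {
        None
    }
}
```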

Future possibilities

When we gain the ability to represent target features on function pointers, not only on zero-sized function item types, we should extend the cast ability to these. We might also be able to make the token types work in generic code if there were function-pointer metadata that matches some trait bound on a feature-detection ZST.

A possible alternative: give the ZST tokens themselves a method that casts from function item to function pointer. I suspect this alternative doesn't help improve the number of special cases compared to the original proposal (because you need some way to be generic over function item types that require a specific set of target features to be used safely, which is the sort of thing that is hard to represent using just traits), but it would be less magical than "this value has to be in scope" and thus might be easier to understand upon seeing it in code.

The least magical version would be "functions that require target features must be given the ZST values that represent those target features as arguments", which would be simple both in terms of using it and in understanding how it works, but unfortunately I don't think it would be backwards-compatible (and it also wouldn't help with the problem of wanting to store pointers to functions with different target feature requirements in a variable of a single concrete type, which should in theory be sound if all the features in question are actually available at runtime and the calling convention doesn't change as a consequence).
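The "token as argument" variant can be written today, as in this minimal sketch. `HaveSse41`, `sum_sse41` and `sum_sse41_impl` are hypothetical names, and the kernel body is scalar. It also shows the table problem described above: the safe entry point's type includes the token, so it cannot share a field with the scalar fallback.

```rust
mod tokens {
    #[derive(Clone, Copy, Debug)]
    pub struct HaveSse41 {
        _private: (),
    }

    impl HaveSse41 {
        pub fn new() -> Option<Self> {
            detected().then_some(HaveSse41 { _private: () })
        }
    }

    #[cfg(target_arch = "x86_64")]
    fn detected() -> bool {
        is_x86_feature_detected!("sse4.1")
    }

    #[cfg(not(target_arch = "x86_64"))]
    fn detected() -> bool {
        false
    }
}

use tokens::HaveSse41;

fn sum_scalar(data: &[u8]) -> u32 {
    data.iter().map(|&b| u32::from(b)).sum()
}

// Stand-in kernel (scalar body for brevity).
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse4.1")]
unsafe fn sum_sse41_impl(data: &[u8]) -> u32 {
    data.iter().map(|&b| u32::from(b)).sum()
}

#[cfg(not(target_arch = "x86_64"))]
unsafe fn sum_sse41_impl(data: &[u8]) -> u32 {
    sum_scalar(data)
}

/// Safe to call: holding the token proves SSE4.1 is available.
fn sum_sse41(_token: HaveSse41, data: &[u8]) -> u32 {
    // SAFETY: a `HaveSse41` value only exists after successful detection.
    unsafe { sum_sse41_impl(data) }
}

// Does not compile: `fn(HaveSse41, &[u8]) -> u32` is not `fn(&[u8]) -> u32`.
// let table: [fn(&[u8]) -> u32; 2] = [sum_scalar, sum_sse41];
```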