sorry, i'm not sure i fully understand all your questions. so let me know if there's something i forgot to address.
you're right, i meant Copy instead of Clone. i'm not sure whether that specific ABI is desirable, and that discussion would probably have to be brought up with the compiler team. the idea i had in mind for using this safely is a utility similar to std::mem::transmute (in this case, the as cast) that checks at compile time that the source and target function pointers have the same ABI and differ only by ZST+Copy parameters.
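to make the "checks at compile time" part concrete, here is a minimal sketch of how the ZST half of that check can already be expressed in current Rust with an associated constant. it only covers the size check, not the ABI comparison, and the names AssertZst, Token and require_zst_copy are made up for illustration:

```rust
use core::marker::PhantomData;
use core::mem::size_of;

/// Hypothetical helper: evaluating `IS_ZST` fails the build when `S` is not zero-sized.
struct AssertZst<S>(PhantomData<S>);

impl<S> AssertZst<S> {
    const IS_ZST: () = assert!(size_of::<S>() == 0, "token type must be zero-sized");
}

/// Hypothetical stand-in for a SIMD capability token.
#[derive(Clone, Copy)]
struct Token;

/// Accepts only `Copy` types (trait bound) that are also zero-sized (const assertion).
fn require_zst_copy<S: Copy>(_token: S) -> usize {
    // Referencing the constant forces it to be evaluated for each concrete `S`.
    let () = AssertZst::<S>::IS_ZST;
    size_of::<S>()
}

fn main() {
    assert_eq!(require_zst_copy(Token), 0);
    // Calling `require_zst_copy(1u8)` instead would fail to build.
}
```

the failure shows up when the function is instantiated for a concrete type, so it is a post-monomorphization error rather than a trait-bound error.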
the cast can be misused, for example by casting a function pointer that takes a token to one that doesn't, and then calling it without making sure we already have an instance of that token. this has the effect of creating a token out of thin air without first checking that it's possible. that's the part that makes it unsafe.
as long as we have at least one instance of every token that is removed, the cast is safe, which is what makes my_fn_ptr in this example sound:
fn my_fn<S: Copy>(simd: S, v: &mut [f64]) {}

fn my_fn_ptr<S: Copy>(simd: S) -> fn(&mut [f64]) {
    assert_eq!(core::mem::size_of::<S>(), 0);
    // SAFETY:
    // S is ZST + Copy, and we have an instance of S, so the cast is sound
    // (this is the proposed cast; current rustc rejects it)
    unsafe { my_fn::<S> as _ }
}
in current Rust, the same thing can be written with a trampoline closure:

fn my_fn_ptr<S: 'static + Copy>(simd: S) -> fn(&mut [f64]) {
    assert_eq!(core::mem::size_of::<S>(), 0);
    |v| {
        // SAFETY: this is logically a copy of the token
        // provided to my_fn_ptr
        let s: S = unsafe { core::mem::zeroed() };
        my_fn(s, v)
    }
}
this technically adds the cost of a function call indirection, since my_fn often can't be inlined into the closure because it carries a #[target_feature] attribute in this kind of scenario.
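for reference, here is a self-contained version of the trampoline approach that builds and runs today. the token type Scalar and the function double_all are hypothetical stand-ins; a real token would only be constructible after a successful runtime feature check, and the real worker would carry #[target_feature]:

```rust
use core::mem::size_of;

/// Hypothetical capability token (zero-sized and Copy).
#[derive(Clone, Copy)]
struct Scalar;

/// Hypothetical worker that takes the token as its first parameter.
fn double_all<S: Copy>(_simd: S, v: &mut [f64]) {
    for x in v.iter_mut() {
        *x *= 2.0;
    }
}

/// Returns a plain function pointer that no longer takes the token.
fn double_all_ptr<S: 'static + Copy>(_simd: S) -> fn(&mut [f64]) {
    assert_eq!(size_of::<S>(), 0);
    // The closure captures nothing, so it coerces to a function pointer.
    |v| {
        // SAFETY: S is zero-sized and Copy, and the caller gave us an
        // instance, so this is logically a copy of that token.
        let s: S = unsafe { core::mem::zeroed() };
        double_all(s, v)
    }
}

fn main() {
    let f = double_all_ptr(Scalar);
    let mut data = [1.0, 2.0, 3.0];
    f(&mut data);
    assert_eq!(data, [2.0, 4.0, 6.0]);
}
```

the call from the closure into double_all is the extra hop mentioned above.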
If the ABIs are the same, then the lack of inlining is "just" a quality of implementation issue, and ideally we should be able to teach backends to allow inlining for unconditional calls to different target feature sets.
So with most mechanisms in user space, the one thing still needed is a better bound and the compiler deducing the feature set from that generic parameter during monomorphisation? I'm pretty sure this could be experimented with in a crate to validate the API (and its benefits) against multiversion.
being able to pass along S to get multiversion's dispatch! implicitly, aiding inlining (which passing through the trampolines in the sketch will unfortunately inhibit); and
only needing to monomorphize the multiversions actually used, implicitly collecting those in monomorphization.
That's all true. Given that we have now seen that most user-visible and observable features can be written with standard constructs instead of language features, I was wondering if there's value in validating the interface first. And if we are validating the interface, then we can't (and don't need to) worry about the internal gains yet. That should be implementable as a crate even with the unfortunate trampoline overhead.
I'm also not generally sold on inlining as a major required feature yet. A good deal of SIMDified code will do large chunks of work and shouldn't suffer too much from some constant overhead in the call paths. Some other code might be inlined anyway because the match path is actually a constexpr choice post-monomorphization. (Hm, hasn't rustc begun to perform some code flow analysis for such situations already? It seems limited to non-generic consts for now.) Still, it's an unfortunate strain on the compiler.
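To illustrate what "constexpr choice post-monomorphization" could look like, here is a minimal sketch in which the dispatch is a match on an associated constant of the token type. The trait SimdToken, the enum Level, and both token types are hypothetical names invented for this example, and the "wide" arm is an ordinary chunked loop rather than real SIMD code:

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Level {
    Scalar,
    Wide,
}

/// Hypothetical token trait: each token type names its level as a constant.
trait SimdToken: Copy {
    const LEVEL: Level;
}

#[derive(Clone, Copy)]
struct ScalarToken;
#[derive(Clone, Copy)]
struct WideToken;

impl SimdToken for ScalarToken {
    const LEVEL: Level = Level::Scalar;
}
impl SimdToken for WideToken {
    const LEVEL: Level = Level::Wide;
}

fn sum<S: SimdToken>(_simd: S, v: &[f64]) -> f64 {
    // `S::LEVEL` is a known constant in each monomorphized copy of `sum`,
    // so the optimizer can usually discard the arm that is not taken.
    match S::LEVEL {
        Level::Scalar => v.iter().sum::<f64>(),
        Level::Wide => v.chunks(4).map(|c| c.iter().sum::<f64>()).sum::<f64>(),
    }
}

fn main() {
    assert_eq!(sum(ScalarToken, &[1.0, 2.0, 3.0]), 6.0);
    assert_eq!(sum(WideToken, &[1.0, 2.0, 3.0, 4.0, 5.0]), 15.0);
}
```

Whether the untaken arm is actually removed is up to the optimizer; nothing in the language guarantees it.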