We could presumably do this through a deny-by-default lint. For dependencies the lint would effectively be turned off (Cargo already caps lints for registry dependencies), but in your own crate you'd have to deal with it. That's probably the best story we have, though, for adding this in a backwards-compatible fashion.
Hm, sorry, I find all the weird Intel names terribly hard to parse, so I'm not following 100%. So the C headers have three types: one for vectors with f32 elements, one for f64 elements, and one for integer elements of any size? The intrinsics then say which width they operate on, and you can pass in any matching type of that width?
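For reference, Rust's `std::arch::x86_64` uses the same three 128-bit types as the C headers: `__m128` (f32 lanes), `__m128d` (f64 lanes), and `__m128i`, which carries no element type at all. The sketch below (x86_64 only; the function name `same_bits_two_views` is made up for illustration) feeds the same two `__m128i` values to both the 32-bit and the 16-bit add intrinsic, and the type system accepts both.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m128i, _mm_add_epi16, _mm_add_epi32, _mm_set1_epi32, _mm_storeu_si128};

/// Hypothetical helper: adds the same pair of `__m128i` values twice,
/// once interpreting the bits as 4 x i32 and once as 8 x i16.
#[cfg(target_arch = "x86_64")]
fn same_bits_two_views() -> ([i32; 4], [i16; 8]) {
    // SSE2 is part of the x86_64 baseline, so these intrinsics are always available.
    unsafe {
        let a: __m128i = _mm_set1_epi32(0x0001_0001);
        let b: __m128i = _mm_set1_epi32(0x0002_FFFF);

        // Same operand type, two different element widths.
        let as_i32 = _mm_add_epi32(a, b);
        let as_i16 = _mm_add_epi16(a, b);

        // Unaligned stores write exactly 16 bytes into each array.
        let mut out32 = [0i32; 4];
        let mut out16 = [0i16; 8];
        _mm_storeu_si128(out32.as_mut_ptr() as *mut __m128i, as_i32);
        _mm_storeu_si128(out16.as_mut_ptr() as *mut __m128i, as_i16);
        (out32, out16)
    }
}
```

The 32-bit view gives `0x0004_0000` in every lane, while the 16-bit view gives `[0, 3, 0, 3, 0, 3, 0, 3]`: the carry out of the low half only propagates in the 32-bit add. Nothing in the types tells the two apart.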
If the C intrinsics and/or headers are that duck-typed, this would indeed pose a problem. We may have to develop some form of naming convention to differentiate element types if we want to enforce them.
Certainly! One thing is that I'd prefer errors never come up during monomorphization. That is, if a crate compiles successfully, then 100% of downstream consumers should also compile successfully against it (no matter how they use it). For that reason I'd personally prefer to avoid adding codegen errors wherever possible.
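To make the concern concrete, here is a small sketch of a post-monomorphization error using a mechanism available in recent stable Rust (inline `const` blocks), not anything from the SIMD proposal itself. The function `lane` is hypothetical: its range check depends on a generic parameter, so the defining crate compiles cleanly and a bad `LANE` is only caught once some downstream caller instantiates it.

```rust
/// Hypothetical helper: extracts lane `LANE` from a 4-lane array.
/// The bounds check lives in an inline `const` block, so it is evaluated
/// per instantiation rather than when `lane` itself is type-checked.
fn lane<const LANE: usize>(v: [i32; 4]) -> i32 {
    const { assert!(LANE < 4, "lane index out of range") };
    v[LANE]
}

// A downstream crate writing `lane::<7>(v)` would get a compile error that
// points at this upstream `const` block, even though upstream built fine.
```

This is exactly the "compiles for me, breaks for my users" situation described above, which is why errors that only show up at codegen time are worth avoiding.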
Additionally, this feature doesn't really enable any new use cases. It's primarily a lint for newbies (like me) to make sure things aren't messed up. That being said, the same is true of the entire simd module I'm thinking of: it's just a bunch of intrinsics with terrible names. The module is very much "expert mode". Downstream crates offering a more high-level or type-safe interface are where I think this abstraction would happen.
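As a rough illustration of what such a downstream abstraction might look like, here is a sketch of a typed wrapper around `__m128i` (x86_64 only). The `I32x4` type and its methods are hypothetical, not taken from std or any particular crate; the point is that the element width is fixed by the type, so only the 32-bit add intrinsic is reachable through it.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m128i, _mm_add_epi32, _mm_loadu_si128, _mm_storeu_si128};

/// Hypothetical type-safe wrapper: four i32 lanes stored in an untyped `__m128i`.
#[cfg(target_arch = "x86_64")]
#[derive(Clone, Copy)]
pub struct I32x4(__m128i);

#[cfg(target_arch = "x86_64")]
impl I32x4 {
    pub fn new(lanes: [i32; 4]) -> Self {
        // SSE2 is baseline on x86_64; the unaligned load reads exactly 16 bytes.
        unsafe { I32x4(_mm_loadu_si128(lanes.as_ptr() as *const __m128i)) }
    }

    pub fn to_array(self) -> [i32; 4] {
        let mut out = [0i32; 4];
        // The unaligned store writes exactly 16 bytes into `out`.
        unsafe { _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, self.0) };
        out
    }
}

#[cfg(target_arch = "x86_64")]
impl std::ops::Add for I32x4 {
    type Output = I32x4;

    // Depending on the toolchain version, this call may not require `unsafe`.
    #[allow(unused_unsafe)]
    fn add(self, rhs: I32x4) -> I32x4 {
        // Always the 32-bit lane add: callers cannot pick the wrong width here.
        unsafe { I32x4(_mm_add_epi32(self.0, rhs.0)) }
    }
}
```

With this design the "did I use the right element width?" check falls out of ordinary type checking in the wrapper crate, rather than needing special enforcement in the low-level intrinsics layer.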
Given that, I personally can't see any plausible route to having this sort of static enforcement stabilized in a reasonable time frame. Stabilization is important, I think, because if this feature existed, the wrapper crates on crates.io would want to use it.
I don’t mind if a feature like this is perhaps discussed in parallel, though! It’d certainly be a nice-to-have.
Inline assembly would indeed be nice to have! Unfortunately, its stability story looks much harder than SIMD's (even though SIMD itself is not easy). So I think it makes sense to pursue the intrinsics route rather than the inline assembly route, as it's a faster (and perhaps, eventually, more ergonomic) way of getting access to SIMD.