I think that you should focus on the underlying features here, rather than on SIMD specifically. There may well be other processor-specific instruction sets we want to handle in the same way. In particular: a general-purpose intrinsics mechanism, something like target-feature, and something to handle the dynamic CPUID stuff.
I think that beyond the core of SIMD (roughly the SIMD.js subset), there are a huge number of instructions which can’t really have abstract support; we need a mechanism to make these available to programmers, similar to how we expose platform-specific features which are OS-dependent. A better intrinsics story might be a solution here (together with scenarios, or target-feature + the CPUID stuff).
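To make the target-feature + CPUID idea concrete, here is a rough sketch using the names these mechanisms later took in Rust (`cfg(target_feature)`/`target_arch` for compile-time gating, `is_x86_feature_detected!` for the runtime CPUID check); treat the names as illustrative rather than settled:

```rust
// Illustrative only: compile-time gating plus runtime CPUID dispatch.
fn dot(xs: &[f32], ys: &[f32]) -> f32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        // Runtime CPUID check: dispatch to a specialised version if the
        // CPU we are actually running on supports AVX2.
        if is_x86_feature_detected!("avx2") {
            // a hypothetical dot_avx2(xs, ys) would be called here
        }
    }
    // Portable scalar fallback.
    xs.iter().zip(ys).map(|(x, y)| x * y).sum()
}
```

The point is that the static part (what the backend may emit) and the dynamic part (what the CPU actually supports) are separate mechanisms, and both are needed.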
Some random thoughts:
- Inline assembly or auto-vectorisation alone is not enough: you need to be able to mark data as SIMD data, or you inevitably lose performance marshalling/un-marshalling it. This can be seen today: LLVM does some auto-vectorisation, but if you manually use the simd crate you can get significant perf improvements.
- I think the core SIMD stuff (as in Huon’s SIMD crate and SIMD.js) should be on the path to inclusion in std - it needs work, but it is fairly general purpose and should be in the nursery and have a roadmap to at least official ownership, if not quite a physical place in std. The non-core stuff is too platform-specific to be anywhere near std, IMO.
- Given that SIMD is defined by the processors, I think it is fine to have ‘Rust intrinsics’ for them and to assume that any backend will support them (whether as intrinsics, assembly, or whatever). In fact, what Rust offers might end up being a superset of what LLVM offers.
- I think that SIMD ‘intrinsics’ don’t really have to be intrinsics at all: they could just be extern functions whose body is a single asm instruction.
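A sketch of that last point (hypothetical wrapper name, x86-64 only, and written with inline asm rather than anything blessed): a ‘SIMD intrinsic’ can literally be an ordinary function whose entire body is one instruction:

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::{asm, x86_64::__m128};

// Hypothetical: expose the x86 `addps` instruction (packed
// single-precision add) as a plain function instead of a compiler
// intrinsic. The body is a single asm instruction.
#[cfg(target_arch = "x86_64")]
fn addps(a: __m128, b: __m128) -> __m128 {
    let out: __m128;
    unsafe {
        asm!(
            "addps {a}, {b}",
            a = inlateout(xmm_reg) a => out,
            b = in(xmm_reg) b,
        );
    }
    out
}
```

As long as the type system knows `__m128` lives in vector registers, the compiler can keep values in xmm registers across such calls, so no marshalling cost need be paid at the boundary.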