Good work, I think this is a first step I can agree with.
Is there a fundamental reason that you omitted 512-bit wide vector types from the proposal?
Is there real demand for AVX-512 today? Is it available in shipping hardware beyond Xeon Phi?
Yes, there is demand, which is why Clang and GCC already support AVX-512. There are a bunch of 1st gen Xeon Phi (Knights Corner) systems available, the TOP500 list currently contains >10 systems with 2nd gen Xeon Phis (both Knights Landing and Knights Landing F), and 3rd gen Xeon Phi systems (Knights Hill, like Argonne's Aurora) will be rolled into production in 2018, with tuning workshops using prototype boards starting as early as this year (2016).
These systems perform really badly if AVX-512 is not used, and the alternative ways of using AVX-512 (like OpenMP 4) are not available in Rust. So I think it is important to include AVX-512 low-level intrinsics from the start.
It is also a good stress test for the RFC, since the "Intel weirdness" with the vector types only gets worse, and AVX-512 also adds AVX-512F, AVX-512CD, AVX-512ER, AVX-512PF, ...
Fused multiply-add (mul_add) is available on many platforms, but not universally, and it is a very expensive operation to emulate. f64x2::mul_add() requires a 104-bit wide multiplication which you really, really want implemented in hardware.
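To make the cost of emulation concrete, here is a minimal scalar sketch (not part of the RFC) of what "fused" means: the product is kept at full precision and rounded only once, after the addition. It uses the standard library's `f64::mul_add`; the helper names `unfused` and `fused` are made up for illustration.

```rust
/// a * b + c with two roundings: the product is rounded to f64 before the add.
fn unfused(a: f64, b: f64, c: f64) -> f64 {
    a * b + c
}

/// a * b + c with a single rounding, via the standard library's f64::mul_add.
fn fused(a: f64, b: f64, c: f64) -> f64 {
    a.mul_add(b, c)
}

fn main() {
    // a = 1 + 2^-27, so the exact square is 1 + 2^-26 + 2^-54.
    let a = 1.0 + 1.0 / (1u64 << 27) as f64;
    // c = -(1 + 2^-26) cancels everything except the 2^-54 tail.
    let c = -(1.0 + 1.0 / (1u64 << 26) as f64);

    // The 2^-54 tail does not fit in an f64 product, so it is rounded away.
    println!("unfused: {:e}", unfused(a, a, c));
    // The fused version keeps the tail until the final rounding.
    println!("fused:   {:e}", fused(a, a, c));
}
```

The unfused version loses the low bits of the product, while the fused one recovers them. An emulation without hardware support has to carry that full-width product itself, which is what makes it expensive.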
Saturating arithmetic is universally available for 8-bit and 16-bit lanes, but not for i32x4. It turns out that nobody actually needs saturating arithmetic larger than 16 bits. I left it out to avoid the asymmetry.
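For readers unfamiliar with the term, the following sketch shows the difference between saturating and wrapping addition on 8-bit lanes. It models a 16-lane vector as a plain array and uses the scalar `i8::saturating_add` and `i8::wrapping_add` methods from the standard library; the function names `saturating_add_i8x16` and `wrapping_add_i8x16` are hypothetical and not the API proposed in the RFC.

```rust
/// Lane-wise saturating add: results clamp to the i8 range [-128, 127].
fn saturating_add_i8x16(a: [i8; 16], b: [i8; 16]) -> [i8; 16] {
    let mut out = [0i8; 16];
    for i in 0..16 {
        out[i] = a[i].saturating_add(b[i]);
    }
    out
}

/// Lane-wise wrapping add: results wrap around modulo 2^8.
fn wrapping_add_i8x16(a: [i8; 16], b: [i8; 16]) -> [i8; 16] {
    let mut out = [0i8; 16];
    for i in 0..16 {
        out[i] = a[i].wrapping_add(b[i]);
    }
    out
}

fn main() {
    let a = [100i8; 16];
    let b = [100i8; 16];
    // 100 + 100 = 200 does not fit in an i8.
    println!("saturating: {:?}", saturating_add_i8x16(a, b)); // every lane is 127
    println!("wrapping:   {:?}", wrapping_add_i8x16(a, b)); // every lane is -56
}
```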
Leaving things out, avoiding asymmetries, providing a nice interface ... is the job of a high-level SIMD interface.
The whole point of a low-level SIMD interface is to provide a direct map to the hardware, to allow the user to do anything that the hardware can do. That is, by definition, to avoid leaving anything out. Hardware instruction sets are complicated, inconsistent, asymmetric, incompatible, and weird... We can try to make the low-level intrinsics as safe and nice to use as possible, but we should not try to make them a high-level SIMD API. Implementing traits like Add is, for me, on the boundary with a high-level API, and I don't know how I feel about that. I would be fine with a low-level SIMD API that only provides intrinsic functions, without niceties like