It should in most ways act like a [T; n] array
Is that actually desirable from the point of view of making costs explicit in a systems language?
shufflevector
I thought exposing generic LLVM shuffling had been deemed out of scope for the initial release-channel SIMD feature. My understanding was that, initially, only shuffles that map to specific instructions would be provided, in a manner similar to C intrinsics, even if underneath the API they generated shufflevector in the LLVM IR. Is this not so?
(Generic shufflevector is rather bad at making costs explicit. For example, LLVM uses shufflevector to view the higher or lower half of a NEON quadword register via its doubleword aliasing, but such viewing is only needed for type system purposes, and no instruction is generated at all. OTOH, if you want a shuffle that the ISA doesn’t have a single instruction for, you get a non-obvious number of instructions.)
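To make the cost concern concrete, here is a minimal scalar sketch in plain Rust. It is only a model of what a generic shuffle API looks like from the caller's side, not real SIMD code. The names `shuffle`, `low_half`, and `scramble` are hypothetical and do not come from any proposed API.

```rust
/// Scalar model of a generic shuffle: output lane `i` takes input lane `idx[i]`.
/// Hypothetical helper for illustration only; it panics if an index is out of range.
fn shuffle<T: Copy, const N: usize, const M: usize>(v: [T; N], idx: [usize; M]) -> [T; M] {
    idx.map(|i| v[i])
}

/// Takes the low four lanes of an eight-lane vector.
/// On NEON this corresponds to viewing a quadword register through its
/// doubleword alias, which typically needs no instruction at all.
fn low_half(v: [u16; 8]) -> [u16; 4] {
    shuffle(v, [0, 1, 2, 3])
}

/// An arbitrary permutation of all eight lanes.
/// Depending on the ISA, this may lower to several instructions.
fn scramble(v: [u16; 8]) -> [u16; 8] {
    shuffle(v, [3, 0, 7, 1, 6, 2, 5, 4])
}
```

Both helpers go through the same `shuffle` call and differ only in the index list, so nothing at the call site tells the reader that one is likely free and the other is not. Instruction-specific shuffles, in the style of C intrinsics, put that difference in the function name.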
Anyway, I’d like to avoid slowdowns in getting SIMD to the release channel, so I’m worried about reopening the discussion regarding the types. Firefox is about to drop support for non-NEON ARM, so soon SIMD-in-release-channel-Rust will not only be SSE2-relevant for Firefox but NEON-relevant, too.