The stripmine loop is the part that splits your arbitrary-length input vector into chunks that fit the architecture's finite-length vector registers, and loads each chunk into the appropriate registers.
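To make that concrete, here is a minimal sketch of a stripmine loop in plain scalar Rust. It is not any real intrinsics API: `set_vl`, `stripmined_add`, and the `max_vl` parameter are names made up for illustration, with `set_vl` standing in for a hardware "set vector length" operation in the spirit of RISC-V's `vsetvl`. The inner `for` loop stands in for what would be a single vector load/add/store on real hardware.

```rust
/// Hypothetical stand-in for a hardware "set vector length" operation:
/// given how many elements remain, return how many the next pass handles.
/// `max_vl` models the architectural register length, which the calling
/// code never needs to know at compile time.
fn set_vl(remaining: usize, max_vl: usize) -> usize {
    remaining.min(max_vl)
}

/// Element-wise `out[i] = a[i] + b[i]`, written as a stripmine loop.
fn stripmined_add(a: &[f32], b: &[f32], out: &mut [f32], max_vl: usize) {
    assert!(max_vl > 0, "vector length must be non-zero");
    assert!(a.len() == b.len() && a.len() == out.len());

    let mut i = 0;
    while i < a.len() {
        // Ask for as many elements as remain; get back at most `max_vl`.
        let vl = set_vl(a.len() - i, max_vl);

        // Loop body: on a vector processor this chunk would be one
        // vector load, one vector add, and one vector store.
        for j in i..i + vl {
            out[j] = a[j] + b[j];
        }

        // Advance by however many elements were actually processed.
        i += vl;
    }
}
```

The point of the sketch is that nothing in `stripmined_add` depends on the value of `max_vl`: the same loop produces the same result whether the hardware processes 4 or 64 elements per pass, and there is no separate scalar tail loop for the leftover elements.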
The interface growing without bound on some axes (functionality) is unavoidable, but its growing along the vector size axis (at least in the "iterated instructions" category, and possibly "permute/combine" as well) is eminently preventable, and has major downsides.
Another preventable axis is "argument length/type" - RISC-V's V extension (and I think also ARM SVE) addresses this in a way that has no mapping to the argument size being specified by the instruction.
Also, if you read none of the other things I linked, read the slides - they motivate my arguments concisely and thoroughly.
I'll try.
Also, I'd argue that these concerns are very important to solve before stabilization, or else we will need to introduce a second API which massively overlaps this one (and stabilize it) in order to support certain hardware at all, because of assumptions made in the current proposals.
This is not a value judgement; "Packed SIMD" vs. "Vector Processor" are terms of art.
The former refers to the general approach taken by NEON, SSE, AVX, etc - that of architecturally-fixed-length vector-registers, with a new instruction set for each length.
The latter refers to Cray-style vector instruction sets, which effectively perform hardware-accelerated iteration using a wide, pipelined engine, applied to a memory vector of arbitrary length. Both ARM SVE and RISC-V's V extension are members of this family.