The stripmine loop is the part that chunks your arbitrary-length input vector into architectural (finite-length) vectors and loads them into the appropriate registers.
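As a minimal sketch of the idea (plain Rust, with scalar code standing in for the hardware operation; `VLEN` is a hypothetical architectural vector length, not anything from the current proposals):

```rust
// Hypothetical architectural vector length, in elements. On a
// packed-SIMD target this is fixed by the ISA.
const VLEN: usize = 4;

// Strip-mined elementwise add: chunk the arbitrary-length inputs into
// VLEN-sized pieces; each chunk stands in for one register-width SIMD
// operation (the final, shorter chunk is the tail).
fn add(a: &[f32], b: &[f32], out: &mut [f32]) {
    assert!(a.len() == b.len() && a.len() == out.len());
    for ((ca, cb), co) in a
        .chunks(VLEN)
        .zip(b.chunks(VLEN))
        .zip(out.chunks_mut(VLEN))
    {
        for i in 0..ca.len() {
            co[i] = ca[i] + cb[i]; // scalar stand-in for the vector op
        }
    }
}

fn main() {
    let a = [1.0f32, 2.0, 3.0, 4.0, 5.0, 6.0];
    let b = [6.0f32, 5.0, 4.0, 3.0, 2.0, 1.0];
    let mut out = [0.0f32; 6];
    add(&a, &b, &mut out); // 6 elements: one full chunk + a 2-element tail
    assert_eq!(out, [7.0; 6]);
}
```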
The interface growing without bound on some axes (functionality) is unavoidable, but its growth along the vector-size axis (at least in the "iterated instructions" category, and possibly "permute/combine" as well) is eminently preventable, and letting it grow there has major downsides.
Another preventable axis is "argument length/type" - RISC-V's V extension (and, I believe, ARM SVE) addresses this in a way that simply doesn't map onto the argument size being specified by the instruction.
Also, if you read none of the other things I linked, read the slides - they motivate my arguments concisely and thoroughly.
Also, I'd argue that these concerns are very important to solve before stabilization; otherwise, because of assumptions made in the current proposals, we will need to introduce (and stabilize) a second API that massively overlaps this one just to support certain hardware at all.
This is not a value judgement; "Packed SIMD" vs. "Vector Processor" are terms of art.
The former refers to the general approach taken by NEON, SSE, AVX, etc. - that of architecturally fixed-length vector registers, with a new instruction set for each length.
The latter refers to Cray-style vector instruction sets, which effectively perform hardware-accelerated iteration using a wide, pipelined engine, applied to a memory vector of arbitrary length. Both ARM SVE and RISC-V's V extension are members of this family.
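To make the contrast concrete, here is a sketch of a vector-processor-style loop (again plain Rust, scalar code standing in for the hardware). `set_vl` is a hypothetical stand-in for an instruction like RVV's `vsetvl`: the program reports how many elements remain, the hardware grants some vector length up to its own maximum, and the program never hard-codes that maximum.

```rust
// Hypothetical stand-in for RVV's `vsetvl`: the hardware grants a
// vector length for this iteration, capped at its own maximum.
fn set_vl(remaining: usize, hw_max_vl: usize) -> usize {
    remaining.min(hw_max_vl)
}

// Vector-length-agnostic add: the same loop works whatever vector
// length the hardware implements, because it advances by the granted
// length rather than by a compile-time register width.
fn vector_add(a: &[f32], b: &[f32], out: &mut [f32], hw_max_vl: usize) {
    let mut i = 0;
    while i < a.len() {
        let vl = set_vl(a.len() - i, hw_max_vl);
        for j in i..i + vl {
            out[j] = a[j] + b[j]; // stands in for one vector instruction
        }
        i += vl;
    }
}

fn main() {
    let a = [1.0f32; 10];
    let b = [2.0f32; 10];
    // The same code handles "hardware" with max VL 4 or 8 identically -
    // no per-length instruction set, no recompilation.
    for hw in [4, 8] {
        let mut out = [0.0f32; 10];
        vector_add(&a, &b, &mut out, hw);
        assert_eq!(out, [3.0; 10]);
    }
}
```

The point of the sketch: nothing in the loop bakes in a register width, which is exactly the property that keeps the interface from growing along the vector-size axis.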