Slower code with "-C target-cpu=native"

Thanks! It turns out the culprit is AVX. When compiled with -C target-cpu=native -C target-feature=-avx, the programs perform as well as they do without any flags or with -C target-cpu=generic, and they also run slower when compiled with -C target-feature=+avx. I was quite surprised to learn this. So either AVX is slow on my CPU (unlikely) or, more likely, LLVM has decided to use AVX in places where it brings no benefit, such as unrolling small loops.
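When comparing flag combinations like these, it can help to confirm which ones actually turned AVX on in the compiled binary. Below is a small sketch, not part of the original investigation. It uses `cfg!(target_feature = "avx")` for the compile-time setting and `std::arch::is_x86_feature_detected!` for what the running CPU reports. The helper names `compiled_with_avx` and `cpu_supports_avx` are made up for this example.

```rust
/// Compile-time check: true if this binary was built with AVX enabled,
/// e.g. via -C target-feature=+avx, or via -C target-cpu=native on an
/// AVX-capable machine (unless -C target-feature=-avx turns it back off).
fn compiled_with_avx() -> bool {
    cfg!(target_feature = "avx")
}

/// Runtime check on x86/x86_64: does the CPU executing this binary report AVX?
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn cpu_supports_avx() -> Option<bool> {
    Some(std::arch::is_x86_feature_detected!("avx"))
}

/// On non-x86 targets the question does not apply.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn cpu_supports_avx() -> Option<bool> {
    None
}

fn main() {
    println!("compiled with AVX enabled: {}", compiled_with_avx());
    match cpu_supports_avx() {
        Some(supported) => println!("CPU supports AVX at runtime: {}", supported),
        None => println!("not an x86 target; AVX check not applicable"),
    }
}
```

For example, running it once with RUSTFLAGS="-C target-cpu=native" cargo run --release and once with RUSTFLAGS="-C target-cpu=native -C target-feature=-avx" cargo run --release should show the first line flip while the runtime line stays the same. Running rustc --print cfg -C target-cpu=native gives a similar view without building anything, because it lists every target_feature enabled for that CPU.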