Slower code with "-C target-cpu=native"

And I will happily do even more measurements if you provide the code. Sadly, I'm not at all proficient at low-level stuff like that. But I sure have more than enough free time, so we can benchmark the hell out of it, if you wish :slight_smile: IMHO at this point, it would make more sense to use assembly or intrinsics.

But now, having established that AVX on Piledriver is utterly useless, a practical question arises: how can I deceive programs into believing that my CPU doesn't have AVX? What I actually want is to make it impossible to both use and detect AVX on my CPU. I tried adding clearcpuid=avx to the kernel command line, but gnome-clocks still crashes with "illegal instruction", and so do Cargo build scripts when compiled with -C target-cpu=native, which means they still detect AVX support somehow (probably in different ways).

AVX instructions can operate on both 128-bit (xmm) and 256-bit (ymm) registers. I think only the latter is slow on bdver2. So if LLVM could be convinced to use AVX instructions but only with xmm registers, it should be fine (I don't know if this is true, I don't have a CPU to test it on).
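A minimal sketch of a benchmark that could test this, assuming x86_64 and the `std::arch` intrinsics. The function names `add_one_xmm` and `add_one_ymm` are made up for illustration. Both functions are compiled with AVX enabled, so the first should use AVX instructions on xmm registers and the second on ymm registers, but that depends on what LLVM actually emits, so it is worth checking the disassembly before trusting any numbers.

```rust
use std::hint::black_box;
use std::time::Instant;

#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// dst[i] = src[i] + 1.0, four floats at a time (128-bit vectors).
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn add_one_xmm(src: &[f32], dst: &mut [f32]) {
    assert_eq!(src.len(), dst.len());
    let n = src.len() / 4 * 4; // largest multiple of 4 that fits
    let one = _mm_set1_ps(1.0);
    let mut i = 0;
    while i < n {
        let v = _mm_loadu_ps(src.as_ptr().add(i));
        _mm_storeu_ps(dst.as_mut_ptr().add(i), _mm_add_ps(v, one));
        i += 4;
    }
    for j in n..src.len() {
        dst[j] = src[j] + 1.0; // scalar tail
    }
}

/// dst[i] = src[i] + 1.0, eight floats at a time (256-bit vectors).
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn add_one_ymm(src: &[f32], dst: &mut [f32]) {
    assert_eq!(src.len(), dst.len());
    let n = src.len() / 8 * 8; // largest multiple of 8 that fits
    let one = _mm256_set1_ps(1.0);
    let mut i = 0;
    while i < n {
        let v = _mm256_loadu_ps(src.as_ptr().add(i));
        _mm256_storeu_ps(dst.as_mut_ptr().add(i), _mm256_add_ps(v, one));
        i += 8;
    }
    for j in n..src.len() {
        dst[j] = src[j] + 1.0; // scalar tail
    }
}

fn main() {
    #[cfg(target_arch = "x86_64")]
    {
        if !is_x86_feature_detected!("avx") {
            println!("AVX not detected, nothing to measure");
            return;
        }
        let src = vec![1.0f32; 1 << 14]; // 64 KiB of input
        let mut dst = vec![0.0f32; src.len()];
        const ROUNDS: usize = 10_000; // arbitrary, tune as needed

        let t = Instant::now();
        for _ in 0..ROUNDS {
            // SAFETY: AVX support was checked above.
            unsafe { add_one_xmm(black_box(&src[..]), black_box(&mut dst[..])) };
        }
        println!("xmm: {:?}", t.elapsed());

        let t = Instant::now();
        for _ in 0..ROUNDS {
            // SAFETY: AVX support was checked above.
            unsafe { add_one_ymm(black_box(&src[..]), black_box(&mut dst[..])) };
        }
        println!("ymm: {:?}", t.elapsed());
    }
    #[cfg(not(target_arch = "x86_64"))]
    {
        println!("this sketch only targets x86_64");
    }
}
```

Build it with optimizations (for example `cargo run --release`) and without -C target-cpu=native, so that the only AVX code in the binary is inside the two annotated functions. The buffer size and round count are guesses, not tuned values.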

The issue is that the force-vector-width flag doesn't seem to work for the memcpy. You should file an LLVM issue.

and so do Cargo build scripts when compiled with -C target-cpu=native

That may be reading out the CPU family instead of the current feature flags and then enabling features based on what the family would normally have. Try setting -Ctarget-cpu=native -Ctarget-feature=-avx
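To check whether a given flag combination actually took effect, a small Rust program can report both views: what the compiler was allowed to emit, and what the running CPU reports. This is only a sketch, and the two helper function names are made up for illustration. `cfg!(target_feature = "avx")` reflects the compile-time flags, while `is_x86_feature_detected!` queries the CPU at runtime (and is always true when the feature was already enabled at compile time).

```rust
/// True if the compiler was allowed to emit AVX instructions for this build
/// (for example via -C target-cpu=native or -C target-feature=+avx).
fn avx_enabled_at_compile_time() -> bool {
    cfg!(target_feature = "avx")
}

/// True if AVX is reported as usable at runtime.
/// On non-x86 targets this simply returns false.
fn avx_detected_at_runtime() -> bool {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        std::is_x86_feature_detected!("avx")
    }
    #[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
    {
        false
    }
}

fn main() {
    println!("compile-time AVX: {}", avx_enabled_at_compile_time());
    println!("runtime AVX:      {}", avx_detected_at_runtime());
}
```

If -Ctarget-feature=-avx works as intended, the first line should print false even together with -Ctarget-cpu=native. The second line is what programs doing their own runtime detection are likely to see, regardless of how they were compiled.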

Right, I was too broad.

To test that, we should use assembly. I say "we" while not actually knowing anything about AVX :wink: But maybe this topic is the beginning of my new programming obsession, who knows.

I will happily run any benchmark you or anyone else provides. I don't feel confident writing it myself yet, given my lack of SIMD knowledge and endless possibilities to measure the wrong thing.

Will do.

Yeah, I've done it already and it works. Though without AVX, -C target-cpu=native doesn't affect performance at all, so I'll stick to avoiding any flags altogether from now on (except for testing purposes, of course). It's not like I really have to squeeze every last bit of performance from my CPU, this AVX business is basically a holy quest that I must complete just to satisfy my stubbornness and perfectionism.
