Is pentium4 still an appropriate base CPU for i686?


#1

I noticed that the pentium4 (and pentium-m) target CPUs cause LLVM 6.0 to use different instruction scheduling than, e.g., when targeting 32-bit code at the generic x86-64 CPU.

I’d expect the primary use case of the i686 targets to be less about supporting Pentium 4-era hardware and more about supporting users who have a newer 64-bit CPU but who are stuck with a 32-bit OS for whatever reason (were told 64-bit has no benefits unless the computer has RAM above a certain threshold, were told Flash didn’t work on Linux and have been updating the system in place ever since, installed Windows when 32-bit Windows was the norm, have Chrome OS pull 32-bit Android binaries from the Play Store).

The changeset that the pentium4 default comes from said that clang has the same default. Is that still true as of clang 6.0?

Should the default be adjusted so that it still doesn’t go beyond SSE2 in terms of what instruction set extensions are OK to generate but so that it causes instruction scheduling to be optimized for newer CPUs?

(I’ve been told that -mtune in clang is a no-op, so I don’t know what the appropriate way is to request newer instruction scheduling from LLVM without having to manually turn off a bunch of instruction set extensions. I’m not sure what instruction set extensions the x86-64 CPU implies, precisely. It is possible to compile 32-bit code for it just like for any non-virtual 64-bit-capable target CPU.)


#2

Pentium 4 is never an appropriate base CPU for i686. i686 code is expected to work on a Pentium Pro.


#3

(I’ve been told that -mtune in clang is a no-op, so I don’t know what the appropriate way

Maybe -march=native ?

Should the default be adjusted so that it still doesn’t go beyond SSE2 in terms of what instruction set extensions are OK to generate but so that it causes instruction scheduling to be optimized for newer CPUs?

I don’t know what the point of the defaults is, but there are relatively modern Intel CPUs (e.g. Larrabee 1st gen) that support, for example, AVX-512, but don’t support MMX and SSE, and not even AVX2. So IMO the moment those CPUs were released the defaults (and plans like feature hierarchies) stopped making any sense.

I would prefer to keep x86_64 as x86_64_sse2 and use -C target-feature manually for the rest. We could add x86_64_sse42, x86_64_avx2, … targets that ship pre-compiled versions of std for those targets and enable appropriate features, but at the same time I’d wish we had a bare x86_64 target without SSE2, SSE, and MMX, to be able to target some of these Intel CPUs with it. Maybe Intel never releases any more AVX CPUs without SSE ever again, but they already did it once.

All in all, reality is a mess.


#4

In the GCC case, yes. In the Rust context, i686 means SSE2 (i.e. Pentium 4 / Pentium M) and i586 means without SSE.

It’s very useful for Rust’s default 32-bit x86 target to support SSE2, because SSE2 enables IEEE floating-point math and basic 128-bit SIMD, and bikeshedding the naming wouldn’t really be useful.
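To make the "i686 means SSE2" point concrete, here is a minimal sketch of how a program can report whether it was compiled with SSE2 enabled. The helper name `compiled_with_sse2` is made up for illustration; `cfg!(target_feature = "sse2")` is evaluated at compile time, so it reflects the target's defaults (plus any `-C target-feature` flags), not the CPU the binary ends up running on.

```rust
/// True if this crate was compiled with SSE2 enabled for the target.
/// (Hypothetical helper name; the check itself is a compile-time constant.)
fn compiled_with_sse2() -> bool {
    cfg!(target_feature = "sse2")
}

fn main() {
    // ARCH is the architecture the binary was built for, e.g. "x86" or "x86_64".
    println!("target_arch = {}", std::env::consts::ARCH);
    println!("compiled with SSE2: {}", compiled_with_sse2());
}
```

Building this for an i686 target and for an i586 target should show the difference described above, assuming no extra target-feature flags are passed.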

No, that would enable whatever instruction extensions the build host (typically a Xeon server) has.

Googling for “Intel Larrabee” suggests that Larrabee was a canceled GPGPU design. What Larrabee do you mean? An x86_64 CPU without SSE2 support doesn’t seem like a useful mainstream product, since SSE2 is a mandatory part of x86_64. Do you have a pointer to more info about x86_64 CPUs without SSE2?


#5

Larrabee is the CPU design behind knf (Knights Ferry) and knc (Knights Corner), and was basically merged into Silvermont afterwards, which is what knl (Knights Landing) and knm (Knights Mill) use: basically Intel Atom cores with AVX-512. The knf and knc CPUs support AVX-512 but don’t support SSE instructions; IIRC they did not support AVX and AVX2 either. The knl and knm CPUs do support the whole SSE and AVX ISA, up to AVX-512. Clang does not support knc and I don’t know if LLVM supports it, but the Intel C compiler obviously did.

Do you have a pointer to more info about x86_64 CPUs without SSE2?

The Larrabee / Xeon Phi Wikipedia page has some more info: https://en.wikipedia.org/wiki/Xeon_Phi There are also a couple of blog posts describing the history of the original Larrabee and AVX-512 in more detail (although the original Larrabee was never released), for example, http://tomforsyth1000.github.io/blog.wiki.html#[[Why%20didn't%20Larrabee%20fail%3F]]


EDIT: https://software.intel.com/en-us/forums/intel-many-integrated-core/topic/477754 so at least knc supported neither SSE nor AVX. It only had 512-bit wide registers, and one had to use them as such, so instructions for 128-bit wide and 256-bit wide registers were not available.

EDIT2: clang and LLVM do not support knc, but ISPC which is LLVM based does support it.


#6

It’s not a matter of bikeshedding, it’s a matter of honesty. I can sort-of forgive Linux distros using ‘i386’ to mean modern x86-32 hardware excluding actual 80386; apparently at some point they decided to drop actual i386 support, but didn’t want to break compatibility by renaming the architecture. But Rust never had that excuse. If the Rust compiler developers only want to target processors newer than Pentium 4, they should have named that target pentium4 or i786. Or just generically x86 or x86_32, without committing to any particular sub-architecture.

I was bitten by this a while ago; I installed cargo through APT on a Pentium Pro laptop (as i686 as it gets), and it just threw a segmentation fault at me; rustup did the same. I had to spend several hours debugging why (Is it the package builder’s environment leaking? Maybe an actual honest-to-$DEITY pointer bug?). If the target were clearly labelled pentium4, I’d have found that regrettable, but at least honest. Right now, the name of the i686 target is a transparent lie. (I have scrapped that machine eventually, so it doesn’t apply to me as much any more. But the general point still stands.)
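For illustration, this is roughly what a friendlier failure could look like: a sketch of a runtime check that prints a readable message instead of dying on an illegal instruction. The function names `cpu_has_sse2` and `startup_message` are made up for this example; `is_x86_feature_detected!` is the standard library's runtime detection macro on x86 targets.

```rust
/// Runtime check: does the CPU we are running on report SSE2?
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn cpu_has_sse2() -> bool {
    is_x86_feature_detected!("sse2")
}

/// On non-x86 targets there is no SSE2 to detect.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn cpu_has_sse2() -> bool {
    false
}

/// Hypothetical helper: the message a launcher could print before starting.
fn startup_message(has_sse2: bool) -> &'static str {
    if has_sse2 {
        "CPU reports SSE2; an i686 (SSE2) build should run here"
    } else {
        "CPU does not report SSE2; a build without SSE (e.g. i586) is needed"
    }
}

fn main() {
    println!("{}", startup_message(cpu_has_sse2()));
}
```

Note the catch: this only helps if the check itself is built without SSE2, for example in a small launcher compiled for an i586 target. A binary built for the i686 target may already contain SSE2 instructions in code that runs before `main`, in which case it would crash before the check is reached.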

By the way, the x87 FPU handles IEEE floats just fine.


#7

I agree with @felix.s here: we can do whatever we want with x86_32 and x86_64 “generic” targets, but the i{386,586,686,786,...} targets should work on the architectures that they are actually targeting “without buts”.

This does not affect only users: parts of std (like stdsimd) have some messy code to deal with the “differences” between the Rust targets and “real life”. Just because Rust says that all i686 targets have SSE2 enabled (like the Pentium 4 does) does not make it so (e.g., Pentium Pro, Pentium II, and Pentium III are all i686 CPUs and they don’t have SSE2).

I don’t know who thought that any of this was a good idea and I haven’t been able to find any comments / discussions about why things are the way they are, but it has only led to pain down the road.

This is also why I am a bit skeptical about -C target-feature hierarchies: Intel does not assume a hierarchy, and some Intel CPUs have violated this in the past and might do so again. I would be more comfortable with adding targets that enable multiple features (and document which features they enable) than with making avx512 implicitly enable avx2, for example. If anyone wants to pursue feature hierarchies, they should do so with different names, like we currently do for the crypto feature. For example, simd128 to enable sse through sse4.2, or simd256 to enable avx, avx2, and simd128.
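A minimal sketch of what I mean by named groups, assuming a purely hypothetical expansion step (nothing like this exists in rustc): the group names `simd128` and `simd256` are the ones proposed above, the functions `expand_group` and `to_flag` are invented for this example, and the exact list of features between sse and sse4.2 (including ssse3) is my assumption. The point is that the group spells out every feature it enables, instead of avx2 silently implying anything.

```rust
/// Hypothetical: expand a named feature group into the explicit list of
/// target features it stands for. Unknown names yield None.
fn expand_group(name: &str) -> Option<Vec<&'static str>> {
    match name {
        // Assumed membership: everything from sse up to sse4.2.
        "simd128" => Some(vec!["sse", "sse2", "sse3", "ssse3", "sse4.1", "sse4.2"]),
        // simd256 is defined as simd128 plus avx and avx2.
        "simd256" => {
            let mut features = expand_group("simd128")?;
            features.extend_from_slice(&["avx", "avx2"]);
            Some(features)
        }
        _ => None,
    }
}

/// Render a group as an explicit `-C target-feature=...` argument.
fn to_flag(name: &str) -> Option<String> {
    let enabled: Vec<String> = expand_group(name)?
        .iter()
        .map(|feature| format!("+{}", feature))
        .collect();
    Some(format!("-C target-feature={}", enabled.join(",")))
}

fn main() {
    for group in ["simd128", "simd256"].iter() {
        println!("{} => {:?}", group, to_flag(group));
    }
}
```

Because the expansion is explicit, a CPU that has the wide features but not the narrow ones simply does not match the group, and individual features can still be enabled one by one.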


#8

The original question was about instruction scheduling, not instruction set. AFAIK scheduling optimized for modern architectures does not break anything, it just gives sub-optimal performance on older CPUs rather than newer CPUs.


#9

Indeed. This thread wasn’t meant to be about relitigating what the Rust target is called (i686).

knc seems to be characterized as a “coprocessor” on Intel’s site. Does it boot Windows, macOS or a GNU/Linux distro? If not, it seems irrelevant to the x86_64 and i686 targets for normal operating systems. Also, as I said elsewhere, AVX512 is weird in terms of not having a proper additive hierarchy. Its existence doesn’t disprove the SSE levels and non-512 AVX having an additive hierarchy.


#10

Yes, it boots Linux.