The IEEE 754 floating point standard enjoys widespread acceptance and broad hardware support. It is so ubiquitous that Rust has (rightly) opted to define `f32` and `f64` as conforming to this standard. It’s a great standard, with much thought put into it and many useful guarantees. Unfortunately, the standard is more what you’d call “guidelines” than actual rules. Too many implementations take liberties with the tight precision specified by the standard, with the behavior in edge cases, with the support for subnormal numbers, with the set of supported operations, and so on.
The consequence is that even relatively simple numeric algorithms can give significantly different results on different platforms. Sometimes even basic arithmetic operations are incorrectly rounded. More commonly, subnormal numbers are treated as zero and subtleties like signed zeros and NaNs are “simplified”. Most of the time these things don’t matter, but when they do, it can be really painful.
Although the picture on modern desktop hardware is pretty okay, there are still many circumstances where one can encounter such problems in 2015. See the appendix for a non-exhaustive list of examples, but I figure most people are at least vaguely aware that this is a thing that happens (which is why it’s shoved in an appendix).
So what can we do about it? Not much. Such is the reality of writing cross-platform floating point code and Rust can only solve so many impossible problems at once. Besides, a lot of code mostly works despite these imperfections. However, we can give users the tools to work around problems if and when they face them.
The first such user would be libcore, by the way: the string-to-float conversion has a fast path that crucially depends on the accuracy of multiplication and division to fulfill its promise of correctly rounded results. (The result would occasionally be off by one bit, on a platform where all other operations are already occasionally inaccurate! Oh the humanity!)
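To make that concrete, here is a minimal sketch of the kind of fast path meant here; it is not libcore’s actual code, and the names and bounds are only illustrative. The point is that its correctness hinges entirely on the final multiplication or division being correctly rounded:

```rust
// Powers of ten up to 10^22 are exactly representable as f64.
const SMALL_POWERS_OF_TEN: [f64; 23] = [
    1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9, 1e10, 1e11,
    1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22,
];

/// The input number is `digits * 10^exp10`. If both the decimal significand
/// and the power of ten are exactly representable as f64, a single correctly
/// rounded multiply or divide yields the correctly rounded result.
fn fast_path(digits: u64, exp10: i32) -> Option<f64> {
    if digits > (1 << 53) {
        // The significand would not survive the u64 -> f64 conversion exactly.
        return None;
    }
    match exp10 {
        0 => Some(digits as f64),
        1..=22 => Some(digits as f64 * SMALL_POWERS_OF_TEN[exp10 as usize]),
        -22..=-1 => Some(digits as f64 / SMALL_POWERS_OF_TEN[(-exp10) as usize]),
        // Outside this range the power of ten is no longer exact;
        // fall back to the slow path.
        _ => None,
    }
}
```

If the one multiply or divide in there is not correctly rounded, the whole promise falls apart.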
The obvious solution goes through `cfg`, but how exactly?
Proposed solution: #[cfg(target_float="foo")]
This mirrors the existing `target_*` cfg values, especially `target_env`. I like that because it allows targeted workarounds for specific problems of specific platforms. Possible values might be: `soft_float` (which can mean something different depending on target arch and OS), `x87`, `sse2`, `vfp`, `neon`, etc., and possibly something about the `libm` used (perhaps in a second cfg value). It doesn’t have to encode every possible target platform (especially since we already have `target_arch` and `target_os`), but it should distinguish significantly different floating point implementations within one arch+OS combination.
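As a hedged sketch of what a targeted workaround could look like under this proposal (neither `target_float` nor these value strings exist in rustc today):

```rust
/// Error-free transformation: returns (s, err) with s + err == a + b exactly,
/// *assuming* every operation below is a single correctly rounded f64 operation.
#[cfg(not(target_float = "x87"))]
fn two_sum(a: f64, b: f64) -> (f64, f64) {
    let s = a + b;
    let bb = s - a;
    let err = (a - (s - bb)) + (b - bb);
    (s, err)
}

/// On x87, excess precision in intermediates breaks the invariant above, so a
/// build generating x87 code could opt into a variant that forces every
/// intermediate through memory or avoids the trick entirely (details elided).
#[cfg(target_float = "x87")]
fn two_sum(a: f64, b: f64) -> (f64, f64) {
    // Placeholder: a real implementation would do more work here.
    let s = a + b;
    (s, 0.0)
}
```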
This would offer a more canonical and more robust alternative to everyone creating and passing their own custom `--cfg` flags or inspecting only the arch and the OS. For example, if libcore float parsing wanted to be strictly correct, it would currently have to disable the fast path on any 32-bit x86 target, even though all in-tree targets do have SSE2 support. The compiler can more easily tell what code LLVM will generate, and it hopefully knows which support libraries (e.g. soft float, `libm`) will be linked in.
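For comparison, a rough sketch of the difference this would make for a strictly correct parser; `target_float` is again the hypothetical part:

```rust
// Today: without knowing whether SSE2 code will be generated, strict
// correctness means pessimistically assuming x87 on every 32-bit x86 target.
#[cfg(not(target_arch = "x86"))]
const FAST_PATH_OK: bool = true;
#[cfg(target_arch = "x86")]
const FAST_PATH_OK: bool = false;

// With the proposal, only builds that actually generate x87 code would opt
// out, e.g.:
//
//     #[cfg(not(target_float = "x87"))]
//     const FAST_PATH_OK: bool = true;
//     #[cfg(target_float = "x87")]
//     const FAST_PATH_OK: bool = false;
```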
It could also be “abused” for disabling floating point support with a value of `target_float="none"`. This is particularly interesting for libcore, which might be used in applications that don’t have and don’t want any float support at all (not even soft float). Issue #27702 is tangentially related. Right now one can compile and use libcore without libm and just not use the remaining float-related code (e.g. parsing and formatting), but that’s a lot of redundant code being included in the binary and not necessarily being optimized out.
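A hedged sketch of what that could look like (the module names are only illustrative, and `target_float = "none"` is of course hypothetical):

```rust
// Float parsing and formatting would simply not be compiled in at all when a
// target declares it has no floating point support of any kind.
#[cfg(not(target_float = "none"))]
mod dec2flt {
    // string -> float parsing would live here
}

#[cfg(not(target_float = "none"))]
mod flt2dec {
    // float -> string formatting would live here
}
```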
Alternative: #[cfg(float_is_broken)]
This is a drastic statement and, strictly speaking, this flag would have to be set on almost any target. Since there’s not much one can do with that information, we’d have to restrict what we mean by “broken”. For example, the obvious choice on x86 is to enable it when x87 code will be generated and disable it when you have SSE2 support. Still, it’s under-specified, and if the threshold for “broken” is set “incorrectly” the attribute could become useless. I don’t like this alternative very much, but it’s a possibility and has been thrown around before (e.g. in IRC discussions between me, @arielb1, and @cmr).
Thoughts?
In particular, are there other alternatives? Do you strongly prefer one of the alternatives above, and if so, why?
Appendix
- Old x86 hardware (roughly, pre-Pentium 4) doesn’t have SSE and SSE2 and thus requires using the x87 instruction set. Although this FPU can be configured to do otherwise, it will normally calculate and round at 80 bits of precision, which can occasionally result in wrong rounding compared to doing all intermediate steps with `f64` (see the sketch after this list). Changing this is an ABI-breaking change: if we generate code using x87 instructions, we’re pretty much bound to 80-bit precision, and I don’t think we can change this aspect of the ABI without adding overhead to C FFI calls.
- Actual living, breathing people are going through great pains trying (and so far, failing) to run Rust on i386 hardware.
- Even on current hardware, x87 is still used by default by some compilers and some Linux distributions. Clang and rustc will assume SSE2 unless explicitly instructed otherwise, but Debian, for example, supports pre-P4 hardware. This was also mentioned in Perfecting Rust Packaging. I don’t know if an eventual Debian `rustc` package would change code generation to use x87, but in any case it’s conceivable that some packager would do this.
- Moving on from x86, support for subnormals is very slow (and needs software support hooks) on some (possibly many) ARM chips. Consequently, phones based on those chips treat subnormals as zero. I found nothing about Android specifically, but I didn’t look very hard and would be surprised if it disabled flush-to-zero. Cursory googling indicates that (some) MIPS hardware has similar troubles. I also know that early (ca. 2008) CUDA hardware did the same.
- Transcendental functions are usually implemented purely or mostly in software, and consequently their accuracy varies by vendor/library. I don’t know offhand if any blatantly miss the accuracy demanded by the standard, but it’s another source of cross-platform differences that can cause serious problems.
- In general, LLVM will not break too much code that relies on subtle IEEE semantics unless explicitly instructed to (`-ffast-math`), but it’s far from perfect. The GCC wiki lists some things that will not be respected even in its default mode; I assume it’s similar or worse for LLVM.
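To illustrate the first point about x87, here is a hedged sketch of the kind of double-rounding discrepancy 80-bit intermediates can cause. The exact observable result depends on how the compiler spills values and on the FPU’s precision control setting, so treat it as an illustration rather than a reliable test:

```rust
fn main() {
    // b = 2^-53 + 2^-78; both terms and their sum are exactly representable as f64.
    let b = f64::EPSILON / 2.0 + f64::EPSILON / 67_108_864.0;
    let sum = 1.0f64 + b;

    // Correct IEEE f64 rounding: the extra 2^-78 pushes the result past the
    // halfway point, so sum == 1.0 + 2^-52 and the line below prints 2^-52
    // (about 2.22e-16).
    //
    // With x87 code (80-bit intermediates at the default extended precision),
    // 1.0 + b first rounds to 1.0 + 2^-53 in extended precision; storing that
    // to an f64 then hits an exact tie and rounds down to 1.0, so the
    // difference may print as 0.
    println!("{:e}", sum - 1.0);
}
```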