What is the problem?
After that comment on Reddit, I started thinking about the optimizations we forgo by keeping fast-math intrinsics like fadd_fast or fmul_fast unstable.
I have already seen advice like "if you need performance, don't rely on the optimizer — write SIMD manually". Writing SIMD instructions by hand is very error-prone: the user needs to check alignment and sizes, handle the remaining elements that don't fit into SIMD registers, check instruction availability, and so on. The user also needs to repeat a lot of code, because different CPU architectures have different instruction sets.
Additionally, the lack of a -ffast-math equivalent may deter some C/C++ users, especially in gamedev, from adopting Rust.
On my computer, using the intrinsics (without manual vectorization) speeds up computing the dot product of 20k f64s from 155 microseconds to 69 in a release build, and to 54 if I use -C target-cpu=native.
Possible solution
I suggest adding a new type, FastFloat<T>, in the same way as Wrapping for integers. It would have implementations only for f32 and f64, replacing all basic arithmetic operations with calls to the intrinsics and therefore generating faster code. Code using such intrinsics utilizes modern SIMD extensions automatically, so programmers wouldn't need to write SIMD code for audio/video processing by hand.
Another option
Just add methods like fast_add, fast_mul, etc. to the float types. I like this option less, because I prefer program properties to be described by types (a distinct type would better separate IEEE-754 floats from fast ones), but other people may like this option more.
What do you think?