There are several situations where giving the compiler the freedom to reorder mathematical operations in ways that may lead to numerical imprecision also leads to big gains in performance. Here’s the open issue about it:
https://github.com/rust-lang/rust/issues/21690
A matrix multiplication benchmark shows that it’s easy to find 20-30% speed improvements just by enabling that compiler flag.
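For background, the gains exist precisely because IEEE 754 arithmetic is not associative, so the reordering genuinely changes results. A minimal, self-contained illustration:

fn main() {
    let (a, b, c) = (1.0e8_f32, -1.0e8_f32, 1.0e-3_f32);
    // Floating-point addition is not associative: reordering the sums,
    // which is exactly the freedom fast-math grants the optimizer,
    // changes the answer.
    println!("{}", (a + b) + c); // prints 0.001
    println!("{}", a + (b + c)); // prints 0 (the 1e-3 is absorbed by -1e8)
}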
To try to get an initial discussion going, here are a few options for how this could be added to Rust.
Option 1: Add a compiler flag
The approach most similar to C/C++ would be a compiler flag that enables the optimization for everything being compiled (by passing the corresponding option to LLVM).
Advantages
- Most similar to what programmers are used to in other languages
Disadvantages
- Suffers from the same issues that -ffast-math has in other languages, which make it dangerous in several cases: end-users will enable it in cases where it’s unsafe to do so, get broken software, and then proceed to blame the software authors for it (a concrete sketch follows this list)
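Sketching that failure mode (assuming the flag is applied globally and, as in C/C++, implies the usual no-NaN assumption):

// Under a global fast-math flag the optimizer may assume NaN never
// occurs, so this standard self-comparison test can be constant-folded
// to `false`, silently breaking every caller that relies on it.
fn is_nan(x: f32) -> bool {
    x != x // per IEEE 754, only NaN compares unequal to itself
}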
Option 2: Specific tagging for this in functions
Add a tag that can be applied to a function and sets the relevant compilation flag in LLVM for it. Something like:
#[fast_math]
fn myfunc() {
    // ... your f32/f64-using code goes here
}
Advantages
- Works like other Rust features
- Allows the original author to tag only the specific bits of code where this is safe to do
Disadvantages
- It’s another feature-specific tag that pollutes this namespace
Option 3: Use the target_feature machinery for this
In some ways, -ffast-math is just another special CPU feature that can be used or not; there are even instructions that can only be used if you allow reordering of math. See the target_feature discussion for details, but this would be something like:
#[target_feature = "fast-math"]
fn myfunc() {
    // ... your f32/f64-using code goes here
}
Advantages
- Doesn’t introduce a new tag but instead just reuses a similar concept
- Should work nicely together with other target_feature uses, like enabling certain SIMD features in CPUs, since you often only get the full benefit of those extra instructions in non-handrolled code by allowing the optimizer to reorder math operations (see the sketch after this list)
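Sketch of that interaction, using Option 3’s hypothetical attribute ("fast-math" is not a real target feature today):

// A horizontal reduction like this has a strict sequential dependency
// on `total`, so the optimizer can only split it into SIMD lanes of
// partial sums if it is allowed to reassociate the additions.
#[target_feature = "fast-math"] // hypothetical, as proposed above
fn sum(values: &[f32]) -> f32 {
    let mut total = 0.0_f32;
    for &v in values {
        total += v;
    }
    total
}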
Disadvantages
- Stretches the concept too far?
Option 4: Use a wrapper type
Wrapper types for f32/f64 would be used that imply the optimization:
fn myfunc(a: f32, b: f32) -> f32 {
    let a_fast = FastFloat(a);
    let b_fast = FastFloat(b);
    // ... your FastFloat-using code goes here, for example:
    let result = a_fast * b_fast;
    result.into()
}
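For concreteness, such a wrapper could already be sketched today on nightly with the unstable fadd_fast/fmul_fast intrinsics (the FastFloat name and design here are illustrative, not an existing library):

#![feature(core_intrinsics)]
use std::intrinsics::{fadd_fast, fmul_fast};
use std::ops::{Add, Mul};

// Illustrative newtype: arithmetic on it may be freely reordered.
#[derive(Clone, Copy, Debug)]
struct FastFloat(f32);

impl Add for FastFloat {
    type Output = FastFloat;
    fn add(self, rhs: FastFloat) -> FastFloat {
        // The fast intrinsics may assume finite inputs, so this is only
        // sound if callers never pass NaN or infinity.
        FastFloat(unsafe { fadd_fast(self.0, rhs.0) })
    }
}

impl Mul for FastFloat {
    type Output = FastFloat;
    fn mul(self, rhs: FastFloat) -> FastFloat {
        FastFloat(unsafe { fmul_fast(self.0, rhs.0) })
    }
}

impl From<FastFloat> for f32 {
    fn from(x: FastFloat) -> f32 {
        x.0
    }
}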
Advantages
- Doesn’t introduce anything into the language itself
Disadvantages
- Requires a bunch of wrapping/unwrapping, which makes for ugly code
- It’s much harder to compose with other things you might want to do: if you’re using f32x4, that needs a wrapper type too; if you’re using ordered_float, you need a way to wrap twice
Discussion
From my point of view, Option 3, using target_feature, seems best: the syntax looks good, and it composes well with other things that will be done with that machinery, like the SIMD use cases.
I’m curious what other people think: whether there are entirely new options I haven’t thought of, or advantages and disadvantages to these that I haven’t considered.