TL;DR: 11f32.div_euclid(2.2f32) yields 5 but should be 4. I'm trying to fix it.
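For anyone who wants to reproduce this on stable Rust, here is a minimal sketch. It prints the f32 result next to the same division carried out in f64, which is used only as a reference. The helper name `reference_div_euclid` is made up for this example.

```rust
/// Hypothetical reference helper (not part of std): perform the Euclidean
/// division in f64 and narrow the integer-valued quotient back to f32.
fn reference_div_euclid(a: f32, b: f32) -> f32 {
    (a as f64).div_euclid(b as f64) as f32
}

fn main() {
    let (a, b) = (11f32, 2.2f32);
    // 2.2f32 is slightly larger than the decimal 2.2, so the exact
    // quotient 11 / b lies just below 5.
    println!("b as f64       = {:.20}", b as f64);
    println!("f32 div_euclid = {}", a.div_euclid(b));
    println!("f64 reference  = {}", reference_div_euclid(a, b));
    println!("f32 rem_euclid = {}", a.rem_euclid(b));
}
```

The remainder line is included because a quotient of 5 cannot be paired with a non-negative remainder here, which is what makes the result look wrong.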
I asked DeepSeek how to fix the rounding behaviour of the div_euclid function.
Here is what I got:
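#![feature(core_float_math)] // nightly only
use core::f32::math;
use std::time::Instant;

// Candidate fix: floor the rounded quotient, then use a fused multiply-add
// to check whether b * res overshoots a, and step by one if it does.
fn div_euclid(a: f32, b: f32) -> f32 {
    let res = (a / b).floor();
    // b * res - a is rounded only once, so its sign can be trusted here.
    if math::mul_add(b, res, -a) <= 0f32 {
        res
    } else if b > 0f32 {
        // Overshoot with a positive divisor: take the next integer down.
        res.next_down().floor()
    } else {
        // Overshoot with a negative divisor: take the next integer up.
        res.next_up().ceil()
    }
}

// Reference: do the division in f64 and narrow the quotient back to f32.
fn accurate_div_euclid(a: f32, b: f32) -> f32 {
    (a as f64).div_euclid(b as f64) as f32
}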
const RANGE: std::ops::Range<i32> = -5000..10001;

fn main() {
    // Divisors: powers of 1.1 and their negations.
    let tester = [
        1f32,
        1.1f32,
        1.21f32,
        1.331f32,
        1.4641f32,
        1.61051f32,
        1.771561f32,
        1.9487171f32,
        -1f32,
        -1.1f32,
        -1.21f32,
        -1.331f32,
        -1.4641f32,
        -1.61051f32,
        -1.771561f32,
        -1.9487171f32,
    ]
    .into_iter()
    .flat_map(|x| {
        RANGE.flat_map(move |y| {
            // Pair each rounded multiple `base` with the adjacent float on
            // the far side from the exact product y * x.
            let base = y as f32 * x;
            if math::mul_add(y as f32, x, -base) >= 0f32 {
                [(base.next_down(), x), (base, x)]
            } else {
                [(base, x), (base.next_up(), x)]
            }
        })
    })
    .collect::<Vec<_>>();

    // Timing 1: the standard library's f32::div_euclid.
    let now = Instant::now();
    let cum = tester
        .iter()
        .map(|&(x, y)| x.div_euclid(y) as i64)
        .sum::<i64>();
    println!("got {cum}, cost {:?}", now.elapsed());

    // Timing 2: the candidate fix.
    let now = Instant::now();
    let cum = tester
        .iter()
        .map(|&(x, y)| div_euclid(x, y) as i64)
        .sum::<i64>();
    println!("got {cum}, cost {:?}", now.elapsed());

    // Timing 3: the f64 reference.
    let now = Instant::now();
    let cum = tester
        .iter()
        .map(|&(x, y)| accurate_div_euclid(x, y) as i64)
        .sum::<i64>();
    println!("got {cum}, cost {:?}", now.elapsed());

    // Report every input where the candidate disagrees with the reference.
    for (x, y) in tester {
        if div_euclid(x, y) as i64 != accurate_div_euclid(x, y) as i64 {
            println!(
                "{x}/{y}: left = {}, right = {}",
                div_euclid(x, y) as i64,
                accurate_div_euclid(x, y) as i64
            )
        }
    }
}
I tested this code and, surprisingly, found that converting to f64 and dividing there is faster than calling f32's div_euclid directly.
So I'm not confident that my code is actually better than the original.
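One thing that may be worth ruling out before trusting these timings is the optimizer. Each loop only feeds a sum, so the compiler may treat the three loops quite differently. Below is a minimal sketch of a timing helper that passes inputs and outputs through `std::hint::black_box`. The name `time_sum` and the small data set are made up for this example, and a single run like this is still only a rough indication, not a proper benchmark.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: applies `f` to every pair, sums the quotients as i64,
/// and returns the sum together with the elapsed wall-clock time.
fn time_sum(data: &[(f32, f32)], f: impl Fn(f32, f32) -> f32) -> (i64, Duration) {
    let start = Instant::now();
    let mut sum = 0i64;
    for &(a, b) in data {
        // Hide the inputs so the compiler cannot specialize on their values,
        // and hide the output so each quotient really has to be computed.
        let (a, b) = black_box((a, b));
        sum += black_box(f(a, b)) as i64;
    }
    (sum, start.elapsed())
}

fn main() {
    // Illustrative inputs only; substitute the real test vector here.
    let data: Vec<(f32, f32)> = (1..=1000).map(|i| (i as f32 * 0.5, 1.5f32)).collect();

    let (s32, t32) = time_sum(&data, |a, b| a.div_euclid(b));
    let (s64, t64) = time_sum(&data, |a, b| (a as f64).div_euclid(b as f64) as f32);

    println!("f32: got {s32}, cost {t32:?}");
    println!("f64: got {s64}, cost {t64:?}");
}
```

Running each variant several times and comparing the fastest runs would make the comparison less sensitive to cache and warm-up effects.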