I'm really curious about the performance of isqrt, since the implementations are quite long. A very simple idea came to mind: would converting a u32 to f64, taking the square root directly, and converting the result back be faster than calling .isqrt()?
```
$ cat main.rs && cargo rr
fn fsqrt(u: u32) -> u32 {
    (u as f64).sqrt() as u32
}

fn isqrt(u: u32) -> u32 {
    u.isqrt()
}

fn timing(f: fn(u32) -> u32, range: impl Iterator<Item = u32>) {
    let now = std::time::Instant::now();
    let mut res = 0;
    for i in range {
        // Wraps on overflow in release mode; only used as a checksum
        // so the calls aren't optimized away.
        res += f(i);
    }
    println!("res = {res} {:?}", now.elapsed());
}

fn main() {
    timing(fsqrt, 0..=u32::MAX);
    timing(isqrt, 0..=u32::MAX);
}
    Finished `release` profile [optimized] target(s) in 0.00s
     Running `/me/test-isqrt/target/release/test-isqrt`
res = 715816960 8.357294192s
res = 715816960 18.820145928s
```
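For what it's worth, the float path should be exact for every u32 input: a u32 always fits in f64's 53-bit significand, and IEEE 754 requires sqrt to be correctly rounded, so truncating the result back can't land on the wrong integer. A quick, non-exhaustive sanity check (my own sketch, separate from the benchmark above) around perfect squares, where an off-by-one would show up first:

```rust
fn main() {
    // Each u32 converts to f64 exactly, and a correctly rounded sqrt of
    // r*r - 1 stays strictly below r, so truncation should match isqrt.
    for r in [1u32, 255, 4096, 46341, 65535] {
        let sq = r * r;
        for u in [sq - 1, sq, sq + 1] {
            assert_eq!((u as f64).sqrt() as u32, u.isqrt(), "mismatch at {u}");
        }
    }
    println!("spot checks passed");
}
```

The same trick is not exact for u64, though: inputs above 2^53 already round during the u64-to-f64 conversion, so a float-based path there would need a correction step.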
I have no idea whether this trick deserves a PR.
I feel I have seen a thread on this already here, a year ago or so. Wasn't the issue raised (and never addressed) that not all targets have hardware floating point?
IIRC part of the problem is that it's a full-target question, not just an architecture question. There was something like the Linux kernel not wanting to use SSE (which Rust uses for f64 operations like this) in kernel mode, even on x86-64 chips that all have it.
This is mostly about not wanting to save and restore floating point state from the userspace process, especially since interrupts and preemption can happen at any time. There are no callee/caller-save ABI rules to limit state here, so the kernel has to save everything that it might clobber.
We already implement f32::midpoint in a platform-dependent way.
But I think there might be a much bigger variance between integer and float behavior than between f32 and f64 (assuming both of the latter are hardfloat).
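If someone does pursue this, a cfg-gated version along the lines of the midpoint approach seems like the natural shape. A rough sketch, with the caveat that the sse2 gate is just an illustrative stand-in for whatever condition would actually be right (hardfloat availability, no kernel/soft-float constraints, etc.):

```rust
// Hypothetical per-target selection; the feature gate is illustrative,
// not necessarily what a real PR would check.
#[cfg(target_feature = "sse2")]
fn isqrt_u32(u: u32) -> u32 {
    // Hardware f64 sqrt is exact for all u32 inputs (53-bit significand,
    // correctly rounded sqrt), and measured faster above.
    (u as f64).sqrt() as u32
}

#[cfg(not(target_feature = "sse2"))]
fn isqrt_u32(u: u32) -> u32 {
    // Soft-float and kernel-style targets keep the pure integer algorithm.
    u.isqrt()
}
```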