Do the square root intrinsics work on all platforms?

Oh! Thanks for pointing that out.

Though that gives me an interesting idea: f80 has a 64-bit mantissa, so it might work for u64::isqrt...

Note that the Linux kernel disables SSE and AVX.

Except that, on MSVC, the x87 precision is set to 53 bits, not full precision.

Be careful about applying the pigeonhole principle here. While f64 can't represent all 64-bit integers, it can represent all 64-bit square roots. (my_u64 as f64).sqrt() as u64 might not always be exactly the number you want, but it should be very close (possibly always within ±1), and it's an extremely good guess for the first iteration of another algorithm.
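For instance, here's a minimal sketch of that idea (the function name and the 32-bit clamp are my own; I'm assuming, per the above, that the f64 guess is within ±1, but the loops still find the exact floor even if it isn't, just more slowly):

```rust
/// Sketch: u64 isqrt seeded by the f64 square root, then corrected.
fn isqrt_guess_then_fix(v: u64) -> u64 {
    // The true result always fits in 32 bits, so clamping the guess keeps
    // every multiplication below from overflowing.
    let mut r = ((v as f64).sqrt() as u64).min((1 << 32) - 1);
    while r * r > v {
        r -= 1; // guess was too high
    }
    while r < (1 << 32) - 1 && (r + 1) * (r + 1) <= v {
        r += 1; // guess was too low
    }
    r
}
```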

Pretty sure you shouldn't use x87 these days; the CPU isn't optimised for it. But I'm not a compiler writer, so take what I say with a grain of salt.

I think some CPUs had a massive setup cost when switching between x87 and vector instructions as well, though that might have been MMX (where the register backing storage overlapped with the x87 stack, IIRC).

Also, so far we have only talked about x86. What about ARM (where you would presumably use VFP or NEON?), ARM64, PPC, or RISC-V?

Presumably an analysis and benchmark would be needed for each of the tier 1/2 targets at least? It's especially messy on ARM with the hf variants etc., and there are many more vendors to benchmark on than on x86 too!

I guess whether ±1 is suitable depends on what you use it for.

As a programmer I know floating point is approximate. But I expect integer operations to be exact (though possibly rounded, usually towards zero) as long as the value didn't overflow.

To be frank, I'm not sure what I would use an integer square root for; I don't personally have a use for it in any code I write. But if we use an approximation, we should check that it won't break any expectations.

Is there, for example, any search algorithm that uses an integer square root to select an index? Could something like that break for large indices? (Probably the u32/f32 case; I don't expect anything to use more than 53 bits to represent anything in memory, and as far as I know we haven't reached that point yet.)

For u8 to u32, the result is exact from the start. For u64, it can be fully corrected by adding -1, 0, or 1 (and there may be a way to prove that only -1 and 0 are needed). For u128, I use the Karatsuba square root algorithm after using the u64 method on the most significant bits, which is proven to work.
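I don't have the Karatsuba code handy to post, but for comparison, here is a sketch of a much simpler (and slower) u128 fallback using plain Newton iteration from a power-of-two seed. To be clear, this is not the method described above, just a baseline one could check it against:

```rust
/// Sketch of a u128 integer square root via Newton's method (not the
/// Karatsuba approach described above).
fn isqrt_u128_newton(n: u128) -> u128 {
    if n < 2 {
        return n; // avoids dividing by zero below
    }
    // Seed with a power of two guaranteed to be >= sqrt(n); the iteration
    // then decreases monotonically to floor(sqrt(n)).
    let mut x = 1u128 << (128 - n.leading_zeros()).div_ceil(2);
    loop {
        let y = (x + n / x) / 2;
        if y >= x {
            return x; // converged: x == floor(sqrt(n))
        }
        x = y;
    }
}
```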


If you are taking the square root of an integer, then yes, it is suitable. Unless you mean you want the floating-point sqrt of an integer, in which case, sure, you need f80 or f128 or some kind of bigfloat.

I expect integer sqrt to always give exactly floor(sqrt(x)).

You can just test exhaustively, since there are 2^32 perfect squares representable as u64 and square root is monotonically increasing. Checking that ((x*x) as f64).sqrt() as u64 >= x for every x should be enough.
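A sketch of that exhaustive loop (asserting exact equality, which is slightly stronger than the ≥ check and matches what later posts report; it's 2^32 iterations, so expect it to take a while):

```rust
/// Sketch: enumerate every perfect square that fits in u64 and check that
/// the f64 round trip recovers its root exactly.
fn check_all_perfect_squares() {
    for x in 0u64..=(1u64 << 32) - 1 {
        let square = x * x; // (2^32 - 1)^2 still fits in u64
        assert_eq!((square as f64).sqrt() as u64, x);
    }
}
```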


I've tested it without the +1 and I'm confident that it works, at least on x86-64 Linux. My concern was that on some other target triple, isqrt might give incorrect results, and I wanted to avoid that for something in core.

I've included an #[ignore]d test in core that does what you suggest (it tests perfect squares, perfect squares minus one, and halfway between neighboring perfect squares). Do all rustc-supported platforms run all the tests, including the ignored ones, every so often, and would people notice if it failed on a platform?
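For reference, a sketch of what such a test might check; the isqrt parameter stands in for whatever core ends up exposing, and the real #[ignore]d test is presumably structured differently:

```rust
/// Sketch of the three input families described above, checked against a
/// hypothetical isqrt. Note floor(sqrt(r^2 + r)) == r since r^2 + r < (r+1)^2.
fn check_around_roots(isqrt: fn(u64) -> u64) {
    for r in 1u64..=(1u64 << 32) - 1 {
        let sq = r * r;
        assert_eq!(isqrt(sq), r); // perfect square
        assert_eq!(isqrt(sq - 1), r - 1); // one below a perfect square
        assert_eq!(isqrt(sq + r), r); // halfway to the next perfect square
    }
}
```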

Well, it turns out (v as f64).sqrt() as u64 gives exact results for all perfect squares that fit in u64: Compiler Explorer (you may have to refresh until you get an executor with AVX-512)


I think that it'll fail to satisfy isqrt(x² - 1) = x - 1 for all x > 0. Never mind, I see that this was to show that +1 isn't needed.

Proof: Let round(x) = f64-rounding of x.

Let 2^k ≤ n < 2^(k+1), 0 ≤ k < 32.

If n² can't be represented exactly, then:

round(n²) ≥ n² + 1 − 2^(2k+1−53) > n² − 2^(2k−52) + 2^(2k−106) ≥ n² − 2^(k−52)·n + 2^(2k−106) = (n − 2^(k−53))²

round(n²) ≤ n² + 2^(2k+1−53) < n² + 2^(2k−52) + 2^(2k−106) ≤ n² + 2^(k−52)·n + 2^(2k−106) = (n + 2^(k−53))²

Therefore:

n − 2^(k−53) < √(round(n²)) < n + 2^(k−53)

Since consecutive f64 values in [2^k, 2^(k+1)) are 2^(k−52) apart, anything within 2^(k−53) of n rounds to n:

round(√(round(n²))) = n

It follows from this that if you want floor(sqrt(x)) for x in u64, then compute in f64, round down to u64, and subtract 0 or 1.


This part seems to be false, with the counterexample n = 111130675.


I think the original claim is pretty subtle because it is using flooring. Rounding is more forgiving in this case. If n is an integer then n^2 might get rounded down, but the square root of n^2(1 - e) is very close to n(1 - e/2); the actual error is slightly worse, but the difference is O(e^2) and n(1 - e/2) has to round to n.

Indeed, that inequality only works for k > 26, so the case k = 26 remains. For k < 26, squares can be represented exactly.

I'm assuming n is odd (otherwise we can reduce n by a power of 2 and use a smaller k), so n > 2^26.

For k = 26:

round(n²) ≥ n² − 1 ≥ n² − 2^(−26)·(n − 1) > n² − 2^(−26)·n + 2^(−54) = n² − 2^(k−52)·n + 2^(2k−106) = (n − 2^(k−53))²

We were all talking about the same claim. Note that when I say "round(...)", I mean rounding to the nearest f64, not rounding to the nearest integer. The claim that I have proved (assuming there are no more mistakes) is that ((n * n) as f64).sqrt() == n as f64 always, exactly, whenever n*n fits in u64.

From this it follows that (x as f64).sqrt() as u64 will always be either the floor or (in rare cases) the ceiling of the exact answer and thus, if you're looking for the floor, it's enough to do one extra check at the end. One also has to be careful about possible overflow for values of x close to u64::MAX.
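Putting those two cautions together, a minimal sketch (assuming, as argued above, that the f64 result is only ever the floor or the ceiling of the true root):

```rust
/// Sketch: floor square root of a u64 via f64, with an overflow-safe fixup.
fn isqrt_floor(x: u64) -> u64 {
    let r = (x as f64).sqrt() as u64;
    // For x near u64::MAX, r can come out as 2^32, so r * r would overflow;
    // checked_mul makes the verification safe.
    match r.checked_mul(r) {
        Some(sq) if sq <= x => r, // r is already the floor
        _ => r - 1,               // r was the ceiling (or its square overflowed)
    }
}
```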

Sure, but as u64 is flooring. What I meant was that if you are doing an approximate computation with f64 that should have an integer result, you are going to get a f64 close to that integer. Rounding to the nearest integer is more forgiving than flooring since with flooring you need your error to be nonnegative. The claim is true in this case because even though the error in rounding n*n can be negative, the relative error in sqrt(nearest(n*n)) is smaller and so the nearest floating-point number is n. But if you were working with a function that mapped integers to integers but had a larger derivative you could be in trouble.
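A toy illustration of that difference, with a made-up value standing in for an sqrt() result that should be exactly 9.0:

```rust
fn main() {
    let almost_nine = 8.999_999_999_f64; // imagine sqrt() returned this
    assert_eq!(almost_nine as u64, 8); // flooring: a tiny negative error costs 1
    assert_eq!(almost_nine.round() as u64, 9); // round-to-nearest absorbs it
}
```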

I agree that inserting a "round-to-nearest-integer" operation after sqrt makes it obviously work. But it turns out to be unnecessary.

Yes, it's very subtle. Your argument isn't totally convincing, because your rounding error "e" is not a constant: it changes by a factor of 2 depending on what the number is. 1.99 has about 2× better precision than 2.01.