Oh! Thanks for pointing that out.

Though that gives me an interesting idea: `f80` has a 64-bit mantissa, so it might work for `u64::isqrt`

...


the Linux kernel disables SSE and AVX:

Except that, on MSVC, the precision is set to 53 bits, not full precision.

Be careful about applying the pigeonhole principle here. While `f64` can't represent all 64-bit integers, it *can* represent all 64-bit square roots. `(my_u64 as f64).sqrt() as u64` might not always be exactly the number you want, but it should be very close (possibly always ±1), and it's an extremely good guess for the first iteration of another algorithm.
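As an illustration of that last point, here's a sketch that uses the f64 estimate as the starting guess for one Newton step, then corrects to the exact floor. The function name and structure are my own for illustration, not anything in `core`:

```rust
/// Sketch: f64 estimate as a first guess, one Newton step, exact fix-up.
/// Illustrative only; not the std implementation.
fn isqrt_guess(n: u64) -> u64 {
    if n == 0 {
        return 0;
    }
    // Initial guess from f64, typically within +/-1 of the true root.
    // Clamp so the Newton step can't divide by zero or start above 2^32 - 1.
    let mut x = ((n as f64).sqrt() as u64).clamp(1, u32::MAX as u64);
    // One Newton step: x' = (x + n/x) / 2.
    x = (x + n / x) / 2;
    // Correct to the exact floor; checked_mul avoids overflow near u64::MAX,
    // where squaring the candidate would wrap.
    while x.checked_mul(x).map_or(true, |sq| sq > n) {
        x -= 1;
    }
    while (x + 1).checked_mul(x + 1).map_or(false, |sq| sq <= n) {
        x += 1;
    }
    x
}
```

The correction loops run at most a step or two, since the guess is already so close.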

Pretty sure you shouldn't use x87 these days; CPUs aren't optimised for it. But I'm not a compiler writer, so take what I say with a grain of salt.

I think some CPUs had a massive setup cost when switching between x87 and vector instructions as well, though that might have been with MMX (where the register backing storage overlapped, IIRC).

Also, so far we have only talked about x86. What about ARM (where you would presumably use VFP or NEON, I think?), ARM64, PPC, or RISC-V?

Presumably an analysis and benchmark would be needed for each of the tier 1/2 targets, at least? Especially messy on ARM with their hf variants etc. And there are so many more vendors to benchmark on than on x86, too!

I guess it depends on what you use it for whether ±1 is suitable.

As a programmer I know floating point is approximate. But I expect integer operations to be exact (though possibly rounded, usually towards zero) as long as the value didn't overflow.

To be frank, I'm not sure what I would use *integer* square root for, I don't personally have a use for it in any code I write. But if we use an approximation we should check that won't break any expectations.

Is there, for example, any search algorithm that uses an integer square root to select an index? Could something like that break for large indices? (Probably the u32/f32 case; I don't expect anything to use more than 53 bits to represent anything in memory, as far as I know we haven't reached that point yet.)

For `u8` to `u32`, the result is exact from the start. For `u64`, it can be fully corrected by adding -1, 0, or 1 (and there may be a way to prove that only -1 and 0 are needed). For `u128`, I use the Karatsuba square root algorithm after using the `u64` method on the most significant bits, which is proven to work.


If you are taking the square root of an integer, yes it is suitable. Unless you mean you want the floating point sqrt of an integer, in which case sure you need f80 or f128 or some kind of bigfloat.

I expect integer sqrt to always give exactly `floor(sqrt(x))`.

You can just check exhaustively, since there are `2^32` perfect squares representable as `u64` and square root is monotonically increasing. If `((x*x) as f64).sqrt() as u64 >= x` holds for each of them, that should be enough.
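A sketch of that exhaustive check (restricted to a small subrange here so it runs quickly; the real sweep would cover every `x` up to `u32::MAX`):

```rust
// Verify the f64 round trip never undershoots at perfect squares.
// Monotonicity of sqrt then extends the guarantee between squares.
fn estimate(v: u64) -> u64 {
    (v as f64).sqrt() as u64
}

fn main() {
    // Illustrative subrange; the full check would be 0..=u32::MAX as u64.
    for x in 0u64..1_000_000 {
        assert!(estimate(x * x) >= x);
    }
    println!("all checked squares ok");
}
```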


I've tested it without +1 and I'm confident that it works on x86-64 Linux or so. My concern was that on a different target triple or whatever, `isqrt` might give incorrect results, and I wanted to avoid that for something in `core`.

I've included an `#[ignore]`d test in `core` that does what you suggest (it tests perfect squares, perfect squares minus one, and halfway between neighboring perfect squares). Do all `rustc`-supported platforms run all the tests, including the ignored ones, every so often, and would people notice if it failed on a platform?

Well, turns out `(v as f64).sqrt() as u64` gives exact results for all perfect squares that fit in `u64`:
Compiler Explorer (may have to refresh until you get an executor with avx512)


~~I think that it'll fail to satisfy isqrt(x² - 1) = x - 1 for all x > 0.~~ Never mind, I see that this was to show that +1 isn't needed.

Proof: Let round(x) = f64-rounding of x.

Let 2^{k} ≤ n < 2^{k+1}, 0 ≤ k < 32.

If n^{2} can't be represented exactly, then:

round(n^{2}) ≥ n^{2} + 1 - 2^{2k+1-53} > n^{2} - 2^{2k-52} + 2^{2k-106} ≥ n^{2} - 2^{k-52}n + 2^{2k-106} = (n - 2^{k-53})^{2}

round(n^{2}) ≤ n^{2} + 2^{2k+1-53} < n^{2} + 2^{2k-52} + 2^{2k-106} ≤ n^{2} + 2^{k-52}n + 2^{2k-106} = (n + 2^{k-53})^{2}

n - 2^{k-53} < √(round(n^{2})) < n + 2^{k-53}

round(√(round(n^{2}))) = n

It follows from this that if you want floor(sqrt(x)) for x in u64, you can compute the square root in f64, round down to u64, and subtract either 0 or 1.
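In code, that recipe might look like this. A sketch relying on the property just proved; `floor_sqrt` is my name for it, not std's:

```rust
/// Floor square root via f64, per the argument above: the f64 estimate
/// is either the floor or, in rare cases, the floor plus one.
fn floor_sqrt(x: u64) -> u64 {
    let r = (x as f64).sqrt() as u64;
    // checked_mul guards the overflow case near u64::MAX, where the
    // estimate can be 2^32 and r * r would wrap.
    if r.checked_mul(r).map_or(true, |sq| sq > x) {
        r - 1
    } else {
        r
    }
}
```

One comparison and at most one subtraction, with no correction loop, which is exactly the "one extra check at the end" discussed below.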


I think the original claim is pretty subtle because it is using flooring. Rounding is more forgiving in this case. If `n` is an integer then `n^2` might get rounded down, but the square root of `n^2(1 - e)` is very close to `n(1 - e/2)`; the actual error is slightly worse, but the difference is `O(e^2)` and `n(1 - e/2)` has to round to `n`.

Indeed, that inequality only works for k > 26. So the case k=26 remains. For k<26 squares can be represented exactly.

I'm assuming n is odd (otherwise we can reduce n by a power of 2 and use smaller k), so n > 2^{k}.

For k = 26:

round(n^{2}) ≥ n^{2} - 1 ≥ n^{2} - 2^{-26}(n-1) > n^{2} - 2^{-26}n + 2^{-54} = n^{2} - 2^{k-52}n + 2^{2k-106} = (n - 2^{k-53})^{2}

user16251:

I think the original claim is pretty subtle because it is using flooring. Rounding is more forgiving in this case.

We were all talking about the same claim. Note that when I say "round(...)", I mean rounding to the nearest `f64`, not rounding to the nearest integer. The claim that I have proved (assuming there are no more mistakes) is that `((n * n) as f64).sqrt() == n as f64` always, exactly, whenever `n*n` fits in `u64`.

From this it follows that `(x as f64).sqrt() as u64` will always be either the floor or (in rare cases) the ceiling of the exact answer, and thus, if you're looking for the floor, it's enough to do one extra check at the end. One also has to be careful about possible overflow for `x`s close to `u64::MAX`.

tczajka:

Note that when I say "round(...)", I mean rounding to the nearest `f64`, not rounding to the nearest integer.

Sure, but `as u64` is flooring. What I meant was that if you are doing an approximate computation with `f64` that should have an integer result, you are going to get an `f64` close to that integer. Rounding to the nearest integer is more forgiving than flooring, since with flooring you need your error to be nonnegative. The claim is true in this case because even though the error in rounding `n*n` can be negative, the relative error in `sqrt(nearest(n*n))` is smaller and so the nearest floating-point number is `n`. But if you were working with a function that mapped integers to integers but had a larger derivative, you could be in trouble.

I agree that inserting a "round-to-nearest integer" operation after sqrt makes it obviously work. But turns out to be unnecessary.

Yes it's very subtle. Your argument:

user16251:

If `n` is an integer then `n^2` might get rounded down, but the square root of `n^2(1 - e)` is very close to `n(1 - e/2)`; the actual error is slightly worse, but the difference is `O(e^2)` and `n(1 - e/2)` has to round to `n`.

isn't totally convincing, because your rounding error "e" is not a constant: it changes by a factor of 2 depending on what the number is. 1.99 has about 2x better relative precision than 2.01.