A few years ago there was a discussion about the analog of C++'s std::remainder in Rust. The discussion suggested opening an issue on the Rust repo if one feels this is something the standard library ought to provide.
To summarize some arguments from the discussion:
std::remainder is one of the functions that are "required" by the IEEE 754-2008 standard.
The result of std::remainder is always exactly representable, with no rounding error, whereas rem_euclid can incur a rounding error.
A practical use case: std::remainder(x, 2 * pi) normalizes an angle to the range [-pi, pi], which is sometimes useful.
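In current Rust, which lacks a std remainder, that normalization can be approximated with rem_euclid, at the cost of the rounding error mentioned above. A minimal sketch (the function name is mine, not from the thread):

```rust
use std::f64::consts::{PI, TAU};

/// Normalize an angle to the range (-PI, PI], roughly what
/// `std::remainder(x, 2 * pi)` is used for in C++. Unlike
/// `remainder`, `rem_euclid` can introduce a rounding error.
fn normalize_angle(x: f64) -> f64 {
    let r = x.rem_euclid(TAU); // r is in [0, 2*PI)
    if r > PI { r - TAU } else { r }
}
```

The subtraction `r - TAU` is exact by the Sterbenz lemma, since PI < r < TAU implies TAU/2 <= r <= TAU, so the only rounding happens inside rem_euclid itself.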
Another example use case: The Faust language compiles a domain-specific DSP (digital signal processing) language to various target languages, including Rust. The code generation relies on the availability of the remainder function in the target language, which creates some inconvenience for the Rust code generation (currently the Faust compiler uses the workaround proposed by ExpHP, which is unfortunately platform-specific).
Any opinions on whether adding it to the standard library is feasible?
For your use case you can fall back to libm on other platforms, but if I see it correctly, that emulates floating point with integer arithmetic. Apparently there is currently no way to emit the fprem1 instruction on x86 in rustc without inline assembly.
BTW: gcc inlines the fprem1 instruction and calls the proper corner-case-handling implementation only when required, while clang (and rustc for the comparable fmod) always emit a function call: Compiler Explorer. I smell some optimization potential here.
fprem1 seems to be an x87 instruction, so I'm surprised it is being used at all on x86_64. I would expect SSE instructions to be used instead. Converting to/from x87 is expensive, as I understand it. Also, x87 uses 80-bit long doubles, so correct 32/64-bit floating point behaviour may not be guaranteed, depending on when the value is rounded down to the narrower precision.
It makes a lot of sense for this particular function. Floating point remainder is essentially doing integer division where the dividend can be very large. E.g., in f64::MAX % PI the integer quotient would be a 1023-bit integer. The FPREM1 instruction (partially) reduces the dividend in place and sets a flag for "reduction incomplete", so the function needs to repeat it until done. Accuracy is a non-issue: the remainder is exact and there is no rounding.
Doing the same on SSE is not so simple, as it doesn't have an instruction for remainder, so you'd most likely end up with something way more complex. (Although I haven't looked for alternative implementations, so I might be wrong on that.)
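For intuition, the partial-reduction loop that fprem1-style hardware performs can be mimicked in software by exact subtractions of power-of-two-scaled divisors. This is purely an illustrative sketch of mine, ignoring signs, zeros, NaNs, and infinities, not a proposed implementation:

```rust
/// Compute the truncated remainder x % y for finite, non-negative x and
/// positive y by repeatedly subtracting the largest power-of-two multiple
/// of y that still fits. Doubling only bumps the exponent and the final
/// subtraction is exact (Sterbenz lemma), so the result is the exact
/// remainder -- but the iteration count grows with the exponent
/// difference, mirroring the hardware's "reduction incomplete" retries.
fn fmod_by_subtraction(mut x: f64, y: f64) -> f64 {
    assert!(x.is_finite() && x >= 0.0 && y > 0.0);
    while x >= y {
        let mut t = y;
        // Scale t up by powers of two until t <= x < 2*t.
        while t <= x / 2.0 {
            t *= 2.0;
        }
        // Since t <= x < 2*t, x - t is exact by the Sterbenz lemma.
        x -= t;
    }
    x
}
```

Each outer iteration effectively determines one bit of the (potentially 1000-bit) integer quotient, which is why the cost scales with the exponent difference rather than being constant.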
The running example is bad, because π is an irrational number; you cannot make the error of your rational approximation, ε = |π − pi|, be zero, which means std::remainder(x, 2 * pi) will be garbage for sufficiently large x. For IEEE double, the difference between pi and π is on the order of 2 × 10^−16, so if you're doing trigonometry, std::remainder(x, 2 * pi) gives you a reduced angle that is off by more than 1° when x > 7 × 10^13. Turning it around, to not get garbage from cos(x) for any real number x representable by IEEE double requires something like a 4800-bit approximation to π in the worst case (if I've understood the poorly documented guts of GNU libc's math library correctly).
Only 1000 bits or so is required. libc stores more to support long double.
Reducing mod 2π with good rounding can be done in constant time by storing the constant 1/(2π) to large precision (it turns out you only need a short sub-slice of those bits). Whereas std::remainder not only doesn't work properly for large numbers modulo 2π, it also requires a slow loop.
Right, it was a bad choice of example. But even then, std::remainder(x, 2 * pi) will be the exact remainder of x by the specific rational number 2 * pi (not to be confused with the irrational 2π). I should have used f64::MAX % 1234567887654321.0 instead.
The point was that while * or / might take the CPU something like 3 or 15 cycles respectively, the cost of evaluating f64 % f64 grows with the difference in exponents, up to something like 2000 cycles using the x87 fprem instruction, or 21000 cycles with glibc's fmod, which Rust's % ends up calling (admittedly it is a rather old version on my system). Running the same benchmark on Compiler Explorer, I measure 700 ns and 6000 ns respectively.
I stand corrected. Still, I think my overall point is not affected by the exact number here. I feel like std::remainder is potentially an API that encourages people to write incorrect code when irrational numbers are involved.
You really shouldn't be using x87 instructions for your tests. I doubt Intel or AMD bothers to squeeze cycles out of those anymore, and there's also massive overhead involved in using them at all (the OS doesn't allocate context-switch state for them until they're used, that chunk of the core is powered down most of the time, etc.).
I'm not a chip designer, so I could be wrong here, and I know division is intrinsically harder than multiplication, but I also know an integer divide circuit can spit out both quotient and remainder simultaneously, and I don't see any reason why floating point divide should be different.
Much like multiplication, a floating point quotient (N * 2^k) / (M * 2^h) is just the quotient of the significands with the exponent adjusted: (N / M) * 2^(k-h). That is why the cost of floating point division is in the same ballpark as integer division of comparable size.
On the other hand, the remainder does not decompose the same way: (N * 2^k) % (M * 2^h) becomes ((N * 2^(k-h)) % M) * 2^h, where the integer remainder has the dividend shifted left by k-h bits. That is, near the worst case, f64 % f64 is kind of like u2000 % u54.
The tail of 0-bits does allow an asymptotically faster algorithm, since it's just modular exponentiation: you can compute (2^n) % M in O(log(n)) steps, and in a quick test that is another order of magnitude faster for my previous benchmark. Then again, that benchmark was just measuring the worst case, which is rather unrealistic; typically the difference in exponents shouldn't be large.
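That O(log n) idea can be sketched directly: decompose both operands into their 53-bit significands and exponents, reduce the 2^(k-h) factor with modular exponentiation, and scale the integer remainder back. This is my own illustrative sketch for finite, positive, normal inputs with x >= y, not production code:

```rust
/// (b^e) mod m by square-and-multiply; m < 2^53, so products fit in u128.
fn pow_mod(mut b: u128, mut e: u32, m: u128) -> u128 {
    let mut acc = 1u128 % m;
    b %= m;
    while e > 0 {
        if e & 1 == 1 {
            acc = acc * b % m;
        }
        b = b * b % m;
        e >>= 1;
    }
    acc
}

/// Split a positive *normal* f64 into (significand, exponent) with
/// value == significand * 2^exponent and 2^52 <= significand < 2^53.
fn decompose(f: f64) -> (u64, i32) {
    let bits = f.to_bits();
    let biased = ((bits >> 52) & 0x7ff) as i32;
    let frac = bits & ((1u64 << 52) - 1);
    (frac | (1u64 << 52), biased - 1075)
}

/// x % y computed as ((N * 2^(k-h)) mod M) * 2^h, with the power-of-two
/// factor reduced in O(log(k-h)) multiplications instead of a long loop.
fn fmod_via_pow_mod(x: f64, y: f64) -> f64 {
    assert!(x.is_finite() && x >= y && y >= f64::MIN_POSITIVE);
    let (n, k) = decompose(x);
    let (m, h) = decompose(y);
    // x >= y with both inputs normal implies k >= h.
    let shift = (k - h) as u32;
    let m = m as u128;
    let r = (n as u128 % m) * pow_mod(2, shift, m) % m;
    // r < 2^53, so the conversion is exact; scale by 2^h in exact
    // power-of-two steps to stay within f64 range.
    let mut out = r as f64;
    let mut e = h;
    while e > 0 { let s = e.min(512); out *= 2f64.powi(s); e -= s; }
    while e < 0 { let s = (-e).min(512); out /= 2f64.powi(s); e += s; }
    out
}
```

pow_mod is ordinary square-and-multiply, so even the f64::MAX case needs only about a thousand modular multiplications' worth of work in the exponent reduction, instead of an exponent-difference-length subtraction loop.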