Currently for signed integers, MIN % -1 panics. This is, as far as I know, because LLVM specifies that that operation is undefined behaviour, probably because the division itself would overflow.
However, for remainder the answer would be zero. Since the Rust core library does the check anyway in order to panic instead of falling through to UB, would it make sense for the remainder operation to return 0 instead of panicking?
It would probably even result in better generated code, as two comparisons (lhs compared to MIN, rhs compared to -1) would be replaced by a single comparison (just rhs compared to -1).
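To make the shape of the two checks concrete, here is a rough sketch of the current panicking behaviour versus the proposed one for i32. The function names are made up and this is not the actual core implementation; the raw division is modelled with wrapping_rem.

```rust
// Hypothetical sketch, not the actual core implementation.

// Roughly the current behaviour: two checks before the hardware division.
fn rem_current(lhs: i32, rhs: i32) -> i32 {
    if rhs == 0 {
        panic!("attempt to calculate the remainder with a divisor of zero");
    }
    if lhs == i32::MIN && rhs == -1 {
        panic!("attempt to calculate the remainder with overflow");
    }
    lhs.wrapping_rem(rhs) // stands in for the raw division instruction
}

// The proposal: a single comparison on rhs; MIN % -1 becomes 0.
// Note that x % -1 == 0 for every x, so returning 0 whenever rhs == -1 is
// correct and also skips the division entirely in that case.
fn rem_proposed(lhs: i32, rhs: i32) -> i32 {
    if rhs == -1 {
        return 0;
    }
    lhs.wrapping_rem(rhs) // still panics for rhs == 0, as today
}
```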
That sounds potentially reasonable as a result. However, defining that result wouldn't change the result of unchecked_rem, which would still invoke UB in that case. And I don't think we want to add a conditional to unchecked_rem.
I've got mixed feelings about unchecked_rem. On the one hand, I want something that I know will map to a single intrinsic or instruction. On the other hand, MIN % -1 isn't inherently ill-formed, unlike X % 0, so making unchecked_rem UB on this technically well-formed computation is also undesirable (in some scenarios, at least).
FWIW, IIUC the x86 idivl instruction that's used for unchecked_rem computes the quotient and remainder together, and it raises a hardware divide-error exception when the quotient overflows (as it does for MIN / -1), so "just make it defined" sadly isn't an option.
It's worth noting that unchecked_rem is still just an intrinsic (unlike e.g. unchecked_add which is available under a feature flag). I don't think it's too much to just say that this case is UB for unchecked_rem but produces a defined value in the normal version. (We already act similarly for e.g. as casting with floats.)
I think it's very important that checked_foo, overflowing_foo, and unchecked_foo share the same "this is what I'm checking or not checking".
While certainly we could say "well you should have read the safety comment better", I don't think I could really blame someone who said "well I know what checked_rem does, and I've proved those cases can't happen here, so I'm clear for this use of unchecked_rem".
(And note that someone who wants i32::MIN % -1 to be 0 can just use wrapping_rem, and it might be good to make that explicit in the code anyway.)
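For reference, if I'm reading the docs right, this is how the existing (stable) rem family treats this case:

```rust
fn main() {
    assert_eq!(i32::MIN.wrapping_rem(-1), 0);             // defined to be 0
    assert_eq!(i32::MIN.checked_rem(-1), None);           // treated as overflow
    assert_eq!(i32::MIN.overflowing_rem(-1), (0, true));  // 0, with the overflow flag set
    // i32::MIN % -1 itself currently panics.
}
```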
I would prefer if unchecked methods were not in the same structure as {checked,wrapping,saturating,overflowing}. In fact, I would prefer if, for example, the experimental unchecked_add were renamed to add_unchecked, since _unchecked usually goes at the end of method names, as in f32::to_int_unchecked and slice::get_unchecked.
In the case of this thread, the method would be named unsafe fn rem_unchecked, and that would be a clear indication that it is not in the same group as fn checked_rem and fn wrapping_rem.
With checked, saturating, etc., you get a different result for the overflowing cases, and that result is not directly related to implementation details. With unchecked it's different: it is more efficient precisely because the raw, unchecked implementation is exposed, and the overflowing cases are simply disallowed.
No, my point was that float as iN gives defined results now (the cast saturates), but we offer an unsafe unchecked cast which makes the same inputs UB.
The point of this comparison is that it counters the argument that MIN % -1 shouldn't be allowed to give a defined result normally while being UB in the unchecked version: the same "defined (non-panicking) result normally, UB when unchecked" semantics already exist in the language for casts from floating point to integers.
I suppose there still is a difference in that I'm pretty sure we don't have a checked analogue for the float cast.
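To spell out the float-cast comparison (this is just my understanding of the current semantics, using to_int_unchecked as the unchecked counterpart):

```rust
fn main() {
    let big = 1e10_f32;
    // `as` casts from float to int are defined and saturate:
    assert_eq!(big as i32, i32::MAX);
    // The unchecked counterpart makes the same input UB instead:
    // let x: i32 = unsafe { big.to_int_unchecked() }; // UB for this value
}
```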
One inconsistency my original proposal would cause is that MIN.overflowing_rem(-1) == (0, true), that is, in at least one stable API, MIN % -1 is actually specified as overflow.
Another alternative would be for MIN % -1 to return 0 when debug assertions are disabled and to panic when debug assertions are enabled. That would kinda make remainder more consistent with most other arithmetic operations (all except division I think).
The advantage would be to save one comparison in release mode by wrapping, while still panicking in debug mode, so that it is still consistent with overflowing_rem and all.
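A rough sketch of what that split could look like; the helper name is made up and this only illustrates the proposed semantics, not how core would implement it:

```rust
// Hypothetical illustration of the proposed debug/release split.
fn rem_split(lhs: i32, rhs: i32) -> i32 {
    if cfg!(debug_assertions) {
        // Debug: panic on MIN % -1 (and on a zero divisor), like the other
        // arithmetic operators do for overflow.
        lhs.checked_rem(rhs)
            .expect("attempt to calculate the remainder with overflow or a zero divisor")
    } else {
        // Release: wrap, so MIN % -1 == 0 and only the zero-divisor check remains.
        lhs.wrapping_rem(rhs)
    }
}
```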
The main disadvantage I can think of is that it would be a little less consistent with division itself, where MIN / -1 does not wrap in release mode. However, division could be made to wrap and return MIN in release mode as well, which would actually be slightly more efficient by my reading of the generated code: on the likely path, wrapping results in the following instructions:
cmp edi, MIN
jne (taken)
test esi, esi
je (not taken)
mov eax, edi
cdq
idiv esi
ret
while for the current division, the likely path would require an extra push and pop.
push rax ;extra instruction
test esi, esi
je (not taken)
cmp edi, MIN
jne (taken)
mov eax, edi
cdq
idiv esi
pop rcx ;extra instruction
ret
This is one of multiple examples of CPU-specific arithmetic quirks leaking into the language definition. In the long term this will probably cause more harm than good: future CPUs may no longer have these quirks, but Rust will be stuck with them. Actually, I think they already cause more harm than good, because programmers have to work around those quirks, sacrificing both usability and performance.
Other examples:
(-3) % 10 == -3. I don't know of any practical algorithm where -3 is the useful answer, rather than 7. There is rem_euclid, but it's not the "default" operation.
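Concretely:

```rust
fn main() {
    assert_eq!((-3_i32) % 10, -3);           // truncating remainder (the % operator)
    assert_eq!((-3_i32).rem_euclid(10), 7);  // the Euclidean remainder
}
```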
On the performance point: one option is to check for divisor == -1 first and for dividend == MIN only in that case (the current implementation is the other way around). This is just as cheap, but it allows skipping the actual division in all of those cases, since a / -1 == -a.
Furthermore, this allows efficiently combining this branch with the one needed for divisor == 0 into the single comparison divisor.wrapping_add(1) as uN < 2. Using this, the common case would only have one branch.
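Here is a sketch of that check order for the remainder on i32. This is only illustrative: the function name is made up, and the final division is modelled with wrapping_rem, whereas a real implementation would use the raw unchecked division so that the common case truly has only the one branch.

```rust
fn rem_one_branch(lhs: i32, rhs: i32) -> i32 {
    // rhs.wrapping_add(1) as u32 is 1 for rhs == 0 and 0 for rhs == -1,
    // so a single unsigned comparison covers both special cases.
    if (rhs.wrapping_add(1) as u32) < 2 {
        if rhs == 0 {
            panic!("attempt to calculate the remainder with a divisor of zero");
        }
        // rhs == -1: x % -1 == 0 for every x, so the division can be skipped.
        return 0;
    }
    // rhs is neither 0 nor -1 here, so the hardware division cannot trap.
    lhs.wrapping_rem(rhs)
}
```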
I did a bit of ad hoc benchmarking, using as a workload essentially sum += a[i] % b[i] in a loop, with the inputs coming from shuffled Vecs.
Results show some potential: a 10% improvement when the special cases don't happen. I would expect the worst case for this change to be additional branch misses for the x / -1 cases, and indeed, if those are unpredictable and very frequent (~20% or more), performance does drop notably. On the other hand, the current implementation suffers similarly with frequent MIN / y, though that's arguably a weird case to see.