Wrapping_rem should allow modulo 0

I don't think it's similar at all.

u32::MAX.checked_mul(2) gives me None, because the infinite-precision result overflowed the representable values.
u32::MAX.checked_add(u32::MAX) gives me None, because the infinite-precision result overflowed the representable values.
u32::MAX.checked_shl(1) give me Some, despite having exactly the same problem.

That's weird.

(Said otherwise, it's that normally it's about the output, but for shifts it's about one of the inputs.)

I'd probably only support u32 an the RHS of shifts, TBH, just like how today checked_shl and wrapping_shl and unchecked_shl all only take u32, not anything else.

Rather like Dellvmize some intrinsics (use `u32` instead of `Self` in some integer intrinsics) by WaffleLapkin ¡ Pull Request #124003 ¡ rust-lang/rust ¡ GitHub

This, yes. Especially when things like count_ones and friends can give BITS.

(Same as how slice split_at takes 0..=n, and it would be really annoying if it only supported 0..n or 1..=n or 1..n.)

6 Likes

I had trouble convincing myself with this example. It felt kind of like saying 3u32.checked_div(2) gives Some(1). Of course!

Rather, think about 0u32.checked_shl(32). It is pretty clear that the infinite-precision result should be 0 (which obviously fits in a u32). Yet, this returns None because one of the inputs is out of range.

3 Likes

This is why the (unstable) thing that doesn't accept that is called exact_div instead of unchecked :slightly_smiling_face:

error: Undefined Behavior: exact_div: 3_u32 cannot be divided by 2_u32 without remainder
 --> src/main.rs:3:14
  |
3 |     unsafe { std::intrinsics::exact_div(3_u32, 2) };
  |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ exact_div: 3_u32 cannot be divided by 2_u32 without remainder
  |

https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=4b27e9f0aadcf256575802aaa87dc09e

(In LLVM there's shr exact and div exact for this "didn't lose any low bits" property, vs add nuw/nsw and mul nuw/nsw (and some others) for the "above the high bit" stuff.)

1 Like

:100: , the semantics of our shifts are a huge pain and (compared to the other basic arithmetic operations) take a lot more "entropy" to describe. Everything else is just "do the operation in infinite precision, then wrap to bring things back in-bounds". (This is talking about the underlying MIR operations which always wrap; the overflow check is already handled before that. But it fits the framework nicely by replacing the final wrap with "panic if out-of-bounds".) Shl/shr are the one exception where you have to know the bitwidth as part of executing the operation itself, not just for the final bringing-it-in-bounds. I also highly doubt here are many (if any) situations where "shift by n % BITS" is the semantics anyone wants, in contrast to wrapping arithmetic. This also shows up as a bunch of practical pain:

  • It took us multiple iterations to arrive at the current docs for the shift MIR ops.
  • I've rewritten the code handling shl/shr in const-eval and Miri from scratch at least twice, and I doubt it's going to get any better, but it's still by far the ugliest of the operators.
  • The code handing these in the LLVM codegen backend is quite non-trivial to keep it reasonably performant, which means it needs extensive comments and will be wrong if we ever have a u256 type (but nowadays at least there are assertions that will fire then).
2 Likes

It’s occasionally useful (as a micro optimization) to let the CPU’s instruction handle the masking of the shift distance so it matches LHS::BITS. This comes up most commonly when the shift distance is a bit field you’re extracting from an u32 or u64, as in shift-based DFAs. But this is not very common, and compilers are very good at eliding the explicit masking of the shift amount that’s needed in C and C++ to avoid UB, so I just write the masking in Rust as well for clarity.

However, I do care that the default semantics (1) are efficient when overflow checks are disabled, and (2) don’t make manipulation of bit vectors unnecessarily difficult to write correctly. The semantics where a shift is equivalent to an infinite precision multiplication by a power of two and then wrapped into the target type’s range or panics doesn’t do either. It’s more expensive to implement this kind of wrapping on most CPUs. And when viewing integers as a bag of bits, it’s pretty common to deliberately discard some of those bits after they were used. I don’t want to rule out that Rust could have trained everyone to be explicit about when they want that, but intentionally shifting out bits is much more common than intentionally having an addition or multiplication wrap around.

5 Likes

It's true that shl instructions typically wrap around at BITS. However this only applies to integers that match architectural bit width. If we get support for arbitrary width integers like i<11>, we'll have implement this wrapping ourselves.

Arm A64:

is the left shift amount, in the range 0 to 63

x86(-64):

The count is masked to 5 bits (or 6 bits with a 64-bit operand). The count range is limited to 0 to 31 (or 63 with a 64-bit operand).

There are also architectures that mask the shift amount for 32 and 64 bit integers differently (e.g., 32-bit Arm, and SIMD extensions sometimes behave differently from the scalar instructions), those don’t get the wrapping that Rust chose for free. But an explicit instruction for shamt as u32 % LHS::BITS is still cheaper than “infinite precision shift, then wrap around” because it’s a single bitwise and with an immediate.

Pondering: should be delete Shl & Shr from MIR entirely? We could pretty easily have MIR building lower << to ShlUnchecked, just like it lowers to Div that's also unchecked in MIR.

(Now that we have MIR optimizations the extra BitAnd would probably mir-opt away for constants, too.)

Note that that's already something we leave to LLVM. Shl in Rust gets lowered to and+shl in LLVM, and it's the thing that then elides the extra masking on targets like x86 that include the masking in the shift. But that also means that on targets where shifts work like unbounded_shl, us emitting the select+shl is also right, and the most optimal.

Hm, that is not the semantics I had in mind. What I had in mind was doing shifts on an unbounded-bitstring representation of the integer. (Negative integers are represented with infinitely many leading 1s, so they shift in 1s for right-shifts.) But maybe that is equivalent?

For right-shifts this becomes something like if rhs >= BITS { if lhs < 0 { -1 } else { 0 } } else { hardware_shr(lhs, rhs) }. Is your point that there's no way to compile this efficiently?

True, I don't think the "panic if wrapping fails" behavior makes much sense for shifts. They should probably never panic, i.e., behave exactly like the unbounded_* that I just learned exist.

Hm, maybe? In terms of specifying Rust this would shift the burden from the core language to the lowering function, not sure if that is a good tradeoff. But for rustc it could mean having that logic once instead of in each codegen backend, which could be nice.

2 Likes

If there were any relevant target like that, yes. Discussion on the unbounded_sh* ACP pointed out x86 SIMD and Nvidia PTX, but neither of those are particularly relevant for Rust. New scalar instruction sets (AArch64, RISC-V, LoongArch) seem to consistently pick the "shift N-bit src1 by log2(N) least significant bits of src2" option.

That's not quite equivalent, I assumed that there would a panic/wraparound toggle for this as for other arithmetic. I read too much into the earlier discussion of MIR semantics.

Yes, it's more expensive on pretty much every architecture I've looked at: Compiler Explorer

But it's also not always intended. There are a some patterns that can be sensibly extrapolated to distance = LHS::BITS (stuff like (1 << N) - 1 to construct a mask of N 1 bits, although in a lot of code N < LHS::BITS by construction) but I think most instances of distance > LHS::BITS are probably just plain bugs. It's annoying that we probably can't have one without the other. But I like the current trade-off where testing can tell you (via panics) where you need to use unbounded_sh*, as long as you have some test coverage for "shift by LHS::BITS" case, in return for flagging the definitely-bogus x << 1000.

Cranelift already defines the ishl/ishr instructions to wrap, so both rustc and Cranelift would do the masking in that case.

Do we have to be consistent? It’s an error condition anyway, and if these are new types, there is no concern of breaking existing code.

Consistency is valuable, especially if we want the existing uN and iN types to be aliases for new u<N> and i<N> with the same N. Besides, 1u8 << d shifting by d.rem_euclid(8) bits also doesn’t match any existing CPU instruction exactly, but it’s defined that way for consistency with u32 and u64 shifts.

I've wanted a bi-directional shift more often than I've wanted to wrap the shift amount. The current behavior of wrapping shifts exposes unexpected hardware details rather than doing the expected default (then again, I'd argue that signed division does the same, and that's less common viewpoint).

a >> b should be equivalent to (a / 2^b) % 2^N, with euclidean or flooring division (which doesn't matter because the denominator is always positive, and with both positive and negative values of b, or equivalently even if 2^b is a non-integer. b being unsigned is fine since it makes the negative case impossible, but if b were signed, this is the behavior I'd expect from when it's negative. (I specifically used the right shift mainly to specify the rounding behavior, which is more clear when dividing than when multiplying by a non-integer.)

It looks like this discussion of shifts spawned from a footnote in my prior comment? One that outright gave a different suggestion than what I was pointing towards in the rest of the comment. And unlike the shift behavior, the math in that division by 0 case is still as consistent with the rules as any other finite result would be.

Why would we have an operation defined as a particular form of division? That makes no sense to me, just write a division if that's what you want.

The name "shift left" / "shift right" is extremely clear; the underlying definition of this operation must be about shifting digits, or else it fails to capture the intuitive concept that led to the name. It may be a fact that that definition is equivalent to some sort of division, but that should be a theorem one proves, not the source of truth itself.

That formula is in fact equivalent to a bitshift in two's complement arithmetic, so it's just another way of saying the same thing.