Signed difference of unsigned integers

Zobean · November 16, 2023, 5:44am

I'd like to be able to represent a signed difference of two unsigned integers (correctly handling overflowing, wrapping, checked and saturating variants).
There's quite a few possible operations on mixed-sign integers involving addition and subtraction, and only a few are missing:

Additions:

uX + uX = uX: uX::*_add
uX + uX = iX: seems trivial and not especially useful
uX + iX = uX: uX::*_add_signed
uX + iX = iX: iX::*_add_unsigned
iX + uX = uX: uX::*_add_signed
iX + uX = iX: iX::*_add_unsigned
iX + iX = uX: seems trivial and not especially useful
iX + iX = iX: iX:*_add

Subtractions:

uX - uX = uX: uX::*_sub
uX - uX = iX: hard to implement and useful to have
uX - iX = uX: could be implemented in terms of uX::*_add_signed, though iX::MIN would need special handling
uX - iX = iX: could be implemented in terms of iX::*_add_unsigned, though iX::MIN would need special handling
iX - uX = uX: seems trivial and not especially useful
iX - uX = iX: iX::*_sub_unsigned
iX - iX = uX: seems trivial and not especially useful
iX - iX = iX: iX::*_sub

I propose adding the following family of methods on unsigned integers (names updated based on @quaternic's suggestion):

impl uX {
	checked_signed_sub(self, uX) -> Option<iX>
	overflowing_signed_sub(self, uX) -> (iX, bool)
	saturating_signed_sub(self, uX) -> iX
	wrapping_signed_sub(self, uX) -> iX
}

This fills what I see as the one remaining useful and non-trivial mixed-sign addition/subtraction operation on integers.

I made a related crosspost on stackoverflow, with answers covering several possible implementations: rust signed difference of unsigned integers - Stack Overflow

Along similar lines as Representing difference between unsigned integers

quaternic · November 16, 2023, 1:31pm

I agree this operation would be useful, with subtraction being the natural source of signed numbers in the first place. For naming, I might prefer an ordering like checked_signed_sub, since keeps the fundamental operation "signed_sub" together, and all the "checked_" operations share the prefix.

See also this recent issue requesting the same: Add method for computing the difference of two unsigned integers in a signed · Issue #117476 · rust-lang/rust · GitHub

pitaj · November 16, 2023, 3:58pm

I think *_sub_signed would be useful as well. BTW, you should open an ACP for this on the libs-team repo.

uX - uX = iX: hard to implement and useful to have

Isn't this just wrapping_sub(..) as iX?

quaternic · November 17, 2023, 12:01am

Assuming you meant wrapping_sub(..) as iX, the wrapping result is indeed the same, and that is an argument for not having that variant.

However, for the variants that care about overflow, there is a significant difference in where the overflows happen. For example with u8 - u8 = i8:

signed_sub:
200 - 0 = 200  (no unsigned overflow, signed saturates to 127)
100 - 0 = 100  (no overflow)
0 - 100 = -100  (unsigned saturates to 0, no signed overflow)
0 - 200 = -200  (unsigned saturates to 0, signed saturates to -128)

My suggested implementation is:

// Map 0..=u64::MAX to i64::MIN..i64::MAX,
// preserving their difference
let a = (i64::MIN).wrapping_add_unsigned(a);
let b = (i64::MIN).wrapping_add_unsigned(b);

a - b   // or checked/saturating/overflowing respectively

That optimizes well since the compiler can replace those adds with just toggling the sign bit, and the generated code is the same as for:

let a = a as i64 ^ (1 << 63);
let b = b as i64 ^ (1 << 63);
a - b

Nokel81 · November 17, 2023, 1:23am

iX - uX = uX:

Shouldn't this be iX?

Zobean · November 17, 2023, 1:41am

Not sure what you mean, I'm pretty sure I've got all the permutations in the list

Nokel81 · November 17, 2023, 12:50pm

I mean that in general a signed integer minus an unsigned integer has to produce a signed integer.

Example: -5 - 10 would produce -15

quaternic · November 17, 2023, 2:03pm

The meaning seems clear when you consider the context as part of an exhaustive list of the possible combinations, including the assessment for that case:

iX - iX = uX: seems trivial and not especially useful

For one example where you might want that, the length of a range of signed integers is the unsigned difference end - start. (See steps_between in range.rs - source). A hypothetical checked_unsigned_sub(a: iX, b: iX) -> uX would be

if a >= b {
    Some(a.wrapping_sub(b) as uX)
} else {
    None
}

That's not too difficult, but also not obvious that you do need the wrapping_sub in there. To be clear, I'm not proposing that function for std, but this is making me wish for a more generic solution instead of having a ton of inherent methods with different names.

jcranmer · November 18, 2023, 9:19pm

To be clear, I'm not proposing that function for std, but this is making me wish for a more generic solution instead of having a ton of inherent methods with different names.

Since you've mentioned this, here is a brain dump of some off-the-cuff thinking here.

The basic primitive here to think about is a generalized arithmetic operation, which takes in potentially multiple arithmetic types (including floating-point, but I won't discuss further except via footnotes), and returns Option<(V, bool)>. The types used in the signature need not be homogeneous, and of course the heterogeneous mixing of signed and unsigned is what's being discussed in this thread. The result is None if the operation in some sense isn't properly defined [1]--division by zero is the classic example here, but after some thought, I think out-of-bounds shift values also feel like they're returning None in this framework rather than being classified as overflow. Otherwise, the result is (value, false) if the operation did not overflow, or (wrapped value, true) if it did overflow [2].

Unary generalized arithmetic operations are conversions (which are inherently heterogeneous already), and negation. Binary generalized arithmetic operations are our standard friends +, -, *, /, %, <<, and >>. Ternary operations are fma with various sign permutations. Most of these operations don't have to worry about invalidity--only shifts, division, and its remainder/modulo friend, although for where invalidity matters, overflow tends not to be a concern.

Focusing then on +, -, and *, it's worth noting that hardware primitives tend to only give us op(T, T) -> (T, bool) primitive, so we have to construct heterogeneous operations on top of that. If sign is homogeneous, then the basic implementation is easy:

fn heterogenous(a: T, b: U) -> (V, bool) {
  type C = max_size<T, U, V>;
  let (v: C, o1) = op(a as C, b as C);
  let (v2, o2) = v.checked_convert<V>();
  (v2, o1 | o2)
}

If the signs are heterogeneous and the sizes are homogeneous, well, we enter the question posed by this thread. I think for the case where signs and sizes are both heterogeneous, I think a variant of the above method (zext/sext as appropriate for sign) would work, but I would not want to commit to it without testing it to make sure.

After working all of this out, though, I think a usable generic solution isn't immediately at hand. One of the troubles is that there's no clean way to specify the type of the return value. The only interface I could think of that might come close to usable is a overflows!((a + b) as u32) macro that is able to replace every operator with a call to a potentially heterogeneous generalized arithmetic operation, but that's not an easy macro to write.

[1] Incidentally, the floating-point INVALID exception is what motivated to add in the Option to overflow result. Of course, floating-point exceptions are even more multifaceted than two boolean flags (there's exactness as well to consider), so maybe this isn't the best framework to approach the problem, although I personally struggle to find reasons to care about the INEXACT exception.

[2] Maybe more generally, it's "somewhat sane representation". For FP, the result is saturation, not wrapping, on overflow, but I digress.

T4r4sB · November 20, 2023, 9:32pm

It's just a wrapping_sub.
The problem is to detect overflow with less possible count of branches.
I can do it only using two branches...

if a>=b {
  let result =a.wrapping_sub(b);
  if result  > isize::MAX as usize {
    // overflow
  }
  result as isize
} else {
  let result = b.wrapping_sub(a) - 1;
  if result  > isize::MAX as usize {
    // underflow
  }
  -result as isize - 1
}

Zobean · November 21, 2023, 12:14pm

Yes, the integer result is trivial to obtain, but the overflow handling is less trivial and is the main point of my post. It's been commented on enough that I've made an edit to make it a little clearer upfront.

user16251 · November 23, 2023, 2:51am

It seems like this would be completely covered by stabilizing carrying_add and borrowing_sub. For every possible operation you can think of you'd sign-extend the inputs appropriately (i.e. 0 for u64 and x >> 63 for i64), do the two-limb computation, and then check if sign-extending the result gives you the high word you computed. (In fact this is only necessary for 128-bit computations; everything else would be covered by converting into and out of i128).

Zobean · November 25, 2023, 12:41am

I don't know enough to say one way or another, but my gut feel is that conversion to/from i128/u128 would have an additional runtime overhead compared to just doing the operations on the appropriate-width integers directly. Is that not the case?

user16251 · November 25, 2023, 1:25am

All that really matters is how the code is compiled and LLVM is typically really good at optimizing this kind of code.

user16251 · November 25, 2023, 4:51am

Or maybe not: I was surprised to see how much the assembly diverged for

pub fn v1(x: u64, y: u64) -> bool {
    i64::try_from(i128::from(x) - i128::from(y)).is_ok()
}

pub fn v2(x: u64, y: u64) -> bool {
    const MSB: u64 = 0x8000_0000_0000_0000;
    ((x ^ MSB) as i64).checked_sub((y ^ MSB) as i64).is_some()
}

pub fn v3(x: u64, y: u64) -> bool {
    let (z, borrow) = x.overflowing_sub(y);
    let w = -(borrow as i64);
    ((z as i64) >> 63) == w
}

pub fn v4(x: u64, y: u64) -> bool {
    let z = x.wrapping_sub(y) as i64;
    (x < y) ^ (z >= 0)
}

T4r4sB · November 27, 2023, 8:06pm

That's awesome variant, why couldn't I think of that!

user16251 · November 28, 2023, 2:03am

@quaternic posted it earlier in this thread. I originally learned this technique from chapter 2 of Hacker's Delight by Henry Warren.

I think that detecting overflow for any combination of types in this post is going to come down to some logical function of the carry into bit 63 and the most significant bits of the operands. (The carry out is the majority vote of these three, but it's usually easier to get with add-with-carry than explicitly computing the ternary logic function.) On an architecture such as RISC-V that seems hostile to multiprecision arithmetic, computing the ternary function might be the easiest way to find the result.

system · February 26, 2024, 2:04am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Representing difference between unsigned integers libs	12	5812	March 15, 2021
Unsigned integer type inference language design	24	4429	March 25, 2019
Comparisons between signed and unsigned integers?	6	1676	April 1, 2023
Heterogeneous math language design	14	2381	March 25, 2019
Revisiting the unsigned ideas (deprecated)	5	2063	March 25, 2019

Signed difference of unsigned integers

Related topics