Possible u128::div_mod_64 function


#1

Rust std lib has some handy functions that often map to single CPU instructions, like u64::count_ones, leading_zeros, rotate_left, etc. Some CPUs also have a divq operation that divides a 128 bit number by a 64 bit number, and returns a pair of 64 bit quotient and remainder. I think currently (unlike mulq and addq [1]) it’s not easy to access this specific instruction from Rust code (unless you use asm). So is it a good idea to somehow add such function to rust std lib?

Instead of adding a special function the usual alternative solution is to improve the compiler so it’s able to spot code like:

let (q, r) = ((a / u128::from(b)) as u64,
              (a % u128::from(b)) as u64);

This seems even harder than spotting cases where the programmer meant to perform a rotate_left and rotate_right (as D compiler sometimes fails to do).

A possible name:

let a: u128 = ...;
let b: u64 = ...;
let (q, r): (u64, u64) = a.div_mod_64(b);

[1] Example:

fn foo(x: u64, y: u64) -> u64 {
    ((u128::from(x) * u128::from(y)) >> 64) as u64
}

Gives:

foo:
        mov     rax, rsi
        mul     rdi
        mov     rax, rdx
        ret

#2

One challenge is overflow – x86 DIV raises a divide-error CPU exception if the quotient is too large.


#3

How much large is that “too large”?


#4

For u128/u64, the quotient has to fit in a u64.


#5

Maybe we should make div_mod generic so that A::div_mod(A, B) -> (A, B). That way if I divide by (eg.) a u8 I know, at the type level, that the modulo will be a u8.


#6

The other pre-condition of div_mod_64 is b != 0. It’s the same with the other operations, like mul, its inputs should not overflow the bit length of the given integers. There are un-proofed pre-conditions for most rust operators. The situation with div_mod_64 doesn’t look much different.

Regarding the idea of a more general cross-type div_mod, it could help removing some explicit casting from Rust code, like here:

fn digits_sum(mut n: u128) -> u32 {
    let mut total = 0;
    while n != 0 {
        let (n2, r) = n.div_mod(10_u32);
        total += r;
        n = n2;
    }
    total
}

#7

Because division by zero is UB / traps, we insert a runtime check for zero. We’d have to do the same for this precondition, but is there a way to do that without making check+divq more expensive than computing the divmod the way it’s currently done?


#8

Can we use an approximation? Rather than doing a division for the check, we could check the bit counts.


#9

See also: https://crates.io/crates/specialized-div-rem