Which is what I've been arguing for all along, with larger floats as a fallback.
The ideal would be hardware-accelerated, arbitrary-precision decimals.
It was also designed with the relatively limited knowledge of the time. So unless hardware-accelerated decimals are fundamentally impossible (which they might be if you want arbitrary precision, which I do), surely it's possible to do better in 2021 than some decades-old standard.
Not when you look at it from my perspective: it shouldn't be the programmer's job to accommodate a design that was never intended for human beings (if it was, we wouldn't have to perform pretty arbitrary tricks just to get a correct answer).
But to be fair, indeed this issue wouldn't be solved by larger float types.
It just might. Remember, a good idea needs to be implemented once for people to see, and then it proliferates like nuclear weaponry in a world without nonproliferation, merely due to competitive pressure. Simple example: the C++ people went "oh @&#@)" once Rust appeared and started shoring up their static analyses. It's not quite as good as Rust due to backwards compat, but they clearly saw the value proposition of borrowck.
Now hardware is an additional barrier, but there's plenty of precedent there too. What do you think of modern day gfx cards? Those started out as mere pipes for a frame buffer to be written to a screen. As software demanded more and more, the hardware manufacturers accommodated that.
Taking this at face value, then it's perhaps time to think about throwing out floats altogether and replacing them with something fundamentally better: true decimals.
In the beginning, no. But commonly used languages implementing such types make a great case for hardware support. Much more so than library support. So while that's a good first step, I'd say lang support should definitely follow at some point.
Concern: 256- and 512-bit types may not get uniform support, leading to more portability headaches.
Until somewhat recently Emscripten had no support for 128-bit integers. Rust has had an emscripten target for considerably longer than this, most of the time without i128 / u128 types. As a maintainer of the rand crates, this was a considerable nuisance, especially having to maintain cfgs that swap out two different generators using u128 for Emscripten targets.
I'm fairly sure that the abstract Right Thing for doing arithmetic in ℝ on a computer is not arbitrary precision decimals, but rather Gosper-style arbitrary precision continued fractions. There are some awkward problems (like, how do you handle the fact that any single term of a continued fraction expansion might need to be arbitrarily large?) but unlike floating-point arithmetic in any particular base this doesn't just move the problems around.
Just because numbers are usually represented in decimal doesn't mean that calculation in binary is broken. Floats wouldn't exist if they couldn't do the job properly. Whether you round a number in decimal or in binary doesn't affect the precision.
I shouldn't have to explain why numeric types are usually represented in binary; even with hardware support for decimal numbers, operations on decimal numbers would be slower than operations on binary numbers, because computers operate on bits.
Hardware support doesn't make things magically fast; note that a single cpu instruction can require many cycles if it performs a complex task.
Why do you think that precision must be arbitrary? For most operations, the necessary precision has an upper bound. If that bound is x, you can choose a type with a fixed precision of at least x. If you have unbounded precision, you also have unbounded memory consumption and unbounded computation time for arithmetic operations. This is not what most people want by default.
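To illustrate the bounded-precision point: exact decimal arithmetic with a fixed number of fractional digits can be had today with plain scaled integers. This is a hypothetical sketch (the `Cents` type and its helpers are invented for illustration), not a proposal:

```rust
// Fixed-precision decimal as a scaled integer: the value times 10^2,
// stored in an i64. Two fractional digits is the (fixed) upper bound here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Cents(i64);

impl Cents {
    fn from_parts(units: i64, hundredths: i64) -> Self {
        Cents(units * 100 + hundredths)
    }
}

impl std::ops::Add for Cents {
    type Output = Cents;
    fn add(self, rhs: Cents) -> Cents {
        // Exact integer addition; no binary rounding can occur.
        Cents(self.0 + rhs.0)
    }
}

fn main() {
    let a = Cents::from_parts(0, 10); // 0.10
    let b = Cents::from_parts(0, 20); // 0.20
    // Exact, unlike 0.1f64 + 0.2f64.
    assert_eq!(a + b, Cents::from_parts(0, 30));
    println!("ok");
}
```

Memory use and per-operation cost stay constant because the precision bound is fixed up front, which is exactly the trade against arbitrary precision.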
Old doesn't imply outdated. This standard has survived decades and nobody managed to come up with something fundamentally better in the meantime.
Proposals for new hardware features are not really on-topic here. Can we steer this back to the original question about wider integer types in the Rust programming language (or if there is nothing more to say about that, then drop it)?
If you want exact value arithmetic for a large subset of real numbers (more than just rational numbers), you can use the algebraic numbers library I wrote. It can handle all real algebraic numbers (within available memory), so it can do exact calculations involving sqrt(2) or cbrt(4 + sqrt(5)) or 1234675435^(35/4). It uses advanced algorithms for factoring polynomials (needed to end up with canonical representations for algebraic numbers), so it isn't too horribly slow for complex cases (factoring polynomials quickly is hard).
The question is: how long are we willing to extend the set of primitive types? If we have u512 without hardware support, why should we stop there? Then why not bake in u1024, u2048, and so on?
I think at this point it's unreasonable to expect extending the set of primitive integers even further ad infinitum. There should instead be a single, const generic, preferably non-heap-allocating, arbitrary-width integer type for those who want to do big math. It could use intrinsics to lower to LLVM's arbitrary-width integers as well. This would resolve the endless debates as to what does and doesn't belong in the core language.
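Such a const-generic, stack-allocated wide integer is expressible today. Here's a minimal sketch (the `Uint` type, its limb layout, and `wrapping_add` are all invented for illustration; a real design would cover the full operator set and lower to LLVM intrinsics):

```rust
// A hypothetical const-generic wide unsigned integer: WORDS 64-bit limbs
// on the stack, least significant limb first. No heap allocation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Uint<const WORDS: usize> {
    limbs: [u64; WORDS],
}

impl<const WORDS: usize> Uint<WORDS> {
    fn from_u64(v: u64) -> Self {
        let mut limbs = [0u64; WORDS];
        limbs[0] = v;
        Uint { limbs }
    }

    // Wrapping addition, propagating the carry across limbs.
    fn wrapping_add(self, rhs: Self) -> Self {
        let mut out = [0u64; WORDS];
        let mut carry = false;
        for i in 0..WORDS {
            let (s1, c1) = self.limbs[i].overflowing_add(rhs.limbs[i]);
            let (s2, c2) = s1.overflowing_add(carry as u64);
            out[i] = s2;
            carry = c1 || c2;
        }
        Uint { limbs: out }
    }
}

fn main() {
    // With WORDS = 2 this behaves like a u128: u64::MAX + 1 carries
    // into the second limb.
    let a = Uint::<2>::from_u64(u64::MAX);
    let b = Uint::<2>::from_u64(1);
    assert_eq!(a.wrapping_add(b).limbs, [0, 1]);
    println!("ok");
}
```

One type parameterized by width, instead of an open-ended list of u256/u512/u1024 primitives.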
For what it's worth I'm still a fan of ranged integers. int<5, 10> could represent an integer between 5 and 10 inclusively. The backing type would be up to the compiler.
u8 is a type like any other, and types can be shadowed or share the same name with variable names. This is true for primitive types just like custom types such as Result or Box. Why do you think that primitive types should get special treatment? And if primitive types become reserved words, doesn't this imply that other lang items such as Drop, Deref or PhantomData should be reserved too?
That's probably the most reasonable approach to the problem. Done right, it would also bypass the naming issues that @bjorn3 explained earlier and that @zackw expanded on. I would still like to see many of the fundamental types get reserved so that they can't be shadowed, but using primitive can serve as a stopgap for the security conscious.
I like the concept, but I have some questions about how to implement this. Will you permit operations between different types? That is, if I have an instance of type int<5, 10> and another of type int<10, 15>, can they be added together? If so, what is the resulting type? If not, is it possible to add the instances together? E.g.:
let i: int<5, 10> = 6;
let j: int<10, 15> = 14;
i + (j as int<5, 10>); // Can't do this cast, mathematically illegal
(i as int<10, 15>) + j; // Can't do this cast either
let k: int<5, 15> = i + j; // Auto cast i and j to int<5, 15>???
Once the details about the formal logic were worked out, I'd be interested in this.
Also, my personal preference is that this would use the various range types, so it would be int<5..10>, etc.
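For concreteness, a minimal runtime-checked sketch of such a ranged type (hypothetical; a proper language feature would do these checks at compile time, and the backing-type choice would be the compiler's):

```rust
// Hypothetical ranged integer with const-generic inclusive bounds.
// i64 is used as the backing store purely for simplicity.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Ranged<const MIN: i64, const MAX: i64>(i64);

impl<const MIN: i64, const MAX: i64> Ranged<MIN, MAX> {
    // Construction fails for out-of-range values.
    fn new(v: i64) -> Option<Self> {
        if v >= MIN && v <= MAX { Some(Ranged(v)) } else { None }
    }
    fn get(self) -> i64 {
        self.0
    }
}

fn main() {
    // int<5, 10> from the post above, spelled Ranged::<5, 10>.
    assert!(Ranged::<5, 10>::new(6).is_some());
    assert!(Ranged::<5, 10>::new(11).is_none());
    assert_eq!(Ranged::<5, 10>::new(6).unwrap().get(), 6);
    println!("ok");
}
```

The interesting design questions (what `i + j` yields for different bound pairs) start exactly where this sketch stops.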
Comments on floating point numbers
Also, and with apologies to @mbrubeck and his earlier post, floating point numbers are the tool of the devil. Mathematically, they aren't even a proper subset of the real number line (NaN doesn't show up on the real number line). At this point, I'd much rather work with a big rational library, down-converting to floats for simplified output representation only.
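The rational-arithmetic idea can be sketched in a few lines. This is a hypothetical toy (i64-limited; a real "big rational" library would use arbitrary-precision integers underneath):

```rust
// A minimal exact rational: numerator/denominator kept in lowest terms,
// with the sign carried by the numerator.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Rational {
    num: i64,
    den: i64,
}

fn gcd(a: i64, b: i64) -> i64 {
    if b == 0 { a.abs() } else { gcd(b, a % b) }
}

impl Rational {
    fn new(num: i64, den: i64) -> Self {
        assert!(den != 0, "zero denominator");
        let sign = if den < 0 { -1 } else { 1 };
        let g = gcd(num, den);
        Rational { num: sign * num / g, den: sign * den / g }
    }
    fn add(self, other: Rational) -> Rational {
        Rational::new(
            self.num * other.den + other.num * self.den,
            self.den * other.den,
        )
    }
    // Down-convert for display only; all arithmetic stays exact.
    fn to_f64(self) -> f64 {
        self.num as f64 / self.den as f64
    }
}

fn main() {
    // 1/10 + 2/10 is exactly 3/10; no binary rounding is involved.
    let sum = Rational::new(1, 10).add(Rational::new(2, 10));
    assert_eq!(sum, Rational::new(3, 10));
    println!("{}", sum.to_f64());
}
```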
With `let k: int<5, 15> = i + j;` you get an immediate representation overflow, since 6 + 14 = 20 > 15. Perhaps you meant that the output range should be the sum of the input ranges, separately summing the two input lower bounds and the two input upper bounds:
let k: int<5, 25> = i + j; // Auto cast i and j to int<5, 25>???
Actually, I did mean let k: int<5, 15> = i + j;, but I forgot to add the overflow problem to my comment!
The issue is that if the compiler is automatically choosing types, it also needs to take into account what the operations are. Here are the minimum ranges possible for various types and various operations (check my math, please!).
let i: int<5..=10> = 6; // Using the syntax I suggested earlier
let j: int<10..=15> = 14;
//// int<min + min, max + max> doesn't work! Can't cast `i` or `j`
// let k: int<15..=25> = (i as int<15..=25>) + (j as int<15..=25>);
//// This works
let k: int<5..=25> = (i as int<5..=25>) + (j as int<5..=25>);
//// int<min * min, max * max> doesn't work! Can't cast `i` or `j`
// let l: int<50..=150> = (i as int<50..=150>) * (j as int<50..=150>);
//// This works
let l: int<5..=150> = (i as int<5..=150>) * (j as int<5..=150>);
//// int<min - max, max - min> doesn't work! Can't cast `i` or `j`
// let m: int<-10..=0> = (i as int<-10..=0>) - (j as int<-10..=0>);
//// This works
let m: int<-10..=15> = (i as int<-10..=15>) - (j as int<-10..=15>);
//// int<min / max, max / min> doesn't work! Can't cast `i` or `j`
// let n: int<0..=1> = (i as int<0..=1>) / (j as int<0..=1>);
//// This works
let n: int<0..=15> = (i as int<0..=15>) / (j as int<0..=15>);
We can extend this to any other operators we care to name, or if you want to be cruel to the compiler team that needs to decide what the backing store for this is going to be:
let o: int<-1..=1> = 1;
for p in -1..=1 {
let q: int<???, ???> = o / p;
}
In the first iteration, the backing store needs to be a single signed bit; in the second, you need the integer version of infinity; and in the third, you can store it in a single unsigned bit, so... what now?
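The minimum result ranges above are straightforward interval arithmetic, and can be checked mechanically. A hypothetical helper (not part of any proposal) for the three well-behaved operations:

```rust
// Exact result ranges for operations on x in [a, b] and y in [c, d],
// with intervals represented as (min, max) pairs.
fn add_bounds(x: (i64, i64), y: (i64, i64)) -> (i64, i64) {
    (x.0 + y.0, x.1 + y.1) // <min + min, max + max>
}

fn sub_bounds(x: (i64, i64), y: (i64, i64)) -> (i64, i64) {
    (x.0 - y.1, x.1 - y.0) // <min - max, max - min>
}

fn mul_bounds(x: (i64, i64), y: (i64, i64)) -> (i64, i64) {
    // Once signs are involved, the extremes can come from any corner,
    // so take the min and max over all four products.
    let c = [x.0 * y.0, x.0 * y.1, x.1 * y.0, x.1 * y.1];
    (*c.iter().min().unwrap(), *c.iter().max().unwrap())
}

fn main() {
    let i = (5, 10);  // int<5..=10>
    let j = (10, 15); // int<10..=15>
    assert_eq!(add_bounds(i, j), (15, 25));
    assert_eq!(sub_bounds(i, j), (-10, 0));
    assert_eq!(mul_bounds(i, j), (50, 150));
    println!("ok");
}
```

Division is deliberately omitted: as the `o / p` example shows, a divisor interval containing zero has no finite result range.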
The alternative that I chose (but failed to properly explain, for which I apologize!) is to generate a new range from the infimum and supremum of the union of the sets of values that can be represented by all ranges involved (the operands, and the results of the operators). This leads to questions about what to do on overflow, how to handle casting, etc., etc., etc... OK, I think that I'm now convinced that @H2CO3's suggestion is the most practical one.
You could in theory get the "correct" bounds out by saying
impl<
const LhsMin: iN, const LhsMax: iN,
const RhsMin: iN, const RhsMax: iN,
>
ops::Add<iN<RhsMin, RhsMax>> for iN<LhsMin, LhsMax> {
type Output = iN<
{ LhsMin + RhsMin },
{ LhsMax + RhsMax },
>;
fn add(self, rhs: iN<RhsMin, RhsMax>) -> Self::Output {
type Intermediate = iN<
{ min(LhsMin, RhsMin) },
{ LhsMax + RhsMax },
>;
let lhs: Intermediate = self.into();
let rhs: Intermediate = rhs.into();
unsafe {
iN::unchecked_add(lhs, rhs)
}.try_into().unwrap_or_else(|_| unreachable!())
}
}
(Here I assume iN is a compile-time-only integer, and iN<Min, Max> is an inclusively bounded integer representation.)
That said, bounded integers like this rarely play out well in practice, I've found. It's easiest to either bound to a machine int (or some other power-of-2 range), or to actually just use a dynamic BigInt type.
I agree that it works for addition, but what about exponentiation? E.g.,
let i: int<4294967296..=18446744073709551616> = 2.pow(63);
let j: int<4294967296..=18446744073709551616> = 2.pow(63);
let k: int<4294967296..=18446744073709551616> = 2.pow(63);
let l: int<4294967296..=18446744073709551616> = 2.pow(63);
i.pow(j.pow(k.pow(l.pow(2.pow(63)))))
Unless I'm wrong, that is a Very Big Number. Large enough that trying to ensure that overflow won't happen isn't really possible, so either we're looking at a compile-time error because the compiler has a hard upper limit on how big a range can get, or a runtime panic when we run out of stack/heap space for the value. Ideally, we'd have a compile-time error, with a really good error message explaining why the error occurred.
The nice thing about @H2CO3's suggestion is that programmers are already familiar with the power-of-two integer types, along with how they behave. This will just be an extension to ever larger types.
In practice you'll likely get a compile time error anyway because it's impossible to store the type of that expression. That or the compiler hangs until it runs out of memory.