256-bit and 512-bit integers

Correct, but I'd rather not discuss the pros and cons in this thread.

u8 is a type like any other, and types can be shadowed or share the same name with variable names. This is true for primitive types just like custom types such as Result or Box. Why do you think that primitive types should get special treatment? And if primitive types become reserved words, doesn't this imply that other lang items such as Drop, Deref or PhantomData should be reserved too?

8 Likes

Here's the requisite link:

5 Likes

That's probably the most reasonable approach to the problem. Done right, it would also bypass the naming issues that @bjorn3 explained earlier and that @zackw expanded on. I would still like to see many of the fundamental types get reserved so that they can't be shadowed, but using primitive can serve as a stopgap for the security conscious.

I like the concept, but I have some questions about how to implement this. Will you permit operations between different types? That is, if I have an instance of type int<5, 10> and another of type int<10, 15>, can they be added together? If so, what is the resulting type? If not, is it possible to add the instances together? E.g.:

let i: int<5, 10> = 6;
let j: int<10, 15> = 14;
i + (j as int<5, 10>); // Can't do this cast, mathematically illegal
(i as int<10, 15>) + j; // Can't do this cast either
let k: int<5, 15> = i + j; // Auto cast i and j to int<5, 15>???

Once the details about the formal logic were worked out, I'd be interested in this.

Also, my personal preference is that this would use the various range types, so it would be int<5..10>, etc.

Comments on floating point numbers

Also, and with apologies to @mbrubeck and his earlier post, floating point numbers are the tool of the devil. Mathematically, they're aren't even a proper subset of the real number line (NaN doesn't show up on the real number line). At this point, I'd much rather work with a big rational library, down converting to floats for simplified output representation only.

and you get an immediate representation overflow, since (6 + 14) > 15. Perhaps you meant that the output range should be the sum of the input ranges, separately summing the two input lower bounds and the two input upper bounds:

let k: int<5, 25> = i + j; // Auto cast i and j to int<5, 25>???

Actually, I did mean let k: int<5, 15> = i + j;, but I forgot to add the overflow problem to my comment! :sweat_smile:

The issue is that if the compiler is automatically choosing types, it also needs to take into account what the operations are. Here are the minimum ranges possible for various types and various operations (check my math, please!).

let i: int<5..=10> = 6; // Using the syntax I suggested earlier
let j: int<10..=15> = 14;

//// int<min + min, max + max> doesn't work!  Can't cast `i` or `j`
// let k: int<15..=25> = (i as int<15..=25>) + (j as int<15..=25>);
//// This works
let k: int<5..=25> = (i as int<5..=25>) + (j as int<5..=25>);

//// int<min * min, max * max> doesn't work!  Can't cast `i` or `j`
// let l: int<50..=150> = (i as int<50..=150>) * (j as int<50..=150>);
//// This works
let l: int<5..=150> = (i as int<5..=150>) * (j as int<5..=150>);

//// int<min - min, max - max> doesn't work!  Can't cast `i` or `j`
// let m: int<-10..=0> = (i as int<-10..=0>) - (j as int<-10..=0>);
//// This works
let m: int<-10..=15> = (i as int<-10..=15>) - (j as int<-10..=15>);

//// int<min / min, max / max> doesn't work!  Can't cast `i` or `j`
// let n: int<0..=1> = (i as int<0..=1>) / (j as int<0..=1>);
//// This works
let n: int<0..=15> = (i as int<0..=15>) / (j as int<0..=15>);

We can extend this to any other operators we care to name, or if you want to be cruel to the compiler team that needs to decide what the backing store for this is going to be:

let o: int<-1..=1> = 1;
let mut p: int<-1..=1> = 0;
for p in -1..=1 {
    let q: int<???, ???> = o / p;
}

In the first iteration, the backing store needs to be single signed bit, in the second you need the integer version of infinity, and in the third you can store it in a single unsigned bit, so... what now?

The alternative that I chose (but failed to properly explain, for which I apologize!) is to generate a new range from the infimum and supremum of the union of the set of values that can be represented by all ranges involved (operators, and the results of those operators). This leads to questions about what to do on overflow, how to handle casting, etc., etc., etc... OK, I think that I'm now convinced that @H2CO3's suggestion is the most practical one.

You could in theory get the "correct" bounds out by saying

impl<
    const LhsMin: iN, const LhsMax: iN,
    const RhsMin: iN, const RhsMax: iN,
>
ops::Add<iN<RhsMin, RhsMax>> for iN<LhsMin, LhsMax> {
    type Output = iN<
        { LhsMin + RhsMin },
        { LhsMax + RhsMax },
    >;

    fn add(self, rhs: iN<RhsMin, RhsMax>) -> Self::Output {
        type Intermediate = iN<
            { min(LhsMin, RhsMin) },
            { LhsMax + RhsMax },
        >;
        let lhs: Intermediate = self.into();
        let rhs: Intermediate = rhs.into();
        unsafe {
            iN::unchecked_add(lhs, rhs)
        }.try_into().unwrap_or_else(|| unreachable!())
    }
}

(Here I assume iN is a compile time only integer, and iN<Min, Max> is a inclusively bounded integer representation.)


That said, bounded integers like this rarely play out well in practice, I've found. It's easiest to either bound to a machine int (or some other power-of-2 range), or to actually just use a dynamic BigInt type.

2 Likes

I agree that it works for addition, but what about exponentiation? E.g.,

let i: int<4294967296..=18446744073709551616> = 2.pow(63);
let j: int<4294967296..=18446744073709551616> = 2.pow(63);
let k: int<4294967296..=18446744073709551616> = 2.pow(63);
let l: int<4294967296..=18446744073709551616> = 2.pow(63);
i.pow(j.pow(k.pow(l.pow(2.pow(63)))))

Unless I'm wrong, that is a Very Big Number. Large enough that trying to ensure that overflow won't happen isn't really possible, so either we're looking at a compile-time error because the compiler has a hard upper limit on how big a range can get, or a runtime panic when we run out of stack/heap space for the value. Ideally, we'd have a compile-time error, with a really good error message explaining why the error occurred.

The nice thing about @H2CO3's suggestion is that programmers are already familiar with the power-of-two integer types, along with how they behave. This will just be an extension to ever larger types.

Edit

Fixed my broken code example above

In practice you'll likely get a compile time error anyway because it's impossible to store the type of that expression. That or the compiler hangs until it runs out of memory.

Just a quick mention of some prior art:

zig uses arbitrary width integers:

const byte: u8 = 8;
const ascii: u7 = 7;
const big: u512 = 512;
const max_int: i65535 = 65535;
const PackedStruct = struct {
   packed1: u38,
   packed2: u17,
   packed3: u9,
};
4 Likes

Not sure about the type, but the compiler running out of memory is a hard upper limit! :wink:

How is it not a zero cost abstraction? We just need a good enough software emulation implemented in the compiler that you can't practically beat by writing the emulation yourself in the Rust code.

I'm guessing @yigal100 meant zero-cost in the way eg newtype are zero-cost: they don't exist at runtime and so cost nothing at runtime. Using that definition emulation Indeed isn't zero cost.

1 Like

By that definition adding two i32 is not zero cost either, the computation takes a while at runtime. I can't see how this definition is useful while discussing proper hardware support for integer types. If the hardware doesn't directly supports 256 byte integer operations the only way of performing them is some kind of emulation.

1 Like

Bignum arithmetic is challenging to optimize, even just on integers, and it often needs to be re-tuned for each processor generation. I invite you to peruse GMP developers' corner to get an idea of how much work it can be. Pay attention to the last-modification dates on each page; GMP is chronically understaffed.

Having bignums built into the language is a tempting idea, but I'm leery of dumping this much additional work on the stdlib team.

7 Likes

The generally accepted definition of a "zero cost abstraction" or a "zero overhead abstraction" is one where you do not pay for what you don't use and it is not possible to write it more efficiently by hand.

3 Likes

No doubt it is. I kind of assumed LLVM would already have some optimizations around arbitrary size integers since I remember reading that LLVM IR supports integer types in the form of i[0-9]+. Obviously I don't have much knowledge about how all of that plays out in the generated code.

I know, this is the reason I wrote my first reply.

12 posts were split to a new topic: Glob imports can shadow built-in types

FWIW RustCrypto has been working on a crate like that:

As a general observation it's tricky. Many algorithms (e.g. for modular arithmetic) need direct access to "limbs" and are optimized for certain limb sizes. Effectively leveraging things like SIMD in the context of such algorithms can be quite nuanced.

I wouldn't necessarily be opposed to there being a general purpose type for this in core/std, but IMO I probably wouldn't use it and would expect crates can do a better job entirely at the library level.

1 Like

@Tom-Phinney suggested to me that I should drop the proposal of reserving (i|u|f)(8|16|32|64|128|size) in a PM (that he graciously allowed me to reference here to the group as a whole); I'm inclined to agree with him, as others have shown that there are good reasons to keep the current behavior. However, since @mbrubeck split this topic and created the Glob imports can shadow built-in types, I'll post my thoughts on shadowing over there, and limit my comments here to be about bit integers.

@bascule reminded me of a serious issue concerning larger numbers that I'd completely forgotten about; the 'best' algorithm to use for a big integer library depends on the values that are being computed, as well as the environment and conditions that they're being computed within. A pure scientific application might only care about speed, but cryptographic applications may need to do a tradeoff between speed and designs that defeat sidechannel attacks that leak the keys, while embedded systems that are operating on battery might choose slower, but less power-hungry implementations to increase their battery life. Selecting which algorithm to use (including the choice of using hardware-provided operations, or emulating them in software) requires domain knowledge that we as rust-internals members lack.

To make things worse, these choices may be mixed within the same code. One part of the code may be handling cryptographic secrets, while another part might be doing pure science, so we can't really select a single algorithm.

Long story short, I'm now convinced that the only practical way forwards would be via crates that the user can select amongst.

3 Likes