256-bit and 512-bit integers

Hello everyone,

Rust already supports 128-bit integers, but depending on the CPU architecture there are already 256-bit (AVX2) and 512-bit (AVX-512) registers such as YMM and ZMM.

Therefore I would like to see an implementation of the types u256, u512, i256 and i512.

Hope others would find this appropriate as well.

Best regards Robin Lindner

What's the primary use case here?

@jhpratt I assume along the same lines as i/u128, and as such, I'd welcome these types.

My main concern here is slowness. Barely any CPUs have the hardware for this on board, so it would degrade to some kind of emulation, at a potentially hefty performance penalty.

2 Likes

IMO a good reason to have u128/i128 is that they let the hardware multiply two u64s and return the overflowing part as a full u64 (at least on x86-64 this is the case). Without them you would have to emulate the multiplication using u32s, which takes many more instructions. The same reason, however, doesn't apply to 256- and 512-bit integers.
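To sketch that point (my own illustration, not from the thread): casting to u128 is all it takes to get the full 128-bit product of two u64s, and on x86-64 this typically lowers to a single MUL instruction producing RDX:RAX.

```rust
// Widening 64x64 -> 128-bit multiply, using u128 as the intermediate type.
// Without u128 this would take several 32-bit partial products and adds.
fn widening_mul_u64(a: u64, b: u64) -> (u64, u64) {
    let wide = (a as u128) * (b as u128);
    ((wide >> 64) as u64, wide as u64) // (high half, low half)
}

fn main() {
    // u64::MAX * u64::MAX = 2^128 - 2^65 + 1, so high = u64::MAX - 1, low = 1
    let (hi, lo) = widening_mul_u64(u64::MAX, u64::MAX);
    println!("hi = {hi:#x}, lo = {lo:#x}");
}
```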

1 Like

All that means is that cpu designers need to get their act together and implement such functionality already :wink:

1 Like

Why would they? Are there enough people that would pay more for a processor that does 256-bit arithmetic?

2 Likes

This kind of stuff is always a chicken-and-egg problem. I suspect if some game studio or engine came out with "10% better FPS on avx512-capable processors", demand would go up. However, I don't know if that can be expected without prevalent enough hardware for game developers to put the time into making such improvements either.

1 Like

Note that YMM/ZMM registers aren't single integers. They're arrays of small integers like i64x8, and Rust already has that.
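To make that concrete, here is a plain-array sketch (mine, not the poster's) of what a ZMM register holds: eight independent 64-bit lanes operated on elementwise. Real code would use intrinsics or the nightly std::simd; with the right target features, a loop like this can be autovectorized into a single vpaddq.

```rust
// A 512-bit ZMM register is not one 512-bit integer: it is eight
// independent 64-bit lanes. This models a lanewise add; each lane
// wraps on its own, with no carry between lanes.
fn add_i64x8(a: [i64; 8], b: [i64; 8]) -> [i64; 8] {
    let mut out = [0i64; 8];
    for i in 0..8 {
        out[i] = a[i].wrapping_add(b[i]);
    }
    out
}

fn main() {
    println!("{:?}", add_i64x8([1; 8], [2; 8]));
}
```

Note how an overflow in lane 0 never touches lane 1, which is exactly why these registers don't give you 512-bit arithmetic for free.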

Games generally don't need large integers, and rather have an opposite problem: need to minimize memory bandwidth by using integers as small as possible.

AVX-512 speedups come from processing of bigger batches of small integers and floats, and from bigger selection of instructions for shuffling, packing, etc.

22 Likes

This doesn't seem like a fundamental core type that should be added to core Rust. At least not yet.

Usually people are advised to experiment and gain experience with their own crate before suggesting additions to stdlib. How about a similar approach here?

  1. Have a way to define this type as an intrinsic in a crate, or
  2. Make this type available conditionally, with a cfg / cargo feature?

It just seems counterproductive to expose a type that isn't widely available on mainstream hardware and to add software emulation, which makes it no longer a zero-cost abstraction.

To add to what people have already said, remember that all multiplication algorithms are super-linear (quadratic-ish). ALU implementations tend to convert time complexity into area complexity, and area is always at a premium. An open-source chip we're designing has a 256-bit multiplier in it and omg so much area.

No one needs 256-bit (or 512-bit, for that matter) multiplication unless they're implementing RSA, and why on earth are you doing RSA when there's a perfectly good Ed25519 (or ECDSA if you can't use it because reasons) right there?

5 Likes

Scientific simulations just might be able to make use of a u512/i512/f512, assuming they're truly hardware-accelerated. Granted though, relative to the size of the CPU market, that is somewhat niche.

Do you have a specific example where this is necessary? According to Wikipedia, the observable universe is about 8.8×10^26 m in diameter. Rounding that up to 10^27 m and dividing by 2^128 yields about 3×10^-12 m, or 3 picometers. A helium atom is about 62 picometers in diameter, so with u128 you can already simulate the entire observable universe down to the individual atoms within it with great accuracy. I've done a fair amount of simulator engine design for both school and work, and I haven't yet had to do simulations with that kind of precision... :wink:
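The back-of-envelope arithmetic above can be checked in a few lines (my own sketch, using the same rounded figures as the post):

```rust
// Spatial resolution of a u128 coordinate spanning the observable
// universe, using the post's figure of ~1e27 m (rounded up from 8.8e26 m).
fn u128_resolution_m() -> f64 {
    let universe_m = 1e27_f64;
    let steps = 2f64.powi(128); // ~3.4e38 distinct u128 values
    universe_m / steps          // ~2.9e-12 m, i.e. about 3 picometers
}

fn main() {
    println!("{:e}", u128_resolution_m());
}
```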

That said, please don't take this as my being against the higher-precision types; if the hype around quantum computing ever becomes reality, we might need them. It may be better to simply state that Rust reserves all identifiers matching the regex (i|u|f)[0-9]+ for future use and leave it at that.

10 Likes

I used 256 bit integers recently. I'm working on a 64-bit rational library (64-bit numerator, 64-bit denominator). 128-bit intermediate values are useful when operating on rationals of differing denominators, but some operations can overflow a 128-bit integer. Using a 256-bit integer was the simplest way to handle these edge cases.
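The overflow in question is easy to see in a minimal sketch (helper name hypothetical, reduction of the result omitted): a/b + c/d = (a·d + c·b)/(b·d), and each cross product of two u64s already fills a u128, so their sum can need a 129th bit.

```rust
// Add two 64-bit rationals a/b + c/d = (a*d + c*b) / (b*d).
// Each cross product fits in u128, but their sum may not; a 256-bit
// intermediate would absorb this edge case without the Option.
fn add_rationals(a: u64, b: u64, c: u64, d: u64) -> Option<(u128, u128)> {
    let num = (a as u128 * d as u128).checked_add(c as u128 * b as u128)?;
    let den = b as u128 * d as u128; // always fits: product of two u64s
    Some((num, den))
}

fn main() {
    println!("{:?}", add_rationals(1, 2, 1, 3)); // 1/2 + 1/3 = 5/6
}
```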

I think that higher floating point types (e.g. f128/f256/f512) could be particularly useful for at least 2 things:

  1. Increase the ranges that are representable. There are numbers that floating types simply cannot represent, this could alleviate that to some extent.
  2. Reduce the inaccuracies that can result with current float types

It wouldn't eliminate either issue, however, merely move the goalposts, much like moving from f32 to f64. Note that IEEE 754 actually does define binary128 and binary256 interchange formats (and a general binary{k} scheme), so such types could in principle follow the standard; deviating from it might, though, be an opportunity to handle certain edge cases in a somewhat saner manner, at least for the common use cases of those types.

let i8 = 0; and struct u64; are already valid, which is precisely what allowed u128 and i128 to be introduced without breakage.

I thought "valid" had to be a typo and then I tried it and not only is it not a typo, [ifu][0-9]+ aren't even reserved as type names. This compiles without complaint:

#![allow(non_camel_case_types)]

pub struct i64 { a: f32, b: f32, }
pub struct i128 { a: f64, b: f64, }

pub fn test_i64() -> i64 { i64 { a: 0., b: 0., } }
pub fn test_i128() -> i128 { i128 { a: 0., b: 0., } }

Not something I ever expected to see in a modern programming language, but it does make adding more scalar types easy!

Cool, thank you for the counterexample.

I understand where you're coming from with this, but after having crawled out of a hell of my own devising where the inaccuracies of f64 accumulated over time, I think that if you're doing any scientific computing that requires accuracy you're better off with a big rational library.

:flushed: My shock and revulsion that this is true is effectively infinite. I know that there is no way to fix this as it's stable, but the number of ways that this could be abused is horrifying.

For the record, I know that the compiler won't be confused, but if I see u128, it has a very specific meaning to me, so if it's been redefined as something else...

The problem is: when do we stop? Once we've added f128, f256 and f512, why not f1024, then f2048, and so on? The f64 type is already pretty huge: it can represent numbers larger than the number of atoms in the observable universe and smaller than the Planck constant. If you suffer from inaccuracies with 64-bit floating point, what you actually need is probably not a bigger type but a different way of handling numbers, such as arbitrary-precision arithmetic. There are already libraries for this.
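The kind of inaccuracy being discussed is easy to demonstrate (my own sketch): 0.1 has no exact binary64 representation, so repeated addition drifts, and a wider float would shrink the drift rather than remove it.

```rust
// Summing 0.1 ten times does not give exactly 1.0 in binary floating
// point; exact results need a different representation (rationals,
// decimals, arbitrary precision), not just more bits.
fn sum_tenths(n: u32) -> f64 {
    let mut sum = 0.0_f64;
    for _ in 0..n {
        sum += 0.1;
    }
    sum
}

fn main() {
    println!("{}", sum_tenths(10) == 1.0); // false: tiny drift remains
}
```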

IEEE is important since it implies better hardware support. Large types would not benefit much from being part of the language, since nearly no hardware is able to handle them. Even where there are 512-bit registers on some processors, they are used to process sets of 64-bit numbers (or even smaller ones). Types beyond 128 bits are not supported by common hardware; even 128-bit support is limited.

Since there is no benefit from direct compiler support, and the usage of these types is pretty rare, they would be better handled by a library. Since Rust operators can be overloaded, an f512 type from an external library would be nearly transparent.
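A minimal sketch of that transparency (my own toy type, not a real library API): once Add is implemented for a wide integer wrapper, `a + b` reads exactly like built-in arithmetic.

```rust
use std::ops::Add;

// Toy 256-bit unsigned integer as two u128 halves, to show that a
// library type plus operator overloading looks like a built-in.
#[derive(Clone, Copy, PartialEq, Debug)]
struct U256 {
    hi: u128,
    lo: u128,
}

impl Add for U256 {
    type Output = U256;
    fn add(self, rhs: U256) -> U256 {
        // Add the low halves, then propagate the carry into the high half.
        let (lo, carry) = self.lo.overflowing_add(rhs.lo);
        let hi = self.hi.wrapping_add(rhs.hi).wrapping_add(carry as u128);
        U256 { hi, lo }
    }
}

fn main() {
    let x = U256 { hi: 0, lo: u128::MAX };
    let y = U256 { hi: 0, lo: 1 };
    println!("{:?}", x + y); // carry propagates into the high half
}
```

A full library would also implement Sub, Mul, comparison traits and so on, but the call sites would stay identical to those for primitive integers.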

2 Likes

That isn't the issue. The issue is that it cannot represent a whole bunch of numbers between the maximum and the minimum.

Either when we find a way to replace the likes of IEEE 754 with something fundamentally better (something that suffers neither from representational issues nor from the extremely weird and, at least from first principles, undesirable arithmetic rules it uses; essentially full-blown hardware-accelerated decimal calculation, if that is possible), or it will be a perpetual game of "moving the goalposts" until economics pretty much permanently dictates it isn't worth it anymore (I don't believe we're at that point yet).

It's 64 bits = 8 bytes. That's nothing in 2021, except in specialized situations (e.g. deep learning, where you instead need lots of smaller types like f16, for efficiency reasons).

Which is what I'm arguing for above (except hardware-accelerated), but that assumes it's possible in the first place (which reminds me: if it is possible, why hasn't it been done yet?). Note that likely none of those libraries are fully hardware-accelerated, so they are not competitive in terms of performance, and that alone might make them unusable for certain use cases.

It also fundamentally broke the decimal calculation model, introducing in its place something (floats) that just can't do the job properly, thus requiring all kinds of hacks from the programmer to get a decent approximation. For that reason alone, if it is at all theoretically possible, it should be replaced.

This argument has been made before here, and it's not exactly convincing. You see, it's a chicken and egg problem. If demand picks up, so will supply. If chip builders started building supply right now, there'd be demand for it in no time. It just requires somebody to take the first step and get some recognition for it.

That would be an intermediate stage, not the end of the road. Doing all that in software will likely tank performance, reinforcing the need for hardware acceleration.

But that's what floats are about: approximate values. If you want arbitrary precision or accurate decimal representation, you just need something else. A bigger float type merely pushes back part of the problem, and the question is how far we should push it back, since there is no end. 64-bit is currently a pretty good limit, since it is enough for most use cases and has hardware support.

IEEE 754 has many flaws, and they have been known since the beginning. It was built as a compromise between performance and usability.

Hardware support won't appear because Rust chose to implement a type; that's just not how it works. Others have made unilateral attempts to improve floating point, and it just made floating-point numbers an even worse mess.

If you want to improve number support (and I really believe that there is room for it), compiler support is the last step.

Types beyond 128 bits (and even 64 bits on a lot of architectures) won't be handled directly by the hardware either.