Since folks have expressed interest in reviving this, I decided to take the old RFC and edit it so that it isn't horribly out of date. There are still probably nuances I've missed or prior discussions that should be linked, which is why I want to share it here before actually opening a new RFC. I also haven't been following the discussion on bitfields since then, and I know they've resurfaced now that Rust for Linux is a priority.
The biggest change is that it removes the N <= 128 restriction that plagued the last version, since it's been well-established that this isn't super necessary and would just make things more complicated.
Please feel free to share feedback here or submit PRs directly to my fork on the generic-integers-v2 branch.
It's worth mentioning C23's _BitInt, even if only to state whether our types are or aren't expected to be ABI-compatible with it (e.g. how the improper_ctypes lint should act).
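For instance, the question shows up at FFI boundaries like this hypothetical one (uint<7> as proposed in the RFC, with unsigned _BitInt(7) on the C side):

    extern "C" {
        // If uint<7> is guaranteed ABI-compatible with C23's unsigned
        // _BitInt(7), this is fine; if not, improper_ctypes should warn.
        fn takes_bitint(x: uint<7>);
    }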
Derives can't change the structure of a type. You probably just want to write #[bitfield] instead, or a function-like macro.
[uint<N>/int<N>] will alias to existing uN and iN types wherever possible.
I think it probably makes more sense to have uN/iN be aliases of the generic types instead. They don't even really need to be primitive names anymore: they could just be type aliases defined in core::primitive and pub use'd in the prelude.
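A sketch of how that could look (module path and aliases hypothetical, assuming the generic uint<N>/int<N> types exist):

    // in core::primitive
    pub type u8 = uint<8>;
    pub type u16 = uint<16>;
    pub type u32 = uint<32>;
    pub type u64 = uint<64>;
    pub type u128 = uint<128>;
    // ...similarly for i8..i128, with everything pub use'd in the prelude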
The reason to keep them primitive is if there's an unbounded set of uN types.
There is no official limit on the size of N,
Technically, isize::MAX is a limit, since all types have a maximum size of isize::MAX bytes (plus an architecture-dependent size limit which is usually smaller).
Yes, I agree. Adding a section on ABI & C23 to prior art… I had initially mentioned ABI but must have removed it while editing. ><
I knew I wasn't doing this correctly when I hastily rewrote it to be a proc macro instead of repr(bitpacked). Fixed to be a macro invocation instead.
So, I worded this a bit too loosely, and modified it to just say they're equivalent. Which is an alias of which doesn't really matter, although you're right that it would most likely be that way around.
This was actually in the initial edit before I realised that uint<usize::MAX> is always smaller than isize::MAX bytes, because of the factor of 8. So, no, N isn't limited besides being a usize.
There is no official limit on the size of N, although one may be imposed for large N (above u32::MAX) to avoid compiler crashes and other issues. I can't personally imagine someone needing an integer larger than 4 GiB in size, but maybe I don't have that great of an imagination.
int<u32::MAX> is only 0.5 GiB because of that same factor of 8.
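Spelled out, that factor of 8:

    2^32 bits ÷ 8 bits/byte = 2^29 bytes = 512 MiB = 0.5 GiB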
Actually, N should probably just be of type u32. int::BITS, count_ones, shifts, etc. all use u32.
Yeah, I genuinely dunno how to feel on this one. My main aversion to using u32 in const generics is the fact that it makes turning it into an array index nontrivial without complex const generic expressions, although I guess that was the case anyway because of the factor of 8.
Of course, you're right that the presence of u32 in all those methods means we probably should use u32 here. Hmm.
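For illustration, the kind of signature where the conversion bites (hypothetical uint<N> plus generic_const_exprs-style arithmetic):

    impl<const N: u32> uint<N> {
        // N has to be cast to usize and rounded up to whole bytes
        pub fn to_ne_bytes(self) -> [u8; (N as usize + 7) / 8] {
            /* ... */
        }
    }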
I haven't actually read the update yet, but at least in concept, I'm a big fan.
One particular motivation would be to avoid the request for SignedInteger and UnsignedInteger traits, because instead of trying to write blanket impls over those -- and immediately hitting overlap issues -- it would be way nicer to be able to just impl<const BITS: u32> MyTrait for u<BITS>.
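A minimal sketch of that, using the RFC's uint spelling (MyTrait is just a stand-in):

    trait MyTrait {
        fn bit_width() -> u32;
    }

    // one blanket impl covers every unsigned width, with no overlap issues
    impl<const BITS: u32> MyTrait for uint<BITS> {
        fn bit_width() -> u32 {
            BITS
        }
    }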
Yeah, agreed. If some things need some temporarily-compiler-magic associated types or something to make the signatures of to_ne_bytes and friends expressible, that's ok.
FWIW, things haven't substantially changed since last time, and I do think that uint<N> and int<N> are the right names for these: the variable name i is exceptionally common, and, well, it feels wrong to add single-letter primitive type names this late into the language's release cycle.
That said, auto-magically making uN and iN work where N is an integer seems pretty okay, and could be a good alternative to, say, uint<7>.
In the RFC text, I actually just punt those off until complex generic expressions are stable, but we can still do whatever we want in the unstable compiler. I just worry that, since AFAIK no substantial work has been done on them in some time, they're still not good enough to use even in this limited use case.
The proposal seems very good, and is something I have wanted for a long time.
Should we implement math traits for all combinations of integers, or just those that share the same number of bits (as is the case now)? If mixed-width math is something people might want, we should use the bounds-based implementation.
    use std::ops::Add;

    // bits-based: compute the output width from the operand widths
    impl<const A: u32, const B: u32> Add<uint<B>> for uint<A> {
        // this gets messy quickly: the expression works out to max(A, B) + 1,
        // and as written it overflows u32 for large A or B
        type Output = uint<{ (2u32.pow(A) + 2u32.pow(B)).next_power_of_two().ilog2() }>;
    }

    // vs bounds-based: the output bounds are just the sums of the input bounds
    impl<const A0: i128, const A1: i128, const B0: i128, const B1: i128> Add<int<B0, B1>> for int<A0, A1> {
        type Output = int<{ A0 + B0 }, { A1 + B1 }>;
    }

    type uintbits<const B: u32> = int<0, { 2i128.pow(B) - 1 }>;

    // treat integer literals as unit types?
    let foo: int<7, 7> = 7;
I personally feel that int should be reserved for range-bound integers, such as int<5, 10> being guaranteed to be in the range 5..=10. Given that iN and uN could work at a language level as noted, do you have any ideas for forward compatibility with ranged integers, name-wise?
I think that they have to be the "same bits only, with the same wrapping/checked/etc we do today" in order to have type u32 = Unsigned<32>; work out.
I'd also like an Integer<MIN, MAX> type where we could have Integer<A, B> + Integer<C, D> -> Integer<{A+C}, {B+D}>, so that operators would never overflow (there'd just be wrapping/saturating/checked/etc constructors instead). But those are fundamentally different types from the ones we have today, and probably need way more const generics work before they're anywhere close.
It would also make u32 += u32 either inexpressible or give it semantics different from u32 + u32 (and bog-standard mutating assignments like u32 = u32 + u32 would also become inexpressible without adding implicit narrowing conversions to the language, so, uh, maybe not feasible).
But things like u32 = (u32 + 2 * u32 + u32) / 4 would work, and would actually give the correct value even in the edge cases, no narrowing conversions required.
So I really want that type at least as an option (eventually) because I'd much rather worry about narrowing only at assignment than in every intermediate value the way it is today.
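To spell out why, here's how the intermediate bounds would evolve under that hypothetical Integer<MIN, MAX> scheme, with a, b, c: u32, i.e. Integer<0, { u32::MAX }>:

    // a + 2 * b           : Integer<0, { 3 * u32::MAX }>  -- can't overflow
    // a + 2 * b + c       : Integer<0, { 4 * u32::MAX }>  -- can't overflow
    // (a + 2 * b + c) / 4 : Integer<0, { u32::MAX }>      -- fits u32 again at assignment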
u32 also makes sense because it's the smallest native integer type that is big enough to express the actual maximum bit width in LLVM, which is 2^23.
However, making the generic parameter any integer type runs into circular reasoning: u32 would be defined as uint<32u32>, where 32u32 has type u32, but that's the very type being defined!
One option could be to name them intb<MIN, MAX> or bint<MIN, MAX>, for "bounded (unsigned) integer". Alternatively intb<BITS, MIN, MAX>, which would additionally allow a u32 bounded to a smaller range: intb<32, 5, 10> (not sure that's needed unless they have to look like C types).
It would be nice if we could have both at the same time: impl Add<u32> for u32 { type Output = u32; } with the current behavior (panic in debug, wrap in release) and a second impl Add<u32> for u32 { type Output = u33; }. Unfortunately that is not representable with the current Add trait, since the two impls would overlap, but it would allow a lint that forbids the use of the first one in crates that care about not panicking in that situation. I believe both are useful to have.
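Spelled out, the coherence conflict that makes the pair unrepresentable today (u33 is hypothetical, and orphan rules aside):

    use std::ops::Add;

    impl Add<u32> for u32 {
        type Output = u32; // today's behavior: panic in debug, wrap in release
        fn add(self, rhs: u32) -> u32 { /* ... */ }
    }
    impl Add<u32> for u32 {
        type Output = u33; // hypothetical widening impl: never overflows
        fn add(self, rhs: u32) -> u33 { /* ... */ }
    }
    // error[E0119]: conflicting implementations of trait `Add<u32>` for type `u32`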
What about code that relies on u32 = (u32 + 2 * u32 + u32) / 4 wrapping (for some reason), even though it should have used wrapping_add? Would changing that behavior out from under it be a breaking change?
I really like that you put in the future possibility for repr(bitpacked). I've recently used nom (and manual bit shifting) for parsing TCP/IP headers, which works but isn't ideal. Having the ability to specify those in a repr(bitpacked) struct (plus the ability to convert arbitrary bytes into it if there are no further constraints) would have been really useful there, especially if there were some way to indicate whether a number should be big- or little-endian, perhaps in the method that converts bytes into the bitpacked struct. That would effectively simplify all of this into a single transmute (or similar) call, like it can be done in C (where you still have to do the bit shifting yourself, as far as I know).
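A hypothetical sketch of what that might look like, combining the repr(bitpacked) future possibility with uint<N> from the RFC (the struct name is made up, and none of this syntax exists today; the fields are the 16 bits of a TCP header following the acknowledgment number):

    #[repr(bitpacked)]
    struct TcpHeaderBits {
        data_offset: uint<4>, // header length in 32-bit words
        reserved: uint<3>,
        flags: uint<9>,       // NS, CWR, ECE, URG, ACK, PSH, RST, SYN, FIN
    }
    // plus some from_be_bytes-style constructor that fixes the endianness
    // once, at the conversion boundary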