Since folks have expressed interest in reviving this, I decided to take the old RFC and edit it so that it isn't horribly out of date. There are still probably nuances I've missed or prior discussions that should be linked, which is why I want to share it here before actually opening a new RFC. I also haven't been following the discussion on bitfields since then, and I know they've resurfaced now that Rust for Linux is a priority.
The biggest change is that it removes the N <= 128 restriction that plagued the last version, since it's been well-established that this isn't super necessary and would just make things more complicated.
Please feel free to share feedback here or submit PRs directly to my fork on the generic-integers-v2 branch.
It's worth mentioning C23's _BitInt, even if only to state whether our types are or aren't expected to be ABI-compatible with it (e.g. how the improper_ctypes lint should act).
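For instance, the question shows up at FFI boundaries like this hypothetical one (uint<7> as proposed in the RFC, with unsigned _BitInt(7) on the C side):

    extern "C" {
        // If uint<7> is guaranteed ABI-compatible with C23's unsigned
        // _BitInt(7), this is fine; if not, improper_ctypes should warn.
        fn takes_bitint(x: uint<7>);
    }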
Derives can't change the structure of a type. You probably just want to write #[bitfield] instead, or a function-like macro.
[uint<N>/int<N>] will alias to existing uN and iN types wherever possible.
I think it probably makes more sense to have uN/iN be aliases of the generic types instead. They don't even really need to be primitive names anymore: they could just be type aliases defined in core::primitive and pub use'd in the prelude.
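A sketch of how that could look (module path and aliases hypothetical, assuming the generic uint<N>/int<N> types exist):

    // in core::primitive
    pub type u8 = uint<8>;
    pub type u16 = uint<16>;
    pub type u32 = uint<32>;
    pub type u64 = uint<64>;
    pub type u128 = uint<128>;
    // ...similarly for i8..i128, with everything pub use'd in the prelude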
The reason to keep them primitive is if there's an unbounded set of uN types.
There is no official limit on the size of N,
Technically, isize::MAX is a limit, since all types have a maximum size of isize::MAX bytes (plus an architecture-dependent size limit which is usually smaller).
Yes, I agree. Adding a section on ABI & C23 to prior art… I had initially mentioned ABI but must have removed it while editing. ><
I knew I wasn't doing this correctly when I hastily rewrote it to be a proc macro instead of repr(bitpacked). Fixed to be a macro invocation instead.
So, I worded this a bit too loosely, and modified it to just say they're equivalent. Which is an alias of which doesn't really matter, although you're right that it would most likely be that way around.
This was actually in the initial edit before I realised that uint<usize::MAX> is always smaller than isize::MAX bytes, because of the factor of 8. So, no, N isn't limited besides being a usize.
There is no official limit on the size of N, although one may be imposed for large N (above u32::MAX) to avoid compiler crashes and other issues. I can't personally imagine someone needing an integer larger than 4 GiB in size, but maybe I don't have that great of an imagination.
int<u32::MAX> is only 0.5 GiB because of that same factor of 8.
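Spelled out, that factor of 8:

    2^32 bits ÷ 8 bits/byte = 2^29 bytes = 512 MiB = 0.5 GiB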
Actually, N should probably just be of type u32. int::BITS, count_ones, shifts, etc. all use u32.
Yeah, I genuinely dunno how to feel on this one. My main aversion to using u32 in const generics is the fact that it makes turning it into an array index nontrivial without complex const generic expressions, although I guess that was the case anyway because of the factor of 8.
Of course, you're right that the presence of u32 in all those methods means we probably should use u32 here. Hmm.
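For illustration, the kind of signature where the conversion bites (hypothetical uint<N> plus generic_const_exprs-style arithmetic):

    impl<const N: u32> uint<N> {
        // N has to be cast to usize and rounded up to whole bytes
        pub fn to_ne_bytes(self) -> [u8; (N as usize + 7) / 8] {
            /* ... */
        }
    }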
I haven't actually read the update yet, but at least in concept, I'm a big fan.
One particular motivation would be to avoid the request for SignedInteger and UnsignedInteger traits, because instead of trying to write blanket impls over those -- and immediately hitting overlap issues -- it would be way nicer to be able to just impl<const BITS: u32> MyTrait for u<BITS>.
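A minimal sketch of that, using the RFC's uint spelling (MyTrait is just a stand-in):

    trait MyTrait {
        fn bit_width() -> u32;
    }

    // one blanket impl covers every unsigned width, with no overlap issues
    impl<const BITS: u32> MyTrait for uint<BITS> {
        fn bit_width() -> u32 {
            BITS
        }
    }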
Yeah, agreed. If some things need some temporarily-compiler-magic associated types or something to make the signatures of to_ne_bytes and friends expressible, that's ok.
FWIW, things haven't substantially changed since last time, and I do think that uint<N> and int<N> are the right names for these: the variable name i is exceptionally common, and, well, it feels wrong to add single-letter primitive type names this late into the language's release cycle.
That said, auto-magically making uN and iN work where N is an integer seems pretty okay, and could be a good alternative to, say, uint<7>.
In the RFC text, I actually just punt those off until complex generic expressions are stable, but we can still do whatever we want in the unstable compiler. I just worry that, since AFAIK no substantial work has been done on them in some time, they're still not good enough to use even in this limited use case.
The proposal seems very good, and is something I have wanted for a long time.
Should we implement math traits for all combinations of integers, or just those that share the same number of bits (as is the case now)? If mixed-width math is something people might want, we should use the bounds-based implementation.
    use std::ops::Add;

    // bits-based: compute the output width from the operand widths
    impl<const A: u32, const B: u32> Add<uint<B>> for uint<A> {
        // this gets messy quickly: the expression works out to max(A, B) + 1,
        // and as written it overflows u32 for large A or B
        type Output = uint<{ (2u32.pow(A) + 2u32.pow(B)).next_power_of_two().ilog2() }>;
    }

    // vs bounds-based: the output bounds are just the sums of the input bounds
    impl<const A0: i128, const A1: i128, const B0: i128, const B1: i128> Add<int<B0, B1>> for int<A0, A1> {
        type Output = int<{ A0 + B0 }, { A1 + B1 }>;
    }

    type uintbits<const B: u32> = int<0, { 2i128.pow(B) - 1 }>;

    // treat integer literals as unit types?
    let foo: int<7, 7> = 7;
I personally feel that int should be reserved for range-bound integers, such as int<5, 10> being guaranteed to be in the range 5..=10. Given that iN and uN could work at a language level as noted, do you have any ideas for forward compatibility with ranged integers, name-wise?
I think that they have to be the "same bits only, with the same wrapping/checked/etc we do today" in order to have type u32 = Unsigned<32>; work out.
I'd also like an Integer<MIN, MAX> type where we could have Integer<A, B> + Integer<C, D> -> Integer<{A+C}, {B+D}>, so that operators would never overflow (there'd just be wrapping/saturating/checked/etc constructors instead). But those are fundamentally different types from the ones we have today, and probably need way more const generics work before they're anywhere close.
It would also make u32 += u32 either inexpressible or give it semantics different from u32 + u32 (and bog-standard mutating assignments like u32 = u32 + u32 would also become inexpressible without adding implicit narrowing conversions to the language, so, uh, maybe not feasible).
But things like u32 = (u32 + 2 * u32 + u32) / 4 would work, and would actually give the correct value even in the edge cases, no narrowing conversions required.
So I really want that type at least as an option (eventually) because I'd much rather worry about narrowing only at assignment than in every intermediate value the way it is today.
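To spell out why, here's how the intermediate bounds would evolve under that hypothetical Integer<MIN, MAX> scheme, with a, b, c: u32, i.e. Integer<0, { u32::MAX }>:

    // a + 2 * b           : Integer<0, { 3 * u32::MAX }>  -- can't overflow
    // a + 2 * b + c       : Integer<0, { 4 * u32::MAX }>  -- can't overflow
    // (a + 2 * b + c) / 4 : Integer<0, { u32::MAX }>      -- fits u32 again at assignment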
u32 also makes sense because it's the smallest native integer type that is big enough to express the actual maximum bit width in LLVM, which is 2^23.
However, making the generic parameter any integer type runs into circular reasoning: u32 would be defined as uint<32u32>, where 32u32 has type u32, but that's the very type being defined!
One option could be to name them intb<MIN, MAX> or bint<MIN, MAX>, for "bounded (unsigned) integer". Alternatively intb<BITS, MIN, MAX>, which would additionally allow a u32 bounded to a smaller range: intb<32, 5, 10> (not sure that's needed unless they have to look like C types).
It would be nice if we could have both at the same time: impl Add<u32> for u32 { type Output = u32; } with the current behavior (panic in debug, wrap in release) and a second impl Add<u32> for u32 { type Output = u33; }. Unfortunately that is not representable with the current Add trait, since the two impls would overlap, but it would allow a lint that forbids the use of the first one in crates that care about not panicking in that situation. I believe both are useful to have.
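Spelled out, the coherence conflict that makes the pair unrepresentable today (u33 is hypothetical, and orphan rules aside):

    use std::ops::Add;

    impl Add<u32> for u32 {
        type Output = u32; // today's behavior: panic in debug, wrap in release
        fn add(self, rhs: u32) -> u32 { /* ... */ }
    }
    impl Add<u32> for u32 {
        type Output = u33; // hypothetical widening impl: never overflows
        fn add(self, rhs: u32) -> u33 { /* ... */ }
    }
    // error[E0119]: conflicting implementations of trait `Add<u32>` for type `u32`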
What about code that relies on u32 = (u32 + 2 * u32 + u32) / 4 wrapping (for some reason), even though it should have used wrapping_add? Would changing that behavior out from under it be a breaking change?
I really like that you put in the future possibility for repr(bitpacked). I've recently used nom (and manual bit shifting) for parsing TCP/IP headers, which works but isn't ideal. Having the ability to specify those in a repr(bitpacked) struct (plus the ability to convert arbitrary bytes into it if there are no further constraints) would have been really useful there, especially if there were some way to indicate whether a number should be big- or little-endian, perhaps in the method that converts bytes into the bitpacked struct. That would effectively simplify all of this into a single transmute (or similar) call, like it can be done in C (where you still have to do the bit shifting yourself, as far as I know).
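A hypothetical sketch of what that might look like, combining the repr(bitpacked) future possibility with uint<N> from the RFC (the struct name is made up, and none of this syntax exists today; the fields are the 16 bits of a TCP header following the acknowledgment number):

    #[repr(bitpacked)]
    struct TcpHeaderBits {
        data_offset: uint<4>, // header length in 32-bit words
        reserved: uint<3>,
        flags: uint<9>,       // NS, CWR, ECE, URG, ACK, PSH, RST, SYN, FIN
    }
    // plus some from_be_bytes-style constructor that fixes the endianness
    // once, at the conversion boundary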