Pre-RFC: Generic integers (uint<N> and int<N>)

@gbutler Storing signed types sign-extended was my original hunch, but I quickly discarded it when I realized that the sign extension scatters the contiguous range of invalid values (it needs to be contiguous for layout optimizations) all over the place. But on second thought, there is a reasonably large contiguous niche, though it is smaller than for unsigned types and still seems pretty messy.

For example, for i6 stored in a byte, there is the niche 0b00100000..=0b11011111. None of the bit patterns in that range are valid sign extensions, since their top three bits are not all equal. However, I don’t see the general pattern yet (for how to compute this niche for a given N), and I can imagine much more fun things than working it out.

FWIW I added a sort-of attempt to explain the niche optimizations, although I’m not 100% sure how these work, so it probably needs revising. I’m a bit confused about why the niche space needs to be contiguous, though; would you be able to explain, @hanna-kruppe?

I believe it's mostly to make it tractable to compute optimized layouts. However, I'm not an expert on this; I'm just going by the fact that all the layout code currently uses a single niche, plus some vague memories of IRC discussions with @eddyb.

The restriction to one range is both an optimization of the niche/valid-range tracking logic and an optimization of enum encoding (because disjoint values would require more checks to decode the discriminant).
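To illustrate the decoding cost mentioned above, here is a hand-written sketch (not rustc's actual layout code). It assumes a hypothetical one-byte payload whose valid values are 0..=1, which leaves 2..=255 as a single contiguous niche; `NICHE_START`, `NICHE_LEN`, and `decode` are names made up for this example.

```rust
// Hypothetical 8-bit payload whose valid values are 0..=1 (like `bool`),
// leaving 2..=255 as one contiguous niche.
const NICHE_START: u8 = 2;
const NICHE_LEN: u8 = 254;

/// Returns `Some(variant_index)` if `byte` encodes a dataless variant stored
/// in the niche, or `None` if it is an ordinary payload value.
fn decode(byte: u8) -> Option<u8> {
    // One wrapping subtraction and one comparison cover the whole range.
    let offset = byte.wrapping_sub(NICHE_START);
    if offset < NICHE_LEN {
        Some(offset)
    } else {
        None
    }
}
```

With several disjoint niche ranges, the same decode would need one such check per range, which is the extra cost the single-range restriction avoids.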


Isn't it just sext(iN::MAX)+1 ..= sext(iN::MIN)-1?

bits      value
00000000  0
00000001  1 (i2::MAX)
00000010  niche
00000011  niche
...       niche
11111100  niche
11111101  niche
11111110  -2 (i2::MIN)
11111111  -1
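The formula sext(iN::MAX)+1 ..= sext(iN::MIN)-1 can be checked with a small sketch. It assumes the value is stored sign-extended in a single byte, and `sext_niche` is a hypothetical helper name, not anything from the RFC.

```rust
/// Niche of an `n`-bit signed integer stored sign-extended in one byte,
/// as the inclusive range `sext(iN::MAX) + 1 ..= sext(iN::MIN) - 1`.
/// Returns `None` for n = 8, where every bit pattern is a valid value.
/// Assumes 1 <= n <= 8.
fn sext_niche(n: u32) -> Option<(u8, u8)> {
    assert!((1..=8).contains(&n), "sketch only handles 1..=8 bits");
    if n == 8 {
        return None;
    }
    let max = (1i16 << (n - 1)) - 1; // iN::MAX
    let min = -(1i16 << (n - 1)); // iN::MIN
    let start = (max + 1) as u8; // first pattern after sext(iN::MAX)
    let end = (min - 1) as u8; // last pattern before sext(iN::MIN), truncated to a byte
    Some((start, end))
}
```

For n = 6 this gives 0b00100000..=0b11011111, matching the i6 example earlier in the thread, and for n = 2 it gives the niche rows of the table.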

Fair enough, I was speculating wildly :stuck_out_tongue:

I like the idea of uint<N> and int<N> if and only if they are compatible with ASN.1 UPER when used with something like #[repr(ordered, packed)]. (They wouldn’t have a strictly “C” repr, so we’d need some other way to specify that the layout is ordered; overloading repr(C) would be confusing.)

Mostly because that would make it trivial to compile ASN.1 into Rust, and I seem to work with ASN.1 a lot.

It looks like once const generics ships, most of this proposal could be implemented as a library, possibly in slightly modified form. That would make it easier to experiment with different designs and decide which one should be the basis for builtin generic integers – which I do agree should be the ultimate goal. Though even then, parts could be implemented as a library feature in libcore rather than depending on compiler magic; for example, libcore could provide impls of Add, Mul, etc. for large integer types, without having to worry about language items.
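As a rough sketch of what such a library type might look like, the following defines a wide unsigned integer over const generics and gives it an `Add` impl without any compiler support. The name `WideUint`, the limb layout, and the choice of wrapping semantics are all assumptions made for illustration, not part of the proposal.

```rust
use std::ops::Add;

/// Hypothetical library-level wide unsigned integer: `LIMBS` little-endian
/// 64-bit limbs, i.e. `64 * LIMBS` bits in total.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct WideUint<const LIMBS: usize>([u64; LIMBS]);

impl<const LIMBS: usize> Add for WideUint<LIMBS> {
    type Output = Self;

    /// Wrapping addition with carry propagation between limbs.
    fn add(self, rhs: Self) -> Self {
        let mut out = [0u64; LIMBS];
        let mut carry = false;
        for i in 0..LIMBS {
            let (sum, c1) = self.0[i].overflowing_add(rhs.0[i]);
            let (sum, c2) = sum.overflowing_add(carry as u64);
            out[i] = sum;
            // At most one of the two additions can overflow per limb.
            carry = c1 || c2;
        }
        WideUint(out)
    }
}
```

A 512-bit value would be `WideUint<8>` in this sketch; widths that are not a multiple of 64 would additionally need masking of the top limb.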

Edit: Also, I don’t think it would be a good idea for usize to be an alias for u32 or u64, even if that could somehow be done backwards compatibly. If it were an alias, it would be easy to accidentally write code that only compiles on 64-bit architectures (or whatever kind of architecture you’re developing on), because you passed a u64 variable to something that takes usize or vice versa. In other languages such as C, this is less of an issue because conversions between different integer types happen implicitly, truncating the value if necessary – but from Rust’s perspective that’s considered an anti-feature.
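A small example of what the current, non-alias design forces. The function names here are made up for illustration; the only standard-library call relied on is `usize::try_from`.

```rust
use std::convert::TryFrom;

/// Takes a pointer-sized index, as slice indexing does.
fn nth_byte(data: &[u8], index: usize) -> Option<u8> {
    data.get(index).copied()
}

/// A `u64` offset (for example, one read from a file format) must be
/// converted explicitly; passing `offset` straight to `nth_byte` is a type
/// error on every target, not just on 32-bit ones.
fn nth_byte_at(data: &[u8], offset: u64) -> Option<u8> {
    let index = usize::try_from(offset).ok()?; // fails if it does not fit
    nth_byte(data, index)
}
```

If usize were an alias for u64 on the developer's machine, the explicit conversion could be left out there and the code would only fail to compile on other targets.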


Currently usize = u32 or usize = u64, but that’s only because the i128 RFC did not land usize = u128 for RISC-V with the RV128 base, probably because (I suspect) LLVM doesn’t yet support usize = u128. If uint<N> lands, then Rust is also likely to have usize = u24, probably as a downcast from u32, for the earlier-generation small µCs and SoCs that have only 24-bit addressing.

Postscript: I should add that some parts of rustc already support usize = u16, though that support is not pervasive. I’ve assumed that usize = u16 was included to extend Rust’s potential domain to 8-bit and some 16-bit µCs, which tend to be very inexpensive chips suitable for very high volume applications (e.g., a car door or window controller or a simple countertop kitchen appliance without WiFi).


Continuing on the discussion about monomorphization errors, I think that the RFC should have a plan for N > 128, such that arbitrary sizes for const N: usize are supported before the feature is stabilized. The monomorphization errors noted by @hanna-kruppe for [T; 1 + (!0usize) / (1024 * 128)]; involve ridiculously large sizes that just won't occur in any form of real code. Meanwhile, uint<512> is not unlikely in some code.

I also now noticed:

Additionally, int<0> is not allowed as 2⁰⁻¹ = ½ would no longer be an integer. uint<0> is also not allowed for consistency, although it would be well-defined, as 2⁰ − 1 = 0. In the future, uint<0> may be defined as a zero-sized zero type, but this RFC opts to avoid that for now.

So.. I'm thinking that int<0> can be well defined by saying that since -½ and ½ are not integers, it is uninhabited, and thus operationally equivalent to ! (never_type).


Since Rust has ZSTs, I would be very surprised if uint<0> wasn't just the obvious 0-bit type.

While I agree that int<0> is weird, since it doesn't have the sign bit its name implies it must, there are other properties that can give it a meaning, like the fact that transmute<uint<N>, int<N>>(0) is still zero for all N, so int<0> is also a ZST representing 0. Similarly, int<N>::default() is 0 for all N, so should be for N == 0 too.
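A minimal sketch of that reading, using an ordinary unit struct as a stand-in. `Int0` is a hypothetical name; the real int<0> would be a builtin type, and this only shows that "zero bits, one value, and that value is zero" is expressible today.

```rust
/// Hypothetical stand-in for `int<0>` / `uint<0>`: no bits of storage and
/// a single value, which is interpreted as zero.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct Int0;

impl From<Int0> for i8 {
    /// Widening a zero-bit integer can only ever produce 0.
    fn from(_: Int0) -> i8 {
        0
    }
}
```

Here `std::mem::size_of::<Int0>()` is 0, and `i8::from(Int0::default())` is 0, in line with int<N>::default() being 0 for every other N.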


Right; I considered ZSTs as well, but I didn't find a rationale for those. Yours is a good one :slight_smile:


I agree with your reasoning, @scottmcm. Technically, int<N> doesn’t break the invariant that it lies in the range [−2ⁿ⁻¹, 2ⁿ⁻¹ − 1], as zero is the only integer in the range [-½, ½]. I didn’t actually think of it this way until just now.


Continuing on the discussion about monomorphization errors, I think that the RFC should have a plan for > 128 such that before the feature is stabilized, arbitrary size const N: usize are supported.

Would you mind elaborating what you think this plan should have? I started writing something but mostly all that I could think of were implementation details, and guarantees about size_of::<uint<N>>().

I'm not so much interested in the technical details right now (and I'm hardly an expert in this field...) What I'm looking for is a commitment to support N > 128, that's all :slight_smile:

PS: If you can think of implementation details, those are nice to jot down.

Excellent point! I'd written it out with ⌈⌉, but that looked terrible, so deleted it.

nit: You mean [−2ⁿ⁻¹, 2ⁿ⁻¹), since the formula you have actually generates [-½,-½].

soapbox: Once again, half-open ranges are superior to trying to futz about with ±1 :wink:
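The half-open formulation can be checked for n = 0 without fractions by doubling both sides. This is only a sketch; `fits_int` is a hypothetical helper, and the 64-bit limit is an arbitrary choice to keep the arithmetic inside i128.

```rust
/// Whether `x` lies in the half-open range [-2^(n-1), 2^(n-1)).
/// Both sides are doubled so that n = 0 needs no fractions:
/// -2^(n-1) <= x < 2^(n-1)  <=>  -2^n <= 2x < 2^n.
fn fits_int(x: i128, n: u32) -> bool {
    assert!(n <= 64, "sketch only handles widths up to 64 bits");
    let bound = 1i128 << n;
    // `checked_mul` guards the doubling; anything that overflows i128 is
    // far outside every range this sketch supports.
    match x.checked_mul(2) {
        Some(two_x) => -bound <= two_x && two_x < bound,
        None => false,
    }
}
```

For n = 0 the condition becomes -1 <= 2x < 1, which only x = 0 satisfies, so the half-open range yields a single-valued int<0>.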


An idea for related sugar is to also support supplying the range of values that is required from the integer instead of supplying the number of bits you require and leave it to the compiler to map this to an appropriate underlying N-bit representation. In some cases, this would allow design intent to be expressed more directly.

(This idea is inspired by Ada which supports fairly robust integer subrange functionality. For example, an integer with a range of 10…300 which the compiler then maps to a concrete representation, does runtime range checks on, etc., etc.)

Edit: I see that this suggestion was already made above in this link:

Although it does qualitatively feel different from the intentions of the pre-RFC, there is some overlap especially in terms of the idea of mapping the logical uint type to a concrete underlying hardware type.

This. It kind of seems wasteful to add a core language feature which would be redundant with another, more generic (no pun intended) feature. A library-defined Bits<const expression> would definitely be useful, and a nice example of a use case for const generics. AFAICT no use case mentioned so far would require that these types be primitives/builtins.

+1. Again, such a request seems to stem from the usual "but it's so easy in other languages!" fallacy. usize not being a mere relabelling of u32 or u64 or u(sizeof(pointer)) is not an accidental pain point or a design error; it's a deliberate decision which forces programmers to think about whether they need an exact- and constant-sized integer, or a pointer-sized, variable-width (across platforms) integer.

(Incidentally, the same argument seems to come up almost constantly, every single time the discussion shifts to a feature in Rust that exposes a problem and forces its users to think about it instead of silently doing the wrong thing. It's been cited in the case of error handling, from()/into() conversions, and who knows what other features. But it's still not a good idea, for the same reasons, to give up correctness for marginal convenience.)


One issue is that making Bits<32> be the same type as u32 wouldn't really work in a library. In theory you could do it by making Bits a type alias for an associated type projection – type Bits<const N: usize> = <N as GetBitsType>::Type – but then impls like impl<const N: usize> Foo for Bits<N> would be disallowed by coherence rules. More practically, it would have to be a separate type.
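A sketch of the alias-via-projection approach, with one adjustment: since a const value like N cannot itself implement a trait, the projection goes through a hypothetical marker type `Width<N>`. The names `Width` and `bits32_is_u32` are assumptions for this example, and only two widths are filled in.

```rust
use std::any::TypeId;

/// Hypothetical marker type carrying the bit width as a const parameter.
struct Width<const N: usize>;

/// Hypothetical trait mapping a width to a concrete integer type.
trait GetBitsType {
    type Type;
}

impl GetBitsType for Width<8> {
    type Type = u8;
}

impl GetBitsType for Width<32> {
    type Type = u32;
}

/// `Bits<32>` normalizes to `u32` itself rather than to a wrapper type.
type Bits<const N: usize> = <Width<N> as GetBitsType>::Type;

/// Compares the alias against the builtin type at runtime.
fn bits32_is_u32() -> bool {
    TypeId::of::<Bits<32>>() == TypeId::of::<u32>()
}
```

This makes Bits<32> and u32 the same type, but, as noted above, a blanket impl over Bits<N> for all N is then not something the compiler will accept, which is why a library version would more practically be a separate type.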

Yeah, I think I'd prefer true range types as well – though perhaps there should be both. In theory, there could be range and bit-width syntaxes for the same group of types: range<0, 127> could be the same type as uint<7>. However, offhand, I'd expect adding two range<0, 127> values to produce a range<0, 254>, while to be consistent with the existing builtin integer types, adding two uint<7>s would have to yield another uint<7>. An alternative would be to support both range<A, B> and uint<N> as entirely separate groups of types, but that seems like a confusing proliferation of integer types.
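The difference between the two addition behaviours can be sketched as follows. `Ranged`, `add_ranged`, and `add_u7_wrapping` are hypothetical names; the range addition is written for one concrete pair of ranges because a fully generic version would need const arithmetic in types, which is not available on stable Rust, and wrapping is just one possible overflow behaviour for a 7-bit type.

```rust
/// Hypothetical range type: a value known to lie in `LO..=HI`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Ranged<const LO: i64, const HI: i64>(i64);

impl<const LO: i64, const HI: i64> Ranged<LO, HI> {
    /// Runtime-checked constructor; returns `None` if `v` is out of range.
    fn new(v: i64) -> Option<Self> {
        if (LO..=HI).contains(&v) {
            Some(Ranged(v))
        } else {
            None
        }
    }
}

/// Range-style addition: the result type widens to 0..=254, so the sum of
/// two in-range values is always in range.
fn add_ranged(a: Ranged<0, 127>, b: Ranged<0, 127>) -> Ranged<0, 254> {
    Ranged(a.0 + b.0)
}

/// `uint<7>`-style addition: the result stays 7 bits wide, so it wraps.
fn add_u7_wrapping(a: u8, b: u8) -> u8 {
    (a & 0x7f).wrapping_add(b & 0x7f) & 0x7f
}
```

Adding 127 and 127 gives 254 in the widened range type but 126 in the 7-bit type, which is the inconsistency that would arise if range<0, 127> and uint<7> were the same type.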


Sure, although I'm pretty sure that's a feature, not a bug.