Pre-RFC: Generic integers (uint<N> and int<N>)

hanna-kruppe · May 25, 2018, 9:08pm

@gbutler Storing signed types sign-extended was my original hunch, but I quickly discarded it when I realized that the sign extension scatters the contiguous range of invalid values (it needs to be contiguous for layout optimizations) all over the place. But on second thought, there is a reasonably large contiguous niche, though it is smaller than for unsigned types and still seems pretty messy.

For example, for i6 stored in a byte, there is the niche 0b00100000..=0b11011111. None of the bit patterns in that range are valid sign extensions, since their top three bits are not all equal. However, I don’t see the general pattern yet (for how to compute this niche for a given N), and I can imagine much more fun things than working it out.

clarfonthey · May 25, 2018, 9:56pm

FWIW I added a sort-of attempt to explain the niche optimizations, although I’m not 100% sure how these work so it probably needs work. I’m a bit confused why the niche space needs to be contiguous, though; would you be able to explain @hanna-kruppe?

hanna-kruppe · May 25, 2018, 10:34pm

I believe it's mostly to make it tractable to compute optimized layouts. However, I'm not an expert on this, I'm just going by the fact that all the layout code currently uses a single niche and some vague memories of IRC discussions with @eddyb.

eddyb · May 25, 2018, 10:54pm

The restriction to one range is both an optimization of the niche/valid-range tracking logic and an optimization of enum encoding (because disjoint values would require more checks to decode the discriminant).
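To illustrate why a single contiguous niche keeps decoding cheap, here is a minimal sketch. The function `decode_niche` and its parameters are hypothetical names for this example and are not rustc's actual layout code. With one range, classifying a raw byte takes a single subtract-and-compare; several disjoint ranges would each need their own check.

```rust
use std::mem::size_of;
use std::num::NonZeroU8;

/// Hypothetical decoder: given a raw byte and one contiguous niche
/// `niche_start..=niche_end`, return the index of the dataless variant
/// stored in the niche, or `None` if the byte is an ordinary payload value.
fn decode_niche(raw: u8, niche_start: u8, niche_end: u8) -> Option<u8> {
    // One wrapping subtraction plus one comparison covers the whole range.
    let offset = raw.wrapping_sub(niche_start);
    if offset <= niche_end.wrapping_sub(niche_start) {
        Some(offset)
    } else {
        None
    }
}

/// The standard library documents that `Option<NonZeroU8>` has the same
/// size as `u8`: the invalid value 0 encodes `None`.
fn option_nonzero_is_one_byte() -> bool {
    size_of::<Option<NonZeroU8>>() == size_of::<u8>()
}
```

Applied to the i6 niche quoted earlier in the thread, `decode_niche(raw, 0b00100000, 0b11011111)` returns `None` exactly for the 64 valid sign-extended patterns.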

scottmcm · May 26, 2018, 1:03am

Isn't it just sext(iN::MAX)+1 ..= sext(iN::MIN)-1?

bits	value
00000000	0
00000001	1 (`i2::MAX`)
00000010	niche
00000011	niche
⁞	niche
11111100	niche
11111101	niche
11111110	-2 (`i2::MIN`)
11111111	-1
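
The formula above can be checked with a short sketch. It assumes the value is stored sign-extended in a single byte; `sign_extended_niche` is a hypothetical helper name, not anything proposed in the pre-RFC.

```rust
/// Niche for an N-bit signed integer stored sign-extended in one byte,
/// computed as sext(iN::MAX) + 1 ..= sext(iN::MIN) - 1.
/// Returns `None` when there is no niche (N == 8) or N is outside 1..=8.
fn sign_extended_niche(n: u32) -> Option<(u8, u8)> {
    if n == 0 || n >= 8 {
        return None;
    }
    // Bit pattern of the largest value: the low N-1 bits set.
    let max = (1u8 << (n - 1)) - 1;
    // Bit pattern of the smallest value: the sign bit and everything above it set.
    let min = 0xFFu8 << (n - 1);
    Some((max + 1, min - 1))
}
```

For N = 2 this gives 0b00000010..=0b11111101, matching the table, and for N = 6 it gives 0b00100000..=0b11011111, matching the i6 example earlier in the thread.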

Centril · May 26, 2018, 1:31am

Fair enough, I was speculating wildly

Soni · May 26, 2018, 1:21pm

I like the idea of uint<N> and int<N> IFF they are compatible with ASN.1 UPER when using something like #[repr(ordered,packed)]. They wouldn’t have a strictly “C” repr, so we’d need something else to specify that the layout is ordered; we could just overload repr(C), but that would be confusing.

Mostly because that would make it trivial to compile ASN.1 into Rust. And I seem to work with those a lot.

comex · May 26, 2018, 8:04pm

It looks like once const generics ships, most of this proposal could be implemented as a library, possibly in slightly modified form. That would make it easier to experiment with different designs and decide which one should be the basis for builtin generic integers – which I do agree should be the ultimate goal. Though even then, parts could be implemented as a library feature in libcore rather than depending on compiler magic; for example, libcore could provide impls of Add, Mul, etc. for large integer types, without having to worry about language items.
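
As a rough idea of what such a library experiment might look like, here is a minimal sketch using const generics. The type name `UInt`, its `u64` backing store, the 64-bit upper limit, and the choice of wrapping addition are all assumptions made for this example; the pre-RFC does not specify any of them.

```rust
use std::ops::Add;

/// Hypothetical library-level N-bit unsigned integer, backed by a `u64`.
/// This sketch only supports N in 0..=64.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct UInt<const N: u32>(u64);

impl<const N: u32> UInt<N> {
    /// Mask selecting the low N bits.
    const MASK: u64 = if N >= 64 { u64::MAX } else { (1u64 << N) - 1 };

    /// Builds a value, discarding any bits above bit N-1.
    fn new(v: u64) -> Self {
        UInt(v & Self::MASK)
    }

    fn get(self) -> u64 {
        self.0
    }
}

/// An ordinary trait impl, no language items required.
/// Addition wraps modulo 2^N in this sketch.
impl<const N: u32> Add for UInt<N> {
    type Output = Self;

    fn add(self, rhs: Self) -> Self {
        UInt(self.0.wrapping_add(rhs.0) & Self::MASK)
    }
}
```

With this definition, `UInt::<7>::new(100) + UInt::<7>::new(100)` holds 72, that is, 200 modulo 128.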

Edit: Also, I don’t think it would be a good idea for usize to be an alias for u32 or u64, even if that could somehow be done backwards compatibly. If it were an alias, it would be easy to accidentally write code that only compiles on 64-bit architectures (or whatever kind of architecture you’re developing on), because you passed a u64 variable to something that takes usize or vice versa. In other languages such as C, this is less of an issue because conversions between different integer types happen implicitly, truncating the value if necessary – but from Rust’s perspective that’s considered an anti-feature.
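
A small sketch of the point about explicit conversions, using only standard-library calls; the function names `element_at` and `element_at_u64` are made up for the example. Because usize is a distinct type, a u64 has to be converted explicitly on every target, including 64-bit ones, so code cannot silently depend on the two having the same width.

```rust
use std::convert::TryFrom;

/// Takes a pointer-sized index, as slice indexing does.
fn element_at(data: &[u8], index: usize) -> Option<u8> {
    data.get(index).copied()
}

/// Takes a fixed-width index. Calling `element_at(data, index)` directly
/// is a type error on every target, so the conversion (and its possible
/// failure on narrower targets) has to be written out.
fn element_at_u64(data: &[u8], index: u64) -> Option<u8> {
    let index = usize::try_from(index).ok()?;
    element_at(data, index)
}
```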

Tom-Phinney · May 26, 2018, 9:18pm

Currently usize = u32 or usize = u64, but that’s only because the i128 RFC did not land usize = u128 for RISC-V with the RV128 base, probably because (I suspect) LLVM doesn’t yet support usize = u128. If uint<N> lands, then Rust is also likely to have usize = u24, probably as a downcast from u32, for the earlier-generation small µCs and SoCs that have only 24-bit addressing.

Postscript: I should add that some parts of Rustc already support usize = u16, though that support is not pervasive. I’ve assumed that usize = u16 was included to extend Rust’s potential domain to 8b and some 16b µCs, which tend to be very inexpensive chips suitable for very high volume applications (e.g., a car door or window controller or a simple countertop kitchen appliance without WiFi).

Centril · May 27, 2018, 10:47am

Continuing on the discussion about monomorphization errors, I think that the RFC should have a plan for > 128 such that before the feature is stabilized, arbitrary size const N: usize are supported. The monomorphization errors noted by @hanna-kruppe for [T; 1 + (!0usize) / (1024 * 128)]; are ridiculously large and just won't happen in any form of real code. Meanwhile uint<512> is not unlikely in some code.

I also now noticed:

Additionally, int<0> is not allowed as 2^(0-1) = ½ would no longer be an integer. uint<0> is also not allowed for consistency, although it would be well-defined, as 2^0 - 1 = 0. In the future, uint<0> may be defined as a zero-sized zero type, but this RFC opts to avoid that for now.

So.. I'm thinking that int<0> can be well defined by saying that since -½ and ½ are not integers, it is uninhabited, and thus operationally equivalent to ! (never_type).

scottmcm · May 27, 2018, 7:45pm

Since rust has ZSTs, I would be very surprised if uint<0> wasn't just the obvious 0-bit type.

While I agree that int<0> is weird, since it doesn't have the sign bit its name implies it must, there are other properties that can give it a meaning, like the fact that transmute<uint<N>, int<N>>(0) is still zero for all N, so int<0> is also a ZST representing 0. Similarly, int<N>::default() is 0 for all N, so should be for N == 0 too.
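
The two readings of a 0-bit integer discussed here can be contrasted in today's Rust. This is only an illustration: `Zero0` and `Never0` are hypothetical stand-ins, not types from the pre-RFC.

```rust
use std::mem::size_of;

/// Stand-in for the ZST reading: no bits, exactly one value (zero).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
struct Zero0;

impl Zero0 {
    /// The only value this type can represent.
    fn value(self) -> i32 {
        0
    }
}

/// Stand-in for the uninhabited reading: no values at all, like `!`.
/// It is never constructed; it exists only for the size comparison.
#[allow(dead_code)]
enum Never0 {}

/// Both stand-ins occupy zero bytes; they differ in how many values exist.
fn sizes() -> (usize, usize) {
    (size_of::<Zero0>(), size_of::<Never0>())
}
```

Only the ZST reading lets `Default::default()` return a value, which is the property the argument above relies on.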

Centril · May 27, 2018, 11:49pm

Right; I considered ZSTs as well, but I didn't find a rationale for those. Yours is a good one

clarfonthey · May 28, 2018, 12:07am

I agree with your reasoning, @scottmcm. Technically, int<N> doesn’t break the invariant that it lies between the range [-2^(n-1), 2^(n-1)-1], as zero is the only integer in the range [-½, ½]. I didn’t actually think of it this way until just now.

clarfonthey · May 28, 2018, 2:41am

Continuing on the discussion about monomorphization errors, I think that the RFC should have a plan for > 128 such that before the feature is stabilized, arbitrary size const N: usize are supported.

Would you mind elaborating what you think this plan should have? I started writing something but mostly all that I could think of were implementation details, and guarantees about size_of::<uint<N>>().

Centril · May 28, 2018, 3:02am

I'm not so much interested in the technical details right now (and I'm hardly an expert in this field...) What I'm looking for is a commitment to support N > 128, that's all

PS: If you can think of implementation details, those are nice to jot down.

scottmcm · May 28, 2018, 4:17am

Excellent point! I'd written it out with ⌈⌉, but that looked terrible, so deleted it.

nit: You mean [−2^(n-1), 2^(n-1)), since the formula you have actually generates [-½,-½].

soapbox: Once again, half-open ranges are superior to trying to futz about with ±1
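
The difference between the two range formulas at n = 0 can be checked mechanically. In this sketch the bounds are doubled so that the halves stay in integer arithmetic; the helper names are invented for the example.

```rust
/// Closed form: -2^(n-1) <= x <= 2^(n-1) - 1, with both sides doubled.
fn in_closed(x: i64, n: u32) -> bool {
    let p = 1i64 << n; // 2^n
    -p <= 2 * x && 2 * x <= p - 2
}

/// Half-open form: -2^(n-1) <= x < 2^(n-1), with both sides doubled.
fn in_half_open(x: i64, n: u32) -> bool {
    let p = 1i64 << n; // 2^n
    -p <= 2 * x && 2 * x < p
}

/// Lists the integers accepted by `contains`.
/// The scan window -8..=8 is wide enough for n <= 4.
fn members(n: u32, contains: fn(i64, u32) -> bool) -> Vec<i64> {
    (-8..=8).filter(|&x| contains(x, n)).collect()
}
```

For n = 0 the closed form accepts no integer at all, while the half-open form accepts exactly 0; for n >= 1 the two agree.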

ibkevg · May 28, 2018, 6:13pm

An idea for related sugar is to also support supplying the range of values that is required from the integer instead of supplying the number of bits you require and leave it to the compiler to map this to an appropriate underlying N-bit representation. In some cases, this would allow design intent to be expressed more directly.

(This idea is inspired by Ada which supports fairly robust integer subrange functionality. For example, an integer with a range of 10…300 which the compiler then maps to a concrete representation, does runtime range checks on, etc., etc.)

Edit: I see that this suggestion was already made above in this link:

Although it does qualitatively feel different from the intentions of the pre-RFC, there is some overlap especially in terms of the idea of mapping the logical uint type to a concrete underlying hardware type.

H2CO3 · May 28, 2018, 6:40pm

This. It kind of seems wasteful to add a core language feature which would be redundant with another, more generic (no pun intended) feature. A library-defined Bits<const expression> would definitely be useful, and a nice example of a use case for const generics. AFAICT no use case mentioned so far would require that these types be primitives/builtins.

+1. Again, such a request seems to stem from the usual "but it's so easy in other languages!" fallacy. usize not being a mere relabelling of u32 or u64 or u(sizeof(pointer)) is not an accidental pain point or a design error; it's a deliberate decision which forces programmers to think about whether they need an exact- and constant-sized integer, or a pointer-sized, variable-width (across platforms) integer.

(Incidentally, the same argument seems to come up almost constantly and every single time the discussion is shifted to a feature in Rust that exposes a problem and forces its users to think about it, instead of silently doing the wrong thing. It's been cited in the case of error handling, from()/into() conversions, and who knows what other features. But it's still not a good idea, for the same reasons, to give up correctness for marginal convenience.)

comex · May 28, 2018, 8:37pm

One issue is that making Bits<32> be the same type as u32 wouldn't really work in a library. In theory you could do it by making Bits a type alias for an associated type projection – type Bits<const N: usize> = <N as GetBitsType>::Type – but then impls like impl<const N: usize> Foo for Bits<N> would be disallowed by coherence rules. More practically, it would have to be a separate type.

Yeah, I think I'd prefer true range types as well – though perhaps there should be both. In theory, there could be range and bit-width syntaxes for the same group of types: range<0, 127> could be the same type as uint<7>. However, offhand, I'd expect adding two range<0, 127> values to produce a range<0, 254>, while to be consistent with the existing builtin integer types, adding two uint<7>s would have to yield another uint<7>. An alternative would be to support both range<A, B> and uint<N> as entirely separate groups of types, but that seems like a confusing proliferation of integer types.
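
The contrast between the two addition behaviours can be sketched with const generics. `Ranged`, `U7`, and `add_0_127` are hypothetical names; a generic `Add` impl whose output bounds are computed from the inputs would need const expressions in types, which this sketch assumes are unavailable, so one concrete instance is spelled out instead.

```rust
/// Hypothetical range type: a value known to lie in LO..=HI.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Ranged<const LO: i64, const HI: i64>(i64);

impl<const LO: i64, const HI: i64> Ranged<LO, HI> {
    /// Checked constructor; rejects values outside the range.
    fn new(v: i64) -> Option<Self> {
        if (LO..=HI).contains(&v) {
            Some(Ranged(v))
        } else {
            None
        }
    }
}

/// Range-style addition: the result type widens, so no value is lost.
fn add_0_127(a: Ranged<0, 127>, b: Ranged<0, 127>) -> Ranged<0, 254> {
    Ranged(a.0 + b.0)
}

/// Hypothetical 7-bit unsigned integer stored in the low bits of a `u8`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct U7(u8);

impl U7 {
    const MASK: u8 = 0x7F;

    fn new(v: u8) -> Self {
        U7(v & Self::MASK)
    }

    /// Width-style addition: the result stays 7 bits and wraps modulo 128,
    /// mirroring `u8::wrapping_add`.
    fn wrapping_add(self, rhs: U7) -> U7 {
        U7(self.0.wrapping_add(rhs.0) & Self::MASK)
    }
}
```

Adding 100 and 100 yields 200 as a `Ranged<0, 254>`, but 72 as a `U7`.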

H2CO3 · May 29, 2018, 5:29am

Sure, although I'm pretty sure that's a feature, not a bug.
