Pre-RFC: Arbitrary bit-width integers

Ah, so the heap allocation would only occur in the compiler?

  1. I would like to know why this feature has to be in the core language. As far as I can see, there's nothing stopping it from being in a crate, and it's exotic enough that I'm not sure it belongs to the core.
  2. I don't like the idea of all the new types having alignment 1. What if, down the road, hardware supports u256 directly but it needs alignment > 1?
  3. Please don't entangle this feature with bitfields. Bitfields are under-specified in every language I've ever seen — not even Ada with its representation clauses covers all of the corner cases — it would be better to say that, for the time being, arbitrary bit-width integers cannot be used in anything with a #[repr(...)] annotation, thus dodging the entire issue until it can be studied and specified properly.

I think points 1 and 2 have already been addressed. I disagree with point 3: I am not entangling this with bitfields, I'm saying it's a good enough replacement. (Though I'm also not really sure what "corner cases" you're talking about, since the only major one I know of is that C doesn't specify them at all, while Ada and Zig do.) What I was getting at with the bitfields point is that you could directly read/write byte representations of these structures with Uint<N> and Int<N> and they would come out correct. It can't be in a crate because of the LLVM support, which needs the compiler to use, if I understand it right.

repr(C) is intended to mean "the same layout C would use". If we can't guarantee that, I don't think we should support bitfields in repr(C).

(That doesn't mean we can't support bitfields in repr(Rust) or similar.)

It might be possible to implement in a crate with enough hackery, but using the built in LLVM arbitrary-width integers would probably result in much more optimized code.

Point 1 needs to be addressed explicitly in the text of the RFC, not just in offhand remarks halfway down the thread.

I haven't seen point 2 addressed at all so far, can you point me at what you're thinking of?

Re bitfields, the problem is that this statement...

is fractally wrong. Furthermore, if you allow these arbitrary-bitwidth integers to be used as struct fields in any context where the resulting layout might be externally visible, without adding a whole bunch of other layout control features to the language at the same time, you're going to create unfixable bugs, and 20 years from now I'll be stuck telling people not to use Rust's bitfields, just like I have to tell people not to use C bitfields today.

How do these integers behave on overflow/underflow? Would the integer value just be undefined, since we can't specify 2's complement without additional logic?
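For reference, the "additional logic" in question can be quite small. Here is a minimal sketch that emulates wrapping addition for a 7-bit unsigned value stored in a u8. The name `wrapping_add_u7` is made up, and wrapping is only one possible answer to the question above, not something the proposal specifies.

```rust
/// Wrapping addition for a hypothetical 7-bit unsigned integer stored in a `u8`.
/// This only emulates one possible overflow behaviour (wrapping modulo 2^7).
fn wrapping_add_u7(a: u8, b: u8) -> u8 {
    const MASK: u8 = 0x7F;
    debug_assert!(a <= MASK && b <= MASK);
    // The sum of two 7-bit values is at most 254, so `+` cannot overflow the `u8`.
    (a + b) & MASK
}

fn main() {
    // 127 is the largest 7-bit value, so adding 1 wraps around.
    println!("{}", wrapping_add_u7(127, 1));
    println!("{}", wrapping_add_u7(100, 100));
}
```

The mask is the only extra step compared to a native width.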

While you can view integers this way, if your definition of iX breaks for i0 and mine does not, then why choose yours? Your variant naturally defines how you compute the value N of an iX cast to iY (just take c_0 to c_{Y-1} as the bits of the result in the formula below), but as you said it does not work for X = 0, and getting the actual number is also complicated, as far as I understand your definition:

$$ N = \lim_{n \to \infty} \sum_{i = 0}^{n-1} c_i \left(-1\right)^{\left(n-i\right)!} 2^i \quad \text{where} \quad c_i = \begin{cases} b_i, & \text{if } i < X \\ b_{X-1}, & \text{if } i \geq X \end{cases} $$

(X is number of bits, b_i is value of i’th bit). With mine you get number by computing just

$$ N = \sum_{i = 0}^{X-1} b_i \left(-1\right)^{\left(X-i\right)!} 2^i $$

and you now only need to define how sign extension works separately from defining what iX means. (Using factorials as nice functions which give you an odd number for 0 and 1 and an even number for any integer bigger than 1, removing the need for special-casing bit X - 1.)
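As a sanity check on the finite formula, here is a minimal sketch that evaluates the sum directly. It relies only on the fact that k! is odd exactly when k <= 1, so the sign is negative for bit X - 1 alone. The function name `value_from_bits` and the least-significant-bit-first ordering are assumptions made for this example.

```rust
/// Evaluates N = sum_{i=0}^{X-1} b_i * (-1)^((X-i)!) * 2^i for bits given
/// least-significant first. `value_from_bits` is a hypothetical helper,
/// not part of any proposed API.
fn value_from_bits(bits: &[bool]) -> i128 {
    let x = bits.len();
    assert!(x <= 64, "this sketch only handles widths up to 64");
    let mut n: i128 = 0;
    for (i, &b) in bits.iter().enumerate() {
        if b {
            // (X - i)! is odd only when X - i <= 1, i.e. for the top bit i = X - 1.
            let sign: i128 = if x - i <= 1 { -1 } else { 1 };
            n += sign * (1i128 << i);
        }
    }
    n
}

fn main() {
    // The 8-bit pattern with only the top bit set.
    let mut bits = [false; 8];
    bits[7] = true;
    println!("{}", value_from_bits(&bits));
    // X = 0: the sum is empty.
    println!("{}", value_from_bits(&[]));
}
```

With X = 0 the loop body never runs, so the result is 0 without any special case.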

Unfortunately, as much as I love indulging in quibbles about what's the most elegant math, I don't think I should spend more of this thread's time on it when I'm not sure if anyone is seriously proposing we implement an i0 in the first place :sweat_smile:

I like the idea of arbitrary-width integer types, but they can lead to unintended consequences, some of which were covered in this post the last time we talked about (quasi) arbitrary width integers. Would this proposal include some method of resolving those issues as well?

If uN covers the range [0, 2^N) and iN covers the range [−2^(N−1), 2^(N−1)), then u0 should cover [0, 1), while i0 should cover [−½, ½). The only integer in either range is zero. This removes the ambiguity. (Shamelessly stolen from scottmcm’s post.)

Alternatively, if we require that <T as Default>::default().into<U>() == <U as Default>::default(), that iN : Default and that iN : Into<iM> for N ≤ M, then we also must have <i0 as Default>::default() be zero.
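The law above can at least be checked against the integer types that exist today. This is a small sketch, with `default_commutes_with_into` being a name invented for the example. It says nothing about i0 itself, only that the existing widening conversions already satisfy the rule.

```rust
/// Checks `T::default().into() == U::default()` for one pair of types.
/// `default_commutes_with_into` is a hypothetical helper for illustration.
fn default_commutes_with_into<T, U>() -> bool
where
    T: Default + Into<U>,
    U: Default + PartialEq,
{
    let widened: U = T::default().into();
    widened == U::default()
}

fn main() {
    println!("{}", default_commutes_with_into::<i8, i16>());
    println!("{}", default_commutes_with_into::<i16, i64>());
    println!("{}", default_commutes_with_into::<u8, u32>());
}
```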

Back to the issue at hand: I would favour a solution where irregular-bit-width integers behave like ordinary types with respect to ABI, while bitpacking is solved by an orthogonal feature.


This could be solved by also adding a lang-itemed Int<N> type in the standard library that can be used to add impls, or by introducing a new syntax like i{N} or a builtin macro primitive_int!(N) for a const generic variable N.

Well, it certainly could, if the alignment is defined in a compatible manner and i32 is just seen as a built-in type alias for Int<32>.

Does it? First, the const generic parameter might be a u8 rather than a usize. Second, const generic constraints are likely something that should be added eventually. Third, the compiler might formally allow impls for such types but reject creating actual instances of them. (This, however, might conflict with const generic expressions.)

These types could just live in the core library using lang-items just like other special types like PhantomData.

Keep in mind that Zig does not support generic types or operator overloading, so their design is influenced by this. Rust does not define complex number types like C does because you can just define them with a generic type in a library and overload operators for them.

I think there should definitely be a way for end users to define their own impls for these types. So we either introduce a generic capture as mentioned above or

My personal feeling is that generic integers should not be in the language but in the standard library, as it is a nice type that benefits from const generics. My examples are PhantomData or, to a lesser degree, the nonzero types: types that also live in core but have a lot of compiler support. The compiler may either ensure that the built-in integer types match Int<8>, Int<16>, etc. in the same way as type aliases, or treat these types as second-class citizens, different from the built-in integer types.


For what it's worth, in nightly Rust it is already quite possible to express constraints on const generic parameters by lifting them to the type level:


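// Nightly only: const expressions in where clauses need this unstable feature.
#![feature(generic_const_exprs)]
#![allow(incomplete_features)]

struct Assert<const B: bool>;

trait True {}

impl True for Assert<true> {}

// Int<W> is only well-formed for 1 <= W <= 24.
struct Int<const W: u8>
where
    Assert<{1 <= W}>: True,
    Assert<{W <= 24}>: True;

fn main() {
    let a = Int::<17>{}; // Ok
    let b = Int::<0>{};  // Fails to compile
    let c = Int::<25>{}; // Fails to compile
}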

I assume pattern matching with these arbitrary bit-width integers would be exhaustive? I could cut quite a few unreachable!()s from my match expressions if so...
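For context, this is the kind of match I mean, written against today's stable Rust. The 2-bit field and the mode names are made up for illustration. Whether a Uint<2> would actually let the compiler drop the last arm is exactly the open question.

```rust
/// Decodes a 2-bit field stored in the low bits of a `u8`.
/// The field layout and names are hypothetical.
fn describe_mode(raw: u8) -> &'static str {
    match raw & 0b11 {
        0 => "idle",
        1 => "read",
        2 => "write",
        3 => "erase",
        // The mask guarantees 0..=3, but the compiler only sees a `u8`,
        // so this arm is required today.
        _ => unreachable!(),
    }
}

fn main() {
    println!("{}", describe_mode(0b0000_0010));
}
```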

I like the idea of having separate types, but in terms of API design, when I want a function to accept, for example, an 8-bit unsigned integer, should I use Uint<8> or u8? What if that same function also requires a 7-bit unsigned integer -- should I mix u8 and Uint<7> in that case? (These are open questions not directed at @kpreid :slight_smile:).

What's a use case for i0 or u0? We certainly can support them, although I'm not sure the added mental complexity is worth the effort. They would be isomorphic to (), which AFAIK is already fairly useless outside return types and generic parameters.

I know I've brought this up before, but are we certain that arbitrary bit-width integers is preferred over ranged integers? I feel like the latter would be far more useful and would subsume the entire use case of the former.


struct Foo; is also isomorphic to (), but is useful enough that it even has its own dedicated syntax.

I think it's useful for the same kinds of generic reasons that () is useful, and for the same kinds of reasons that [_; 0] exists as a type.

For example, u0 is the perfect always-in-bounds type for indexing a [T; 1], the same way that u8 is the perfect always-in-bounds type for indexing a [T; 256].
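The u8 half of that claim can be shown with existing types. This is a small sketch where `lookup` and `squares_table` are names invented for the example. Every u8 value is a valid index into a 256-element array, so the indexing can never go out of bounds.

```rust
/// Builds a table where entry `i` holds `i * i` (255 * 255 still fits in a `u16`).
fn squares_table() -> [u16; 256] {
    let mut table = [0u16; 256];
    for (i, slot) in table.iter_mut().enumerate() {
        *slot = (i * i) as u16;
    }
    table
}

/// Looks up an entry by a `u8` index. Every `u8` value is in bounds
/// for a 256-element array, so this never panics.
fn lookup(table: &[u16; 256], i: u8) -> u16 {
    table[usize::from(i)]
}

fn main() {
    let table = squares_table();
    println!("{}", lookup(&table, 12));
    println!("{}", lookup(&table, u8::MAX));
}
```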


I'd much rather have Integer<0..10> for indexing a [T; 10], for example.

Though there might be value in both -- I'd like Integer<A..B> + Integer<C..D> -> Integer<A+C..B+D>, but that means that Integer<0..256> would need to be a different type from u8.


-> Integer<A+C..B+D-1> (and assuming B>A, D>C).

I think there are overlapping yet distinct reasons to have all of primitives, arbitrary POD bit-widths, and ranged versions. Which is to say, I feel the types should be distinct from one another.

I don't see an issue having Integer<0, 255> (I prefer an inclusive end) being different than u8. A u8 could actually be defined in terms of the former.

The deranged crate is my general thought for an API based on this. There are additional things that would be nice, such as arithmetic in the manner mentioned (it's just not stable yet).

Yeah, that's fair -- and the only option right now for deranged since it wants to support the full range of the underlying type.

I always default to half-open, but since this is actually about interval arithmetic, fully closed is probably easier to represent. Integer<A..B> * Integer<C..D> -> Integer<A*C..((B-1)*(D-1)+1)> (for non-negative bounds) would be no fun.

EDIT: Oops, +1. Thanks quinedot

Intuition-wise I'd expect adding 1 to an Integer<0, 255> gives me an Integer<1, 256>. I don't quite expect that of u<8>, but perhaps that's just conditioning.
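To make the bound arithmetic concrete, here is a rough sketch using inclusive bounds computed at runtime. `Bounds`, `add_bounds` and `mul_bounds` are hypothetical names. A real ranged integer type would do this at the type level, which stable const generics cannot express yet.

```rust
/// Inclusive bounds `(lo, hi)` of a hypothetical ranged integer type.
type Bounds = (i64, i64);

/// Bounds of `a + b` when `a` lies in `x` and `b` lies in `y`.
fn add_bounds(x: Bounds, y: Bounds) -> Bounds {
    (x.0 + y.0, x.1 + y.1)
}

/// Bounds of `a * b`. Bounds may be negative, so all four corner
/// products have to be considered.
fn mul_bounds(x: Bounds, y: Bounds) -> Bounds {
    let corners = [x.0 * y.0, x.0 * y.1, x.1 * y.0, x.1 * y.1];
    let lo = *corners.iter().min().unwrap();
    let hi = *corners.iter().max().unwrap();
    (lo, hi)
}

fn main() {
    // Adding the constant 1 to a value in 0..=255.
    println!("{:?}", add_bounds((0, 255), (1, 1)));
    // Multiplying two values in 0..=9.
    println!("{:?}", mul_bounds((0, 9), (0, 9)));
}
```

With inclusive bounds, the sum and the non-negative product need no off-by-one corrections, which is the appeal of the fully closed representation.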
