Pre-RFC: Arbitrary bit-width integers

Point 1 needs to be addressed explicitly in the text of the RFC, not just in offhand remarks halfway down the thread.

I haven't seen point 2 addressed at all so far, can you point me at what you're thinking of?

Re bitfields, the problem is that this statement...

is fractally wrong. Furthermore, if you allow these arbitrary-bit-width integers to be used as struct fields in any context where the resulting layout might be externally visible, without adding a whole bunch of other layout-control features to the language at the same time, you're going to create unfixable bugs. Twenty years from now I'll be stuck telling people not to use Rust's bitfields, just like I have to tell people not to use C bitfields today.


How do these integers behave on overflow/underflow? Would the integer value just be undefined, since we can't specify 2's complement without additional logic?
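One plausible answer is the same two's-complement wrap-around the existing widths use. A minimal sketch of that behavior, simulating a hypothetical `u3`/`i3` by masking a wider integer (the function names are mine, not from any proposal):

```rust
// Simulate wrapping overflow for a hypothetical `u3` by masking a u8.
// This assumes the RFC would pick two's-complement wrap-around,
// matching what `wrapping_add` does for the existing widths.
fn u3_wrapping_add(a: u8, b: u8) -> u8 {
    a.wrapping_add(b) & 0b111 // keep only the low 3 bits
}

fn i3_from_bits(bits: u8) -> i8 {
    // Sign-extend the low 3 bits of a u8 into an i8.
    ((bits << 5) as i8) >> 5
}

fn main() {
    assert_eq!(u3_wrapping_add(6, 5), 3); // 11 mod 8 = 3
    assert_eq!(i3_from_bits(0b111), -1);  // all-ones is -1 in two's complement
    assert_eq!(i3_from_bits(0b100), -4);  // sign bit set: -4
}
```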


While you can view integers this way, if your definition of iX breaks for i0 and mine does not, then why choose yours? Your variant naturally defines how you compute an iX value N as iY (just take c_0 to c_{Y-1} as bits of the result in the formula below), but as you said it does not work for X = 0, and getting the actual number is complicated, as far as I understand your definition:

$$ N = \lim_{n \to \infty} \sum_{i = 0}^{n-1} c_i \left(-1\right)^{\left(n-i\right)!} 2^i\ \mathrm{where}\ c_i = \begin{cases}b_i,&\text{if $i < X$}\\b_{X-1},&\text{if $i \geq X$}\end{cases} $$

(X is the number of bits, b_i is the value of the i-th bit). With mine you get the number by computing just

$$ N = \sum_{i = 0}^{X-1} b_i \left(-1\right)^{\left(X-i\right)!} 2^i $$

and you now only need to define how sign extension works separately from defining what iX means. (Using factorials as nice functions which give you an odd number for 0 and 1 and an even number for any integer bigger than 1, removing the need for special-casing bit X - 1.)
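The closed formula above is easy to check mechanically. A small sketch, with bits given LSB-first (function and variable names are mine): the factorial exponent makes bit X - 1 contribute -2^(X-1) while every other bit contributes +2^i.

```rust
// Evaluate an X-bit two's-complement number from its bits using the
// closed formula above: (-1)^((X-i)!) is -1 only when X - i = 1,
// i.e. for the sign bit i = X - 1; all other bits get +1.
// `bits[i]` is b_i, least-significant bit first.
fn value(bits: &[bool]) -> i64 {
    let x = bits.len();
    bits.iter()
        .enumerate()
        .map(|(i, &b)| {
            let sign = if i == x - 1 { -1 } else { 1 };
            (b as i64) * sign * (1i64 << i)
        })
        .sum()
}

fn main() {
    assert_eq!(value(&[true, true, true, true]), -1);    // 0b1111 as i4
    assert_eq!(value(&[false, false, false, true]), -8); // 0b1000 as i4
    assert_eq!(value(&[true, true, true, false]), 7);    // 0b0111 as i4
}
```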

Unfortunately, as much as I love indulging in quibbles about what's the most elegant math, I don't think I should spend more of this thread's time on it when I'm not sure if anyone is seriously proposing we implement an i0 in the first place :sweat_smile:

I like the idea of arbitrary-width integer types, but they can lead to unintended consequences, some of which were covered in this post the last time we talked about (quasi) arbitrary width integers. Would this proposal include some method of resolving those issues as well?

If uN covers the range [0, 2^N) and iN covers the range [−2^(N−1), 2^(N−1)), then u0 should cover [0, 1), while i0 should cover [−½, ½). The only integer in either range is zero. This removes the ambiguity. (Shamelessly stolen from scottmcm’s post.)

Alternatively, if we require that

  • Into::<U>::into(<T as Default>::default()) == <U as Default>::default(),
  • iN : Default, and
  • iN : Into<iM> for N ≤ M,

then we also must have <i0 as Default>::default() be zero.
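The consistency requirement above already holds between today's widths, which is easy to check on stable Rust:

```rust
// Widening a default yields the wider type's default, for the
// existing integer widths. By the same argument, an `i0` default
// widened into i8 would have to equal 0i8, so i0's sole value
// must be zero.
fn main() {
    assert_eq!(u16::from(u8::default()), u16::default());
    assert_eq!(i64::from(i32::default()), i64::default());
}
```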


Back to the issue at hand: I would favour a solution where irregular-bit-width integers behave like ordinary types with respect to ABI, while bitpacking is solved by an orthogonal feature.


This could be solved by also adding a lang-itemed Int<N> type to the standard library that can be used to add impls, or by introducing new syntax like i{N} or a builtin macro primitive_int!(N) for a const generic variable N.

Well, it certainly could, if the alignment is defined in a compatible manner and i32 is just seen as a built-in type alias for Int<32>.

Does it? First, the const generic parameter might also be a u8 rather than a usize. Second, const generic constraints are likely something that should be added eventually. Third, the compiler might formally allow impls for such types but reject creating actual instances of them. (This, however, might conflict with const generic expressions.)

These types could just live in the core library using lang-items just like other special types like PhantomData.

Keep in mind that Zig does not support generic types or operator overloading, so their design is influenced by this. Rust does not define complex number types like C does, because you can just define them with a generic type in a library and overload operators for them.

I think there should definitely be a way for end users to define their own impls for these types. So we either introduce a generic capture as mentioned above or

My personal feeling is that generic integers are something that should live not in the language but in the standard library, since they are exactly the kind of type that benefits from const generics. My examples are PhantomData and, to a lesser degree, the NonZero types: types that also live in core but have substantial compiler support. The compiler may either ensure that the built-in integer types match Int<8>, Int<16>, etc., in the same way as type aliases, or treat these types as second-class citizens, different from the built-in integer types.
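For the library-type direction, here is a stable-Rust sketch of what the user-visible shape might look like: a masking newtype over a wider primitive. The name `UInt` and the API are my guesses; a real lang-item-backed type would additionally get layout and niche support from the compiler.

```rust
/// Sketch of a library-level N-bit unsigned integer stored in a u32.
/// Only the user-visible API shape; no compiler magic involved.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct UInt<const BITS: u32>(u32);

impl<const BITS: u32> UInt<BITS> {
    const MASK: u32 = if BITS == 32 { u32::MAX } else { (1u32 << BITS) - 1 };

    /// Returns None if `v` doesn't fit in BITS bits.
    fn new(v: u32) -> Option<Self> {
        (v & !Self::MASK == 0).then_some(Self(v))
    }

    /// Two's-complement wrap-around within BITS bits.
    fn wrapping_add(self, rhs: Self) -> Self {
        Self(self.0.wrapping_add(rhs.0) & Self::MASK)
    }
}

fn main() {
    let a = UInt::<7>::new(100).unwrap();
    let b = UInt::<7>::new(50).unwrap();
    assert_eq!(a.wrapping_add(b), UInt::<7>::new(22).unwrap()); // 150 mod 128
    assert!(UInt::<7>::new(200).is_none()); // doesn't fit in 7 bits
}
```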


For what it's worth, in nightly Rust it is already quite possible to express constraints on const generic parameters by lifting them to the type level:

#![feature(generic_const_exprs)]

struct Assert<const B: bool>;

trait True {}

impl True for Assert<true> {}

struct Int<const W: u8>
where
    Assert<{1 <= W}>: True,
    Assert<{W <= 24}>: True,
{}


fn main() {
    let a = Int::<17>{}; // Ok
    let b = Int::<0>{};  // Fails to compile
    let c = Int::<25>{}; // Fails to compile
}

I assume pattern matching with these arbitrary bit-width integers would be exhaustive? I could cut quite a few unreachable!()s from my match expressions if so...
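For illustration, here is the kind of catch-all arm that an exhaustively-matchable `u2` could eliminate (the example is mine; whether the RFC would make such matches exhaustive is an open question):

```rust
// Today, matching on a value known to fit in 2 bits still needs a
// catch-all arm, because the compiler can't see that x & 0b11 <= 3.
// A hypothetical `u2` could make this match exhaustive.
fn describe(x: u8) -> &'static str {
    match x & 0b11 {
        0 => "zero",
        1 => "one",
        2 => "two",
        3 => "three",
        _ => unreachable!(), // provably dead, but required
    }
}

fn main() {
    assert_eq!(describe(6), "two"); // 6 & 0b11 == 2
    assert_eq!(describe(3), "three");
}
```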

I like the idea of having separate types, but in terms of API design, when I want a function to accept, for example, an 8-bit unsigned integer, should I use Uint<8> or u8? What if that same function also requires a 7-bit unsigned integer -- should I mix u8 and Uint<7> in that case? (These are open questions not directed at @kpreid :slight_smile:).

What's a use case for i0 or u0? We certainly can support them, although I'm not sure the added mental complexity is worth the effort. They would be isomorphic to (), which AFAIK is already fairly useless outside return types and generic parameters.

I know I've brought this up before, but are we certain that arbitrary bit-width integers is preferred over ranged integers? I feel like the latter would be far more useful and would subsume the entire use case of the former.


struct Foo; is also isomorphic to (), but is useful enough that it even has its own dedicated syntax.

I think it's useful for the same kinds of generic reasons that () is useful, and for the same kinds of reasons that [_; 0] exists as a type.

For example, u0 is the perfect always-in-bounds type for indexing a [T; 0], the same way that u8 is the perfect always-in-bounds type for indexing a [T; 256].

+1

I'd much rather have Integer<0..10> for indexing a [T; 10], for example.

Though there might be value in both -- I'd like Integer<A..B> + Integer<C..D> -> Integer<A+C..B+D>, but that means that Integer<0..256> would need to be a different type from u8.


-> Integer<A+C..B+D-1> (and assuming B > A, D > C, i.e. the ranges are non-empty).
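The interval-addition rule above can be sketched with runtime bounds, since const-generic range parameters aren't expressible today (the struct and field names are mine):

```rust
// Half-open interval addition: values in [a.lo, a.hi) plus values in
// [b.lo, b.hi) land in [a.lo + b.lo, a.hi + b.hi - 1), because the
// largest sum is (a.hi - 1) + (b.hi - 1).
#[derive(Debug, PartialEq)]
struct Ranged {
    lo: i64,  // inclusive lower bound
    hi: i64,  // exclusive upper bound
    val: i64, // invariant: lo <= val < hi
}

impl Ranged {
    fn add(&self, rhs: &Ranged) -> Ranged {
        Ranged {
            lo: self.lo + rhs.lo,
            hi: self.hi + rhs.hi - 1,
            val: self.val + rhs.val,
        }
    }
}

fn main() {
    let a = Ranged { lo: 0, hi: 10, val: 9 };
    let b = Ranged { lo: 0, hi: 256, val: 255 };
    let c = a.add(&b);
    assert_eq!((c.lo, c.hi), (0, 265)); // max sum 9 + 255 = 264 < 265
    assert_eq!(c.val, 264);
}
```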

I think there are overlapping yet distinct reasons to have all of primitives, arbitrary POD bit-widths, and ranged versions. Which is to say, I feel the types should be distinct from one another.


I don't see an issue with Integer<0, 255> (I prefer an inclusive end) being different from u8. A u8 could actually be defined in terms of the former.

The deranged crate is my general thought for an API based on this. There are additional things that would be nice, such as arithmetic in the manner mentioned (it's just not stable yet).


Yeah, that's fair -- and the only option right now for deranged since it wants to support the full range of the underlying type.

I always default to half-open, but since this is actually about interval arithmetic, fully-closed is probably easier to represent. Integer<A..B> * Integer<C..D> -> Integer<A*C..(B*D-B-D+2)> would be no fun.

EDIT: Oops, +1. Thanks quinedot
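With closed intervals, multiplication really is simpler: the result bounds are just the min and max of the four corner products, which also handles negative bounds for free. A quick sketch (names mine):

```rust
// Closed-interval multiplication: [a0, a1] * [b0, b1] is bounded by
// the min and max of the four corner products. This is why inclusive
// bounds are the natural representation for interval arithmetic.
fn mul_bounds(a: (i64, i64), b: (i64, i64)) -> (i64, i64) {
    let corners = [a.0 * b.0, a.0 * b.1, a.1 * b.0, a.1 * b.1];
    (
        *corners.iter().min().unwrap(),
        *corners.iter().max().unwrap(),
    )
}

fn main() {
    assert_eq!(mul_bounds((2, 3), (4, 5)), (8, 15));
    assert_eq!(mul_bounds((-2, 3), (4, 5)), (-10, 15)); // negatives handled too
}
```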

Intuition-wise I'd expect adding 1 to an Integer<0, 255> gives me an Integer<1, 256>. I don't quite expect that of u<8>, but perhaps that's just conditioning.


For clarity, I meant having it as an opaque wrapper type, not a type alias. The semantics would not change from current.


Really overengineered idea:

type u8 = Wrapping<UInt<0..=255>>;

The knock-on effects of this don't sit well with me. What is the type of Wrapping<UInt<A..=B>> + Wrapping<UInt<C..=D>>? How does this mesh with the return type of UInt<A..=B> + UInt<C..=D>?

A type error, just like u8 + u32. I'm not saying it's necessarily a good idea (I have no concrete idea), just an interesting one.

The biggest issue is, well, u8 isn't actually Wrapping<UInt<0..=255>>, because that's Wrapping<u8>. u8 is instead WrappingOrPanicking<UInt<0..=255>>. And since Wrapping<u8> works, we'd need to support both Wrapping<UInt<0..=255>> and Wrapping<WrappingOrPanicking<UInt<0..=255>>>, or do some sort of magic to make Wrapping<u8> not do that nesting, even in the face of generics.

Interesting idea, but probably not a very viable one. Thus why I called it overengineered.

Or we just make UInt<A..=B> have the conditionally-wrapping-or-panicking modular arithmetic behavior, then add e.g. Growing<UInt<A..=B>> for the growing bounds behavior.

I still don't really like it, but that seems at least a possible solution.

I agree about this not being the greatest path, but thinking about this some more…is this something we'd be OK with?

let a: Wrapping<10..=20> = 15;
let b = a + 7;
assert_eq!(b, 11); // originally had 12, but `20` eats up a step, so it should be 11
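The arithmetic in that snippet can be checked on stable Rust today with a small helper (no such `Wrapping<10..=20>` type exists; the function below is just my simulation of range-wrapping):

```rust
// Wrapping within an inclusive range [lo, hi]: the range 10..=20 has
// 11 distinct values, so 15 + 7 = 22 wraps back around to 11.
fn wrapping_in_range(lo: i64, hi: i64, v: i64) -> i64 {
    let size = hi - lo + 1;
    (v - lo).rem_euclid(size) + lo
}

fn main() {
    assert_eq!(wrapping_in_range(10, 20, 15 + 7), 11);
    assert_eq!(wrapping_in_range(10, 20, 20), 20); // in-range values unchanged
}
```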

Maybe instead of a completely free choice of a range, instead, there should be Biased<Bias, Size> which is an integer which can represent Size distinct values which are offset from 0 by Bias. I would suppose that something like type UInt<A..=B> = Biased<A, Len<A..=B>::SIZE>; might be possible with enough const magic. It'd make 1-based indexing APIs (e.g., Lua or Julia bindings) easier to work with I suppose…
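The Biased<Bias, Size> idea can already be sketched with stable const generics; the API shape below is my guess at what such a type might look like:

```rust
// Sketch of `Biased<BIAS, SIZE>`: stores an offset in 0..SIZE and adds
// BIAS on the way out, representing SIZE distinct values starting at BIAS.
#[derive(Debug, PartialEq)]
struct Biased<const BIAS: i64, const SIZE: u64>(u64);

impl<const BIAS: i64, const SIZE: u64> Biased<BIAS, SIZE> {
    fn new(v: i64) -> Option<Self> {
        let off = v - BIAS;
        (0..SIZE as i64).contains(&off).then(|| Self(off as u64))
    }
    fn get(&self) -> i64 {
        self.0 as i64 + BIAS
    }
}

// A Lua-style 1-based index over 10 elements.
type LuaIndex = Biased<1, 10>;

fn main() {
    assert_eq!(LuaIndex::new(1).unwrap().get(), 1);
    assert_eq!(LuaIndex::new(10).unwrap().get(), 10);
    assert!(LuaIndex::new(0).is_none()); // below the bias
}
```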