Summary
Add arbitrary-width uXX and iXX primitive integer types.
Motivation
Some algorithms need to work with very large numbers, or with numbers whose bit width is not a power of two (2^n).
For example, the twelve_bit crate has u12, which according to its description is "useful for implementing Chip-8 assemblers and interpreters safely". The Ethereum Virtual Machine's scalar type is 256 bits, and ethnum was created to make it easier to work with such values.
ethnum has an optional feature that uses LLVM intrinsics to improve performance, but that feature requires clang to compile, and it needs LTO enabled for optimal performance because FFI calls are not inlined without LTO.
Supporting arbitrary bit-width integers makes the use cases above easier and removes the need to write software arithmetic implementations for integers of any bit width in the future, since LLVM already provides them. If a library author needs an implementation specialized for a particular domain (for example, embedded targets or scientific computing), they could use a wrapper type, reimplement only the functionality they need to change, and rely on LLVM's implementations for everything else.
Guide-level explanation
Arbitrary bit-width primitives can have any number of bits, given by the number in the type name. For example, i24 represents a signed integer with 24 bits (3 bytes) of storage.
Arbitrary bitwidth integer primitives are very similar to existing primitives.
Below is an example of declaring u24 literals:
// Works: 24 one bits, the largest u24 value.
let u24_max = 0b11111111_11111111_11111111u24;
// Error: this literal needs 25 bits, which do not fit in u24.
let u24_invalid = 0b1_11111111_11111111_11111111u24;
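As a rough illustration of why the second literal is rejected, the sketch below computes the largest value of uN for widths up to 128 bits using today's u128. The helper names `unsigned_max` and `fits_unsigned` are hypothetical; the sketch is not how rustc checks literals.

```rust
/// Hypothetical helper: the largest value of `uN`, computed as a u128,
/// so this sketch only covers 1 <= bits <= 128.
fn unsigned_max(bits: u32) -> u128 {
    assert!((1..=128).contains(&bits));
    // `1u128 << 128` would overflow, so the full width is special-cased.
    if bits == 128 { u128::MAX } else { (1u128 << bits) - 1 }
}

/// Hypothetical check: does a non-negative literal value fit in `uN`?
fn fits_unsigned(value: u128, bits: u32) -> bool {
    value <= unsigned_max(bits)
}

fn main() {
    let u24_max = 0b11111111_11111111_11111111u128;
    let too_wide = 0b1_11111111_11111111_11111111u128;
    assert_eq!(unsigned_max(24), u24_max);
    assert!(fits_unsigned(u24_max, 24));
    // 25 significant bits: rejected for u24, accepted for u25.
    assert!(!fits_unsigned(too_wide, 24));
    assert!(fits_unsigned(too_wide, 25));
}
```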
Arbitrary bit-width integers always have an alignment of 1. Outside of #[repr(bitpacked)] structures, their size is rounded up to a multiple of 8 bits.
Reference-level explanation
Valid range of bits
Arbitrary bit-width integers use at least 1 bit and can be as large as 8388608 (2^23) bits, as supported by LLVM.
A type whose number of bits is outside this range is not recognized as a valid type.
// invalid
let u0: u0 = 0;
// 2^23 + 1 = 8388609
// invalid
let u8388609: u8388609 = 0;
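A minimal sketch of the validity rule, assuming type names are a `u` or `i` prefix followed by a decimal width with no leading zero. The function `parse_int_type` is hypothetical and only mirrors the 1 to 2^23 range stated above; it is not part of rustc.

```rust
/// Largest accepted width, 2^23 bits, as stated above.
const MAX_BITS: u32 = 1 << 23;

/// Hypothetical helper: parse a type name such as "u24" or "i7" into
/// (is_signed, bits), rejecting widths outside 1..=MAX_BITS.
fn parse_int_type(name: &str) -> Option<(bool, u32)> {
    let signed = match *name.as_bytes().first()? {
        b'u' => false,
        b'i' => true,
        _ => return None,
    };
    // The prefix is a single ASCII byte, so slicing at 1 is safe.
    let digits = &name[1..];
    // Plain decimal digits without a leading zero; this also rules out u0.
    if digits.is_empty()
        || digits.starts_with('0')
        || !digits.bytes().all(|b| b.is_ascii_digit())
    {
        return None;
    }
    // Parsing fails (None) for numbers too large for u32.
    let bits: u32 = digits.parse().ok()?;
    if bits <= MAX_BITS { Some((signed, bits)) } else { None }
}

fn main() {
    assert_eq!(parse_int_type("u24"), Some((false, 24)));
    assert_eq!(parse_int_type("i8388608"), Some((true, 8388608)));
    assert_eq!(parse_int_type("u0"), None);
    assert_eq!(parse_int_type("u8388609"), None);
}
```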
Signed Integers
Signed integers use two's complement: a value is negative when its most significant bit is set. This means an i1 has two possible values, 0 and -1; an i2 has four: 0, 1, -2 and -1; and so on.
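To make the two's complement rule concrete, here is a small sketch that decodes the low N bits of a u64 as a signed N-bit value and enumerates every value of a small signed type. The helper names are hypothetical, and the sketch covers only widths up to 64 bits.

```rust
/// Hypothetical helper: interpret the low `bits` bits of `raw` as an
/// N-bit two's complement integer (1 <= bits <= 64).
fn decode_signed(raw: u64, bits: u32) -> i64 {
    assert!((1..=64).contains(&bits));
    // Move the N-bit value to the top of the word, then shift it back
    // down arithmetically so the most significant bit is sign-extended.
    let shift = 64 - bits;
    ((raw << shift) as i64) >> shift
}

/// Every value of a small signed type, listed in bit-pattern order.
fn all_values(bits: u32) -> Vec<i64> {
    assert!((1..=16).contains(&bits)); // keep the enumeration small
    (0..(1u64 << bits)).map(|raw| decode_signed(raw, bits)).collect()
}

fn main() {
    assert_eq!(all_values(1), vec![0, -1]);
    assert_eq!(all_values(2), vec![0, 1, -2, -1]);
}
```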
Literals
Because integer literals may now be arbitrarily wide, the compiler represents literal values with an arbitrary-precision integer allocated on the heap.
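The sketch below shows one way a heap-allocated arbitrary-precision literal could work: a decimal literal is parsed into a `Vec` of little-endian 64-bit limbs, and it fits in uN when its significant bit count is at most N. The limb representation and the names `parse_decimal`, `bit_length` and `fits_unsigned` are assumptions for illustration, not rustc internals.

```rust
/// Hypothetical: parse a decimal literal (underscores allowed) into
/// little-endian base-2^64 limbs stored on the heap.
fn parse_decimal(literal: &str) -> Option<Vec<u64>> {
    let mut limbs: Vec<u64> = vec![0];
    let mut saw_digit = false;
    for c in literal.chars().filter(|&c| c != '_') {
        let digit = c.to_digit(10)?; // reject anything that is not 0-9
        saw_digit = true;
        // limbs = limbs * 10 + digit, carrying into higher limbs.
        let mut carry = digit as u128;
        for limb in limbs.iter_mut() {
            let wide = (*limb as u128) * 10 + carry;
            *limb = wide as u64; // keep the low 64 bits
            carry = wide >> 64; // the rest moves to the next limb
        }
        if carry != 0 {
            limbs.push(carry as u64);
        }
    }
    if saw_digit { Some(limbs) } else { None }
}

/// Number of significant bits; 0 for the value zero.
fn bit_length(limbs: &[u64]) -> u64 {
    match limbs.iter().rposition(|&l| l != 0) {
        None => 0,
        Some(i) => i as u64 * 64 + (64 - limbs[i].leading_zeros() as u64),
    }
}

/// A non-negative literal fits in uN when it needs at most N bits.
fn fits_unsigned(limbs: &[u64], bits: u64) -> bool {
    bit_length(limbs) <= bits
}

fn main() {
    // 2^128 is too large for u128 but fits in u129.
    let two_pow_128 = parse_decimal("340282366920938463463374607431768211456").unwrap();
    assert_eq!(bit_length(&two_pow_128), 129);
    assert!(fits_unsigned(&two_pow_128, 129));
    assert!(!fits_unsigned(&two_pow_128, 128));
}
```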
Layout
Arbitrary bit-width integer types have a fixed alignment of 1. To give a value a larger alignment, downstream users should define a newtype struct with #[repr(align(ALIGN))], since #[repr(packed(N))] can only lower alignment, never raise it.
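The following sketch uses today's Rust to show how alignment attributes affect a 3-byte, 1-aligned type, with `[u8; 3]` standing in for a hypothetical u24. `repr(align(N))` sets a minimum alignment and pads the size to match. `repr(packed(N))` only caps the alignment, so it leaves an already 1-aligned type unchanged.

```rust
use std::mem::{align_of, size_of};

// Stand-in for a hypothetical `u24`: 3 bytes of storage, alignment 1.
type U24Storage = [u8; 3];

// Minimum alignment of 4; the size is padded up to a multiple of 4.
#[allow(dead_code)]
#[repr(align(4))]
struct AlignedU24(U24Storage);

// Maximum alignment of 4; the field's alignment is already 1.
#[allow(dead_code)]
#[repr(packed(4))]
struct PackedU24(U24Storage);

fn main() {
    assert_eq!((size_of::<U24Storage>(), align_of::<U24Storage>()), (3, 1));
    assert_eq!((size_of::<AlignedU24>(), align_of::<AlignedU24>()), (4, 4));
    assert_eq!((size_of::<PackedU24>(), align_of::<PackedU24>()), (3, 1));
}
```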
Irregular types
Types are irregular when their size in bits is not a multiple of 8.
repr(Rust) structures are never irregular.
reprs
Irregular types used as fields in repr(Rust) structures have their size rounded up to the next multiple of 8 bits. This means struct S { i: i1 } takes 1 byte instead of just 1 bit.
#[repr(bitpacked)] merges fields that have irregular bit widths. If the total number of bits of the fields is not a multiple of 8, the structure itself is irregular.
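The sizing rules above can be sketched as arithmetic over a list of field widths. This assumes every field is an arbitrary bit-width integer (alignment 1, so repr(Rust) adds no padding), and the function names are hypothetical.

```rust
/// repr(Rust): each field is rounded up to whole bytes. Returns bytes.
fn repr_rust_size_bytes(field_bits: &[u32]) -> u32 {
    field_bits.iter().map(|&b| (b + 7) / 8).sum()
}

/// repr(bitpacked): fields are merged, so the size is the plain sum of
/// the bit widths. Returns bits.
fn bitpacked_size_bits(field_bits: &[u32]) -> u32 {
    field_bits.iter().sum()
}

/// A size in bits is irregular when it is not a multiple of 8.
fn is_irregular(bits: u32) -> bool {
    bits % 8 != 0
}

fn main() {
    // struct S { i: i1 } takes a whole byte under repr(Rust).
    assert_eq!(repr_rust_size_bytes(&[1]), 1);
    // Fields of 1, 3 and 4 bits: 3 bytes under repr(Rust), but exactly
    // 8 bits (a regular 1-byte struct) when bitpacked.
    assert_eq!(repr_rust_size_bytes(&[1, 3, 4]), 3);
    assert_eq!(bitpacked_size_bits(&[1, 3, 4]), 8);
    assert!(!is_irregular(8));
    // Fields of 1 and 3 bits bitpack into 4 bits: still irregular.
    assert!(is_irregular(bitpacked_size_bits(&[1, 3])));
}
```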
Casting
The types conform to the semantics for numeric casting:
- Casting between integers of the same size (e.g. i4 -> u4) is a no-op.
- Casting from a larger integer to a smaller integer (e.g. u9 -> u8) truncates.
- Casting from a smaller integer to a larger integer (e.g. u7 -> u14)
  - zero-extends if the source is unsigned
  - sign-extends if the source is signed
- Casting from an integer to a float produces the closest possible float
  - if necessary, rounding is according to roundTiesToEven mode
  - on overflow, infinity (of the same sign as the input) is produced
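The integer cast rules can be sketched on raw bit patterns. This assumes an N-bit value is held in the low N bits of a u128, so it covers only widths up to 128. For integer-to-integer casts the signedness of the target does not change the resulting bits, so only the source signedness is a parameter. The helper names are hypothetical.

```rust
/// Mask selecting the low `bits` bits (1 <= bits <= 128).
fn mask(bits: u32) -> u128 {
    if bits == 128 { u128::MAX } else { (1u128 << bits) - 1 }
}

/// Hypothetical helper: cast an N-bit pattern to `to_bits` bits.
fn cast(raw: u128, from_bits: u32, from_signed: bool, to_bits: u32) -> u128 {
    let raw = raw & mask(from_bits);
    if to_bits <= from_bits {
        // Same size: no-op. Smaller: keep only the low bits (truncate).
        raw & mask(to_bits)
    } else if from_signed && (raw >> (from_bits - 1)) & 1 == 1 {
        // Negative signed source: sign-extend by filling new bits with ones.
        (raw | !mask(from_bits)) & mask(to_bits)
    } else {
        // Unsigned or non-negative source: zero-extend (new bits are 0).
        raw
    }
}

fn main() {
    assert_eq!(cast(0b1111, 4, true, 4), 0b1111); // i4 -> u4, no-op
    assert_eq!(cast(0x1FF, 9, false, 8), 0xFF); // u9 -> u8, truncate
    assert_eq!(cast(0x7F, 7, false, 14), 0x7F); // u7 -> u14, zero-extend
    assert_eq!(cast(0x7F, 7, true, 14), 0x3FFF); // i7 -> i14, sign-extend
    // Agrees with today's casts between existing primitives.
    assert_eq!(cast(0xFF, 8, true, 16), (-1i8 as u16) as u128);
}
```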
Drawbacks
Hard to write inherent impls
We cannot practically write inherent methods for each of these types by hand: there are millions of them (2^23 widths, each in a signed and an unsigned variant), and there is no simple way to define methods generically over every bit width.
Trait impls must be handled by the compiler
There are too many arbitrary bit-width integers to name each one and write its trait impls by hand. The compiler would therefore need to provide these trait implementations itself and lower them to codegen IR, which could be a significant amount of work.
Rationale and alternatives
Const generic definitions
The initial design for this RFC used const generic parameters to specify the bit width (e.g. Int<4> for i4).
We could write inherent and trait impls for the types in the standard library if we used this approach, but it had its own drawbacks:
- Int<32> is not necessarily the same as i32, as the alignment for arbitrary bit-width integers is always 1.
- The const generic parameter has a valid range. You cannot declare Int<0> or Int<1000000000000000> because we cannot support them, and rejecting them would require a lot of refactoring of the current trait system.
- Having const generic parameters on a primitive can be confusing, especially when we use angle brackets.
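To illustrate the range problem, here is a hypothetical `Int<BITS>` sketched in today's Rust. A range check can be written as an associated constant, but it is evaluated only when code that uses a particular width is instantiated. The type `Int<0>` itself can still be named. This is an approximation of the const generic design, not the design itself.

```rust
// Hypothetical `Int<BITS>` with a post-monomorphization range check.
struct Int<const BITS: u32>;

impl<const BITS: u32> Int<BITS> {
    // Evaluated only for widths whose `new` is actually used.
    const VALID: () = assert!(BITS >= 1 && BITS <= 1 << 23, "bit width out of range");

    fn new() -> Self {
        // Referencing the constant forces the check for this BITS.
        let () = Self::VALID;
        Self
    }

    fn bits(&self) -> u32 {
        BITS
    }
}

fn main() {
    assert_eq!(Int::<24>::new().bits(), 24);
    // Naming `Int<0>` compiles; only calling `Int::<0>::new()` would
    // trigger the compile-time error.
    let _unchecked: Option<Int<0>> = None;
}
```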
Prior art
The Zig programming language supports arbitrary bit-width integer types, and its approach is very similar to this RFC.
Unresolved questions
impls
How do we define inherent methods and trait impls? Should this be special cased by the compiler?
Hybrid approach?
Can we use a hybrid approach to resolve the impls problem above? For example, i4 is what end users write, and ab(u)int<BITS> is what libcore uses to define impls. This way, the compiler only needs to resolve the type in a special way, instead of requiring codegen to handle trait implementations.
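A rough sketch of the hybrid idea in today's Rust: a single generic library type carries one trait impl that covers every width, and a user-facing name maps onto it. `AbUint`, `U4` and `U12` are hypothetical stand-ins. A type alias only imitates the special resolution the compiler would perform. Storage in a u128 limits this sketch to 128 bits, and wrapping addition is an arbitrary choice made for the example.

```rust
use std::ops::Add;

/// Hypothetical stand-in for `ab(u)int<BITS>`: an unsigned BITS-bit
/// integer stored in a u128 (1 <= BITS <= 128 in this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct AbUint<const BITS: u32>(u128);

impl<const BITS: u32> AbUint<BITS> {
    const MASK: u128 = if BITS >= 128 { u128::MAX } else { (1u128 << BITS) - 1 };

    fn new(value: u128) -> Self {
        // Keep only the low BITS bits, like a truncating cast.
        Self(value & Self::MASK)
    }
}

// One generic impl covers every width, instead of one impl per type.
impl<const BITS: u32> Add for AbUint<BITS> {
    type Output = Self;
    fn add(self, rhs: Self) -> Self {
        // Wrapping addition modulo 2^BITS.
        Self::new(self.0.wrapping_add(rhs.0))
    }
}

// Hypothetical user-facing names for u4 and u12.
type U4 = AbUint<4>;
type U12 = AbUint<12>;

fn main() {
    assert_eq!(U4::new(15) + U4::new(1), U4::new(0));
    assert_eq!(U12::new(4095) + U12::new(2), U12::new(1));
}
```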