To improve usize (and isize) handling in Rust

Hi, I think the way Rust handles integral values is well designed, but I have some problems with usize types, caused by their size varying between systems. The problem is that code written to work on a 64-bit system may overflow on a 32-bit system if the programmer isn't extra careful with usize values. I'd like a way to face this problem in a more principled/automatic way. Below I explain the problem in more detail and propose one possible solution. My solution could be bad. Perhaps you can suggest better ones.

In my code I put a const_assert because it assumes size_of::<usize>() >= 4. (I know usize could be only 2 bytes, but most of the time I am not going to target that.) I write the code on a 64-bit system. The code contains probably 10_000 or more operations between two usize arguments, like multiplications. I try to keep the values in those 64-bit usizes below 32 bits, but sometimes by mistake I may end up with larger numbers in my usize values. The program works (because those numbers fit in 64 bits), but they are bugs (and cause overflow panics) if I compile the code for a 32-bit target in debug mode. I'd like a way to spot such mistakes on a 64-bit system, so that my code is portable to 32-bit systems too. One solution is to use u32 and cast to usize only at the last moment, but this introduces a ton of casts in my code that I want to avoid (even if I use safe casts with a macro). In some cases I do this, because it is the right solution, but in several other cases I'd like to avoid all those casts.
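A minimal sketch of the two techniques mentioned above: a compile-time size assertion, and doing the arithmetic in u32 with a single widening conversion at the point of use. The helper names `offset` and `at` are hypothetical and not taken from the original code.

```rust
// Compile-time check that usize is at least 4 bytes wide.
const _: () = assert!(core::mem::size_of::<usize>() >= 4);

/// Hypothetical helper: compute a flat offset entirely in u32, so any
/// overflow past 32 bits is reported on every host, including 64-bit ones.
fn offset(row: u32, width: u32, col: u32) -> Option<u32> {
    row.checked_mul(width)?.checked_add(col)
}

/// Hypothetical helper: widen the u32 index to usize only when indexing.
fn at(data: &[u8], index: u32) -> Option<u8> {
    // The conversion cannot fail when usize has at least 32 bits,
    // which the assertion above guarantees.
    let i = usize::try_from(index).ok()?;
    data.get(i).copied()
}
```

Keeping the cast inside one small helper limits how many conversions appear at call sites, although it does not remove them entirely.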

One solution could be to add standard features like:

#![feature(debug_overflow_usize_past_u32_max)]
#![feature(debug_overflow_usize_past_u16_max)]

With the first one the compiler keeps usize values as large as before (on my system usize stays 8 bytes), but in debug builds it adds overflow tests that disallow values past u32::MAX inside usize values (even when usize is 8 or more bytes long). This way, if there's a mistake in my code and a 64-bit usize variable gets a 33-bit value, it panics at run time and I can fix the bug. What do you think about my solution? (Perhaps the only/best solution is just to compile for a 32-bit target in debug mode and fix the overflow bugs.)
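Something close to the proposed behaviour can be approximated today in library code. The sketch below uses a hypothetical wrapper type, `Capped`, which stores a full-width usize but refuses any value above u32::MAX. Unlike the proposed feature, it checks in every build profile and only covers values that are explicitly wrapped.

```rust
use std::ops::{Add, Mul};

/// Hypothetical wrapper: a usize that must stay within the u32 range.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Capped(usize);

impl Capped {
    /// Returns None when the value does not fit in 32 bits.
    fn new(value: usize) -> Option<Capped> {
        u32::try_from(value).ok().map(|_| Capped(value))
    }

    fn get(self) -> usize {
        self.0
    }

    /// None on usize overflow or when the product exceeds u32::MAX.
    fn checked_mul(self, rhs: Capped) -> Option<Capped> {
        self.0.checked_mul(rhs.0).and_then(Capped::new)
    }

    /// None on usize overflow or when the sum exceeds u32::MAX.
    fn checked_add(self, rhs: Capped) -> Option<Capped> {
        self.0.checked_add(rhs.0).and_then(Capped::new)
    }
}

impl Mul for Capped {
    type Output = Capped;
    // Panics on a 64-bit host exactly where a 32-bit usize would overflow.
    fn mul(self, rhs: Capped) -> Capped {
        self.checked_mul(rhs).expect("product exceeds u32::MAX")
    }
}

impl Add for Capped {
    type Output = Capped;
    fn add(self, rhs: Capped) -> Capped {
        self.checked_add(rhs).expect("sum exceeds u32::MAX")
    }
}
```

The cost is the same one described earlier: values have to be wrapped and unwrapped at the boundaries, so this trades casts for constructor calls.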

If your code works under miri, you can run it for any supported target on any supported host via miri. That will find any overflow bugs that your tests exercise.

Just adding a bunch of checks for sticking a too-large value in usize, however, is intractable, because the other point of usize is to be able to hold a pointer value. While those "don't have a numerical value", they still can (and likely do) use sizes beyond u32 on a 64 bit platform in just normal operation.


If you want your code to work on a 32-bit target, that seems like your best option. If you're running on an x86-64 system, you should be able to run code for the corresponding 32-bit target to test.


If you are developing on x86_64-unknown-linux-gnu you should have no problem also testing with the i686-unknown-linux-gnu target. You can even get fuzzing to work to some degree. It should be a simple two-step process:

rustup target add i686-unknown-linux-gnu
cargo test --target i686-unknown-linux-gnu
# If you are using `cargo fuzz`, just run without address sanitizer
cargo fuzz run --target i686-unknown-linux-gnu your_fuzz_target -s none
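As an illustration of what such a cross-target test run can surface, here is a small sketch with a hypothetical function, `cell_count`. The same call succeeds when usize is 64 bits wide and reports overflow when it is 32 bits wide.

```rust
/// Hypothetical example: number of cells in a width x height grid.
/// Returns None when the product does not fit in this target's usize.
fn cell_count(width: usize, height: usize) -> Option<usize> {
    width.checked_mul(height)
}
```

A test asserting `cell_count(70_000, 70_000).is_some()` would pass on x86_64-unknown-linux-gnu and fail on i686-unknown-linux-gnu, because 4_900_000_000 is larger than u32::MAX. With a plain `width * height` instead of `checked_mul`, the 32-bit debug build would panic on overflow at the same place.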

I personally prefer what C++ does: there are types that are exactly x bits (for binary compatibility), at least x bits (whatever is fastest), and the smallest valid size that can store x bits (in case you want a 16-bit number on a 64-bit architecture that supports only 8-, 32- and 64-bit numbers). I'm really surprised that in Rust u8 is exactly 8 bits, and not whatever is fastest. I'm even more surprised that these three variants can't be expressed (at least I haven't seen a way).

When optimizing for single variable targets, which don't have to fit compactly into struct, enum, and union layouts, the compiler backend (LLVM) will choose the storage size that maximizes performance. But do remember that in Rust u8 has potential overflow semantics, so that in debug mode 0xff + 1 will panic for a u8 target but simply become 0x0100 for a u16 or larger target.

C++ gets away with a larger target only by ignoring potential overflow.
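The difference can be made concrete with a small sketch. The function name `increment_all` is hypothetical; it uses the explicit `checked_add` and `wrapping_add` methods so that the result is the same in debug and release builds.

```rust
/// Returns x + 1 computed three ways:
/// checked in u8, wrapping in u8, and widened to u16.
fn increment_all(x: u8) -> (Option<u8>, u8, u16) {
    (
        x.checked_add(1),  // None where a debug-mode `x + 1` would panic
        x.wrapping_add(1), // wraps around modulo 256
        u16::from(x) + 1,  // cannot overflow: at most 0x0100
    )
}
```

For x = 0xff the three results differ, which is why the compiler cannot silently substitute a wider type for u8 without changing observable behaviour.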
