In code that stores and manipulates indexes into slices (custom hash maps, some compression code, string searching/indexing, etc.), it is common to keep these indexes in types other than usize. Littering every indexing operation with `as usize` is unnecessarily verbose and potentially error-prone.
Since indexing with other kinds of integer types is well defined and usually[1] has the same runtime overheads, I think SliceIndex should be implemented for all integer types. Rust tries to make integer casting explicit and annoying for a reason, but indexing with bounds checks is well defined and should not look scary.
Implementing this would look different depending on the type:
- Indexing with u8s: All values are clearly feasible, upcast to usize and do the normal bounds check.
- Indexing with u64s: The value may exceed the range of usize. Current code that casts to usize could be silently truncating on targets where usize is narrower than 64 bits. We should do the bounds check before narrowing to usize.
- Indexing with negative integers: Negative values should fail the check. So first cast to usize, then perform the normal check. Negative numbers wrap around into very large usize values that are way out of bounds. This is what currently happens when code is littered with `as usize` for all indexing operations.
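The three cases above can be sketched in user code. This is only an illustration under stated assumptions: SliceIndex cannot currently be implemented outside the standard library, so the trait `IntIndex`, its method `checked_index`, and the helper `get_int` are hypothetical names invented for this example, not a proposed API.

```rust
/// Hypothetical helper trait (not part of std): turns an integer of any
/// width into a bounds-checked `usize` index for a slice of length `len`.
trait IntIndex: Copy {
    fn checked_index(self, len: usize) -> Option<usize>;
}

impl IntIndex for u8 {
    fn checked_index(self, len: usize) -> Option<usize> {
        // Every u8 fits in usize: widen, then do the normal bounds check.
        let i = usize::from(self);
        if i < len { Some(i) } else { None }
    }
}

impl IntIndex for u64 {
    fn checked_index(self, len: usize) -> Option<usize> {
        // Compare in u64 *before* narrowing, so a value above usize::MAX
        // cannot be truncated into a valid index on a 32-bit target.
        if self < len as u64 { Some(self as usize) } else { None }
    }
}

impl IntIndex for i32 {
    fn checked_index(self, len: usize) -> Option<usize> {
        // Negative values wrap to very large usize values, which then
        // fail the normal bounds check (see footnote [1] for the caveat).
        let i = self as usize;
        if i < len { Some(i) } else { None }
    }
}

/// Hypothetical stand-in for `slice.get(index)` with a non-usize index.
fn get_int<T, I: IntIndex>(slice: &[T], index: I) -> Option<&T> {
    slice.get(index.checked_index(slice.len())?)
}
```

With these definitions, `get_int(&data, 2u8)`, `get_int(&data, 2u64)` and `get_int(&data, 2i32)` all read the same element, while `get_int(&data, -1i32)` and `get_int(&data, u64::MAX)` return `None` instead of requiring a cast at the call site.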
[1] The only case I'm aware of where this breaks is i32 indexing on a 32-bit system into a byte array larger than 2GiB: some negative numbers may wrap around into valid indexes. We could actually fix this existing potential for bugs by encouraging i32-manipulating code to index directly without first coercing to usize. That would require an additional check for values below zero in the special case of a signed index type on u8 slices on targets narrower than 64 bits. People don't like extra bounds checks though, so I'm open to other options here.
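For the footnote's special case, the extra check for negative values could look roughly like this. Again a sketch only: `get_i32` is a hypothetical free function made up for illustration, and whether the cost of the sign check is acceptable is exactly the open question.

```rust
use std::convert::TryFrom;

/// Hypothetical helper (not part of std): indexes with an i32 and rejects
/// negative values explicitly instead of relying on wrap-around.
fn get_i32<T>(slice: &[T], index: i32) -> Option<&T> {
    // `usize::try_from` returns an error for negative values, so they can
    // never wrap into an in-bounds index, whatever the width of usize.
    let i = usize::try_from(index).ok()?;
    // Normal bounds check against the slice length.
    slice.get(i)
}
```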
The previous activity I've seen on this topic is about auto-coercing other integer types to usize before indexing (e.g. https://github.com/rust-lang/rfcs/issues/1842). I'm not entirely sure why those efforts were abandoned, but this is a slightly different solution: it allows a different implementation per type, for both correctness and performance.