Allowing slice indexing with non-usize integer types

edre · December 23, 2020, 9:41pm

In code that stores and manipulates indexes into slices (custom hash maps, some compression code, string searching/indexing, etc) it is common to manipulate and store these indexes in types that are not usize. It is unnecessarily verbose and potentially error-prone to litter all the indexing operations with as usize.

Since indexing with other kinds of integer types is well defined and usually[1] has the same runtime overheads, I think SliceIndex should be implemented for all integer types. Rust tries to make integer casting explicit and annoying for a reason, but indexing with bounds checks is well defined and should not look scary.

Implementing this would look different depending on the type:

Indexing with u8s: All values are clearly feasible, upcast to usize and do the normal bounds check.
Indexing with u64s: The value may exceed the range of usize. Current code that converts to usize could be wrapping accidentally. We should do the bounds check before downcasting to usize.
Indexing with negative integers: Negative values should fail the check. So first cast to usize then performing the normal check. Negative numbers underflow into very large usize values that are way out of bounds. This is currently what happens when code is littered with as usize for all indexing operations.

[1] The only case that I'm aware of where this breaks is with i32 indexing on a 32-bit system with an array of bytes > 2GiB. Some negative numbers may underflow into valid indexes. We could actually fix this current potential for bugs by encouraging i32-manipulating code to use direct slicing without coercing to usize. There would be an additional bounds check for under zero in the special case of negative index type on u8 slices on under-64-bit systems. People don't like extra bounds checks though, so I'm open to other options here.

Previous activity I've seen about this topic are about auto-coercing other integer types to usize before indexing (e.g. https://github.com/rust-lang/rfcs/issues/1842). I'm not entirely sure why those have been abandoned, but this is a slightly different solution that allows different implementations for different types for correctness and performance.

djc · December 23, 2020, 10:50pm

If I recall correctly we technically support usize being u16, so your reasoning for u64 also extends to u32.

See also: [Pre-RFC] Implicit number type widening

Tom-Phinney · December 24, 2020, 12:09am

For the TI MSP430.

CAD97 · December 24, 2020, 12:33am

I've seen a number of people using an ix! macro to do this userland, where ix! is roughly $arg.try_into().expect("index out of bounds"). If this is followed immediately by the inlined Index impl, it optimizes fairly decently (though there are two checks involved, for the two separate errors).

The technical issue with this is that it breaks existing code due to inference. You can write let x = 0; arr[x]; and x will be inferred to be usize today, as there's only an Index<usize> impl. As soon as any new Index impl is added, the number type changes to be inferred to be i32.

There's basically no strong opposition to adding the ability to index with other unsigned integer types, except that it breaks arr[0] working. There's reasonable arguments against allowing signed index types (people could assume arr[-1] is the last element, or as a soft lint against using the wrong variable as an index, etc.), so adding an i32 impl just to make literal indexing work is a difficult sell.

And even if everyone agreed that indexing with signed numbers is unambiguously .try_into().unwrap(), changing the types of a lot of underconstrained variables just constrained by indexing is likely to break people's code silently.

josh · December 24, 2020, 6:01am

Exactly. Nothing stops us from changing this at an edition boundary, but it'd likely trade off between the ergonomics of some code and the ergonomics of other code. All else being equal, the least surprising behavior seems like "indexes are always usize".

In addition, try_into().expect(...) would turn a compile-time error that people might want to handle differently into a runtime error. Multiple times, I've had code where the requirement for usize in indexing forces me to think about handling portability issues.

scottmcm · December 24, 2020, 6:30am

FWIW as precedent, we do allow shifting by arbitrary types, including signed ones, even though shifting by a negative is never correct (and panics in debug), like how indexing a slice by a negative is never correct (and panics).

I would happily allow indexing by everything if we could do it without inference impact.

jhpratt · December 24, 2020, 8:24am

Having a defined hierarchy here would be nice, honestly. I ran into this when implementing extension traits on numbers. The implementation only makes sense on non-negative integers, but by implementing the trait on all uX, type inference wouldn't have worked.

Having a defined fallback, where i32 is tried first for back-compatibility (at least in the same edition), followed by u32, i16, u16, etc. would make situations like this and the indexing inference non-existent. So long as the order is defined, there shouldn't be any issues…I think.

^{Don't take the list above as an actual proposed order. I'm just making up something random.}

quinedot · December 24, 2020, 10:52am

If things go this way, I would advocate for a lint for ambiguous situations (e.g. "only usize and isize were inferred to meet the uses of this literal and isize was chosen", only phrased better).

Relatedly, RFC 212 proposed an unconstrained_literal lint for the i32 fallback, but if it still exists it must have been renamed.

programmerjake · December 25, 2020, 4:38am

interesting you should mention that since I was just participating in a forum thread where different index types causing a large speed difference was discussed: Patch Proposed For Removing BZIP2 Support From The Linux Kernel - Phoronix Forums

edre · December 25, 2020, 6:53am

Does the language currently support a prioritized tier of types for inference? I think it's reasonable to keep the default where it is if indexing will still work with specified types.

Does shifting by a number currently introduce any type constraints or preferences for inference?

jhpratt · December 25, 2020, 9:51pm

I don't think there's support like that for any time in any place, no. It's just something I've thought about on a few occasions, nothing more. I see it as a possible way forward to allow multiple "convenience" implementations while not breaking things like crazy.

amosonn · January 3, 2021, 7:27pm

Interesting video! The problem there is not actually signed-vs-unsigned arithmetic (at hardware level) though, but rather C++ guarantees for arithmetic on those types - which are weaker for signed integers, and therefore these are better optimised. In Rust these guarantees are different - AFAIK, Rust guarantees 2-complement wrapping for all integer types. This means that the optimization in the video wouldn't work for signed integers, either.

Regardless, bounds checks would probably swamp everything there anyway; if this code would be really such a hot-spot that it matters, they would need to be rid of, at best with an assert letting the compiler know everything is in bounds, and then everything definitely also doesn't overflow and signed/unsigned doesn't matter.

system · April 3, 2021, 7:27pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
To improve usize (and isize) handling in Rust language design	6	1376	September 19, 2020
The problem with array/slice/vector indexes language design	14	9857	March 25, 2019
Arrays should implement `Index` and `IndexMut` libs	4	1579	October 22, 2020
Subscripts and sizes should be signed (redux) language design	41	1715	November 9, 2023
Implicit widening, polymorphic indexing, and similar ideas ideas (deprecated)	91	22445	March 25, 2019

Allowing slice indexing with non-usize integer types

Related topics