[Pre-RFC] Implicit number type widening

Indices smaller than usize that are stored in efficient data structures (e.g., in an ECS) naturally give rise to index expressions of uN types smaller than usize. Virtually every programmer encounters such situations frequently. For me the primary thrust of this thread was widening such expressions to usize at the point where they are applied as index expressions, but not earlier. That’s the use of the unchecked as usize conversion that troubles so many of us.
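A minimal sketch of the pattern described above. It assumes a hypothetical ECS-style component store (`Positions`, `parent_x` are illustrative names, not from any crate) that keeps compact u16 indices and widens them only where they are used to index. For u8 and u16, `usize::from` is a lossless alternative to the unchecked `as usize`.

```rust
/// Hypothetical component storage: entities refer to one another by u16
/// index to keep the tables compact.
struct Positions {
    xs: Vec<f32>,
    parent: Vec<u16>, // parent[i] is the index of entity i's parent
}

impl Positions {
    /// The x coordinate of entity `i`'s parent.
    fn parent_x(&self, i: u16) -> f32 {
        // Widen only at the point of indexing. `usize::from` is lossless for
        // u8 and u16, and it stops compiling if the index type is ever changed
        // to one that might not fit in usize, unlike `as usize`.
        let p = self.parent[usize::from(i)];
        self.xs[usize::from(p)]
    }
}
```

With `xs = [0.0, 10.0, 20.0]` and `parent = [0, 0, 1]`, `parent_x(2)` returns 10.0. For u32 indices the standard library provides no `From<u32>` for usize (usize may be 16 bits on some targets), so `usize::try_from(i)` is the checked alternative there.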

Not sure where you got the idea that index would return an Option. Just like now, it would do .get(...).unwrap().

I think you are vastly overestimating how common that pattern is. I don’t recall the last time I had to cast to usize for indexing.

I still agree this is a worthwhile change, though.

I’m not sure whether I want indexing by signed numbers (though that’d be the easiest way to make the change work without compiler work), but can you elaborate? The thrust of the thread here has seemed to me as being about how indexing taking more types would be ok since it’s already partial, and it’s non-obvious to me that indexing failures from an index being too small are fundamentally different from an index being too large. Getting None from v.get(i+1) at the end of a slice and getting None from v.get(i-1) at the beginning of the slice don’t seem all that different, for example.

And because indexes are already restricted to isize::MAX as usize, indexing by isize only takes one check for in-bounds, the same as usize.
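To illustrate the single-check claim, here is a sketch of a hypothetical `get_signed` helper (not a std API). Reinterpreting a negative isize as usize yields a value above `isize::MAX`, and no slice of a non-zero-sized type can be that long, so one unsigned comparison rejects both negative and too-large indices. Zero-sized element types are the one exception, because their slices may be longer than `isize::MAX`.

```rust
/// Hypothetical helper (not a std API): bounds-checked access with a signed index.
fn get_signed<T>(s: &[T], i: isize) -> Option<&T> {
    // Slices of zero-sized types may be longer than isize::MAX, so they need
    // an explicit sign check. For any other T the size test is a
    // compile-time constant and the whole condition folds away.
    if std::mem::size_of::<T>() == 0 && i < 0 {
        return None;
    }
    // Reinterpret the bits: a negative i becomes a usize above isize::MAX,
    // which is never below the length of a non-ZST slice, so the single
    // comparison inside `get` rejects negative and too-large indices alike.
    s.get(i as usize)
}
```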

(To pre-emptively avoid a potential misunderstanding: I definitely want v[-1] to panic on an off-by-one error, the same way v[v.len()] does. I do not want v[-1] to give the last item in the slice.)

Those seem qualitatively different to me. “too large” can happen with a usize just as easily, or with a u8 when indexing a list with 12 items in it. A negative number suggests a more fundamental type/domain issue that may want a condition checking it, and I feel like I’m much more likely to want the compiler to flag an index with an i32 so that I can say “oops, I meant to change this type to such-and-such”.

Here’s another angle: all values of u__ indices are potentially valid indices. (Or, well, modulo system limitations.) Whereas any negative value will never be a valid index.

In other words, too big is a runtime issue, too small is easily a static issue.

Thank you, that’s exactly how I’d look at it.

I want to exclude “potentially negative indices” for the same reason I want to exclude “potentially null pointers”.

That may be a matter of programming style or environment. For example, when interfacing with C code, I feel like I have to cast as usize all the time.

There’s also a sort-of self-reinforcing issue in Rust that non-usize types are annoying. I have many places in my code where I would like to use u32 or even u16 for counts and indexes of “small” things, but I don’t, because I know it will cause tons of casts.

I think that having an Index impl for signed integers that panics on negative values is a (bug-)safer and more ergonomic option than buf[my_int as usize]. We could improve safety by deprecating as casts and requiring buf[my_int.try_into().unwrap()], but that would be even worse from an ergonomic point of view.
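A sketch of the behavior being proposed, under one assumption: because the orphan rules prevent code outside std from writing `impl Index<i32> for [T]` (or for `Vec<T>`), it uses a hypothetical local wrapper `Buf`. The checked conversion lives in one place instead of at every `try_into().unwrap()` call site, and negative values panic rather than being reinterpreted as huge indices.

```rust
use std::ops::Index;

/// Hypothetical local wrapper. Outside std, the orphan rules forbid
/// `impl Index<i32> for [T]` or `for Vec<T>`, so the idea is shown here.
struct Buf<T>(Vec<T>);

impl<T> Index<i32> for Buf<T> {
    type Output = T;

    fn index(&self, i: i32) -> &T {
        // On 32- and 64-bit targets try_from fails only for negative i.
        match usize::try_from(i) {
            // Too-large values are still caught by the ordinary bounds check.
            Ok(idx) => &self.0[idx],
            // Negative values panic with a clear message instead of being
            // reinterpreted as a huge usize, as `i as usize` would do.
            Err(_) => panic!("index {} is out of range for a buffer of length {}", i, self.0.len()),
        }
    }
}
```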

I know it’s been discussed before, and IIRC some C/C++ people argued strongly in favor of signed indexing back when this decision (unsigned indexing only) was first made for Rust.

In what use cases would you want to manipulate signed integers that are then used for indexing? Unsigned types have a clear use case: smaller index types.

As I see it, at least in terms of indices, i__ types are offsets from an arbitrary position in an array, and u__ are positions in an array, anchored in its start.

Pedantically, the usize values corresponding to negative isizes are also invalid indices, as is isize::MAX.

I definitely have code in which I use signed integers (computed via some math) to index into an array. As a concrete example, a chess AI I am working on is littered with these casts. For most arrays, I see the domain of acceptable indexes as really quite small. I understand that we can definitely rule out all the negative numbers, but until the type system can express “nonnegative numbers less than v.len()”, the prohibition against signed integer indexes just means extra casts to me; it does not improve the safety of my code.

Signed indices are sometimes used to indicate indexing “forward” from the start of an array, or “backward” from one entry past the end of the array. In such cases valid indices would be 0..len and -len..0 (so -1 is the last element). Of course, this can be accomplished by an appropriate indexing function that knows both the signed index and the unsigned length of the array.
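Such an indexing function might look like the following sketch. `get_from_either_end` is a hypothetical name, and this is an opt-in helper, not a suggestion for what plain `v[i]` should do: non-negative indices count from the start, and negative ones count back from one past the end.

```rust
/// Hypothetical helper: i in 0..len counts forward from the start, and
/// i in -len..0 counts backward, so -1 is the last element.
fn get_from_either_end<T>(s: &[T], i: isize) -> Option<&T> {
    let idx = if i >= 0 {
        // A non-negative isize always fits in usize.
        i as usize
    } else {
        // unsigned_abs avoids overflow on isize::MIN; checked_sub returns
        // None when i is below -len.
        s.len().checked_sub(i.unsigned_abs())?
    };
    // Indices at or past len are rejected here.
    s.get(idx)
}
```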

All indices, of whatever form or value, that attempt to access outside the 0..len bounds of the array are invalid when applied to index the array. I don’t see that “extra casts” are involved, but there may be an extra test for non-negative values in the implemented bounds check.

As I said earlier, I would be against that. I’d rather get a panic from an off-by-one error than mysteriously get something from the back of the array instead. (It also means extra branching in indexing, and I wouldn’t want a folk wisdom of “you shouldn’t use signed numbers because they’re slower” to develop if we did end up allowing signed numbers.)

Now, I wouldn’t be against such a thing as a non-default option: v[std::cmp::Reverse(i)] or v.rev()[i] or v.cycle()[i] or v[std::num::Wrapping(i)] or something don’t seem implausible. (I’m not making a proposal for any of those in core right now, though – I haven’t thought through whether they’re actually good ways of representing the idea.)
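As one illustration of the `v.rev()[i]` shape (not a proposal, and all names here are hypothetical), an extension trait can return a view whose index 0 is the last element, leaving plain `v[i]` untouched. Out-of-range indices still panic rather than wrapping around.

```rust
use std::ops::Index;

/// Hypothetical read-only view that indexes a slice from the back:
/// view[0] is the last element, view[1] the one before it, and so on.
struct RevView<'a, T>(&'a [T]);

impl<T> Index<usize> for RevView<'_, T> {
    type Output = T;

    fn index(&self, i: usize) -> &T {
        let len = self.0.len();
        // Out-of-range indices panic, just like ordinary indexing.
        assert!(i < len, "reverse index {} out of range for length {}", i, len);
        &self.0[len - 1 - i]
    }
}

/// Hypothetical extension trait providing the opt-in `rev_view()` method.
trait RevViewExt<T> {
    fn rev_view(&self) -> RevView<'_, T>;
}

impl<T> RevViewExt<T> for [T] {
    fn rev_view(&self) -> RevView<'_, T> {
        RevView(self)
    }
}
```

Because the trait is implemented for `[T]`, it also works on `Vec<T>` and arrays through auto-deref, e.g. `v.rev_view()[0]` for the last element.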

Because of LLVM GEP restrictions, there doesn’t need to be one. The implementation can just sext to the larger of iN and isize, then bitcast to the unsigned version and call that. (Both of those operations are essentially free on modern CPUs, even if LLVM doesn’t optimize them away.)

I think that whenever you’re applying a signed offset to an index, it’ll be way easier to stay in signed and let .get(i+d) return None when you go off either end. isize and usize can both represent all legal indexes into non-ZSTs (again because of LLVM GEP restrictions), and correctly checking all overflows when adding an isize offset to a usize base index to produce a usize index is quite a complicated thing to do.
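A sketch comparing the two approaches, with hypothetical helper names. The first stays signed and converts once at the end, as suggested above. The second uses `usize::checked_add_signed`, available in recent stable Rust, which performs the base-plus-offset overflow checking in a single call.

```rust
/// The lookup staying in signed arithmetic, converting once at the end.
fn neighbor_signed<T>(s: &[T], base: isize, offset: isize) -> Option<&T> {
    let i = base.checked_add(offset)?;
    // try_from rejects negative results; `get` rejects too-large ones.
    s.get(usize::try_from(i).ok()?)
}

/// The same lookup with a usize base index and a signed offset.
fn neighbor_unsigned<T>(s: &[T], base: usize, offset: isize) -> Option<&T> {
    // checked_add_signed returns None if base + offset would be negative
    // or would overflow usize; `get` then handles the upper bound.
    s.get(base.checked_add_signed(offset)?)
}
```

Either way, stepping off either end of the slice (for example, a board square at the edge) produces None instead of a panic or a wrapped index.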

Signed indices are sometimes used to indicate indexing “forward” from the start of an array, or “backward” from one entry past the end of the array.

As I said earlier, I would be against that.

I wasn’t proposing that as a default interpretation of indexing, but instead just commenting that an extended-indexing impl might choose to implement such an algorithm. I personally use a simple index macro whenever I want to do some form of indexing that is not directly built into the Rust language, including indexing by a uN that is not usize (i.e., the original impetus of this thread).
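The macro mentioned above isn’t shown, so this is only a guess at what such a helper could look like. The hypothetical `idx!` macro converts any integer index to usize with a checked conversion at the point of use and then indexes normally.

```rust
/// Hypothetical macro: index a slice-like value with any integer type,
/// converting to usize with a checked conversion at the point of use.
macro_rules! idx {
    ($v:expr, $i:expr) => {
        // try_from is infallible for u8/u16/usize and checked for wider or
        // signed types; out-of-range values panic instead of wrapping.
        $v[usize::try_from($i).expect("index not representable as usize")]
    };
}
```

For example, `idx!(v, i)` works whether `i` is a u8, a u32, or an i64, and a negative value panics with the `expect` message instead of becoming a huge index.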

Not on all platforms or environments. With some care, you can have a 3-4GB array of u8 on a 32-bit platform, for instance.

I agree that if we did such a thing we should use a separate type for indexing. That would also make slicing easier.

Some code searches suggest that slices going to len()-1 seem fairly common, and slices going to len()-something_else are not unheard-of.
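A small illustration of why the len()-1 pattern is fiddly with unsigned lengths (the helper name is hypothetical): `&s[..s.len() - 1]` panics on an empty slice (arithmetic overflow in debug builds, an out-of-range slice in release builds), so the empty case has to be handled explicitly.

```rust
/// Hypothetical helper: everything except the last element, treating an
/// empty slice as having nothing to drop.
fn all_but_last<T>(s: &[T]) -> &[T] {
    // `&s[..s.len() - 1]` would panic when s is empty; checked_sub makes
    // that case explicit.
    match s.len().checked_sub(1) {
        Some(end) => &s[..end],
        None => s,
    }
}
```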

You are not allowed to have a rust slice (or array) that long, though — see https://doc.rust-lang.org/std/slice/fn.from_raw_parts.html#safety

Interesting; I wasn’t aware of that.

Why does that limitation exist?

IIRC that’s inherited from LLVM. Pointer offset uses isize and LLVM is serious about it.
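The same isize::MAX bound is visible from safe code: on current Rust versions, `std::alloc::Layout` refuses any size above `isize::MAX` bytes, even though such a size fits in usize. The helper below is a hypothetical wrapper around `Layout::array` to show that boundary.

```rust
use std::alloc::Layout;

/// Hypothetical helper: whether the standard library accepts a layout for
/// an array of `n` bytes.
fn byte_array_layout_ok(n: usize) -> bool {
    // Layout rejects sizes above isize::MAX, the same limit that
    // slice::from_raw_parts documents for slices.
    Layout::array::<u8>(n).is_ok()
}
```

So `byte_array_layout_ok(isize::MAX as usize)` is true, while one byte more is rejected, which is why a 3-4 GB u8 slice is not allowed on a 32-bit target.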

Personally, I’m not worried about this case.

  • Memory access is checked either way. Panic on index 4294967294 is clear enough to a programmer 🙂
  • I expect small negative values after the cast to far exceed the practical maximum size, so there’s a slim chance of wrapping around to a valid index. OTOH if the attacker can make the value large enough to wrap around, that would probably work with unsigned math, too.
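The arithmetic behind the first point, as a sketch with hypothetical helper names: 4294967294 is -2 reinterpreted as a 32-bit unsigned value, and on both 32- and 64-bit targets `-2i32 as usize` is `usize::MAX - 1`, far beyond any real slice length, so the normal bounds check rejects it. The checked conversion is shown alongside for contrast.

```rust
/// What `as usize` does to a negative i32 index: the value is sign-extended
/// (or kept at the same width) and its bits reinterpreted, so -2 becomes
/// usize::MAX - 1 on both 32- and 64-bit targets.
fn index_after_cast(i: i32) -> usize {
    i as usize
}

/// The checked alternative: rejects negative values instead of reinterpreting them.
fn checked_index(i: i32) -> Option<usize> {
    usize::try_from(i).ok()
}
```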