Indices smaller than
usize that are stored in efficient data structures (e.g., in an ECS) naturally give rise to index expressions of
uN types smaller than
usize. Virtually every programmer encounters such situations frequently. For me the primary thrust of this thread was widening such expressions to
usize at the point where they are applied as index expressions, but not earlier. That’s the use of the unchecked
as usize conversion that troubles so many of us.
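A minimal sketch of that pattern, assuming a hypothetical `Entity` record that stores a `u16` slot index (the names here are illustrative, not from any particular ECS). The index stays narrow in storage and is widened only inside the indexing expression. `usize::from` is lossless for `u8` and `u16`; the standard library has no `From<u32> for usize`, so a `u32` index needs `as usize` or a fallible conversion.

```rust
/// Hypothetical ECS-style record: the component index is kept narrow to save space.
struct Entity {
    position_slot: u16,
}

/// Looks up an entity's position, widening the index only at the point of use.
fn position_of(entity: &Entity, positions: &[f32]) -> f32 {
    // `usize::from(u16)` is an infallible, lossless widening.
    positions[usize::from(entity.position_slot)]
}

fn main() {
    let positions = [1.0_f32, 2.5, 4.0];
    let e = Entity { position_slot: 2 };
    println!("{}", position_of(&e, &positions));
}
```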
Not sure where you got the idea that
index would return an
Option. Just like now, it would do
I think you are vastly overestimating how common that pattern is. I don’t recall when I had to cast to usize for indexing the last time.
I still agree this is a worthwhile change, though.
I’m not sure whether I want indexing by signed numbers (though that’d be the easiest way to make the change work without compiler work), but can you elaborate? The thrust of the thread has seemed to me to be that indexing with more types would be ok since indexing is already partial, and it’s non-obvious to me that indexing failures from an index being too small are fundamentally different from an index being too large. Getting
v.get(i+1) at the end of a slice and getting
v.get(i-1) at the beginning of the slice don’t seem all that different, for example.
And because indexes are already restricted to
isize::MAX as usize, indexing by
isize only takes one check for in-bounds, the same as
(To pre-emptively avoid a potential misunderstanding: I definitely want
v[-1] to panic, the same as
v[v.len()], because both are off-by-one errors. I do not want
v[-1] to give the last item in the slice.)
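The single-check claim can be sketched roughly as follows, assuming a hypothetical helper `get_signed` and a non-zero-sized element type. A negative `isize` reinterpreted as `usize` lands above `isize::MAX`, which is larger than any legal length for such a slice, so the ordinary upper-bound comparison rejects it too.

```rust
/// Hypothetical helper: `get` with a signed index.
/// Negative values are rejected by the same comparison that rejects
/// too-large values (assumes `T` is not zero-sized).
fn get_signed<T>(v: &[T], i: isize) -> Option<&T> {
    // -1 becomes usize::MAX, -2 becomes usize::MAX - 1, and so on,
    // all of which exceed any possible length of a non-ZST slice.
    v.get(i as usize)
}

fn main() {
    let v = [10, 20, 30];
    println!("{:?}", get_signed(&v, 1));
    println!("{:?}", get_signed(&v, -1));
    println!("{:?}", get_signed(&v, 3));
}
```

Note that `-1` yields `None` here rather than the last element, matching the behaviour asked for above.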
Those seem qualitatively different to me. “too large” can happen with a
usize just as easily, or with a
u8 when indexing a list with 12 items in it. A negative number suggests a more fundamental type/domain issue that may want a condition checking it, and I feel like I’m much more likely to want the compiler to flag an index with an
i32 so that I can say “oops, I meant to change this type to such-and-such”.
Here’s another angle: all values of
u__ indices are potentially valid indices. (Or, well, modulo system limitations.) Whereas any negative value will never be a valid index.
In other words, too big is a runtime issue, too small is easily a static issue.
Thank you, that’s exactly how I’d look at it.
I want to exclude “potentially negative indices” for the same reason I want to exclude “potentially null pointers”.
That may be a matter of programming style or environment. For example, when interfacing with C code, I feel like I have to cast
as usize all the time.
There’s also a sort-of self-reinforcing issue in Rust that non-usize types are annoying. I have many places in my code where I would like to use
u32 or even
u16 for counts and indexes of “small” things, but I don’t, because I know it will cause tons of casts.
I think that having an
Index impl for signed integers which will panic on negative values is a (bug-)safer and more ergonomic option compared to
buf[my_int as usize]. We could improve safety by deprecating
as casts and requiring
buf[my_int.try_into().unwrap()], but it will be even worse from an ergonomic point of view.
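To make the difference between the two spellings concrete, here is a small sketch. The helper name `to_index` is made up for illustration. The `as` cast silently turns `-1` into `usize::MAX`, while the checked conversion reports the negative value as an error.

```rust
use std::convert::TryFrom; // already in the prelude from the 2021 edition on

/// Hypothetical helper: checked conversion of a signed value to an index.
fn to_index(i: i32) -> Option<usize> {
    usize::try_from(i).ok()
}

fn main() {
    let buf = [10u8, 20, 30];
    let my_int: i32 = -1;

    // `as` reinterprets the negative value as a huge unsigned one.
    let wrapped = my_int as usize;
    println!("wrapped = {}, in bounds: {}", wrapped, buf.get(wrapped).is_some());

    // The checked conversion surfaces the problem before indexing.
    println!("checked = {:?}", to_index(my_int));
    println!("checked = {:?}", to_index(2).map(|i| buf[i]));
}
```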
I know it’s been discussed before, and iirc some C/C++ people argued strongly in favor of signed indexing back around when this decision was made in the first place for Rust (that is, unsigned indexing only).
In what use cases would you want to be doing manipulation of signed integers that are then used for indexing? Unsigned have a clear use case: smaller index types.
As I see it, at least in terms of indices,
i__ types are offsets from an arbitrary position in an array, and
u__ are positions in an array, anchored in its start.
Pedantically the range of
usize indices corresponding to negative
isizes are also invalid indices, as well as
I definitely have code in which I am using signed integers (computed via some math) to index into an array. As a concrete example a chess AI I am working on is littered with these casts. I see the domain of acceptable indexes for most arrays as really quite small. I understand the notion that we can definitely rule out all the negative numbers but until the type system can express “nonnegative numbers less than v.len()” the prohibition against signed integer indexes just means extra casts to me, it does not improve the safety of my code.
Signed indices are sometimes used to indicate indexing “forward” from the start of an array, or “backward” from one entry past the end of the array. In such cases valid indices would be
-len..-1. Of course this can be accomplished by an appropriate indexing function that knows both the signed index and the unsigned length of the array.
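Such an indexing function might look something like this. `get_relative` is a hypothetical name, and this is only one possible interpretation: non-negative values count from the start, and negative values from `-len` through `-1` count back from one past the end.

```rust
/// Hypothetical helper: non-negative `i` counts from the start, negative `i`
/// counts back from one past the end, so `-1` is the last element.
fn get_relative<T>(v: &[T], i: isize) -> Option<&T> {
    let idx = if i >= 0 {
        i as usize
    } else {
        // `unsigned_abs` avoids overflow for `isize::MIN`;
        // `checked_sub` yields `None` when `i` is below `-len`.
        v.len().checked_sub(i.unsigned_abs())?
    };
    v.get(idx)
}

fn main() {
    let v = [10, 20, 30];
    println!("{:?}", get_relative(&v, 0));
    println!("{:?}", get_relative(&v, -1));
    println!("{:?}", get_relative(&v, -4));
}
```

The extra branch on the sign is the cost of this interpretation, compared with plain bounds-checked indexing.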
All indices, of whatever form or value, that attempt to access outside the
0..len bounds of the array are invalid when applied to index the array. I don’t see that “extra casts” are involved, but there may be an extra test for non-negative values in the implemented bounds check.
As I said earlier, I would be against that. I’d rather get a panic from an off-by-one error than mysteriously get something from the back of the array instead. (It also means extra branching in indexing, and I wouldn’t want a folk wisdom of “you shouldn’t use signed numbers because they’re slower” to develop if we did end up allowing signed numbers.)
Now, I wouldn’t be against such a thing as a non-default option:
v[std::num::Wrapping(i)] or something don’t seem implausible. (I’m not making a proposal for any of those in core right now, though – I haven’t thought through whether they’re actually good ways of representing the idea.)
Because of LLVM
GEP restrictions, there doesn’t need to be one. The implementation can just
sext to the larger of
isize, then bitcast to the unsigned version and call that. (Both of those operations being essentially free on modern CPUs even if LLVM doesn’t optimize them away.)
I think that whenever you’re applying a signed offset to an index, it’ll be way easier to stay in signed and let
None when you go off either end.
isize and usize can both represent all legal indexes into non-ZSTs (again because of LLVM
GEP restrictions), and correctly checking all overflows when adding an
isize offset to a
usize base index to produce a
usize index is quite a complicated thing to do.
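For what it's worth, newer standard libraries (Rust 1.66 and later) provide `usize::checked_add_signed`, which handles the overflow and underflow cases of the mixed-sign addition. A rough sketch, with `get_offset` as a hypothetical helper name:

```rust
/// Hypothetical helper: applies a signed offset to an unsigned base index.
/// Returns `None` if the addition overflows or underflows, or if the
/// resulting index is out of bounds.
fn get_offset<T>(v: &[T], base: usize, offset: isize) -> Option<&T> {
    // `checked_add_signed` needs Rust 1.66 or later.
    let idx = base.checked_add_signed(offset)?;
    v.get(idx)
}

fn main() {
    let v = [10, 20, 30];
    println!("{:?}", get_offset(&v, 1, -1));
    println!("{:?}", get_offset(&v, 0, -1));
    println!("{:?}", get_offset(&v, 2, 1));
}
```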
Signed indices are sometimes used to indicate indexing “forward” from the start of an array, or “backward” from one entry past the end of the array.
As I said earlier, I would be against that.
I wasn’t proposing that as a default interpretation of indexing, but instead just commenting that an extended-indexing
impl might choose to implement such an algorithm. I personally use a simple index macro whenever I want to do some form of indexing that is not directly built into the Rust language, including indexing by a
uN that is not
usize (i.e., the original impetus of this thread).
Not on all platforms or environments. With some care, you can have a 3-4GB array of
u8 on a 32-bit platform, for instance.
I agree that if we did such a thing we should use a separate type for indexing. That would also make slicing easier.
Some code searches suggest that slices going to
len()-1 seem fairly common, and slices going to
len()-something_else are not unheard-of.
You are not allowed to have a rust slice (or array) that long, though — see https://doc.rust-lang.org/std/slice/fn.from_raw_parts.html#safety
Interesting; I wasn’t aware of that.
Why does that limitation exist?
IIRC that’s inherited from LLVM. Pointer offset uses
isize and LLVM is serious about it.
Personally, I’m not worried about this case.
- Memory access is checked either way. A panic on index 4294967294 is clear enough to a programmer.
- I expect small negative values after the cast to far exceed the practical maximum size, so there’s a slim chance of wrapping around to a valid index. OTOH if the attacker can make the value large enough to wrap around, that would probably work with unsigned math, too.
So I'm in the middle of writing a serialization format which has a binary representation. It made me realize that there's another very concrete problem with general integer widening, which completely busts the "integer widening never results in loss of information" argument.
This serialization format, like many others, stores floating-point numbers using their bit representation (always in little-endian order for portability). For reasons of space efficiency, an
f32 is stored as itself (not converted to an
f64 before being stored), so it is effectively written as a
u32, and similarly, an
f64 is stored as a
u64 bit pattern.
In the deserializer, when I'm converting back from a bit pattern to a floating-point number, I'm using
f64::from_bits. Here is the problem. If I accidentally passed a
u32 to
f64::from_bits, and it implicitly got widened, then the 4 most significant bytes of the
f64 would become all zeroes, compromising the value. This bug is impossible when there is no implicit integer widening.
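A small sketch of that failure mode. The function names are made up, and the widening is written out explicitly with `u64::from` because Rust does not do it implicitly today. The bit pattern of `1.5f32`, once zero-extended and reinterpreted as an `f64`, decodes to a tiny positive number rather than 1.5.

```rust
/// Correct decoding: a 32-bit pattern is read back as an `f32`.
fn decode_f32(bits: u32) -> f32 {
    f32::from_bits(bits)
}

/// The bug described above, spelled out: the 32-bit pattern is zero-extended
/// to 64 bits and then reinterpreted as an `f64`.
fn decode_widened(bits: u32) -> f64 {
    f64::from_bits(u64::from(bits))
}

fn main() {
    let stored: u32 = 1.5_f32.to_bits();
    println!("correct: {}", decode_f32(stored));
    println!("widened: {:e}", decode_widened(stored));
}
```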
Similar issues can occur when dealing with integers, too, although a deserializer spitting out 8 times as much data as intended (because I accidentally widened a
u8 to a
u64) is annoying but less likely to cause an actual, "logical" correctness bug. It might still be a nasty performance bug, though, when large amounts of bytes are being deserialized.