[Pre-RFC] Implicit number type widening

RalfJung · July 13, 2019, 9:46am

Not sure where you got the idea that index would return an Option. Just like now, it would do .get(...).unwrap().

I think you are vastly overestimating how common that pattern is. I don't recall when I had to cast to usize for indexing the last time.

I still agree this is a worthwhile change, though.

scottmcm · July 14, 2019, 12:33am

I'm not sure whether I want indexing by signed numbers (though that'd be the easiest way to make the change work without compiler work), but can you elaborate? The thrust of the thread here has seemed to me as being about how indexing taking more types would be ok since it's already partial, and it's non-obvious to me that indexing failures from an index being too small are fundamentally different from an index being too large. Getting None from v.get(i+1) at the end of a slice and getting None from v.get(i-1) at the beginning of the slice don't seem all that different, for example.

And because indexes are already restricted to isize::MAX as usize, indexing by isize only takes one check for in-bounds, the same as usize.

(To pre-emptively avoid a potential misunderstanding: I definitely want v[-1] to panic as an off-by-one error, the same as v[v.len()] does. I do not want v[-1] to give the last item in the slice.)
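
The single-check claim can be sketched roughly as follows. This is only an illustration, not a proposed API: `get_signed` is a hypothetical helper name, and the reasoning assumes a non-zero-sized element type, so that the slice length cannot exceed isize::MAX. A negative isize reinterpreted as usize is larger than isize::MAX, so the ordinary upper-bound check rejects it too.

```rust
/// Hypothetical helper (not part of std): look up an element by a signed index.
/// A negative `i` becomes a value above `isize::MAX` after the cast, so the
/// usual upper-bound check inside `get` rejects it as well.
fn get_signed<T>(v: &[T], i: isize) -> Option<&T> {
    v.get(i as usize)
}
```

With this helper, stepping off the front of a slice and stepping off the back both produce None through the same comparison.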

josh · July 14, 2019, 6:26pm

Those seem qualitatively different to me. "too large" can happen with a usize just as easily, or with a u8 when indexing a list with 12 items in it. A negative number suggests a more fundamental type/domain issue that may want a condition checking it, and I feel like I'm much more likely to want the compiler to flag an index with an i32 so that I can say "oops, I meant to change this type to such-and-such".

CAD97 · July 14, 2019, 6:36pm

Here's another angle: all values of u__ indices are potentially valid indices. (Or, well, modulo system limitations.) Whereas any negative value will never be a valid index.

In other words, too big is a runtime issue, too small is easily a static issue.

josh · July 14, 2019, 7:14pm

Thank you, that’s exactly how I’d look at it.

I want to exclude “potentially negative indices” for the same reason I want to exclude “potentially null pointers”.

kornel · July 14, 2019, 8:10pm

That may be a matter of programming style or environment. For example, when interfacing with C code, I feel like I have to cast as usize all the time.

There's also a sort-of self-reinforcing issue in Rust that non-usize types are annoying. I have many places in my code where I would like to use u32 or even u16 for counts and indexes of "small" things, but I don't, because I know it will cause tons of casts.

newpavlov · July 14, 2019, 8:35pm

I think that having an Index impl for signed integers which panics on negative values is a (bug-)safer and more ergonomic option compared to buf[my_int as usize]. We could improve safety by deprecating as casts and requiring buf[my_int.try_into().unwrap()], but that would be even worse from an ergonomic point of view.
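
To make the comparison concrete, here is a small sketch of the two styles. The function names are hypothetical and chosen only for illustration. The cast version relies on `as`, which silently reinterprets negative values and truncates wide values on targets where usize is narrower than the source type; the conversion version makes the failure explicit.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: cast-based lookup. `as` never fails; it
/// reinterprets negative values and may truncate on narrow targets.
fn get_with_cast(buf: &[u8], i: i64) -> Option<&u8> {
    buf.get(i as usize)
}

/// Hypothetical helper: conversion-based lookup. An index that does not
/// fit in `usize` is rejected before the bounds check runs.
fn get_with_try_from(buf: &[u8], i: i64) -> Option<&u8> {
    usize::try_from(i).ok().and_then(|idx| buf.get(idx))
}
```

For small negative values both versions end up rejecting the index, but only the second one does so because the value was recognised as invalid rather than because it happened to wrap to something out of range.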

CAD97 · July 14, 2019, 9:20pm

I know it’s been discussed before, and iirc some C/C++ people argued strongly in favor of signed indexing back around when this decision was made in the first place for Rust (that is, unsigned indexing only).

In what use cases would you want to be doing manipulation of signed integers that are then used for indexing? Unsigned have a clear use case: smaller index types.

As I see it, at least in terms of indices, i__ types are offsets from an arbitrary position in an array, and u__ are positions in an array, anchored in its start.

toc · July 15, 2019, 3:21am

Pedantically, the usize indices corresponding to negative isizes are also invalid indices, as is isize::MAX.

I definitely have code in which I am using signed integers (computed via some math) to index into an array. As a concrete example, a chess AI I am working on is littered with these casts. I see the domain of acceptable indexes for most arrays as really quite small. I understand the notion that we can definitely rule out all the negative numbers, but until the type system can express "nonnegative numbers less than v.len()", the prohibition against signed integer indexes just means extra casts to me; it does not improve the safety of my code.

Tom-Phinney · July 15, 2019, 3:36am

Signed indices are sometimes used to indicate indexing “forward” from the start of an array, or “backward” from one entry past the end of the array. In such cases valid indices would be 0..len and -len..-1. Of course this can be accomplished by an appropriate indexing function that knows both the signed index and the unsigned length of the array.

All indices, of whatever form or value, that attempt to access outside the 0..len bounds of the array are invalid when applied to index the array. I don’t see that “extra casts” are involved, but there may be an extra test for non-negative values in the implemented bounds check.
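
One possible shape for such an indexing function is sketched below. The name `get_from_either_end` is hypothetical, and this is only one way to write it: non-negative indices count from the start, negative indices count back from one past the end, and anything else yields None.

```rust
/// Hypothetical helper: non-negative `i` counts from the start of the
/// slice, negative `i` counts back from one past the end.
fn get_from_either_end<T>(v: &[T], i: isize) -> Option<&T> {
    if i >= 0 {
        v.get(i as usize)
    } else {
        // `unsigned_abs` is well defined even for `isize::MIN`.
        let back = i.unsigned_abs();
        // `checked_sub` returns `None` when `back` exceeds the length.
        v.len().checked_sub(back).and_then(|j| v.get(j))
    }
}
```

The extra branch on the sign of the index is the "extra test for non-negative values" mentioned above.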

scottmcm · July 15, 2019, 4:31am

As I said earlier, I would be against that. I'd rather get a panic from an off-by-one error than mysteriously get something from the back of the array instead. (It also means extra branching in indexing, and I wouldn't want a folk wisdom of "you shouldn't use signed numbers because they're slower" to develop if we did end up allowing signed numbers.)

Now, I wouldn't be against such a thing as a non-default option: v[std::cmp::Reverse(i)] or v.rev()[i] or v.cycle()[i] or v[std::num::Wrapping(i)] or something don't seem implausible. (I'm not making a proposal for any of those in core right now, though -- I haven't thought through whether they're actually good ways of representing the idea.)

Because of LLVM GEP restrictions, there doesn't need to be one. The implementation can just sext to the larger of iN and isize, then bitcast to the unsigned version and call that. (Both of those operations being essentially free on modern CPUs even if LLVM doesn't optimize them away.)

I think that whenever you're applying a signed offset to an index, it'll be way easier to stay in signed and let .get(i+d) return None when you go off either end. isize and usize can both represent all legal indexes into non-ZSTs (again because of LLVM GEP restrictions), and correctly checking all overflows when adding an isize offset to a usize base index to produce a usize index is quite a complicated thing to do.
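
The two approaches can be put side by side in a rough sketch. Both function names are hypothetical, and the signed version assumes a non-zero-sized element type so that every legal index fits in isize. The first function does the mixed-sign arithmetic with explicit overflow checks; the second stays in isize and leaves the rest to one bounds check.

```rust
/// Hypothetical helper: apply a signed offset to an unsigned base index,
/// returning `None` if the result would underflow or overflow `usize`.
fn offset_index(base: usize, d: isize) -> Option<usize> {
    if d >= 0 {
        base.checked_add(d as usize)
    } else {
        base.checked_sub(d.unsigned_abs())
    }
}

/// Hypothetical helper: stay in signed arithmetic. One checked add, then
/// the cast sends any negative result above every legal slice length.
fn get_offset<T>(v: &[T], i: isize, d: isize) -> Option<&T> {
    i.checked_add(d).and_then(|j| v.get(j as usize))
}
```

In the first version the caller still has to bounds-check the returned usize against the slice; in the second, going off either end of the slice falls out of the same `get` call.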

Tom-Phinney · July 15, 2019, 4:38am

Tom-Phinney: "Signed indices are sometimes used to indicate indexing “forward” from the start of an array, or “backward” from one entry past the end of the array."

scottmcm: "As I said earlier, I would be against that."

I wasn't proposing that as a default interpretation of indexing, but instead just commenting that an extended-indexing impl might choose to implement such an algorithm. I personally use a simple index macro whenever I want to do some form of indexing that is not directly built into the Rust language, including indexing by a uN that is not usize (i.e., the original impetus of this thread).
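
The macro itself is not shown in the thread, so the following is only a guess at what such a macro might look like. The name `idx!` and its behaviour (panic when the index does not fit in usize) are assumptions made for illustration.

```rust
/// Hypothetical macro: index a slice or array with any integer type that
/// can be converted to `usize`, panicking if the conversion fails.
macro_rules! idx {
    ($v:expr, $i:expr) => {
        $v[<usize as ::std::convert::TryFrom<_>>::try_from($i)
            .expect("index does not fit in usize")]
    };
}
```

With this in scope, `idx!(table, n)` works for `n` of type u8, u32, or i64 alike, and an out-of-range index still panics through the normal bounds check.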

josh · July 15, 2019, 4:44pm

Not on all platforms or environments. With some care, you can have a 3-4GB array of u8 on a 32-bit platform, for instance.

josh · July 15, 2019, 4:49pm

I agree that if we did such a thing we should use a separate type for indexing. That would also make slicing easier.

Some code searches suggest that slices going to len()-1 seem fairly common, and slices going to len()-something_else are not unheard-of.

scottmcm · July 15, 2019, 9:05pm

You are not allowed to have a Rust slice (or array) that long, though; see the documentation for from_raw_parts in std::slice.
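
As a rough illustration of that rule, the check below tests whether a given element count would stay within the isize::MAX-byte limit documented for from_raw_parts. The function name `fits_slice_limit` is hypothetical; nothing like it is needed in ordinary code, since safe Rust cannot construct an oversized slice in the first place.

```rust
use std::mem::size_of;

/// Hypothetical check: would `len` elements of `T` respect the
/// `isize::MAX`-byte limit on the total size of a slice?
fn fits_slice_limit<T>(len: usize) -> bool {
    match len.checked_mul(size_of::<T>()) {
        Some(bytes) => bytes <= isize::MAX as usize,
        // The byte count does not even fit in `usize`.
        None => false,
    }
}
```

For u8 the limit on the element count is exactly isize::MAX, which is why a 3-4GB byte array on a 32-bit target cannot be viewed as a single slice.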

josh · July 15, 2019, 9:07pm

Interesting; I wasn’t aware of that.

Why does that limitation exist?

kornel · July 15, 2019, 11:00pm

IIRC that’s inherited from LLVM. Pointer offset uses isize and LLVM is serious about it.

kornel · July 15, 2019, 11:10pm

Personally, I'm not worried about this case.

Memory access is checked either way. A panic on index 4294967294 is clear enough to a programmer.
I expect small negative values after the cast to far exceed the practical maximum size, so there's a slim chance of wrapping around to a valid index. OTOH if the attacker can make the value large enough to wrap around, that would probably work with unsigned math, too.
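
A minimal sketch of the cast behaviour being described, using hypothetical helper names: a small negative i32 reinterpreted as u32 lands near the top of the unsigned range, far beyond the length of any ordinary slice, so the bounds check still rejects it.

```rust
/// Hypothetical helper: reinterpret an `i32` as an unsigned 32-bit value.
fn as_u32_index(i: i32) -> u32 {
    i as u32
}

/// Hypothetical helper: index with a cast; the bounds check in `get`
/// still applies to whatever value the cast produced.
fn get_after_cast(v: &[u8], i: i32) -> Option<&u8> {
    v.get(i as usize)
}
```

Here `as_u32_index(-2)` is 4294967294, the index that shows up in the panic message mentioned above.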

H2CO3 · October 4, 2019, 3:18pm

So I'm in the middle of writing a serialization format which has a binary representation. It made me realize that there's another very concrete problem with general integer widening, which completely busts the "integer widening never results in loss of information" argument.

This serialization format, like many others, stores floating-point numbers using their bit representation (always in little-endian order for portability). For reasons of space efficiency, an f32 is stored as itself (not converted to an f64 before being stored), so it is effectively written as a u32, and similarly, an f64 is stored as a u64 bit pattern.

In the deserializer, when I'm converting back from a bit pattern to a floating-point number, I'm using f32::from_bits and f64::from_bits. Here is the problem. If I accidentally passed a u32 to f64::from_bits, and it implicitly got widened, then the 4 most significant bytes of the f64 would become all zeroes, compromising the value. This bug is impossible when there is no implicit integer widening.
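
The failure mode can be reproduced today by writing the widening out explicitly. In the sketch below the function names are hypothetical; the second function stands in for what an implicit u32-to-u64 widening would do silently.

```rust
/// Correct decoding: a 32-bit pattern goes through `f32::from_bits`.
fn decode_f32(bits: u32) -> f32 {
    f32::from_bits(bits)
}

/// The bug, written with an explicit widening so that it compiles today:
/// the upper 4 bytes of the `f64` bit pattern are all zero.
fn decode_f32_bits_as_f64(bits: u32) -> f64 {
    f64::from_bits(u64::from(bits))
}
```

Feeding the bit pattern of 1.5f32 to the second function does not yield 1.5; it yields a tiny subnormal f64, because the exponent bits of the f64 are all zero.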

Similar issues can occur when dealing with integers, too, although a deserializer spitting out 8 times as much data as intended (because I accidentally widened a u8 to a u64) is annoying but less likely to cause an actual, "logical" correctness bug. It might still be a nasty performance bug, though, when large amounts of bytes are being deserialized.

Tom-Phinney · October 4, 2019, 3:41pm

I view this as a generic hazard of transmuting between representations, which is what is happening here: f32 transmuted to u32 and vice versa, and likewise between f64 and u64. The same issue can occur if an i32 is transmuted to a u32 and then widened to a u64 before being transmuted back to an i64.
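
The integer round trip mentioned here can be sketched in the same way. The function names are hypothetical; the point is only that widening while the value is in its unsigned form zero-extends, whereas widening the signed value directly sign-extends.

```rust
/// Round trip with the widening in the middle: i32 -> u32 -> u64 -> i64.
/// The widening step zero-extends, so the sign is lost.
fn widen_via_unsigned(x: i32) -> i64 {
    x as u32 as u64 as i64
}

/// Direct widening: i32 -> i64 sign-extends and preserves the value.
fn widen_signed(x: i32) -> i64 {
    i64::from(x)
}
```

For -1 the first function returns 4294967295 while the second returns -1; for non-negative inputs the two agree.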
