Subscripts and sizes should be signed

Well, on one level, yes, most abstract algorithms can be written equally fluently whether or not index variables can represent negative numbers ... precisely because most abstract algorithms do not need index variables to be able to represent negative numbers.

On another level: No, it does matter. It goes more with the grain of Rust, or C, or ... every other language I can think of, that has unsigned numeric types at all ... to use unsigned index types, and that is because programming language designers choose to make invalid states unrepresentable in this instance.

There's a practical reason for that: nobody presently makes a computer with more than 2^63 bytes of RAM or disk. Back in the days when computers typically had 32-bit pointers and 3 or 4 gigabytes of RAM, programs that didn't handle more than 2GB in one array correctly were a regular headache at the job I had at the time (which was making custom modifications to GCC, and sometimes the requested modification was "make it magically fix the bugs in our program").

If you cannot share the project, or at least the abstract algorithm, then I don't think you will be able to convince anyone here that your proposed change would be worthwhile, even if backward compatibility didn't make it impossible.

2 Likes

You did not answer the most important part of my response. What is your desired outcome of this thread? Please be specific, indicating every step that is necessary, including the possible breaking changes and implications of them.

1 Like

How do unsigned operations imply "no type safety"? They are just number operations.

It is a useless theoretical concept when you need some math. It is fine for enums, when you need just a unique id and nothing more, but it's a problem when you need math. Some comments above, I provided an example with a 256x256 chessboard. The most useful type for it is isize. If you use u8, you just create problems with conversions on every small math operation.

If you need more than 2**31 elements in an array, you need a 64-bit machine. Using such big arrays on a 32-bit machine is a very rare case. If the extra bit is important to you, you are doing something wrong, because tomorrow you will need one more extra bit.

I provided some small examples where an unsigned index is a problem, and I did not see opposite examples. Theoretical words like "compile-time invariants"? Leave them for hipsters; I don't believe in these words when I really see, in all cases, that they only create more conversion boilerplate.

I think I must attract some attention to this problem, because it looks like most of Rust's community still doesn't know that checking a signed index needs only one compare instruction. I must tell them how to do it. And a lot of people just repeat theoretical words about bounded types etc., and they think it is a useful concept because its name sounds cool. I must tell them that it is false.

It is just a number operation which happens to violate semantic type safety. If your 32 bits represent a signed number, you typically don't want to perform unsigned division on it, and vice-versa, if your number is unsigned then you typically don't want to perform signed division. What type safety means is that the type system prevents you from accidentally using the wrong operation.

For the same reason Java does have separate types for int and float even though both are sequences of 32 bits and floating point division is also "just an operation".

1 Like

We need type safety when we must distinguish void * from struct S *, and C++ templates were a right move from pure C. But if we talk about type safety between integer types, there are only some corner cases producing unexpected results; in the common case it is not a problem.

How can you "accidentally" call an unsigned function if it has another name?

Because there are concrete cases where implicit conversion creates an unexpected result, like float f = 1/3, but it is not the only way to protect the programmer. In Pascal, for example, the result of float f = 1/3 is 0.33(3), and if you need integer division, you use the div operator. And nobody accidentally confuses / with div.

This is a strange thread because the type of an array index in Rust is not going to change, no matter the merits of signed vs unsigned integers. There are a few important points:

  1. When you do arithmetic on indices, intermediate results might take you outside the range of valid indices for the arrays in question. If you want the type system to constrain indices you need to handle this somehow, perhaps with conversions.
  2. The address space on x86 is much smaller than 64 bits. There are a few pointer formats, but I think the largest (five-level paging) "only" lets you address a 57-bit space.
  3. Unsigned arithmetic is fraught with difficulties because wrapping is common. For instance, (x - y) % 3 == 0 looks like it should test if x and y are equal mod 3, but with wrapping unsigned arithmetic it is true for y == x + 1. Signed arithmetic can overflow but the values that cause overflow are usually far away from the ones we encounter in a typical program.
  4. If efficiency isn't a concern, or you can't prove that overflow doesn't occur, and you need your results to be correct, you should use arbitrary-precision integers.
  5. Whenever possible in Rust we should avoid indexing arrays in favor of using iterators.

My point was that doing unsigned arithmetic in Java via functions like

int divideUnsigned(int, int);
int remainderUnsigned(int, int);
int compareUnsigned(int, int);
String toUnsignedString(int);

etc is an error-prone hack that exists only because Java lacks a separate unsigned type.

The analogy for float was: imagine there is no separate float type, and you store floating point numbers in ints, and operate on them using functions like:

int addFloat(int, int);
int multiplyFloat(int, int);
int divideFloat(int, int);
String toFloatString(int);
3 Likes

It depends on real cases. We need float very often, whenever we write a physics library. But when do we really need unsigned? FFI only, I think. Or low-level processing of image colors, when you have a u32 color with u8 components. Practice is the criterion of truth.

Really? So since Rust uses usize for indexing in practice, wouldn't this criterion indicate that usize must definitely be the ultimate best type for indexing?

3 Likes

Unsigned integers are critical for working with bit streams, implementing arbitrary precision arithmetic, and other applications. And we need to be able to do arithmetic with unsigned integers in these applications (e.g. x & -x to extract the least bit set).

There are no cases where usize is more useful than isize. But there are some opposite cases. Practice tells us that the decision to make indices unsigned was wrong: it creates unexpected panics when you need a negative result from a calculation and forget to write the conversion boilerplate.

Agree, it is a special case where we work not with the value but with the bit representation.

There are certainly some benefits, and people have mentioned some in this thread, so just repeating this is counterproductive.

Here are two off the top of my head:

  • n < len is simpler and shorter code than signed alternatives (this is not about performance)
  • Some functions such as Iterator::take would need to specify and implement some special case behavior with negative arguments.

So to be clear, your desired outcome is a discussion but no actual change? This thread should be closed in that case, as it is unactionable.

4 Likes

Why not v.is_valid(index) ?

No, discussion is only the first step.
The actual change I want is to enable isize indexing for built-in slices.
But it looks like I am the only person in this thread who sees the problem in the thousands of .try_into().unwrap() calls, which are really not necessary.

I remember a real example: a 2D image processing library. Which data type do we need for image sizes? I selected usize to be more consistent with other Rust libraries. I could have chosen isize, but I don't like the situation where each library has its own type for sizes: it is a zoo of types which produces a lot of useless conversions.

For example, there is a library for reading and writing bmp files; the author used u32, and I asked him to use usize: https://github.com/sondrele/rust-bmp/issues/31. Why did the author select u32? Maybe he wanted to specify that bmp width and height cannot be bigger than 2**32 pixels, but what advantage does the user get? Only lots of casting boilerplate. Who would suffer if he chose the more conventional usize type? Nobody. It's just a religion to write useless conversions for the sake of abstract ideas which, in this case, give no profit. A custom type for each library is a useless, terrible idea. A universal integer type is good and useful.

After using usize for the image size, I tried to write a method which copies a sprite with an offset, and the offset can of course be negative. So I have an interface with different types for size and offset. And when I need to mix both of them in some calculation, I need to cast both of them to isize. It would be much more useful to have the same isize type for size and offset. But that would not be consistent with other Rust code. It would have been much better if Rust had used isize for sizes and indices from the beginning.

  1. n < len is shorter.
  2. n < len makes it more obvious what the relationship is between my variables.
  3. v.is_valid(index) would require me to first create a vector of length len, which I might not want to do.
  4. There is no is_valid method on Vec.

Several people (including me and @CAD97) in the first few responses to this thread agreed that this would be desirable and explained why it hasn't happened yet. You seem to ignore these responses.

2 Likes

No, I did not. The main point was the large number of ambiguous situations. By Rust's rules, in such cases it tries to apply i32. We could add a next step: if i32 cannot be applied, try to apply isize.

If you need a range check (not a number comparison), maybe using Range::contains would be more suitable?

n < len is still simpler, shorter, and more obvious than (0..len).contains(&n). Additionally, the remaining case is much clearer: if it is not true that n < len, then n >= len. With signed ranges it's more complicated.

Can you please lay out the steps necessary to accomplish your ideal scenario? A previous response of mine specifically notes significant concerns for what you stated in the original post.

I am trying to be as reasonable as possible here, but even after asking quite clearly, I have no idea how you intend to make this work. I am completely setting aside if it should happen. Assume that you get what you want. How will it happen? Please be specific.

1 Like

You can avoid writing "thousands of .try_into().unwrap()" by putting it in a macro or trait. For example, see Rust Playground.

1 Like

Do you mean the radical scenario, or a realistic compromise?

a.at(i) instead of a[i]? Square brackets have an important advantage, implicit dereferencing, so I would need to write *a.at(i) * *a.at(j) instead of a[i]*a[j]