Subscripts and sizes should be signed (redux)

Moderator note: split from Allow use as and try as for From and TryFrom traits

There may have been a better moment to settle on basic guidelines, back when the language was being designed, but that time has passed.

There should be only one "default" integer type, usable in most cases and recommended everywhere, with other types reserved for cases where the bit representation matters. But in real Rust we have a terrible zoo of different types. The standard slice uses usize for its length and indices, the bmp crate uses u32 (and there is no good reason to use that type instead of usize: u32 as index is incompatible with rust vectors · Issue #31 · sondrele/rust-bmp · GitHub), and if you are writing an image processing library, you should use signed numbers to represent negative values.

Rust's developers seem not to have heard of the Stroustrup paper https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1428r0.pdf, or of the hard-won experience of C++ programmers (never use unsigned numbers unless you really need the bit representation and bitwise operations), and they still use dangerous unsigned types that can make your program crash on a simple subtraction. Some Rust programmers use narrowed types and think they are using a "domain-oriented type", but Rust does not support domain-oriented types such as "[0..256) of isize". Instead, Rust has "types that create problems", like u8, which should be used only for FFI but are used everywhere, creating problems for developers along with a lot of wrap-and-cast boilerplate.

Some of the arguments in that paper don't quite apply to Rust. Both signed and unsigned arithmetic (by default, other options are available) detect wraparound in debug builds and are modular in release builds. Pointer offsets aren't usually relevant to safe code and slice indexing prevents out-of-bounds access in any direction.
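A minimal sketch of the behavior described above, using the explicit `wrapping_*` and `overflowing_*` methods from the standard library (these make the release-mode modular behavior visible regardless of build profile):

```rust
fn main() {
    // Release builds wrap (modular arithmetic); debug builds panic instead.
    // The wrapping_* and overflowing_* methods make the behavior explicit:
    assert_eq!(255u8.wrapping_add(1), 0);
    assert_eq!(i8::MAX.wrapping_add(1), i8::MIN);
    // overflowing_* also reports whether a wrap occurred:
    assert_eq!(255u8.overflowing_add(1), (0, true));
}
```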

Libraries not agreeing though is a pain point.

3 Likes

Previous discussion about this.

1 Like

And slice indexing catches index overflow and underflow in different ways. If you go out of bounds in the "overflow" direction, you can use the get method. If you go out of bounds in the negative direction, you have to rewrite the formula, write extra casting code, add an extra check, or write some other boilerplate. An unsigned index just creates a lot of extra trouble when you need index arithmetic, and it has no advantages. So if you need a lot of integer casts in a typical Rust program, that is not a problem with the program's architecture. It is a problem with the design of the language and its codebase. If the user code in every library used only one "universal" int by default (except for FFI), there would be no casting problems.

Yes, an immediate failure is better than a wrong result. But a correct result is better than an immediate failure.

A Rust program will fail when it tries to compute a - b with a < b for unsigned a and b. Unsigned subtraction is a dangerous operation, and the language design provokes you into using it. With signed indices, the operation a - b is valid, and the program will work and output the correct result.
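A small sketch of the contrast being described: unsigned subtraction fails for a < b (shown here via `checked_sub`, since the plain `-` panics only in debug builds), while signed subtraction just produces a negative number:

```rust
fn main() {
    let (a, b) = (3usize, 5usize);
    // In a debug build `a - b` panics with "attempt to subtract with overflow".
    // checked_sub surfaces the same failure as a value:
    assert_eq!(a.checked_sub(b), None);

    // With signed integers the same subtraction is simply valid:
    let (a, b) = (3isize, 5isize);
    assert_eq!(a - b, -2);
}
```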

This is disingenuous of you, at best. You brought that paper up last year (see Subscripts and sizes should be signed - #16 by scottmcm) so you really can't say that we "have not heard of [it]".

You might not like the answers from the previous discussions, but that's very different from saying Rust developers "haven't heard" the C++ experience. A great many of the original Rust designers and current Rust users come from C++ and have that experience directly.

Just because something is the right choice for C++ doesn't make it right for Rust.

13 Likes

The problems are common to both languages. The only difference: C++ outputs a wrong result, whereas Rust fails immediately when you try to subtract a bigger number from a smaller one. Back to the topic: Rust's design provokes a lot of integer casting, because the designers initially failed to agree on which integer type should be the main one, so each library uses its own integer type to represent a count or coordinate of something.

That's not the "only" difference.

For example, -1 < 10u is false in C++, and latest gcc, by default, doesn't even warn about it.

Rust doesn't have such a misfeature, and thus there's no need to avoid unsigned numbers to avoid that misfeature.


Really, if anything C++ needs the fail-on-overflow more than Rust does, because if the subtraction in a[i - j] overflows (in unsigned) or just becomes negative (in signed), then that code is UB.

At least in Rust, if you're in release mode and the a[i - j] subtraction wraps, it'll be caught by the in-bounds check.
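A sketch of that release-mode safety net: `wrapping_sub` reproduces what the plain `-` does in release builds, and the resulting huge index is rejected by the bounds check instead of reading out-of-bounds memory:

```rust
fn main() {
    let a = [10, 20, 30];
    let (i, j) = (1usize, 2usize);
    // In release mode the plain `i - j` wraps around to usize::MAX;
    // wrapping_sub makes that explicit. The wrapped value is then
    // caught by the slice's in-bounds check rather than reading memory.
    let wrapped = i.wrapping_sub(j);
    assert_eq!(wrapped, usize::MAX);
    assert_eq!(a.get(wrapped), None);
}
```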

3 Likes

Combining signed and unsigned numbers is a problem. In C++ you spend time debugging and then write a cast; in Rust the compiler tells you about the problem, and you write the cast without spending time on debugging. The Rust approach is better, yes, it is. But the existence of different integer types in user code (outside FFI) comes from the language design lacking a "default recommended int". The integer type system was designed badly.

I can use the checked operation.
If I want to write a.get(i - j), signed indices are much more useful.
Also, I can use wrapping_sub, but it is a quirk. We have a situation where the more intuitive code is erroneous, and the compiler cannot prevent us from subtracting unsigned numbers. In a signed-indices world we would not have this quirk.
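For concreteness, here is the "quirk" spelling next to a more explicit alternative using `checked_sub`, which handles i < j without relying on wraparound:

```rust
fn main() {
    let a = [1, 2, 3];

    // The "quirk": wrap on underflow, then let the bounds check catch it.
    let (i, j) = (1usize, 2usize);
    assert_eq!(a.get(i.wrapping_sub(j)), None);

    // A more explicit spelling that handles i < j without wrapping:
    assert_eq!(i.checked_sub(j).and_then(|k| a.get(k)), None);

    // When i >= j, both spellings return the element normally:
    let (i, j) = (2usize, 1usize);
    assert_eq!(i.checked_sub(j).and_then(|k| a.get(k)), Some(&2));
}
```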

I can't agree with this. The question is not about the main integer type, but about the way types are reinterpreted. The only difference is that integer conversion is the most commonly used reinterpretation.


As for the main integer type, my opinion is that any choice will be wrong. Since OS and hardware development continues, we can't say that after several years (no matter whether 10, 100 or even 1000) we will have the same conditions.

Even now some systems have different word sizes, i.e. the word size on Linux depends on the compilation target. Windows has fixed variants, and we have situations like Windows x32 and x64, where the suffix hints at the word size, but internally we still have WORD, DWORD and QWORD, where WORD is 16 bits.

So most array implementations size themselves by the word size, as the most efficient variant. Rust uses the default word size of the current machine as a compromise for array indices. Maybe a better solution for this issue would be a special type Index<T = u8 | u16 | u32 | u64> which explicitly limits the array to its internal T::MAX size (something like this was mentioned in the links above and related threads that I've checked).


Now I want to talk about the subtraction operation. Rust already has an elegant solution: the checked_sub function, as a checked variant of the - operator.

I think that many programmers can't accept, or don't agree with, the limitation of memory sizes. We would need an Integer.bytes + 1 type as the result of any operation on integer types like u8, i32 and the others, to match real mathematics, where we can't have over- and underflow due to ...

I strongly believe that programmers don't need to be mathematicians (like I became this summer), so they shouldn't have to always keep in mind things like ring theory.

As I mentioned above, with checked subtraction we can avoid questions like "What should the program do?" and explicitly remove the case that leads us to this.


Okay, let's continue... I think you are standing up for C++ itself, rather than for things like "the approach of language X as the solution in case A". I'm just tired of UB; I don't want to rely on the compiler's interpretation when it resolves UB.

I like Python, so when I see things like array[i - j] I expect to get the value at the needed index from the right. But what must your signed index return in this case? An index from the right, like Python does?

In that case we have undesirable behavior that leads to bugs in many cases (in contrast to length - ..., where we explicitly tell the reader that we need an index from the right). Or do you prefer UB in this case? I dunno.


I hope you read the whole message and understand what I'm trying to say.

If you think that using a different operator/function in C++ is a great answer, but using a different operator/function in Rust is "a quirk", then we'll probably just not agree on things.

I'm not going to reply further here on this sub-topic.

6 Likes

isize is OK. If you need more than 2^31 elements on a 32-bit machine, you are doing something wrong. On a 64-bit machine the 2^63 limit is enough for everyone. Rust on 16-bit machines - does that exist? Let's be realistic and think about existing platforms.

Omg, why did you use these dangerous, error-prone unsigned types?

If you need to write a special operator, you selected the wrong type. In a signed-index world you don't need special methods. Native subtraction works well, without quirks.

A negative index should throw a panic, of course. If you write a.get(i - j), a negative index should return None. In a signed world you can check i - j < 0 and i - j >= a.len() together, which is very useful in some cases. In an unsigned world you have to make the extra check i < j, and only after that can you call a.get(i - j).
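A sketch of the two styles being contrasted, with the extra pre-check required in the unsigned world versus a single combined range check in the signed world:

```rust
fn main() {
    let a = [10, 20, 30];

    // Unsigned world: an extra `i < j` check is required before subtracting.
    let (i, j) = (1usize, 2usize);
    let r = if i < j { None } else { a.get(i - j) };
    assert_eq!(r, None);

    // Signed world: compute freely, then make one combined range check.
    let (i, j) = (1isize, 2isize);
    let k = i - j; // -1, perfectly valid as an isize
    let r = if k < 0 || k >= a.len() as isize {
        None
    } else {
        Some(&a[k as usize])
    };
    assert_eq!(r, None);
}
```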

You did not understand me. In C++ there is no great answer either. In Rust you can choose [] or .get, but neither of them helps if you want to write a.get(i - j). You need to write a quirk, a.get(i.wrapping_sub(j)), and you need to remember not to forget the i < j case in every place where you use unsigned subtraction, because the compiler will not warn you about the dangerous operation. Signed numbers are not so error-prone.

I'm sorry to go on so long, but I can't stay silent when somebody says "if you need a lot of integer casts, you need to rework your architecture". Rust codebases are full of strange integer type choices, and the reasons are:

  1. The wrong belief that overflow/underflow checking will save us from the C++ unsigned type problems.
  2. The wrong belief that u8 is a domain-oriented type (Rust has no domain-oriented types like "[0..256) of isize").
  3. A missing guideline like "always use isize as the integer type in user code, except where the machine representation really matters".

Don't tell us that our architecture is wrong. Rust designers: tons of integer conversion boilerplate is a thing created by you and only you.

Nonsense, you can already perform the check with checked_sub.

u16 is a possible size on an embedded controller, so you're wrong.

u* is not error-prone, as I described above. Plus, if you read what you wrote, you can see the following words:

Negative index should throw a panic, of course

What's the difference? You already panic on -, so you can use the checked variant, which will force you to check or explicitly unwrap.


Don't tell us that our architecture is wrong. Rust designers: tons of integer conversion boilerplate is a thing created by you and only you.

Because we are trying to resolve issues, not just walk around them.


This is too far from the original topic, so now I'm thinking about removing this post.

Are we talking about real Rust compilers, or about theory?

A simple minus is more intuitive but makes the program fail. And the compiler doesn't prevent us from these failures! And you advise me to remember some anti-intuitive rules. That's the truth of it: "You always need to remember, every time you want to compute the unsigned subtraction x - y, that x can be less than y." All these "you always need to remember and keep in mind" rules tell us that the design is wrong. And all progress in programming languages consists of removing these "you always need to remember and keep in mind" rules. C freed us from "you always need to remember that this register is a pointer to struct S, but that register is an int"; C++ freed us from "you always need to remember that after each malloc() you must manually call free()"; Rust freed us from "you always need to remember that while you hold a reference to a vector element you must not reallocate the vector". So why did Rust's language designers introduce this unsigned-number pitfall for programmers?

The difference is that it moves all the panics to one place. If I want to calculate an index and validate it only after that, I can do it with signed numbers. But when I use unsigned numbers, I cannot "just calculate the index"! I have to smear the index check across the stages of its calculation.
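A sketch of "calculate first, validate once" with a hypothetical multi-step index computation whose intermediate value goes negative (it would underflow as usize):

```rust
fn main() {
    let data = [1, 2, 3, 4, 5];

    // With isize the index computation can dip below zero in an
    // intermediate step; only the final value needs one check.
    let (center, offset, shift) = (1isize, 3isize, 4isize);
    let idx = center - offset + shift; // intermediate -2, final 2

    if idx >= 0 && (idx as usize) < data.len() {
        assert_eq!(data[idx as usize], 3);
    } else {
        unreachable!("index out of range");
    }
}
```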

1 Like

Rust supports 16-bit platforms, yes. That's why, for example, there's no From<u32> for usize. At some point in the future there will possibly be a way to tell the compiler that you only care about 32+ bit platforms which would enable that conversion, among other things, but currently both the language and the library must assume that usize may be 16 bits wide.
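Concretely, the u32-to-usize conversion has to go through the fallible `TryFrom`, while `From<u16>` does exist because usize is guaranteed to be at least 16 bits:

```rust
use std::convert::TryFrom;

fn main() {
    // No `From<u32> for usize` exists because usize may be only 16 bits;
    // the conversion has to be fallible.
    let n: u32 = 42;
    let idx = usize::try_from(n).expect("u32 value does not fit in usize");
    assert_eq!(idx, 42);

    // From<u16> does exist, since usize is guaranteed to be >= 16 bits:
    let small: u16 = 7;
    assert_eq!(usize::from(small), 7);
}
```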

4 Likes

That's a false dichotomy. Both can be true.

All I said in the post you were responding to was that if there is a very large number of conversions in some code, that can usually be avoided by refactoring the code. The same is true for basically any sort of repetitive code. If the same code is repeated many times, that can probably be abstracted away behind a cleaner abstraction. Repetitive conversions are one example of repetitive code.

Perhaps a different design of the language might also reduce the number of conversions. But that doesn't contradict what I said.

Unfortunately all these features taken together can make programming challenging. Rust file sizes are measured in u64, and positions are sometimes measured in u64 and sometimes in i64. This makes sense because we've been able to store files that large for decades. But we also are not allowed to assume that usize can store a u32, for the reasons you've enumerated. And Rust also requires pointers to fit in a usize, which might not work on some experimental platforms.

You cannot allocate more than isize::MAX non-zero-sized items, but indices are usize. I agree with @T4r4sB . I would have preferred indices to be a signed type because it's more useful for me to be able to have a negative index (in an intermediate computation with indices) than it would be to have an unsigned index with the MSB set. But that's a separate issue.

The way I see it, the extra range is not the main reason to like unsigned types. The main reason is their conceptual simplicity.

Natural numbers are the simplest numbers. Integers are a step up in complexity, then there are rational numbers which are another step up in complexity, etc. Some calculations are better done with rational numbers! But for most indexing calculations, natural numbers are fine.

In particular, with natural numbers, n < 5 is a nice and simple way of saying "n is one of 0, 1, 2, 3, 4".

However, I think there is consensus that it would be good to allow types other than usize for indexing. If I have an array of known size, it makes sense to index it with a fixed-size type rather than a platform-dependent type. Is there a crate for this?

2 Likes

Unsigned types don't model natural numbers, though; they model a quotient of the integers. Sometimes it is useful to think of them as limited-precision 2-adic integers instead. The simplest thing (in terms of reasoning about its behavior) would be an arbitrary-precision integer type, but that would incur an unacceptable performance penalty.

One thing you might want to do with an index i is compute i - 1. With both unsigned and signed i this can overflow, but the value 0 is a lot more common than the value isize::MIN. It's usually easier to prove that the latter never happens (e.g. if i starts out nonnegative).
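The asymmetry can be made concrete with `checked_sub`: both edge values overflow, but one of them (0) is routinely reached by ordinary index arithmetic:

```rust
fn main() {
    // `i - 1` can overflow for both unsigned and signed i, but the
    // problematic value differs: 0 (very common) vs isize::MIN (rare).
    assert_eq!(0usize.checked_sub(1), None);
    assert_eq!(isize::MIN.checked_sub(1), None);
    assert_eq!(5usize.checked_sub(1), Some(4));
    assert_eq!((-5isize).checked_sub(1), Some(-6));
}
```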

I think that's the crux of the discussion about index types. Rust effectively restricts the range of values that can be valid indices, and actual hardware constrains that range much further. The motivation for making indices unsigned is "indices can't be negative, so let's encode that constraint in the type system." (Indices also can't be really large, and the isize type actually captures that constraint, so it's not as clear-cut as it seems.) But then it is incumbent on users to prove their indices are nonnegative everywhere they did subtraction. This is one place where it would have been better (in my opinion) for the language to handle this complexity, especially because, as @T4r4sB noted, you can test i >= x.len() or i < 0 with a single comparison. But it's a moot point. (In fact, if i is isize, indexing with i as usize will do exactly this... That is the very feature that some users want to ban.)
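The single-comparison trick mentioned above, sketched out: casting a negative isize to usize yields a huge value that no slice can index, so one bounds check covers both directions:

```rust
fn main() {
    let xs = [10, 20, 30];

    // `i as usize` maps every negative isize to a huge usize. Since no
    // slice can hold that many elements, the one bounds check rejects
    // both `i < 0` and `i >= xs.len()`.
    let i: isize = -1;
    assert_eq!(xs.get(i as usize), None);
    assert_eq!(xs.get(3isize as usize), None);
    assert_eq!(xs.get(2isize as usize), Some(&30));
}
```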

That being said, it is convenient that lengths are given as unsigned numbers, because you know they are nonnegative. The API for read functions is simpler as a result: you know you read a nonnegative number of bytes. This isn't the case in C, where negative numbers are used to signal an error, which is annoying to write out over and over again. So there are reasons why the type constraint is useful.

2 Likes