There is an article about C++ by Bjarne Stroustrup: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1428r0.pdf
There are also some videos where he talks about this problem. Unsigned indexes were a mistake in the design of the C++ STL, and I don't understand why Rust repeats this mistake.
The main idea: if we work with the bit representation of a value, an unsigned type is the better choice; otherwise we need a signed type. Indexes and sizes of arrays should therefore be signed too.
Some of Stroustrup's arguments apply to Rust as well. Yes, there is a difference between size_t in C++ and usize in Rust: in C++ an unsigned type is a modular type (arithmetic wraps around), but in Rust it is not (overflow is an error that panics in debug builds).
So what are the advantages of unsigned indexes?
Maybe checking an unsigned index is more efficient (one branch instead of two)? No, there is a simple trick to check a signed index with one branch: (index as usize) < (length as usize)
A negative index becomes a huge unsigned value after the cast, so it fails the same comparison. Sometimes it seems to me that the people who decided on unsigned indexes really did not know about this trick.
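To make the trick concrete, here is a minimal sketch. The helper name get_signed is my own invention (nothing like it exists in std), and it assumes the element type is not zero-sized, so the slice length never exceeds isize::MAX.

```rust
/// Hypothetical helper (not in std): look up an element by signed index.
/// Assumes `slice.len() <= isize::MAX`, which holds for non-zero-sized `T`.
pub fn get_signed<T>(slice: &[T], index: isize) -> Option<&T> {
    // A negative `index` becomes a value above isize::MAX after the cast,
    // so the single unsigned comparison inside `get` rejects it together
    // with indexes that are too large.
    slice.get(index as usize)
}
```

Called with -1 or with an index equal to the length, this returns None; called with a valid index, it returns the element.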
Maybe an unsigned index protects us from some errors? No. For example, suppose I need to enumerate all elements of a vector except the last one. With signed indexes I can just write:
for i in 0 .. v.len() - 1
But with unsigned indexes this code crashes on an empty vector, both in C++ and in Rust, because v.len() - 1 underflows. This is an example of an error caused by unsigned indexes! With unsigned indexes I need to write something like this:
for i in 0 .. max(v.len() as isize - 1, 0) as usize
Iterator fans may say that I am very old and indexes are too vintage, and that it is better to write:
for (i, e) in v.iter().enumerate().filter(|(i, _)| i + 1 < v.len())
If they think this loop header is clearer, I have nothing more to tell them.
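For anyone who wants to try the empty-vector case, here is a small sketch of the "all but the last" range written in two ways. The function names are made up for this post. The second variant uses usize::saturating_sub, a real std method that I did not mention above; I include it only for comparison.

```rust
use std::cmp::max;

/// Indexes of every element except the last, via a round trip through isize.
/// (Hypothetical helper; assumes the length fits in isize.)
pub fn all_but_last_signed<T>(v: &[T]) -> Vec<usize> {
    // For an empty slice the signed result is -1, which max() clamps to 0.
    let end = max(v.len() as isize - 1, 0) as usize;
    (0..end).collect()
}

/// The same range using unsigned saturating subtraction.
pub fn all_but_last_saturating<T>(v: &[T]) -> Vec<usize> {
    // saturating_sub stops at 0 instead of underflowing.
    (0..v.len().saturating_sub(1)).collect()
}
```

Both give an empty range for an empty slice, which the plain v.len() - 1 with usize does not.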
Maybe an unsigned index is good because it gives us a compile-time invariant: the type contains only valid values? No, it does not. If a vector has size s, the type of its valid indexes would have to be the range [0 .. s-1], but an unsigned type covers the numbers from 0 to 2**ptr_size-1. So the type matches the valid values only at the lower bound, and we still need to check the upper bound at run time.
Also, the claim that the lower bound is checked at compile time is false. Suppose we write:
arr[x-y]
How can the compiler prove that x-y>=0? It can't. In fact this "compile-time invariant" is enforced by runtime checks. In this context "compile-time invariant" is just a pretty phrase for hipsters, without any practical use.
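A tiny sketch of what I mean by runtime checks. The two function names are hypothetical; checked_sub and wrapping_sub are real std methods on usize.

```rust
/// Computes x - y and reports at run time whether it went below zero.
pub fn diff_checked(x: usize, y: usize) -> Option<usize> {
    // None when x < y: the lower bound is found out at run time,
    // nothing about the type usize rules it out in advance.
    x.checked_sub(y)
}

/// Computes x - y with explicit wraparound instead of an error.
pub fn diff_wrapping(x: usize, y: usize) -> usize {
    // With x < y this yields a huge value that only a later
    // upper-bound check can catch.
    x.wrapping_sub(y)
}
```

With a plain x - y, a debug build panics on underflow, which is again a check performed while the program runs.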
Also, about the idea of restricting each piece of data to a type that contains only valid values: why is that good? It is not obvious to me. Very often we need calculations where an intermediate result is out of bounds, and then we have to write dirty code full of casts. That is not good!
I also have an example where an unsigned index is slower than a signed one:
if let Some(element) = arr.get(x-y) { ... }
With signed indexes this code has only one branch.
But indexes are unsigned, so we need to check overflow and underflow in different ways, and the compiler can't optimize that:
if let Some(element) = if x < y { None } else { arr.get(x-y) } { ... }
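Here are the two variants side by side as a sketch. Both function names are hypothetical, and the signed one assumes a non-zero-sized element type so that the length fits in isize. I have not included generated assembly here, so take the branch counts as my claim, not as something this snippet proves.

```rust
/// Signed variant: one range check covers both "negative" and "too large".
pub fn get_diff_signed<T>(arr: &[T], x: isize, y: isize) -> Option<&T> {
    // wrapping_sub avoids a debug-build overflow panic for extreme inputs;
    // a negative difference turns into a huge usize and is rejected by get.
    arr.get(x.wrapping_sub(y) as usize)
}

/// Unsigned variant, written as in the post: underflow is tested separately.
pub fn get_diff_unsigned<T>(arr: &[T], x: usize, y: usize) -> Option<&T> {
    if x < y {
        None
    } else {
        arr.get(x - y)
    }
}
```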
If you have an example where an unsigned index is more efficient, please post it, because I don't know of any.
As you can see, with unsigned indexes the code is dirtier, more verbose and slower. So I think we need to fix this in several steps:
- Add the ability to use signed indexes and sizes in the language and std.
- Add a requirement to use signed indexes in the slice traits.
- Mark unsigned indexes as deprecated.
- Disable unsigned indexes. The Rust codebase is not as big as the C++ one, so it is not too late to fix this design mistake.
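To give a rough idea of what the first step could feel like, here is a sketch that can be written in user code today. SignedVec is a hypothetical wrapper of my own, not a proposal for the actual std API, and it assumes a non-zero-sized element type so that the length fits in isize.

```rust
use std::ops::Index;

/// Hypothetical wrapper (not part of std) with signed length and indexes.
pub struct SignedVec<T>(pub Vec<T>);

impl<T> SignedVec<T> {
    /// Length as isize (assumes the length does not exceed isize::MAX).
    pub fn len(&self) -> isize {
        self.0.len() as isize
    }

    /// Checked lookup; negative and too-large indexes both give None.
    pub fn get(&self, index: isize) -> Option<&T> {
        // A negative index becomes a huge usize and fails the bounds check.
        self.0.get(index as usize)
    }
}

impl<T> Index<isize> for SignedVec<T> {
    type Output = T;

    /// Panics on an out-of-range index, like ordinary slice indexing.
    fn index(&self, index: isize) -> &T {
        self.get(index).expect("signed index out of bounds")
    }
}
```

With such a type, for i in 0 .. v.len() - 1 is simply an empty loop when v is empty, because the range 0 .. -1 contains nothing.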