Subscripts and sizes should be signed

delfigamer · November 9, 2022, 5:04pm

That's lovely!

So then, in essense, what the OP wants can be realized by providing impl<T> Index<isize> for Vec<T> and for other types in the standard library? Or is there something that prevents us from having multiple impl Index-es for the same type?

Aiden2207 · November 9, 2022, 5:08pm

Adding that impl without removing the one for usize would break type inference.

delfigamer · November 9, 2022, 5:14pm

Is there a direct dependency between the types, or it's only a question of defaults, and it would be possible to use both impls with explicit type annotations?

In case it's the defaulting problem, is there a mechanism for selectively importing or suppressing impls? For example, by having two modules that re-export the same Vec, but with different impl Index, so that the user can choose the default that they desire in their specific module; preferably without forcing their choice to downstream dependencies by not re-exporting the locally used impl.

tczajka · November 9, 2022, 5:24pm

That's part 1 of what the OP wanted. I said in my first response that I agree, and @CAD97 then also said it's basically a consensus that it would be good.

The only concern is that existing code relies on type inference so other rules would probably have to be added to the language to realistically support it in a friendly way.

Parts 2-4 of what the OP wanted was to remove Index<usize> which is the part people disagreed with.

Aiden2207 · November 9, 2022, 5:24pm

It's the defaulting issue. With impls for multiple integer types, rustc won't be able to pick which integer type to coerce the literal to, so it will default to i32 which is just plain wrong. The following code:

use std::ops::Index;

pub struct Test;

impl Index<usize> for Test{
    type Output = Self;
    fn index(&self, idx:usize)-> &Self::Output{
        self
    }
}
impl Index<isize> for Test{
    type Output = Self;
    fn index(&self, idx:isize)-> &Self::Output{
        self
    }
}

fn main() {
    let v = Test[1];
}

Playground

Returns this error:

error[E0277]: the type `Test` cannot be indexed by `i32`
  --> src/main.rs:19:13
   |
19 |     let v = Test[1];
   |             ^^^^^^^ `Test` cannot be indexed by `i32`
   |
   = help: the trait `Index<i32>` is not implemented for `Test`
   = help: the following other types implement trait `Index<Idx>`:
             <Test as Index<isize>>
             <Test as Index<usize>>

For more information about this error, try `rustc --explain E0277`.
error: could not compile `playground` due to previous error

There is no way to scope these impls either, as all trait impls are global.

delfigamer · November 9, 2022, 5:31pm

Is there a reason why this can't be solved by also adding impl Index<i32>? In the first place, is there a good reason why an integral type should not have its corresponding impl, once we already have more than one?

Also, to fully support OP's suggestion, we would also need a polymorphic get - so, if it is currently static, pulling it into a separate trait would be another necessary change.

tczajka · November 9, 2022, 5:38pm

Polymorphic functions don't have to be trait functions. In fact, the get member function for slices already is polymorphic in the index type, since it supports usize and ranges.

Aiden2207 · November 9, 2022, 5:46pm

use std::ops::Index;

pub struct Test;

impl Index<usize> for Test{
    type Output = Self;
    fn index(&self, _idx:usize)-> &Self::Output{
        self
    }
}
impl Index<isize> for Test{
    type Output = Self;
    fn index(&self, _idx:isize)-> &Self::Output{
        self
    }
}
impl Index<i32> for Test{
    type Output = Self;
    fn index(&self, _idx:i32) -> &Self::Output{
        self
    }
}
fn main() {
    //Note: isize::MAX > 4611686018427387904 > i32::MAX
    let v = &Test[4611686018427387904];
}

returns

error: literal out of range for `i32`
  --> src/main.rs:25:19
   |
25 |     let v = &Test[4611686018427387904];
   |                   ^^^^^^^^^^^^^^^^^^^
   |
   = note: `#[deny(overflowing_literals)]` on by default
   = note: the literal `4611686018427387904` does not fit into the type `i32` whose range is `-2147483648..=2147483647`
   = help: consider using the type `i64` instead

It compiles and runs just fine if the isize and i32 impls are removed. While I don't think much code requires this type of type inference, removing the ability to do so is still a breaking change, and could potentially cause issues on code working with very large byte or ZST arrays.

jessa0 · November 9, 2022, 6:31pm

I was thinking the same thing. I ported audio some MDCT code from C, but for me, it was pretty trivial to implement a wrapper index type to make the indexing still directly transcribable from C:

#[derive(Clone, Copy, Debug, Default)]
struct Pointer<T> {
    data: T,
    pos: usize,
}

impl<T> Pointer<T> {
    fn add_pos(&self, index: isize) -> usize {
        isize::try_from(self.pos)
            .expect("buffer underflow")
            .checked_add(index)
            .expect("buffer index overflow")
            .try_into()
            .expect("buffer underflow")
    }
    fn sub_pos(&self, index: isize) -> usize {
        isize::try_from(self.pos)
            .expect("buffer underflow")
            .checked_sub(index)
            .expect("buffer index overflow")
            .try_into()
            .expect("buffer underflow")
    }
}

Trait impls

impl<T> From<T> for Pointer<T> {
    fn from(data: T) -> Self {
        Self { data, pos: 0 }
    }
}

impl<T> Add<isize> for Pointer<T> {
    type Output = Self;
    fn add(self, index: isize) -> Self {
        Self { pos: self.add_pos(index), ..self }
    }
}

impl<T> AddAssign<isize> for Pointer<T> {
    fn add_assign(&mut self, index: isize) {
        self.pos = self.add_pos(index);
    }
}

impl<T> Sub<isize> for Pointer<T> {
    type Output = Self;
    fn sub(self, index: isize) -> Self {
        Self { pos: self.sub_pos(index), ..self }
    }
}

impl<T> SubAssign<isize> for Pointer<T> {
    fn sub_assign(&mut self, index: isize) {
        self.pos = self.sub_pos(index);
    }
}

impl<T: Index<usize> + ?Sized> Index<isize> for Pointer<&'_ T> {
    type Output = <T as Index<usize>>::Output;
    fn index(&self, index: isize) -> &Self::Output {
        &self.data[self.add_pos(index)]
    }
}

impl<T: Index<usize> + ?Sized> Index<isize> for Pointer<&'_ mut T> {
    type Output = <T as Index<usize>>::Output;
    fn index(&self, index: isize) -> &Self::Output {
        &self.data[self.add_pos(index)]
    }
}

impl<T: IndexMut<usize> + ?Sized> IndexMut<isize> for Pointer<&'_ mut T> {
    fn index_mut(&mut self, index: isize) -> &mut Self::Output {
        let index = self.add_pos(index);
        &mut self.data[index]
    }
}

impl<T: Index<RangeFrom<usize>> + ?Sized> Deref for Pointer<&'_ T> {
    type Target = <T as Index<RangeFrom<usize>>>::Output;
    fn deref(&self) -> &Self::Target {
        &self.data[self.pos..]
    }
}

impl<T: Index<RangeFrom<usize>> + ?Sized> Deref for Pointer<&'_ mut T> {
    type Target = <T as Index<RangeFrom<usize>>>::Output;
    fn deref(&self) -> &Self::Target {
        &self.data[self.pos..]
    }
}

impl<T: IndexMut<RangeFrom<usize>> + ?Sized> DerefMut for Pointer<&'_ mut T> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        &mut self.data[self.pos..]
    }
}

Nokel81 · November 9, 2022, 6:34pm

Could we introduce some new annotation for trait implementations that specify the priority towards type inference or have some way to ignore trait implementations for inference?

For instance then we could have


impl<T> Index<usize> for [T] {...}

#[inference(ignore)]
impl<T> Index<isize> for [T] {...}

...

That way there are both for when the type is definitely known but only the first used for inference.

scottmcm · November 9, 2022, 6:59pm

The is the "hinting" or "fallbacks" or whatever feature. And the problem is that it's a mess to actually do that right now -- much as it's wanted in lots of places (see the HashMap default parameter, for example).

It gets really awkward if there are multiple places that could each hint, but they want contradictory fallbacks, for example.

If we get a new, smarter inference engine that the types team says has a sound way to do the preferential inference, then absolutely let's do it. I just don't think that exists yet.

T4r4sB · November 10, 2022, 7:41am

Thas's false, signed index checking need only one cmp instruction too.

It is a dirty hack inside library implementation. It is ok to use dirty hacks and unsafe blocks inside library implementation, but not ok for user code. User need to work with indices as with signed numbers.

Ok, In most cases, where indices are just unique id and nothing more - there is no matter which type to use.
There are some niche cases, when signed indices are very useful.
I want to see niche cases, when unsigned indices has some advantage.

Ok, you right, it sounds very radical, and when I suggested this, I knew, that nobody will agree.
And it is obvious, that it will break existing code.
Anyway, there are a lot of scenarios, when signed indices are very useful. Ok, I can implement my own T4r4sBVec with signed indices, but I can not add my own implementation to built-in array and slices, and that's the problem.

That's strange. If Rust can not deduce type and see a lot of possible alternatives, why can't it choose one of them using some rules, which type have more priority?

Aiden2207 · November 10, 2022, 8:18am

It does. The steps for picking integer literal types are

Attempt to infer the type based on context. If there is not a single unambiguous solution, proceed to step 2.
Pick i32, and error if that doesn't satisfy the requirements.

In other words, i32 has a priority level higher than everything else. This is a known pain point in the inference system. Doing things like mixing a BigUint and integer literals is way more annoying than it needs to be because of it. Some sort of general solution to that issue would be nice, if the compiler is up to the task.

T4r4sB · November 10, 2022, 9:30am

I talk about difference in step2. Why cant Rust select one of types, which passed step1?

farnz · November 10, 2022, 10:30am

OK, since you say that you can do signed index comparison with one compare instruction, please demonstrate this without conversion to unsigned indexes.

Your trick in this thread is to say that for the purposes of bounds checking, unsigned indexes work better, and that when you assume two's-complement representation of signed indexes, converting a signed index to an unsigned index allows you to do the range check with a single instruction.

That, however, is an argument for unsigned indexes - you're saying that you must convert to unsigned to do the bounds check efficiently. And I have never in several decades needed signed indexes in any programming language - not assembly language, not BASIC, FORTRAN, not C, not C with Classes, not C++, not Perl, not Python, not Rust, not Haskell, not Idris, not Agda, or even in Prolog, which means that you need to evidence your claim that the "user needs to work with indices as with signed numbers", since it contradicts my direct experience.

Now, it's entirely plausible that this is because I've never worked in one domain of software where this matters, but you need to explain that domain, and indicate why indexing is the right answer, and why signed indexes are the correct solution in this case, rather than having the user do the casting because they're doing something unusual. You've done neither of these; you have some contrived examples (to which people have demonstrated alternatives that work well), and that's it.

As a formal logic exercise, too, signed indexes fall apart. The ideal type for indexing a data structure is a range constrained integer such that all values of that integer are valid indexes for the data structure. For Vec and slices, that's an integer whose permitted values (type system checked) are between [0, data.len()); in practice, we don't yet have good type systems for handling such constrained integers (for example, we need to be able to determine the behaviour of shifting temporarily out of bounds - do you change the constraints, so that your Int<0, 8> becomes an Int<-8, 17> during arithmetic, but restores to an Int<0, 8> if the compiler can prove that you're back in the range [0, 8), do we wrap, do we panic?), and thus we use existing integer types.

However, we live in the real world, where constrained integers are merely a subset of dependent types research, and not something that's (yet) practical to use in a real programming language. We therefore have to choose a data type we can actually work with for indexing; unsigned data types have a range of 0 to MAX, while signed data types have a range of MIN to MAX. With an unsigned data type, we have one set of invalid values (data.len()..=MAX), while with a signed data type, we have two (MIN..0 and data.len()..=MAX).

With that said, in C and C++, unsigned indexes are problematic because of their integral promotion rules interacting with their integral conversion rules; you can start with an unsigned value, and the compiler is permitted to convert it to a signed value behind your back. Worse, because C++ specifies that unsigned is always done as modulo arithmetic, but signed is implementation defined (it doesn't even need to be two's-complement, so your trick doesn't work in C++), you can write code in C++ that looks to be fully defined by the standard but in fact is implementation defined because it promoted unsigned to signed.

Rust doesn't have this problem, because it doesn't have integral promotion rules, nor does it have implicit integral conversion. If you're defining your own data type (and you should for a lot of things, rather than directly using Vec or slices), then you can define your own implementation of std::ops::Index for your data type, with the index types that work for your problem domain. You can also define your own fn get(&self, index: Idx) -> Option<T> and get_mut(&mut self, index: Idx) -> Option<T> methods that do the right thing. But for a raw slice, or a raw Vec, unsigned indexing is the right thing, because it means that there's only one set of illegal indexes, not two.

T4r4sB · November 10, 2022, 1:31pm

I dont need it. It is just a part of inner realisation. User doesnt see it.
Also you suggest me to make something it each time in user code instead make it once in library code. It is bad.

It is obvious - code with a lot of casting harder to write and harder to read. And if you forgot to cast, compiler will not tell you about it, it will make you deal with panic-prone unsigned subtracting.

Any task when you work with array indices as coordinates.

If fact you have two subsets of invalid indices each time when you use calculations with subtracting. If you make it with unsigned numbers - you just have a runtime panic. If code panic - code is wrong.

Really? https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r4.html

You think, that you defended from C++-problems using strong conversion and calculation rules? Really? Runtime panic "subtraction with underflow" told us "NO".

It is not right, because it is very easy to forget cast to isize when you need subtract two usize numbers.

farnz · November 10, 2022, 3:18pm

T4r4sB:

farnz:

OK, since you say that you can do signed index comparison with one compare instruction, please demonstrate this without conversion to unsigned indexes.

I dont need it. It is just a part of inner realisation. User doesnt see it. Also you suggest me to make something it each time in user code instead make it once in library code. It is bad.

farnz:

and why signed indexes are the correct solution in this case, rather than having the user do the casting because they're doing something unusual

It is obvious - code with a lot of casting harder to write and harder to read. And if you forgot to cast, compiler will not tell you about it, it will make you deal with panic-prone unsigned subtracting.

farnz:

but you need to explain that domain

Any task when you work with array indices as coordinates.

But what you're basically saying here is that instead of asking someone using coordinates to wrap up their underlying data structures (Vec, slices etc) in a library of their own that does the right thing, you want to make Rust worse for people who aren't using indexes as co-ordinates so that people who are treating indexes as co-ordinates have an easier life.

And, FWIW, when working in video, I was using array indexes as co-ordinates, but did not want signed indexes into my &mut [u8] - I wanted unsigned, and I wanted to have a carefully controlled edge where I ensured that the indexes I was getting from the outside were valid and correct.

You keep using this word "obvious" - in the vast majority of code I have written in my career, indexes are naturally unsigned, and if you make indexes signed, I would have to add a lot of casts from unsigned to signed to make my code compile.

So, based on your own reasoning, unsigned indexes are the correct thing in virtually all cases, because if you make indexes signed, you have to scatter casts all over user code to convert unsigned indexes (computed from co-ordinates within the video framebuffer) to signed. And note that this is a co-ordinates task, where I want unsigned - so your assertion that "any code using co-ordinates" benefits from signed indexes is false on its face, since I have direct experience where it's not true.

Yes. If you have any operations on the index other than passing it around (addition and multiplication can overflow, subtraction can underflow), you have to consider the possibility of overflow and underflow, and handle it correctly - Rust, at least, panics in debug (but not release) mode if you don't think about this, letting you know that you've written buggy code and need to think about overflow/underflow.

This applies whether or not indexes are signed or unsigned - the advantage of unsigned indexes is that the unoptimized bounds check is always a single check, regardless of what my problem domain uses for indexes. If you need to work with signed indexes, you can always wrap the raw data structures in your own library that's optimized for your domain, and do both checks there (with inlining often optimizing out one or both checks for you).

Really. That is a change in C++20 and later - but it does not affect older versions of C++, including (for example) ISO/IEC 14882:2003, the version that is still in use in many codebases I've had to interact with (to my annoyance - I'd like them to upgrade to at least the 2011 standard, if not the 2017 standard). I have yet to see a serious codebase that insists on C++20 - it's still too new for that.

Yes, I do think I've defended that we don't have the same problems as C++; in C++, you can't reliably use unsigned indexing, because there are cases where an unsigned integer is silently converted to a signed integer, and arithmetic that is well-defined in unsigned integers (wrapping) becomes undefined behaviour, permitting the compiler to take what looks like well-defined code (idx += offset), and convert it to code whose behaviour is undefined. Once there's a path through your code with undefined behaviour at runtime, you can get any outcome from your program - there's an example here where the compiled program would legitimately optimize to invoking system("rm -fr /") due to UB.

In Rust, you get a panic in debug mode, telling you you've made a mistake, and you get wrapping (in two's complement if signed) in release mode. The panic is well-defined behaviour, as is the wrapping, which means that your program won't do anything massively not obvious - the indexing will go wrong and you'll need to fix it, but it's well-defined behaviour.

This is a massive difference - the behaviour of a Rust program is well defined in the face of predictable human error, even if it's not what the programmer intended. In C++, though, I can write what looks like a program with well defined behaviour, and have integer coercion rules convert well defined behaviour to UB for me. This makes unsigned indexes in C++ dangerous - depending on the exact sizes of the various types involved, I could have a 64 bit unsigned index (well-defined behaviour for all minuends and subtrahends) promoted to a 128 bit signed index during arithmetic (undefined behaviour in the face of overflow or underflow).

And as UB stops you reasoning about your program, because the compiler can assume that there is no UB, this is dangerous. But Rust has well-defined behaviour for all overflows and underflows, and no implicit conversions between integral types, and so you do not have the danger of accidentally hitting UB.

The behaviour of subtraction in Rust is well-defined for all values. For unsigned numbers, if your subtrahend is larger than your minuend, then the behaviour is to panic in debug mode, and to wrap in release mode. This means that isize::MIN - 1 has well-defined behaviour, and I know that it will panic in debug mode, or result in isize::MAX in debug mode. Similarly, 0usize - 1 is either a panic in debug mode, or usize::MAX in release mode.

This differs to C++, where std::numeric:limits<int>::min() - 1 is UB, and there's no way for me to deduce how my program will behave if I execute that line of code.

Further, Rust provides (already) Wrapping<usize> to mark that I explicitly want wrapping behaviour in debug and release modes, and there's experiments to add Saturating<usize> to saturate in debug and release modes, plus we have pre-existing wrapping_* and saturating_* methods, plus explicit conversions if you want to .

Finally, as I've already said, when your problem domain really does need signed indexes, you can define your own data types doing the conversions for you, complete with Index and IndexMut implementations. Going back to my video framebuffer, for example, I made use of the fact that Rust optimizes nicely to allow me to have my own struct Framebuffer, instead of a raw Vec<u8>, which in turn had get and get_mut methods plus Index and IndexMut trait implementations that took pairs for co-ordinates, not a single number, did overflow checks (so that attempting to access sample 721 of a 720 sample long line didn't accidentally access the first sample of the next line), and then did the underlying indexing into a Vec<u8>. The resulting code (checked via looking at the assembly output) was always as good as using a raw Vec<u8>, and in some cases better because the fact that get returned Some(val) was enough to constrain the possible values of the input X and Y co-ordinates to get.

delfigamer · November 10, 2022, 6:19pm

What is wrong with checking with conversion to unsigned indexes?

Is that assumption wrong? Where?

There is an important detail - this is an argument for using unsigned in the implementation of Vec. This thread, however, talks about its interface, which is serving a different purpose in the program. The entire point of abstraction is to provide an interface that is designed for ease of use first and foremost, and ease of implementation is only a secondary consideration.

In other words, the type of index in the public interface should be chosen with regards to how it is used, and not how it is implemented.

In Python and Haskell, indexes are signed. Which brings a question: did you ever feel the need for unsigned indexes in these languages? Since you didn't write anything about that, it seems to me that you didn't actually care about the signedness at all, and was only using them in algorithms that are insensitive to that.

After all, for the signed indexes to be a better choice, they don't have to be needed a lot - they only have to be needed more often than the unsigned.

To be more precise, unsigneds have a range from 0 to 2*MAX+1, if we are comparing them by the same MAX.

I don't consider these to be two separate sets, though that's probably just my personal preference. I tend to think of them as just "valid index values" and "not valid index values" (with the precise definition of "valid" being mostly an implementation detail).

A slight correction: signed overflow is not "implementation defined", it's "undefined behavior". The difference is that, "implementation defined" features have to be actually defined by the implementation - for example, in the user manual of the compiler. "Undefined behavior", on the other hand, is allowed to stay truly undefined - when a program triggers UB, the implementation is allowed to do anything, even with respect to actions that came before the UB. In practice, the optimizing compilers tend to use this provision as an axiom that UB never happens, and when transforming the code, only care about preserving code paths that do not trigger UBs. This can sometimes lead to bizarre consequences, such as completely eliding a check to stop the loop, thus making it infinite.

Due to occurrences such as that, the term "undefined behavior" has gained a certain aura of fear and respect around it (and the signed overflow gets to enjoy its full glory); while "implementation defined" or "unspecified result" are seen as much more benign.

In practice, all of the platforms supporting C++ use 2's complement for negatives. The ones that don't, even if they exist, are so obscure that any non-trivial program would fail for numerous other reasons long before the TarasB's trick ever comes into play. As a reference, it's already quite rare that unaltered sources from 00s and 90s successfully work with modern compilers, even though they are for the almost same processor and OS. Once you move such a code base to a completely different processor, those incompatibilities would only get worse.

And that is even before considering, why would you spend time and effort porting C++ and Rust to such a machine in the first place, or why you would consider them important enough to bend and twist the language design itself, just to allow their support in an indeterminate future.

CAD97 · November 10, 2022, 7:17pm

This alludes to what's actually a major reason for requiring unsigned indices.

In Python, a negative index is not out of bounds; instead, it indexes from the end of the array. This is a completely reasonable interpretation of a negative index.

[_] (slices) are a core, low level building block. By only supporting unsigned indexing for slices, Rust chooses to not need to choose a behavior for negative indices.

This is of a significant benefit that's rather easy to overlook: if someone writes v[-1], they get an immediate and clear compile error stating that negative indices cannot be used. If this compiled, then the error would only be detected by a runtime panic, and the developer would need to determine that the panic is not because the slice is empty, but because Rust doesn't implement negative indexing as from-the-end.

If you want to use negative indices, you can and should make a wrapper which provides the Index implementation and implement it how you expect. There's some extra friction for using the nonstandard type, but that's an understood tradeoff to Rust's coherence model.

Going further in circles belaboring the point won't help anyone.

riking · November 10, 2022, 7:18pm

And they have the special behavior of indexing from the rear of the collection. That's important to mention. It also adds extra branches, which is not faster.

This thread is not generating anything productive anymore. I suggest people stop replying.

Topic		Replies	Views
Subscripts and sizes should be signed (redux) language design	41	1700	November 9, 2023
Revisiting the unsigned ideas (deprecated)	5	2062	March 25, 2019
The problem with array/slice/vector indexes language design	14	9771	March 25, 2019
Unsigned integer type inference language design	24	4422	March 25, 2019
To improve usize (and isize) handling in Rust language design	6	1328	September 19, 2020

Subscripts and sizes should be signed

Related topics