Subscripts and sizes should be signed

parasyte · November 25, 2022, 9:16pm

T4r4sB:

I remember a real example: a 2D image processing library. Which data type should we use for image sizes? I selected usize to be more consistent with other Rust libraries. I could have chosen isize, but I don't like the situation where each library has its own type for sizes. It becomes a zoo of types that produces a lot of useless conversions.

For example, there is a library for reading and writing bmp files whose author used u32, and I asked him to use usize: u32 as index is incompatible with rust vectors · Issue #31 · sondrele/rust-bmp · GitHub. Why did the author select u32? Maybe he wanted to specify that bmp width and height cannot be bigger than 2**32 pixels, but what advantage does that give the users? Only lots of casting boilerplate. Who would suffer if he chose the more conventional usize type? Nobody. It's just a religion to write useless conversions for the sake of abstract ideas, which in this case do not give any profit. A custom type for each library is a useless, terrible idea. A universal integer type is good and useful.

After using usize for image size I tried to write a method which copies a sprite with an offset, and the offset can be negative, of course. So I have an interface with different types for size and offset, and when I need to mix them in some calculation, I need to cast both of them to isize. It would be much more useful to have the same isize type for size and offset, but that would not be consistent with other Rust code. It would be much better if Rust had used isize for sizes and indices from the beginning.

Eh. I do a lot of 2D image processing and my take on this is basically "usize indexing not a big deal here". The pix crate uses i32 for pixel range coordinates because you really don't need images with either dimension larger than 2**31 - 1 but you do want to handle negative coordinates. Likewise, the range dimensions themselves are u32 because you do not want negative dimensions. Even u32 is not a perfect representation for dimensions because 0 and >= 2**31 are invalid.

This crate itself handles negative coordinates by clipping them into the image range, so they end up as unsigned anyway! The icing on the cake is that x as usize is infallible[1] when x is in [0, 2**31), without resorting to x.try_into().unwrap(). And the fact that the cast really only needs to be done in a very limited number of places, e.g. when creating an iterator, really begs the question: why would you want to change the representation of a bitmap to use usize dimensions when it is completely unnecessary?
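A minimal sketch of the clipping idea, not pix's actual API: clip_range is a hypothetical helper that clamps a signed coordinate range to an image dimension. After clamping, the cast to usize cannot lose information on 32-bit and 64-bit targets.

```rust
use std::ops::Range;

/// Hypothetical helper (not part of the pix crate): clip the signed
/// coordinate range `start..end` to `0..dim` and return it as a range of
/// `usize` indices, ready for slicing or iteration.
fn clip_range(start: i32, end: i32, dim: u32) -> Range<usize> {
    // Widen to i64 so that comparing against a `u32` dimension cannot overflow.
    let dim = i64::from(dim);
    let lo = i64::from(start).clamp(0, dim);
    let hi = i64::from(end).clamp(lo, dim);
    // Both bounds are now in 0..=dim, so the casts are lossless wherever
    // `usize` is at least 32 bits wide.
    lo as usize..hi as usize
}
```

A range that lies entirely outside the image, or whose end precedes its start, comes back empty, so the caller never sees a negative value.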

"Lots of casting and boilerplate" is an exceptional situation, perhaps driven by other factors like choosing to write classic for-loops with integers over creating iterators. In practice, crates like pix show that this statement is inaccurate at best.

Speaking of abstraction, referring to your claim that it's purely religious, I argue that you're mixing things up and choosing only what is convenient for you at the time. Let's revisit what you said earlier about a performance optimization with isize indices:

T4r4sB:

fn get_slice_from_zero(&self, upper_bound: isize) -> &[T] {
  assert!(upper_bound as usize < len as usize);
  self.get_slice_unchecked(0, upper_bound)
}
We have only one check, which is necessary with unsigned sizes too.

But ... this is abstraction! You are literally hiding a cast in this situation saying it is fine to do so, and then you turn around and decry how awful it is to have "Lots of casting and boilerplate" as if abstraction does not exist at all. And yet, the Iterator<Item = usize> is the abstraction that you are looking for to hide the casting!

Abstraction cannot be both good and bad at the same time. A more plausible explanation is that you are not taking full advantage of it. Saying "I don't want to use the iterator abstraction because I prefer a for-loop that increments integers" is a problem you are going to have to deal with on your own. The iterator will help you hide the details that you dislike, but it's up to you to take advantage of that property of abstraction.

I would go as far as to suggest that your 2D image processing experience is probably not representative of the common situation, where a user just pulls in a crate for image processing and it does the right thing without them having to worry about details that don't matter for processing images (like that pesky usize index).

Case in point, my little game that uses pix doesn't have a single coordinate cast to usize for indexing, because doing so at this level of abstraction is unnecessary. In full disclosure, I do go the other way though, casting a dimension (u32) into a coordinate (i32). But that's for creating an iterator. Far from "Lots of casting and boilerplate", FWIW.

I cannot imagine a world where isize index is much more useful than usize for image processing. If anything, it will break even. Net zero either way. You pretty much have to cast between coordinates and dimensions at some point. Even if you choose to use signed dimensions to avoid a cast (don't do this lol) you still have to protect against <= 0 dimensions when creating the image and when converting from a coordinate.

The ultimate type for an image dimension is probably something like NonZeroU31, but it comes with its own thorns. First because this type does not exist in the standard library and second because you still have to use try_into() or unwrap an Option to safely get one from an arbitrary primitive type like u32 or i32.
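To make the trade-off concrete, here is a rough sketch of what such a type could look like. NonZeroU31 is hypothetical, as noted above; this version simply wraps the standard NonZeroU32 and rejects anything above i32::MAX. The error type and messages are placeholders.

```rust
use std::convert::TryFrom;
use std::num::NonZeroU32;

/// Hypothetical dimension type holding a value in 1..=i32::MAX.
/// Nothing like this exists in the standard library.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct NonZeroU31(NonZeroU32);

impl NonZeroU31 {
    fn get(self) -> u32 {
        self.0.get()
    }
}

impl TryFrom<u32> for NonZeroU31 {
    type Error = &'static str;

    fn try_from(value: u32) -> Result<Self, Self::Error> {
        if value > i32::MAX as u32 {
            return Err("dimension exceeds 2**31 - 1");
        }
        // `NonZeroU32::new` returns `None` for zero.
        NonZeroU32::new(value).map(NonZeroU31).ok_or("dimension is zero")
    }
}

impl TryFrom<i32> for NonZeroU31 {
    type Error = &'static str;

    fn try_from(value: i32) -> Result<Self, Self::Error> {
        // Negative values fail the conversion to u32; zero is rejected above.
        let value = u32::try_from(value).map_err(|_| "dimension is negative")?;
        <Self as TryFrom<u32>>::try_from(value)
    }
}
```

Every construction goes through a fallible conversion, which is exactly the thorn mentioned above.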

That said, I also do a lot of audio processing! We have buffers (e.g. VecDeque<Sample> or RingBuffer<Sample> and the like) but very little need for indexing them with isize. Even with a ring buffer you don't want to cross the boundary (i.e. index 0) while going backwards for echo effects or whatever. Yeah, I just don't buy the idea that isize is somehow objectively superior for indexing in these moderately common cases.

[1] This is true on 32-bit and 64-bit systems. The unwrap() can panic on 16-bit systems with very large images, so you're already doing error handling incorrectly regardless.

user16251 · November 25, 2022, 10:53pm

For [T] with T: Copy you can just modify the code to do the dereference. @CAD97 posted a macro which works better.

The example in the original post is slightly misleading. In release mode arr.get((x - y) as usize).copied() generates code with a single branch, while x.checked_sub(y).and_then(|d| arr.get(d as usize).copied()) generates code with two branches (in release mode). But these two snippets of code are not equivalent, e.g., if x is isize::MIN and y is isize::MAX.
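The difference can be reproduced with a small sketch. The function names are made up for illustration, and wrapping_sub stands in for the plain x - y so that the release-mode wrapping behaviour shows up in any build profile (a debug build of x - y would panic on overflow instead).

```rust
/// Mirrors `arr.get((x - y) as usize).copied()` as it behaves in release
/// mode, where overflow checks are off by default and the subtraction wraps.
fn get_wrapping(arr: &[u8], x: isize, y: isize) -> Option<u8> {
    arr.get(x.wrapping_sub(y) as usize).copied()
}

/// Mirrors `x.checked_sub(y).and_then(|d| arr.get(d as usize).copied())`.
fn get_checked(arr: &[u8], x: isize, y: isize) -> Option<u8> {
    x.checked_sub(y).and_then(|d| arr.get(d as usize).copied())
}
```

With x = isize::MIN and y = isize::MAX the wrapped difference is 1, so the first version returns the element at index 1 while the second returns None. For ordinary inputs, including a small negative difference such as 0 - 1, the two agree.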

jhpratt · November 25, 2022, 11:53pm

Realistically, how could you see it playing out if you were able to convince everyone? What steps?

T4r4sB · November 29, 2022, 8:48am

It is hard to answer all the points, so I chose only some of them.

The usize type does not protect me from negative numbers. With isize I get a runtime error when I try to apply a negative size; with usize I get a runtime error on the intermediate result. No difference. But usize is a problem if I really need a negative intermediate result.

I don't know how a user can work with a 2D image without looking at the image size and comparing it with the image offset.

"Perfect representation" is useless. We need universal type without problems in math operations.

Because I work with coordinates and use them in intermediate vectors, I need to have the same type in all cases; otherwise I have useless conversion boilerplate.

And how do you represent a negative offset?

Add indexing with isize, and "try-apply-isize" as a new rule when we have many possible types to apply.

jhpratt · November 29, 2022, 3:49pm

As I have previously said, please be specific. I have noted issues with that approach. It is simply not possible to "add indexing with isize" right now without causing monumental breakage. That is why I am asking you to be specific. You have not laid out the steps that are necessary to avoid the breakage, which leads me to believe that you simply do not understand the scale of the breakage you are proposing. I asked for the absolute minimum that is necessary, and you have not even provided that. Not even close.

You cannot just ignore the issues — they have to be directly addressed up-front. And again, this is completely setting aside whether it's even desired.

I suggest you look at the RFC template and start from there. It is designed to force you to think critically about a number of details. That is probably the starting point if you actually wanted a change, as an RFC would absolutely be required here.

parasyte · November 29, 2022, 9:51pm

These are the kinds of things I tried to demonstrate as being "irrelevant" to the task of actually drawing, sampling, or filtering image contents because a good abstraction will not require the caller to compare an image size with an offset within the image. The abstraction does the comparison and makes this point moot to the caller (the common situation). Once again, an iterator is a concrete example of such an abstraction.

If such a need arises, it can be solved by subtracting a positive integer from the current index. If it arises at all. Most of the time we deal with zero-based indices, and seeking backwards from the current index without knowledge of what the physical index is sounds like a very unusual circumstance. Perhaps your abstract representation is modeled after the C standard library's fseek() function where the offset is relative to SEEK_CUR?

I choose not to respond directly to your other comments because the appropriate response is "hide that in a layer of abstraction."

CAD97 · November 30, 2022, 12:02am

The main question in response to that question: how often do you need an offset which is either positive or negative, and you don't know which?

Most use cases in my experience either all go in the same direction (e.g. the offset of a field in a structure) or are served as well (if not better) as absolute indices (or relative to some shared anchor, with the anchor on one side of the indexed space).

...with the notable exception of physical models, but these cases also mostly want a numerical real type (e.g. floating point) rather than an integral type.

Even for cases that do seem like they benefit from runtime signed offsets, they're often better served by more targeted solutions. E.g. for "nearby pixels" you might quickly reach for a for y in y-1..=y+1, if y in 0..image.size.y, for x in x-1..=x+1, if x in 0..image.size.x style loop, but this is likely better expressed with something like for pixel in image.nearby((x, y), 1), which encapsulates the index manipulation entirely. Especially if this is something you do often (as it seems you do), since this means all of its users handle the literal edge cases uniformly (do you ignore out of bounds? or do you perhaps return a default color? or stretch the edge color? or wrap? or mirror? these are all valid answers provided in graphics APIs) rather than edge case handling being sprinkled all around your codebase.
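As a rough illustration of that shape, assuming a hypothetical single-channel Image type and the "ignore out of bounds" edge policy (one of several valid choices listed above):

```rust
/// Hypothetical image type; `nearby` and its edge policy are illustrative
/// and not taken from any particular crate.
struct Image {
    width: usize,
    height: usize,
    pixels: Vec<u8>, // row-major, width * height entries
}

impl Image {
    /// Yields the pixels in the square window of the given `radius` around
    /// `(x, y)`, centre included, skipping positions outside the image.
    fn nearby(&self, (x, y): (usize, usize), radius: usize) -> impl Iterator<Item = u8> + '_ {
        // Clamp the window once, here, instead of at every call site.
        let x0 = x.saturating_sub(radius);
        let y0 = y.saturating_sub(radius);
        let x1 = x.saturating_add(radius).saturating_add(1).min(self.width);
        let y1 = y.saturating_add(radius).saturating_add(1).min(self.height);
        (y0..y1).flat_map(move |row| {
            (x0..x1).map(move |col| self.pixels[row * self.width + col])
        })
    }
}
```

Callers write for pixel in image.nearby((x, y), 1) and never form x - 1 themselves. Switching to a different edge policy means changing this one method.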

Yes, at some point in the stack, you're probably going to need to convert between types. Yes, this adds more typing (in both ways) overhead than just using a single uniform type. Yes, most (but not all) uses of usize are more using it as "NonNegativeIsize" and values >isize::MAX are still logically invalid.

However, this is the entire point of abstractions. Even with your poster child (e.g. indexing slices with isize), there's still a conversion going on behind the abstraction (e.g. comparing as usize to treat negative as out of bounds with the same check catching too positive indices).
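A sketch of that hidden conversion, using a hypothetical free function rather than any real Index<isize> impl:

```rust
/// Hypothetical signed-index accessor for a slice. A negative `index` turns
/// into a very large `usize` when cast, so the single unsigned bounds check
/// inside `get` rejects negative and too-large indices alike.
///
/// Assumption: `T` is not zero-sized. A slice of a non-zero-sized type has
/// at most `isize::MAX` elements, so a cast negative index is always out of
/// bounds; slices of zero-sized types can be longer than that.
fn get_signed<T>(slice: &[T], index: isize) -> Option<&T> {
    slice.get(index as usize)
}
```

The caller sees only isize, but the cast is still there, one layer down.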

Yes, usize indexing treats too-negative and too-positive nonuniformly. Yes, you can argue in good faith the benefit of forcing an explicit decision quickly when index manipulation goes negative is outweighed by the cost of handling negative overflow separately from positive overflow.

However, by far the more common case is that a manipulated index ever going negative is a bug that should immediately panic rather than a valid interim value.

Additionally, adjusting integer fallback such that Index<isize> doesn't cause massive breakage is quite difficult. It's not (and can't be) just "if multiple choices, choose i32, but if that's not valid, choose isize", because (well for one it'd still be semantics-changing, just much more subtly so) type inference doesn't work like that. This naive specification essentially results in having 2ⁿ options from choosing between i32 and isize where n is the number of integer literals in a function where you just have to try every option. What do you do if x: i32, y: i32 is invalid, but either x: i32, y: isize or x: isize, y: i32 are?
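For context, this is how unannotated integer literals are typed today. The sketch only demonstrates current behaviour; the function name is made up, and nothing here models the proposed change.

```rust
use std::mem::size_of_val;

/// Returns the byte sizes of two integer variables whose types are never
/// written out and are chosen entirely by inference.
fn inferred_sizes() -> (usize, usize) {
    let data = [10u8, 20, 30];

    // Used as a slice index, so the literal is inferred to be `usize`,
    // the only integer type slices can be indexed with today.
    let i = 1;
    let _ = data[i];

    // Not constrained by anything, so integer fallback picks `i32`.
    let j = 1;

    (size_of_val(&i), size_of_val(&j))
}
```

The first literal gets its type because exactly one integer type fits. If a second integer type were also a valid index, that uniqueness would presumably be lost, which is where the fallback question above comes from.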

There are multiple massive hurdles before indexing slices with isize is practical, and it is very much not worth the very marginal benefit provided to put in that effort to make it so you can delete some conversions instead of designing a semantically meaningful abstraction boundary between position/offset and raw linear index.

Plus, in the specific case of images, you're dealing with a 2D data source, so you probably want to index with (x, y), so you already have a boundary in place where you can encapsulate conversion between your image pixel position/offset (isize, isize) and the raw usize index. (With fixed-size matrices you can use [[T; N]; M] and index [m][n], but for dynamically sized matrices you absolutely want Vec<T> and not Vec<Vec<T>>.)
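A minimal sketch of such a boundary, with hypothetical names (Grid, index_of) and a flat row-major Vec<T> as storage:

```rust
use std::convert::TryFrom;

/// Hypothetical dynamically sized image stored in one flat `Vec<T>`.
struct Grid<T> {
    width: usize,
    height: usize,
    data: Vec<T>, // row-major, width * height entries
}

impl<T> Grid<T> {
    /// Signed position in, raw `usize` index out. This is the only place
    /// that converts between the two.
    fn index_of(&self, x: isize, y: isize) -> Option<usize> {
        // `try_from` fails for negative coordinates.
        let x = usize::try_from(x).ok()?;
        let y = usize::try_from(y).ok()?;
        if x < self.width && y < self.height {
            Some(y * self.width + x)
        } else {
            None
        }
    }

    /// Returns the element at `(x, y)`, or `None` if it is out of bounds.
    fn get(&self, x: isize, y: isize) -> Option<&T> {
        self.index_of(x, y).and_then(|i| self.data.get(i))
    }
}
```

Code above this layer can add and subtract signed offsets freely and call get(x + dx, y + dy); the conversion to usize lives in index_of only.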

T4r4sB · November 30, 2022, 8:52am

So to summarize your main point: you suggest that I use a library which hides every possible use of sizes and offsets in special methods, so that all the boilerplate pain is borne only by library writers. But what if standard libraries are not enough for me and I need to write something like this, but more specific to my case? Then I need to feel all this pain with conversion boilerplate, which is not necessary in a "parallel world" where Rust's designers did not make the unsigned error. I need to create an abstraction to hide the conversion problem inside it, but in a well-designed language we don't need integer conversions.

What if I don't know whether I need to move forward or backward, and I need a universal type that can either add or subtract a positive integer? Oh, sorry, this type has a name: "signed integer".
Looks like I am repeating a 500-year-old math discovery, when mathematicians designed negative numbers T_T

T4r4sB · November 30, 2022, 9:07am

Every time I need to copy a sprite to a big image. The sprite can be partially moved out past the left or top side of the big image; that is a negative offset.
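For reference, the clipping this case requires can be sketched for one axis as follows. clip_axis is a hypothetical helper: the offset stays signed at the interface, and the resulting ranges are unsigned and ready for slicing.

```rust
use std::ops::Range;

/// Hypothetical one-axis clipping helper for a sprite blit. Given the
/// sprite's signed `offset` in the destination, its `sprite_len` and the
/// destination's `dest_len`, returns the visible span as
/// `(sprite_range, dest_range)`, or `None` if nothing is visible.
fn clip_axis(
    offset: isize,
    sprite_len: usize,
    dest_len: usize,
) -> Option<(Range<usize>, Range<usize>)> {
    // First visible sprite pixel: skip whatever hangs off the left/top edge.
    let src_start = if offset < 0 { offset.unsigned_abs() } else { 0 };
    // Where that pixel lands in the destination.
    let dst_start = if offset < 0 { 0 } else { offset as usize };
    if src_start >= sprite_len || dst_start >= dest_len {
        return None;
    }
    // Copy as much as fits in both the sprite and the destination.
    let len = (sprite_len - src_start).min(dest_len - dst_start);
    Some((src_start..src_start + len, dst_start..dst_start + len))
}
```

Calling it once for x and once for y gives the rectangle to copy. A sprite at offset -2 with length 5 contributes its pixels 2..5 to destination positions 0..3.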

And I met the same unsigned problem while implementing Image::nearby. If I just write the range x-1..x+1, I get a runtime error when x is an unsigned zero, and the compiler does not save me. A typical example of how unsigned numbers create problems from scratch.

Yes, I have earned a lot of problems, and this is the entire point of abstractions?

It is part of the system library, so in this case we are hiding the problem at a deeper level and freeing programmers at the upper levels from conversion problems.

Could you provide an example, please?

I think we can create some rules to fix problems like this. The only problem is a motivation.

Looks like Rust's founders did not look at C++'s negative experience, and now it is too late to fix the problems :(

This abstraction is useless in real code.

Yes, I know that, of course, but it is the internal implementation. I am talking about the interface. The most usable interface has an isize size, an isize index and an isize offset, but then my 2D image library would be inconsistent with other Rust code.

lordan · November 30, 2022, 10:39am

Note that not even for C++ everybody universally agrees with that assessment: https://www.youtube.com/watch?v=Fa8qcOd18Hc

nakacristo · November 30, 2022, 3:58pm

Solving inference issues is an important matter on its own. If you have a good solution it would be greatly appreciated. And for that matter, the possibility of indexing with isize is just another minor motivation. I believe a good starting point is this post by Aria.

People with great motivation and ideas have failed to find an adequate way to address these problems. Please, try to improve it yourself. Indexing slices will most surely be untouched until then.

notriddle · December 7, 2022, 3:59pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
