Subscripts and sizes should be signed

Eh. I do a lot of 2D image processing and my take on this is basically "usize indexing not a big deal here". The pix crate uses i32 for pixel range coordinates because you really don't need images with either dimension larger than 2**31 - 1 but you do want to handle negative coordinates. Likewise, the range dimensions themselves are u32 because you do not want negative dimensions. Even u32 is not a perfect representation for dimensions because 0 and >= 2**31 are invalid.

This crate itself handles negative coordinates by clipping them into the image range, so they end up as unsigned anyway! The icing on the cake is that x as usize is infallible[1] when x is in [0, 2**31), without resorting to x.try_into().unwrap(). And the fact that the cast really only needs to be done in a very limited number of places, e.g. when creating an iterator, really begs the question, why would you want to change the representation of a bitmap to use usize dimensions when it is completely unnecessary?
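As a rough illustration of that clip-then-cast pattern, here is a minimal sketch. The helper name `clip_to_index` is made up for this example and is not part of pix; it only assumes i32 coordinates and u32 dimensions as described above.

```rust
/// Clamp a signed coordinate into `[0, len)` and convert it to an index.
/// Returns `None` for an empty axis, where no valid index exists.
/// (Hypothetical helper, not an API of any real crate.)
fn clip_to_index(coord: i32, len: u32) -> Option<usize> {
    if len == 0 {
        return None;
    }
    // Work in i64 so that `len - 1` is representable even when len > i32::MAX.
    let max = i64::from(len) - 1;
    let clipped = i64::from(coord).clamp(0, max);
    // `clipped` is now in [0, i32::MAX], so the cast loses nothing on
    // 32-bit and 64-bit targets.
    Some(clipped as usize)
}

fn main() {
    for coord in [-5, 3, 99] {
        println!("{} -> {:?}", coord, clip_to_index(coord, 10));
    }
}
```

The cast appears exactly once, inside the helper, which is the point being made about limiting where conversions happen.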

"Lots of casting and boilerplate" is an exceptional situation, perhaps driven by other factors like choosing to write classic for-loops with integers over creating iterators. In practice, crates like pix show that this statement is inaccurate at best.

Speaking of abstraction, referring to your claim that it's purely religious, I argue that you're mixing things up and choosing only what is convenient for you at the time. Let's revisit what you said earlier about a performance optimization with isize indices:

But ... this is abstraction! You are literally hiding a cast in this situation saying it is fine to do so, and then you turn around and decry how awful it is to have "Lots of casting and boilerplate" as if abstraction does not exist at all. And yet, the Iterator<Item = usize> is the abstraction that you are looking for to hide the casting!

Abstraction cannot be both good and bad at the same time. A more plausible explanation is that you are not taking full advantage of it. Saying "I don't want to use the iterator abstraction because I prefer a for-loop that increments integers" is a problem you are going to have to deal with on your own. The iterator will help you hide the details that you dislike, but it's up to you to take advantage of that property of abstraction.

I would go as far as to suggest that your 2D image processing experience is probably not representative of the common situation: one where a user just pulls in a crate for image processing, and it does the right thing without the user having to worry about details that don't matter for processing images (like that pesky usize index).

Case in point, my little game that uses pix doesn't have a single coordinate cast to usize for indexing, because doing so at this level of abstraction is unnecessary. In full disclosure, I do go the other way though, casting a dimension (u32) into a coordinate (i32). But that's for creating an iterator. Far from "Lots of casting and boilerplate", FWIW.

I cannot imagine a world where isize index is much more useful than usize for image processing. If anything, it will break even. Net zero either way. You pretty much have to cast between coordinates and dimensions at some point. Even if you choose to use signed dimensions to avoid a cast (don't do this lol) you still have to protect against <= 0 dimensions when creating the image and when converting from a coordinate.

The ultimate type for an image dimension is probably something like NonZeroU31, but it comes with its own thorns. First because this type does not exist in the standard library and second because you still have to use try_into() or unwrap an Option to safely get one from an arbitrary primitive type like u32 or i32.
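To show what those thorns look like in practice, here is a sketch of such a type built on `std::num::NonZeroU32`. The name `Dimension` and its methods are hypothetical; nothing like this exists in the standard library, and this is only one possible shape for it.

```rust
use std::convert::TryFrom;
use std::num::NonZeroU32;

/// Hypothetical "NonZeroU31": a dimension in 1..=i32::MAX.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Dimension(NonZeroU32);

impl Dimension {
    const MAX: u32 = i32::MAX as u32;

    fn get(self) -> u32 {
        self.0.get()
    }

    /// Lossless thanks to the range invariant checked at construction.
    fn as_coord(self) -> i32 {
        self.0.get() as i32
    }
}

impl TryFrom<u32> for Dimension {
    type Error = &'static str;

    fn try_from(value: u32) -> Result<Self, Self::Error> {
        if value > Self::MAX {
            return Err("dimension exceeds 2^31 - 1");
        }
        NonZeroU32::new(value)
            .map(Dimension)
            .ok_or("dimension must be non-zero")
    }
}

impl TryFrom<i32> for Dimension {
    type Error = &'static str;

    fn try_from(value: i32) -> Result<Self, Self::Error> {
        // Negative values fail the i32 -> u32 conversion.
        let value = u32::try_from(value).map_err(|_| "dimension must be positive")?;
        Dimension::try_from(value)
    }
}

fn main() {
    println!("{:?}", Dimension::try_from(640u32));
    println!("{:?}", Dimension::try_from(0u32));
    println!("{:?}", Dimension::try_from(-1i32));
}
```

Every construction from a primitive is fallible, so the caller still has to handle a `Result` somewhere, exactly as described above.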

That said, I also do a lot of audio processing! We have buffers (e.g. VecDeque<Sample> or RingBuffer<Sample> and the like) but very little need for indexing them with isize. Even with a ring buffer you don't want to cross the boundary (i.e. index 0) while going backwards for echo effects or whatever. Yeah, I just don't buy the idea that isize is somehow objectively superior for indexing in these moderately common cases.


  1. This is true on 32-bit and 64-bit systems. The unwrap() can panic on 16-bit systems with very large images, so you're already doing error handling incorrectly regardless.


For [T] with T: Copy you can just modify the code to do the dereference. @CAD97 posted a macro which works better.

The example in the original post is slightly misleading. In release mode arr.get((x - y) as usize).copied() generates code with a single branch, while x.checked_sub(y).and_then(|d| arr.get(d as usize).copied()) generates code with two branches (in release mode). But these two snippets of code are not equivalent, e.g., if x is isize::MIN and y is isize::MAX.
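The difference can be checked directly. In this sketch the release-mode subtraction is modelled with `wrapping_sub`, on the assumption that overflow checks are disabled (the default for release builds); the function names are made up for the example.

```rust
/// Models `arr.get((x - y) as usize).copied()` when built without overflow
/// checks, where `x - y` silently wraps.
fn get_wrapping(arr: &[u8], x: isize, y: isize) -> Option<u8> {
    arr.get(x.wrapping_sub(y) as usize).copied()
}

/// The two-branch version from the post.
fn get_checked(arr: &[u8], x: isize, y: isize) -> Option<u8> {
    x.checked_sub(y).and_then(|d| arr.get(d as usize).copied())
}

fn main() {
    let arr = [10u8, 20, 30];

    // Ordinary inputs agree.
    assert_eq!(get_wrapping(&arr, 5, 3), Some(30));
    assert_eq!(get_checked(&arr, 5, 3), Some(30));

    // A negative difference casts to a huge usize, so both return None.
    assert_eq!(get_wrapping(&arr, 3, 5), None);
    assert_eq!(get_checked(&arr, 3, 5), None);

    // isize::MIN - isize::MAX overflows and wraps around to 1.
    assert_eq!(get_wrapping(&arr, isize::MIN, isize::MAX), Some(20));
    assert_eq!(get_checked(&arr, isize::MIN, isize::MAX), None);

    println!("the two versions differ only when the subtraction overflows");
}
```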


Realistically, how could you see it playing out if you were able to convince everyone? What steps?

It is hard to answer all the points, so I chose only some of them.

The usize type does not protect me from negative numbers. With isize I get a runtime error when I try to apply a negative size; with usize I get a runtime error on the intermediate result. There is no difference. But usize is a problem if I really need a negative intermediate result.

I don't know how a user can work with a 2D image without looking at the image size and comparing it with an image offset.

A "perfect representation" is useless. We need a universal type without problems in math operations.

Because I work with coordinates and use them in intermediate vectors, I need the same type in all cases; otherwise I have useless conversion boilerplate.

And how do you represent a negative offset?

Add indexing with isize, and "try-apply-isize" as a new rule for when there are many possible types to apply.

As I have previously said, please be specific. I have noted issues with that approach. It is simply not possible to "add indexing with isize" right now without causing monumental breakage. That is why I am asking you to be specific. You have not laid out the necessary steps that are necessary to avoid the breakage, which leads me to believe that you simply do not understand the scale of the breakage you are proposing. I asked for the absolute minimum that is necessary, and you have not even provided that. Not even close.

You cannot just ignore the issues — they have to be directly addressed up-front. And again, this is completely setting aside whether it's even desired.

I suggest you look at the RFC template and start from there. It is designed to force you to think critically about a number of details. That is probably the starting point if you actually wanted a change, as an RFC would absolutely be required here.


These are the kinds of things I tried to demonstrate as being "irrelevant" to the task of actually drawing, sampling, or filtering image contents because a good abstraction will not require the caller to compare an image size with an offset within the image. The abstraction does the comparison and makes this point moot to the caller (the common situation). Once again, an iterator is a concrete example of such an abstraction.

If such a need arises, it can be solved by subtracting a positive integer from the current index. If it arises at all. Most of the time we deal with zero-based indices, and seeking backwards from the current index without knowing what the physical index is sounds like a very unusual circumstance. Perhaps your abstract representation is modeled after the C standard library's fseek() function where the offset is relative to SEEK_CUR?
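For reference, both styles can be written against a usize index today. This is only a sketch: `step_back` and `offset_index` are made-up names, while `checked_sub` and `checked_add_signed` are standard library methods on usize (the latter since Rust 1.66).

```rust
/// Step backwards by a non-negative amount.
/// Returns `None` if that would go past index 0.
fn step_back(index: usize, by: usize) -> Option<usize> {
    index.checked_sub(by)
}

/// Apply an offset whose sign is only known at run time, in the spirit of
/// a SEEK_CUR-style seek. Returns `None` when the result leaves `0..len`.
fn offset_index(index: usize, offset: isize, len: usize) -> Option<usize> {
    index.checked_add_signed(offset).filter(|&i| i < len)
}

fn main() {
    println!("{:?}", step_back(3, 1)); // one step back from index 3
    println!("{:?}", step_back(0, 1)); // cannot step back from index 0
    println!("{:?}", offset_index(3, -2, 10));
    println!("{:?}", offset_index(0, -1, 10));
}
```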


I choose not to respond directly to your other comments because the appropriate response is "hide that in a layer of abstraction."

The main question in response to that question: how often do you need an offset which is either positive or negative, and you don't know which?

Most use cases in my experience either all go in the same direction (e.g. the offset of a field in a structure) or are served as well (if not better) as absolute indices (or relative to some shared anchor, with the anchor on one side of the indexed space).

...with the notable exception of physical models, but these cases also mostly want a numerical real type (e.g. floating point) rather than an integral type.

Even for cases that do seem like they benefit from runtime signed offsets, they're often better served by more targeted solutions. E.g. for "nearby pixels" you might quickly reach for a for y in y-1..=y+1, if y in 0..image.size.y, for x in x-1..=x+1, if x in 0..image.size.x style loop, but this is likely better expressed with something like for pixel in image.nearby((x, y), 1), which encapsulates the index manipulation entirely. This holds especially if it is something you do often (as it seems you do), since it means all of its users handle the literal edge cases uniformly (do you ignore out of bounds? or do you perhaps return a default color? or stretch the edge color? or wrap? or mirror? these are all valid answers provided in graphics APIs) rather than edge case handling being sprinkled all around your codebase.
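A minimal sketch of what such a `nearby` method could look like follows. The `Image` type, its fields, and the "ignore out of bounds" edge policy are all assumptions made for the example, not the API of any particular crate.

```rust
/// Hypothetical image type: row-major storage, one byte per pixel.
struct Image {
    width: usize,
    height: usize,
    pixels: Vec<u8>,
}

impl Image {
    /// Yields every pixel within `radius` of `(x, y)`, centre included.
    /// Edge policy chosen for this sketch: positions outside the image
    /// are skipped.
    fn nearby(&self, (x, y): (usize, usize), radius: usize) -> impl Iterator<Item = u8> + '_ {
        // Saturating arithmetic keeps the window inside 0..width and
        // 0..height without ever forming a negative intermediate value.
        let x_start = x.saturating_sub(radius);
        let y_start = y.saturating_sub(radius);
        let x_end = x.saturating_add(radius).saturating_add(1).min(self.width);
        let y_end = y.saturating_add(radius).saturating_add(1).min(self.height);

        let xs = x_start..x_end;
        (y_start..y_end).flat_map(move |row| {
            xs.clone().map(move |col| self.pixels[row * self.width + col])
        })
    }
}

fn main() {
    let image = Image { width: 3, height: 3, pixels: (0u8..9).collect() };
    let corner: Vec<u8> = image.nearby((0, 0), 1).collect();
    println!("{:?}", corner);
}
```

Swapping in a different edge policy (default color, clamp, wrap, mirror) would change only this one method rather than every call site.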

Yes, at some point in the stack, you're probably going to need to convert between types. Yes, this adds more typing (in both ways) overhead than just using a single uniform type. Yes, most (but not all) uses of usize are more using it as "NonNegativeIsize" and values >isize::MAX are still logically invalid.

However, this is the entire point of abstractions. Even with your poster child (e.g. indexing slices with isize), there's still a conversion going on behind the abstraction (e.g. comparing as usize to treat negative as out of bounds with the same check catching too positive indices).
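That hidden conversion can be sketched as follows. `get_signed` is a hypothetical helper, not a standard library method, and the single-check trick relies on slices of non-zero-sized elements never holding more than isize::MAX items.

```rust
/// Hypothetical signed-index lookup built on top of the usize API.
fn get_signed<T>(slice: &[T], index: isize) -> Option<&T> {
    // A negative `index` casts to a value above isize::MAX, which is larger
    // than any possible length for non-zero-sized `T`. The ordinary
    // `index < len` check inside `get` therefore rejects it as well.
    slice.get(index as usize)
}

fn main() {
    let data = [1, 2, 3];
    println!("{:?}", get_signed(&data, 1));
    println!("{:?}", get_signed(&data, -1));
    println!("{:?}", get_signed(&data, 3));
}
```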

Yes, usize indexing treats too-negative and too-positive nonuniformly. Yes, you can argue in good faith the benefit of forcing an explicit decision quickly when index manipulation goes negative is outweighed by the cost of handling negative overflow separately from positive overflow.

However, by far the more common case is that a manipulated index ever going negative is a bug that should immediately panic rather than a valid interim value.

Additionally, adjusting integer fallback such that Index<isize> doesn't cause massive breakage is quite difficult. It's not (and can't be) just "if multiple choices, choose i32, but if that's not valid, choose isize", because (well, for one, it'd still be semantics-changing, just much more subtly so) type inference doesn't work like that. This naive specification essentially results in having 2^n options from choosing between i32 and isize, where n is the number of integer literals in a function, and you just have to try every option. What do you do if x: i32, y: i32 is invalid, but either x: i32, y: isize or x: isize, y: i32 is valid?
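To see what inference does today, the sketch below reports the type chosen for two integer literals. The helper names are made up, and `std::any::type_name` is documented as a best-effort diagnostic, so its exact strings are an assumption here.

```rust
use std::any::type_name;

/// Reports the type the compiler inferred for a value.
fn type_of<T>(_: &T) -> &'static str {
    type_name::<T>()
}

fn inferred_types() -> (&'static str, &'static str) {
    let arr = [10u8, 20, 30];

    // usize is currently the only integer type that can index an array,
    // so the literal is inferred from its use as an index.
    let i = 1;
    let _elem = arr[i];

    // Nothing constrains this literal, so integer fallback picks i32.
    let j = 1;

    (type_of(&i), type_of(&j))
}

fn main() {
    let (indexed, unconstrained) = inferred_types();
    println!("index literal: {}, free literal: {}", indexed, unconstrained);
}
```

If isize were also a valid index type, the literal `i` would no longer have a single candidate, which is where the fallback question described above comes from.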

There are multiple massive hurdles before indexing slices with isize is practical, and it is very much not worth the very marginal benefit provided to put in that effort to make it so you can delete some conversions instead of designing a semantically meaningful abstraction boundary between position/offset and raw linear index.

Plus, in the specific case of images, you're dealing with a 2D data source, so you probably want to index with (x, y), so you already have a boundary in place where you can encapsulate conversion between your image pixel position/offset (isize, isize) and the raw usize index. (With fixed-size matrices you can use [[T; N]; M] and index [m][n], but for dynamically sized matrices you absolutely want Vec<T> and not Vec<Vec<T>>.)
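Such a boundary might look like the sketch below. `Grid` and its methods are hypothetical names, and the code assumes `cells.len() == width * height` with row-major layout.

```rust
use std::convert::TryFrom;

/// Hypothetical dynamically sized 2D container backed by a flat Vec<T>.
struct Grid<T> {
    width: usize,
    height: usize,
    cells: Vec<T>, // row-major; assumed to hold width * height items
}

impl<T> Grid<T> {
    /// The single place where signed positions become a raw linear index.
    fn linear_index(&self, x: isize, y: isize) -> Option<usize> {
        // `try_from` rejects negatives; `filter` rejects too-large values.
        let x = usize::try_from(x).ok().filter(|&x| x < self.width)?;
        let y = usize::try_from(y).ok().filter(|&y| y < self.height)?;
        Some(y * self.width + x)
    }

    fn get(&self, x: isize, y: isize) -> Option<&T> {
        self.linear_index(x, y).map(|i| &self.cells[i])
    }
}

fn main() {
    let grid = Grid { width: 3, height: 2, cells: vec![0, 1, 2, 3, 4, 5] };
    println!("{:?}", grid.get(2, 1));
    println!("{:?}", grid.get(-1, 0));
}
```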


So, to summarize your main point: you suggest that I use a library which wraps all possible uses of sizes and offsets in special methods, so that all the boilerplate pain is borne only by library writers. But what if the standard libraries are not enough for me and I need to write something like this myself, but more specific to my case? In that case I have to feel all this pain with conversion boilerplate, which would not be necessary in a "parallel world" where Rust's designers did not make the unsigned mistake. I need to create an abstraction to hide the conversion problem inside it, but in a well-designed language we would not need integer conversions.

What if I don't know whether I need to move forward or backward, and I need a universal type that can add or subtract a positive integer? Oh, sorry, this type has a name: "signed integer".
It looks like I am repeating a 500-year-old math discovery, from when mathematicians came up with negative numbers T_T

Every time I need to copy a sprite onto a big image. The sprite can be partially moved out past the left or top side of the big image: that is a negative offset.
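For concreteness, here is one way such a sprite copy could be written. It is only a sketch: `Bitmap` and `blit` are hypothetical, pixels are single bytes, and offsets and sizes are assumed small enough that the signed additions cannot overflow.

```rust
/// Hypothetical bitmap: row-major storage, one byte per pixel.
struct Bitmap {
    width: usize,
    height: usize,
    pixels: Vec<u8>,
}

impl Bitmap {
    fn new(width: usize, height: usize, fill: u8) -> Self {
        Bitmap { width, height, pixels: vec![fill; width * height] }
    }

    /// Copies `sprite` so that its top-left corner lands at `(ox, oy)`.
    /// The offset may be negative; parts outside `self` are clipped away.
    fn blit(&mut self, sprite: &Bitmap, ox: isize, oy: isize) {
        // Visible rectangle in destination coordinates, computed signed.
        let x0 = ox.max(0);
        let y0 = oy.max(0);
        let x1 = (ox + sprite.width as isize).min(self.width as isize);
        let y1 = (oy + sprite.height as isize).min(self.height as isize);
        if x0 >= x1 || y0 >= y1 {
            return; // nothing visible
        }

        // After clipping, every quantity below is non-negative.
        let run = (x1 - x0) as usize;
        for y in y0..y1 {
            let src = ((y - oy) as usize) * sprite.width + (x0 - ox) as usize;
            let dst = (y as usize) * self.width + x0 as usize;
            self.pixels[dst..dst + run].copy_from_slice(&sprite.pixels[src..src + run]);
        }
    }
}

fn main() {
    let sprite = Bitmap { width: 2, height: 2, pixels: vec![1, 2, 3, 4] };
    let mut canvas = Bitmap::new(4, 4, 0);
    canvas.blit(&sprite, -1, -1); // only the sprite's bottom-right pixel is visible
    println!("{:?}", &canvas.pixels[..4]);
}
```

The offset stays signed at the interface, and the conversions to usize happen only after clipping, inside this one method.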

And I hit the same unsigned problem while implementing Image::nearby. If I just write the range x-1..x+1, I get a runtime error when x is an unsigned zero, and the compiler doesn't save me. A typical example of how unsigned numbers create problems from scratch.

Yes, I have gained a lot of problems, and this is the entire point of abstractions?

It is part of the system library, so in this case we are hiding the problem at a deeper level, and we free programmers at the upper levels from conversion problems.

Could you provide an example, please?

I think we can create some rules to fix problems like this. The only problem is motivation.

It looks like Rust's founders did not look at C++'s negative experience, and now it is too late to fix the problems :(

This abstraction is useless in real code.

Yes, I know that, of course, but it is the internal implementation. I am talking about the interface. The most usable interface is an isize size, an isize index, and an isize offset, but in that case my 2D image library would be inconsistent with other Rust code.

Note that not even for C++ everybody universally agrees with that assessment: https://www.youtube.com/watch?v=Fa8qcOd18Hc


Solving inference issues is an important matter on its own. If you have a good solution it would be greatly appreciated. And to that matter, the possibility of indexing with isize is just another minor motivation. I believe a good starting point is this post by Aria.

People with great motivation and ideas have failed to find an adequate way to address these problems. Please try to improve it yourself. Indexing slices will most surely remain untouched until then.

