Subscripts and sizes should be signed

delfigamer · November 10, 2022, 7:36pm

No, only Python does that. In Haskell, [1,2,3] !! (-1) raises an exception, while take (-1) and drop (-1) do nothing. Vectors, ByteStrings and Texts behave similarly. This is literally what OP wanted all along, except that Haskell is not a systems programming language, unlike Rust and C++.

T4r4sB · November 10, 2022, 8:32pm

No, other people would not will any difference. Most users use indices only as unique ids with only increment operation.

How did you handle negative offset of image?

Yes, bit with signed indices I have to only write simple code and nothing more.

Why performance of unoptimized operation is an avantage?

How can I wrap built-in array and slices?

It is not actual while Rust base on LLVM which support only one negative number representation. Do you know any modern processor with another representation?

Yes, but panicing program is not good. When program just works correctly, it is much better. And when program panics, I need at least one more iteration for fixing the bug. And the only reason of number underflow panic is usize type. With isize type I would not have such problems.

Also

This is not ideal. Let we have 256x256 chessboard. I will use isize type for coordinates, isize type for figure offsets (for example, I need pairs (-2,-1) etc to describe knight moves), and isize type to check that figure did not moved out from the board. I will write simple code without conversions. You tell me, that it is better to use u8 for figure coordinates, because this type exectly represents all possible coordinates. And use i16 for offsets (because it is the smallest type whic can describe numbers from -255 to 255), and use some common wide type for intermediate result, when u need to count coordinate+offset. And game logic in your code will looks the same as logic in my code, but your code will have a lot of type conversions. So what's the purpose?

Is index were isize, it is no hard to add a warning, that Rust is not a Python. Also -1 is invalid index of array in C# and C and Java too, so it is rather common behaviour.

How can I make a wrapper around built-in arrays and slices?

CAD97 · November 10, 2022, 10:47pm

(I use pseudo-Rust shorthand in this post.)

C doesn't really have arrays. Well, it does, but it doesn't have array indexing. C barely has pointer indexing. In C, a[b] is just pure syntax sugar for *(a + b), no more and no less. This applies even for actual array variables; it's just that the array variable immediately turns into a pointer when you name it in almost any context.

This is a general thesis of strongly-typed program design. By naming more distinct types, you know what the semantics of a value of that type is, not just what the value is. Primitive obsession is certainly convenient (especially without some easier way to delegate functionality), but using more descriptive types will typically lead to clearer code, and allows you to introduce functionality tied to your domain rather than the general type.

For example, in the chess example, rather than a single (isize, isize), you could construct a more expressive model perhaps similar to

pub struct ChessBoard {
    raw: [[ChessSquare; 256]; 256],
}

pub struct ChessPosition {
    pub rank: u8,
    pub file: u8,
}

pub struct ChessMove {
    rank: i16,
    file: i16,
}

fn ChessMove::new(i16, i16) -> Option<Self>; // ensure in -255..=255
fn ChessPosition::checked_add(delta: ChessMove) -> Option<ChessPosition> {
    let rank = i16::from(self.rank) + delta.rank;
    let file = i16::from(self.file) + delta.file;
    Some(ChessPosition {
        rank: rank.try_into().ok()?,
        file: file.try_into().ok()?,
    })
}

impl Add<ChessMove> for ChessPosition;
impl Sub<ChessMove> for ChessPosition; // undo
impl Index<ChessPosition> for ChessBoard;

On the other hand, position and offset like this, even when logically bounded, is very typically just represented as a general vector type struct vec2f { x: f32, y: f32 } (or sometimes struct vec2i { x: i32, y: i32 }). More structural typing is definitely more convenient to work with, and even Pure Math isn't always perfect about keeping distinct units distinct^[1].

As with most things, it's entirely a tradeoff between flexibility, convenience, guarantees, and a lot of other properties that you may or may not care about. The standard library's job as the standard library available to everyone is to provide a shared baseline of generally applicable and relatively noncontroversial functionality. If this thread has made a single thing clear, it's that signed indexing is not.

I personally don't care all that much if primitive slices allow Index<i__> — in the cases where it legitimately matters, I'm likely going to use custom types anyway. There are benefits to using signed indexing, the primary one being uniform behavior for exceeding the valid range on either end. You can certainly argue that this consistency outweighs the costs^[2] and loss of the purity of monodirectional unsigned indices.

But at this point, Rust std should imho definitely stick to just unsigned indexing for slices. Whatever the benefits are one way or the other, the churn of making such a change to a fundamental type far outweighs any potential benefit. (Plus, we're trying to convince people used to C's near-unchanging stability that Rust is a stable choice for them to consider.)

There's three main ways you could go about doing this.

Custom index type

#[derive(Copy)]
struct Idx(isize);
fn Idx::from(isize) -> Self;
fn Idx::into_raw(self) -> isize;

impl<T> Index<Idx> for [T];
impl<T> Index<Range<Idx>> for [T];

impl Add<isize> for Idx;
impl Sub<isize> for Idx;

// etc and any other helpers you want

If you could implement SliceIndex on stable, this would be the preferred solution if you're not adding semantics to the array/slice types beyond being able to index with negative indices. On stable, you'll need an extension trait or other helper function if you want .get.

Single wrapper

#[repr(transparent)] // important
struct Indexable<T: ?Sized>(T);

fn Indexable<T: ?Sized>::from(T) -> Self;
fn Indexable<T: ?Sized>::from_ref(&T) -> &T;
fn Indexable<T: ?Sized>::from_mut(&mut T) -> &mut T;
// pointer casting for conversions is sound because of repr(transparent)
fn Indexable<T: ?Sized>::into_inner(self) -> T;
fn Indexable<T: ?Sized>::as_inner(&self) -> &T;
fn Indexable<T: ?Sized>::as_inner_mut(&mut self) -> &mut T;

impl<T: Index<usize> + ?Sized> Index<isize> for Indexable<T>;
impl<T: Index<Range<usize>> + ?Sized> Index<Range<isize>> for Indexable<T>;
// etc and any other helpers you want

Specific wrappers

struct Slice<T>([T]);
struct Array<T, const N>([T; N]);

// conversions, indexing, helpers as previous

I personally wouldn't use this approach unless there are additional semantics enforced beyond just the new indexing impls.

They all have things they're better or worse at than the others. All of those can be delt with other than the fact that you're using a nonstandard type, so have to deal with conversions at the edge when you're communicating to other code which talks the standard types.

The fun problem child to pick on is how the linear algebra cross product of two 3D vectors is itself a 3D vector, but kinda not quite, actually, if we're being specific, it's a pseudovector (or an axial vector), because it behaves differently (e.g. to transform a plane-normal pseudovector you use the inverse transformation matrix from what you would use for a regular vector). If you look to geometric algebra, you'll see that there they define a new nominal type called the "bivector" which is isomorphic to a plane-normal pseudovector, which also then generalizes to other dimensionalities (2D, 4D+). But that's getting way into math nerd weeds. (But rotors are imho a beneficial generalization and recontextualization of quaternions (and complex numbers for rotation in 2D). I am the math nerd.) ↩︎
TBQH, the ability to have and index arrays/vectors of ZST with more than isize::MAX elements (which is a concrete loss to using isize indices) doesn't matter all that much. You could even branch on the constant size_of::<T>() == 0^[3] to maintain support at the cost of a slower check (which is very hot for sized T but comparatively rare for ZST). ↩︎
This branching / specialization is actually already done for slice iterators, as they use a { begin: *T, end: *T } representation and guaranteed-inbounds pointer offsetting for sized T, but this breaks for ZST as the entire array is zero sized. For ZST, end is begin.wrapping_byte_add(len), and popping the next element from wrapping_byte_sub(1)s end rather than incrementing begin. ↩︎

elidupree · November 10, 2022, 10:48pm

I don't disagree with the overall point, I'd just like to point out: a literal v[-1] could presumably be made to trigger the unconditional_panic lint, which is deny-by-default, meaning that you'd get a compile error (as you want) even if it typechecks correctly.

T4r4sB · November 11, 2022, 11:39am

CAD97:

 let rank = i16::from(self.rank) + delta.rank;
    let file = i16::from(self.file) + delta.file;
    Some(ChessPosition {
        rank: rank.try_into().ok()?,
        file: file.try_into().ok()?,
    })

I will use isize only and without conversions this code is much easier

fn valid(file: isize, rank: isize) -> bool {
  file >= 0 && file < 256 && rank >= 0 && rank < 256
}

fn ChessPosition::checked_add(delta: ChessMove) -> Option<ChessPosition> {
  let rank =self.rank + delta.rank;
  let file = self.file + delta.file;
  Self::valid(rank, file).then(|| ChessPosition{rank, file})
}

Also it is easy to add method try_move_to, which check that position is inside chessboard. So I still dont see advantage of bounded types approach.

Custom index type
Single wrapper
Specific wrappers

Thanks a lot, may be I will use some of there work-arounds...

scottmcm · November 11, 2022, 7:49pm

Consider also poking libs-api about potentially stabilizing https://doc.rust-lang.org/std/primitive.u32.html#method.checked_add_signed and friends if you need addition of signed things to unsigned things with overflow checking.

delfigamer · November 12, 2022, 1:21pm

This looks rather similar to another idea that sometimes happens in game development, 3D graphics and nearby areas: instead of a single Vector3 type for both absolute and relative positions, instead use two different types Point3 and Displacement3 with additional restrictions on available operations (e.g. you are not allowed to add Point3s together or multiply them by scalars). This looks like a good idea on paper, but once you start to actually use it in real code, it quite quickly falls apart - which is the reason why in practically all game engines, there is only a single Vector3 primitive that fulfills both roles; and when you need to distinguish between the two - for example, in homogeneous transformations - it's done by having multiple functions, such as TransformPoint versus TransformVector, rather than by having multiple overloads of a single function.

Ryan1729 · November 12, 2022, 6:00pm

This phrasing makes it sound to me like you feel that the actual distinction doesn’t make sense, rather than it sometimes is inconvenient to spell out the distinction all the time. If you really feel like the distinction doesn’t make sense, then I am curious why that is so.

My take on why most game engines end up with a single Vec3 type is that it occasionally adds some friction to maintain the distinction, and many users of game engines want to copy-paste vector code from other sources which historically have not maintained the distinction, without needing to figure out which type a given Vec3 ought to be.

Additionally, when using a library, it is quite annoying if operations are missing. For example, subtracting Point3s to get a Displacement3 is a valid and useful operation. Having it all be the same type means that operations that happen to be the same computation don’t need to be defined twice, which increases the chance the library includes them.

For myself, I have had enough bugs that involved a lack of attention to this distinction that I have begun using distinct types in my code. And after some initial friction, like defining constants to be the wrong type and therefore having unnecessary conversions, I now prefer it that way. But I mostly do 2D stuff, so maybe it becomes more annoying in 3D, or something.

kpreid · November 12, 2022, 8:14pm

In my experience, the reason why point/vector distinction can feel awkward in game programming is that there is often a hierarchy of different coordinate systems (scene graphs with each node potentially having a transform; rooms within a larger map; vertices within a mesh/collider vs. location of the object in the world), so that what is from one perspective a point, “location of this object”, is from another perspective a translation vector, “translation of this mesh's vertices”. So you end up wanting to interpret points as vectors a lot. For myself, I think that there is some value in marking that that's happening, so I don't mind using points and often converting them to vectors to add them to other points, but there are definitely cases, particularly in the middle of algorithms, where there isn't an obviously correct answer to “is this a point or a vector”.

It might be interesting to try going farther and statically distinguishing points as belonging to different spaces using a marker type parameter (insofar as that's possible — you can't do that with a runtime-built transform hierarchy, of course), to capture more of the distinctions that actually exist in the domain, and see whether any value comes out of that.

scottmcm · November 12, 2022, 9:23pm

One interesting trick here is to use const generics for the point/vector distinction like in homogeneous coordinates. So you have a newtype like

#[repr(transparent)]
struct H3<const D: i32>(f32x3);

with a couple convenience aliases

type Vector3 = H3<0>;
type Point3 = H3<1>;

so you can add points together, getting H3<2>s. That way you can do p0 + p1 - p2 and it'll give a point, exactly like you wanted, without needing to add parens as p0 + (p1 - p2) like you would if you only had points and vectors. Similarly, -p compiles and gives an H3<-1>, which can be added to another Point3 to get a Vector3 again.

(And because it's a transparent newtype you can always O(1) convert buffers between types if needed.)

delfigamer · November 12, 2022, 9:45pm

No, I'm talking specifically about the inconvenience. From the mathematical point of view, the distinction is clear and easily justified: in homogeneous coordinates, points have their w coordinate set to 1, while displacements are w=0.

elidupree · November 12, 2022, 10:36pm

Unfortunately, there are still inconveniences even with this. Suppose I want to take the center of mass over a runtime-sized collection of Point3s; I would want to add them up, and divide by the length. But that would mean the type parameter has to be a runtime variable!

Naturally, this is solvable using dependent types (where you can simply prove that you divide your way back down to 1 at the end, so the type parameter can still be erased a runtime.)

(And you can technically work around this issue by subtracting the origin from each point to get a vector, then adding the origin back in at the end, which can likewise be erased at runtime. Although that arguably defeats the purpose of having a separate Point type.)

But this illustrates a general problem/trade-off, where when you start using strong types, you sometimes run into problems that can only be solved with even stronger types. I would love to have a language with full dependent types that is as convenient as Rust, but the programming field just isn't there theory-wise or ecosystem-wise yet (and so having Rust choose a weaker system is completely okay!).

…of course, all of this is a tangent to the main point of the thread, since the inconveniences with signed/unsigned conversions ARE solvable and we are steadily making progress on solving them.

delfigamer · November 12, 2022, 10:49pm

Under floating-point arithmetic, no, it can't - float addition is not associative. Case in point:

(1e50 + (-1e50) + 3) / 3 = 1
((1e50-1e50) + (-1e50-1e50) + (3-1e50)) / 3 + 1e50 = 0

elidupree · November 12, 2022, 10:50pm

I'm pretty sure adding exactly zero is still a no-op?

scottmcm · November 12, 2022, 10:57pm

Yeah, that's why I mentioned that it should be a newtype over a more primitive f32x3 -- it means it's always sound to just turn your &[Point3] into a &[f32x3] when you need the escape hatch.

Floating point is terrible. According to LLVM (https://rust.godbolt.org/z/qvMfWsMjM), subtracting zero is a nop, but adding zero isn't. (Probably has something to do with signed zeros.)

delfigamer · November 12, 2022, 10:57pm

At this point, you could as well just coerce Points into Vectors.

Or, another idea - do away with the distinction and just use Vectors everywhere.

elidupree · November 12, 2022, 11:01pm

Yeah, that's the point I was making - "you can technically bail out by converting them explicitly, but that kinda defeats the purpose of the distinction".

I just phrased that as "subtracting the origin" because that's mathematically equivalent (as long as you don't actually use floating-point operations for that mathematical subtraction, I guess).

scottmcm · November 12, 2022, 11:09pm

Interestingly, you can also solve the problem by changing the algorithm.

As @delfigamer pointed out, adding them all up is actually a poor way to calculate the average in floating-point, since you start introducing magnitude differences. If you use the well-known online algorithm, though, the types work out great in μᵢ = μᵢ₋₁ + (xᵢ - μᵢ₋₁)/i.

But also, I just don't think "you need an escape hatch sometimes" is so bad. We're in Rust, after all, with a great big unsafe escape hatch, and still find the type safety of safe code useful.

elidupree · November 12, 2022, 11:19pm

Yeah, this is firmly a category of "waaah, that is only a pragmatic approach when I wanted it to be a perfect one"

delfigamer · November 12, 2022, 11:39pm

From my experience, the programs don't have to be perfect, their main purpose is to do the job. As you keep pouring more effort into a single program, you start getting ever diminishing returns - it takes a lot of resources to find a flaw in something already nearly perfect, and there is only a small benefit to fixing a use case that almost never comes up in practice. At a certain point, trying to improve a program becomes a net negative, spending time on the program loses you more than the value that you gain - and this is the point where a sane person should stop and move on.

When we talk about the signed/unsigned debate, I think of it in the same framework - what is the value gained versus the value lost; and I think you should do the same.

Perhaps, @T4r4sB should collect all of the points raised in this thread in a table and mark each of them, whether it's a gain or a loss, and how significant it is in comparison with others.

Topic		Replies	Views
Subscripts and sizes should be signed (redux) language design	41	1700	November 9, 2023
Revisiting the unsigned ideas (deprecated)	5	2062	March 25, 2019
The problem with array/slice/vector indexes language design	14	9771	March 25, 2019
Unsigned integer type inference language design	24	4422	March 25, 2019
To improve usize (and isize) handling in Rust language design	6	1328	September 19, 2020

Subscripts and sizes should be signed

Related topics