Dollar syntax for Rust


#1

In numerical code of the kind you write in Matlab, Python with NumPy, Julia and so on, you often slice arrays of one, two or more dimensions:

https://docs.scipy.org/doc/numpy-dev/user/quickstart.html

In such code you sometimes have complex expressions that contain several slicing operations, so to keep that code readable you really want a compact and noise-free slicing syntax.

Rust offers a readable array slicing syntax, but there’s one thing I miss that I’d like Rust to steal from D. First, some examples in Python (from the Python REPL):

>>> s = "abcdefgh"
>>> s[2:]
'cdefgh'
>>> s[2:-1]
'cdefg'
>>> s[2:-2]
'cdef'
>>> s[-2:]
'gh'
>>> s[-4:-2]
'ef'

In Python the index -1 refers to the last item, -2 is the penultimate, and so on. This is handy because you often enough want to slice taking the end of the array as reference point.

But handling negative indices like that slows down every array access. I also don’t like that when a bug produces a negative index, my Python code keeps running and gives wrong results instead of raising an out-of-bounds exception.

The D language has solved the problem efficiently, more safely, and in a sufficiently compact way using the “$” operator. This is D code equivalent to the Python code above:

void main() {
    import std.stdio;
    auto s = "abcdefgh";
    s[2 .. $].writeln;
    s[2 .. $ - 1].writeln;
    s[2 .. $ - 2].writeln;
    s[$ - 2 .. $].writeln;
    s[$ - 4 .. $ - 2].writeln;
}

The output is the same (D strings, wstrings and dstrings can all be sliced):

cdefgh
cdefg
cdef
gh
ef

“$” means the length of the array, but you can’t use it like “a.$”. So the D code:

a[$ - 2]
b[$ - c[$ - 2]]
d[random(0, $)]

is equivalent to the Rust code:

a[a.len() - 2]
b[b.len() - c[c.len() - 2]]
d[random(0, d.len())]

As you can see, $ refers to the length of the innermost array being indexed.

So $ makes the code shorter, and also safer because it’s more DRY: you don’t have to state the name of the array twice.
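
For comparison, here is a minimal sketch of the five Python/D slices above in today's Rust, where the length has to be spelled out. The function name `end_relative_slices` is made up for this illustration. Note that Rust slices a `&str` by byte offset and panics if an offset is not on a character boundary, which is fine for this ASCII-only input.

```rust
/// Returns the five slices from the Python/D examples, written with explicit lengths.
/// (Hypothetical helper, for illustration only.)
fn end_relative_slices(s: &str) -> [&str; 5] {
    let n = s.len(); // byte length; equals the character count for ASCII text
    [
        &s[2..],          // s[2 .. $]
        &s[2..n - 1],     // s[2 .. $ - 1]
        &s[2..n - 2],     // s[2 .. $ - 2]
        &s[n - 2..],      // s[$ - 2 .. $]
        &s[n - 4..n - 2], // s[$ - 4 .. $ - 2]
    ]
}
```

Calling `end_relative_slices("abcdefgh")` gives the same five strings as the Python and D versions. Every end-relative bound needs the length of `s` again, which is the repetition that $ removes.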

So I’d like the “$” operator in Rust slices/arrays/vectors too.

In D you can also overload the “$” in library code if you define the “opDollar” method in struct/class, so you can use a nice slicing syntax in library-defined matrices too, for numerical processing:

mat[$-1, 0]

int i = r[$-1, 0];  // same as: r.opIndex(r.opDollar!0, 0),
                    // which is r.opIndex(r.width-1, 0)
int j = r[0, $-1];  // same as: r.opIndex(0, r.opDollar!1)
                    // which is r.opIndex(0, r.height-1)

More info about opDollar usage:

https://dlang.org/spec/operatoroverloading.html

opDollar is used by ndslice in the standard library:

https://dlang.org/phobos/std_experimental_ndslice.html

https://dlang.org/phobos/std_experimental_ndslice_slice.html#.sliced

You can use it like:

t[3..$,0..3,0..$-1]

That is similar to the more compact NumPy syntax:

t[3:,:3,:-1]


#2

I’ve definitely missed $ from D, myself.

Obviously, though, Rust can’t use $ literally. # is probably okay, given it would otherwise be followed by ! or [.

Another reason to have something like this is that it allows you to do this…

s.some_method_that_returns_a_slice()[1..$-1]

…without having to bind the result to a temporary variable first.
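
A minimal sketch of what that looks like in today's Rust, where the temporary is needed so the length can be asked of the same value that gets indexed. The helper `trimmed` is a hypothetical stand-in for `some_method_that_returns_a_slice`.

```rust
/// Hypothetical stand-in for a method that returns a slice.
fn trimmed(s: &str) -> &str {
    s.trim()
}

/// Drops the first and last byte of the trimmed text: the `[1..$-1]` case.
///
/// Panics if the trimmed text is shorter than two bytes or if the cut
/// does not fall on a character boundary.
fn strip_ends(s: &str) -> &str {
    let t = trimmed(s); // temporary binding, needed only to call `t.len()`
    &t[1..t.len() - 1]
}
```

With a D-style symbol, the body of `strip_ends` could be the single expression from the example above.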

I don’t know about the multidimensional indexing thing, though. Frankly, I’m quite disappointed that Rust has kind of closed that door for the moment, but you can do it with tuples. The problem is that Rust’s equivalent to $ then can’t do the template trick. I suppose you can make $ return a tuple and index into it manually, giving you…

t[(3..#.0, 0..3, 0..#.2-1)]

Of course, Rust can’t return non-slices out of indexing, so it’s a bit of a moot point.
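
To make the tuple route concrete, here is a rough sketch of element indexing by a `(row, column)` tuple. The `Matrix` type and its fields are hypothetical, not taken from any library. `Index::index` has to return a reference to `Self::Output`, so this works for single elements (or slices) already stored in the matrix, but it cannot hand back a freshly built sub-matrix view by value.

```rust
use std::ops::Index;

/// Hypothetical row-major matrix, for illustration only.
struct Matrix {
    rows: usize,
    cols: usize,
    data: Vec<i32>,
}

impl Index<(usize, usize)> for Matrix {
    type Output = i32;

    /// Indexes by a `(row, column)` tuple.
    fn index(&self, (r, c): (usize, usize)) -> &i32 {
        assert!(r < self.rows && c < self.cols, "index out of bounds");
        &self.data[r * self.cols + c]
    }
}

/// Last row, first column: the element a D-style `m[$ - 1, 0]` would pick
/// if the first index is the row. The dimension is spelled out by hand here.
fn last_row_first_col(m: &Matrix) -> i32 {
    m[(m.rows - 1, 0)]
}
```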


#3

To clarify what is obvious to our macro overlord, but maybe not to everyone, $ is used in macros, so it could potentially cause issues. That said, I’m not sure I can think of a case where it would be ambiguous; is there a situation where $ could end up being followed by an ident or (?


#4

Actually, I just meant that $ is the “macro pixie dust” symbol. Having it used both at macro expansion and in regular code would make reading macros significantly more painful than it already is.

Macros are hard enough to read, thankyouverymuch. $ is ours. You can’t have it. *holds $ tight*


#5

I’ve experimented with just simply supporting signed range ends, so in ndarray, 1..-1 is a range equivalent to 1..axis_length - 1.

That works well, you can even use for example -3..-1 to pick out the two elements before the last.

The ergonomic problem comes from the fact that the boundaries are always isize, while the lengths and indices you often use to compute them are usually usize.

If I add more trait implementations, so that slicing with usize ranges is possible as well as with isize ranges, something interesting happens: for 1..-1 the compiler suddenly says that the array does not support slicing by the type i32 (!).


#6

With current Rust, we can just support negative indices the way Ruby does, by implementing Index<Range<isize>> and friends. This does not need to slow down non-negative indices, since those will continue to use Index<Range<usize>>. I hope type inference works the way we want it here…
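
A rough sketch of that idea, assuming a hypothetical wrapper type `Signed` (std slices cannot be given new `Index` impls from outside the standard library) and a hypothetical helper `resolve`. Only `Range<isize>` is covered here; the "friends" such as `RangeFrom<isize>` and `RangeTo<isize>` would follow the same pattern.

```rust
use std::ops::{Index, Range};

/// Hypothetical wrapper around a slice that accepts signed range ends.
struct Signed<'a, T>(&'a [T]);

/// Maps a possibly negative position to an offset from the start:
/// -1 becomes `len - 1`, -2 becomes `len - 2`, and so on.
fn resolve(pos: isize, len: usize) -> usize {
    if pos < 0 {
        len.checked_sub(pos.unsigned_abs())
            .expect("negative index out of bounds")
    } else {
        pos as usize
    }
}

impl<'a, T> Index<Range<isize>> for Signed<'a, T> {
    type Output = [T];

    /// Resolves both ends against the length, then defers to normal slicing.
    fn index(&self, r: Range<isize>) -> &[T] {
        let len = self.0.len();
        &self.0[resolve(r.start, len)..resolve(r.end, len)]
    }
}
```

For a wrapper `s` around `[10, 20, 30, 40, 50]`, `&s[1isize..-1]` is `[20, 30, 40]` and `&s[-3isize..-1]` is `[30, 40]`. The branch on the sign lives only in this impl, so code that slices with `usize` ranges on the underlying slice is unaffected.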


#7

In my experience, type inference does not help you there (see my post). It will be ambiguous if you are using literals (like a range 1..-1).


#8

One papercut in Python’s slicing syntax when counting from the end of the array is that we need a different syntax depending on whether the last element is included. For example,

def drop_last(a, n):
  return a[0:-n]

No n (except -len(a)) makes drop_last(a, n) return a copy of the entire array a:

>>> drop_last([1,2,3], 0)
[]
>>> drop_last([1,2,3], 1)
[1, 2]
>>> drop_last([1,2,3], 2)
[1]
>>> drop_last([1,2,3], 3)
[]

We need to change to this syntax: a[m: ]
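
For contrast, a minimal Rust sketch of the same `drop_last` helper written with an explicit length. Because the end bound is computed as `a.len() - n` rather than encoded as a negative number, `n = 0` needs no special case.

```rust
/// Returns `a` without its last `n` elements; `n == 0` returns the whole slice.
///
/// Panics if `n > a.len()`.
fn drop_last<T>(a: &[T], n: usize) -> &[T] {
    &a[..a.len() - n]
}
```

Here `drop_last(&[1, 2, 3], 0)` is the full slice `[1, 2, 3]`, and `n = 3` gives the empty slice. A length-relative syntax like D's `a[0 .. $ - n]` behaves the same way.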