Extend range notation to allow the equivalent to matlab's "1:(end - 5)"


#1

In matlab you can say

my_array(1 : end - 5)

and it will give you all the elements in the array except for the last 5. In Rust this is a little bit more verbose, and is not usable to obtain mutable references (the latter is not a good reason, as this will be fixed in the future):

&mut my_array[0..(my_array.len() - 5)]
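
For reference, here is a minimal sketch of the verbose form, assuming a present-day compiler on which mutable range slicing works. The helper name `all_but_last_mut` is made up for illustration; it uses `checked_sub` so that asking for more elements than the array holds yields `None` instead of failing at run time.

```rust
/// Hypothetical helper: every element except the last `n`.
/// Returns `None` if the slice has fewer than `n` elements, where a plain
/// `slice.len() - n` would fail at run time.
fn all_but_last_mut<T>(slice: &mut [T], n: usize) -> Option<&mut [T]> {
    let end = slice.len().checked_sub(n)?;
    Some(&mut slice[..end])
}

fn main() {
    let mut my_array = [1, 2, 3, 4, 5, 6, 7, 8];
    // Rough equivalent of MATLAB's my_array(1 : end - 5), but mutable.
    if let Some(head) = all_but_last_mut(&mut my_array, 5) {
        head[0] = 10;
    }
    println!("{:?}", my_array);
}
```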

I have a few ideas how to allow such a notation.

  1. The “simplest” one (which cannot work as far as I know, and no one will understand what’s going on, but feel free to prove me wrong):

     my_array[1..-5];
     my_array[99..-5]; // very odd thing, especially if compared to my_array[99..0]
    
  2. Add a generic End helper type:

     my_array[1..End - 5]
    

    This is like the matlab version. The actual type of stop will be derived through the - operator. Requires std::ops::Range to have two type parameters which are comparable. This would, however, break the consistency of the .. operator, since 1..End would actually run to the end instead of stopping one element before it. Maybe this should only work with the (not yet existing for ranges) ... operator. Alternatively (once the inclusive range operator exists), both could be allowed, where 1..End would not include the last element and 1...End would.

  3. Add a generic End newtype:

     my_array[1..End(5)]
    

    Requires std::ops::Range to have two type parameters which are comparable. Not sure how intuitively readable this is.
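
Options 2 and 3 can be roughly approximated in a library today. The following is only a sketch under assumptions: `Pos`, `End`, and `slice_with` are hypothetical names, and because std::ops::Range has a single type parameter, both bounds are wrapped in the same `Pos` type instead of mixing a plain integer with an end marker.

```rust
use std::ops::{Range, Sub};

/// Hypothetical position type: an offset from the start or from the end.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Pos {
    FromStart(usize),
    FromEnd(usize),
}

/// Hypothetical marker standing for "one past the last element".
#[derive(Clone, Copy)]
struct End;

/// `End - n` means "n elements before the end".
impl Sub<usize> for End {
    type Output = Pos;
    fn sub(self, n: usize) -> Pos {
        Pos::FromEnd(n)
    }
}

impl Pos {
    /// Turns the position into a concrete index once the length is known.
    /// Panics if `FromEnd(n)` asks for more elements than exist.
    fn resolve(self, len: usize) -> usize {
        match self {
            Pos::FromStart(i) => i,
            Pos::FromEnd(n) => len.checked_sub(n).expect("offset from end exceeds length"),
        }
    }
}

/// Slices `s` with a range whose bounds are resolved against `s.len()`.
fn slice_with<T>(s: &[T], r: Range<Pos>) -> &[T] {
    &s[r.start.resolve(s.len())..r.end.resolve(s.len())]
}

fn main() {
    let my_array = [10, 20, 30, 40, 50, 60, 70, 80];
    // Stand-in for the proposed `my_array[1..End - 5]`.
    let middle = slice_with(&my_array, Pos::FromStart(1)..End - 5);
    println!("{:?}", middle);
}
```

Here the range stays exclusive, so `End - 0` would denote one past the last element, which sidesteps the consistency concern raised for option 2.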

Comments? Ideas? Reasons this is totally ridiculous?


#2

D supports something along these lines.

Originally, it had magic support for the length identifier. If used within an index or slice expression, it would read the length property of the array. Later, the $ “operator” was introduced, which actually invoked the opLength method on the thing being sliced. So a[0..$] was rewritten to a[0..a.opLength()].

I feel like the Rustiest solution would be (all names bikesheddable) to first introduce a trait for accessing a thing’s length:

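// `Self` is implicitly allowed to be unsized in a trait, so no bound is needed.
trait L {
    type A;
    fn len(&self) -> Self::A;
}

impl<T> L for [T] {
    type A = usize;
    // Call the inherent slice method explicitly.
    fn len(&self) -> usize { <[T]>::len(self) }
}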

// ...

Next, introduce a length “operator”. Let’s just use $ for now (although I think # is a slightly nicer choice). Define it such that it walks up the expression tree to the first index expression, then is substituted with a call to L::len on the subject of said index expression.


The alternative to the above is to introduce, into the prelude, some kind of End marker object and define Index overloads that work with that. But that seems like a fairly finicky solution which will require more code for users.


#3

Option 1 is how Python works (search its documentation for “Indices may also be negative numbers”), and the notation is really nice and probably really popular.

Here’s an old proposal by a core developer to adopt Python-like range syntax. He states that negative indices should count from the right as well. So you aren’t the first to want this.


#4

@mdinger: it’s only problematic due to the fact that -100..-42 is a valid range in Rust, which would be rather ambiguous


#5

@ker: I hadn’t noticed that. Thanks for pointing it out.


#6

This doesn’t seem ambiguous, to me. v[-100..-42] would return a slice starting at the 100th element from the end and ending with the 43rd element from the end (since the range is exclusive). Similarly, v[-5] could be used to directly access the fifth element from the end.


#7

and what would

for i in -100..-42 {
    println!("{}", i);
}

do in your opinion? (playpen does what I think it should: http://is.gd/sqgJhJ)

I still think using negative numbers would create more confusion than it’s worth. Rust does lots of things explicitly, why not this, too?
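
For readers who cannot follow the playpen link, a small sketch of what such a loop iterates over today; the function name `signed_range_values` is made up for illustration. A range over signed integers is an ordinary iterator, and nothing in it counts from the end of anything.

```rust
/// Hypothetical helper: collects the values the loop above would print.
fn signed_range_values() -> Vec<i32> {
    (-100..-42).collect()
}

fn main() {
    let values = signed_range_values();
    // The range is exclusive at the upper bound, so it stops at -43.
    println!(
        "first = {}, last = {}, count = {}",
        values[0],
        values[values.len() - 1],
        values.len()
    );
}
```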


#8

I don’t see any ambiguity between for i in -100..-47 and s[-100..-47]. One could simply implement Index as something like this (ignore the syntax details):

impl<T> Index<Range<isize>> for [T] {
    fn index(&self, range: &Range<isize>) -> &[T] {
        let start = if range.start < 0 { 
            range.start + self.len() 
        } else { 
            range.start 
        };
        let end = if range.end < 0 {
            range.end + self.len() 
        } else { 
            range.end 
        };
        self.slice_from_to(start, end)
    }
}
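
The same idea can be tried out as a free function, since a new Index impl for slices cannot be added outside the standard library. This is a sketch, not the proposed implementation: the names `resolve` and `slice_signed` are assumptions, and out-of-range indices return `None` here where an indexing operator would presumably panic.

```rust
use std::ops::Range;

/// Maps a possibly negative index onto `0..=len`; negative values count
/// back from the end. Returns `None` when the result is out of bounds.
fn resolve(index: isize, len: usize) -> Option<usize> {
    let resolved = if index < 0 {
        len.checked_sub(index.unsigned_abs())?
    } else {
        index as usize
    };
    if resolved <= len { Some(resolved) } else { None }
}

/// Hypothetical stand-in for `s[range]` with Python-style negative bounds.
fn slice_signed<T>(s: &[T], range: Range<isize>) -> Option<&[T]> {
    let start = resolve(range.start, s.len())?;
    let end = resolve(range.end, s.len())?;
    // `get` also rejects ranges whose start lies after their end.
    s.get(start..end)
}

fn main() {
    let v: Vec<u32> = (0..200).collect();
    let tail = slice_signed(&v, -100..-42).unwrap();
    println!("{} elements, from {} to {}", tail.len(), tail[0], tail[tail.len() - 1]);
}
```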

That said, I still prefer an explicit length operator over negative indices because it (1) is more explicit, (2) is more flexible, e.g. allowing s[..$/2], and (3) does not require polymorphic indexing yet.


#9

The length might not be known while the end of the range might be; that would speak against a length operator. Also, in non-indexing ranges the operator makes no sense.


#10

Just define a

#[lang="end"]
trait End {
    type Output;
    fn end(&self) -> Self::Output;
}
impl<T> End for [T] {
    type Output = usize;
    fn end(&self) -> usize { self.len() }
}

and then desugar any $ inside an indexing context to a call to End::end():

g(s)[f($)] => { let ref _temp = g(s); _temp.index(f(_temp.end())) }

It is already explained in Daniel’s reply above.


Edit: Just to clarify, the $ above can be replaced by anything, e.g. “End”, in which case it would be the same as OP’s suggestion in syntax, except that the compiler will automatically convert it into a suitable number. We cannot use the symbol $ like D because this is already used for macros.
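
To make the desugaring concrete, here is a hand-expanded sketch. It assumes a plain trait in place of the proposed lang item (the `#[lang="end"]` attribute is left out because it does not exist), and the function names `first_half` and `trimmed` are made up for illustration.

```rust
use std::ops::Index;

/// Plain-trait stand-in for the proposed `End` lang item.
trait End {
    type Output;
    fn end(&self) -> Self::Output;
}

impl<T> End for [T] {
    type Output = usize;
    fn end(&self) -> usize {
        self.len()
    }
}

/// Hand-written expansion of the hypothetical `s[..$ / 2]`.
fn first_half<T>(s: &[T]) -> &[T] {
    let temp = s; // the indexed expression is evaluated once
    temp.index(..temp.end() / 2)
}

/// Hand-written expansion of the hypothetical `s[1..$ - 5]`.
fn trimmed<T>(s: &[T]) -> &[T] {
    let temp = s;
    temp.index(1..temp.end() - 5)
}

fn main() {
    let v = [1, 2, 3, 4, 5, 6, 7, 8];
    println!("{:?} {:?}", first_half(&v), trimmed(&v));
}
```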


#11

I’d expect it to behave exactly as it does. I’m not sure where the ambiguity is to which you referred. Given the concept that negative numbers count back from the end, having -100..-42 yield -100 through -43, having v[-100..-42] return a slice containing the 100th element from the end through the 43rd element from the end, having v[-100] return the individual element that is the 100th from the end, and having

for i in -100..-42 {
    println!("{}", v[i]);
}

give the same output as

for x in v[-100..-42].iter() {
    println!("{}", x);
}

all seem perfectly consistent.


#12

FWIW, I generally appreciate this functionality in Python, but it sometimes causes problems for me due to the possibility of getting negative integers by accident. For example, a lazy way of extracting a given substring plus all following text from a string would be s[s.find('sub'):] – this works fine if the substring is there, but if it’s not, find returns -1 and the expression silently extracts the last character. (I was ignorant to write code like that, since I should have used the index method, which does the same thing but throws an exception if the substring wasn’t found; but there are other cases too, just not as obvious.)

In Rust, I’d be moderately afraid of large unsigned integers becoming negative after being truncated to a signed type for indexing, though that would depend on the design.
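
A small sketch of that failure mode, with made-up function names: an `as` cast from an unsigned to a signed integer of the same width reinterprets the bits, so a very large index silently turns negative, while a checked conversion reports the problem.

```rust
use std::convert::TryFrom;

/// Converts an unsigned index to signed the careless way.
fn to_signed(index: usize) -> isize {
    index as isize
}

/// Converts an unsigned index to signed, refusing values that do not fit.
fn to_signed_checked(index: usize) -> Option<isize> {
    isize::try_from(index).ok()
}

fn main() {
    let huge = usize::MAX;
    // Under negative-index semantics, the first value would mean "last element".
    println!("{} {:?}", to_signed(huge), to_signed_checked(huge));
}
```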


#13

The only problem with negative numbers meaning index-from-end would be the inconsistency when operating on a map with signed integer key type:

fn collections(v: Vec<&str>, m: HashMap<i32, &str>) {
    println!("{}", v[-1]); // the last element
    println!("{}", m[-1]); // element whose key is the number -1
}

The other option is to use a special type to describe an index:

enum Index {
    FromBeginning(usize),
    FromEnd(usize),
}

fn function(v: Vec<&str>) {
    v[FromEnd(0)] // last element
}

I’m in favor of using negative numbers unless someone can come up with a syntax sugar for specifying FromBeginning(n) and FromEnd(n). Maybe prefixing a number with # would turn it into a FromEnd variant?
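
The special-type option can be sketched as a library today. In this sketch the enum is called `Pos` (an assumed name, chosen to avoid clashing with the std::ops::Index trait), and it is implemented for Vec only. The map lookup stays an ordinary key lookup, so the inconsistency described above does not arise.

```rust
use std::collections::HashMap;
use std::ops::Index;

/// Hypothetical index type: counts from the beginning or from the end.
#[derive(Clone, Copy)]
enum Pos {
    FromBeginning(usize),
    FromEnd(usize),
}

impl<T> Index<Pos> for Vec<T> {
    type Output = T;

    /// `FromEnd(0)` is the last element; panics when out of bounds.
    fn index(&self, pos: Pos) -> &T {
        let slice = self.as_slice();
        match pos {
            Pos::FromBeginning(i) => &slice[i],
            Pos::FromEnd(i) => &slice[slice.len() - 1 - i],
        }
    }
}

fn main() {
    let v = vec!["a", "b", "c"];
    let mut m: HashMap<i32, &str> = HashMap::new();
    m.insert(-1, "minus one");

    println!("{}", v[Pos::FromEnd(0)]); // the last element
    println!("{}", m[&-1]); // element whose key is the number -1
}
```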


#14

Our vector doesn’t even allow sizes greater than isize::MAX if I understand correctly, which makes this all the more tempting. cc @Gankro

Rust as a language does already support the notation and first-class range values to make this happen, so it seems like an API question.
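
The size limit can be observed without allocating anything. A minimal sketch, assuming the `Vec::try_reserve` method of current standard libraries; the helper name `can_reserve` is made up for illustration.

```rust
/// Asks a fresh `Vec<u8>` to reserve `bytes` of capacity and reports
/// whether the request was accepted.
fn can_reserve(bytes: usize) -> bool {
    let mut v: Vec<u8> = Vec::new();
    v.try_reserve(bytes).is_ok()
}

fn main() {
    // One byte more than isize::MAX is rejected before any allocation happens.
    println!("{}", can_reserve(isize::MAX as usize + 1));
    println!("{}", can_reserve(16));
}
```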


#15

Wow, I didn’t know that. Not that it matters on 64-bit machines. It might be an issue for regular arrays on 8-bit machines, but that could be fixed with a lint.


#16

This is an implementation detail which will become a public API if we expose negative indexes.


#17

It’s basically not Rust’s choice, it comes from LLVM: (rust bug/discussion)