step_by on negative numbers

We recently turned down a request to stabilize step_by on the grounds that the behavior on negative steps needs more discussion and thought. I want to kick that off :smile:

The current behavior was inherited from various free functions, prior to the range notation. The typical usage is: (100..0).step_by(-1), which is inclusive on the left an exclusive on the right – i.e., you get 100, 99, 98, … 1, but not 0. This is of course distinct from reversing the iterator (0..100).step_by(1).

I’m genuinely unsure what the best option is here. To me, the range (100..0) is already quite confusing; if you read it in mathematical style, it’s an empty range.

A possible alternative would be for range.step_by(-x) to be equivalent to range.step_by(x).rev(), but I suspect that may also be counterintuitive to some.

So I’m interested to hear people’s intuitions, to hear about common usage patterns, and talk about how other languages/libraries do things. It’d be good to nail this down so that we can stabilize the method.

1 Like

In Python, range(a, b) returns empty if a > b:

>>> range(1, 0)
[]

I find this is both intuitive and a good, established convention to follow.

4 Likes

This would be really confusing, either you have 100..0 which would be a reversed range on it's own. Or you use (0..100).set_by(-1). But using both is just plain confusing, I read it as reversing a reverse range. So when seeing (100..0).step_by(-1) I would expect the same as (0..100).

But maybe i'm overthinking it..

Edit: when I compared (100..0).step_by(-1) and (0..100) I forgot about 100 being inclusive and 0 exclusive in the first and the other way around in the second. But the point I tried to make still stays the same :wink:

1 Like

I think I’ve prefer to just rely on using rev() to step backwards over a range instead of encoding that behavior into the range itself. Was there a drawback to this approach?

Nothing prevents you from providing a negative step on a signed range, though, so we have to give it some semantics. The question is, what should that semantics be?

You could imagine (0..100).step_by(-1) producing 0, -1, -2, ... and ultimately overflowing, for example. Maybe that’s reasonable.

Hrm email reply got eaten by the discourse gods.

Personally I would expect -n steps to have a the behaviour of rev().step_by(n), or be a logic error:

  • (0..10).step_by(7) should yield: 0, 7
  • (0..10).step_by(7).rev() should yield: 7, 0
  • (0..10).rev().step_by(7) should yield: 9, 2 (this is not possible to instantiate due to blanket Rev)
  • (0..10).step_by(-7) should yield: 9, 2 (or be an error)
3 Likes

+1 It is what I would expect too

Personally I would expect -n steps to have a the behaviour of rev().step_by(n), or be a logic error:

  • (0..10).step_by(7) should yield: 0, 7
  • (0..10).step_by(7).rev() should yield: 7, 0
  • (0..10).rev().step_by(7) should yield: 9, 2 (this is not possible to instantiate due to blanket Rev)
  • (0..10).step_by(-7) should yield: 9, 2 (or be an error)

So if step_by has the ability to reverse an iterator, does that mean that step_by should enforce DoubleEndedIterator? I think that is one of the issues, because right now it isnt, and that is nice.

Maybe im just wholely wrong about my understanding, but wouldnt that disallow something like (0..).step_by(3)?

2 Likes

Why not prevent it? That is, define Step as follows:

trait Step {
    type Step;
    fn step(&self, by: &Step) -> Option<Self>;
    fn steps_between(start: &Self, end: &Self, by: &Step) -> Option<usize>;
}
impl Step for i32 {
    type Step = u32;
    // ...
}

This definition as quite a bit more flexible. For example, we could step through the alphabet:

impl Step for char {
    type Step = usize;
    // ...
}

Note: Step could also be defined as:

trait Step<By> { ... }

to allow stepping by different types (but I'm not sure if this is worth the extra complication).

1 Like

Then in the mathematical style, 0..100 should be an uncountable range. It's implicitly (0..100).step_by(1) but conceptually it could just as easily be (0..100).step_by(0.1) or (0..100).step_by(10) by default. The fact that the step can't be 0.1 is likely an artifact of the type system. There's nothing about real numbers which would inhibit this behaviour.

So if 0..100 only makes sense with an implicit positive step, would it be that big a stretch to use 100..0 where step(-1) is also implicit? As a range that seems perfectly clear (I'm not suggesting making step_by(-1) implicit. It's unusual enough that making it explicit seems like a good idea). Conceptually this shouldn't be difficult.

It's also very common. In matlab (actually octave the matlab clone) it looks like (you can even use decimals):

0:3          // 0-3
0:1:3        // same
3:1          // empty array []. Ranges go up unless a step is added.
3:-1:0       // 3-0
1.8:0.2:2    // 1.8 2.0
2.0:-0.2:1.8 // 2.0 1.8

In rust:

// Make life easy: 10 and down except last element
(10..0).step_by(-1)

// Make life hard: 0-9 but it's backwards so 9-0...does
// the step agree? Oh, good it does. Drat. Wanted 10-1...
(0..10).rev().step_by(-1)

Also, especially if floats are supported, this gets harder with complex equations:

let start = // really complex equation
let end = // another really complex equation
let step = // complex equation

// Gotta think abstractly now:
// So `start` goes to `end`-1 but it's reversed...so
// `end`-1 goes to `start`. Will my `step` still always
// agree in direction?
(start..end).rev().step_by(step)

Floats should never be supported, in my opinion.

3 Likes

I think this is correct, and is really the only way that makes sense to me. You start with the initial value, and then keep adding the step until you pass the end value.

My take on the result of iterating over some of the examples given:

  • (100..0) (yields no values since 100 is already past 0 with default step of 1)
  • (100..0).step_by(-1) → [100, 99, 98, …, 1]
  • (0..10).step_by(7) → [0, 7]
  • (0..10).step_by(7).rev() → [7, 0]
  • (0..10).rev().step_by(7) → Compile error (step only applies directly to Range)
  • (0..10).step_by(-7) (yields no values since 0 is already past 10 when stepping by a negative number)
  • (10..0).step_by(7)
  • (10..0).step_by(-7) → [10, 3]
  • (0..-10).step_by(-7) → [0, -7]

I think negative stepping is useful in addition to rev() specifically because it makes it easy to have the higher, starting number be inclusive and the lower, ending number be exclusive.

3 Likes

The current behavior makes sense to me.

Some particular intuitions: I expect (0..).step_by(1) to iterate through the non-negative integers and (-1..).step_by(-1) to iterate through the negative integers.

2 Likes

How do other languages do this?

  • Looking at my C++ reference, I did not immediately find ranges with a negative step, I would actually use C’s for loops in those cases
  • Java has no concept of a range, though third party libraries like guava or Apache commons-lang define them (though the former has no ordering and the latter has no iteration capability). Otherwise, the same C-for loops apply
  • Python obviously has xrange(from, to, step) where from is always inclusive, to always exclusive, step defaults to 1
  • OcaMLhas a library that includes “–” and “–^” operators for ranges (ex- and inclusive, I could not find anything about stepping)
  • Haskell has a somewhat weird mathematically-inspired syntax (link goes to learnyouahaskell) which is so not going to fly in Rust

Python seems the conceptually nearest language (since they also have no C-style for loop).

I think negative stepping is useful in addition to rev() specifically because it makes it easy to have the higher, starting number be inclusive and the lower, ending number be exclusive.

this is exactly how I'd expect it. I've written loads of matlab code, and I always found this very intuitive.

well... I agree that the mathematical [100, 0) is an empty range. But rust's a..b ranges aren't mathematical, they are iterators.

that'll be problematic in case you have variables. a..b could then either be a forward or a reverse range, which is impossible to program afaik.

An iterator always goes from start to end. If start is "not beyond or at" end it returns None, else it returns a copy of start and sets start = start + step. The implementation detail of "not beyond or at" is < right now. Imho that implementation detail should not leak into the semantics of the iterator.

I can see negative ranges being both intuitive (at least to mathematicians) and a footgun.

It could be useful to create a lint (e.g. in rust-clippy) that warns on ranges of the type [x, y] where x >= y and no negative .step_by(_). This would at least catch obviously unintended cases. Also the docs for ranges should warn of the possibility of empty ranges.

Finally, it would be good to have a lint that denies calling .step_by(0) (and I believe .step_by(0) should also panic, currently it happily creates a StepBy{ step_by: 0, range: self }.

Edit: I just added two issues to rust-clippy.
Edit 2: Also filed an issue on rust because of infinite looping behavior
Edit 3: And got shot down on the rust part, as .range(0, 1).step_by(0).take(10000) does not loop forever and even an infinite loop is not deemed unsafe, so it doesn’t warrant to panic!(). Still, at least a message, perhaps behind some kind of flag would be nice.

2 Likes

Yeah there is definitely a problem of bounds (being excluded and included)

  • a < b: (a..b) you would get [a, a+1, ..., b-2, b-1] eg. (0..5) ==> [0,1,2,3,4]
  • a > b: (a..b) you would get [a, a-1, ..., b+2, b+1] eg. (5..0) ==> [5,4,3,2,1]

But i'm not particularly in favor of this notation, I just wanted to make clear that (5..0).step_by(-1) ==> [5,4,3,2,1] is extremely confusing because there are two separate things that "say" reverse. Just imagine we use this (5..0).step_by(-1) then, what would just (5..0) mean and what would just (0..5).step_by(-1) mean? It is not very intuitive in my opinion.

I am much rather in favor of:

  • (5..0) ==> None
  • (0..5).step_by(-1) ==> [4,3,2,1,0] being the equivalent for (0..5).rev().step_by(1)

Why would that make sense?

Because of chaining methods. (a..b).step_by(c) is supposed to be a two step process. First you have (a..b) returning a range on which you chain the method step_by(c). step_by should thus not modify the bounds of the given range. And with this reasoning it makes sense that step_by(-c) is shorthand for rev().step_by(c). The minus would have the meaning of backwards iteration.

note: I now see why (100..0).step_by(-1). But I was quite confused at first, and still think it's not the most intuitive.

That would mean .step_by(_) creates an iterator that does not start at the starting point. As we have it, ranges are defined to be left-inclusive / right-exclusive. This has some very nice properties that E. W. Dijkstra used to harp rhapsodically about in his famous paper.

Now you want .step_by(_) to nix this property for negative values, which is a special case in my book. Should we handle it that way, it would in my not too humble opinion require at least a paragraph in the documentation. I’d suggest the docs should note the (non-)relation between .rev() and .step_by(_) with negative argument in any event.

1 Like

For me, the most important part is codegen. We need to iterate on step_by until nobody can say it’s a trap or slower than a for loop in C++. Then try to fit as many features into that straightjacket as possible :smile:

4 Likes