Step_by with 0 should return an infinite iterator like repeat

I am coding a problem where I want first n values from an iterator having a non negative step.

Currently I have to code it using either for loops discarding iterators altogether, or else if I consider iterators I have to either do a conditional impl Iterator implementation or Box implementation for once where step = 0 and another where step > 0.

I propose two approaches:

  1. Recommended option: If take is used with step_by before collecting the iterator then infinite iterators and 0 as argument of step_by should be allowed. It is a non breaking change, hence should be okay.
  2. Not Recommended option: we have a new step_by_take function with two arguments. The first argument should be the step number and second argument will be maximum how many values to be considered (like take). Both are non negative integers. Whether infinite iterator or whether 0 is passed in step_by this function can work without breaking a sweat.

Rust has unique ownership and non-copyable types, so it's not easy to just return the same element over and over. step_by would have to require a trait bound like Clone or Copy, which currently it does not, and adding it would be a breaking change.

9 Likes

Removing documented panics is also a breaking change.

3 Likes

step_by used to do this. It was intentionally removed.

Why? A couple of reasons:

  • Generally, the standard library would rather panic for an off-by-one error than do something drastically different. Having your program hang because you called .step_by(n) with an n that happened to be zero is not a good outcome.

  • Sometimes being infinite greatly reduces the number of trait implementations that are possible. For example, today StepBy: ExactSizeIterator when the inner iterator is also ESI. That would be incorrect if it can sometimes be infinite. And being ESI in the normal case of why someone would use step_by is far more useful than the repeat behaviour. Not to mention that repeating an item means it needs a Clone bound to be iterable, which would make step_by substantially less useful.

  • Adding extra options like this adds extra runtime cost to people who don't want it. There's extra code to check for zero and to call clone on the item, but it'd be dead code for most users. Not to mention that you'd lose the possibility of storing a NonZeroUsize in the iterator, and thus Option<StepBy<_>> would have to be bigger than necessary.

If you want one of two different behaviours, return an Either of the two different things. The standard library adapter should not be forced to handle it.

9 Likes

Thanks for the suggestion of returning either. It worked like a charm.

I don't agree with your viewpoint and there is one primary reason: If step_by is run on an infinite iterator it will return capacity overflow error during runtime instead of throwing an error during static analysis. If we are allowing such behaviour, then what's wrong with allowing step_by(0)? (0..).step_by(0).take(2) should behave similar to (0..).step_by(2).take(2).

That's not the behaviour I see in this Rust Playground. I use repeat to create an infinite iterator, and step_by does not return "capacity overflow" at any point, even if I remove the take (which just makes it run forever).

Making step_by(0) act like repeat requires two breaking changes, and in many cases turns a predictable panic on a bug into an infinite loop instead.

The breaking changes are:

  1. step_by will have to require that the item is Clone or even Copy, which it doesn't today.
  2. StepBy will no longer be able to implement ExactSizeIterator, since that requires that the resulting iterator is finite sized, but if step_by(0)` is possible, then it'll sometimes be infinite.

This also has knock-on effects if you have step_by more than once in a chain; StepBy only implements DoubleEndedIterator if the iterator it's wrapping implements ExactSizeIterator. As a consequence of #2, StepBy can never implement DoubleEndedIterator if there's a step_by earlier in the chain of iterator combinators.

I get your 2nd point, but 1st point seems off. The 2nd point is kind of similar to that of what @scottmcm mentioned. Abstractions should be zero cost and hence if any conditionals are required they should ideally be passed on to the one using the language to write a program.

Regarding capacity overflow error: I was using it with println! with debug pretty print after collecting it to a vector that's why it was showing capacity overflow which you can see here. I think infinite iterator without take() should be captured as an error during static analysis.

As far as your breaking changes are concerned:

  1. repeat requires Clone but if you check out repeat_with it takes closures and doesn't require the element to have Clone behaviour. So it is evident that since step_by is a method on top of an existing iterator it will work like repeat_with since the iterator is already there for it to use and hence won't need to Clone anything.
  2. Your point is valid. Error Messages need to improve though. Case in point: Rust Playground link. It says step_by and take doesn't satisfy _: ExactSizeIterator. I think the message should be Range<Integer> doesn't satisfy _: ExactSizeIterator instead.

How, exactly, will it work? repeat_with creates an iterator by repeatedly calling a FnMut closure or function that returns a value every time. Please answer with code here, since if it can't call clone(), nor can it rely on the item being Copy, it has no way to get the same item that it just got from next() ever again.

Remember that, by definition, next() returns an item and removes it from the underlying iterator. That item cannot be obtained from the underlying iterator ever again, since it's been removed; to implement your step_by(0) behaviour, StepBy needs to somehow repeat that item.

That's an error from Vec - it's telling you that you've tried to collect more items than it can fit in memory. It's useful to have an infinite iterator without take() in other circumstances, though - for example, if I'm using a loop with break to process the items from the iterator, or if I'm applying a short-circuiting iterator method (like find) where I know that it will eventually find the item.

Also note that Rust does not have the concept of infinite iterators per-se at the moment. It knows about exact length iterators via the ExactSizeIterator trait, and about indefinite iterators in general, where the iterator's size is not known, but you'll know if it's run out of items, but not iterators that will never run out. Anything that requires Rust to reason about infinite iteration will require this to be fixed first.

This is definitely an issue, and I'd make a bug report against Rust for this, because it'd be nice to have the error reporting look deeper and tell you that std::iter::Take<RangeFrom<{integer}>>: ExactSizeIterator would be satisfied if RangeFrom<{integer}> implemented ExactSizeIterator.

I would note that if I remove the step_by(2), I do get an error message telling me why Take does not implement ExactSizeIterator here, so the compiler has the knowledge it needs to give you a good error message, it's just that it's not using it (possibly because there's a risk of it taking a really long time to generate the error message):

error[E0599]: the method `len` exists for struct `Take<RangeFrom<{integer}>>`, but its trait bounds were not satisfied
   --> src/main.rs:5:33
    |
5   |     println!("{}",(0..).take(2).len());
    |                                 ^^^ method cannot be called on `Take<RangeFrom<{integer}>>` due to unsatisfied trait bounds
    |
   ::: /playground/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/range.rs:189:1
    |
189 | pub struct RangeFrom<Idx> {
    | ------------------------- doesn't satisfy `RangeFrom<{integer}>: ExactSizeIterator`
    |
   ::: /playground/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/iter/adapters/take.rs:19:1
    |
19  | pub struct Take<I> {
    | ------------------ doesn't satisfy `_: ExactSizeIterator`
    |
    = note: the following trait bounds were not satisfied:
            `RangeFrom<{integer}>: ExactSizeIterator`
            which is required by `std::iter::Take<RangeFrom<{integer}>>: ExactSizeIterator`

That's because it passes through these properties. Just like if you feed a finite iterator to .map(…) you get a finite iterator, but if you feed an infinite iterator to .map(…) you get an infinite iterator.

Being pass-through for traits like ExactSizeIterator is very useful. If depending on the value it can be infinite, that forces it to never be ExactSizeIterator. And that's unacceptable.

1 Like

Now I know why the decision was taken to avoid step_by(0). I am just clarifying some misunderstandings:

  1. I didn't want step_by to behave just like repeat. I wanted it to accept 0 as an argument. Currently it accepts numeric iterators, and I would like it to stay that way.
  2. Also, I understand that the capacity overflow issue is from Vec, I only wanted improved static analysis if possible. Since you mentioned the short circuits I got inspired to create this thread: Applying take() on an infinite iterator should make it exactsizediterator - language design - Rust Internals (rust-lang.org). Once this is finalized, the static analysis on the iterators can be improved further.

Maybe you can comment on this bug report I opened regarding the error message issue: Confusing error message on using step_by and take on infinite iterator · Issue #123637 · rust-lang/rust (github.com)

What would be the semantics of step_by(0) other than repeat? Remember that step_by(1) is a no-op iterator combinator, since next() on an iterator steps it by 1, and thus step_by(0) needs to do something different, otherwise it's confusing.

Also note that there's an argument that step_by should never have taken usize, but instead should have taken NonZeroUsize as its argument; that would force you to address the error earlier.

And I've seen the other thread and bug report - they both look good in themselves, and you've got traction from people better placed than me to drive it to a good place.

But you said "repeat" in the title of the thread, so I don't follow what you mean by this.

Obviously you didn't mean that step_by is always repeat, but we can't change the types and traits depending on a runtime value.

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.