How do we actually want RangeFrom to behave?

The behavior of RangeFrom is inconsistent when near the upper edge of representable types. This is documented as unspecified, but I think the current behavior is suboptimal due to its inconsistency.

Note : Currently, no overflow checking is done for the Iterator implementation; if you use an integer range and the integer overflows, it might panic in debug mode or create an endless loop in release mode. This overflow behavior might change in the future.

Specifically, of the implemented Iterator methods, currently:

  • Iterator::size_hint/TrustedLen promise an endless iterator.
  • Iterator::next differs depending on debug/release:
    • In release mode, it wraps (253u8, 254, 255, 0, 1, ...).
    • In debug mode, it panics if it would return Idx::MAX (253u8, 254, panic, panic, ...).
  • Iterator::nth differs depending on debug/release:
    • In release mode, it wraps, but also panics if Idx::MAX would be skipped over.

[playground example]

The "perfect world" behavior would be to consistently yield all values and then panic on further steps (253u8, 254, 255, panic, panic, ...). Unfortunately, this is strictly impossible, as the contents of RangeFrom are fully public and stable, and this behavior would require another bit of state.

pub struct RangeFrom<Idx> {
    pub start: Idx,
}

So, how might we actually improve this situation? What behavior do we want out of RangeFrom at the upper edge of representable types? My ideal priority list is

  • Consistent behavior between RangeFrom::next and RangeFrom::nth. There should be no difference between calling range.nth(n) and for _ in 0..n { range.next() } range.next().
  • Consistency of debug/release split. Debug-checked overflow is consistent with the rest of the language, but consistent behavior from the iterator is also great (i.e. it'd be great if (0u8..).take(256) would work as expected both in debug and release).
  • No undue burden on custom types to be used in ranges (i.e. the Step trait). Ideally Step is implementable with just successor and predecessor functions.

The third is a loose and easily droppable goal, especially because Step is unstable and probably far from stabilization.

The possible behaviors that meet the first goal of next/nth consistency I see are:

  • next is right. Make nth always wrap in release mode, rather than panicking when jumping over Idx::MAX (consistent debug-checked overflow). This is also the smallest change to behavior, as it only makes a release mode panic of nth not panic.
  • Always wrap, even in debug mode, changing the behavior of both next (in debug mode) and nth (in both modes).
  • Always panic like next does in debug mode. This changes release mode behavior of next and nth to deterministically panic when yielding Idx::MAX rather than setting up for a wrap.
  • Consistent saturating behavior (235u8, 254, 255, 255, ...). This is the most invasive change as it impacts next and nth in both debug and release (but tbf only in cases where debug mode panics).

So what behavior do you think is the most correct? Do you see some potential behavior I overlooked?

(I'm currently leaning towards embracing debug-checked overflow and having nth do wrapping in release mode. Whatever direction is chosen impacts the shape of the Step rewrite I'm currently working on, as it has to support the behavior.)

1 Like

I prefer debug-checked overflow, which seems more in spirit with the rest of Rust.

1 Like

There's another reasonable behavior for RangeFrom: just stop at the maximum.

(253u8..) would yield [253, 254, 255].

1 Like

Yes, but, I think his point is what is meant by "Stop"? Does it mean panic if asked for next one? In release? In Debug? Does it mean yield the final value again? In Release? In Debug?

It means stop: .next() would return None and iteration would stop, in both debug and release mode.

That's the behavior I would expect and find least surprising, personally.

In other words, (253u8..) would behave like (253u8..=255u8)

6 Likes

@josh, how would you achieve this? Keep in mind that the definition of RangeFrom is fully public and stable:

pub struct RangeFrom<Idx> {
    pub start: Idx,
}

Consider RangeFrom<u8> as a case example. If we want 0u8.. to behave the same as 0u8..=255u8, there are 257 possible results from calling .next(): each value of Some(u8) as well as None. There isn't enough state to have 257 states in RangeFrom, so the exhaustion behavior is impossible for RangeFrom.

Even beyond this, there is semantic difference between 0u8.. and 0u8..=255u8: the former is an unbounded range (thus would keep going if the integers were infinite range or a larger data type were used) but the latter is actually bounded on the top, and so should stop no matter the integer type in use.

I think the theoretical perfect is for start.. to be [..., MAX - 1, MAX, panic, panic, ...] and start..=MAX to be [..., MAX - 1, MAX]. It is most likely an error to go past MAX in an top-unbounded iterator, just like regular overflow is most likely an error.

2 Likes

Just a quick reminder: debug-checked overflow has an appearance of "off by one" panicking. After all, (255u8..).next() just gives 255, which isn't overflowing the space of u8! The overflow is in (necessarily) preparing for the next invocation of next(), which can be non-obvious why the panic is earlier than the ideal situation of [253u8, 254, 255, panic, panic, ..].

2 Likes

I think it was a mistake to implement Iterator for RangeFrom, and instead IntoIterator should have been implemented with an iterator type that behaves either like @josh argues, or that returns i64/u64 or i128/u128 values up to the maximum.

Changing this is a breaking change, but simple for loops won't be broken, so maybe it can be done?

Other than that, always panicing seems the only viable choice, since it doesn't silent produce wrong results.

EDIT: on a more broader note, the root cause is that Rust's treatment of integers is mathematically inconsistent: a type like u8 can either be considered as the ring Z/256Z, which is closed under addition and multiplication but doesn't have an ordering, or as a subset of the integers Z, which has an ordering but is not closed under any arithmetic operation; RangeFrom as is requires both an ordering and being "computationally closed" under addition by 1, which is impossible without a bignum type or a type that is as large as the maximum running time of a program (i.e. 64-bit at least)

11 Likes

I'm personally of the opinion that Range* types should've not implemented Iterator themselves and been Copy and IntoIterator, but that's neither here nor there. These types are stable, so we cannot change them in a semver-major way. It's not a soundness fix, just a papercut with a literal edge case.

And no, editions don't work either. Editions are currently only for language changes, not library changes, and even if we would extend it to minor library things, it'd be something like hiding deprecated names from resolution, not something as huge as changing what traits a stable type implements! (Remember, the same standard library is used by all editions, all of which can coexist in a single project.)

I don't think the argument that always panicking is the only possible choice is fair, because we've already chosen that regular addition (if you were doing a loop manually by incrementing a variable) is debug-checked for overflow, and wraps in release mode. It is also consistent for the top-unbounded range to behave this way.

5 Likes

I'm not suggesting that we should do this, but: in an edition, we could change what type the .. operator produces, preserving the current RangeFrom but not using it.

10 Likes

I am not sure if this is acceptable in all cases.

I guess the most appropriate behaviour depends on whether one considers fixed-width integer types to represent values that are assumed to fit in the type’s range in practice (in which case panicking may be more useful in that it can catch bugs when that assumption proves wrong), or whether they represent values that cannot be outside the type’s range in principle (in which case, it is entirely reasonable to just stop at the maximum value).

In other words, it depends on whether fixed-width integers are supposed to (approximately) model ℕ and ℤ or (exactly) model ℤ ∩ [−2n − 1, 2n − 1) and ℤ ∩ [0, 2n). (For ℤ/2nℤ, there is Wrapping<_>.) That the default arithmetic operations have been defined to panic on overflow (at least in debug mode) instead of returning Option<_> suggests the former.

1 Like

Very interesting problem indeed.

So, for reference, here is the implementation of Iterator for RangeFrom :

So, the acceptable behavior when trying to go beyond the upper bound, given how the start .. notation does show that we do not care about it, despite the range of machine integers being finite, are to either exhaust / panic! (in a consistent manner), or to wrap.

  • There is a third option, which I will mention for the sake of completeness, which would be to endlessly saturate instead of wrapping (e.g., for u8: ..., 254, 255, 255, 255, ...). I think this option has the advantage of being "strictly worse" than the cycle alternative, so that we can dismiss it (the only advantage I can see is that it would offer the invariant that the elements yielded by start .. are all ≥ start, but for that the exhausting alternative is better).

The issue, however, as @CAD97 pointed out, is that it is technically impossible to offer a behavior beyond {integer}::MAX different than that of one of the previous values (a.k.a., "since we can only hold an {integer}, which one should represent the state of next({integer}::MAX)?"):

  • to exhaust or panic beyond {integer}::MAX that value would have to be {integer}::MAX, which semantic-wise would mean that instead of start .. ~ start ..= MAX, we'd have start .. ~ start .. MAX :confused:

  • thus the only solution that matches both the technical constraints (without edition "breakage" power, more about that below), and an acceptable behavior is to choose that value to be 0, i.e., wrapping behavior:

impl<A: Step> Iterator for ops::RangeFrom<A> {
    type Item = A;

    #[inline]
    fn next(&mut self) -> Option<A> {
        let mut next = self.start;
        // next = next.wrapping_add_usize(1);
        match next.add_usize(1) {
            | Some(it) => { next = it; },
            | None => { next.replace_zero(); },
        };
        Some(mem::replace(&mut self.start, next))
    }

    #[inline]
    fn size_hint(&self) -> (usize, Option<usize>) {
        (usize::MAX, None) // wrapping ⟹ cyclic ⟹ infinite
    }

    #[inline]
    fn nth(&mut self, n: usize) -> Option<A> {
        // self.start = self.start.wrapping_add_usize(n);
        if let Some(plus_n) = self.start.add_usize(n) {
            self.start = plus_n;
        } else {
            // Unhappy path: no optimization
            self.take(n).for_each(drop);
        }
        self.next()
    }
}

This is a non-surprising behavior, which is consistent and technically compatible with the current constraints, expecially since the Step trait is unstable and could thus have a wrapping_add_usize method added.


Given that the panic!-king behavior could be preferable to that of a cycle, we could try to go for that, at the cost of a little bit of effort and a hack (given how much the API of RangeFrom was stabilized (all fields pub + Iterator instead of just iterable), having to use a hack should not be that surprising):

Hacky idea: abusing edition changes

This is based on @josh suggestion, but in a more moderate fashion; given that if the state of {integer} .. were an Option<{integer}> we would have our problems solved, maybe we could, through an edition:

Make {integer} .. unsugar to RangeFrom { start: Some({integer}) }, and have RangeFrom<{integer}>'s API be ported to RangeFrom<Option<{integer}>>'s, but with the exhausted or panic!-king behavior.

I'm just going to outline here what I think is my current preferred solution (using a good deal of std magic to properly implement overflow checks):

/// Types with a notion of a `successor`/`predecessor` operation.
unsafe trait Step: Clone + PartialOrd + Sized {
    /// The number of forward steps required to get from `start` to `end`.
    fn steps_between(start: &Self, end: &Self) -> Option<usize>;

    /// Step forward, returning None if step would overflow.
    fn forward_checked(this: &Self, count: usize) -> Option<Self>;
    /// Step forward, panicking, wrapping, or saturating on overflow.
    fn forward(this: &Self, count: usize) -> Self { Step::forward_checked(this, count).unwrap() }
    /// Step forward, causing undefined behavior on overflow.
    unsafe fn forward_unchecked(this: &Self, count: usize) -> Self { Step::forward(this, count) }

    /// Step backward, returning None if step would overflow.
    fn backward_checked(this: &Self, count: usize) -> Option<Self>;
    /// Step backward, panicking, wrapping, or saturating on overflow.
    fn backward(this: &Self, count: usize) -> Self { Step::backward_checked(this, count).unwrap() }
    /// Step backward, causing undefined behavior on overflow.
    unsafe fn backward_unchecked(this: &Self, count: usize) -> Self { Step::backward(this, count) }
}

unsafe impl Step for uN {
    // ...
    /// This impl respects debug-checked arithmetic;
    /// It panics in debug mode and wraps in release mode.
    fn forward(this: &Self, count: usize) -> Self {
        match Self::try_from(n) {
            Ok(n) => Add::add(*self, n),
            Err(_) => {
                Add::add(Add::add(Add::add(*self, n as uN), uN::MAX), 1)
            }
        }
    }
    // ...
}

impl<A: Step> Iterator for ops::RangeFrom<A> {
    type Item = A;

    fn next(&mut self) -> Option<A> {
        let next = Step::forward(&self.start, 1);
        Some(mem::replace(&mut self.start, next))
    }

    fn nth(&mut self, n: usize) -> Option<A> {
        self.start = Step::forward(&self.start, n);
        self.next()
    }
}

This should give RangeFrom consistent debug-checked behavior over primitive integers as well as predictable behavior for other Step types, if/when Step becomes stabilizable.

I think this is interesting. We could even use (some/all) of the current range types as the intoiters of the hypothetical new copy-but-not-iterable interval types. (Resolving the RangeInclusive question of smaller vs faster by having both.) RangeBounds might even mean that this isn't as horrific to do as it first would seem.

This would also be an opportunity to enforce T: PartialOrd on range literals, if we wanted.

6 Likes