Pre-RFC: Fixing Range by 2027

That issue makes it seem like Range would currently implement PartialOrd lexicographically, but actually Range doesn't have an impl at all. Rather, (5..10).partial_cmp(8..9) calls Iterator::partial_cmp, which naturally is lexicographic. impl Iterator for Range strikes again.

I don't think the new type should implement PartialOrd either, because apparently there are many orders it could naturally be. My expectation would have been by inclusion (as sets).

(a..b) <= (c..d) in

  • interval order iff b <= c (assuming a <= b && c <= d)
  • inclusion order iff c >= a && b <= d

If the type of a..b doesn't enforce a <= b on creation (which it probably can't), you don't get a valid ordering with just b <= c, since you would get cycles like (1..2) < (3..0) < (1..2).

3 Likes

Oh, that's good to hear!

made me think that Range had succumbed to the usual Rust mistake of "just slap a derive on it; I'm sure it's fine".

Very nice catch, I personally would much rather have no partial_cmp than that lexically ordered one. Hopefully the lack of PartialOrd bounds, and the lexical order being silly reduces the amount of code that depends upon that to negligible levels so we won't be inclined to reintroduce it. I went ahead and left a comment in that bug report clarifying a few things...

Should there be an iter_mut method that still mutates the underlying range? Or would it be better to have a remainder method on the iterator to get the remaining range back?

This can be useful for things like try_fold which don't take the Iterator by value.

Should RangeFrom still implement IntoIterator? It's currently error prone because iterating up to the maximum value will overflow, panicking in debug and wrapping in release mode.

Where would iter_mut give references to? Only the boundary values exist in stable locations, the rest would just be temporaries discarded after the iteration. I don't think either iter or iter_mut make sense to exist, only the value yielding into_iter.

Unless I'm misunderstanding, by "get the remaining range" I guess you mean something more like a version of drain that determines the range removed based on where the iteration got to? Given no other collections have such an iterator it'd be weird for the ranges to have one.

Various iterators do offer methods to access the not-yet-iterated-over values.

2 Likes

Well, maybe it doesn't quite fit the iter_mut mold and deserves a different name, but what I meant is that advancing the iterator returned by iter_mut would increment the start of the range.

Here's how it works now:

let mut r = 0..11;
assert_eq!(r.next(), Some(0));
assert_eq!(r.start, 1);

The new ranges won't implement Iterator. Since their IntoIterator takes the range by value, .into_iter().next() won't modify the range in place:

let r = 0..11;
assert_eq!(r.into_iter().next(), Some(0));
assert_eq!(r.start, 0);

Some people might use that aspect of the current Iterator ranges, so I'm proposing two ways of enabling that:

let mut r = 0..11;
assert_eq!(r.iter_mut().next(), Some(0));
assert_eq!(r.start, 1);

// or

let r = 0..11;
let mut iter = r.into_iter();
assert_eq!(iter.next(), Some(0));
assert_eq!(iter.remainder().start, 1);

With regard to an hypothetical RangeInclusiveIterator, please do NOT copy RangeInclusive: it optimizes badly.

The problem posed by the current RangeInclusive when using as an argument to for, or otherwise iterated externally is that LLVM is incapable of splitting loops based on a monotonically updated variable (exhausted, here). This leads in LLVM checking the exhausted flag at each and every iteration of the loop, and preventing further optimization. It's a long known issue.

Instead, the iterator should be modeled as an enum whose variants is selected dynamically based on start and end variables:

  1. If possible (eg, if start can be decremented or end can be incremented), automatically switch to an open-ended range.
  2. Otherwise, somehow deal with an inclusive range.

It may be as simple as:

pub struct RangeInclusiveIter<T>(RangeInclusiveIterImpl<T>);

enum RangeInclusiveIterImpl<T> {
     Exclusive { start: T, end: T },
     Inclusive { start: T, end: T, exhausted: bool },
}

Which keeps the current representation in truly inclusive cases... though arguably, I'd advise switch the whole enum to Exclusive after the first iteration for simplicity. With luck, if not looped on directly, the next loop after the first call to next, can then be vectorized and all.

3 Likes

An inclusive range iterator is also an exclusive range iterator chained with a single element iterator. How does LLVM do with that implementation?

I've tried different implementations a bit on godbolt and it seems like the main problem causing missed optimizations here is that LLVM can't figure out or can't use the fact that exhausted will never transition from true to false.

The same applies if we ever mutate between enum variants. *self = Self::Exclusive ... will cause LLVM to never optimize out the variant matching in next, even though we never transition from Exclusive to Inclusive.

So the only thing we can do by adding an extra variant is improve runtime performance for ranges that don't cover the whole integer space. It doesn't improve anything for the 0..=MAX case.

This may be something we can improve in LLVM or we may just have to do this in MIR optimizations, in which case I don't really see a reason to make the Iterator an enum as proposed.

Uh, but isn’t that the entire point? Currently every single usage of inclusive ranges is pessimized by the need to handle the one corner case.

2 Likes

I'm not saying it's a bad idea, per se. I'm just saying that if we can fix the optimization issue it would be moot.

Badly. LLVM chokes on all chain-like iterators.

This would be ideal, yet the issue I mentioned is 6 years old so I have some doubts it will ever be "fixed".

I think that beyond of the one-off variation -- or for the more generic case of a multi-chain iterators, of the monotonic increment -- there's also the issue of code bloat: splitting a loop involves copy/pasting the loop body as many times as you split.

Sure, this may reveal optimization opportunities within the loop body afterwards -- by specializing its context -- but it may also not, and thus you'd need to develop a set of heuristics not dissimilar to inlining/constant-propagation heuristics to gauge whether it's worth splitting or not.

I wouldn't necessarily abandon hope, but after 6 years I think we're better off special-casing RangeInclusiveIter for now: we can always simplify the implementation later on if LLVM catches up!


BTW, I just realized we may need a 3rd case in the enum:

enum RangeInclusiveIterImpl<T> {
    //  Akin to `start..end`.
    Exclusive { start: T, end: T },
    //  Akin to `(start..end).map(|e| e + 1)`.
    ExclusiveShifted { start: T, end: T },
    //  Akin to start..=end.
    Inclusive { start: T, end: T },
}

The new case (shifted) being used to represent ranges of the form x..=MAX where end cannot be incremented, but the whole range can be mapped to (x - 1)..MAX and each element incremented by 1 prior to being returned.

This ensures that Inclusive is only used for full ranges, when start cannot be decremented and end cannot be incremented -- so it first needs to return start, and then be converted to ExclusiveShifted { start, end } which will take care of iterating over (start - 1)..end and incrementing each returned value.

2 Likes

ExclusiveBiased might be better as "shift" makes me think of << rather than +.

Alright, after quite a bit of work, I put together Range experiment by pitaj · Pull Request #118769 · rust-lang/rust · GitHub

Hopefully we can get a crater run started to see how much breakage is expected in the ecosystem. To start with, quote and rayon both need to be patched for Range: IntoIterator so that's not a great sign.

4 Likes

Some notes from that implementation.

The biggest issue seems to be when code assumes explicitly that Range: Iterator. I don't think it's possible to write a fix that can generally adjust these cases to use IntoIterator instead. Compare my fix for rayon to my fix for quote.

It's probably best to just continue using the legacy range types in those cases, i.e. legacy::Range::from(x..y). In other cases we may be able to transition completely to the new range types.

While we are at it, it would be nice if the new RangeTo and RangeToInclusive could also have .rev() (despite not being iterable themselves).

As for .map(), some people might expect it to merely map the bounds and return a new range, IE impl<Idx> Range<Idx> { fn map<U, F: FnMut(Idx) -> U>(self: Range<Idx>, f: F) -> Range<U>; }. So maybe that one should not be delegated to the iterator.

4 Likes

Yes, and this would be a useful new API. Especially given that Bound::map is a thing now.

2 Likes

I think changing the meaning of (1..11).map(...) is a pretty big hazard. There is a lot of existing code, documentation, etc that uses it in the Iterator sense. It seems like it would be pretty confusing especially to a newcomer to have it so something totally different between editions. Especially since in some cases it could silently change meaning:

for n in (1..11).map(|n| n*2) {
    // n = 2, 4, 6, ...., 16, 18, 20
}
// vs
for n in (1..11).map(|n| n*2) {
    // n = 2, 3, 4, 5, 6, 7, ...., 15, 16, 17, 18, 19, 20, 21
}

I'm not opposed to something like map_bounds though. Could even live on RangeBounds

7 Likes

This is probably too much to ask for, but step_by can be pretty annoying to use with ranges because the step amount is a usize regardless of the range type. Is there some way that could be addressed? I'd like to be able to specify arithmetic progressions, maybe not as a Range but as something.

3 Likes