Pre-RFC: Fixing Range by 2027

This probably won't happen for the 2024 edition, but I think it would be criminal to let this slide past 2027, so I figured I'd share some thoughts here.

It sounds like the most tenable solution is:

  • add a new set of range types
    • core::ops::range2027::Range*
    • all identical to the existing types except RangeInclusive, which will not have the exhausted flag
    • impl<T: Copy> Copy for Range*<T>
  • impl IntoIterator, not Iterator
    • the exhausted flag will live on IntoIter where applicable
  • change the range syntax to resolve to the range2027 types in edition 2027
  • add a coercion between the old and new range types for interoperability
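
To make the shape of the proposal concrete, here is a minimal sketch of a Copy range whose iteration state lives in a separate iterator type. The names NewRange and NewRangeIter are hypothetical stand-ins, and the sketch is fixed to u32 because a generic version would need the unstable Step trait.

```rust
// Hypothetical stand-in for a proposed range type; not a real std API.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NewRange {
    pub start: u32,
    pub end: u32,
}

// The iteration state lives here, so `NewRange` itself can stay `Copy`.
#[derive(Clone, Debug)]
pub struct NewRangeIter {
    next: u32,
    end: u32,
}

impl Iterator for NewRangeIter {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.next < self.end {
            let current = self.next;
            self.next += 1;
            Some(current)
        } else {
            None
        }
    }
}

// The range is only `IntoIterator`, not `Iterator`.
impl IntoIterator for NewRange {
    type Item = u32;
    type IntoIter = NewRangeIter;

    fn into_iter(self) -> NewRangeIter {
        NewRangeIter { next: self.start, end: self.end }
    }
}

fn main() {
    let r = NewRange { start: 0, end: 3 };
    // `r` is `Copy`, so it can be iterated twice without `.clone()`.
    let first: Vec<u32> = r.into_iter().collect();
    let second: Vec<u32> = r.into_iter().collect();
    assert_eq!(first, second);
}
```

A `for x in r` loop works unchanged with this shape, since `for` already goes through IntoIterator.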

Open Question: how to handle reference coercions? The old and new types would need to have identical layout, but old RangeInclusive has the exhausted flag and new one would hopefully not.

One option for that issue is redefining the old RangeInclusive in terms of the new:

pub struct RangeInclusive2027<Idx> {
    pub start: Idx,
    pub end: Idx,
}
pub struct RangeInclusive<Idx> {
    pub(crate) inner: RangeInclusive2027<Idx>,
    pub(crate) exhausted: bool,
}

Then at least old references can coerce to new ones, but unfortunately new ones cannot coerce to old ones. But that's probably okay, since using newer edition dependencies in a program on an older edition is pretty rare and easily fixed by upgrading edition.
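
As a rough illustration of why only one direction works by reference, the sketch below uses ordinary trait impls in place of a real coercion. The struct layout follows the definitions above; the `new` and `is_exhausted` helpers and the choice of AsRef/From are assumptions made for this example only.

```rust
// Mirrors of the two structs above; the helper methods are hypothetical.
pub struct RangeInclusive2027<Idx> {
    pub start: Idx,
    pub end: Idx,
}

pub struct RangeInclusive<Idx> {
    inner: RangeInclusive2027<Idx>,
    exhausted: bool,
}

impl<Idx> RangeInclusive<Idx> {
    pub fn new(start: Idx, end: Idx) -> Self {
        RangeInclusive { inner: RangeInclusive2027 { start, end }, exhausted: false }
    }

    pub fn is_exhausted(&self) -> bool {
        self.exhausted
    }
}

// Old -> new by reference is just a borrow of the inner field.
impl<Idx> AsRef<RangeInclusive2027<Idx>> for RangeInclusive<Idx> {
    fn as_ref(&self) -> &RangeInclusive2027<Idx> {
        &self.inner
    }
}

// New -> old only works by value: the `exhausted` flag has to be created,
// and a `&RangeInclusive2027` has no storage for it.
impl<Idx> From<RangeInclusive2027<Idx>> for RangeInclusive<Idx> {
    fn from(inner: RangeInclusive2027<Idx>) -> Self {
        RangeInclusive { inner, exhausted: false }
    }
}

fn main() {
    let old = RangeInclusive::new(1, 5);
    let new_ref: &RangeInclusive2027<i32> = old.as_ref();
    assert_eq!((new_ref.start, new_ref.end), (1, 5));

    let back: RangeInclusive<i32> = RangeInclusive2027 { start: 1, end: 5 }.into();
    assert!(!back.is_exhausted());
}
```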

Another Open Question: How do we implement coercions? Do we want to special-case this, make it a perma-unstable language feature, or make implicit conversion a language feature available for all?

By far, I imagine implementing the coercions will be the most difficult and controversial aspect of this proposal. The libs changes for this will be pretty trivial by comparison.


Are the coercions that necessary? Most std range APIs go through RangeBounds/SliceIndex already to be able to support the various ranges, the new types can implement those too and will be supported directly. If there are other APIs in the ecosystem that use concrete range types, an explicit conversion via .into() seems good enough.
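
A small sketch of that point: an API written against RangeBounds accepts the existing range types and any new type that implements the trait, with no coercion involved. NewRange and count_in are hypothetical names used only for illustration.

```rust
use std::ops::{Bound, RangeBounds};

// Hypothetical new-style range; a stand-in, not a real std type.
#[derive(Clone, Copy)]
pub struct NewRange {
    pub start: usize,
    pub end: usize,
}

impl RangeBounds<usize> for NewRange {
    fn start_bound(&self) -> Bound<&usize> {
        Bound::Included(&self.start)
    }

    fn end_bound(&self) -> Bound<&usize> {
        Bound::Excluded(&self.end)
    }
}

// A library function written against the trait, not a concrete range type.
pub fn count_in<R: RangeBounds<usize>>(range: R, values: &[usize]) -> usize {
    values.iter().filter(|&v| range.contains(v)).count()
}

fn main() {
    let values = [1, 3, 5, 7, 9];
    // Existing range syntax and the hypothetical new type both work.
    assert_eq!(count_in(2..8, &values), 3);
    assert_eq!(count_in(NewRange { start: 2, end: 8 }, &values), 3);
}
```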


You make a good point. Maybe they're not that necessary. It would be good to see a survey of how many .into()s would be needed across the ecosystem.

Thanks for posting this, I would love to see Range be fixed.

This may be possible to do in edition 2024. In fact, the relevant issue (https://github.com/rust-lang/rfcs/issues/2848) is already in the ed2024 project (Lang Edition 2024 on GitHub). It is categorized as "needs RFC" and "needs champion", so you may want to link this there.

There is some more discussion on this topic in the Rust 2024 survey (issue #209 in rust-lang/lang-team on GitHub).

Possible justifications to add for an RFC:

  • Storage in Copy structs should be possible
  • A lot of Span implementations could be Ranges if they were Copy, meaning a lot of crates will be able to reuse this std type instead of recreating the same API
  • RangeInclusive gets smaller (mentioned indirectly in the top post)
  • RangeInclusive not having a bool means the type becomes bytemuckable
  • Easier to reuse:
    let r = 0..10;
    let a = x[r];
    let b = y[r];
    
    That really should work without a .clone() for the best ergonomics, but it of course can't right now.
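
For comparison, this is roughly what the reuse case looks like today. The helper name slice_both is made up for the example.

```rust
use std::ops::Range;

// Index two slices with the same range, as the snippet above wants to do.
fn slice_both<'a>(x: &'a [i32], y: &'a [f64], r: Range<usize>) -> (&'a [i32], &'a [f64]) {
    // `Range<usize>` is not `Copy` today, so the first use has to clone it;
    // writing `&x[r]` here would move `r` and make `&y[r]` a compile error.
    let a = &x[r.clone()];
    let b = &y[r];
    (a, b)
}

fn main() {
    let x = [10, 20, 30, 40];
    let y = [1.5, 2.5, 3.5, 4.5];
    let (a, b) = slice_both(&x, &y, 1..3);
    assert_eq!(a, &[20, 30]);
    assert_eq!(b, &[2.5, 3.5]);
}
```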

Possible downsides:

  • Churn from passing a Range to an : Iterator bound (we should be able to emit a nice error here, maybe even rustfix something).
  • Lack of precedent using editions to help library features... but I think we need to figure out a good way to do this at some point, this won't be the only use case. And we do something similar for macros like panic!.
  • It is pretty universally agreed upon that this would make things the way they "should" be. But is there anything that becomes less ergonomic with the change? That is tough to know.
  • Added iterator type(s) (the RFC should probably say what these look like)

It would be interesting to do a crater run with the changed impls to have an idea of how widespread the effects are.


One possible issue is that things like (2..10).rev() would now need to call into_iter: (2..10).into_iter().rev().

But the most commonly used ones like rev and map can easily be added as associated functions directly on the types themselves to maintain the ergonomics (which will also reduce fixing needs).
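
A sketch of what such inherent methods could look like, assuming a hypothetical Copy type NewRange that simply delegates to today's range iterator. The return types are one possible choice, not something the proposal specifies.

```rust
use std::iter::{Map, Rev};
use std::ops::Range;

// Hypothetical new-style range; a stand-in, not a real std type.
#[derive(Clone, Copy)]
pub struct NewRange {
    pub start: u32,
    pub end: u32,
}

impl NewRange {
    // Inherent `rev`, so `.rev()` works without `.into_iter()`.
    pub fn rev(self) -> Rev<Range<u32>> {
        (self.start..self.end).rev()
    }

    // Inherent `map` that returns an iterator, like `Iterator::map` does.
    pub fn map<B, F>(self, f: F) -> Map<Range<u32>, F>
    where
        F: FnMut(u32) -> B,
    {
        (self.start..self.end).map(f)
    }
}

fn main() {
    let r = NewRange { start: 2, end: 5 };
    let reversed: Vec<u32> = r.rev().collect();
    let doubled: Vec<u32> = r.map(|x| x * 2).collect();
    assert_eq!(reversed, vec![4, 3, 2]);
    assert_eq!(doubled, vec![4, 6, 8]);
}
```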

I think for the most part, they'll look like the current Range types, but with private fields. The RangeFrom iterator should probably also have an exhausted field so you can do this without hitting a debug overflow panic:

for x in 0_u8.. {
    dbg!(x);
}
...
[src/main.rs:4] x = 253
[src/main.rs:4] x = 254
thread 'main' panicked at core/src/iter/range.rs:393:1:
attempt to add with overflow

related
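
A minimal sketch of the exhausted-flag idea for a RangeFrom iterator, fixed to u8 to match the example above. RangeFromIter is a hypothetical name, and whether stopping at the maximum is the desired behavior is an open design question.

```rust
// Hypothetical iterator for a new-style `RangeFrom<u8>`; not a real std type.
pub struct RangeFromIter {
    next: u8,
    exhausted: bool,
}

impl RangeFromIter {
    pub fn new(start: u8) -> Self {
        RangeFromIter { next: start, exhausted: false }
    }
}

impl Iterator for RangeFromIter {
    type Item = u8;

    fn next(&mut self) -> Option<u8> {
        if self.exhausted {
            return None;
        }
        let current = self.next;
        match current.checked_add(1) {
            Some(n) => self.next = n,
            // `current` is `u8::MAX`: yield it, then stop instead of overflowing.
            None => self.exhausted = true,
        }
        Some(current)
    }
}

fn main() {
    let tail: Vec<u8> = RangeFromIter::new(253).collect();
    assert_eq!(tail, vec![253, 254, 255]);
}
```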


Not to start the bikeshedding too early, but for some of these it's worth asking "what should this look like in 10 years?". My personal preference would be a reasonable name (std::ops::Span*?) with the Range* types deprecated, but only from edition 20xx onward (this mechanism doesn't exist right now as far as I know). I think a lot of this will shake out in the implementation, but don't be shy.


Yeah that was just a stand-in. I will probably do something like std::range::Range* or std::ops::range::Range*. I'm not a huge fan of Span because that means something very different in C++ (and probably other languages), where it is the analog to Slices.


I think at least some people (strongly?) feel that RangeFrom is meant to be infinite and never return None by definition, and that the current overflow behavior is the expected one (and as such you're always supposed to use take etc. to make it finite). My own intuition was that RangeFrom would indeed stop at T::MAX if T does have a maximum, and I was slightly surprised that it overflows instead. But on the other hand, it is consistent that the Step impl simply does whatever += 1 does.

(Maybe it could be useful to have .wrapping(), .saturating() and .checked() adapter methods? Probably too niche.)
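
For what it's worth, the three behaviors can already be approximated with std::iter::successors. The function names below are made up for this sketch and are not proposed API.

```rust
use std::iter::successors;

// "Checked" flavour: stop after the maximum instead of overflowing.
fn checked_from(start: u8) -> impl Iterator<Item = u8> {
    successors(Some(start), |x| x.checked_add(1))
}

// "Wrapping" flavour: continue from 0 after the maximum (never ends).
fn wrapping_from(start: u8) -> impl Iterator<Item = u8> {
    successors(Some(start), |x| Some(x.wrapping_add(1)))
}

// "Saturating" flavour: repeat the maximum forever.
fn saturating_from(start: u8) -> impl Iterator<Item = u8> {
    successors(Some(start), |x| Some(x.saturating_add(1)))
}

fn main() {
    assert_eq!(checked_from(254).collect::<Vec<u8>>(), vec![254, 255]);
    assert_eq!(wrapping_from(254).take(4).collect::<Vec<u8>>(), vec![254, 255, 0, 1]);
    assert_eq!(saturating_from(254).take(4).collect::<Vec<u8>>(), vec![254, 255, 255, 255]);
}
```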


As I recall this was the major complication that kept this from happening for the 2021 edition.

I don't know exactly how hard that is to address, though. Maybe it's "well it's been three more years and this has only gotten more annoying so we need to fix Range more than we need to worry about these", and we should just bite the bullet.

That worries me, in some cases. If Range is not an iterator, I might expect Range::map to be more like (10..20).map(|x| x * 2) → 20..40, like Option::map.
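
To spell out the two readings of map: the sketch below maps the endpoints with a hypothetical free function map_bounds and compares that with what Iterator::map does on today's Range.

```rust
use std::ops::Range;

// Hypothetical helper: maps the endpoints of the range, not its items.
fn map_bounds<T, U>(r: Range<T>, mut f: impl FnMut(T) -> U) -> Range<U> {
    f(r.start)..f(r.end)
}

fn main() {
    // "Option::map"-style reading: the result is still a range.
    assert_eq!(map_bounds(10..20, |x| x * 2), 20..40);

    // Iterator reading (today's behavior): ten doubled items, 20 through 38.
    let items: Vec<i32> = (10..20).map(|x| x * 2).collect();
    assert_eq!(items.len(), 10);
    assert_eq!(items.last(), Some(&38));
}
```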

(And, practically, I'd like a rule that limits the copying as much as possible. There are just so many Iterator methods! I actually rather like rev: Range<T> → RevRange<T>, though, where RevRange<T> could again just be IntoIterator, not Iterator. And it might be spelled Rev<Range<T>> or Reverse<Range<T>> or something; dunno if a type for it would make sense. But RevRangeTo<T> could be IntoIterator, which could be interesting.)


I don't think confusion is likely given how common this already is. Plus, it's pretty easy to tell based on the function signature what is going on and this is easy to document.

I agree that we should limit it to the most useful / common cases.

I don't really see any benefit to returning an IntoIterator type instead of an Iterator. It feels kind of like drain returning an IntoIterator.

What else would that type be able to do? I guess it could implement RangeBounds but reverse bounds don't really fit and mapped bounds can't be used the same.

Making it return an iterator has a few benefits:

  • easier to implement
  • matches existing behavior
  • less code churn
  • can chain iterator methods right off it

The last one is really important to avoid needing into_iter in the middle of a chain:

(3..11).map(|x| x*2).rev()
// vs
(3..11).map(|x| x*2).into_iter().rev()

(0..LEN).map(|_| Default::default()).collect()
// vs
(0..LEN).map(|_| Default::default()).into_iter().collect()

(0..max_width_index_in_input)
    .rev()
    .skip_while(.....)
// vs
(0..max_width_index_in_input)
    .rev()
    .into_iter()
    .skip_while(.....)

I somewhat believe we should also support inverted ranges, if not in the same RFC, then in another.


What do you mean by inverted ranges? A syntax for reversed direction ranges?

I mean iterating ranges where start>end.

Okay I understand, but are you talking about introducing new syntax or just new standard types? Because I think adding syntax for that will be a pretty tough sell:

  • Needs to be succinct
  • Yet readable and immediately understandable
  • Especially versus existing range syntax
  • And needs to be substantially shorter than (3..11).rev()

I didn't think too hard about it, but I'd like (10..=0).into_iter() to just work™️.


Would be a silent breaking change given that currently it's guaranteed to be an empty range.
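
A quick check of the current behavior that such a change would silently alter. The helper count_down is a made-up name showing how a descending sequence is written today.

```rust
// Counting down today means reversing an ascending range.
fn count_down(from: i32, to: i32) -> Vec<i32> {
    (to..=from).rev().collect()
}

fn main() {
    // With start > end, the inclusive range is empty and yields nothing.
    assert!((10..=0).is_empty());
    assert_eq!((10..=0).count(), 0);

    assert_eq!(count_down(3, 0), vec![3, 2, 1, 0]);
}
```

Code that relies on the empty case, for example a loop over `lo..=hi` where `hi` can end up below `lo`, would start executing its body if inverted ranges began to iterate.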


We're talking about an edition change here!

Yes, but it's not like editions allow arbitrary breakages! This change would cause a silent bug and would also not be rustfix-able in the general case, so I'm pretty sure it's out of the question. Maybe in a hypothetical Rust 2 someday…


But editions are opt-in. When you update the edition of your crate, it's up to you to find and fix breakage.

The currently-existing types already implement Iterator; is there a compelling reason to replace them with new types for this purpose?

If we call the new types something like Span*<T>, they can coexist with the ones we already have in perpetuity and the only edition change would need to be which type the .. operator expands to. Anything generic over RangeBounds, SliceIndex, or IntoIterator will work with both families out of the box, which provides a reasonable path forward for libraries that want to be ergonomic cross-edition.


For better or worse, Range* in Rust is the name of a type that implements Iterator and there's a lot of documentation/books/blogs that convey that. As we need types that implement Iterator in the new scheme as the IntoIterator::IntoIter associated type, it seems like unnecessary churn to change this meaning.

Most Rust libraries that have their own copyable range-like type seem to have settled on the name Span*, so it seems like a natural choice. But something else could work just as well. Just please don't call the new types Range*, as it will only cause confusion for people who find old information.