Pre-RFC: Range Extension Syntax and Offset Method (Draft)

Note that I'm still drafting this out, so it's not a "set-in-stone" RFC.

Feature Name: Range Extension Syntax and Offset Method

  • Feature Name: range_offset
  • Start Date: __
  • RFC PR: __
  • Rust Issue: __

Summary

Introduce new syntax for ranges to allow for a more concise representation of ranges with a specified length or offset from the start. Additionally, introduce an offset method for ranges to achieve similar functionality programmatically.

Motivation

Rust's range syntax offers robust capabilities, but developers frequently find themselves defining a range by specifying a starting point and then determining the end based on a certain offset or length from that start. This is especially prevalent when dealing with byte arrays or general buffers. In such cases, the need arises to define a range that spans from an initial index to another index derived from an offset. This proposal seeks to streamline these common patterns, enhancing clarity and efficiency in both code and functionality.

Guide-level explanation

With this proposal, the following new range syntaxes and methods are introduced:

Proposal:

  1. start..+offset: Represents a range starting at start and extending by offset units, exclusive of the end value. It is equivalent to start..start + offset.

  2. start..+=offset: Represents a range starting at start and extending by offset units, inclusive of the end value. It is equivalent to start..=start + offset. Note that the += spelling felt more intuitive and understandable than =+, but one could still argue for the other.

  3. offset(self, length: T) -> Range<T>: A method on the RangeFrom<T> type that returns a range starting at the same value and extending by the specified length. It is proposed as an addition to Rust's standard library and gives a programmatic counterpart to the syntax above. Note that this need not be exclusive to RangeFrom; it could be defined for other range types too.

Examples:

let v_len = 10;

// Using the new syntax:
let range1 = v_len..+5;       // Equivalent to [v_len, v_len+5) interval, or v_len..v_len+5 range
let range2 = v_len..+=5;      // Equivalent to [v_len, v_len+5] interval, or v_len..=v_len+5 range

// Using the offset method:
let range4 = (v_len..).offset(5);  // Equivalent to v_len..v_len+5

Basic Implementation Strategy

There are two primary ways to implement the proposed .offset method:

1. Extending the Existing RangeFrom Type

Given a RangeFrom instance, the .offset method would take a single argument, the offset, and return a Range. This is a straightforward approach and would not introduce new types into the standard library.

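// Sketch of an inherent impl; it would have to live in core::ops next to
// RangeFrom, since inherent impls cannot be added from another crate.
impl<T: Add<Output = T> + Copy> RangeFrom<T> {
    pub fn offset(self, length: T) -> Range<T> {
        Range {
            start: self.start,
            // Same overflow behavior as writing start + length by hand.
            end: self.start + length,
        }
    }
}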

Pros:

  • No new types introduced, keeping the standard library lean.
  • Direct and intuitive for users familiar with the existing Range types.
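
To experiment with this on stable Rust today, the same logic can be sketched as an extension trait. The trait name RangeFromExt is an assumption made for illustration and is not part of the standard library or of this proposal.

```rust
use std::ops::{Add, Range, RangeFrom};

/// Hypothetical extension trait (the name is an assumption) that mirrors
/// the proposed inherent `offset` method.
trait RangeFromExt<T> {
    fn offset(self, length: T) -> Range<T>;
}

impl<T: Add<Output = T> + Copy> RangeFromExt<T> for RangeFrom<T> {
    fn offset(self, length: T) -> Range<T> {
        Range {
            start: self.start,
            end: self.start + length,
        }
    }
}

fn main() {
    let v_len = 10usize;
    // What `v_len..+5` would produce under the proposal.
    let r = (v_len..).offset(5);
    assert_eq!(r, 10..15);
}
```

An extension trait only covers the method form; the ..+ and ..+= spellings would still need compiler support.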

2. Introducing a New RangeOffset Struct

This approach involves creating a new struct, RangeOffset, which represents a range with a specified offset. The .offset method would then return this new type.

Pros:
  • Clear separation between traditional ranges and offset-based ranges.
  • Allows for potential additional methods or properties specific to offset-based ranges in the future.
Cons:
  • Introduces a new type, which can increase the complexity of the standard library.
  • Users would need to become familiar with another range type.
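
As a rough idea of what the second strategy could look like, here is a minimal sketch of a RangeOffset type that stores (start, length) and converts into a regular Range. The field names and the From conversion are assumptions; the proposal does not specify them.

```rust
use std::ops::{Add, Range};

/// Hypothetical type: a range stored as (start, length) instead of (start, end).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RangeOffset<T> {
    start: T,
    length: T,
}

/// Conversion into the existing Range type, so the new type can be used
/// wherever a Range is accepted (for example, slice indexing).
impl<T: Add<Output = T> + Copy> From<RangeOffset<T>> for Range<T> {
    fn from(r: RangeOffset<T>) -> Range<T> {
        r.start..r.start + r.length
    }
}

fn main() {
    let r = RangeOffset { start: 10usize, length: 5 };
    let as_range: Range<usize> = r.into();
    assert_eq!(as_range, 10..15);

    let bytes = [0u8; 32];
    assert_eq!(bytes[Range::from(r)].len(), 5);
}
```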

Decision Factors:

  • Backward Compatibility: Modifying the existing Range types might have implications for backward compatibility. Introducing a new type would sidestep this issue.

  • Intuitiveness: Using the existing Range types might be more intuitive for users, as they wouldn't need to learn about a new type. However, a new type could provide clearer semantics for offset-based operations.

  • Flexibility: A new type offers more flexibility for future extensions specific to offset-based ranges.


Given these considerations, community feedback would be appreciated. Both strategies have their merits, and the decision should weigh the benefits of simplicity and intuitiveness against the flexibility and clarity of introducing a new type.

Reference-level explanation

The compiler's handling of range syntax would need to be extended to recognize and correctly parse the new patterns. The resulting values would be equivalent to those produced by the existing syntax: a Range for start..+offset and a RangeInclusive for start..+=offset. Additionally, the RangeFrom type would need to be extended to include the offset method.

Drawbacks

Introduces new syntax and methods which might have a learning curve. Potential for confusion with multiple ways to express similar ranges.

Rationale and alternatives

This proposal makes certain range patterns more concise both syntactically and programmatically. An alternative is to rely solely on methods like offset without introducing new syntax. Another alternative is to not make any changes and rely on the existing syntax and methods.

Prior art

Languages like Python have slicing syntax, but Rust's range syntax is unique. This proposal is about making Rust's range syntax and methods more expressive for common patterns.

Unresolved questions

  • Are there potential parsing ambiguities introduced by this syntax? e.g. start.. + 5 with an Add<usize> impl on RangeFrom? (Though the orphan rule prevents downstream crates from writing such an impl.)
  • How would this new syntax and method interact with other potential future changes to Rust's range syntax or type system?
  • Is this already solved by another library crate?

I don't like the naming of offset. It really sounds like it offsets the entire range, but in actuality it only affects the end, and it doesn't even offset it, instead replacing it with start + offset.

My expectation reading this code would have been

let range4 = (v_len..).offset(5);  // Equivalent to v_len+5..

and have it on all range types

(x..y).offset(5); // Equivalent to x+5..y+5

We don't even have a way to take a RangeFrom and apply an endpoint to it; the closest is (v_len..).take(5), but that's an iter::Take<RangeFrom>, not a Range. (RangeFrom being an Iterator makes adding inherent methods to it a little tricky as well.)
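
To make the difference concrete, here is a small sketch. The two values yield the same items, but only the Range can be used to index a slice.

```rust
use std::iter::Take;
use std::ops::{Range, RangeFrom};

fn main() {
    let v_len = 10usize;

    // Bounding a RangeFrom with `take` gives an iterator adapter...
    let taken: Take<RangeFrom<usize>> = (v_len..).take(5);
    // ...whereas this is an actual Range value.
    let range: Range<usize> = v_len..v_len + 5;

    // Both produce the items 10, 11, 12, 13, 14.
    assert!(taken.clone().eq(range.clone()));

    // Only the Range works as a slice index.
    let buf = [0u8; 32];
    assert_eq!(buf[range].len(), 5);
    // `buf[taken]` would not compile: Take<RangeFrom<usize>> is not a slice index.
}
```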

Yeah, what would offset do in this situation:

let mut r = (1..);
r.next();
let r2 = r.offset(5); // r.start is now 2, not 1

I can see the ambiguity with using .offset in this context yes. It is so hard to find a good name for it. How do you feel about .span(length)/.span_with(length)/.span_by(length) or .stretch(length)/.stretch_with(length)/.stretch_by(length)?

I would separate adding any methods from adding new syntax. Adding methods would be much easier if we do the oft-discussed transition to a Range: IntoIterator type instead.

I agree that the lack of behavior on these types, combined with their being iterators, creates a dilemma. It might be better to extend these types with constructor methods instead.

Range::slice(start, length)
Range::span(start, length)

There might be some criticism against using slice in this context, given the Rust idioms surrounding the word slice. Also, there's a std::slice::range(..), which might make the two look like inverses of each other, which they are not.
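
A constructor along these lines could be prototyped outside the standard library with a trait that has an associated function. The trait name Span below is an assumption for illustration; only the Range::span(start, length) call shape comes from the suggestion above.

```rust
use std::ops::{Add, Range};

/// Hypothetical constructor trait (the name is an assumption).
trait Span<T> {
    fn span(start: T, length: T) -> Self;
}

impl<T: Add<Output = T> + Copy> Span<T> for Range<T> {
    fn span(start: T, length: T) -> Self {
        start..start + length
    }
}

fn main() {
    // Reads like an inherent constructor once the trait is in scope.
    let r = Range::span(10usize, 5);
    assert_eq!(r, 10..15);
}
```

Because this is an associated function rather than a method, it avoids the question of how it interacts with a partially consumed iterator.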

Hmm, sure, that could also be a good first step. But do you agree with the new syntax?

I would definitely suggest avoiding offset for this, because to me that means https://doc.rust-lang.org/std/primitive.pointer.html#method.offset -- really, a[i] or a[i..] is more like offset than this would be, since that uses a.offset(i) (essentially) internally.

Note, also, that I suggest always using a[i..][..n] instead of a[i..(i+n)], because it avoids dealing with the arithmetic overflow in the addition.
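
A small sketch of the difference, using the checked get API so that nothing panics. The helper names window and window_single are made up for this example.

```rust
/// Checked equivalent of `a[i..][..n]`: no addition, so nothing can overflow.
fn window(a: &[u8], i: usize, n: usize) -> Option<&[u8]> {
    a.get(i..)?.get(..n)
}

/// Checked equivalent of `a[i..(i + n)]`: the addition needs its own overflow check.
fn window_single(a: &[u8], i: usize, n: usize) -> Option<&[u8]> {
    let end = i.checked_add(n)?;
    a.get(i..end)
}

fn main() {
    let a = [1u8, 2, 3, 4, 5, 6, 7, 8];

    // Both forms agree on in-bounds input.
    assert_eq!(window(&a, 2, 3), Some(&a[2..5]));
    assert_eq!(window(&a, 2, 3), window_single(&a, 2, 3));

    // With a huge length, `2 + usize::MAX` would overflow; the double-slicing
    // form never computes that sum.
    assert_eq!(window(&a, 2, usize::MAX), None);
    assert_eq!(window_single(&a, 2, usize::MAX), None);

    // The panicking spelling from the post.
    assert_eq!(&a[2..][..3], &[3u8, 4, 5][..]);
}
```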

I prefer the double-slicing too when that’s what you’re doing, but sometimes you really are constructing Ranges, and I like the new operators for that.

For the method form to go from RangeFrom to Range, I would just as soon write

let a = 5..;
let b1 = a..7;
let b2 = a..+2;

Or leave that out altogether, that one doesn’t come up as often.

As far as names go — for the proposed method named offset, how about a term such as expand / extend / lengthen / widen?


WDYT about adjusting the implementation to take account of dimensions with differently-typed derivatives? For example, the derivative of a Date is a Duration, not a Date, so we'd like to be able to extend a Date by a Duration.

e.g. A trait named RangeExtend:

use chrono::{NaiveDate, Duration};
use std::ops::{Add, Range};

trait RangeExtend<U> {
    fn extend(self, length: U) -> Self;
}

impl<T, U> RangeExtend<U> for Range<T>
where
    T: Add<U, Output = T>,
{
    fn extend(self, extension: U) -> Range<T> {
        Range {
            start: self.start,
            end: self.end + extension,
        }
    }
}

...and then this just works:

let start = NaiveDate::from_ymd_opt(2023, 9, 21).unwrap();
let end = NaiveDate::from_ymd_opt(2023, 9, 24).unwrap();
let date_range = start..end;
let extended = date_range.extend(Duration::days(5));

dbg!(extended);
// extended = 2023-09-21..2023-09-29

More generally, I like the proposal, though I'm not sure it's significant enough to be in the language vs. implemented on the types. Why not start a crate with this, for the method albeit not the syntax?

The crate could also implement methods such as .shift. Or .stretch for types that implement Mul with themselves, etc.

  • start..+offset: I really like this. I regularly use ranges of the form start..start + offset, so this is an easier and less error-prone spelling. e.g. bytes[start + 37..+8] is clearly the 8 bytes starting at start + 37, while bytes[start + 37..start + 45] is less obviously an 8-byte value.
  • start..+=offset: I would prefer the =+ spelling. += makes me think that something is getting assigned. With =+ it is clearer that this is an inclusive range ..= combined with the +offset above.

The RFC should also mention the existing alternative of using two open ranges e.g. bytes[start + 37..][..8]. This is currently the clearest way to spell (start, length) ranges and making the argument that ..+ is even nicer is important.

To be clear, it’s the nicest way to spell (start, length) slices, doesn’t help with ranges.


The problem with "extend" etc is that they read poorly when the "extension" is negative. Also, sometimes you might want to extend the start instead. If you already have a start and end, relative length changes are already ergonomic with normal +/- operators (start..end+delta etc.)

One option could be a builder-pattern-style syntax, where you could write e.g. start(a).len(b), where start returns a T.. and len then bounds it. This would also symmetrically allow end(a).len(b). A downside would be a syntax different from, and partly redundant with, the existing dotdot constructors.
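
A minimal sketch of that builder idea, restricted to usize. For simplicity it uses two small wrapper types (Start and End, both hypothetical) rather than having start return a T.. directly.

```rust
use std::ops::Range;

/// Hypothetical builder halves: a fixed start or a fixed end, waiting for a length.
struct Start(usize);
struct End(usize);

fn start(a: usize) -> Start {
    Start(a)
}

fn end(a: usize) -> End {
    End(a)
}

impl Start {
    /// Bounds the range to `n` elements going forward from the start.
    fn len(self, n: usize) -> Range<usize> {
        self.0..self.0 + n
    }
}

impl End {
    /// Bounds the range to `n` elements going backward from the end.
    /// Underflows (panicking in debug builds) if `n` exceeds the end.
    fn len(self, n: usize) -> Range<usize> {
        self.0 - n..self.0
    }
}

fn main() {
    assert_eq!(start(10).len(5), 10..15);
    assert_eq!(end(10).len(5), 5..10);
}
```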

Well, I don't think that's necessarily clear, if you're used to programming languages that have a unary + operator. Sure, Rust doesn't, but start + 37..+8 and start - 37..-8 being completely different feels bad.

The best way to be "clearly 8 bytes" is to use bytes[start + 37..][..8], because [..8] will always be clearer for being 8 elements than anything else.

Perhaps I phrased it badly, but my use of the term "clearly" was comparing ..+8 to ..45 where the writer wants to emphasize the length of the range.

I agree that ..+offset could be confused with ..offset with a unary plus. Rust is asymmetric with unary + and - so this seems consistent within Rust. Anyone familiar with unary + from other languages should wonder why a (typically no-op) unary plus is used.

This has been said repeatedly in this thread but ranges have uses other than slicing.

Maybe then

let range = (start..)[..len]

should be a range? (With the caveat again that the range types would first have to be changed to be IntoIterator rather than Iterator.)

Also iff we do this +1 on the =+len syntax.

I don't think that's possible since Index::index must return a reference.

Hm. Currently there are a couple gotchas in redefining, but potentially this could be allowed:

let range = (start..)..len;

(I do not endorse the above)

On topic, I think this would work as a method regardless of whether or not new syntax gets added for it. If that method sees tons of use in the wild, then promoting it to syntax would have a very real motivation. Could we also include a similar method for the lower bound, such as (..10).with_lower_offset(5) ≡ 5..10? Or is that sign inversion (necessary for unsigned ranges) a little too crazy?
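
For what it's worth, a lower-bound variant can be sketched as an extension trait on RangeTo<usize>. The trait name and the Option return type are assumptions; returning None when the offset exceeds the end is one possible way to handle the unsigned case.

```rust
use std::ops::{Range, RangeTo};

/// Hypothetical extension trait (the name is an assumption).
trait WithLowerOffset {
    fn with_lower_offset(self, offset: usize) -> Option<Range<usize>>;
}

impl WithLowerOffset for RangeTo<usize> {
    fn with_lower_offset(self, offset: usize) -> Option<Range<usize>> {
        // The "sign inversion": the start is end - offset, which can underflow
        // for unsigned types, so use a checked subtraction.
        let start = self.end.checked_sub(offset)?;
        Some(start..self.end)
    }
}

fn main() {
    assert_eq!((..10).with_lower_offset(5), Some(5..10));
    assert_eq!((..3).with_lower_offset(5), None);
}
```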