Pre-RFC: Range trait (For the drain-range feature)

bluss · August 5, 2015, 11:29pm

I’m working on a draft RFC to finish up .drain() and .drain(a..b) for collections so that we can get them stabilized.

I’m having trouble with the range trait, currently called RangeArgument. How should it be designed to best support index ranges; including inclusive ranges in the future?

One headache is that inclusive ranges may actually overflow if you try to convert them to a half-open range with both upper and lower bound, so not even that is a good common representation (even if that’s the only thing the Vec really works with).

I’d be happy to see ideas how it should be implemented. The draft drain rfc is below, including a sketch Range trait.

Right now I’m leaning towards just the basic Range trait, but that will lead to duplication of logic every place where it is used.

Summary

Implement .drain(range) and .drain() respectively as appropriate on collections.

Motivation

The drain methods and their draining iterators serve to mass remove elements from a collection, receieving them by value in an iterator, while the collection keeps its allocation intact (if applicable).

The range parameterized variants of drain are a generalization of drain, to affect just a subrange of the collection, for example removing just an index range from a vector.

drain thus serves both to consume all or some elements from a collection without consuming the collection itself. The ranged drain allows bulk removal of elements, more efficently than any other safe API.

Detailed design

Implement .drain(a..b) where a and b are indices or keys, for all collections that have meaningful order.
Implement .drain() for other collections. This is just like .drain(..) would be (drain the whole collection).
Ranged drain accepts all range types, currently …, a…, …b, a…b, and drain will accept inclusive end ranges if they are implemented.
Drain removes every element in the range.
Drain returns a lazy iterator that produces the removed items in the range, by value.
Drain removes the whole range, regardless if you iterate the draining iterator or not.

`RangeArgument` trait

Currently called collections::range::RangeArgument, to be called IndexRange.

Use case:

impl Vec<T> {
    fn drain<R>(&mut self, range: R) -> Drain<T> where R: IndexRange
}

/// Implemented by Range, RangeFrom, RangeTo, RangeFull and
/// future inclusive range types
pub trait Range<T> {
    fn start(&self) -> Option<&T>;
    fn end(&self) -> Option<&T>;
    fn includes_start(&self) -> bool;
    fn includes_end(&self) -> bool;
}

/// Implemented by ranges with usize index
pub trait IndexRange : Range<usize> {
    /// Start value (exclusive)
    ///
    /// Return if start value is present, else `None`.
    fn exclusive_start(&self) -> Option<usize>;

    /// Start value (inclusive)
    ///
    /// Return if start value is present, else `None`.
    fn inclusive_start(&self) -> Option<usize>;

    /// End value (exclusive)
    ///
    /// Return if end value is present, else `None`.
    fn exclusive_end(&self) -> Option<usize>;

    /// End value (inclusive)

    /// Return if end value is present, else `None`.
    fn inclusive_end(&self) -> Option<usize>;
}

stebalien · August 6, 2015, 12:03am

*caugh* *caugh*

Specifically:

pub enum Bound<Idx> {
    Inclusive(Idx),
    Exclusive(Idx),
    Unbounded,
}
pub trait Range<Idx> {
    fn into_bounds(self) -> (Bound<Idx>, Bound<Idx>);
    fn bounds(&self) -> (Bound<&Idx>, Bound<&Idx>);
}

Pros: The user only has to implement two methods and it's easier to match on the range.

mdinger · August 6, 2015, 6:00am

s/receieving/receiving/

bluss · August 6, 2015, 10:38am

This is nice, but still, it’s quite a process to use this to extract the two indices that for example the drain method needs.

stebalien · August 6, 2015, 6:46pm

Regardless of how this is defined, I think we should define:

pub trait IndexRange : Range<usize> {
    /// Return inclusive start, and end bounds or None if the range is empty.
    /// Note 1: If the inclusive start > the inclusive end, the range is considered empty.
    /// Note 2: Unbounded range endpoints return 0 and usize::MAX for start and end respectively.
    fn normalize(&self) -> Option<(usize, usize)> { /* default impl */ }
}

Also, how good is LLVM at optimizing out functions that return constants? Should we wait for https://github.com/rust-lang/rfcs/pull/1062 and use associated constants for inclusive/exclusive/unbounded?

glaebhoerl · August 8, 2015, 9:43pm

I did some brainstorming on this here. Copy-pasting:

RangeArgument trait brainstorming:
trait IsRange {
     type Idx;
     fn as_range(&self)         -> Range<Option<&Idx>>;
     fn as_mut_range(&mut self) -> Range<Option<&mut Idx>>;
     fn into_range(self)        -> Range<Option<Idx>>;
}
I feel like an associated type is more appropriate here? The existing range types all have an index type that's uniquely determined by the type of the range.

Instead of a tuple or separate getters, we can use Range itself. I find this appealing aesthetically, though I'm not sure about the ergonomics. This could potentially have positive interactions if we were to give the Range types fleshed-out APIs with inherent impls (e.g. range intersection, smallest-containing-range, ...), as befits their status as built-in types.

I'm not sure if there's a way to abstract the three very similar methods into just one, or if not, whether the single trait should perhaps be decomposed into a hierarchy (a la the other Foo, FooMut traits). (If the only implementors are going to be the built-in range types, perhaps not.)

I don't like the name RangeArgument. Passing any range type as an argument is a particular use case, but not what the trait is. I'd most like to call it simply Range, but the conflict with struct Range would probably be too annoying. (I don't really like IsRange either - it's just a bad compromise after I couldn't think of anything better.)

Can we just use convert::{As, AsMut, Into} instead? (Maybe if we had trait aliases to make it ergonomic...?)

This doesn't address inclusive ranges, but perhaps that's just a matter of replacing Option in the above with @stebalien's Bound enum, which seems like a clean solution.

bluss · August 8, 2015, 10:09pm

Thanks! I'll insert some quick comments since I've touched upon some of these issues in my planning.

Associated type does not work for RangeFull (does not compile — the range type does not uniquely identify an index type).

Maybe we can use IndexRange if the trait narrows in focus. But trait Range<T> would be the best for the general version.

glaebhoerl · August 9, 2015, 11:11am

OIC. I didn’t even realize RangeFull isn’t generic. Was there any discussion back when it was added about whether it should be or did it just happen this way?

bluss · August 9, 2015, 11:14am

This is about what I want to do, however I'm looking for the "perfect" solution. Criticisms of this sketch:

Returning 0 or usize::MAX unconditionally is not useful for .drain or for slicing.
Returning None for empty range is not enough, the position is significant. 10..10 might be in bounds and 32..32 is not.
Inclusive upper bound has imperfection: We need +1 that for the correct pointer offset to the end of the iterator or slice, so we need to handle overflow.

What I want is something like this:

/// Return endpoints for a half open range suitable for indexing a vector
/// of  `length`, or `None` if out of bounds
fn into_index_range(self, length: usize) -> Option<(usize, usize)>;

The problem with this is that we are going to trust bounds checking to the trait impls, then they need to be unsafe. So it's unbeautiful. But this would allow us to treat the inclusive upper bound ranges right without overflow trouble, and without transferring the overflow problem to the consumer side and to code that mostly deals with half open ranges.

bluss · August 9, 2015, 11:16am

I will probably disappoint you, but my goal is the simplest possible trait to complete the drain feature

bluss · August 9, 2015, 1:07pm

I tried this out with arrayvec’s .drain() (code), and it seems to fit the requirements. The downside is as noted above, the unsafe impl required to trust the bounds check.

In a way it comes down to checking end <= len for half open ranges and end < len for closed ranges, and it’s better to do this up front instead of trying to apply arithmetic (that will overflow etc) before you do the comparison.


use std::ops::{
    RangeFull,
    RangeFrom,
    RangeTo,
    Range,
};

/// For illustration, add an inclusive range too
#[derive(Copy, Clone)]
pub struct InclusiveRange<T> {
    pub start: T,
    pub end: T,
}

pub trait IntoIndexRange {
    fn into_index_range(self, len: usize) -> Range<usize>;
}

pub unsafe trait IntoCheckedRange {
    fn into_checked_range(self, len: usize) -> Option<Range<usize>>;
}


impl IntoIndexRange for RangeFull {
    #[inline]
    fn into_index_range(self, len: usize) -> Range<usize> {
        0..len
    }
}

impl IntoIndexRange for RangeFrom<usize> {
    #[inline]
    fn into_index_range(self, len: usize) -> Range<usize> {
        self.start..len
    }
}

impl IntoIndexRange for RangeTo<usize> {
    #[inline]
    fn into_index_range(self, _len: usize) -> Range<usize> {
        0..self.end
    }
}

impl IntoIndexRange for Range<usize> {
    #[inline]
    fn into_index_range(self, _len: usize) -> Range<usize> {
        self
    }
}

impl IntoIndexRange for InclusiveRange<usize> {
    #[inline]
    fn into_index_range(self, _len: usize) -> Range<usize> {
        // this doesn't work so well!
        self.start..self.end.saturating_add(1)
    }
}


unsafe impl IntoCheckedRange for RangeFull {
    #[inline]
    fn into_checked_range(self, len: usize) -> Option<Range<usize>> {
        Some(0..len)
    }
}

unsafe impl IntoCheckedRange for RangeFrom<usize> {
    #[inline]
    fn into_checked_range(self, len: usize) -> Option<Range<usize>> {
        if self.start <= len {
            Some(self.into_index_range(len))
        } else { None }
    }
}

unsafe impl IntoCheckedRange for RangeTo<usize> {
    #[inline]
    fn into_checked_range(self, len: usize) -> Option<Range<usize>> {
        if self.end <= len {
            Some(self.into_index_range(len))
        } else { None }
    }
}

unsafe impl IntoCheckedRange for Range<usize> {
    #[inline]
    fn into_checked_range(self, len: usize) -> Option<Range<usize>> {
        if self.start <= self.end && self.end <= len {
            Some(self.into_index_range(len))
        } else { None }
    }
}

unsafe impl IntoCheckedRange for InclusiveRange<usize> {
    #[inline]
    fn into_checked_range(self, len: usize) -> Option<Range<usize>> {
        if self.start <= self.end && self.end < len {
            Some(self.start..self.end + 1)
        } else { None }
    }
}

bluss · August 14, 2015, 8:49pm

Drain-range RFC posted! Please help me by discussing it there.

Thanks for the comments here.

bluss · March 25, 2019, 8:24am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Pre-RFC: `{BTreeMap, BTreeSet}::drain` with a range libs	17	1851	April 8, 2022
Drain API Design libs	4	946	March 25, 2019
[crazy] Replace Most of the Collections API with ranges and iterators libs	12	4526	March 25, 2019
BTreeMap::drain with a range libs	5	1414	April 8, 2021
Pre-RFC: Lazy Drain and Self-Exhausting Iterators libs	15	1608	March 25, 2019

Pre-RFC: Range trait (For the drain-range feature)

Summary

Motivation

Detailed design

RangeArgument trait

Related topics

`RangeArgument` trait