Drain API Design


#1

From the tracking issue for drain_filter:

We’d like to see experimentation to attempt to generalize into a smaller API surface various combinations of draining:

  • A sub-range (through RangeArgument a.k.a. RangeBounds) v.s. the entire collection (though the latter could be achieved by passing .., a value of type RangeFull).
  • Draining everything (possibly within that range) v.s. only elements that match a boolean predicate
  • Self-exhausting on drop v.s. not (leaving the rest of the elements in the collection).

Possibilities might include “overloading” a method by making it generic, or a builder pattern.

One constraint is that the drain method is stable. It can possibly be generalized, but only in backward-compatible ways.

So that’s what this thread is for. Discussion of possible APIs.


#2

Builders

A builder approach would be nicely extensible in the future. The existing Drain is already an Iterator and is therefore not a good base for a builder. It could already be partially iterated. If we limit ourselves to builders that can never fail to be converted to iters, implementing IntorIter would help their ergonomics. An explicit into_iter() call will still be necessary before adapters can be chained onto them.

With the obvious drain() name unavailable, this can get verbose. E.g. drain_filter:

collection.drain_builder() // drain is not a good builder base
    .filter(f)             // where is a keyword
                           // not the same as Iterator::filter
    .into_iter()           // necessary for iterator adapters
    .chain(..)

The future proofing is nice but the ergonomics are rather bad. If we were to add this, I would still advocate for a shorthand drain_filter.


#3

Alternatives

I’ve experimented a bit with drain APIs, in particular with non-self-exhausting ones. A thing that I’ve initially overlooked is that the side effect of removal on next() limits a few iterator adapters.

Simply giving Drain and DrainFilter a non-exhausting mode could be useful but will not interact well with the following 3 adapters.

take_while will take 1 more element past the end
skip_while without removal isn’t possible
peekable removes a peeked, but unconsumed element unnecessarily

It should be relatively straightforward to add a non-destructive peek() to the drain structs so long as calling a drain_filter's closure twice isn’t a problem (once on peek(), once on next()). Peeking together with manual iteration can also fills the role of take_while() although unergonomically. skip_while() is still left out.

I’m going to describe what I’ve tried to deal with these shortcomings. As of now, I wouldn’t propose any design for addition to the std lib.

More flags

One straightforward approach is to use the same design as drain_filter but extend it to let the closure communicate more things than “take” and “keep” by letting the predicate closure return a type other than bool, namely something like:

// one possible return type
// remove and yield current element on Ok(true) | Err(true)
// stop on Err
Result<bool, bool>

// another possibility
enum Flag {
   // these 4 really need to be possible
   Take,        // Yield
   Keep,        // Continue
   Stop,        // Break
   TakeAndStop, // Return

   // bonus
   Delete,      // drop in place & continue
   Advance(usize), // nth
}

As the comments show, this duplicates a lot of imperative control structures and it’s not very ergonomic to write if *element > 10 { Flag::Take } else { Flak::Keep } instead of *element > 10. One could accept any T: Into<Flag> as output of the closure and implement From<bool> for Flag to map true to Take and false to Keep. That would allow the basic drain_filter behaviour to work as is with the generalized drain, but it’s still not very elegant.

Control inversal via proxy elements

So, why not use all these imperative control structures that we already have and just give the control to the consumer? We can hand out a proxy object to the drain on each call to next().

// without lifetimes
struct Element<T>(&mut NewDrain<T>);
// the real deal
struct Element<'el, 'a: 'el, T: 'a>(&'el mut NewDrain<'a, T>);

The minimum necessary API is just 2 functions:

impl<'el, 'a: 'el, T: 'a> Element<'el, 'a, T> {
    fn get_mut(&mut self) -> &mut T;
    fn take(self) -> T;
}

It can deref to &mut T of the current element for convenience. take() reads out T and updates the drain’s state. A proxy object can also allow much more control over the iterator through the backreference such as manual advancing, in place deletion, or peeking back or ahead arbitrarily far. If the Element isn’t consumed, its Drop will do the work required to uphold the invariants (such as contiguousness of the hole of deleted items in a vec’s drain).

Observant readers have already noticed that this API isn’t possible right now. The Element is invalidated on the next call to next(). This API requires streaming iterators and therefore Generic Associated Types which are still a long way off. Furthermore, the Element's Drop seems to hinder optimization very badly. In some benches I’ve done this had ~60-70% of the speed of regular drain on small objects (perf bug?).
It can be improved by replacing the Drop impl by what amounts to a manual drop flag inside the Drain, but it’s still considerably slower.

Proxy + Closure

The proxy object can be combined with a closure like with DrainFilter to circumvent the problem of the lifetime and the Drop impl, but you either encounter the same unsightly Flag enum of the previous approach or you’re basically driving the iteration explicitly from inside the closure. In the latter case, the proxy object doesn’t buy you much.

Manual iteration

If we abandon the Iterator protocol and allow driving the drain manually, we can avoid the performance problems with Drop for the proxy element issue but still get all the flexibility and control over draining. I’ll call this manual iterator a Cursor hereafter. The API could look like this:

pub struct Cursor<'a, T: 'a> {...}

impl<'a, T: 'a> Cursor<'a, T> {
    /// keep the current element and advance to the next
    /// does nothing if cursor is already at the end
    pub fn advance(&mut self);

    /// Take a reference to the current element
    /// Returns `None` if cursor is exhausted
    /// Doubles as peek()
    pub fn get_mut(&mut self) -> Option<&mut T>;

    /// Derefs to &mut T and can remove the element. Avoids double boundary checks
    pub fn get_proxy<'b>(&'b mut self) -> Option<Element<'b, 'a, T>>;

    /// Remove current element and advance to next
    /// Returns `None` if cursor is exhausted
    pub fn take(&mut self) -> Option<T>;

    /// Take a reference to all elements that are still in the collection
    /// The first slice is to elements that were iterated over and kept,
    /// the second slice is to the elements that are yet to be iterated over including the current element
    pub fn get_slices(&mut self) -> (&mut [T], &mut [T]);

The Iterator trait is very useful though and it’s not hard to make an iterator out of the cursor.

    /// Create an iterator out of the cursor that will yield what the given closure returns
    /// Stops on None
    ///
    /// Alternatively, a (hypothetical) std::iter::try_repeat_with() would do
    pub fn iter<F>(self, f: F) -> CursorIter<'a, T, F>
    where
        F: FnMut(&mut Cursor<T>) -> Option<T>;

The cursor gives you full control and can be performant but just like with the proxy + closure approach, this is not particularly ergonomic. For example, for-loop over drain_filter:

while let Some(el) = cursor.get_proxy()
    .filter(|el| *el < 10)
    .map(|el| el.take())
{
    /* do stuff */
}

Source code

The source code with which I’ve been experimenting is available here: Drain experiments

It contains vec drains for each of the above points (3 with Flags) and benches for draining from one vec into another and clearing a vec via drain. Don’t expect clean code.

Conclusion

All of the above designs allow for non-destructive take_while() and skip_while(). Peeking can be made possible but it’s not so straightforward to get it to behave like it would for drain_filter because skipping the elements that don’t fit the predicate needs to be done by hand.

My favorite approach so far would be the streaming iterator, possibly in combination with some cursor like methods on the iter itself. It has the minimum viable API of the three. But it’s blocked on GAT and also requires fixing the performance problems with Drop proxy objects. In the future, it could be extended to also add elements during iteration (depending on the collection). That would be very cheap e.g. for LinkedLists or Vecs if the hole of removed elements is filled.


#4

I’d like to advocate for stabilizing drain_filter now rather than later.

These 2 points from SimonSapin’s post

  • A sub-range (through RangeArgument a.k.a. RangeBounds) v.s. the entire collection (though the latter could be achieved by passing …, a value of type RangeFull).
  • Draining everything (possibly within that range) v.s. only elements that match a boolean predicate

seem to be trivially fulfilled if we just give a RangeBounds parameter to drain_filter, because we already have the stable drain with range parameter.

The 3rd point, self-exhaustion, should not be holding drain_filter back, given that simply removing the self-exhaustion from drain and drain_filter does not allow for fully general non-exhausting iteration anyway because of skip_while, take_while and peek (+ possible new functionality). There are a few unexpected problems and the solution will probably end up with a more complex API than drain_filter.

We should keep our options open and add a note to drain_filter, that it’s unspecified how many elements are removed, if the struct is leaked. We already have such a note for drain.