Alternatives
I’ve experimented a bit with drain APIs, in particular with non-self-exhausting ones. A thing that I’ve initially overlooked is that the side effect of removal on next()
limits a few iterator adapters.
Simply giving Drain
and DrainFilter
a non-exhausting mode could be useful but will not interact well with the following 3 adapters.
take_while
will take 1 more element past the end
skip_while
without removal isn’t possible
peekable
removes a peeked, but unconsumed element unnecessarily
It should be relatively straightforward to add a non-destructive peek()
to the drain structs so long as calling a drain_filter
's closure twice isn’t a problem (once on peek()
, once on next()
). Peeking together with manual iteration can also fills the role of take_while()
although unergonomically.
skip_while()
is still left out.
I’m going to describe what I’ve tried to deal with these shortcomings. As of now, I wouldn’t propose any design for addition to the std lib.
More flags
One straightforward approach is to use the same design as drain_filter
but extend it to let the closure communicate more things
than “take” and “keep” by letting the predicate closure return a type other than bool, namely something like:
// one possible return type
// remove and yield current element on Ok(true) | Err(true)
// stop on Err
Result<bool, bool>
// another possibility
enum Flag {
// these 4 really need to be possible
Take, // Yield
Keep, // Continue
Stop, // Break
TakeAndStop, // Return
// bonus
Delete, // drop in place & continue
Advance(usize), // nth
}
As the comments show, this duplicates a lot of imperative control structures and it’s not very ergonomic to write if *element > 10 { Flag::Take } else { Flak::Keep }
instead of *element > 10
. One could accept any T: Into<Flag>
as output of the closure and implement From<bool> for Flag
to map true
to Take
and false
to Keep
. That would allow the basic drain_filter
behaviour to work as is with the generalized drain, but it’s still not very elegant.
Control inversal via proxy elements
So, why not use all these imperative control structures that we already have and just give the control to the consumer?
We can hand out a proxy object to the drain on each call to next()
.
// without lifetimes
struct Element<T>(&mut NewDrain<T>);
// the real deal
struct Element<'el, 'a: 'el, T: 'a>(&'el mut NewDrain<'a, T>);
The minimum necessary API is just 2 functions:
impl<'el, 'a: 'el, T: 'a> Element<'el, 'a, T> {
fn get_mut(&mut self) -> &mut T;
fn take(self) -> T;
}
It can deref to &mut T
of the current element for convenience. take()
reads out T
and updates the drain’s state. A proxy object can also allow much more control over the iterator through the backreference such as manual advancing, in place deletion, or peeking back or ahead arbitrarily far. If the Element
isn’t consumed, its Drop
will do the work required to uphold the invariants (such as contiguousness of the hole of deleted items in a vec’s drain).
Observant readers have already noticed that this API isn’t possible right now. The Element
is invalidated on the next call to next()
. This API requires streaming iterators and therefore Generic Associated Types which are still a long way off. Furthermore, the Element
's Drop
seems to hinder optimization very badly. In some benches I’ve done this had ~60-70% of the speed of regular drain on small objects (perf bug?).
It can be improved by replacing the Drop
impl by what amounts to a manual drop flag inside the Drain
, but it’s still considerably slower.
Proxy + Closure
The proxy object can be combined with a closure like with DrainFilter
to circumvent the problem of the lifetime and the Drop
impl, but you either encounter the same unsightly Flag
enum of the previous approach or you’re basically driving the iteration explicitly from inside the closure. In the latter case, the proxy object doesn’t buy you much.
Manual iteration
If we abandon the Iterator
protocol and allow driving the drain manually, we can avoid the performance problems with Drop
for the proxy element issue but still get all the flexibility and control over draining. I’ll call this manual iterator a Cursor
hereafter. The API could look like this:
pub struct Cursor<'a, T: 'a> {...}
impl<'a, T: 'a> Cursor<'a, T> {
/// keep the current element and advance to the next
/// does nothing if cursor is already at the end
pub fn advance(&mut self);
/// Take a reference to the current element
/// Returns `None` if cursor is exhausted
/// Doubles as peek()
pub fn get_mut(&mut self) -> Option<&mut T>;
/// Derefs to &mut T and can remove the element. Avoids double boundary checks
pub fn get_proxy<'b>(&'b mut self) -> Option<Element<'b, 'a, T>>;
/// Remove current element and advance to next
/// Returns `None` if cursor is exhausted
pub fn take(&mut self) -> Option<T>;
/// Take a reference to all elements that are still in the collection
/// The first slice is to elements that were iterated over and kept,
/// the second slice is to the elements that are yet to be iterated over including the current element
pub fn get_slices(&mut self) -> (&mut [T], &mut [T]);
The Iterator
trait is very useful though and it’s not hard to make an iterator out of the cursor.
/// Create an iterator out of the cursor that will yield what the given closure returns
/// Stops on None
///
/// Alternatively, a (hypothetical) std::iter::try_repeat_with() would do
pub fn iter<F>(self, f: F) -> CursorIter<'a, T, F>
where
F: FnMut(&mut Cursor<T>) -> Option<T>;
The cursor gives you full control and can be performant but just like with the proxy + closure approach, this is not particularly ergonomic. For example, for-loop over drain_filter:
while let Some(el) = cursor.get_proxy()
.filter(|el| *el < 10)
.map(|el| el.take())
{
/* do stuff */
}
Source code
The source code with which I’ve been experimenting is available here:
Drain experiments
It contains vec drains for each of the above points (3 with Flags) and benches for draining from one vec into another and clearing a vec via drain. Don’t expect clean code.
Conclusion
All of the above designs allow for non-destructive take_while()
and skip_while()
. Peeking can be made possible but it’s not so straightforward to get it to behave like it would for drain_filter
because skipping the elements that don’t fit the predicate needs to be done by hand.
My favorite approach so far would be the streaming iterator, possibly in combination with some cursor like methods on the iter itself. It has the minimum viable API of the three. But it’s blocked on GAT and also requires fixing the performance problems with Drop
proxy objects. In the future, it could be extended to also add elements during iteration (depending on the collection). That would be very cheap e.g. for LinkedLists or Vecs if the hole of removed elements is filled.