Pre-RFC: Lazy Drain and Self-Exhausting Iterators

Our current drain and drain_filter APIs are partially eager in that they will self-exhaust themselves on drop. This forces two orthogonal behaviours together:

  1. Pick some elements out of a collection
  2. Clear all elements, that match a predicate / are inside a range or the entire collection.

It’s identical in behaviour to a collection.drain_nonexhausting().exhausting() iterator where the Exhausting adapter drives the iter to completion.

This is a proposal to split this functionality up into two APIs, a non-self-exhausting drain and an iterator adapter for self-exhausting iters. Of course, the current drains will stay as-is for backwards compatibility.

RFC Text
Pre-RFC Text: Nonexhausting Drain
Pre-RFC Text: Exhausting Iter
RFC: Non-selfexhausting Drain
RFC: Self-exhausting Iter Adapter

I’d be happy to hear comments and suggestions for improvement. If you dislike either the self-exhausting iterators or the the lazy drain, please bear in mind that they are functionally independent from each other. I’ve put them together here because they are related and I would like to have one place for discussion about their interactions.

Instead of .drain_nonexhausting(), would it be possible to have a fn but_do_not_remove_the_rest_of_them(self) on the Drain types? (With a much better name, of course.) It might be nice to not have to mostly-duplicate the API…

Probably. All of them have access to the underlying collection anyway so they can carry out the repair. Therefore you could definitely construct a nonexhausting drain from an exhausting one. In theory, you could have different behaviour during iteration for a lazy drain and and an eager one. The nonexhausting drain may want to make rebuilding of the collection’s internal structure cheaper with some on the fly changes. But if the lazy drain must be buildable from the eager one at any point, then that limits those differences. I don’t know if that could be useful for the map collections. It’s certainly not the case for Vec, VecDeque and BinaryHeap.

Note that the Drain iterator is eager to guarantee memory safety, The actual “exhaustion” does not take place in Drain::drop(), but even before iteration starts. (see here) This is because leaking variables is considered safe, so types must not rely on their destructor to ensure memory safety.

That means a non-exhausting variant would have to find a way to repair its backing storage that doesn’t rely on drop() being called.

That is leak amplificatin in case the Drain is leaked. A lazy drain needs to do the same thing, but the self-exhaustion on drop is a different matter and not necessary for safety.

So what exactly would the non-exhausting version of Drain do with its remaining items on drop? As I understand things, they cannot remain inside the backing collection due to leak amplification. Would they simply get leaked?

During iteration the collection is in an invalid state. Leak amplification makes that unobservable in case the Drain is leaked and the repair code isn’t run. There is nothing more to it.

Take for example a drain(start..end) on Vec. When you construct it, everything in vec[start..] is forgotten (no destructor will be run) and the vec is truncated. As you iterate over the items, you get them back out one by one. After iterating (including the iterating inside drop) The remaining elements are shifted into the now empty range and the Vec's len is set to the correct value again. You “unleak” the remaining elements. It doesn’t matter where you stop, you just have to fill the hole.

How you repair the structure differs from collection to collection, but bear in mind that .drain_filter() exists and what you can do with that already.

1 Like

That's not quite right. The leak amplification happens there, but the actual exhaustion is definitely in Drop: https://doc.rust-lang.org/src/alloc/vec.rs.html#2488

This ought to be a pretty easy method to add. As a sketch, I think it'd be something like this:

impl<'a, T> Drain<'a, T> {
    fn keep_remainder(self) { 
        // Move the yet-to-be-iterated range into the "tail to keep"
        self.tail_start -= self.iter.len();
        self.tail_len += self.iter.len();
        // Make sure dropping ourself doesn't re-iterate those
        self.iter = [].iter();
    }
}
1 Like

Alright, thanks for your patient explanations, @scottmcm and @Emerentius. I’ve got a better understanding of how Drain works and what motivates this pre-RFC now. :smiley:

Do you think you could prepend this background knowledge to the Motivation section of the RFC? It’d probably help other people who’re not very familiar with the technicalities of drain() judge the suggested additions better.

Now understanding the issue better, I’ve read through the RFC text once more. The addition of the non-exhausting methods seems fine, but I do get the strong feeling that leaving .exhausting() out might be a good idea. All drawbacks and unresolved questions are concerned with it and there doesn’t seem to be a particularly strong motivation for it beyond “for completeness’ sake”. (And, as you’ve rightly said, “there is no interdependency” between it and drain_nonexhausting()). If .exhausting() turns out to be desirable, it can always be suggested at a later time in a separate RFC.

A related PR about exhaustion methods; the response to this PR may be interesting to you:

It’s not the first time exhaust() was proposed and its sole use is always just in less writing (clarity). Prior to for_each I could understand the appeal but now it’s just an alias for for_each(drop).

Exhausting’s raison d’être is the delayed self-exhaustion so the iterator remain usable but is guaranteed to be emptied (unless leaked). There is no other adapter that can emulate this.

It would act as .exhaust() if you immediately drop it, though I don’t count that as a good use.

Unless there are performance differences between drain and drain_nonexhausting, we could use Exhausting to basically define Drain internally as Exhausting(DrainNonexhausting)

Considering its niche use case, would it make sense to add .exhausting() to itertools instead of the standard library?

The RFC mentions that. What I’d like to clarify first is how we should handle a few subtle edge cases. Mainly, that is how we handle panic-ing iterators. Panicking is safe and leaking is safe but we still don’t want to leak and abort willy-nilly. Currently, all drains embed the logic of Exhausting in its simplest form, just for _ in &mut self on drop.

DrainFilter executes user code during iteration and can therefore easily cause a double panic and thus an abort if the iteration panicked first outside of Drop and then again inside Drop. Long running programs, e.g. a server, can employ threads and catching unwinds to guard against program bugs from tearing down the process. A panic during panic goes past that and will stop your entire program.

In another case of a single panic during drop, DrainFilter will leak all remaining items. An Exhausting(DrainFilterNonexhausting) will stop the exhaustion but then repair the collection and allow the remainder to be dropped normally. This difference will be there regardless of how you handle panics in Exhausting.

We can put exhausting() in itertools but then std will either not deal with these edge cases or it will have to reimplement the behavior in a private interface.

I’ve split up the RFC in two:

Nonexhausting Drain
Exhausting Iter

And I’m probably going to post them as such. Got any more comments to spare? Style-wise and grammar, too.

Nope, they look pretty good to me! :+1:

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.