Pre-RFC: Lazy Drain and Self-Exhausting Iterators

Emerentius · March 13, 2018, 5:24pm

Our current drain and drain_filter APIs are partially eager in that they will self-exhaust themselves on drop. This forces two orthogonal behaviours together:

Pick some elements out of a collection
Clear all elements, that match a predicate / are inside a range or the entire collection.

It’s identical in behaviour to a collection.drain_nonexhausting().exhausting() iterator where the Exhausting adapter drives the iter to completion.

This is a proposal to split this functionality up into two APIs, a non-self-exhausting drain and an iterator adapter for self-exhausting iters. Of course, the current drains will stay as-is for backwards compatibility.

~~RFC Text~~
~~Pre-RFC Text: Nonexhausting Drain~~
~~Pre-RFC Text: Exhausting Iter~~
RFC: Non-selfexhausting Drain
RFC: Self-exhausting Iter Adapter

I’d be happy to hear comments and suggestions for improvement. If you dislike either the self-exhausting iterators or the the lazy drain, please bear in mind that they are functionally independent from each other. I’ve put them together here because they are related and I would like to have one place for discussion about their interactions.

scottmcm · March 13, 2018, 7:46pm

Instead of .drain_nonexhausting(), would it be possible to have a fn but_do_not_remove_the_rest_of_them(self) on the Drain types? (With a much better name, of course.) It might be nice to not have to mostly-duplicate the API…

Emerentius · March 13, 2018, 8:16pm

Probably. All of them have access to the underlying collection anyway so they can carry out the repair. Therefore you could definitely construct a nonexhausting drain from an exhausting one. In theory, you could have different behaviour during iteration for a lazy drain and and an eager one. The nonexhausting drain may want to make rebuilding of the collection’s internal structure cheaper with some on the fly changes. But if the lazy drain must be buildable from the eager one at any point, then that limits those differences. I don’t know if that could be useful for the map collections. It’s certainly not the case for Vec, VecDeque and BinaryHeap.

troiganto · March 14, 2018, 3:03pm

Note that the Drain iterator is eager to guarantee memory safety, The actual “exhaustion” does not take place in Drain::drop(), but even before iteration starts. (see here) This is because leaking variables is considered safe, so types must not rely on their destructor to ensure memory safety.

That means a non-exhausting variant would have to find a way to repair its backing storage that doesn’t rely on drop() being called.

Emerentius · March 14, 2018, 3:19pm

That is leak amplificatin in case the Drain is leaked. A lazy drain needs to do the same thing, but the self-exhaustion on drop is a different matter and not necessary for safety.

troiganto · March 14, 2018, 3:39pm

So what exactly would the non-exhausting version of Drain do with its remaining items on drop? As I understand things, they cannot remain inside the backing collection due to leak amplification. Would they simply get leaked?

Emerentius · March 14, 2018, 5:28pm

During iteration the collection is in an invalid state. Leak amplification makes that unobservable in case the Drain is leaked and the repair code isn’t run. There is nothing more to it.

Take for example a drain(start..end) on Vec. When you construct it, everything in vec[start..] is forgotten (no destructor will be run) and the vec is truncated. As you iterate over the items, you get them back out one by one. After iterating (including the iterating inside drop) The remaining elements are shifted into the now empty range and the Vec's len is set to the correct value again. You “unleak” the remaining elements. It doesn’t matter where you stop, you just have to fill the hole.

How you repair the structure differs from collection to collection, but bear in mind that .drain_filter() exists and what you can do with that already.

scottmcm · March 15, 2018, 2:26am

That's not quite right. The leak amplification happens there, but the actual exhaustion is definitely in Drop: https://doc.rust-lang.org/src/alloc/vec.rs.html#2488

This ought to be a pretty easy method to add. As a sketch, I think it'd be something like this:

impl<'a, T> Drain<'a, T> {
    fn keep_remainder(self) { 
        // Move the yet-to-be-iterated range into the "tail to keep"
        self.tail_start -= self.iter.len();
        self.tail_len += self.iter.len();
        // Make sure dropping ourself doesn't re-iterate those
        self.iter = [].iter();
    }
}

troiganto · March 15, 2018, 1:05pm

Alright, thanks for your patient explanations, @scottmcm and @Emerentius. I’ve got a better understanding of how Drain works and what motivates this pre-RFC now.

Do you think you could prepend this background knowledge to the Motivation section of the RFC? It’d probably help other people who’re not very familiar with the technicalities of drain() judge the suggested additions better.

Now understanding the issue better, I’ve read through the RFC text once more. The addition of the non-exhausting methods seems fine, but I do get the strong feeling that leaving .exhausting() out might be a good idea. All drawbacks and unresolved questions are concerned with it and there doesn’t seem to be a particularly strong motivation for it beyond “for completeness’ sake”. (And, as you’ve rightly said, “there is no interdependency” between it and drain_nonexhausting()). If .exhausting() turns out to be desirable, it can always be suggested at a later time in a separate RFC.

scottmcm · March 15, 2018, 7:46pm

A related PR about exhaustion methods; the response to this PR may be interesting to you:

Emerentius · March 15, 2018, 8:21pm

It’s not the first time exhaust() was proposed and its sole use is always just in less writing (clarity). Prior to for_each I could understand the appeal but now it’s just an alias for for_each(drop).

Exhausting’s raison d’être is the delayed self-exhaustion so the iterator remain usable but is guaranteed to be emptied (unless leaked). There is no other adapter that can emulate this.

It would act as .exhaust() if you immediately drop it, though I don’t count that as a good use.

Unless there are performance differences between drain and drain_nonexhausting, we could use Exhausting to basically define Drain internally as Exhausting(DrainNonexhausting)

troiganto · March 16, 2018, 7:13am

Considering its niche use case, would it make sense to add .exhausting() to itertools instead of the standard library?

Emerentius · March 16, 2018, 12:18pm

The RFC mentions that. What I’d like to clarify first is how we should handle a few subtle edge cases. Mainly, that is how we handle panic-ing iterators. Panicking is safe and leaking is safe but we still don’t want to leak and abort willy-nilly. Currently, all drains embed the logic of Exhausting in its simplest form, just for _ in &mut self on drop.

DrainFilter executes user code during iteration and can therefore easily cause a double panic and thus an abort if the iteration panicked first outside of Drop and then again inside Drop. Long running programs, e.g. a server, can employ threads and catching unwinds to guard against program bugs from tearing down the process. A panic during panic goes past that and will stop your entire program.

In another case of a single panic during drop, DrainFilter will leak all remaining items. An Exhausting(DrainFilterNonexhausting) will stop the exhaustion but then repair the collection and allow the remainder to be dropped normally. This difference will be there regardless of how you handle panics in Exhausting.

We can put exhausting() in itertools but then std will either not deal with these edge cases or it will have to reimplement the behavior in a private interface.

Emerentius · March 18, 2018, 2:50pm

I’ve split up the RFC in two:

Nonexhausting Drain
Exhausting Iter

And I’m probably going to post them as such. Got any more comments to spare? Style-wise and grammar, too.

troiganto · March 18, 2018, 3:42pm

Nope, they look pretty good to me!

system · March 25, 2019, 8:29am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Drain API Design libs	4	946	March 25, 2019
Pre-RFC: Range trait (For the drain-range feature) libs	12	2942	March 25, 2019
[Pre-RFC] Retain Iterators API for HashSet/HashMap/Vec/VecDeque libs	13	1789	March 25, 2019
[crazy] Replace Most of the Collections API with ranges and iterators libs	12	4526	March 25, 2019
Iterators v2 libs	9	1038	May 8, 2021

Pre-RFC: Lazy Drain and Self-Exhausting Iterators

Related topics