Feature request: PureIterator trait

I found out that std::iter::Rev can't implement .last() by simply returning self.iter.next(), because the underlying iterator may have side effects for all the skipped elements (for example, .inspect(…) callbacks that unexpectedly wouldn't run for them).

I'd like to propose a PureIterator (or NoSideEffectsIterator) trait that, like ExactSizeIterator or FusedIterator, lets the rest of the standard library know that it can use certain optimizations that can only be performed on iterators that have no side effects (other than advancing which element they'll produce on the next .next()/.next_back() call).

Iterator adapters that preserve the purity (no side effects) of a pure underlying iterator would also be marked as PureIterators.

This way, things in the standard library like std::iter::Rev can implement specialized optimized versions of methods like .last().


For some adapters that would be easy enough to implement, but to get the maximum benefit out of that we'd need a way to specialize on adapters containing pure closures.

can't implement .last() by returning self.iter.next()

I don't think PureIterator would be sufficient for that because last() normally exhausts an iterator and that can be observed via by_ref() iterators. Just taking a &mut reference to an iterator itself is pure. So it'd require some stronger property than just purity to enable that.

I have had a bunch of similar optimization ideas that have been NACKed by the libs team because they eliminate side-effects.

I think it might be possible to add a defaulted generic parameter to the Iterator trait that signals whether one cares about side effects, and then propagate that parameter into the adapters, which could then be opted into via iter.yolo() (you only iterate once). And if we feel especially daring it could even be made the default at some edition boundary. But it would require quite a bit of plumbing for some niche optimizations, so I'm doubtful whether it would be accepted.

By purity, I'm not excluding changes to which elements in the iterator have been consumed. So, when .by_ref().last() exhausts the underlying iterator's elements, that is a side effect, but of the kind we're disregarding.

I mean something more like .inspect(…) potentially producing side effects for every element in the iterator, which wouldn't allow, for example, skipping to the end element directly for .last().

Instead, PureIterator would apply only to those iterators where skipping a bunch of elements all at once doesn't omit side effects that should have happened.

Disregarding these effects might cause the optimization not to be accepted by reviewers.

So I think PureIterator needs to make additional promises that rule this out, turning it into an UnobservableEffectsIterator.

Yeah, I think the name or concept might need some work to narrow down exactly what it refers to.

I mean something like an iterator that contains items in the same way a Vec would. When you .remove(0) a Vec item, the item itself doesn't get informed, in a manner of speaking, that you removed it so that it can have side effects. .next() on one of these suggested iterators would act just like that.

Similarly, iterating in different orders would not change the expected elements. For example, you wouldn't get a different element by taking the fifth .next() return value versus .nth(4) or using .step_by(4) or the proper number of .next_back() calls or whatever. The Iterator is effectively just like a Vec's iterator. The items don't have side effects simply from being produced by the iterator. The nth item in the iterator will always be the same, no matter how you iterate to it.

It would probably need to be a subtrait of FusedIterator as well.

Yes, but you also get the guarantee that no other element is removed. .clear() also removes the first element, but has other side-effects on the whole Vec.

Likewise, last guarantees to evaluate the whole iterator until it returns None, which is a side effect on the whole iterator.

That's true about those methods. You said 'but', but I'm not seeing how you're disagreeing with what I said.

The basic idea is that the iterators I'm referring to should basically act like a_vec.iter() in several ways that allow for optimizations.

I wasn't at all suggesting that .last() should act in some strange way that doesn't eliminate all the elements from the iterator.

An alternative optimization for .rev().last() is implementing it on top of advance_by(). Since Rev requires ExactSizeIterator one can do

fn last(mut self) -> Option<Self::Item> {
    self.advance_by(self.len().saturating_sub(1))
        .expect("ExactSizeIterator requires the length to be correct");
    self.next()
}

For pure adapters like Copied advance_by is passed through and for slice::Iter it's simple pointer arithmetic so it ends up being O(1). Though there aren't many pure adapters that can do this.

This also composes a bit better than UnobservableEffectsIterator because advance_by can be (and is) implemented for by_ref() because advancing does preserve that one side-effect.

This doesn't work for all collections though. Implementing a best-case O(1) advance_by for BTreeMap iterators would be more difficult.

Rev does not require ExactSizeIterator; it requires DoubleEndedIterator. Also, this is not an optimization when .advance_by(n) requires much more work to advance through the entire iterator than .next_back() needs to advance by one element.

For example, let's say you have a string of words put together without separating spaces. You want to iterate through all separations of the string into dictionary words in lexicographic order.

There are somewhat efficient ways to get the first or the last elements remaining in the iterator. But there's no efficient way to tell how many elements are in the iterator without iterating through it. And even if getting the length were very fast, .advance_by(n) would take much more work than .next_back(), because the solutions come somewhat unpredictably, so .advance_by(n) has to do almost all the work of generating the next n elements just to make sure the iterator's internal state advances by exactly n.

Rev does not require ExactSizeIterator, it requires DoubleEndedIterator.

My bad, I was thinking of Skip.

For example, let's say you have a string of words put together without separating spaces. You want to iterate through all separations of the string into dictionary words in lexicographic order.

Well, that sounds like something a 3rd party crate would be doing. Doing specializations within std is one thing, offering them to 3rd party code would be a bigger deal. There are tons of properties that an iterator can promise and it's not clear which ones we should codify and which not.

I don't think .rev().last() itself is worth the new trait, not even an internal one unless someone can show relevant examples in the wild where this would yield significant performance wins. Or some juicier optimizations. Given the current motivational example I'd just recommend using next_back() instead of last().

Generic code that works for all iterators isn't going to know that you even have a DoubleEndedIterator so that .next_back() is available, and, outside of std, you can't really specialize things based on whether or not a trait is implemented. When such generic code needs to get the last element, it can only use .last() or an equivalent procedure using only methods on Iterator itself.

This isn't just for .last(), though. That was just an example used because iterating through all elements versus one of them is an obviously beneficial optimization.

There have been several proposed optimizations that have been rejected because the relevant kind of side effects must be preserved. If an iterator implements this new trait, though, that would make those optimizations valid, and std's iterator adapters could from then on use the underlying optimized versions of methods.

For example, Rev alone only implements seven Iterator methods. It can't, for example, provide a .min() method that calls the underlying iterator's .min() method, even though, with my earlier example, a lexicographically sorted underlying iterator can have a very optimized .min() method provided.

That can't be done, though, because the potential relevant side effects would happen in the reverse order to what would be expected, and that can't be allowed.

This is essentially throwing away a lot of optimizations built into underlying iterators whenever the iterators are wrapped with Rev or Enumerate or any other iterator adapter in std.

It fails to provide the "zero cost" benefits that Rust promises whenever an iterator adapter from std is used.

It also robs optimizations in the std iterator adapters themselves. For example, iter.enumerate() has a very obvious .min(): whatever .next() returns, because tuples are compared lexicographically and the first item in the tuple is the enumerated index, which is strictly ascending.

It cannot implement this optimization, though, because, again, it might be expected that .min() traverses the whole iterator, and so a programmer might expect it to perform the relevant side effects of every item in the iterator.

If there are no such relevant side effects, then using the optimization will be invisible to the programmer, violating none of their expectations for the behavior of the program.


Those still seem to be minor optimizations to me, especially in generic code that isn't too fussy about taking last() from arbitrary sources for example. That's (hopefully) not the hottest loop in some application. Bigger wins would come from eliding side effects in map or inspect, which can be arbitrarily expensive rather than just stepping along some cursor in a data structure. But we can't skip those until pure closures can be detected, which this trait doesn't enable.

I generally like the idea of having a PureIterator (or similar) trait for optimization purposes. I'm just saying that making it part of the public API will have to meet a higher bar than some internal optimizations.

Some real code samples that would measurably benefit from this could help.

This isn't merely about internal optimizations like .min() using .next() in Enumerate, this is about external optimizations as well based solely on adding some new iterator adapter methods that pass through to the underlying iterator's optimized methods.

The example I gave above about splitting a string without spaces into dictionary words is a crate that I'm working on. I've got Iterator implemented with some nice optimizations of its own in non-.next() iterator methods. I'm planning on implementing DoubleEndedIterator, but I realized that when I optimize .last() to .next_back() after that's done, that nice optimization will only exist if no one applies any std iterator adapters whatsoever, which is nowhere near a guarantee. This can ruin performance for a seemingly innocuous reason that a lot of programmers aren't going to suspect, particularly on very long strings that need to be split into words, which might have a very large number of solutions.

It also allows for external optimizations like noticing that your iterator adapter in your crate adds no new side effects and so you do this:

impl<I> ThisTraitName for IteratorAdapterInThisCrate<I>
where
    I: ThisTraitName,
{}

This will be limited for now, of course, since external iterator adapters cannot specialize based on traits being implemented or not, and so it might be incorrect to have certain passthrough methods on wrapped iterators when you don't know whether they have side effects or not.

I was looking to submit a feature request to the Rust GitHub repository, and I found this request for the exact suggestion I started with. Following a few links through to another issue, optimizations like these (without a trait like this) were discussed favorably in the past, but were ultimately rejected specifically in order to preserve the behavior of side effects:

Ok, thanks for the patience here! The libs team talked about this and the conclusion was that these iterator adaptor methods should always preserve the same semantics as the current default implementations, specifically with respect to side effects that may be seen.

In light of that I'm going to close this as "unfortunately we can't do this", and otherwise I've opened a tracking issue for auditing those changes we've already made in the standard library.

— alexcrichton, issue 28125, 2 October 2015

This would remove that objection because the underlying iterator or iterator adapter author would be effectively certifying that no side effects would be observable.