Iterator methods count and last

ssomers · December 12, 2021, 2:58pm

The description of Iterator::last says it “will evaluate the iterator until it returns None”. Many (double-ended) implementations don't oblige but directly return the last element, I guess because they know evaluating that particular iterator has no easily observable side effect and to avoid the side effect of wasting computing power. But why should I guess? Shouldn't the description mention the likely possibility that the function describing itself as O(n) is O(1) instead?

The description of Iterator::count says the same thing about evaluating, but few iterators override their count method, even though many could, being ExactSizeIterator. Are count() and len() not always equal? Why does ExactSizeIterator not equate count to len? The ones I found that do define count apply to slicy things including Vec and Vec::IntoIter which only defines count but not len and apparently relies on the default len implementation that uses size_hint and an assertion.

steffahn · December 12, 2021, 3:20pm

count also consumes the whole iterator (evaluating all its items if it's lazy), len doesn't.

Iterators on slices can optimize this without changing behavior, because evaluating and dropping all the items doesn't have any side-effects.

Intuitively I'd say that the properties that an iterator must have in order to be able to optimize count vs. being able to optimize last are quite similar. Your post seems to suggest that many iterators have a custom last while few iterators have a custom count. Perhaps you could list some examples of iterators that optimize last but not count, so we could discuss whether a better count implementation might be missing?

ssomers · December 12, 2021, 4:33pm

Maybe "equate" is not the right word, but surely count can always return the value of returned by len, in ExactSizeIterator? Just like last does next_back and also consumes, when there is no side effect.

Now you mention dropping… that seems like an observable side effect. Doesn't the description tell me that into_iter().rev().last() will drop all elements in reverse order, and if last() shortcuts, it would somehow drop all other elements in batch, probably in an unexpected order. But that doesn't happen because these IntoIter don't redefine last.

Intuitively I'd say that the properties that an iterator must have in order to be able to optimize count vs. being able to optimize len

Did you mean last instead of len?

The iterators I looked in vain for a count are the "ordinary" btree iterators (Iter and IterMut), same in its alloc cousins LinkedList and VecDeque, and the distant cousin HashSet in std.

steffahn · December 12, 2021, 4:40pm

My main point was that a ExactSizeIterator can still have side effects, you cannot generally let count call (or the the same as) len on a ExactSizeIterator. Your statement seemed to be asking in general, for all ExactSizeIterator implementation, why the two aren't "equated".

I'm not sure what your conclusion is here. Looks like everything works as expected and last doesn't shortcut in this case?

Ah, yes indeed!

steffahn · December 12, 2021, 4:43pm

Looks like those would indeed benefit from a custom count implementation. Feel free to open an issue and/or a PR, I guess?

ssomers · December 12, 2021, 6:11pm

I still struggle with that. If count were defined generally by ExactSizeIterator, it would call len, drop the iterator, then return the result. For an iterator with side effects, such as an IntoIter dropping elements from the collection, this would still perform the side effects in the same order as announced by the iterator, right? No, I think that's where I'm wrong. Since an iterator is supposed to be lazy, it shouldn't be iterating when it's dropped, so it doesn't promise to perform the rest of the side effects in any particular order. The caller of count doesn't just want a number, in fact may discard that number, they want to see side effects performed in the iterator's promised order.

Yes, it works fine. My conclusion was that this is because these IntoIter don't redefine last, but that is wrong. The last here is the default implementation of Iterator because iterator adapter Rev doesn't redefine it (and couldn't).

In fact, I now believe it would be fine to redefine last in pretty much any double-ended IntoIter because it would perform next_back, drop the rest of the collection, which generally happens in the forward direction anyway, and return the last element. I don't suggest there would be any benefit to that.

steffahn · December 12, 2021, 6:24pm

Or at all, for that matter. Dropping an iterator.map(|x| foo(x)) will just not call foo on the remaining elements at all.

Yes, that's true.

That's the case, and there's no way for an iterator to optimize this, e.g. there's no last_back or whatever it would need to be called on DoubleEndedIterator that could be overwritten.

No that would have the same problem with the difference between just dropping an iterator vs. consuming all of its items. For e.g. a iterator.map(|x| foo(x)), this approach would result in different behavior, because next_back doesn't evaluate foo on all the other items, and neither does dropping the iterator. (By the way, the question how good/bad it is that Map<_> iterators can/will change the order of side-effects when you use their DoubleEndedIterator implementation is a separate discussion.)

ssomers · December 12, 2021, 7:57pm

Well I contest that, for the typical IntoIter of a collection.

…it wouldn't matter because that Map adapter doesn't define last and never sees the last of the underlying IntoIter. Thus collection.into_iter().map(|x| foo(x).last() doesn't cut anything short.

collection.into_iter().last() would start cutting short, but a typical IntoIter knows that its drop handler will drop elements in the same forward order as the current collection.into_iter().last() does.

By the same reasoning, on such an IntoIter, defining count to return len would also not change anything observable. It wouldn't help anyone, nor hurt.

scottmcm · December 12, 2021, 8:07pm

Note that there's no observable side-effect from iterating, say, a Range<i32> or slice::Iter<'_, _>. That's why it's ok for them to do this.

Well, O(1) is O(n), but that's a nit.

More importantly, I don't know of any place where you should ever call .last() on an iterator expecting it to be O(1) -- you should use .next_back() instead in those situations.

steffahn · December 12, 2021, 8:39pm

Oh, I totally misinterpreted your sentence. You didn't say "pretty much any double-ended iterator", you were talking about iterators called "IntoIter":

The rest of my explanation, in particular the example with iterator.map(|x| foo(x)) assumed you said something that you didn't say.

ssomers · December 16, 2021, 10:56am

There we go: Specialize count and len in ExactSizeIterator implementations by ssomers · Pull Request #91998 · rust-lang/rust · GitHub

system · March 16, 2022, 10:56am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
`ExactSizeIterator` and combinators libs	5	550	September 24, 2022
[Pre-RFC] Change the ExactSizeIterator spec ideas (deprecated)	6	1528	March 25, 2019
Should we implement ExactSizeIterator for Chain? libs	10	2275	September 17, 2020
`ExactSizeIterator` for `slice::Iter`? libs	7	221	September 4, 2024
The Trusted Iterator Length Problem	29	7279	March 25, 2019

Iterator methods count and last

Related topics