[API design question]: Why Iterator::nth() consumes the nth element?

Folyd · May 9, 2020, 3:36pm

Below is the source code and documentation of the Iterator::nth() method.

    #[inline]
    #[stable(feature = "rust1", since = "1.0.0")]
    fn nth(&mut self, mut n: usize) -> Option<Self::Item> {
        for x in self {
            if n == 0 {
                return Some(x);
            }
            n -= 1;
        }
        None
    }

Returns the n th element of the iterator.

Like most indexing operations, the count starts from zero, so nth(0) returns the first value, nth(1) the second, and so on.

Note that all preceding elements, as well as the returned element, will be consumed from the iterator. That means that the preceding elements will be discarded, and also that calling nth(0) multiple times on the same iterator will return different elements.

nth() will return None if n is greater than or equal to the length of the iterator.

Let's said, I don't know this API's design principle, why we need to consume the nth element? It's necessary or ever contrived? Especially, when we index the nth char of &str ( by using chars() method to get the iterator), we cannot re-index the nth one afterward, since it consumed in the previous step.

Sorry, I have no experience in language API design, maybe there are some trivial benefits from this design that I never aware of! I appreciate any comment and explanation to this question. Thanks!

Ixrec · May 9, 2020, 4:14pm

I believe the short answer is that iterators are a very generic interface, and in some important cases we want to be iterable, there's no way to advance the iterator without consuming each item.

Most of these cases involve networking or other I/O where either a) the items being iterated over are being delivered from some external source and do not live in local memory (unless you allocate some memory for them, usually called a buffer) or b) the iteration/access is expensive and random-access indexing would require a much more expensive implementation.

Or to put it another way: not all iterable things are indexable. Indexing is a much richer interface with more guarantees (like being able to re-index as many times as you like) that consequently permits fewer implementations. And in the specific case of Rust, this extra freedom is why indexing usually requires runtime bounds checks while iterating usually does not.

(EDIT: nice, we all ended up giving you short, medium and long versions of the answer to choose from )

kornel · May 9, 2020, 4:14pm

Iterator is an abstract interface that has to work for all cases, not just the simple ones.

There are cases where iterator owns its items, and due to single ownership principle, it can return each item only exactly once.

Rust doesn't do implicit clever things with memory, so it won't do anything (such as adding a refcount or a flag) to help iterator return things more than once.

steffahn · May 9, 2020, 4:18pm

Well, “trivial benefit” is perhaps even an understatement, the TLDR would be: “it’s impossible for .nth(n) to have a non-consuming behavior on Iterator”. The reason is a “mathematical” kind of argument. The relevant question to ask here is: “what is an str::iter::Iterator”?

If you’re familiar with traits in Rust, you’ll know there’s two sides to traits. They are used on generic parameters, e.g. in a function foo<T: Iterator>(x: T) … to add a “constraint” on T. This constrains any uses foo(y) to ys of a type that implements Iterator. On the other side there is the definition of the function foo. In the body of foo the “constraint” is more of a gain, enabling all the methods that Iterator comes with to be called on the parameter x.

It’s generally an all-or-nothing kind of thing: Either a type implements Iterator which means there needs to be a definition for all the methods of Iterator or it doesn’t, which means you can’t call your foo on it, even when the methods the type cannot support are not even used in foo.

As to not to force users of a trait, like the person defining foo, to over-constrain their generic parameters, traits are usually defined to be quite minimalist actually: In essence this is true for Iterator as well. Most methods of Iterator, including nth, really only are part of the trait to (a) allow nice method-syntax and (b) to allow a type to provide a more efficient implementation than the default one, if the type allows it. But conceptually, Iterator has really only one method: fn next(&mut self) -> Option<Self::Item>.

This is important to keep in mind when thinking about the design of the Iterator API: Any type that defines next with the correct signature is a full-fledged iterator. Every additional method in the API must be definable in terms of next.

Now we can start to get mathematical, my initial statement becomes “it’s impossible for .nth(n) to have a non-consuming behavior on Iterator, because such an .nth(n) cannot be defined in terms of .next() only”. The exact argumentation as to why that is true would usually be done only on an intuition level:

The method .nth(n) should return the nth element. In order to even “get to” the nth element, you need to call .next() at least n times, but as soon as you’ve done that, you already consumed at least the first n elements (since .next() does consume the next element).

With this in mind (“you have to consume at least the first n elements”) then, according to its documentation .nth(n) takes the least destructive approach possible: It stating “all preceding elements, as well as the returned element, will be consumed” means that exactly the first n elements are consumed, in other words: as few elements as possible are consumed (one could imagine as an alternative that e.g. the whole iterator could be consumed, too).

bjorn3 · May 9, 2020, 4:29pm

Indexing chars on a string has to be O(n). This is because in Rust strings are stored in UTF-8. UTF-8 uses different amounts of bytes for different characters. Because of this to get the byte offset of the nth character, you have to calculate the total byte length of all n-1 previous characters. This is O(n).

steffahn · May 9, 2020, 4:57pm

You actually can re-index a &str: A &str can be trivially copied. The way to go is to just start over and create a second iterator if you want to re-index from the start again:

fn two_indices(s: &str, n: usize, m: usize) -> Option<(char,char)>  {
    let x = s.chars().nth(n)?;
    let y = s.chars().nth(m)?;
    Some((x,y))
}

Of course, in the case of &str, it would be more efficient to do this in a single iteration.

scottmcm · May 9, 2020, 8:29pm

What I encourage you to do is try to write a generic function that does what you want using the existing Iterator API. That will quickly get you to what steffahn has been talking about: you'll find that you need an extra bound beyond the basic Iterator.

It will probably also bring up a bunch of the questions that you might not have considered yet, like whether it should consume 0 elements or n-1 elements? Should it take a &impl Iterator or a &mut impl iterator (or an impl Iterator that might come from Iterator::by_ref)? Does it need to return the item type by value, or is by reference enough (to be able to work on Peekable)? ...

system · August 7, 2020, 8:30pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
`Iterator::nth` is badly named libs	41	2760	November 30, 2021
Iterator::advance (again) language design	3	1457	January 7, 2021
[Pre-RFC] Iter advance language design	16	1171	May 11, 2019
Get next `n` items from an iterator with const generics	5	1565	March 8, 2021
Thoughts about Iterators - can we have move-or-mutate semantic for functions? language design	6	1119	March 25, 2019

[API design question]: Why Iterator::nth() consumes the nth element?

Related topics