I understood “any() is short-circuiting; in other words, it will stop processing” as saying it will stop checking. I was surprised that this also stops a preceding map() retroactively, skipping my intended side effects.
Not saying this is wrong or right – I guess there are use cases for either behaviour. But just like rayon states such behaviour explicitly, so should std!
This is a fundamental part of Iterators being lazy. It's not just an any() behaviour; every single one of the &mut self methods work this way.
In the same way, just calling .next() (obviously) doesn't call all the maps on everything else first. Calling .find(p) doesn't map all the items first. Etc.
Yeah, when you think about it that way, it makes sence. But TBH, it had never crossed my mind. Whenever I needed to abort iteration prematurely, I had till now switched from methods to for loop and break.
It certainly wouldn’t hurt the Rust learning experience to mention it explicitly!
It doesn’t need to be much. Just adding 3 words would clarify it: “any() is short-circuiting; in other words, it will stop processing the whole iteration”
I don’t see how that clarifies anything. What else could “stop processing” possibly mean?
I suppose one could imagine the function buffering all inputs first, and testing them afterwards in a second loop, but in that case adding the words “the whole iteration” wouldn’t help because this would still imply that only the second loop is being aborted early.
I guess @daniel-pfeiffer thought any() would consume the rest of its source iterator, but not evaluate the test predicate for any further items? The possibility of infinite iterators means that this can't be what it does, but it is easy to forget that infinite iterators are a thing. And there are a lot of Iterator trait methods that do consume the entire source iterator.
I think the Iterator::map documentation is too wishy-washy about discouraging map closures with side effects. I kinda want to rewrite the entire thing, along these lines:
fn map<...>(...)
Transform the items yielded by an iterator, by applying a function to each.
The closure argument, f, will be called for each item that the iterator is asked to produce. It does not have to produce an item of the same type as its input; a common use of map is to convert a sequence of objects to some other type.
Because iterators are lazy, f will not necessarily be called for each item that the source iterator can produce. For example, in
let squares: Vec<_> =
(1..).into_iter().map(|n| n * n).take(10).collect()
the closure will only be called 10 times. (This is different from methods like for_each and fold, which always consume all the elements the source iterator can produce.)
map closures generally should not have side effects, because lazy evaluation makes it hard to be sure how many times the closure will be called. When side effects are necessary, a for loop should be used instead.
I think it is wise to avoid “this is scary” language in library documentation, because in my experience, it gives people incorrect doubts about how things actually work concretely (e.g. the number of times the closure is called is not nondeterministic, but this could be read as if it is). In my opinion, a better description would be something like:
map closures generally should not have side effects, since they are called lazily. map closures which have side effects must be written with the understanding that the number of items which will be passed to the closure, and the timing with which those calls occur, is entirely under the control of the consumer of the iterator.
That’s actually surprising; I had thought that iterator adapters always process one element at a time. But it does make sense. (Another example of possibly surprising ordering is applying map() to a double-ended iterator; abcde can be processed as aebdc if the consumer desires.)
So, the warning should be more like:
map closures generally should not have side effects, since whether they are called, how soon they are called, and the order in which input elements are passed, is under almost complete control by the consumer of the iterator.
I still think it’s worth noting that some side effects are fine; e.g. if you are interning values or otherwise registering them in a table and returning the keys, and you don’t care about the exact ordering or that the contents of the table are complete (or you control the consumer of the iterator), then that’s a reasonable use of map(). Not sure how to phrase that clearly and concisely, though.
So yes, let's workshop some improvements to the "laziness" section of the module-level std::iter documentation. But also let's come up with a set of sentences which we can copy and paste one of into the documentation of every single Iterator trait method, expressing an appropriate level of warning about side effects for that method, and referring the reader to the "laziness" section for more detail.
This is a necessary level of repetition for reference manuals. Every day people go directly to https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.filter, for example, to remind themselves how .filter() works. One of the things they probably need reminding of is that iterators are lazy and this can have surprising implications. If that sentence isn't there, which it isn't right now, they aren't gonna think to jump up to the module level to reread about laziness, even if it would really help.
That's kind of crappy imo. It's manually reminding people, that surprise surprise, myopic reading misses context... this is true for almost every method. Even the most foundational ones have some prerequisite knowledge. Addition of finite-width integers requires knowledge about modular arithmetic after all.
I think this it is a rustdoc issue to surface whether or not the section, item, supertype or module top level has lots of prose one might want to read. Some modules/types are pretty barebones and it would be wasted time to click through. Others have lots of important information one needs to be aware of.
Documentation is primarily used for reference, rather than as a book which is read end-to-end (unlike, for example, the rust book) and it should be written with the expectation that people will use it this way.
Obviously some surrounding context is assumed, but that does not preclude adding reminders of more unusual/counterintuitive properties.
Few languages, especially those which Rust programmers are liable to come from, use lazy iteration to anywhere near the same extent, and so some sort of reminder of this is likely warranted. JavaScript, for example, also has map but there it is an eager operation on arrays. Even in OCaml, the most familiar form of map is an eager operation of lists. A Haskell programmer is familiar with lazy iteration, but may assume Rust doesn’t have it because in Haskell it is an emergent property of global laziness which Rust doesn’t have. Python has lazy iterators but they’re rarely used in practice because list comprehension is a much nicer and more familiar syntax. Go and C don’t have iterator adaptors. C++ is… well, it’s C++. The iterators there are sufficiently different that the reminder is probably warranted anyway.
The case could be made that there should just be a “reminder: rust iterators are lazy. If you’re not confident in what this means, see [link to laziness section]”, but in general, documentation is a reference which should be designed with random access in mind.
That's a strawman, I did not suggest reading it "end to end". I was pointing at a hierarchy of relevant context that should be read, which is a smaller scope.
APIs are a contract. If you don't read the contract then the counterparty may end up doing all kinds of things that will surprise you.
Few languages, especially those which Rust programmers are liable to come from, use lazy iteration to anywhere near the same extent
All the more reason to read the module documentation at least once.
Java Streams are prior art here. Let's look at their filter() method:
I think this is fine. It documents what that method is and does by listing the properties it has and linking to centralized definitions for the properties. It does not recapitulate, which means there is not much wordsmithing that needs to be maintained in N copies.
IMO, this is exactly what the Rust Iterator method documentation should do and currently does not. The documentation for map() mentions “is lazy” in passing but does not call it out as an important property and does not link to a centralized definition.
It's a really basic property of how people think. It is not going to change just because it's inconvenient for us.
One of the things that makes a good technical reference is when it includes all the reminders that anyone might want in the specific context they searched for, and yes, that means a lot of repetition.
Assuming that the Java Streams documentation has a couple of sentences like this for every iterator method, then this is exactly what I was saying we should do! One or two sentences of reminder of everything you might need to know. If you remember what "intermediate operation", "non-interfering", and "stateless" mean, you're good, otherwise the definitions are one click away.