New function suggestion: iter::take_exactly

The std function iter.take(n) produces an iterator which takes n elements from iter.

Except, that's not quite what take does -- it limits the length of the iterator to n, but it may still yield fewer elements if iter did not have n elements to produce.

I personally usually want exactly n elements, and getting fewer would be a serious error. Therefore I suggest take_exactly, which produces n elements, or panics if iter contains fewer than n elements.

Having a quick look around on GitHub, I see many places where users assume they will get exactly n elements from the iterator when using take. It may be that all of those pieces of code are correct, but if not, they will silently truncate rather than give an out-of-bounds error (like you would get if you indexed an array out of bounds).
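To make the silent truncation concrete (this is just today's take, no new API involved):

fn main() {
    let v = vec![1, 2, 3];

    // Asking for 5 elements silently yields only the 3 that exist.
    let taken: Vec<_> = v.iter().take(5).collect();
    assert_eq!(taken.len(), 3);

    // Indexing out of bounds, by contrast, panics:
    // let _ = v[4];
}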

1 Like

What would the type of the method be? Would it check the size_hint() and return Result from take_exactly()? Or would it return Result<Option<T>, _> from the resulting iterator?

1 Like

Sounds like something that would be appropriate to start in itertools or a separate crate.

6 Likes

I agree with @carols10cents here; I thought that it might try to consume n items and produce a new iterator, or None. But a separate crate is probably the best place for it.

My thought is that the behaviour would be identical to take, except it would panic if there are not enough elements.

My mental model is turning code like (sorry if I get these wrong, not tested): for i in 0..len { v[i] += 1; } into v.iter().take(len).map(|&x| x+1); -- clippy, and some blogs I've seen, suggest changes like this. When v does not have len elements, something bad has happened; I just don't want the take to silently truncate.
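For illustration, roughly the failure mode I'm worried about (sketched with iter_mut so the rewrite actually mutates; again, untested details aside, the point is the truncation):

fn main() {
    let mut v = vec![1, 2, 3];
    let len = 5; // oops: v has only 3 elements

    // Index version: panics with an out-of-bounds error at i == 3.
    // for i in 0..len { v[i] += 1; }

    // Iterator version: take(len) silently stops after 3 items.
    for x in v.iter_mut().take(len) {
        *x += 1;
    }
    assert_eq!(v, [2, 3, 4]); // no error, the truncation goes unnoticed
}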

A few important questions arise here:

  • Unless the iterator is an ExactSizeIterator, the size hint is not guaranteed to return anything usefully checkable.
  • It would thus be possible to defer the panic until after the iterator is prematurely exhausted. However, this seems like a debugging nightmare, because many iterators are consumed one-by-one, with significant and complicated logic between iterations.
  • It would also be possible to collect everything upfront into a Vec and check its length. This wouldn't work for infinite iterators, though, and it would also incur an allocation.
  • What about the case when there are more items in the iterator than requested? Surely that doesn't fulfil the requirement of "exactly N" elements in the iterator. Should this also cause a panic?
3 Likes

I think the only sensible option is to wait until the iterator is exhausted, and then panic, because as you say, iterators might not even know how many values they will return until they actually finish, for example if they are attached to a network or to user input. The existing code for Take works in this way, using a counter and returning None once the counter reaches zero.
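Something like this, say (a minimal sketch of that deferred-panic behaviour; TakeExactly and take_exactly are made-up names, not existing items):

struct TakeExactly<I> {
    iter: I,
    remaining: usize,
}

impl<I: Iterator> Iterator for TakeExactly<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        if self.remaining == 0 {
            return None;
        }
        self.remaining -= 1;
        match self.iter.next() {
            Some(item) => Some(item),
            // Panic only when the inner iterator runs dry before yielding
            // the requested number of items.
            None => panic!("take_exactly: iterator yielded too few items"),
        }
    }
}

fn take_exactly<I: Iterator>(iter: I, n: usize) -> TakeExactly<I> {
    TakeExactly { iter, remaining: n }
}

fn main() {
    // Fine: at least 3 items are available.
    let v: Vec<_> = take_exactly(0..5, 3).collect();
    assert_eq!(v, [0, 1, 2]);

    // This would panic on the 3rd call to next(), not up front:
    // let _: Vec<_> = take_exactly(0..2, 4).collect();
}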

If people are opposed to panic, we could return a Result, which one could of course just unwrap (which would immediately panic).

I think there is a point in what the OP brings up, but such a panicking method should have a name that clearly expresses that. Given that the _exactly part does not cover the "too many elements" case, as @djc points out, I personally think this is rather an issue with take not being named take_at_most. This could be fixed by adding that method with a default implementation (maybe through a super-trait, to avoid having to override the impl) that would defer to take, which would then be deprecated.
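For what that could look like, here is a rough sketch of the extension-trait route (take_at_most is just the proposed name; nothing here exists in std):

use std::iter::Take;

// Blanket extension trait: every Iterator gets take_at_most, which simply
// defers to the existing take.
trait TakeAtMost: Iterator + Sized {
    fn take_at_most(self, n: usize) -> Take<Self> {
        self.take(n)
    }
}

impl<I: Iterator> TakeAtMost for I {}

fn main() {
    // Same silent truncation as take, but the name now says so.
    let v: Vec<_> = (0..3).take_at_most(5).collect();
    assert_eq!(v, [0, 1, 2]);
}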

Then, for those really wanting such a panicking variant, either they use ::itertools, or re-write the trivial implementation themselves:

- iterator.take_at_most(n)
+ (0 .. n).map(move |_| iterator.next().unwrap())
// or are the desired semantics to yield an `Option<Iterator>`?

As @H2CO3 points out, the different approaches have non-negligible semantic differences, so it's one of those cases, imho, where being (too) explicit is better than being too terse, at least as far as the std lib is concerned (::itertools can pick a choice and document it well).


FWIW, in the (very common) case of slices, writing v[.. len].iter().map(…) is both idiomatic and expresses what you want :slightly_smiling_face: (with [.. len].iter_mut() and {vec}.drain(.. len) for the by-mut and owned versions).
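Concretely, and as a sketch of why the slice route covers the OP's concern: the bounds check happens when slicing, before any iteration, so a too-large len panics instead of truncating.

fn main() {
    let v = vec![1, 2, 3];
    let len = 2;

    // The bounds check is done by the slicing itself; with len > v.len()
    // this line would panic instead of silently truncating.
    let doubled: Vec<_> = v[..len].iter().map(|&x| x * 2).collect();
    assert_eq!(doubled, [2, 4]);
}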

4 Likes

@dhm: Thanks, I agree that take was a bad choice of name, but I imagine changing names is hard -- adding take_at_most and deprecating take would at least make clear what the function does.

1 Like

The docs are already pretty clear on what the function does:

If less than n elements are available, take will limit itself to the size of the underlying iterator:

The name of a function is not a description of its behavior; it is a vague handle indicating it. last doesn't tell you whether it's the last one emitted or the one at the end. filter doesn't tell you whether it includes or excludes elements matching the predicate. You have to read the docs to know what a function does.

2 Likes

Just saying, the name “take” has been used for a long time in functional programming. Well, at least in Haskell, and some other languages.

e.g.
take (Haskell)
take (Clojure)
take (Kotlin)

I did, however, find that Racket, for example, has a take function that fails if there are too few elements.

The Kotlin method is pretty badly documented (it doesn't mention the case where n elements aren't available, though you get an interactive code editor below it where you can test the corner case yourself, and it does kind-of specify that there's no exception unless n is negative). The Rust method is not perfectly documented either, since you'll have to read through several examples before you get to the point of what happens if fewer than n elements are available. What really needs fixing here is the documentation, nothing more; in particular, the first line of the documentation should probably describe more completely what's supposed to happen.

6 Likes

I didn't realise take had such a notable history; thanks for going the extra mile and gathering some references. That does convince me that, while I might not like the name, it does have a good history.

I would still like a function which lets me take an exact number of items, and fails if it can't. I should try it out myself for a while, then perhaps push it to itertools if I find it useful.

Do you have an ExactSizeIterator? Part of me thinks that the best way to do a panic for something like this would just be to write assert!(it.len() >= n); somewhere -- I think that's clearer than putting the panic in a method anyway.
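For example (a slice iterator is an ExactSizeIterator, so the length is known before anything is consumed):

fn main() {
    let v = vec![1, 2, 3];
    let n = 2;

    let it = v.iter();
    // len() comes from ExactSizeIterator, so the check happens up front.
    assert!(it.len() >= n, "expected at least {} items, got {}", n, it.len());

    let first_n: Vec<_> = it.take(n).copied().collect();
    assert_eq!(first_n, [1, 2]);
    // With n = 5 the assert would fire immediately, rather than take
    // silently truncating later.
}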

1 Like

While this is a bit tangential, I found myself desiring the following trait recently:

trait TryFromIterator<T>: Sized {
  type Error;
  fn try_from_iter<I>(iter: I) -> Result<Self, Self::Error>
  where I: IntoIterator<Item = T>;
}

and an associated Iterator::try_collect(). The obvious implementation here is

impl<T, const N: usize> TryFromIterator<T> for [T; N] { ... }

which fails if the iterator does not have at least N elements. It's unclear if it should fail if the iterator has more elements than that...
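For reference, one way the array impl could look (only a sketch: it collects into a Vec, which allocates, and as written it takes at most N items, so it does not fail on extras):

use std::convert::TryFrom;

trait TryFromIterator<T>: Sized {
    type Error;
    fn try_from_iter<I>(iter: I) -> Result<Self, Self::Error>
    where
        I: IntoIterator<Item = T>;
}

impl<T, const N: usize> TryFromIterator<T> for [T; N] {
    // On failure, hand the collected elements back for inspection.
    type Error = Vec<T>;

    fn try_from_iter<I>(iter: I) -> Result<Self, Self::Error>
    where
        I: IntoIterator<Item = T>,
    {
        // Take at most N items; the TryFrom<Vec<T>> conversion then fails
        // if fewer than N were actually produced.
        let buf: Vec<T> = iter.into_iter().take(N).collect();
        <[T; N]>::try_from(buf)
    }
}

fn main() {
    let ok = <[i32; 3] as TryFromIterator<i32>>::try_from_iter(0..10);
    assert_eq!(ok, Ok([0, 1, 2]));

    let short = <[i32; 3] as TryFromIterator<i32>>::try_from_iter(0..2);
    assert!(short.is_err());
}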

Perhaps it should be, to the extent that is possible without creating ridiculously long names (which is very subjective). Saying the name need not convey what is going on because people can read the docs mostly misses the point. When someone is programming and uses intellisense to complete, they may easily forget that take takes at most n elements instead of exactly n. Making the name clearer makes it less likely for someone to use it incorrectly.

Can always follow arrayvec's precedent: https://docs.rs/arrayvec/0.5.2/arrayvec/struct.ArrayVec.html#impl-FromIterator<<A%20as%20Array>%3A%3AItem>

I didn't say the name need not convey what is going on; I said the name only vaguely conveys what is going on. Deprecating functions, with the churn and/or bloat that causes, is not justified just because a name is slightly imprecise -- by that standard, almost all of the names in std would have to be deprecated.

(And if intellisense is causing people to fail to understand their code or otherwise develop bad habits, perhaps it is not a tool that carries its weight.)

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.