Pre-RFC: Add a chunk iterator to libcore

notgull · July 29, 2021, 1:48am

I'm drafting up an RFC to add an iterator that groups elements it receives into arrays and then returns those arrays. Could I add anything to this RFC?

github.com

notgull/rfcs/blob/chunks/text/0000-iterator-chunks.md

- Feature Name: `iterator_chunks`
- Start Date: 2021-06-28 
- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary
[summary]: #summary

This feature adds a new method to the `Iterator` trait: `chunks()`. This method returns the `Chunks` type, which this feature also adds. Both `chunks()` and `Chunks<>` and parameterized by a const-generic usize `N`. `Chunks`
is an `Iterator` that collects the elements of the underlying iterator into an `[Item; N]` and iterates over that.

# Motivation
[motivation]: #motivation

A common use case of iteration is to transform a series of elements into another series of elements. 1:1 transformations are accomplished via `map()` and 1:N transformations are accomplished via `flat_map()`. However, there is
no easy way to accomplish N:1 transformations. In certain cases you can use `fold()`, but this tends to become messy in the long run. You could also use `collect()` and then operate on the resulting container type, but this requires an allocation, which is slow and sometimes not available.

This RFC provides a more elegant solution to 1:N transformations, by allowing the user to group elements into arrays of length N. These arrays can be transformed via adaptors like `map()` into the data type that the user wishes.

# Guide-level explanation

This file has been truncated. show original

mbrubeck · July 29, 2021, 3:23am

Perhaps the new method should be called chunks_exact, because it behaves like slice.chunks_exact() rather than like slice.chunks().

It could also use a method like ChunksExact::remainder to access the left-over elements.

SkiFire13 · July 29, 2021, 8:39am

remainder is definitely a concern, and it's probably the most complicated part of the API:
- What should it return?
  - a (possibly mutable) slice?
  - some kinds of ArrayVec?
  - an iterator over owned items? Could this be an already advanced array::IntoIter?
- Even if we ignored it, it can still affect other methods
  - For example implementing DoubleEndedIterator should require the underlying iterator to implement DoubleEndedIterator, otherwise calling .next_back() would yield different chunks than .next()
Do we really need ChunkBuffer? Couldn't this reuse collect_into_array?
N possibly being 0 is also a concern. Note that this is currently a blocker for stabilizing the array_chunks feature.

scottmcm · July 29, 2021, 5:50pm

One thing you might consider is how you're going to get the chunks, and whether that's well optimized today. Even on a simple iterator like a slice one, https://rust.godbolt.org/z/fPdjWYjT5 ends up having poor codegen.

So consider whether a first step here might be a next_chunk method on Iterator so that different implementations have have it behave more efficiently that the default try_fold-based one would.

(Or maybe you've found a smarter implementation than I could for it and none of this is a concern.)

notgull · July 29, 2021, 6:10pm

I guess, but it doesn't really behave like chunks_exact either, since it returns arrays instead of slices.

I've added remainder and remainder_mut as methods. They return a slice and a mutable slice, respectively.

ChunkBuffer provides functionality that is now used in the remainder method, as well as making folding/try-folding much easier.

I've added a check to the chunks() method that panics if N is 0, since that's what array_chunks currently does.

There's probably a better way of using ChunkBuffer, I agree. I wonder if using InPlaceIterable would help at all, like it does in Zip.

SkiFire13 · July 29, 2021, 8:33pm

It is quite unfortunate that they don't give owned access to those items. I would at least mention this.

I would still consider extending collect_into_array to allow those functionalities, or maybe a potential ArrayVec, whenever that will be introduced. Anyway, even if it doesn't end up like this, I think this is something that should be discussed, so I would put it under a "possible concerns" or "unresolved questions" section.

Note that this is not supposed to be the correct/final behaviour, but you make it look like it is. I would mention that it will temporarily panic if N is 0, but it is supposed to give a compile time error and that this should be a blocker for stabilization.

Zip doesn't use InPlaceIterable, that trait is used for in place collection. You might be thinking of TrustedRandomAccess. I think it might be worth specializing on TrustedLen and/or TrustedRandomAccess. The former might be more supported, but the latter may enable more optimizations.

By the way the proposed implementation of advance_by is wrong, it discards the remainder. The implementation of DoubleEndedIterator also ignores the remainder.

notgull · July 29, 2021, 10:05pm

I've added an into_remainder method to Chunks that gives the remainder in the form of an owned iterator.

I've noted that in the unresolved questions section.

Also noted this.

Chunks already specializes on TrustedLen.

Could you elaborate on this? How could I fix it so that it doesn't ignore the remainder?

SkiFire13 · July 30, 2021, 7:26am

I didn't mean to implement TrustedLen on Chunks, but to specialize Chunks::next for iterators that implement TrustedLen, this way it doesn't need to do a check for each item, you can just check that the number of remaining items is bigger than the chunk size and then you can just repeatedly call .next().unwrap_unchecked(). Similar for TrustedRandomAccess, except in that case you can call __iterator_get_unchecked.

For DoubledEndedIterator I think you have to require that the underlying iterator implements ExactSizeIterator, then use its size hint to know how many items will be in the remainder. Then you can first take the remainder and then call .next_back() N times.

For advance_by I think it may be a bit more complex. In the worst case you could keep the default implementation.

notgull · July 31, 2021, 1:40am

Yeah I see that now. I've reworked the proposal as a whole to accommodate this.

SkiFire13 · July 31, 2021, 9:19am

I still think advance_by's implementation is wrong. For example I would expect this to succeed, instead the assertion after the advance_by fails:

fn test_advance_by() {
    let elems = &[1, 2, 3, 4, 5, 6, 7];

    // Manually advance 4 times
    let mut chunks = elems.iter().copied().chunks::<2>();
    chunks.next(); chunks.next(); chunks.next(); chunks.next();
    assert_eq!(chunks.remainder(), &[7]);

    // Use advance_by(4)
    let mut chunks = elems.iter().copied().chunks::<2>();
    chunks.advance_by(4);
    assert_eq!(chunks.remainder(), &[7]);
}

notgull · July 31, 2021, 4:47pm

I see. I've removed the optimized advance_by for the time being until I can think of a better solution.

notgull · October 14, 2021, 9:43pm

Is there anything else this proposal needs? I'm going to propose the RFC later today.

scottmcm · October 14, 2021, 11:31pm

I think the RFC should contain the signatures of the methods on Chunks (remainder, remainder_mut, and into_remainder) directly, not just under a link.

The core of Chunks is built around the private PartialArray struct. Its definition looks like this:

If this is an implementation detail, it probably shouldn't be in the RFC.

Relatedly, fn into_remainder(self) -> IntoRemainder<I::Item, N> is introducing another new type -- which isn't even mentioned in the RFC -- and it probably shouldn't. I think that remainder type is functionally an array::IntoIter<T, N>? Can it just be that?

(In general, the more public items something needs the more justification it needs.)

However, they can still be accessed via the remainder() and remainder_mut() methods.

What do these do when the iterator has not yet been exhausted?

illicitonion · October 15, 2021, 12:21am

A while ago I prototyped a crate with a similar API - it was for collecting one array rather than into chunks, but the analogue here would be to implement Iterator<Item = ArrayOrRemainder>, with

enum ArrayOrRemainder<T, const N: usize> {
  Array([T; N]),
  Remainder { values: [MaybeUninit<t>; N], init_count: usize },
}

forcing the caller to handle the possibility of a remainder, rather than to require they know to call remainder() after iterating... What do you think about that as an alternative?

notgull · October 15, 2021, 3:48am

An interesting proposition; I'd prefer to use a Result<[T; N], PartialArray<[T; N]>> or something like that. However, this would require end users to pipe the iterator through a .filter_map(Result::ok) to use it for the intended N:1 transform purpose. Not sure how idiomatic that would be.

illicitonion · October 15, 2021, 1:14pm

I quite like it - I think in general "force the user to think about the extra cases, and be explicit about ignoring them" is definitely idiomatic rust

SkiFire13 · October 15, 2021, 2:01pm

Already proposed here, didn't get much attention from op.

I don't like the fact that this forces the caller to handle the possibility of a remainder even in the middle of the iterator, which should be impossible.

scottmcm · October 15, 2021, 10:50pm

I'm not fond of that for lazy iterators, since it keeps the type system from knowing that it'll always be the array for a while, then only the last one can be partial. And, relatedly, that keeps one from doing, say, .map(u32::from_ne_bytes) on the chunks directly.

(I do really like that for slices, where the chunking is eager and cheap, though: https://doc.rust-lang.org/nightly/std/primitive.slice.html#method.as_chunks)

itertools has had questions about things somewhat like this related to zip_eq. Here's a rust conversation about the same: [ER] Iterator::zip_exact · Issue #85342 · rust-lang/rust · GitHub

There are a bunch of possible options about the remainder -- like requiring ESI and assert!ing there won't be one, or panic!king on Drop if there's known to be a remainder but it wasn't looked at, or ...

system · January 13, 2022, 10:50pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
[pre-RFC] TryFromIterator and try_collect to enable collecting to arrays libs	26	2440	September 8, 2021
Pre-RFC: Iterator::extend_into libs	17	1739	March 25, 2019
Iterators v2 libs	9	1044	May 8, 2021
Pre-RFC: make `&[T]` implement `IntoIterator<Item = T>` (where `T: Copy`) libs	5	796	March 25, 2019
[Pre-RFC] Replace IteratorExt::zip with tuple iteration ideas (deprecated)	25	5087	March 25, 2019

Pre-RFC: Add a chunk iterator to libcore

Related topics