Pre-RFC: Add a chunk iterator to libcore

I'm drafting up an RFC to add an iterator that groups elements it receives into arrays and then returns those arrays. Could I add anything to this RFC?

1 Like

Perhaps the new method should be called chunks_exact, because it behaves like slice.chunks_exact() rather than like slice.chunks().

It could also use a method like ChunksExact::remainder to access the left-over elements.

6 Likes
  • remainder is definitely a concern, and it's probably the most complicated part of the API:
    • What should it return?
      • a (possibly mutable) slice?
      • some kinds of ArrayVec?
      • an iterator over owned items? Could this be an already advanced array::IntoIter?
    • Even if we ignored it, it can still affect other methods
      • For example implementing DoubleEndedIterator should require the underlying iterator to implement DoubleEndedIterator, otherwise calling .next_back() would yield different chunks than .next()
  • Do we really need ChunkBuffer? Couldn't this reuse collect_into_array?
  • N possibly being 0 is also a concern. Note that this is currently a blocker for stabilizing the array_chunks feature.
3 Likes

One thing you might consider is how you're going to get the chunks, and whether that's well optimized today. Even on a simple iterator like a slice one, https://rust.godbolt.org/z/fPdjWYjT5 ends up having poor codegen.

So consider whether a first step here might be a next_chunk method on Iterator so that different implementations have have it behave more efficiently that the default try_fold-based one would.

(Or maybe you've found a smarter implementation than I could for it and none of this is a concern.)

I guess, but it doesn't really behave like chunks_exact either, since it returns arrays instead of slices.

I've added remainder and remainder_mut as methods. They return a slice and a mutable slice, respectively.

ChunkBuffer provides functionality that is now used in the remainder method, as well as making folding/try-folding much easier.

I've added a check to the chunks() method that panics if N is 0, since that's what array_chunks currently does.

There's probably a better way of using ChunkBuffer, I agree. I wonder if using InPlaceIterable would help at all, like it does in Zip.

It is quite unfortunate that they don't give owned access to those items. I would at least mention this.

I would still consider extending collect_into_array to allow those functionalities, or maybe a potential ArrayVec, whenever that will be introduced. Anyway, even if it doesn't end up like this, I think this is something that should be discussed, so I would put it under a "possible concerns" or "unresolved questions" section.

Note that this is not supposed to be the correct/final behaviour, but you make it look like it is. I would mention that it will temporarily panic if N is 0, but it is supposed to give a compile time error and that this should be a blocker for stabilization.

Zip doesn't use InPlaceIterable, that trait is used for in place collection. You might be thinking of TrustedRandomAccess. I think it might be worth specializing on TrustedLen and/or TrustedRandomAccess. The former might be more supported, but the latter may enable more optimizations.

By the way the proposed implementation of advance_by is wrong, it discards the remainder. The implementation of DoubleEndedIterator also ignores the remainder.

I've added an into_remainder method to Chunks that gives the remainder in the form of an owned iterator.

I've noted that in the unresolved questions section.

Also noted this.

Chunks already specializes on TrustedLen.

Could you elaborate on this? How could I fix it so that it doesn't ignore the remainder?

I didn't mean to implement TrustedLen on Chunks, but to specialize Chunks::next for iterators that implement TrustedLen, this way it doesn't need to do a check for each item, you can just check that the number of remaining items is bigger than the chunk size and then you can just repeatedly call .next().unwrap_unchecked(). Similar for TrustedRandomAccess, except in that case you can call __iterator_get_unchecked.

For DoubledEndedIterator I think you have to require that the underlying iterator implements ExactSizeIterator, then use its size hint to know how many items will be in the remainder. Then you can first take the remainder and then call .next_back() N times.

For advance_by I think it may be a bit more complex. In the worst case you could keep the default implementation.

Yeah I see that now. I've reworked the proposal as a whole to accommodate this.

I still think advance_by's implementation is wrong. For example I would expect this to succeed, instead the assertion after the advance_by fails:

fn test_advance_by() {
    let elems = &[1, 2, 3, 4, 5, 6, 7];

    // Manually advance 4 times
    let mut chunks = elems.iter().copied().chunks::<2>();
    chunks.next(); chunks.next(); chunks.next(); chunks.next();
    assert_eq!(chunks.remainder(), &[7]);

    // Use advance_by(4)
    let mut chunks = elems.iter().copied().chunks::<2>();
    chunks.advance_by(4);
    assert_eq!(chunks.remainder(), &[7]);
}

I see. I've removed the optimized advance_by for the time being until I can think of a better solution.

Is there anything else this proposal needs? I'm going to propose the RFC later today.

I think the RFC should contain the signatures of the methods on Chunks (remainder, remainder_mut, and into_remainder) directly, not just under a link.

The core of Chunks is built around the private PartialArray struct. Its definition looks like this:

If this is an implementation detail, it probably shouldn't be in the RFC.

Relatedly, fn into_remainder(self) -> IntoRemainder<I::Item, N> is introducing another new type -- which isn't even mentioned in the RFC -- and it probably shouldn't. I think that remainder type is functionally an array::IntoIter<T, N>? Can it just be that?

(In general, the more public items something needs the more justification it needs.)

However, they can still be accessed via the remainder() and remainder_mut() methods.

What do these do when the iterator has not yet been exhausted?

A while ago I prototyped a crate with a similar API - it was for collecting one array rather than into chunks, but the analogue here would be to implement Iterator<Item = ArrayOrRemainder>, with

enum ArrayOrRemainder<T, const N: usize> {
  Array([T; N]),
  Remainder { values: [MaybeUninit<t>; N], init_count: usize },
}

forcing the caller to handle the possibility of a remainder, rather than to require they know to call remainder() after iterating... What do you think about that as an alternative?

An interesting proposition; I'd prefer to use a Result<[T; N], PartialArray<[T; N]>> or something like that. However, this would require end users to pipe the iterator through a .filter_map(Result::ok) to use it for the intended N:1 transform purpose. Not sure how idiomatic that would be.

I quite like it - I think in general "force the user to think about the extra cases, and be explicit about ignoring them" is definitely idiomatic rust :slight_smile:

6 Likes

Already proposed here, didn't get much attention from op.

I don't like the fact that this forces the caller to handle the possibility of a remainder even in the middle of the iterator, which should be impossible.

6 Likes

I'm not fond of that for lazy iterators, since it keeps the type system from knowing that it'll always be the array for a while, then only the last one can be partial. And, relatedly, that keeps one from doing, say, .map(u32::from_ne_bytes) on the chunks directly.

(I do really like that for slices, where the chunking is eager and cheap, though: https://doc.rust-lang.org/nightly/std/primitive.slice.html#method.as_chunks)

itertools has had questions about things somewhat like this related to zip_eq. Here's a rust conversation about the same: [ER] Iterator::zip_exact · Issue #85342 · rust-lang/rust · GitHub

There are a bunch of possible options about the remainder -- like requiring ESI and assert!ing there won't be one, or panic!king on Drop if there's known to be a remainder but it wasn't looked at, or ...