I'm drafting up an RFC to add an iterator that groups elements it receives into arrays and then returns those arrays. Could I add anything to this RFC?
Perhaps the new method should be called chunks_exact
, because it behaves like slice.chunks_exact()
rather than like slice.chunks()
.
It could also use a method like ChunksExact::remainder
to access the left-over elements.
-
remainder
is definitely a concern, and it's probably the most complicated part of the API:- What should it return?
- a (possibly mutable) slice?
- some kinds of
ArrayVec
? - an iterator over owned items? Could this be an already advanced
array::IntoIter
?
- Even if we ignored it, it can still affect other methods
- For example implementing
DoubleEndedIterator
should require the underlying iterator to implementDoubleEndedIterator
, otherwise calling.next_back()
would yield different chunks than.next()
- For example implementing
- What should it return?
- Do we really need
ChunkBuffer
? Couldn't this reusecollect_into_array
? -
N
possibly being 0 is also a concern. Note that this is currently a blocker for stabilizing thearray_chunks
feature.
One thing you might consider is how you're going to get the chunks, and whether that's well optimized today. Even on a simple iterator like a slice one, https://rust.godbolt.org/z/fPdjWYjT5 ends up having poor codegen.
So consider whether a first step here might be a next_chunk
method on Iterator
so that different implementations have have it behave more efficiently that the default try_fold
-based one would.
(Or maybe you've found a smarter implementation than I could for it and none of this is a concern.)
I guess, but it doesn't really behave like chunks_exact
either, since it returns arrays instead of slices.
I've added remainder
and remainder_mut
as methods. They return a slice and a mutable slice, respectively.
ChunkBuffer
provides functionality that is now used in the remainder
method, as well as making folding/try-folding much easier.
I've added a check to the chunks()
method that panics if N
is 0, since that's what array_chunks
currently does.
There's probably a better way of using ChunkBuffer
, I agree. I wonder if using InPlaceIterable
would help at all, like it does in Zip
.
It is quite unfortunate that they don't give owned access to those items. I would at least mention this.
I would still consider extending collect_into_array
to allow those functionalities, or maybe a potential ArrayVec
, whenever that will be introduced. Anyway, even if it doesn't end up like this, I think this is something that should be discussed, so I would put it under a "possible concerns" or "unresolved questions" section.
Note that this is not supposed to be the correct/final behaviour, but you make it look like it is. I would mention that it will temporarily panic if N
is 0, but it is supposed to give a compile time error and that this should be a blocker for stabilization.
Zip
doesn't use InPlaceIterable
, that trait is used for in place collection. You might be thinking of TrustedRandomAccess
. I think it might be worth specializing on TrustedLen
and/or TrustedRandomAccess
. The former might be more supported, but the latter may enable more optimizations.
By the way the proposed implementation of advance_by
is wrong, it discards the remainder. The implementation of DoubleEndedIterator
also ignores the remainder.
I've added an into_remainder
method to Chunks
that gives the remainder in the form of an owned iterator.
I've noted that in the unresolved questions section.
Also noted this.
Chunks
already specializes on TrustedLen
.
Could you elaborate on this? How could I fix it so that it doesn't ignore the remainder?
I didn't mean to implement TrustedLen
on Chunks
, but to specialize Chunks::next
for iterators that implement TrustedLen
, this way it doesn't need to do a check for each item, you can just check that the number of remaining items is bigger than the chunk size and then you can just repeatedly call .next().unwrap_unchecked()
. Similar for TrustedRandomAccess
, except in that case you can call __iterator_get_unchecked
.
For DoubledEndedIterator
I think you have to require that the underlying iterator implements ExactSizeIterator
, then use its size hint to know how many items will be in the remainder. Then you can first take the remainder and then call .next_back()
N times.
For advance_by
I think it may be a bit more complex. In the worst case you could keep the default implementation.
Yeah I see that now. I've reworked the proposal as a whole to accommodate this.
I still think advance_by
's implementation is wrong. For example I would expect this to succeed, instead the assertion after the advance_by
fails:
fn test_advance_by() {
let elems = &[1, 2, 3, 4, 5, 6, 7];
// Manually advance 4 times
let mut chunks = elems.iter().copied().chunks::<2>();
chunks.next(); chunks.next(); chunks.next(); chunks.next();
assert_eq!(chunks.remainder(), &[7]);
// Use advance_by(4)
let mut chunks = elems.iter().copied().chunks::<2>();
chunks.advance_by(4);
assert_eq!(chunks.remainder(), &[7]);
}
I see. I've removed the optimized advance_by
for the time being until I can think of a better solution.
Is there anything else this proposal needs? I'm going to propose the RFC later today.
I think the RFC should contain the signatures of the methods on Chunks
(remainder
, remainder_mut
, and into_remainder
) directly, not just under a link.
The core of
Chunks
is built around the privatePartialArray
struct. Its definition looks like this:
If this is an implementation detail, it probably shouldn't be in the RFC.
Relatedly, fn into_remainder(self) -> IntoRemainder<I::Item, N>
is introducing another new type -- which isn't even mentioned in the RFC -- and it probably shouldn't. I think that remainder type is functionally an array::IntoIter<T, N>
? Can it just be that?
(In general, the more public items something needs the more justification it needs.)
However, they can still be accessed via the
remainder()
andremainder_mut()
methods.
What do these do when the iterator has not yet been exhausted?
A while ago I prototyped a crate with a similar API - it was for collecting one array rather than into chunks, but the analogue here would be to implement Iterator<Item = ArrayOrRemainder>
, with
enum ArrayOrRemainder<T, const N: usize> {
Array([T; N]),
Remainder { values: [MaybeUninit<t>; N], init_count: usize },
}
forcing the caller to handle the possibility of a remainder, rather than to require they know to call remainder()
after iterating... What do you think about that as an alternative?
An interesting proposition; I'd prefer to use a Result<[T; N], PartialArray<[T; N]>>
or something like that. However, this would require end users to pipe the iterator through a .filter_map(Result::ok)
to use it for the intended N:1 transform purpose. Not sure how idiomatic that would be.
I quite like it - I think in general "force the user to think about the extra cases, and be explicit about ignoring them" is definitely idiomatic rust
Already proposed here, didn't get much attention from op.
I don't like the fact that this forces the caller to handle the possibility of a remainder even in the middle of the iterator, which should be impossible.
I'm not fond of that for lazy iterators, since it keeps the type system from knowing that it'll always be the array for a while, then only the last one can be partial. And, relatedly, that keeps one from doing, say, .map(u32::from_ne_bytes)
on the chunks directly.
(I do really like that for slices, where the chunking is eager and cheap, though: https://doc.rust-lang.org/nightly/std/primitive.slice.html#method.as_chunks)
itertools
has had questions about things somewhat like this related to zip_eq
. Here's a rust conversation about the same: [ER] Iterator::zip_exact · Issue #85342 · rust-lang/rust · GitHub
There are a bunch of possible options about the remainder -- like requiring ESI
and assert!
ing there won't be one, or panic!
king on Drop
if there's known to be a remainder but it wasn't looked at, or ...
This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.