The context
Let's take the example of a data structure supporting "batched" insertion, but with the requirement of reserving enough size to insert the batch first.
ExactSizeIterator
seems to be a good match as a bound for a generic insertion parameter, but it's not really nice to use, and most API will prefer using IntoIterator
instead, why not with IntoIterator<IntoIter: ExactSizeIterator>
.
However, if the reservation fails, we may want to return the parameter in an Err
, and then, the API looks immediately less nice. We have indeed been forced to call into_iter
inside the insertion function, so we cannot return the the parameter anymore.
Well, let it go and just accept Iterator + ExactSizeIterator
, but just to be sure, let's also add an implementation for arrays, and look at the generated assembly to check if calling insert_array([42])
is different than insert([42].into_iter())
. Oups, it is! In addition to have a bigger size, core::array::IntoIter
will also add a branch when you iterate it (because you could have passed an already used iterator, obviously). That's even less nice...
The issue
Using IntoIterator
together with ExactSizeIterator
can be tedious, because you need to retrieve the iterator first in order to get its length. It doesn't work when you need the length before the iteration, with an early return in between. And when the iteration is far from the into_iter
call, the compiler may not optimize it the way it could have when iterating directly using into_iter
.
That's quite frustrating when we know that most (all?) of the collections implementing IntoIterator<IntoIter: ExactSizeIterator>
already have their length known before calling into_iter
...
A possible solution
Here comes the ExactSizeIntoIterator
proposal (edited, see below):
pub trait IntoIterator {
type Item;
type IntoIter: Iterator<Item = Self::Item>;
fn into_iter(self) -> Self::IntoIter;
// New method
fn size_hint(&self) -> (usize, Option<usize>) {
(0, None)
}
}
pub trait ExactSizeIntoIterator: IntoIterator<IntoIter: ExactSizeIterator> {
fn len(&self) -> usize {
let (lower, upper) = self.size_hint();
assert_eq!(upper, Some(lower));
lower
}
fn is_empty(&self) -> bool {
self.len() == 0
}
}
Obviously, the method will be redundant with implementors methods, but inherent methods will have the priority, and contrary to ExactSizeIterator
, this trait doesn't need to be in the prelude, so I don't think it's an issue.
I was wondering if the bound IntoIterator<IntoIter: ExactSizeIterator>
was necessary, instead of just IntoIterator
, but I assume that bound would be kind of mandatory on IntoTrustedLen: IntoIterator<IntoIter: TrustedLen>
, so I decided to keep it. Anyway, implementing ExactSizeIntoIterator
without having IntoIter: ExactsizeIterator
is a nonsense by the definition of the trait.
EDIT: @jrose also mentioned adding size_hint
method to IntoIterator
, and I fully agree with this proposal. Then ExactSizeIntoIterator::len
could have the same default implementation than ExactSizeIterator::len
.
This addition would in fact be necessary to add an IntoTrustedLen
trait, and I think some unsafe code writers (including me) would be really happy with it!
Other solutions
Actually, I haven't found other solutions than implementing this trait myself in my crate, but I really feel like I'm filling a missing abstraction in the standard library myself.
EDIT: I can't implement the trait myself, because the blanket implementation impl<I: ExactSizeIntoIterator> ExactSizeIntoIterator for I
conflict with the implementations for array, vec and so on. It needs to be in the standard library.
I could also have one implementation for arrays, and a more generic for iterators — that's in fact what I have for now — but that's uselessly complex.
Disclaimer: I'm encountering this issue in my swap-buffer-queue crate, which supports batched insertions (on top of being maybe the fastest Rust MPSC channel by accident, that's why I care a lot about performance and useless branches in assembly)