ExactSizeIntoIterator trait

The context

Let's take the example of a data structure supporting "batched" insertion, but with the requirement of reserving enough size to insert the batch first. ExactSizeIterator seems to be a good match as a bound for a generic insertion parameter, but it's not really nice to use, and most API will prefer using IntoIterator instead, why not with IntoIterator<IntoIter: ExactSizeIterator>.

However, if the reservation fails, we may want to return the parameter in an Err, and then, the API looks immediately less nice. We have indeed been forced to call into_iter inside the insertion function, so we cannot return the the parameter anymore.

Well, let it go and just accept Iterator + ExactSizeIterator, but just to be sure, let's also add an implementation for arrays, and look at the generated assembly to check if calling insert_array([42]) is different than insert([42].into_iter()). Oups, it is! In addition to have a bigger size, core::array::IntoIter will also add a branch when you iterate it (because you could have passed an already used iterator, obviously). That's even less nice...

The issue

Using IntoIterator together with ExactSizeIterator can be tedious, because you need to retrieve the iterator first in order to get its length. It doesn't work when you need the length before the iteration, with an early return in between. And when the iteration is far from the into_iter call, the compiler may not optimize it the way it could have when iterating directly using into_iter.

That's quite frustrating when we know that most (all?) of the collections implementing IntoIterator<IntoIter: ExactSizeIterator> already have their length known before calling into_iter...

A possible solution

Here comes the ExactSizeIntoIterator proposal (edited, see below):

pub trait IntoIterator {
    type Item;
    type IntoIter: Iterator<Item = Self::Item>;
    fn into_iter(self) -> Self::IntoIter;
    // New method
    fn size_hint(&self) -> (usize, Option<usize>) {
        (0, None)
    }
}

pub trait ExactSizeIntoIterator: IntoIterator<IntoIter: ExactSizeIterator> {
    fn len(&self) -> usize {
        let (lower, upper) = self.size_hint();
        assert_eq!(upper, Some(lower));
        lower
    }
    fn is_empty(&self) -> bool {
        self.len() == 0
    }
}

Obviously, the method will be redundant with implementors methods, but inherent methods will have the priority, and contrary to ExactSizeIterator, this trait doesn't need to be in the prelude, so I don't think it's an issue.

I was wondering if the bound IntoIterator<IntoIter: ExactSizeIterator> was necessary, instead of just IntoIterator, but I assume that bound would be kind of mandatory on IntoTrustedLen: IntoIterator<IntoIter: TrustedLen>, so I decided to keep it. Anyway, implementing ExactSizeIntoIterator without having IntoIter: ExactsizeIterator is a nonsense by the definition of the trait.

EDIT: @jrose also mentioned adding size_hint method to IntoIterator, and I fully agree with this proposal. Then ExactSizeIntoIterator::len could have the same default implementation than ExactSizeIterator::len.
This addition would in fact be necessary to add an IntoTrustedLen trait, and I think some unsafe code writers (including me) would be really happy with it!

Other solutions

Actually, I haven't found other solutions than implementing this trait myself in my crate, but I really feel like I'm filling a missing abstraction in the standard library myself.
EDIT: I can't implement the trait myself, because the blanket implementation impl<I: ExactSizeIntoIterator> ExactSizeIntoIterator for I conflict with the implementations for array, vec and so on. It needs to be in the standard library.

I could also have one implementation for arrays, and a more generic for iterators — that's in fact what I have for now — but that's uselessly complex.

Disclaimer: I'm encountering this issue in my swap-buffer-queue crate, which supports batched insertions (on top of being maybe the fastest Rust MPSC channel by accident, that's why I care a lot about performance and useless branches in assembly)

1 Like

Playing devil’s advocate: how far can you get using size_hint’s lower bound?

1 Like

I didn't understand the question, could you reformulate please?

ExactSizeIntoIterator isn't related to size_hint, because it isn't related to an iterator directly.

I’m sorry, I misinterpreted which part was the problem. I thought it was ExactSizeIterator specifically that made things hard, but it’s more that into_iter is consuming and there’s no trait that will let you ask for a length prior to calling that. A parallel proposal would have been adding size_hint to IntoIter. (And maybe that doesn’t work out as well for the perf considerations you mention at the end.)

1 Like

I really like it! Then we could add a default implementation for ExactSizeIntoIterator::len.
I definitely want it to be part of the proposal, so I will edit my post to mention it.

You might be interested in this attempt I made a number of years ago now: Produce better size_hints from Flatten & FlatMap by scottmcm · Pull Request #48544 · rust-lang/rust · GitHub

The current Flatten/FlatMap impls use specialization for this. But it depends heavily on the bounds being known at compile time. So an ExactSizeIntoIterator trait that provides runtime information wouldn't help here. At least not until we have maybe-const traits and can specialize on those.

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.