About optimizations of `for` loops

It is bothering me that despite claiming that Rust provide zero-cost (or zero-overhead) abstractions, certain cases of idiomatic for loops are really ineffecient compared to handwritten implementations (or calls of .fold(...), sum(), etc.).

It is mainly because of hard to optimize branches inside the iterator next calls.

Examples of such iterators: skip, skip_while, chain, RangeInclusive<T>.

Maybe we can introduce some kind of transforming iterators in the beginning of such for loops? We know that loop would consume whole iterator so we can eagerly do some preprocessing first and use another iterator.

E.g. examples for iterators:

// Remove branching on value of `n` from for  loop.
fn Skip::prepare_for_for(mut self)->impl Iterator<Item=Self::Item>{
     if self.n > 0 {
        self.inner.nth(n-1);
     }
     self.inner.prepare_for_for()
}

// Remove branching for checking if we need to return last value.
fn RangeInclusive<u32>::prepare_for_for(mut self)->impl Iterator<Item=Self::Item>{
    let cast_back = |x| x as u32;
    if self.last_returned {
        (0..0).map(cast_back)
    }
    else {
         (u64::from(self.start()) .. u64::from(self.end())+1).map(cast_back)
    }
}

// Default case for unknown iterator
default fn Iterator::prepare_for_for(self)->impl Iterator<Item=Self::Item>{
   self
}

Is it possible to implement some way in compiler/standard library?

P.S. I also thought about if it is worthy to automatically convert for loops to for_each but I think it would be too complex because of control flow operators inside of loops and it can also hamper debugger usage.

That's an old, well-known issue.

Is it possible to implement some way in compiler/standard library?

For some cases, not for all. for loops allow more complex control flow than closures passed to for_each due to being able to break, ?, return and possibly yield in the future out of the parent scopes, not just the loop itself. Which means we can't transform the loop body into a closure.

Another approach is to turn them into counted loops via specialization (I tried in #93243) but that required additional traits and indirection that it negatively affected compile times. This approach might still be worth it but it'll need very careful tweaking. But that doesn't help with complicated adapters like Chain or Flatten, those want to be separate or nested loops, driving next with a single induction variable could help a bit, but not as much as complely reorganizing the loops.

An additional complication is that the desugaring happens fairly early in ast->hir lowering, at which point type information isn't available. To do something like specialization but perhaps with fewer helper traits the loop desugaring would have to happen later.

Maybe we can introduce some kind of transforming iterators in the beginning of such for loops? We know that loop would consume whole iterator so we can eagerly do some preprocessing first and use another iterator.

That's the setup step in the PR above, you could get the same effect with an advance_by(0) call today. That'd help with skip at least.

There already is an identity "prepare for for" step done, since the for loop desugar uses IntoIterator. It can't really be meaningfully specialized since that would require changing the IntoIter type, it would be theoretically possible to a further subtrait that uses an opaque associated type instead of Self::IntoIter (which is just Self for Self: Iterator).

Changing the <(impl Iterator) as IntoIterator>::IntoIter type directly would likely break effectively no code, but it's still breaking.

Adding a default fn into_iter_immediate(self) -> impl Iterator<Item = Self::Item> { self.into_iter() } to IntoIterator (or even Iterator where Self: Sized itself, but that would prohibit custom IntoIterator from specializing for immediate iteration) would be sufficient if the default impl works out properly.

But making it a new trait of core::ops::For has some merit to it, though a bit questionable merit.

Note that it's not solely "prepare for for;" it's performing a general setup transformation before iteration. Iterator::try_fold would want to perform the setup as well, and doing so could even eliminate the need for some try_fold implementations. (Any which take the general shape of self.prepare(); inner.try_fold(...args).)

What stops us from doing 2 calls there?

Something like:

Current:

for x in y {...} =>

let mut iter = y.into_iter()
while let Some(x) = iter.next(){...}

Future:

for x in y =>

let mut iter = y.into_iter().prepare_for_for();
while let Some(x) = iter.next(){...}

And make prepare_for_for some Iterator method with default implementation and anonymous return type.

I think about something like this but part of of Iterator trait.

Note that your specialization isn't correct for Skip, because it doesn't handle a non-fused inner iterator correctly.

Note also for the time being that

error: method with return-position `impl Trait` in trait cannot be specialized
  --> src/lib.rs:12:5
   |
12 |     default fn into_immediate_iter(self) -> impl Iterator<Item = I::Item> {
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: specialization behaves in inconsistent and surprising ways with `#![feature(return_position_impl_trait_in_trait)]`, and for now is disallowed

The reason for the error is that RPITIT is transparent for known implementors (exactly like named associated traits), meaning that the call site can rely on knowing the type returned. Even without explicit impl refinement, the caller gets additional knowledge about the autotraits which "leak" through RPIT. The useful definition for default fn RPITIT would be to make it always an opaque type.

Barring that error, a proper implementation might potentially look like (modulo bikeshed ofc)

// core::ops

pub trait For {
    type Output;
    fn into_immediate_iter(self) -> impl Iterator<Item = Self::Output>;
}

impl<I: IntoIterator> For for I {
    type Output = I::Item;
    default fn into_immediate_iter(self) -> impl Iterator<Item = I::Item> {
        self.into_iter()
    }
}

// core::iter

impl<I> For for Skip<I>
where
    I: Iterator,
{
    fn into_immediate_iter(self) -> impl Iterator<Item = I::Item> {
        let mut iter = self.iter.into_immediate_iter();
        iter.nth(self.n).chain(iter)
    }
}

impl<A: Step> For for RangeInclusive<A> {
    default fn into_immediate_iter(self) -> impl Iterator<Item = A> {
        if self.exhausted {
            (self.end.clone()..self.end).chain(None)
        } else {
            (self.start..self.end.clone()).chain(Some(self.end))
        }
    }
}

impl<A: Step + Copy> For for RangeInclusive<A> {
    fn into_immediate_iter(self) -> impl Iterator<Item = A> {
        if self.exhausted {
            (self.end..self.end).chain(None)
        } else if let Some(end) = A::forward_checked(self.end, 1) {
            // NB: would introduce an extra clone in the None case without
            //     specialization unless fn Step::*_checked -> Result<T, T>
            //     I didn't originally since most cases can check steps_between
            (self.start..end).chain(None)
        } else {
            (self.start..self.end).chain(Some(self.end))
        }
    }
}

Nothing directly; I was mostly just pointing out that a pre-iteration step is already performed. Allowing IntoIterator types which are not Iterator to play the game directly seems beneficial, though.

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.