About optimizations of `for` loops

AngelicosPhosphoros · May 27, 2023, 6:42pm

It is bothering me that despite claiming that Rust provide zero-cost (or zero-overhead) abstractions, certain cases of idiomatic for loops are really ineffecient compared to handwritten implementations (or calls of .fold(...), sum(), etc.).

It is mainly because of hard to optimize branches inside the iterator next calls.

Examples of such iterators: skip, skip_while, chain, RangeInclusive<T>.

Maybe we can introduce some kind of transforming iterators in the beginning of such for loops? We know that loop would consume whole iterator so we can eagerly do some preprocessing first and use another iterator.

E.g. examples for iterators:

// Remove branching on value of `n` from for  loop.
fn Skip::prepare_for_for(mut self)->impl Iterator<Item=Self::Item>{
     if self.n > 0 {
        self.inner.nth(n-1);
     }
     self.inner.prepare_for_for()
}

// Remove branching for checking if we need to return last value.
fn RangeInclusive<u32>::prepare_for_for(mut self)->impl Iterator<Item=Self::Item>{
    let cast_back = |x| x as u32;
    if self.last_returned {
        (0..0).map(cast_back)
    }
    else {
         (u64::from(self.start()) .. u64::from(self.end())+1).map(cast_back)
    }
}

// Default case for unknown iterator
default fn Iterator::prepare_for_for(self)->impl Iterator<Item=Self::Item>{
   self
}

Is it possible to implement some way in compiler/standard library?

P.S. I also thought about if it is worthy to automatically convert for loops to for_each but I think it would be too complex because of control flow operators inside of loops and it can also hamper debugger usage.

the8472 · May 27, 2023, 7:29pm

That's an old, well-known issue.

Is it possible to implement some way in compiler/standard library?

For some cases, not for all. for loops allow more complex control flow than closures passed to for_each due to being able to break, ?, return and possibly yield in the future out of the parent scopes, not just the loop itself. Which means we can't transform the loop body into a closure.

Another approach is to turn them into counted loops via specialization (I tried in #93243) but that required additional traits and indirection that it negatively affected compile times. This approach might still be worth it but it'll need very careful tweaking. But that doesn't help with complicated adapters like Chain or Flatten, those want to be separate or nested loops, driving next with a single induction variable could help a bit, but not as much as complely reorganizing the loops.

An additional complication is that the desugaring happens fairly early in ast->hir lowering, at which point type information isn't available. To do something like specialization but perhaps with fewer helper traits the loop desugaring would have to happen later.

Maybe we can introduce some kind of transforming iterators in the beginning of such for loops? We know that loop would consume whole iterator so we can eagerly do some preprocessing first and use another iterator.

That's the setup step in the PR above, you could get the same effect with an advance_by(0) call today. That'd help with skip at least.

CAD97 · May 27, 2023, 10:09pm

There already is an identity "prepare for for" step done, since the for loop desugar uses IntoIterator. It can't really be meaningfully specialized since that would require changing the IntoIter type, it would be theoretically possible to a further subtrait that uses an opaque associated type instead of Self::IntoIter (which is just Self for Self: Iterator).

Changing the <(impl Iterator) as IntoIterator>::IntoIter type directly would likely break effectively no code, but it's still breaking.

Adding a default fn into_iter_immediate(self) -> impl Iterator<Item = Self::Item> { self.into_iter() } to IntoIterator (or even Iterator where Self: Sized itself, but that would prohibit custom IntoIterator from specializing for immediate iteration) would be sufficient if the default impl works out properly.

But making it a new trait of core::ops::For has some merit to it, though a bit questionable merit.

Note that it's not solely "prepare for for;" it's performing a general setup transformation before iteration. Iterator::try_fold would want to perform the setup as well, and doing so could even eliminate the need for some try_fold implementations. (Any which take the general shape of self.prepare(); inner.try_fold(...args).)

AngelicosPhosphoros · May 27, 2023, 10:42pm

What stops us from doing 2 calls there?

Something like:

Current:

for x in y {...} =>

let mut iter = y.into_iter()
while let Some(x) = iter.next(){...}

Future:

for x in y =>

let mut iter = y.into_iter().prepare_for_for();
while let Some(x) = iter.next(){...}

And make prepare_for_for some Iterator method with default implementation and anonymous return type.

AngelicosPhosphoros · May 27, 2023, 11:00pm

I think about something like this but part of of Iterator trait.

CAD97 · May 27, 2023, 11:01pm

Note that your specialization isn't correct for Skip, because it doesn't handle a non-fused inner iterator correctly.

Note also for the time being that

error: method with return-position `impl Trait` in trait cannot be specialized
  --> src/lib.rs:12:5
   |
12 |     default fn into_immediate_iter(self) -> impl Iterator<Item = I::Item> {
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: specialization behaves in inconsistent and surprising ways with `#![feature(return_position_impl_trait_in_trait)]`, and for now is disallowed

The reason for the error is that RPITIT is transparent for known implementors (exactly like named associated traits), meaning that the call site can rely on knowing the type returned. Even without explicit impl refinement, the caller gets additional knowledge about the autotraits which "leak" through RPIT. The useful definition for default fn RPITIT would be to make it always an opaque type.

Barring that error, a proper implementation might potentially look like (modulo bikeshed ofc)

// core::ops

pub trait For {
    type Output;
    fn into_immediate_iter(self) -> impl Iterator<Item = Self::Output>;
}

impl<I: IntoIterator> For for I {
    type Output = I::Item;
    default fn into_immediate_iter(self) -> impl Iterator<Item = I::Item> {
        self.into_iter()
    }
}

// core::iter

impl<I> For for Skip<I>
where
    I: Iterator,
{
    fn into_immediate_iter(self) -> impl Iterator<Item = I::Item> {
        let mut iter = self.iter.into_immediate_iter();
        iter.nth(self.n).chain(iter)
    }
}

impl<A: Step> For for RangeInclusive<A> {
    default fn into_immediate_iter(self) -> impl Iterator<Item = A> {
        if self.exhausted {
            (self.end.clone()..self.end).chain(None)
        } else {
            (self.start..self.end.clone()).chain(Some(self.end))
        }
    }
}

impl<A: Step + Copy> For for RangeInclusive<A> {
    fn into_immediate_iter(self) -> impl Iterator<Item = A> {
        if self.exhausted {
            (self.end..self.end).chain(None)
        } else if let Some(end) = A::forward_checked(self.end, 1) {
            // NB: would introduce an extra clone in the None case without
            //     specialization unless fn Step::*_checked -> Result<T, T>
            //     I didn't originally since most cases can check steps_between
            (self.start..end).chain(None)
        } else {
            (self.start..self.end).chain(Some(self.end))
        }
    }
}

Nothing directly; I was mostly just pointing out that a pre-iteration step is already performed. Allowing IntoIterator types which are not Iterator to play the game directly seems beneficial, though.

system · August 25, 2023, 11:01pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Decide if for-loop iterator is empty given a literal/const iterator compiler	6	279	August 20, 2024
MIR: Optimize for-loop on integer ranges	11	1608	February 10, 2023
Ideas for making Rust easier for Beginners language design	39	6496	March 25, 2019
Optimization comparison: Vec vs array and for vs while compiler	5	1176	July 2, 2022
Delegate trait implementations with procedural macros language design	8	4860	March 25, 2019

About optimizations of `for` loops

Related topics