Suggestion: `iter().map(f).skip(n)` should not invoke`f` while skipping

core::item::Map doesn’t have specialization of advance_by that would skip the call to the closure. This means that in the following snippet, the closure is called 5 times even if the result is only used 3 times (playground):

fn main() {
    let mut x = 0;
    let f = |val| { x+=1; val * val };
    let v: Vec<_> = (1..=5).map(f).skip(2).collect();
    println!("{:?}, {}", v, x); // prints "[9, 16, 25], 5"
}

I thought that Map would have the following specialization of advance_by:

impl<B, I: Iterator, F> Iterator for Map<I, F>
where
    F: FnMut(I::Item) -> B,
{  
    fn advance_by(&mut self, n: usize) -> Result<(), usize> {                 
        for i in 0..n {                                                        
            // Unlike the default in `Iterator`, this does not invoke the                           
            // closure `self.f` since the result is unused.                      
            self.iter.next().ok_or(i)?;               
        }                                             
        Ok(())
    }   
}

I don’t see anywhere in the documentation of Map or Skip that the closure must be called for the skipped element, so I don’t think if this would be considered a breaking change to add it. Even if it’s a breaking change, it should be possible to skip the call to the closure if the closure doesn’t do any write though a reference to an object not own by the closure, which if I understand correctly is equivalent to say that the closure implements FnOnce, or do I miss something?

Hi, what are asking can be viewed as an optimization that automatically transforms iter().map(f).skip(n) into iter().skip(n).map(f), is that correct?

Doing this kind of optimization is fine in a purely functional language like Haskell (and they even have rewrite rules to specify this kind of thing), but I it could be problematic in Rust.

What do you think about a clippy linter that advises the programmer to rewrite the code to skip first, then map?

10 Likes

It certainly is, and the current behavior is the correct one.

Definitely not. FnOnce guarantees nothing about side-effects. It could e.g. write to a static, open a file, print to the screen, etc. etc..

Even if it did not, we still could not do it because we cannot specialize on FnOnce as of today.

5 Likes

Hi, what are asking can be viewed as an optimization that automatically transforms iter().map(f).skip(n) into iter().skip(n).map(f) , is that correct?

Yes it is. You could have more steps between skip and map, but you are right, if we move skip before map it’s functionally equivalent if f doesn’t have side effects.

What do you think about a clippy linter that advises the programmer to rewrite the code to skip first, then map?

That’s a good idea.

Definitely not. FnOnce guarantees nothing about side-effects. It could e.g. write to a static , open a file, print to the screen, etc. etc..

I should have thought more about this, you are totally right.

Even if it did not, we still could not do it because we cannot specialize on FnOnce as of today.

I thought we could since there are no conflicting bounds (Map::advance_by uses the default implementation from Iterator), but that’s indeed not the case (I just tested it, and I effectively get the error "cannot specialize on trait FnMut"), but anyway this would be useless because of what you just said above.


Thanks all, that was a bad idea. And a clippy lint would be much better even if it’s hard to write (since we must correctly assess that f doesn’t touch any global state before displaying this lint).

I don't think the lint needs to correctly assess this, I'd argue that in the presence of side effects the intent of the line

let v: Vec<_> = (1..=5).map(f).skip(2).collect();

is extremely unclear: do they really want f to run 5 times? I'd much rather see

#[allow(clippy::map_then_skip)]
let v: Vec<_> = (1..=5).map(f).skip(2).collect();
16 Likes

While it would be a breaking change, I'm interested in why you think the current behaviour is the "correct one"?

If I ask for some elements of an iterator to be "skipped", I personally don't see why the map has to be applied. Java streams (picking a similar system) will not apply such map calls.

(edit: this claim about Java is wrong. Thanks to @gbutler and @SkiFire13 for pointing out my error)

1 Like

That doesn't sound correct to me. I'm pretty sure the Java stream will behave the same. If you want to skip them without applying the map, then put the skip first.

1 Like

Because you applied the map before you skipped. In other words you skipped the result of the map, not its input.

I can't seem to reproduce your statement. This code:

IntStream
    .range(0, 11).map(i -> {
      System.out.println("mapped " + i);
      return i;
    })
    .skip(5)
    .forEach(i -> System.out.println("foreach " + i));

Prints:

mapped 0
mapped 1
mapped 2
mapped 3
mapped 4
mapped 5
foreach 5
mapped 6
foreach 6
mapped 7
foreach 7
mapped 8
foreach 8
mapped 9
foreach 9
mapped 10
foreach 10

Showing it mapped the skipped items too.

5 Likes

Apologises, this was a mistake.

I went back to where I learnt this, and it turns out it was a talk about a Java stream implementation which could do various fusing and skipping, and I had misremembered it as what Java actually did.

1 Like

The only possibility is to add a variant of map() that behaves this way.

It's a bit specialized though, so maybe submitting it to itertools could be more appropriate.

Maybe it's just me, but what I don't understand is why the code isn't just rewritten from iter.map(f).skip(n) to iter.skip(n).map(f), and why this is an internals topic at all since the answer is so obvious.

I believe that anywhere the former is written, it can be changed to the latter, and thus I don't see the issue with this at all.

1 Like

That's obviously wrong since you could be passing an iterator to a generic function that uses skips, and then you can't do what you propose.

As a (non-practical) side note, if Rust tracked function purity, then we could hypothetically have Map::advance_by be specialized on whether the parameter is a pure function.

1 Like

There's nothing wrong about it, let alone it being obviously wrong.

If what you say is the case then that needs to be fixed in that function, not the language.

It's pretty trivial and obvious that you can't turn x.map().skip() into x.skip().map() if the x.map().skip() comes from calling f(x.map()) where f (the "generic function") calls skip() internally (an arbitrary number of times with arbitrary parameters) on the parameter and you can't change f (because it's in another crate, or it's a trait method, or anyway changing f would make the design worse); in this case you need instead to call f(x.map_variant()) where map_variant is the proposed map() variant.

This is a good idea. Maybe call it map_pure?