Pre-RFC: `fold_ok` is composable internal iteration

Ok. It’s clear why it wasn’t there already: this didn’t occur to me as a possibility at all! In fact, I still prefer working code concepts for wild ideas rather than discussing what might be possible.

Ok, I was tipped off that there is a new design for “Carrier” in the works. I’ve looked at (comment) and it does not yet seem to have the equivalent of C::from_success(x) as used in the code above, so I don’t see a nice implementation of “fold_ok” in terms of the new formulation yet.

This is an exciting proposal!

I see that with this proposal, internal iterators end up being a subset of external iterators; a next() implementation is still required. To be entirely honest, one of my own primary motivations for using internal iteration is the simple fact that implementing next() can be challenging! In the context of writing an application or computational code, one may often1 run into cases where there is a loop which ought to be abstracted out, but which may have a long runtime or a potentially unbounded number of iterations (preventing e.g. a Vec-returning cop-out):

fn each_simulation_step<F:FnMut(&State)>
        (initial_state: State, mut callback: F) -> State {
    // ...implementation which would be awfully nasty to turn into an Iterator
    // but is simple to turn into an internal iterator
}

If fold_ok was on a separate InternalIterator trait which did not require Iterator, then each_simulation_step could perhaps be rewritten to impl that trait, and code using it could benefit from the folds and searches.

However, I’m not sure how it would be possible to allow the existing methods on Iterator to be implemented in terms of an InternalIterator trait in a backwards compatible fashion, which likely spells the death for this alternative. Certainly, creating a common base for the existing methods on Iterator is a far more important goal, and not one to compromise!


1: I am, of course, speaking entirely for myself.

.next() can actually be implemented in terms of fold_ok (ugly and not important exactly how, but like this: .fold_ok((), |(), elt| Err(elt)).err())

So fold_ok requires updating the iterator’s state the same way. Implementing fold is much easier, because it consumes the iterator with no requirement to remember progress (but it must start off from the iterator’s current state, of course).

In this sense, I don’t think this is the simpler internal iteration that you are looking for, but maybe fold is.

I was about to ask a dumb question about why fold_ok takes a mutable receiver when there is by_ref, but then I realized the extremely obvious answer. Now I’ll just leave this paragraph here for people Ctrl+F’ing “by_ref” to give them a hint to just think about it a little bit. :slight_smile:

Of course, even if this could be changed, it wouldn’t help my use case, as it is the very capability itself of being able to use an iterator twice which is fundamentally at odds with having a simple implementation.

Edit: Hm, I spoke too soon. The “obvious answer” I was alluding to was that you cannot recover the modified state (or so I figured). I somehow concluded this on the basis that fold_ok doesn’t return the modified iterator; but this reasoning is clearly nonsense, because the same circumstances also apply to next()!

Edit 2: …and that’s why next() also has a mutable receiver. Right. I think I should shut up now. :stuck_out_tongue:

An open question could indeed be if fold_ok should use a self receiver.

  • Advantage: Much easier to implement, no need to keep partially iterated state around
  • Disadvantage: Cannot use the specific implementation of fold_ok through &mut, cannot be the common base for any, all, find etc which use a &mut receiver.

My driving purpose for this RFC is as a largely silent way to make every other iterator method faster. I don’t even particularly care how easy it is to use, just how easy it is to write for a particular iterator type, because I don’t intend it to be used that much directly. Thus I don’t think it’s a good idea to make this more general, like extending it to the carrier trait or wrapping it around fold in some magic way, because those concerns cater to the user, not the implementor.

Similarly, I want this to be implemented on &mut self because if it isn’t there are a lot of things it can’t make faster. Ideally I even want the method to be such that if we ever get await/async style generators, they can implement this themselves.

5 Likes

My motivation was more or less the same (as indicated), but I embraced the Result formulation since it fits very well, and I think the same thing about the std::ops::Carrier formulation now, it just fits.

Now it might not be guaranteed that Carrier takes that path, but if it has an impl where there is no error branch, then that resolves to a non-short-circuiting fold by itself, kind of how @briansmith described it.

You don’t particularly gain anything from Carrier, but you lose a lot of helpful methods. For example, I might want to write take as, I dunno,

    let result = self.iter.fold_ok(init, move |acc, elt| {
        *n -= 1;
        match (g(acc, elt), *n > 0) {
            (Ok(acc), true) => Ok(acc),
            (last, _) => Err(last),
        }
    });
    result.or_else(|last| last)

Using Carrier would prevent this kind of composition.

If you look at post #9, I don’t think you lose much. fold_ok is generic over the carrier, so if you want to run a fold_ok with Result being the carrier, it implements that.

Itertools is actually bluss’ own project. :wink:

And Itertools::fold_while, which I added to rust-itertools a while back (for making an ML optimizer loop let mut-free), was the direct inspiration for fold_ok, afaik.

So as far as I understand bluss’ intentions for fold_ok are basically to promote/move fold_while into the stdlib, making the former obsolete (deprecated and removed even at some point?).

3 Likes

After some experiments, I think that the “new carrier (QuestionMark)” version of fold_ok does not work well in practice (for fallible operations): type inference fails too often. So it doesn’t just need the inject(x) -> Self function, I think it needs a real monad trait for good type inference.

Which is to say, it puts the horizon of generalization too far away, let’s try to base this on Result only.

This hasn’t moved for some reasons, for example christmas, but also that the following does not compile without additional type hints.

It’s more or less so that any nested use of try!() or ? will have this kind of issue.

Hence my quipping some late night that I wanted a try operator that didn’t convert errors, let’s say expr=? for example, that would have to be used inside fold_ok's closure here.

#![feature(fold_ok)]

use std::io::{BufRead, stdin, self};

fn try_main() -> Result<(), io::Error> {
    let s = stdin();
    let r = s.lock();
    let lines = r.lines().fold_ok(Vec::new(), |mut v, line| {
        v.push(line?);
        Ok(v)
    })?;
    println!("lines: {:?}", lines);
    Ok(())
}

fn main() {
    try_main().unwrap();
}

I hate suggesting things like this, but…

IIUC, the problem is that type inference hits what amounts to <io::Error as From>::<X>::from(From::<Y>::from(line)), and it can’t work out what X is. Could the language be extended so that you can mark the From trait such that when type inference sees this pattern, it just assumes that X = Y and moves on? Does the algorithm allow you to mark a type variable as "if all else fails, just assume Y"?

This is actually one of the major pain points with closures: propagating errors inside to outside via try!.

So I saw a talk at a conference that seems pretty relevant to this. The strymonas project is a “stream fusion” compiler in o’caml. It is able to eliminate a lot of the inefficiencies that would be involved in a naive implementation of Iterator. We are also able to do so, but they do more. =) It seems worth a look to see if any of the lessons they learn can be included into this proposal.

5 Likes

Someone asked a question about how to do this in one pass https://www.reddit.com/r/rust/comments/6jpbtl/how_do_i_make_this_code_efficient_and_idiomatic/

Do I understand correctly that under this proposal you can write this code with iterators in one pass?

Does Try fix any of the problems Carrier caused in this? Something like this:

fn try_fold<F, T: Try>(self, init: T::Ok, mut f: F) -> T where
    Self: Sized, F: FnMut(T::Ok, Self::Item) -> T,
{
    let mut accum = init;
    for x in self {
        accum = f(accum, x)?;
    }
    Try::from_ok(accum)
}

Since the closure constructs a concrete impl Try, the experiment I tried had the method inferring fine.

1 Like

Thinking more about this, it seems to me that instead of a traditional internal iterator, what’s really wanted here is the dual of a generator, i.e. some sort of “sink”.

With flexible enough generators, one could write yield to suspend and receive the next value when the generator is resumed.

The take example could look like this:

trait Iterator {
    fn sink_into<G, R>(&mut self, mut g: G) -> R
        where G: Generator<Option<Self::Item>, Yield=(), Return=R>
    {
        loop {
            match g.resume(self.next()) {
                Yield(()) => {},
                Return(x) => return x
            }
        }
    }

    fn try_fold<F, T: Try>(self, init: T::Ok, mut f: F) -> T
        where Self: Sized, F: FnMut(T::Ok, Self::Item) -> T,
    {
        let mut accum = init;
        self.sink_into(|| {
            while let Some(next) = yield {
                accum = f(accum, next)?;
            }
            Try::from_ok(accum)
        })
    }
}

struct Take<I> {
    n: usize,
    iter: I,
}

impl<I> Iterator for Take<I> where I: Iterator {
    fn sink_into<G, R>(&mut self, mut g: G) -> R
        where G: Generator<Option<Self::Item>, Yield=(), Return=R>
    {
        if self.n == 0 {
            return iter::empty().sink_into(g);
        }
        let n = &mut self.n;
        self.iter.sink_into(move || {
            *n -= 1;
            loop {
                match g.resume(yield) {
                    Yield(()) => {
                        if *n == 0 {
                            return iter::empty().sink_into(g);
                        }
                    }
                    Return(x) => return x
                }
            }
        })
    }
}

Moreover, I’m now wondering if generators could automatically be used to generate code like this, i.e. transform yield x to g.resume(x), alongside the next method, from the same source, i.e. you’d only write:

    fn take(self, mut n: usize) -> impl Iterator<Item=Self::Item> {
        if n == 0 {
            return;
        }
        for x in self {
            n -= 1;
            yield x;
            if n == 0 {
                break;
            }
        }
    }
1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.