Pre-RFC: `fold_ok` is composable internal iteration

Draft RFC: fold_ok is composable internal iteration

This RFC has several motivations, that were developed in the order listed(!)

  1. A common base for searching and folding methods (for internal iteration)
  2. A common base for reversible iterator methods; i.e having fold_ok and its corresponding rfold_ok brings symmetry and equal power to the reversed iterators.
  3. A useful iterator method

Summary

Add the iterator method fold_ok that exends fold to be fallible and short-cirtuiting. fold_ok generalizes the existing methods all, any, find, position and fold, and will be their new common base; iterators can provide one specific traversal implementation for all of them. Iterators can additionally implement rfold_ok to have the same search and fold methods improved through their reversed iterator as well.


cc @Veedrac and thank you for the feedback so far! The draft has an example of the take adaptor and why fold_ok can be general in the error type of the result. This made it really click into place for me, the Result version is versatile and clearly superior.

Previous discussion on the same topic:

6 Likes

Why does it need Self: Sized?

1 Like

fold_ok and similar methods must be excluded from the Iterator trait object using where Self: Sized, because they have generic parameters.

Re: existing methods in the Rust ecosystem, I’m sure you’re aware of Itertools::fold_while. On the plus side for fold_ok, this means it doesn’t conflict with Itertools, which is probably the biggest potential source of a conflict in the ecosystem for this proposal. The minus side is that there’s a strong precedent for fold_while, which makes fold_ok less appealing. The flip side is of course that following that precedent would result in the aforementioned conflict. This latter point could be listed as a drawback of the fold_while alternative in the proposal.

Why would TryInto and TryFrom use a try_ prefix but these methods use an _ok suffix?

It would be good to explain why instead the existing fold() isn’t specialized so that it is short-circuiting for all V, E where Acc = Result<V, E>.

Also, it seems like one should be able to create a combinator that wraps another folding function and causes the result to be the first Err(error) encountered or the final Ok(result). If so, then it seems like the existing fold() could be specialized for the case where such a combinator is passed to it, to make it short-circuiting.

More generally, it seems strange that we need to define so many things like Into and TryInto and fold and fold_ok like this. It would be nice to find a way to abstract these patterns into something that lets us write only the fallable/short-circuiting versions and still get the benefits of the infallible versions.

IIRC, somebody mentioned in the discussion of TryFrom and TryInto using ! as the error type for infallible operations.

Cheers, Brian

Suggestions for better names are welcome. Since it's an operation composed of many parts, I didn't initially think of it as a try- something that either fails or succeeds.

I don't think this is possible? I would be blown away if it were. Regardless, Iterator::fold already exists and is not specified to be short-circuiting, so I don't think that can be changed.

The pre-RFC proposes to base the default implementations of all, find, position, fold etc on fold_ok, so it's along those lines of defining an operation only once and reusing it. The question is of course if we can make it yet more general.

The haskell counterpart is:

foldM :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b

But fold_ok is restricted to just the “Result<_, E> monad”. Do we need to consider something more general than that?

This all sounds like it could be built on top of the “Carrier” mechanism that ? is based on, which would give Rust at least a bit of a foothold into monad-like controlflow abstraction territory without HKT.

I think fold is a good name, meaning that it is worth trying to generalize the existing fold to enable the short-circuit behavior.

Right. I think this is the explanation that should be added to the RFC. Expanding on it, there are many possible functions with the signature (Result<Acc, E>, Self::Item) -> Result<Acc, E>. Only one of them is the short-circuiting behavior. Thus, automatically specializing fold() for such functions to do short-circuiting would prevent any other kinds of folds with the same signature, and would be generally surprising.

Yes, that's good. But like I said above, it would be better to find a way to generalize fold() so that it can do the short-circuiting, without defining a new fold_ok().

In particular, we want to fold using a function (Acc, Self::Item) -> Result<Acc, E> such that the folding is terminated as soon as the function returns Err(e). We already have a fold function that takes a function with signature (Acc, Self::Item) -> Acc. Thus, I'm suggesting we define a combinator that converts a function of type (Acc, Self::Item) -> Result<Acc, E> into a function-like object of type (Result<Acc, E>, Self::Item) -> Result<Acc, E>. Let's say the function is named short_circuit and the function-like object it returns is of type ShortCircuit.

Then, the existing fold() could be specialized for ShortCircuit to end the fold early when the ShortCircuit returns Err(e).

Note that this may not be possible with the current type system of Rust. But, I'd say that it seems worthwhile to improve the type system to make this possible (if necessary) rather than adding a new fold_ok function.

Edit: Even better, if we define such a short_curcuit function that returns such a ShortCircuit function-like object, then the optimizer should often (in theory) be able optimize the function to do short-circuiting as long as the folding function is side-effect-free(-ish), AFAICT.

1 Like

Since my starting point was creating a common base for the internal iterator methods, and then improving forward/reverse correspondance, this is taking me far from the starting point. But so far it seems pretty exciting.

It does seem to work well with carrier, the “fold_ok” implementation would look like this:

/// Starting with initial accumulator `init`, combine the accumulator
/// with each iterator element using the closure `g` until it returns
/// an error or the iterator’s end is reached.
/// The last carrier value is returned.
fn fold_ok<C, G>(&mut self, init: C::Success, mut g: G) -> C
    where Self: Sized,
          G: FnMut(C::Success, Self::Item) -> C,
          C: Carrier,
{
    let mut accum = init;
    while let Some(elt) = self.next() {
        accum = g(accum, elt)?;
    }
    C::from_success(accum)
}

And let’s look at the complicated example with Take that recurses by switching “monad” instance, and that works with carrier as well. We just use it as a generalized Result, though:

struct Take<I> {
    n: usize,
    iter: I,
}

enum TakeStop<L, R> {
    Their(L),
    Our(R),
}

impl<I> Iterator for Take<I> where I: Iterator
{
    fn fold_ok<C, G>(&mut self, init: C::Success, mut g: G) -> C
        where G: FnMut(C::Success, Self::Item) -> C,
              C: Carrier,
    {
        if self.n == 0 {
            return C::from_success(init);
        }
        let n = &mut self.n;
        let result = self.iter.fold_ok(init, move |acc, elt| {
            *n -= 1;
            match g(acc, elt).translate() {
                Err(e) => Err(TakeStop::Their(e)),
                Ok(x) => {
                    if *n == 0 {
                        Err(TakeStop::Our(x))
                    } else {
                        Ok(x)
                    }
                }
            }
        });
        match result {
            Err(TakeStop::Their(e)) => C::from_error(e),
            Err(TakeStop::Our(x)) => C::from_success(x),
            Ok(x) => C::from_success(x)
        }
    }
}

Ok. It's clear why it wasn't there already: this didn't occur to me as a possibility at all! In fact, I still prefer working code concepts for wild ideas rather than discussing what might be possible.

Ok, I was tipped off that there is a new design for “Carrier” in the works. I’ve looked at (comment) and it does not yet seem to have the equivalent of C::from_success(x) as used in the code above, so I don’t see a nice implementation of “fold_ok” in terms of the new formulation yet.

This is an exciting proposal!

I see that with this proposal, internal iterators end up being a subset of external iterators; a next() implementation is still required. To be entirely honest, one of my own primary motivations for using internal iteration is the simple fact that implementing next() can be challenging! In the context of writing an application or computational code, one may often1 run into cases where there is a loop which ought to be abstracted out, but which may have a long runtime or a potentially unbounded number of iterations (preventing e.g. a Vec-returning cop-out):

fn each_simulation_step<F:FnMut(&State)>
        (initial_state: State, mut callback: F) -> State {
    // ...implementation which would be awfully nasty to turn into an Iterator
    // but is simple to turn into an internal iterator
}

If fold_ok was on a separate InternalIterator trait which did not require Iterator, then each_simulation_step could perhaps be rewritten to impl that trait, and code using it could benefit from the folds and searches.

However, I’m not sure how it would be possible to allow the existing methods on Iterator to be implemented in terms of an InternalIterator trait in a backwards compatible fashion, which likely spells the death for this alternative. Certainly, creating a common base for the existing methods on Iterator is a far more important goal, and not one to compromise!


1: I am, of course, speaking entirely for myself.

.next() can actually be implemented in terms of fold_ok (ugly and not important exactly how, but like this: .fold_ok((), |(), elt| Err(elt)).err())

So fold_ok requires updating the iterator’s state the same way. Implementing fold is much easier, because it consumes the iterator with no requirement to remember progress (but it must start off from the iterator’s current state, of course).

In this sense, I don’t think this is the simpler internal iteration that you are looking for, but maybe fold is.

I was about to ask a dumb question about why fold_ok takes a mutable receiver when there is by_ref, but then I realized the extremely obvious answer. Now I'll just leave this paragraph here for people Ctrl+F'ing "by_ref" to give them a hint to just think about it a little bit. :slight_smile:

Of course, even if this could be changed, it wouldn't help my use case, as it is the very capability itself of being able to use an iterator twice which is fundamentally at odds with having a simple implementation.

Edit: Hm, I spoke too soon. The "obvious answer" I was alluding to was that you cannot recover the modified state (or so I figured). I somehow concluded this on the basis that fold_ok doesn't return the modified iterator; but this reasoning is clearly nonsense, because the same circumstances also apply to next()!

Edit 2: ...and that's why next() also has a mutable receiver. Right. I think I should shut up now. :stuck_out_tongue:

An open question could indeed be if fold_ok should use a self receiver.

  • Advantage: Much easier to implement, no need to keep partially iterated state around
  • Disadvantage: Cannot use the specific implementation of fold_ok through &mut, cannot be the common base for any, all, find etc which use a &mut receiver.

My driving purpose for this RFC is as a largely silent way to make every other iterator method faster. I don’t even particularly care how easy it is to use, just how easy it is to write for a particular iterator type, because I don’t intend it to be used that much directly. Thus I don’t think it’s a good idea to make this more general, like extending it to the carrier trait or wrapping it around fold in some magic way, because those concerns cater to the user, not the implementor.

Similarly, I want this to be implemented on &mut self because if it isn’t there are a lot of things it can’t make faster. Ideally I even want the method to be such that if we ever get await/async style generators, they can implement this themselves.

5 Likes

My motivation was more or less the same (as indicated), but I embraced the Result formulation since it fits very well, and I think the same thing about the std::ops::Carrier formulation now, it just fits.

Now it might not be guaranteed that Carrier takes that path, but if it has an impl where there is no error branch, then that resolves to a non-short-circuiting fold by itself, kind of how @briansmith described it.

You don’t particularly gain anything from Carrier, but you lose a lot of helpful methods. For example, I might want to write take as, I dunno,

    let result = self.iter.fold_ok(init, move |acc, elt| {
        *n -= 1;
        match (g(acc, elt), *n > 0) {
            (Ok(acc), true) => Ok(acc),
            (last, _) => Err(last),
        }
    });
    result.or_else(|last| last)

Using Carrier would prevent this kind of composition.

If you look at post #9, I don’t think you lose much. fold_ok is generic over the carrier, so if you want to run a fold_ok with Result being the carrier, it implements that.