Post: Fallible Iterator Adapters

Hey all, I wrote a post surveying all existing adapters on std::iter::Iterator and checked whether they're capable of propagating errors upwards, or if they have a counterpart that can.

The tldr is that out of 60 adapters, 30 adapters take closures, of which 20 don't support fallible operations. Practically that means that for 1/3rd of methods on iterator there is no way to handle errors other than calling unwrap inside of them, which means it's usually better to write a manual loop instead.

This post was motivated by a friend sharing their frustration at wanting to use Iterator::min_by using fallible operations for use in their database, but not being able to do so without unwrapping, which meant they had to use a manual for loop instead. They mentioned this not being the first time they ran into this, which also matches my experiences.

Either way, this seemed relevant for internals since it covers the stdlib. Hopefully this comes in useful. Thanks!

https://blog.yoshuawuyts.com/fallible-iterator-adapters/

9 Likes

For the conservative position, one point in favor of counting methods is that we don't want to bloat trait objects, making everyone pay for codegen of fringe methods that few use. However this doesn't apply to generic trait methods, which are excluded from dyn objects.

2 Likes

If a person wants to perform IO operations while filtering some input, they should be able to, without needing to start over and write manual loops from scratch.

It sounds like this is an axiom for the post's point of view, but I don't necessarily agree. It would be good to hear some justification for this rather than assuming it as an axiom.

The methods on Iterator are intended to provide conveniences for particularly common patterns. I think it's wrong to expect that it would be a goal to cover all variations of use cases with Iterator methods. Loops are good too, and in the language for a reason, and probably undervalued due to largely unjustified hype about Iterator combinators. When it comes to relatively less common use cases like try_min_by I think it's fair to expect the programmer to write a simple loop using ? or express it using try_fold or write their own extension trait and adapter or get it from crates.​io, depending on how often this needs to appear in their code.

Good proxies for how commonly some proposed Iterator method would be useful are whether it's ever been requested in itertools, whether it exists in itertools, and how many uses of the method from itertools appear on GitHub. As far as I can tell none of the missing "fallible counterpart" methods are provided by itertools, and from my attempts at searching their issues I don't think there have been any requests for any of them, which is a useful data point.

11 Likes

This post was motivated by a friend sharing their frustration at wanting to use Iterator::min_by using fallible operations for use in their database, but not being able to do so without unwrapping, which meant they had to use a manual for loop instead.

It's certainly possible to address this by jamming the Iterator trait with more and more niche methods in an attempt to find the optimum between everyone-finding-what-they-need and nobody-being-able-to-find-what-they-need-anymore.

I wonder if it's better to consider instead how to defuse the hype that causes people to feel frustrated when their use case isn't among the most common set.

7 Likes

Wrt to fallability, would it be enough to have a method that returns a result and accepts a closure of what you want to on the iterators assuming everything succeeded?

Something that could be used like:

let biggest_file = my_vec
    .into_iter()
    .map(|file| Ok((get_file_len(&file)?,file)))
    .fallible(|len_file_iter| len_file_iter
        .max_by_key(|(len, _)| len).1
    )?

I'm not sure what the signature would need to be. But here's how I imagine it working:

  • Fallible only works on Iter<Item=Result<,>>
  • Fallible wraps the iter into one that unwraps the result into ok value passing it to the closure.
  • Fallible calls the closure on the wrapped iter
  • The wrapped iterator stops when it encounters an error, returning None, and storing the error.
  • If no error was encountered, return result of the closure, in Ok; otherwise discard the closure result and return the err value in Err.

Granted, I don't think it'll work with lifetimes very well, and doesn't seem like a serious solution. But I wanted to throw it out there. It seems like error throwing vs non-error throwing methods on types are sort of like the red/blue method color problem. And it was fun to throw out a solution to bridge the two lands instead of the usual approach.

Edit: This doesn't work well if there is something inside the fallible closure that is expensive and should only really be called upon the end of successful iteration, but nothing comes to mind except collecting into a container from a fixed size collection. (I can't imaging the wrapped iter preserving a know size)

This seems to be an issue with Option as well, I keep finding people on users.rust-lang.org who have have a problem that can be solved with a single method in Option, but there are so many methods on Option that they didn't even realize that it existed! I think we should be more cautious when adding methods so that we don't overwhelm people with the sheer number of things to look at.

Option is a goldmine filled with trinkets, but only if you know where to look. Said another way, Option's docs is an impenetrable wall of text where only a few can find what they need, and not the one's who need it most.

Iterator is already showing signs of this, and I don't think it is wise to add too many more methods, least it suffers the fate of Option, an impenetrable goldmine.

10 Likes

Then the need is for a structured guide to the methods of Option. This seems to be more of a concept keyword search issue. In other words, the need is for clustering of Option methods perhaps via some sort of multi-dimensional index.

3 Likes

As an aside, there are 5 different methods in Option that could be replaced with a null coalescing operator (which would compose more organically than having to remember 5 different names for the same concept).

But Rust isn't really moving that direction so far.

Agreed.

I've just spent a year working on a C++ codebase, where manual loops or .begin()/.end() pairs were the norm. The only iterator operations I really found myself wanting we sorted and filter.

(filter especially has a very high product of "how useful it is" times "how annoying it is to rewrite it using loops")

1 Like

It sounds like this is an axiom for the post's point of view, but I don't necessarily agree. It would be good to hear some justification for this rather than assuming it as an axiom.

Sure thing. The fact that these methods are part of stdlib indicates they're useful, and the fact that they don't work within fallible (or async) contexts seems like a bug. I'm not really thinking of this as adding new functionality to Iterator, but allowing existing functionality operate in fallible contexts as well.

Allowing the same methods to work in async contexts is already in the process of being solved: the Stream trait is an async version of Iterator and async versions of all Iterator methods exist in the wild. I believe there's general consensus that a lot of this should eventually make its way into std. Which is nice, because it allows existing sync logic to be ported to async mostly by sprinkling in some async/.await.

To answer the question of: "Why not use loops instead?" – if a method has been included in std already then there's likely a good reason for that. Whether it's difficulty of implementation, brevity, or the ability to introduce a common idiom for operations. Writing the same method by hand does not have any of those benefits, and being forced to do so just because a user also wants to handle errors seems like a bug.


I probably should've been clearer in my post: I don't necessarily want to argue that we should add all fallible counterparts to infallible adapters here and now. But more that if we consider these limitations to be bugs, to what degree can we fix it? The most logical step seemed to be to add fallible counterparts to existing methods. And the first step in having that conversation would be to enumerate which adapters don't work in fallible contexts, which was the main objective of my post.

So far I've seen two clear downsides to adding fallible counterparts brought forward in this thread:

  • performance: Methods that don't take generic arguments could end up bloating trait objects.
  • discoverability: Adding more methods could lead to people no longer being able to find what they need.

There are problably more reasons pro / against, and I would love to understand them better.


Before closing out it might be useful to spell out: I've noticed two different perspectives in the thread so far. My attempt to summarize them would be:

  • These methods have already proven to be useful by virtue of inclusion in the stdlib, the fact that they don't work in all contexts is a bug to be fixed.
  • These are new methods that have not yet proven to be useful. The way to prove their use is for example to look at ecosystem implementations and individual demand.

I think both perspectives agree on that only proven methods should go into the stdlib. But we disagree on what that means. I think it's interesting that we get to talk about this from different perspective, and think there are probably good arguments to be made for both viewpoints!

The usual suggestion I see for this is

Any thoughts on why it would be better or worse to use that, vs more methods on Iterator?

4 Likes

@scottmcm: The fallible-iterator crate is more general because it supports a fallible next method in addition to fallible closures; the blog's "fallible counterpart" methods use fallible closures but only work if next is infallible.

Interestingly fallible-iterator used to be only focused on fallible next, and for its first 3 years it still only took infallible closures. Throughout that time it was used in many postgres-related crates. The discussion around generalizing to fallible closures is in sfackler/rust-fallible-iterator#6 -- only one user expressed a need for them in the 6 months that the issue was open, but I think it was accepted because it didn't require duplicating try_ and non-try_ versions of everything since the crate was built around fallible next already anyway.

2 Likes

What methods are those? or, or_else...maybe unwrap and friends?

I was thinking of map_or, map_or_else, unwrap_or, unwrap_or_default, and unwrap_or_else.

or, or_else (and maybe ok_or and ok_or_else) could be replaced with a JS-style short-circuiting OR operator, which isn't quite the same as null coalescing.

Does Haskell have any generic solution to this issue of composing the Iterator and Result monads?

I looked a bit into it, and it appears their "solution" is the ExceptT monad transformer, that however essentially ignores errors, so basically they implement Iterator<Item = T> for ExceptT<Iterator<Item = Result<T, E>>> as if they did a iter.filter_map(|x| x.ok()), which is obviously not what is desirable in general.

But I'm not very familiar with it, so maybe I missed a way to actually get result propagation as in Rust without writing the code yourself.

Hm, trying to visualize this. So, essentially, if there was a null coalescing operator ??, then (where optT is an Option<T> and t is an unwrapped T) optT ?? t evaluates to a T and optT1 ?? optT2 evaluates to an Option<T>? So then the following methods would be equivalent to:

  • opt.unwrap() => opt ?? panic!("tried to unwrap a None value")
  • opt.unwrap_or(1) => opt ?? 1 (but the 1 is, presumably, executed lazily)
  • opt.unwrap_or_else(|| 1) => opt ?? 1
  • opt.unwrap_or_default() => opt ?? Default::default()
  • opt1.or(opt2) => opt1 ?? opt2 (but, again, opt2 is lazy)
  • opt1.or_else(|| opt2) => opt1 ?? opt2
  • opt.map_or(|t| t + 1, 1) => opt.map(|t| t + 1) ?? 1 (lazy)
  • opt.map_or_else(|t| t + 1, || 1) => opt.map(|t| t + 1) ?? 1

What do you mean?

To answer the question about Haskell, you need to specify what you mean with Iterator in the context of Haskell. AFAIK, usually you operate on data structures directly via higher order functions, you don’t need some of the things that Rust’s iterators offer like not iterating the whole list (lazyness got you covered), or mutating things (mutation won’t usually happen in Haskell’s data structures). For performance, turning multiple operations into a single tight loop, there’s optimizations called “fusion” that are utilized by some libraries.

The iteration API for data structures revolves around the type classes (i.e. traits) Functor, Foldable, and Traversable. The main function from Traversable, traverse allows iteration with monadic effects like errors. (Edit: By “iteration” I mean a map operation in the previous sentence; traverse is also aliased as mapM with a (for historic reasons and possibly performance reasons) slightly less general type.) But these type classes don’t offer generic approaches for stuff like filtering. For these kind of operations there is usually an API inspired by the standard libraries Data.List’s API that other list-like types reproduce. These usually include monadic variants of all kinds of operations.

Lets look at the types of filter on lists and its monadic variant filterM in detail:

filter :: (a -> Bool) -> [a] -> [a]

could be translated into Rust syntax

fn filter<A>(f: impl Fn(A) -> bool, l: List<A>) -> List<A>

And filterM (in the module Control.Monad)

filterM :: Monad m => (a -> m Bool) -> [a] -> m [a]

is something you can’t translate into Rust directly, since Rust doesn’t support traits with higher-kinded types (i.e. types that take Parameters) as Self. Monad is such a trait. If Rust did support those, it would perhaps look like

fn filterM<M: Monad, A>(f: impl Fn(A) -> M<bool>, l: List<A>) -> M<List<A>>

In particular you can apply filterM as if it had the type

filterM :: (a -> Either e Bool) -> [a] -> Either e [a]

or

filterM :: (a -> Maybe Bool) -> [a] -> Maybe [a]

since both Either e and Maybe are instances of (i.e. they implement) Monad.

These type signatures could be translated as:

fn filterM<E, A>(f: impl Fn(A) -> Result<bool, E>, l: List<A>) -> Result<List<A>, E>

and

fn filterM<A>(f: impl Fn(A) -> Option<bool>, l: List<A>) -> Option<List<A>>
4 Likes

It would be more like

  • opt1.or(opt2) => opt1 || opt2
  • opt1.or_else(|| opt2) => opt1 || opt2

(the idea being that "??" takes a nullable value on the left side and a raw value on the right side, while "||" takes a nullable on each side)

EDIT: Yes to everything else.

2 Likes

I've just realized we have Option::map_or, which is a little bit concerning. Why ? It's a composition of map + unwrap_or.

1 Like

Any chance r.map_or_else(f,g) optimizes better than r.map(f).unwrap_or_else(g)? Also r.map_or(x,f) vs f.map(f).unwrap_or(x)?

If not, we might reduce their visibility in the docs page for Option and suggest these alternatives.

Related https://github.com/rust-lang/rfcs/pull/2897