[Pre-RFC] Iterator adaptor for handling intermediate Results

iox · May 28, 2019, 1:23am

When writing iterator chains, I often run into awkwardness in handling fallible steps along the chain. For example, I often want to write something like the following:

let files = vec!["foo.txt", "bar.txt", "baz.txt"];
let sum: i64 = files
    .into_iter()
    .map(File::open)
    // do something to handle unopened files
    .map(BufReader::new)
    .flat_map(BufReader::lines)
    // do something to handle unreadable lines
    .map(|line| line.parse::<i64>())
    // do something to handle bad parses
    .sum();
println!("{}", sum);

If I want to ignore errors entirely, this is easy: all of the // do something lines can be .filter_map(Result::ok). But I find that the behavior I want is often to iterate down the chain, stopping at the first Err value, and returning that error. The best idiom I’ve seen for this is the following:

let mut open_err = Ok(());
let mut read_err = Ok(());
let mut parse_err = Ok(());
let files = vec!["foo.txt", "bar.txt", "baz.txt"];
let sum: i64 = files
    .into_iter()
    .map(File::open)
    .map(|res| res.map_err(|e| open_err = Err(e)))
    .filter_map(Result::ok)
    .map(BufReader::new)
    .flat_map(BufReader::lines)
    .map(|res| res.map_err(|e| read_err = Err(e)))
    .filter_map(Result::ok)
    .map(|line| line.parse::<i64>())
    .map(|res| res.map_err(|e| parse_err = Err(e)))
    .filter_map(Result::ok)
    .sum();
open_err?;
read_err?;
parse_err?;
println!("{}", sum);
Ok(())

There are a few problems with this approach. First and foremost, you have to manually check the Results, and if you don’t you print an incorrect partial sum. Secondly, this error handling pattern is pretty hard to discover (I’m having trouble just finding the blog post where I originally found it), and it’s pretty noisy. I don’t have a good solution to the first issue (and would love to see one), but to help noise and discoverability I propose an iterator adaptor PutErr which does the .map(|res| res.map_err(|e| parse_err = Err(e))).filter_map(Result::ok) part. With this, the above chain looks like this:

let sum: i64 = files
    .into_iter()
    .map(File::open)
    .put_err(&mut open_err)
    .map(BufReader::new)
    .flat_map(BufReader::lines)
    .put_err(&mut read_err)
    .map(|line| line.parse::<i64>())
    .put_err(&mut parse_err)
    .sum();

I’ve implemented this as an addition to bluss’s itertools crate here, but before submitting it there I thought I would see if the libs team had any interest in adding something like this to the standard library. So, what do you think? My biggest concern is the need to manually check the Results, so I’d be interested if anyone has come up with a better idiom there.

CAD97 · May 28, 2019, 2:21am

Unfortunately, this pattern falls short of one thing that .collect<Result<Vec<_>, _>>() gets right: actually stopping on the first error. In the above you go through every item and then check for errors.

The imperative version (unchecked) for comparison:

// whatever accumulation you care about
let mut acc = Accumulator::default();
for file in files {
    let file = File::open(file)?;
    let reader = BufReader::new(file)?;
    for line in reader.lines() {
        let line = line?;
        let val: Value = line.parse()?;
        acc += val;
    }
}

With temporary vecs, which serializes the process rather than pipelines it (unchecked):

let files: Vec<_> = files.into_iter()
    .map(File::open).collect::<Result<_, _>>()?;
let readers: Vec<_> = files.into_iter()
    .map(BufReader::new).collect::<Result<_, _>>()?;
let lines: Vec<_> = readers.into_iter()
    .flat_map(BufRead::lines).collect::<Result<_, _>>()?;
let vals: Vec<_> = lines.into_iter()
    .map(FromStr::parse).collect::<Result<_, _>>()?;
let acc: Accumulator = vals.into_iter().collect();

Version that maintains the errors but almost certainly needs (a lot) more type hints (unchecked):

let values: Vec<_> = files.into_iter()
    .map(|file| Ok(File::open(file)?))
    .map(|file| Ok(BufReader::new(file?)?))
    .map(|reader| Ok(reader?.lines()))
    .map(|lines| Ok(lines?.map(|line| Ok(line?.parse()?))))
    .collect::<Result<_, _>>()?;
// There's still a level of nesting to remove
let acc: Accumulator = values.into_iter()
    .flatten().collect::<Result<_, _>>()?;

Conclusion: I think we could benefit from Iterator::and_then<B>(self, impl FnOnce(Item::Ok) -> Result<B, impl Into<Item::Err>>) where Item = Result or similar that applies the Result::and_then operation to all results in the stream along with a ?-style conversion, as well as a flatmap variant. As well as a fairly obvious .map_ok() as well.

I’m just not certain whether these “specializations” can be provided in a nice manner by core. Here’s how I think it could look (this should be type hinted enough):

let acc: Accumulator = files.into_iter()
    .map(File::open)
    .map_err(FnError::into)
    .and_then(BufReader::new)
    .flatand_then(BufRead::lines)
    .and_then(FromStr::parse)
    .collect::<Result<_, _>>()?;
// or also
let acc: Accumulator = files.into_iter()
    .map(|file| Ok(File::open(file)?))
    // ..
    .collect::<Result<_, FnError>>()?;

newpavlov · May 28, 2019, 2:32am

I believe that ideally cases like this should be handled by generators, i.e. you create a Generator<Yield=T, Return=io::Result<()>> and use combinators on it. The semantic difference between such generators and Iterator<Item=io::Resutl<T>> was briefly discussed in this thread.

CAD97 · May 28, 2019, 2:43am

Generators are a long way off and still haven’t been RFC’d (the one you might be thinking of in rebuttal to that statement is an eRFC), so there’s no guarantee they’ll even ever make it to the surface language. (Though there definitely is interest in bringing them here.) There’s still value in discussing how to do things today with the Iterator trait. We have Result<impl FromIterator, _>: FromIterator today for just this short circuit behavior case.

BufRead::lines at least is already -> impl Iterator<Item=io::Result>, so we have to live with that. (I think ideally it’d be -> impl Generator<Yield=Result<String, (UTF8Error, Vec<u8>)>, Return=io::Result>, though?)

iox · May 28, 2019, 2:48am

Unfortunately, this pattern falls short of one thing that .collect<Result<Vec<_>, _>>() gets right: actually stopping on the first error. In the above you go through every item and then check for errors.

Oh, good catch. My example implementation does the "right thing" here by returning None on the first Err value, but I had missed that .map(|res| res.map_err(|e| parse_err = Err(e))).filter_map(Result::ok) does not do that.

newpavlov · May 28, 2019, 2:54am

My argument is that we probably should not introduce half-baked ad hoc solutions into std while there are better alternatives on (far away) horizon. Either they will be deprecated on extended Generator introduction, or in the worst case scenario they will conflict with Generator in some way. Yes, ad hoc solutions are easier in the short term, but in the long term they are hazardous and will relieve some pressure which could power work on generators.

I think it should be possible to create an iterator wrapper which will implement short-circuiting behavior in a separate crate.

CAD97 · May 28, 2019, 3:10am

Here's an example of an Iteresult::and_then Iterator extension. A flat version shouldn't be too different, just more type variables.

I don't really see a future for Rust where Iterator ⟺ Generator<Yield=Item, Return=()> currently due to all of the type resolution problems it could cause, and it'd be very strange for it to be implied one way and not the other. I can see interop via IntoIterator/IntoGenerator, but I doubt there'll exist blanket impls unless a default impl in both directions are tenable.

And as BufRead::lines shows, Iterator<Item=Result> is still useful, as trivially "transient" errors exist in streams, but sometimes you do want to stop on the first error, so it is still worth providing Iterator<Item=Result> utilities.

CAD97 · May 28, 2019, 3:14am

Itertools::map_results is “map_ok”. An issue results to track “and_then” and “and_then_results” functions.

EDIT: Put an implementation of Itertools::and_then on the linked issue.

HeroicKatora · May 28, 2019, 3:35am

I think we can nicely abstract away the collection into a predetermined place already. And don’t have to restrict ourselves to the iterator combinators given. Maybe this is something that the iterator tools library is interested in adding, I might have used the function below once or twice already as well or would have if it had existed.

// Ugh, explicit lifetime bounds but we borrow the result collector.
fn split_err<'a, I: 'a, E>(
    iter: impl Iterator<Item=Result<I, E>> + 'a,
    err: &'a mut impl Extend<E>,
) -> impl Iterator<Item=I> + 'a {
    iter
        .map(move |res| res.map_err(|e| err.extend(iter::once(e))))
        .filter_map(Result::ok)
}

If this is a trait function for itertools this would work nicely. Although Result doesn’t implement Extend (by the way, maybe it should, same with Option) but we can just invent our own type.

enum Collector<T> {
    None, // Default here.
    Last(T),
}

impl<T> Extend<T> for Collector<T> {
    fn extend<A>(&mut self, iter: A) where A: IntoIterator<Item=T> {
        for item in iter.into_iter() { *self = Collector::Last(item) }
    }
}

Full code then exactly as wanted, but avoiding newly introducing a trait while keeping it generic over keeping only the last error or collecting all errors into a Vec<_>.

let mut open_err = Collector::None;
let mut read_err = Collector::None;
let mut parse_err = Collector::None;
let sum: i64 = files
    .into_iter()
    .map(File::open)
    .split_err(&mut open_err)
    .map(BufReader::new)
    .flat_map(BufReader::lines)
    .split_err(&mut read_err)
    .map(|line| line.parse::<i64>())
    .split_err(&mut parse_err)
    .sum();

iox · May 28, 2019, 4:17am

I think the big difference between put_err and and_then is that the latter requires each subsequent adaptor method to deal with the fact that it is being given a Result. As I see it, the benefit there is that you can’t forget to deal with the Err case, but the downside is that it hurts composability in the happy path. In the linked example, this is evident in the need to use the more awkward try_fold instead of sum. Personally, I would lean towards enabling more composability with put_err, but I can definitely see the argument for having and_then instead.

HeroicKatora · May 28, 2019, 5:46am

For the sake of argument, here is an implementation on the playground. It uses the trait bound of impl Extend<E> for the argument to the error sink which I think is arguably much more generic since it permits collecting errors into a Vec<_> instead of only remembering the last one. Sadly not implemented for Result<_, E> but that has a simple replacement.

cuviper · May 28, 2019, 6:42am

FWIW, there is an impl Sum for Result too.

system · August 26, 2019, 6:50am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Refinement: Iterator adaptor for handling intermediate Results libs	13	1125	June 15, 2022
Post: Fallible Iterator Adapters	20	2536	July 16, 2020
impl<T, E, Ts, Es> FromIterator<Result<T, E>> for Result<Ts, Es> libs	7	373	March 6, 2024
Working with fallible iterators libs	9	1144	November 4, 2022
Idea: Fallible iterator mapping with `try_map` language design	11	4422	May 17, 2022

[Pre-RFC] Iterator adaptor for handling intermediate Results

Related topics