An optimistic version of collect::<Result<T, E>>?

TennyZhuang · October 18, 2022, 8:01am

I found that .map(f).collect::<Vec<T>>() is as fast as the hand-written loop, but .map(f).try_collect() is not.

The code:

#![feature(iterator_try_collect)]
#![feature(test)]
extern crate test;
use test::Bencher;

#[inline(never)]
fn handle(v: i32) -> Option<i32> {
    Some(v)
}

fn f1(v: &Vec<i32>) -> Option<Vec<i32>> {
    v.into_iter().map(|x| handle(*x)).try_collect()
}

fn f2(v: &Vec<i32>) -> Option<Vec<i32>> {
    let mut result = Vec::with_capacity(v.len());
    for val in v {
        let res = handle(*val)?;
        result.push(res);
    }
    Some(result)
}

static BENCH_SIZE: usize = 100;

#[bench]
fn bench_f1(b: &mut Bencher) {
    let v = test::black_box(vec![1; BENCH_SIZE]);
    b.iter(|| {
        f1(&v);
    })
}

#[bench]
fn bench_f2(b: &mut Bencher) {
    let v = test::black_box(vec![1; BENCH_SIZE]);
    b.iter(|| {
        f2(&v);
    })
}

The results on my macOS machine:

test bench_f1 ... bench:         403 ns/iter (+/- 244)
test bench_f2 ... bench:         158 ns/iter (+/- 82)

After some investigation, I found that the implementation of collect::<Result<_, _>>() is pessimistic: it assumes the residual is a likely path, so the Vec does not use its TrustedLen-specialized path. In detail, the internal iterator GenericShunt can't implement TrustedLen, and the lower bound of its size_hint is always 0 (an error may occur on the first element). As a result, the Vec can't reserve capacity for all items up front.

But we all know that Try::Output is the likely path in almost all cases. Should we implement collect optimistically instead? Otherwise we have to write out the loop manually.
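To make the size_hint point concrete, here is a minimal sketch that records the hint a collection actually receives. HintProbe, plain_hint and fallible_hint are hypothetical names introduced only for this illustration, and the observed values depend on standard-library implementation details rather than on any documented guarantee.

```rust
use std::iter::FromIterator;

/// Hypothetical probe type: records the size_hint handed to it by `collect`.
#[derive(Debug, PartialEq)]
struct HintProbe {
    lower: usize,
    upper: Option<usize>,
}

impl<T> FromIterator<T> for HintProbe {
    fn from_iter<I: IntoIterator<Item = T>>(iter: I) -> Self {
        let iter = iter.into_iter();
        // Capture the hint before consuming anything, as Vec would.
        let (lower, upper) = iter.size_hint();
        // Drain the iterator so the adapter runs to completion.
        iter.for_each(drop);
        HintProbe { lower, upper }
    }
}

/// Same shape as the `handle` in the benchmark: always succeeds.
fn handle(v: i32) -> Option<i32> {
    Some(v)
}

/// Infallible collect: the probe sees the hint of `Map<slice::Iter>`.
fn plain_hint(v: &[i32]) -> HintProbe {
    v.iter().map(|&x| x).collect()
}

/// Fallible collect: the probe sees the hint of the internal shunt adapter.
fn fallible_hint(v: &[i32]) -> Option<HintProbe> {
    v.iter().map(|&x| handle(x)).collect()
}
```

With a three-element slice, I would expect plain_hint to report a lower bound of 3, while fallible_hint reports a lower bound of 0 with the upper bound still at 3, which is why the Vec cannot reserve up front in the fallible case.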

steffahn · October 18, 2022, 8:19am

Or prepare the Vec with the desired capacity and use extend?
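One way this suggestion could look on stable Rust is sketched below. It is only an illustration: collect_with_extend is a hypothetical name, and handle is a stand-in that fails on negative input (an assumption, so that the failure path can be exercised). The sketch uses map_while to stop at the first None and a flag to remember that a failure happened.

```rust
/// Stand-in for the thread's `handle`; failing on negative input is an
/// assumption made only for this example.
fn handle(v: i32) -> Option<i32> {
    if v >= 0 {
        Some(v)
    } else {
        None
    }
}

/// Reserve the full capacity first, then extend until the first failure.
fn collect_with_extend(v: &[i32]) -> Option<Vec<i32>> {
    let mut result = Vec::with_capacity(v.len());
    let mut failed = false;
    result.extend(v.iter().map_while(|&x| {
        let out = handle(x);
        // Remember whether we stopped because of a failure.
        failed |= out.is_none();
        out
    }));
    if failed {
        None
    } else {
        Some(result)
    }
}
```

Because the capacity is reserved before extend runs, the low size_hint of the adapter chain no longer causes reallocation; the cost is the extra flag and a less direct control flow than the explicit loop.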

RinChanNOWWW · October 18, 2022, 12:30pm

Related issue: rust-lang/rust issue #48994, "Collecting into a Result<Vec<_>> doesn't reserve the capacity in advance"

scottmcm · October 18, 2022, 4:06pm

Only the lower-bound of the size_hint is really ever used today: Is .size_hint().1 ever used?

Personally, I think the answer is that size_hint is the wrong API for reserving -- there should be something like reserve_guess with weaker guarantees, so that it can either overestimate or underestimate depending on what the adapter expects to happen.

TennyZhuang · October 18, 2022, 11:13pm

reserve_guess is very hard for filter to implement: there are too many cases, so no single guess is good enough.

I’d prefer to implement two versions, collect_optimistic and collect_pessimistic, which would use size_hint().0 and size_hint().1, and leave collect as an opaque implementation (perhaps even leaving the choice to PGO?). On a critical path, users can choose the version manually.

collect could use a specialized implementation for Result<I, E> that forwards to the optimistic version, and keep the original behavior for other types.

We can also introduce a new unsafe trait TrustedUpperBound for optimization.
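As a rough sketch of what an optimistic variant might do for Result, the free function below reserves for the success case before looping. The name try_collect_optimistic and the MAX_PREALLOC cap are hypothetical, and reading "optimistic" as "reserve by the inner iterator's upper bound" is my assumption, not something the proposal above specifies.

```rust
/// Hypothetical cap so a huge or missing upper bound cannot trigger an
/// enormous allocation.
const MAX_PREALLOC: usize = 1 << 16;

/// Collect into a Vec, assuming every item will be Ok when reserving.
fn try_collect_optimistic<I, T, E>(iter: I) -> Result<Vec<T>, E>
where
    I: IntoIterator<Item = Result<T, E>>,
{
    let iter = iter.into_iter();
    // The inner iterator still reports its real bounds here, because no
    // shunt adapter sits between it and this function.
    let (lower, upper) = iter.size_hint();
    let guess = upper.unwrap_or(lower).min(MAX_PREALLOC);
    let mut out = Vec::with_capacity(guess);
    for item in iter {
        // Bail out on the first error, dropping what was collected so far.
        out.push(item?);
    }
    Ok(out)
}
```

The trade-off is the one discussed in this thread: when errors are rare the single allocation wins, but an early error wastes the reserved memory.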

toc · October 19, 2022, 3:03am

Perhaps, as a stopgap, a with_size_hint function could be added to Iterator to inject the appropriate hints. This would be useful for me in typically unguessable cases like filter, where I sometimes know that few (or most) items will be filtered out.
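For illustration only, such an adapter can already be approximated in user code. The sketch below is not an existing std API: WithSizeHint, the free function with_size_hint, and keep_evens are hypothetical names, and the "about half survive" guess is an assumption for the example.

```rust
/// Hypothetical adapter that overrides the wrapped iterator's size_hint.
struct WithSizeHint<I> {
    inner: I,
    hint: (usize, Option<usize>),
}

impl<I: Iterator> Iterator for WithSizeHint<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<Self::Item> {
        let item = self.inner.next();
        if item.is_some() {
            // Shrink the injected hint as items are yielded.
            self.hint.0 = self.hint.0.saturating_sub(1);
            self.hint.1 = self.hint.1.map(|u| u.saturating_sub(1));
        }
        item
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        self.hint
    }
}

/// Wrap `inner` so that consumers see the caller-supplied bounds.
fn with_size_hint<I: Iterator>(
    inner: I,
    lower: usize,
    upper: Option<usize>,
) -> WithSizeHint<I> {
    WithSizeHint { inner, hint: (lower, upper) }
}

/// Example use: the caller guesses that about half the items survive.
fn keep_evens(v: &[i32]) -> Vec<i32> {
    let expected = v.len() / 2;
    let evens = v.iter().copied().filter(|x| x % 2 == 0);
    with_size_hint(evens, expected, Some(v.len())).collect()
}
```

A wrong hint here is only a performance problem: safe consumers may not rely on size_hint for memory safety, so the result is the same either way.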

scottmcm · October 19, 2022, 4:42am

My thoughts on with_size_hint: https://github.com/rust-lang/rust/issues/68995#issuecomment-588569824.

TennyZhuang · November 26, 2022, 5:21am

I don’t think with_size_hint is a good idea. Sometimes the size is derived from complicated logic, e.g. repeat or array_chunks. If we have to provide the size_hint value explicitly, that is a violation of DRY. Personally, I’d prefer to let users specify optimistic or pessimistic.

system · February 24, 2023, 5:21am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
