I found that .map(f).collect::<Vec<T>>() is as fast as the loop version, but .map(f).try_collect() is not.
The code:
#![feature(iterator_try_collect)]
#![feature(test)]

extern crate test;
use test::Bencher;

#[inline(never)]
fn handle(v: i32) -> Option<i32> {
    Some(v)
}

fn f1(v: &Vec<i32>) -> Option<Vec<i32>> {
    v.into_iter().map(|x| handle(*x)).try_collect()
}

fn f2(v: &Vec<i32>) -> Option<Vec<i32>> {
    let mut result = Vec::with_capacity(v.len());
    for val in v {
        let res = handle(*val)?;
        result.push(res);
    }
    Some(result)
}

static BENCH_SIZE: usize = 100;

#[bench]
fn bench_f1(b: &mut Bencher) {
    let v = test::black_box(vec![1; BENCH_SIZE]);
    b.iter(|| {
        f1(&v);
    })
}

#[bench]
fn bench_f2(b: &mut Bencher) {
    let v = test::black_box(vec![1; BENCH_SIZE]);
    b.iter(|| {
        f2(&v);
    })
}
The results on my macOS machine:
test bench_f1 ... bench: 403 ns/iter (+/- 244)
test bench_f2 ... bench: 158 ns/iter (+/- 82)
After some investigation, I found that the implementation of collect::<Result<_, _>>() (which try_collect also goes through) is pessimistic: it assumes the residual is a likely path, so the Vec does not use its TrustedLen specialization. In detail, the internal iterator GenericShunt can't implement TrustedLen, and the lower bound of its size_hint is always 0 (the very first element may be an error). As a result, the Vec can't allocate for and write all the items at once.
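To make the size_hint point concrete, here is a simplified sketch of a shunt-style adapter for Option items. It is not the standard library's code: Shunt, hit_none, and collect_options are hypothetical names used only for illustration, and the real GenericShunt is generic over any Try type. The sketch only shows why the lower bound has to be 0: the adapter cannot promise a single item, because the next element may be the one that stops iteration.

```rust
/// Simplified, hypothetical stand-in for a shunt-style adapter.
/// It yields the unwrapped values and records whether a `None` was seen.
struct Shunt<'a, I> {
    iter: I,
    hit_none: &'a mut bool,
}

impl<I, T> Iterator for Shunt<'_, I>
where
    I: Iterator<Item = Option<T>>,
{
    type Item = T;

    fn next(&mut self) -> Option<T> {
        match self.iter.next()? {
            Some(value) => Some(value),
            None => {
                // Remember the failure and end iteration early.
                *self.hit_none = true;
                None
            }
        }
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        if *self.hit_none {
            (0, Some(0))
        } else {
            // Any remaining element could be a `None`, so the only safe
            // lower bound is 0; the upper bound can be passed through.
            let (_, upper) = self.iter.size_hint();
            (0, upper)
        }
    }
}

/// Collects through the adapter, mirroring how a fallible collect behaves.
fn collect_options<I, T>(iter: I) -> Option<Vec<T>>
where
    I: Iterator<Item = Option<T>>,
{
    let mut hit_none = false;
    let out: Vec<T> = Shunt { iter, hit_none: &mut hit_none }.collect();
    if hit_none { None } else { Some(out) }
}
```

Wrapping an iterator over 100 Some values in this adapter reports a size_hint of (0, Some(100)), so a consumer that relies on the lower bound cannot size its buffer up front even though every item succeeds.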
But Try::Output is the likely path in almost all cases. Should this be implemented as an optimistic version instead? Otherwise we have to write the loop out manually.
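As a stopgap, the manual loop from f2 can be wrapped in a small reusable helper. The following is a minimal sketch, assuming Option items only; try_collect_prealloc is a hypothetical name, not a standard library function. It is optimistic in the sense discussed above: it reserves space based on the source iterator's own lower bound, on the assumption that every item will succeed.

```rust
/// Hypothetical helper: collects `Option<T>` items into `Option<Vec<T>>`,
/// reserving capacity up front as if no item will fail.
fn try_collect_prealloc<I, T>(iter: I) -> Option<Vec<T>>
where
    I: IntoIterator<Item = Option<T>>,
{
    let iter = iter.into_iter();
    // Use the lower bound of the *source* iterator, which for slices,
    // Vecs, and `map` over them equals the exact length.
    let (lower, _) = iter.size_hint();
    let mut out = Vec::with_capacity(lower);
    for item in iter {
        // Bail out on the first `None`; the reserved space is dropped.
        out.push(item?);
    }
    Some(out)
}
```

For example, try_collect_prealloc(v.iter().map(|x| handle(*x))) would replace the body of f1. The cost of the optimism is that an early failure throws away an allocation sized for the whole input, which is the trade-off the question above is about.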