Recently I needed to return multiple elements from a flat_map closure, so I used once(x).chain(once(y)), which turned out to be slower than returning vec![x, y] instead. This surprised me, given that std::iter::Chain has fold and try_fold implementations, which should make internal iteration fast.
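For reference, the two variants look roughly like this (a minimal sketch; the actual element types and surrounding code in my crate are different):

```rust
use std::iter::once;

// Variant 1: expand each element into two items with chained `once` iterators.
fn pairs_chain(input: &[u32]) -> Vec<u32> {
    input
        .iter()
        .flat_map(|&x| once(x).chain(once(x + 1)))
        .collect()
}

// Variant 2: same expansion, but allocating a small Vec per element.
fn pairs_vec(input: &[u32]) -> Vec<u32> {
    input.iter().flat_map(|&x| vec![x, x + 1]).collect()
}
```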
So I copied the implementations of the std::iter::Once and std::iter::Chain types into my crate and used those instead, and without changing them at all, the chain version became as fast as using a vector. Weirdly enough, using either of the standard library's Once or Chain types prevents the speedup: I have to copy both types into my own crate for it to be fast.
Here are some benchmarks:
test benches::bench ... bench: 3,507,853 ns/iter (+/- 433,627)
test benches::bench_my_chain ... bench: 3,502,656 ns/iter (+/- 384,330)
test benches::bench_my_once ... bench: 3,482,771 ns/iter (+/- 373,445)
test benches::bench_my_both ... bench: 635,188 ns/iter (+/- 82,182)
test benches::bench_vec ... bench: 641,547 ns/iter (+/- 42,438)
Here’s the benchmarking code. I’ve excluded the implementations of irrelevant traits and methods of the Chain and Once types for brevity; they did not affect the benchmarks. Note that you’ll have to run them locally, since the playground doesn’t support benchmarks.
Any idea how this was able to cause a speedup? The types themselves seem to be fine, but something is stopping an optimization from happening. I thought that maybe the issue was that some methods weren’t being inlined, but adding the #[inline] attribute to the methods of Chain and Once that are being called didn’t change the results.