Recently I needed to return multiple elements from a
flat_map closure so I used
once(x).chain(once(y)), which turned out to be slower than returning
vec![x, y] instead. Surprising, given that these iterator types have custom
try_fold implementations, so internal iteration should be fast.
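For context, here is a minimal stand-in for the two versions I'm comparing (the closure body and element type are invented for illustration; my real code differs):

```rust
use std::iter::once;

// The slow version: two `once` iterators glued together with `chain`.
fn sum_chain(input: &[u64]) -> u64 {
    input
        .iter()
        .flat_map(|&x| once(x).chain(once(x * 2)))
        .sum()
}

// The fast version: a heap-allocated two-element Vec per input element.
fn sum_vec(input: &[u64]) -> u64 {
    input
        .iter()
        .flat_map(|&x| vec![x, x * 2])
        .sum()
}

fn main() {
    let data: Vec<u64> = (0..1_000).collect();
    // Both versions agree on the result; only their speed differs.
    assert_eq!(sum_chain(&data), sum_vec(&data));
    println!("{}", sum_chain(&data)); // → 1498500
}
```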
So I copied the implementations of the
std::iter::{Once, Chain} types to my crate and used those instead, and without changing them at all, that made the chain version as fast as using a vector. Weirdly enough, using either of the standard library's two types will prevent the speedup – I have to copy both of them to my own crate for it to be fast.
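Condensed, the copied types look roughly like this (a sketch, not the exact std source – the real versions also implement DoubleEndedIterator, try_fold, and friends, which I copied verbatim):

```rust
// Stand-in for the copied std::iter::Once: wraps Option's IntoIter.
pub struct MyOnce<T> {
    inner: std::option::IntoIter<T>,
}

pub fn my_once<T>(value: T) -> MyOnce<T> {
    MyOnce { inner: Some(value).into_iter() }
}

impl<T> Iterator for MyOnce<T> {
    type Item = T;
    fn next(&mut self) -> Option<T> {
        self.inner.next()
    }
    fn size_hint(&self) -> (usize, Option<usize>) {
        self.inner.size_hint()
    }
}

// Stand-in for the copied std::iter::Chain: each side sits in an
// Option and is set to None once exhausted, so it is never polled again.
pub struct MyChain<A, B> {
    a: Option<A>,
    b: Option<B>,
}

pub fn my_chain<A, B>(a: A, b: B) -> MyChain<A, B> {
    MyChain { a: Some(a), b: Some(b) }
}

impl<A, B> Iterator for MyChain<A, B>
where
    A: Iterator,
    B: Iterator<Item = A::Item>,
{
    type Item = A::Item;
    fn next(&mut self) -> Option<Self::Item> {
        if let Some(a) = &mut self.a {
            if let item @ Some(_) = a.next() {
                return item;
            }
            self.a = None;
        }
        self.b.as_mut()?.next()
    }
}

fn main() {
    // Equivalent of once(x).chain(once(y)), built from the copied types.
    let v: Vec<i32> = my_chain(my_once(1), my_once(2)).collect();
    assert_eq!(v, vec![1, 2]);
}
```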
Here are some benchmarks:
test benches::bench          ... bench:   3,507,853 ns/iter (+/- 433,627)
test benches::bench_my_chain ... bench:   3,502,656 ns/iter (+/- 384,330)
test benches::bench_my_once  ... bench:   3,482,771 ns/iter (+/- 373,445)
test benches::bench_my_both  ... bench:     635,188 ns/iter (+/- 82,182)
test benches::bench_vec      ... bench:     641,547 ns/iter (+/- 42,438)
Here’s the benchmarking code. I’ve excluded the implementations of irrelevant traits and methods of the
Once and Chain types for brevity; they did not affect the benchmarks. Note that you’ll have to run the benchmarks locally, since the playground doesn’t support them.
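For anyone who wants to poke at this without nightly #[bench], here's a rough stand-in (not my actual benchmark – invented workload, plain Instant timing, std types only) that has the same shape as the comparison:

```rust
use std::iter::once;
use std::time::Instant;

// Chain-based version, analogous to bench's std once/chain case.
fn chain_sum(data: &[u64]) -> u64 {
    data.iter().flat_map(|&x| once(x).chain(once(x + 1))).sum()
}

// Vec-based version, analogous to bench_vec.
fn vec_sum(data: &[u64]) -> u64 {
    data.iter().flat_map(|&x| vec![x, x + 1]).sum()
}

fn main() {
    let data: Vec<u64> = (0..1_000_000).collect();
    for (name, f) in [("chain", chain_sum as fn(&[u64]) -> u64), ("vec", vec_sum)] {
        let start = Instant::now();
        let sum = f(&data);
        println!("{name}: sum = {sum}, took {:?}", start.elapsed());
    }
    assert_eq!(chain_sum(&data), vec_sum(&data));
}
```

Wall-clock timing like this is much noisier than the test harness, so treat it only as a sanity check.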
Any idea how this was able to cause a speedup? The types themselves seem to be fine, but something is stopping an optimization from happening. I thought that maybe the issue was that some methods weren’t being inlined, but adding the
#[inline] attribute to the methods of
Once that are being called didn’t change the results.
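Concretely, this is the kind of change I tried (sketch; MyOnce stands in for my copied Once), which made no difference to the numbers:

```rust
// Simplified copy of Once for illustration, with #[inline] added
// to the method that the flat_map loop actually calls.
pub struct MyOnce<T>(Option<T>);

pub fn my_once<T>(value: T) -> MyOnce<T> {
    MyOnce(Some(value))
}

impl<T> Iterator for MyOnce<T> {
    type Item = T;

    #[inline] // hint to the optimizer; did not change my benchmarks
    fn next(&mut self) -> Option<T> {
        self.0.take()
    }
}

fn main() {
    let mut it = my_once(5);
    assert_eq!(it.next(), Some(5));
    assert_eq!(it.next(), None);
}
```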