Proposal: implement `iter::Chain` using `Fuse<>`

Chain currently has a ternary flag to track which of its two sub-iterators it's exhausted, and thus should not call again. In practice, maintaining this correctly is very subtle in anything that can early-exit.

Proposal: Change Chain from

pub struct Chain<A, B> { a: A, b: B, state: ChainState }

to

pub struct Chain<A, B> { a: Fuse<A>, b: Fuse<B> }

That would have a bunch of advantages:

  • All the methods are simpler. For example, count is just a.count() + b.count(), no match. Similarly, even things like fold are easy as it can just fold both of them. It means no state maintenance in &mut self methods like next or try_fold. And less code is generally good for compile times.
  • For the common case of iterators that are also fused, the type is smaller, as there's no extra flag & padding. Thanks @Yato for pointing out that this is incorrect given the current state of specialization.
  • For already-fused iterators, it's probably cheaper to just look at them than to match on the state flag then look at them.
  • Chain would itself always be fused (rather than only when both sub-iterators are fused)
  • Two binary flag checks are probably not materially different from the one ternary flag check, perf-wise, if neither underlying iterator is fused.
  • Because of the multiple match arms, many of the methods have multiple callsites for the same side. (For example, Chain::next has two callsides for a.next() and two more for b.next().) Removing the match would eliminate that, potentially reducing code side in the common case of inlined methods (like next) typically is.

Downsides:

  • If neither iterator is fused, the struct is slightly larger as two state flags (+ padding) are bigger than one (+ padding).
  • Tree-nested chains have more work to do in .next(), as they need to look at all the sub-iterators to know to skip that side, rather than just checking the state flag and skipping them.
    Mitigations:
    • Long chains are somewhat of an anti-pattern regardless, and are more often linear than tree'd
    • One could wrap a side of a tree with an explicit fused flag to allow skipping that side, if one really wanted it
    • Anything using internal iteration would be unaffected by this, as doing so automatically splits the loops. (And Chain is already one of the go-to examples where internal iterator is worthwhile.)
  • If we eventually made Chain variadic, we might end up undoing this change as a numeric range for which iterators have been exhausted would then be valuable again. (But that could still easily be fused -- because the numeric range would empty -- so doesn't impact anything concretely today. And it's not like variadics are coming soon enough that this change would get undone a few weeks after being made.)

Thoughts? Worth trying? I'd make the initial PR not include the Fused bound expansion so it could be reverted at any time if problems do happen.

As a simple demo, https://rust.godbolt.org/z/F4FQDk shows a basic for loop that shrinks from 48 lines of assembly to 23 by removing the state maintenance.

1 Like

Why not try it and benchmark the differences? If someone later wants to write an RFC for variadic Chain, they can evaluate optimizing that presumably-rare code as part of their RFC.

1 Like

Note: std::iter::Fuse always carries around a bool flag, even if the inner Iterator implements FusedIterator. So this flag is always there. Currently, specialization can't properly handle specialized associate types, so this can't easily be changed. The chain state combines these two flags into one.

1 Like

Wouldn't each branch of the tree have Fuse wrappers too? When an inner chain has exhausted both its sides, it will return None, setting its fuse too. Then that subtree will never be visited again.

Has Fuse ever tried wrapping with Option instead of a manual flag? Then it could take advantage of niche-filling.

(Yes in layout, as Yato pointed out, but) no in runtime checks, because of specialization:

That sounds like an awesome idea. I should try that first, as separable work. Even if it was tried before, slice::Iter looks like it only got NonNull a few months ago.

Ugh, I didn't think of how the specialized versions would ignore the flag. But the Option idea might work directly to avoid that, like a mini-fuse:

struct Chain<A, B> { a: Option<A>, b: Option<B> }

That would still require some manual bookkeeping, but may be easier than the current ternary.

2 Likes

Here's my attempt at Chain with Options, but I haven't benchmarked it yet.

3 Likes

I'm glad you tried it out. It still seems to require the "remember to set to None on the correct side of the ?", though, and it's disappointing that it looks an awful lot like "fuse, but twice".

Maybe we could just do Fuse<HideFused<A>>, where HideFused just forwards everything other than the FusedIterator bound? Though of course this is all predicated on the flag even being needed in real code...

1 Like

Although I don't know how significant it is, it does change the behavior slightly, as A and B are dropped as soon as they return None rather than when the Chain itself is dropped.

Here's a new attempt with Chain built on Fuse and a Defuse wrapper that just hides FusedIterator, and also converting Fuse to use Option. The latter requires some unsafe code to assume it's always Some, versus just ignoring a done flag, which is why I isolated this in a new module.

2 Likes

Wow, this change looks amazing! This kind of diff is exactly what I was hoping for.

This makes me think if unwrap_unchecked should exist, but that's a different question. It's certainly not needed, as this shows.

Very nice touch. Honestly this looks ready for a PR already (at the very least as a way to run a perf run against it, but I'd be happy for it to go further than that).

I suggest adding a comment in the code explaining why the Defuse wrapper is needed.

Great! Here you go:

1 Like

The btree implementation does have this, so that could indeed find more general use.