Add "iterate with separators" iterator function

I often see the need to handle "items with separators". Some are so common that we have specific functions like join() and intersperse, but there is no "generic" functional pattern. I even asked about it on stackoverflow, but no good solutions were given, so I would like to propose a new function:

let items = vec!["Hello", "World", "!"];
items.foreach_with_separator(
    |elem| print!("{elem}"),  // This is called on each element
    || print!(", "), // This is called "between" each element
);

This approach allows arbitrary complex element handling, as well as any kind of separation logic, without duplicating element handling -- I often see it duplicated for the first item and for subsequent ones (or all_items + last item). This will also avoid the need to maintain the boolean state of "is this the first element or not".

The separator callback might benefit from a few additional params:

  • index parameter counting separator callback invocations - |_idx: usize| print!(", "),
  • neighboring elements, e.g. |_elem_before: &T, _elem_after: &T| print!(", "),
  • all 3 params |_idx, _elem_before, _elem_after| print!(", "),

Even more generically, we may want to introduce an accumulator version (this example re-implements the join() function):

let items = vec!["Hello", "World", "!"];
let result = items.fold_with_separator(
    String::new(),
    |accumulator, elem| accumulator.append(elem),
    |accumulator| accumulator.append(", ")
);

Alternatives

intersperse allows a similar but more convoluted/verbose way to do this, and only work for Clone-able items. In this example, it converts values to Options, with None being the separator, thus it might bloat the binary if T::clone() is not used anywhere else, and compiler doesn't realize it will never get called in this code. Note that this function has also been implemented as unstable in the stdlib.

let items = vec!["foo", "bar", "baz"];
for v in items.iter().map(|v| Some(v)).intersperse(None) {
    if let Some(v) = v {
        print!("{v}")
    } else {
        print!(", ")
    }
}

enumerate() offers a more classical way to do this:

let items = vec!["foo", "bar", "baz"];
for (idx, v) in items.iter().enumerate() {
    if idx > 0 {
        print!(", ")
    }
    print!("{v}")
}
1 Like

Like so?

You can't give away your ownership to the previous item by calling the item closure and then pass a reference to it to the separator closure in the general case.

1 Like

@quinedot this is exactly what I had in mind! I added a simple testcase to your implementation.

What do you think about the more generic fold pattern? And even more important - what would be the best way to be considered for this to become a part of stdlib?

I am not certain I understood your point about giving ownership - are you saying an iterator cannot call my callbacks with references more than once? I may need to dig deeper into that. In any case, those might be nice to have but not required.

I can think of a couple of possibilities for iterator adapters that might make this easier without necessarily directly solving the problem:

  • delay(n) to prepend a string of Nones to an iterator:
    let items = vec!["foo", "bar", "baz"];
    for (sep, v) in zip(repeat(", ").delay(1), items) {
        if let Some(s) = sep {
            print!("{s}")
        }
        print!("{v}")
    }
  • Add some adapters to make Peekable more usable:
    let items = vec!["foo", "bar", "baz"];
    for (x, has_tail) in items.iter().peekable().map_peek(|x, n| (x, n.is_some())) {
        let sep = if has_tail { ", " } else { "" };
        print!("{x}{sep}");
    }

(or)

    let items = vec!["foo", "bar", "baz"];
    items.iter().peekable().for_each_peek(|x, next| {
        let sep = match next {
            Some(_) => ", ",
            None => ""
        };
        print!("{x}{sep}");
    });
1 Like

Either an ACP or just a PR with the implementation behind some feature gate (which may be accepted, rejected, or result in a "file an ACP" response).

A team member may have more specific advice (I'm not a team member).

I mean there's no way to support the pattern of

main_item_closure_that_takes_ownership(item_0);
seperator_closure_that_takes_references(&item_0, &item_1);

Because you've given away ownership of item_0 (unless it implements Copy, or you Clone it, etc), so there's no way for you to preserve a reference to it.

2 Likes

I think this is unlikely for core. Particular combinations of things usually aren't added, especially when it's just consuming it, not an adapter. The usual response is that you can just put the function in your own code if you want it.

The enumerate approach is fine, or Itertools::with_position if you want to do more complicated stuff.

In general, I'm not a fan of multi-closure APIs. For example, you can't have both of them push into a &mut String, because you can't give both of the closures the mutable reference at the same time. When it's one closure with a branch, though -- like with .enumerate().for_each(…) -- you don't hit that problem.

8 Likes

The with_position API is interesting, thx, but it might be a bit too cumbersome to use because one would have to list all use cases to consume the value, i.e. with_position(|v| match v { First(v) => ..., Middle(v) => ..., Last(v) => ..., Only(v) => ...}) is a pain to write frequently - so won't be used as much. And good feedback about the multi-closure aspect.

So perhaps a special case of the intersperse API instead would address the same issue better?

for v in elements.iter().intersperse_nones() {
    if let Some(v) = v { print!("{v}") } else { print!(", ") }
}

Note that intersperse is a nightly API at the moment, and requires an extra map and a param to get the same result:

for v in elements.iter().map(|x| Some(x)).intersperse(None) {
    if let Some(v) = v { print!("{v}") } else { print!(", ") }
}

You don't, because there's Position::into_inner for that.

So you could write something like

    if !matches(x, Position::First(_)) { ... }
    handle(x.into_inner());

if you want.

I tend to use std::iter::once for these kind of tasks.

std::iter::once("")
    .chain(std::iter::repeat(", "))
    .zip(items.iter())
    .for_each(|(sep, elem)| print!("{sep}{elem}"));

or

let mut sep = std::iter::once("");
items.iter().for_each(|elem| print!("{}{}", sep.next().unwrap_or(", "), elem));

It's particularly nice when the first separator is a non-empty string (e.g. ?param1&param2&param3)

1 Like

Thanks for all the suggestions. I guess the most important aspect in addition to "intent clarity" is to ensure compiler is able to optimize away all the abstractions, and produce the most performant and small code possible. I will try to play with some micro-benchmarking too to see if any of the suggested patterns perform as well as a new dedicated function.

In alternatives you mention the (unstable) intersperse. Under the same feature gate is intersperse_with, which lifts the clone limitation by taking a closure instead.

2 Likes

Thanks @Sky9, I didn't notice it, but it seems it would still be somewhat un-ergonomic (not certain of how optimal the resulting binary will be, also needs testing) (playground)

let elements = vec!["a", "b", "c"];
for v in elements.iter().map(|x| Some(x)).intersperse_with(|| None) {
    if let Some(v) = v { print!("{v}") } else { print!(", ") }
}

This might not be useful more generally, but that if-let-else could be shortened to print!("{}", v.unwrap_or(&", ")).

Edited to add: The for loop could be

for v in elements.iter().intersperse(&", ") {
    print!("{v}")
}

That seems too simple, so I guess it likely isn't useful.

I have ran some basic benchmarks on 4 variants. The dedicated iterator function that converts Iter::Item into Option<Iter::Item>, and injects None as a separator is ~2.5 times faster than the other methods proposed here (raw speed of the iteration itself, not counting additional processing like writing to a string buffer in the second benchmark)

All times are in milliseconds (µs)

Simple iteration

each value is called with black_box(v), and each separator is black_box(0)

                      run #1: min/avg/max      run #2: min/avg/max
intersperse_nones    [0.5622 0.5627 0.5633]   [0.5623 0.5627 0.5633]
foreach              [0.5594 0.5598 0.5603]   [0.5591 0.5592 0.5594]
for                  [0.7722 0.7742 0.7766]   [0.7716 0.7719 0.7724]
intersperse_with     [1.4002 1.4163 1.4411]   [1.3987 1.3993 1.3999]
intersperse          [1.4012 1.4016 1.4021]   [1.4007 1.4008 1.4009]
with_position        [1.1193 1.1194 1.1195]   [1.1195 1.1197 1.1201]

make a string

Convert a vec of numbers to a comma-separated string of numbers -- similar to join(",")

                         run #1: min/avg/max      run #2: min/avg/max
intersperse_nones-join  [18.238 18.521 18.902]   [17.775 17.817 17.870]
foreach-join            [17.624 17.662 17.708]   [17.628 17.665 17.711]
for-join                [17.803 17.816 17.832]   [17.781 17.792 17.807]
intersperse_with-join   [19.293 20.109 21.137]   [18.551 18.590 18.653]
intersperse-join        [18.579 18.647 18.753]   [18.397 18.445 18.502]
with_position-join      [18.394 18.469 18.592]   [18.275 18.320 18.376]

Benchmarks and their results were heavily updated in the prior comment. I also added a benchmark for the @scottmcm suggestion (had to fix it a bit). Please let me know if additional approaches should be tried, or if the benchmarks themselves can be made more accurate. So far it seems like adding a dedicated function would significantly boost performance for these types of workload.

I'm not sure that I would expect this to be performance-sensitive, but the benchmark to beat, both in timing and clarity, is:

let mut use_sep = false;
elements.iter().for_each(|v| {
    if use_sep {
        handle_box(None);
    } else {
        use_sep = true;
    }
    handle_box(Some(v));
});
3 Likes

@jongiddy thx, I added two more benchmarks - one with a for v in elem.iter() and one with elem.iter().for_each(...). The results are somewhat surprising:

  • All results fluctuate a few percent in each run
  • For simple iter, the intersperse_nones and foreach tend to be nearly identical (foreach being about 0.1% faster?)
  • For simple iter, the for loop is somehow about 35% slower (?!?)
  • When writing a CSV string, the performance of all 3 are within 1% of each other

So it seems there is still a valid case to introduce a new iter function? Consider these two closest contenders:

for v in elements.iter().intersperse_nones() {
    if let Some(v) = v {
        black_box(v);
    } else {
        black_box(0);
    }
}

let mut use_sep = false;
elements.iter().for_each(|v| {
    if use_sep {
        black_box(0);
    } else {
        use_sep = true;
    }
    black_box(v);
});

Yes, internal iteration is sometimes faster. Usual link: Rust’s iterators are inefficient, and here’s what we can do about it. | by Veedrac | Medium

This sounds like it's probably within noise to me. And the ones that are nanobenchmarks I just don't really trust -- sticking a black box in the middle of a loop makes it non-representative of the actual cost of the optimized loop.

Note that performance usually isn't a sufficient reason to introduce a new iterator to std -- if it can be written outside it, like you have demonstrated is possible here, then the default is still that it should be a crate.

5 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.