[Question - Resolved] add or, or_else, and, and_then to std Iterator trait

Hello, Recently I experienced fun issue and thought it would be good to be in Iterator. Iterator trait would be good to have methods for checking whether iterator instance is empty or not, before it is evaluated.
Of course, I can figure out the iterator is whether empty using next() method.
But the problem is after I call next() like method, it evaluates iterator.
What I just want to do is just adding some code logic to Iterator, without evaluating.

Fortunately, it is possible to do using existing methods, though it looks little bit verbose. This code example below shows implementing or,

fn main() {
    // Case 1. if a is empty, then return b
    let a = vec![];
    let a = Box::new(a.into_iter());

    let b = vec![4, 5, 6];
    let b = Box::new(b.into_iter());

    let c = or(a, b).collect::<Vec<i32>>();
    println!("Case 1. [].or([4, 5, 6]) => {:?}", c);
    assert_eq!(c,  vec![4, 5, 6]);

    // Case 2. if a is not empty, then return a
    let a = vec![1, 2];
    let a = Box::new(a.into_iter());

    let b = vec![4, 5, 6];
    let b = Box::new(b.into_iter());

    let c = or(a, b).collect::<Vec<i32>>();
    println!("Case 2. [1,2].or([4, 5, 6]) => {:?}", c);
    assert_eq!(c,  vec![1, 2]);
}

fn or(
    a: Box<dyn Iterator<Item = i32>>,
    b: Box<dyn Iterator<Item = i32>>
) -> Box<dyn Iterator<Item = i32>> {
    let c = a
        .map(|v| {
            let is_a = true;
    
            (is_a, v)
        })
        .chain({
            let is_a = false;
    
            b.map(move |v| {
                (is_a, v)
            })
        })
        .scan(true, |is_a_empty, (is_a, v)| {
            match (&is_a_empty, is_a) {
                (true, true) => {
                    *is_a_empty = false;
    
                    Some(v)
                }
                (true, false) => Some(v),
                (false, true) => Some(v),
                (false, false) => None,
            }
        });

    Box::new(c)
}

What I wrote was (1) which is combination of map-chain-scan, but it can be replaced simply with or.

Let's assume both A and B implemented Iterator trait.

  • A.or(B)
    Return A if A is not empty, else if A is empty then return B.
    Iterator B will be evaluated only if (A is evaluated and then A is empty) case.

Basically I got interested at first was only or.
But if I add those feature, maybe it's natural to add other related methods:

  • and, and_then and or_else (like Option and Result enums have.)

How do you think of these?

[Updated]
Oh, sorry too too much. I shouldn't be rushed to do like this.
I changed the code example above,
and you can also execute it via this link below:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=456892aa5dba1dd77e93cf077032b13f

Would you please correct your example code? Please actually at least check that it compiles, understanding wrong code is hard, I can't always guess what you mean. Furthermore iter_b does not even appear in your first version.

3 Likes

Oh, sorry and thanks a lot for the fast reply. It will be definitely better to.. replace code to be something executable one. I will change this code example to be something really executable, please wait for a moment.

Iterator trait would be good to have methods for checking whether iterator instance is empty or not, before it is evaluated . Of course, I can figure out the iterator is whether empty using next() method. But the problem is after I call next() like method, it evaluates iterator.

Does Iterator::peekable help?

2 Likes

@steffahn Again, thanks so so much for the fast feedback.
If you didn't, I may get asleep and wake up and see what a sad mistake I made last night. I changed the code and now you can also execute it.
and I got... solved, thanks to @pcpthm
I will change the title into [Question Resolved]

@pcpthm Oh, thanks, I didn't considered using peekable, it looks really helpful.
function in my writing fn or(...) can be replaced with this below,

fn simplified_or(
    a: Box<dyn Iterator<Item = i32>>,
    b: Box<dyn Iterator<Item = i32>>
) -> Box<dyn Iterator<Item = i32>> {
    let mut a = a.peekable();

    match a.peek().is_some() {
        true => Box::new(a),
        false => b,
    }
}

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=2b28461e1274f98af813f4255c9c448b

Peekable... directly hits my need, wow.. thanks so much!

There's also the standard library's Iterator::chain() method. Couldn't you just use it? It's probably more efficient in most cases because it doesn't force boxing all the iterators.

Thanks for the comment, but isn't Iterator::chain enough for this case?

Because if I chain 2 iterators, (assume A and B)
if A is empty, then its ok.
But if A is not empty, then chained iterator contains A + B, but what I need is only A.

Makes sense. Still, you could follow its implementation for some inspiration as to how to avoid boxing everything. t

1 Like

Here, I built something in this regard.

1 Like

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c56d094a4a2f131ba951ab731179c20d

fn or(
    a: impl Iterator<Item = i32>,
    b: IntoIter<i32>, // or if I can specify
) -> impl Iterator<Item = i32> {

When I saw your comment at first, I didn't catch what you mean so I needed to some time for thinking about it.
And yesterday I thought that I got the point, but it wasn't.
Now I can tell that I understand...
It was because I didn't aware of "impl Trait" type.
Thank you so so so much.

1 Like

Iterator implementation code was quite new to me, so it takes some time for me to digest.
Your code helped me really really a lot, thanks so much, I learned a lot.
At first, I also didn't understand one of your comments "performance reasons", but now I got it, thanks!

In my project, iterators I used are expected to fetch something from file system or from network.
and it can also have some time gap between building and evaluating iterators.
So I wanted to ensure that iterators are not evaluated at all by using "or".
I tried changing your example code little bit for ensuring iterator's lazy evaluation, does this code look ok?

I'd really appreciate of your help, so kind answer for the quite silly question.
Thank you, and I now know https://users.rust-lang.org/ so I will use that forum when I got similar issue like this later. :smiley:

Yeah, your code looks good. I worked on it a bit anyways and got rid of the Peekable which should get rid of some extra checks that happen inside Peekable since we never actually peek and keep a peeked value. The impls are in some part inspired by the source code of Peekable. I also made a size_hint implementation that should be compatible with implementations for ExactSize (and perhaps TrustedLen) that one might want to add.

1 Like

Wow.... your code is... really awesome.

    enum State { Initial, InIter1, InIter2 }

Before I used Peekable, I tried implementing based on wrapping existing iterators using Option but stuck at.. what I actually needed was only 3 states but If I add two Options then it makes 4 possiblilities.
that state enum is so clear, obvious and also even easy to understand.

And at first I simply thought that Peekable is enough to use so I wrote the prev code example, but it's really good point you mentioned:
"we never actually peek and keep a peeked value."
That's great. And your implementation is also really neat and straight forward. I like it, thanks for showing this good example, good learning material for me.

size_hint.. thanks. At first I thought there's no way to provide size_hint because now we cannot know what to use between two, but wow... I was wrong.
And not only you did calculate min & max of iter1 and iter2, but also you prevent unnecessary calculation.

match self.iter1.size_hint() {
    i1_hint@(1..=usize::MAX,_) => i1_hint,
    (_, Some(0)) => self.iter2.size_hint(),
    ...
}

beautiful...

And,
could you consider uploading your "Iterator or" implementation code to itertools or something like that crates?
Now it looks really nice enough and that's definitely what I look forward to use.
It's even much more than I firstly thought, your code is even performance optimized.. so awesome. (Though current Rust stable version does not support Try yet, so try_fold function should wait for a while, I really want to use Try in stable.)

Again, thanks so much for both your support and providing good learning materials, I learned a lot.

it’s not only about preventing calculation, it is also to get correct behavior for a potential ExactSizeIterator instance. Now that you’re pointing at my code, on a second look I’m noticing that I’ve been a bit too conservative with just using min as lower bound in the last case. Especially since you only get to that case if the first entry i1_hint is 0, so the minimum will always be 0 anyways. This is a new version:

fn size_hint(&self) -> (usize, Option<usize>) {
    match self.state {
        Initial => {
            match self.iter1.size_hint() {
                (_, Some(0)) => self.iter2.size_hint(),
                (0, i1_high) => {
                    let (i2_low, i2_high) = self.iter2.size_hint();
                    let low = if i2_low > 0 { 1 } else { 0 };
                    let high = i1_high.and_then(|h1| i2_high.map(|h2| max(h1,h2)));
                    (low,high)
                }
                i1_hint => i1_hint,
            }
        }
        InIter1 => self.iter1.size_hint(),
        InIter2 => self.iter2.size_hint(),
    }
}

(playground)

I don’t feel like taking up the effort to contribute there, currently. Furthermore I’m not sure of the general need for a method like this or. Feel free to use and adapt the code though, and put it into a crate yourself or try contributing it to somewhere if you feel the need for it.

1 Like

Sorry for the late response,
If you didn't mention ExactSizeIterator, I may not even try reading that the impl code of ExactSizeIterator. Thanks so much and new code you attached also looks nice.
Wow, yeah let low = if i2_low > 0 { 1 } else { 0 }; this... is really cool.
I hadn't think there's more improvement can be made in size_hint, it's so good.

And also thanks you allowed me to use the code, I published to https://crates.io/crates/or-iterator

I thought this feature or is quite necessary, but I also understand about that you don't agree.
In fact, it's true that I only have a single use case for this, nothing more I have.

The only use case is...
I'm currently working on an open source SQL database, I may be able to open to public in late Summer, this year.
Where I use this or function is on my LEFT JOIN implementation.

SELECT * FROM TableA LEFT JOIN TableB ON {join constraint}

I needed to know each LEFT JOIN fetched rows are whether empty or not empty without evaluation.

In case of INNER JOIN, I can simply filter out the original row if join fetched rows are empty.
But for LEFT JOIN, if join fetched rows are empty then I still need to keep the original row.

For handling LEFT JOIN, I used quite verbose codes using map-chain-enumerate-filter_map. But now, I simply replaced using or.

And again, thank you so much spending lots of time for help. I'd really appreciate.
Thanks!

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.