Implement Extend trait for Option

Option is very often treated as a special kind of collection holding at most one element. Rust implements the Extend trait for most of its collections, and Option can already be turned into an iterator through its IntoIterator implementation.

Here are my thoughts on why it might be a good idea.

Implementing Extend for Option would allow writing generic code that relies on IntoIterator, Iterator and Extend and makes no distinction between collections like Vec and Option.

At the moment, such an implementation is probably impossible because of the lack of specialization support: we can't provide a generic implementation for iterable types which implement Extend and also a specialized implementation for Option, which does not implement Extend.

One particular use case I have in mind is implementing MultiUnzip from itertools for Option.

With an Extend implementation we have to make a bit of a stretch and assume that we are updating the existing element when extending an Option. However, the documented behavior of the Extend trait makes such a stretch completely valid:

When extending a collection with an already existing key, that entry is updated or, in the case of collections that permit multiple entries with equal keys, that entry is inserted.

When extending Option::Some with several elements, we would replace the value with the last element from the provided iterator. When extending Option::None, we would either leave it as None or turn it into Option::Some with the last element from the iterator given to the extend method.
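The "last element wins" semantics described above can be tried out today without touching std. The sketch below uses a hypothetical newtype called LastWins (an assumption made for illustration, since the orphan rule prevents implementing Extend for Option outside the standard library). It picks the variant where extending None with a non-empty iterator produces Some.

```rust
/// Hypothetical newtype around `Option`; the name `LastWins` is an
/// assumption for this sketch and is not part of std or itertools.
#[derive(Debug, PartialEq)]
struct LastWins<T>(Option<T>);

impl<T> Extend<T> for LastWins<T> {
    fn extend<I: IntoIterator<Item = T>>(&mut self, iter: I) {
        // Keep only the last item; every earlier item is dropped.
        // An empty iterator leaves the current value untouched.
        if let Some(last) = iter.into_iter().last() {
            self.0 = Some(last);
        }
    }
}

fn main() {
    // Extending `Some` replaces the stored value with the last item.
    let mut some = LastWins(Some(1));
    some.extend(vec![2, 3]);
    assert_eq!(some, LastWins(Some(3)));

    // Extending `None` with nothing keeps it `None`.
    let mut none: LastWins<i32> = LastWins(None);
    none.extend(std::iter::empty());
    assert_eq!(none, LastWins(None));

    // Extending `None` with one item turns it into `Some`.
    none.extend(vec![7]);
    assert_eq!(none, LastWins(Some(7)));
}
```

Note that in the first call the item 2 disappears without any signal to the caller, which is exactly the behavior under discussion.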

Please share your thoughts on this idea, and if you think it is a valid use case, I would appreciate suggestions on how to submit the proposal to the stdlib team!


The definition of Extend is:

pub trait Extend<A> {
    // Required method
    fn extend<T>(&mut self, iter: T)
       where T: IntoIterator<Item = A>;

    // Provided methods
    fn extend_one(&mut self, item: A) { ... }
    fn extend_reserve(&mut self, additional: usize) { ... }
}

How would it work if you pass an iterator of multiple items to extend? An Option can contain only a single item, so the rest would be silently lost.


When extending Option::Some with several elements, we would replace the value with the last element from the provided iterator. When extending Option::None, we would either leave it as None or turn it into Option::Some with the last element from the iterator given to the extend method.

Alternatively, we could panic at runtime when trying to extend an Option with an iterator of more than one element. That would address fears that data is silently eaten by Option.

What are some concrete use cases for treating an Option generically as an Extend, given that it cannot fulfil the contract that can be reasonably expected of Extend? Panicking and silently dropping data would both be, honestly, very fragile and error-prone semantics. In particular, an implementation is absolutely not supposed to panic unless the trait interface explicitly says that it may.

The only reasonable case I can think of is a function that takes a &mut impl Extend and promises in its contract to only extend the argument with exactly zero or one elements. Which doesn’t feel like something that happens in the real world – such a function should simply be made type-safe by returning an Option instead!
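To make that comparison concrete, here is a minimal sketch with two hypothetical functions (first_even_into and first_even are names invented for this illustration). The first only promises "zero or one element" in its documentation, while the second encodes the same promise in its return type.

```rust
/// Hypothetical function: the "at most one item" guarantee lives only in
/// this doc comment, so the compiler cannot check it.
fn first_even_into(out: &mut impl Extend<u32>, values: &[u32]) {
    // `Option<u32>` implements `IntoIterator`, so it can be passed to `extend`.
    out.extend(values.iter().copied().find(|v| v % 2 == 0));
}

/// Same logic, but "zero or one" is now visible in the signature.
fn first_even(values: &[u32]) -> Option<u32> {
    values.iter().copied().find(|v| v % 2 == 0)
}

fn main() {
    let mut sink: Vec<u32> = Vec::new();
    first_even_into(&mut sink, &[1, 4, 6]);
    assert_eq!(sink, vec![4]);

    assert_eq!(first_even(&[1, 4, 6]), Some(4));
    assert_eq!(first_even(&[1, 3]), None);
}
```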


I mentioned it in my original post:

One particular use case I have in mind is implementing MultiUnzip from itertools for Option.

At the moment the multiunzip method is very likely impossible to implement for Option, as we need either Extend for Option or the specialization feature.

Here is the code I would like to see implemented, from the issue I mentioned above.

struct A {
    a: u32,
    b: u32,
    c: u32,
}

let opt = Some(A { a: 1, b: 2, c: 3 });
let non: Option<A> = None;

let (a, b, c): (Option<_>, Option<_>, Option<_>) = opt
    .map(|value| (value.a, value.b, value.c))
    .multiunzip(); // a = Some(1); b = Some(2); c = Some(3)

let (a, b, c): (Option<_>, Option<_>, Option<_>) = non
    .map(|value| (value.a, value.b, value.c))
    .multiunzip(); // a = None; b = None; c = None

Here is the current implementation of MultiUnzip, which relies on Extend:

/// An iterator that can be unzipped into multiple collections.
///
/// See [`.multiunzip()`](crate::Itertools::multiunzip) for more information.
pub trait MultiUnzip<FromI>: Iterator {
    /// Unzip this iterator into multiple collections.
    fn multiunzip(self) -> FromI;
}

macro_rules! impl_unzip_iter {
    ($($T:ident => $FromT:ident),*) => (
        #[allow(non_snake_case)]
        impl<IT: Iterator<Item = ($($T,)*)>, $($T, $FromT: Default + Extend<$T>),* > MultiUnzip<($($FromT,)*)> for IT {
            fn multiunzip(self) -> ($($FromT,)*) {
                // This implementation mirrors the logic of Iterator::unzip resp. Extend for (A, B) as close as possible.
                // Unfortunately a lot of the used api there is still unstable (https://github.com/rust-lang/rust/issues/72631).
                //
                // Iterator::unzip: https://doc.rust-lang.org/src/core/iter/traits/iterator.rs.html#2825-2865
                // Extend for (A, B): https://doc.rust-lang.org/src/core/iter/traits/collect.rs.html#370-411

                let mut res = ($($FromT::default(),)*);
                let ($($FromT,)*) = &mut res;

                // Still unstable #72631
                // let (lower_bound, _) = self.size_hint();
                // if lower_bound > 0 {
                //     $($FromT.extend_reserve(lower_bound);)*
                // }

                self.fold((), |(), ($($T,)*)| {
                    // Still unstable #72631
                    // $( $FromT.extend_one($T); )*
                    $( $FromT.extend(std::iter::once($T)); )*
                });
                res
            }
        }
    );
}

impl_unzip_iter!();
impl_unzip_iter!(A => FromA);
impl_unzip_iter!(A => FromA, B => FromB);
impl_unzip_iter!(A => FromA, B => FromB, C => FromC);
impl_unzip_iter!(A => FromA, B => FromB, C => FromC, D => FromD);
impl_unzip_iter!(A => FromA, B => FromB, C => FromC, D => FromD, E => FromE);
impl_unzip_iter!(A => FromA, B => FromB, C => FromC, D => FromD, E => FromE, F => FromF);
impl_unzip_iter!(A => FromA, B => FromB, C => FromC, D => FromD, E => FromE, F => FromF, G => FromG);
impl_unzip_iter!(A => FromA, B => FromB, C => FromC, D => FromD, E => FromE, F => FromF, G => FromG, H => FromH);
impl_unzip_iter!(A => FromA, B => FromB, C => FromC, D => FromD, E => FromE, F => FromF, G => FromG, H => FromH, I => FromI);
impl_unzip_iter!(A => FromA, B => FromB, C => FromC, D => FromD, E => FromE, F => FromF, G => FromG, H => FromH, I => FromI, J => FromJ);
impl_unzip_iter!(A => FromA, B => FromB, C => FromC, D => FromD, E => FromE, F => FromF, G => FromG, H => FromH, I => FromI, J => FromJ, K => FromK);
impl_unzip_iter!(A => FromA, B => FromB, C => FromC, D => FromD, E => FromE, F => FromF, G => FromG, H => FromH, I => FromI, J => FromJ, K => FromK, L => FromL);

I experimented with it, trying to find a way to implement MultiUnzip for Option, but as I mentioned we need either specialization or Extend for Option. Both seem reasonable to have, but Extend for Option is easier to get for this particular use case.
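As a rough illustration of why a Default + Extend target is all the generic code needs, the sketch below uses the standard library's Iterator::unzip, which has the same kind of bounds as the MultiUnzip impl above. LastWins is a hypothetical wrapper standing in for an Option that implements Extend, and split is a made-up helper. None of this is itertools code.

```rust
/// Hypothetical stand-in for "Option with Extend"; not part of std or itertools.
struct LastWins<T>(Option<T>);

// Manual impl so that `T` itself does not need to be `Default`.
impl<T> Default for LastWins<T> {
    fn default() -> Self {
        LastWins(None)
    }
}

impl<T> Extend<T> for LastWins<T> {
    fn extend<I: IntoIterator<Item = T>>(&mut self, iter: I) {
        // Assumed semantics for this sketch: the last item replaces the value.
        if let Some(last) = iter.into_iter().last() {
            self.0 = Some(last);
        }
    }
}

struct A {
    a: u32,
    b: u32,
}

/// The closure only deals with plain values; the wrapping happens in `unzip`.
fn split(opt: Option<A>) -> (Option<u32>, Option<u32>) {
    let (a, b): (LastWins<u32>, LastWins<u32>) =
        opt.into_iter().map(|value| (value.a, value.b)).unzip();
    (a.0, b.0)
}

fn main() {
    assert_eq!(split(Some(A { a: 1, b: 2 })), (Some(1), Some(2)));
    assert_eq!(split(None), (None, None));
}
```

With an Extend impl on Option itself, the wrapper and the final unwrapping step would not be needed.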

This is not the only use case for Extend for Option. You can imagine other implementations that do generic conversions between collections/iterables and tuples and that need either special handling for Option or a generalization over iterables and Option (as the special case of an iterable with at most one element). The read path is covered by the IntoIterator trait, but the write path is more complicated.

Note that we already have Extend implementations that silently lose elements: HashMap and BTreeMap (and HashSet and BTreeSet in a slightly less problematic way). It's logical because if the iterator contains duplicated keys they can't all be added to the collection, however the choice of which value to keep is arbitrary.
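A small sketch of that behavior with HashMap: when the iterator repeats a key, only one value per key survives. With std's HashMap the value seen last replaces the earlier ones, so the earlier values are dropped without any error. The function name extend_with_duplicates is made up for this example.

```rust
use std::collections::HashMap;

/// Extends a map with an iterator that contains the key "a" twice
/// and that also collides with an entry already in the map.
fn extend_with_duplicates() -> HashMap<&'static str, u32> {
    let mut map = HashMap::new();
    map.insert("a", 1);
    map.extend(vec![("a", 2), ("b", 3), ("a", 4)]);
    map
}

fn main() {
    let map = extend_with_duplicates();
    // Four values were offered for two keys; two of them are gone.
    assert_eq!(map.len(), 2);
    assert_eq!(map.get("a").copied(), Some(4));
    assert_eq!(map.get("b").copied(), Some(3));
}
```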


I don't like that Option is an iterator at all. Pretending it's a "collection" is a bit of semantic cleverness that isn't useful in practice, and only obscures intention of the code.

Option already has a ton of dedicated helper methods for setting its contents, and it works well with pattern matching. It doesn't need more ways of doing the same thing, especially with a less clear interface.

Use of Option as an Iterator is such an annoying gotcha that we already have the for_loops_over_fallibles lint. If Option supported Extend, we'd probably end up adding a similar lint against using that.


Pretending it's a "collection" is a bit of semantic cleverness that isn't useful in practice, and only obscures intention of the code.

I personally have always found the ability to chain .map on Option very elegant! itertools does a lot of cool things with Option as well! Things like that are, in my opinion, highly practical in the first place! But I can understand that people with different backgrounds and preferences may not like to think about Option as a monad, but rather as just two values.

Thank you for the opinion and the reference to the lint!


Alternative to your multizip example: Rust Playground

@idanarye Thank you for the suggestion, but my intention was slightly different, not just to get the code running.

It is actually multiunzip, not multizip, which is a different function, but I guess that is just a typo.

    let opt = Some(A { a: 1, b: 2, c: 3 });
    let non: Option<A> = None;

    let (a, b, c): (Option<_>, Option<_>, Option<_>) = opt
        .map(|value| (Some(value.a), Some(value.b), Some(value.c)))
        .unwrap_or_default();
    println!("[opt] a = {a:?}, b = {b:?}, c = {c:?}");

I was aware of the possibility of explicit wrapping. The whole purpose of multiunzip is that you don't need to do it yourself: one can rely on monadic behavior while working only with plain values.

The whole idea of applying multiunzip to Option comes from a real project where the code is much messier, and the need to wrap things explicitly makes it even worse. So I would like to leave the notion of optionality (is that a word?) behind once execution enters map's closure.

If you try to extract map's closure into a separate function, the return type gets messy, which gives a bit of a feeling for what's going on in the real code.

Compare:

fn extract_fields(value: A) -> (Option<u32>, Option<u32>, Option<u32>) {
    ...
}

to

fn extract_fields(value: A) -> (u32, u32, u32) {
    ...
}

In addition, the second variant of the function is much more reusable! It has no knowledge that the values have to be packed back into Option because of the way they are used, so extract_fields can be reused in more places.

So yes, functionally it is the same, but not aesthetically. And the latter is the whole purpose of doing it.


Thank you all for your opinions!