The in and !in operators?

leonardo · October 12, 2018, 12:35pm

There’s “in interval” in Rust, so has someone proposed to use “in” to allow simpler (more ergonomic) code like this (similar to D language, and Python) to search for values and subsequences? This is D language:

void main() {
    auto h = [10: 20, 30: 40]; // Associative array.
    if (30 in h) {}
    if (30 !in h) {}
}

This doesn’t add power to the language; it adds another obvious way to do something that can already be done.

// #![feature(range_contains)] // Needed on 2018 nightlies; Range::contains has been stable since Rust 1.35.
fn main() {
    use std::collections::{HashSet, HashMap};

    let s = HashSet::<u32>::new();
    if s.contains(&1) {}
    // if 1 in s {}
    // if 1 !in s {}

    let h = HashMap::<u32, u32>::new();
    if h.contains_key(&2) {}
    // if 2 in h {}
    // if 2 !in h {}

    let r = 1 .. 10;
    if r.contains(&3) {}
    // if 3 in r {}
    // if 3 !in r {}

    let t1 = b"Hello World";
    if t1.contains(&b'o') {}
    // if b'o' in t1 {}
    // if b'o' !in t1 {}

    let v1 = vec![10, 20, 30];
    if v1.contains(&20) {}
    // if 20 in v1 {}
    // if 20 !in v1 {}

    if t1.windows(b"llo".len()).any(|w| w == b"llo") {}
    // if b"llo" in t1 {}
    // if b"llo" !in t1 {}

    let v2 = vec![10, 20, 30, 40, 50, 60];
    if v2.windows([20, 30, 40].len()).any(|w| w == [20, 30, 40]) {}
    // if [20, 30, 40] in v2 {}
    // if [20, 30, 40] !in v2 {}

    let t2 = "Hello World";
    if t2.find('W').is_some() {}
    // if 'W' in t2 {}
    // if 'W' !in t2 {}

    if t2.find("Wor").is_some() {}
    // if "Wor" in t2 {}
    // if "Wor" !in t2 {}
}

Riateche · October 12, 2018, 1:06pm

This thread discusses another possible application of the in keyword.

Centril · October 12, 2018, 2:35pm

What does the trait look like that supports this operation for all the relevant types?

skysch · October 12, 2018, 2:53pm

I like the idea (it works well in python), but two things stick out to me:

!in doesn’t look like any other operator in Rust. I would expect this to look like !(x in y) instead. (Compare to the nand and nor operators !&& and !||, which we don’t have.)
Does this have any unfavorable consequences for practical uses of associative maps? That is, if in only looks at keys, are we going to be unnecessarily introducing temporary structures to use it for looking at values or key-value pairs instead? It would be a shame if in only allowed you to compare keys efficiently, because people will want to use it for the other things as well. I could imagine using some newtype wrappers so that we can have x in hash_map.keys(), but keys is already a method on hash_map (and probably not guaranteed to be efficient for this purpose). I suppose x in Keys(hash_map) might be a suitable alternative, if a bit confusing.
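For reference, the three kinds of map lookup mentioned above can already be written without temporary structures. This is only a sketch using existing std methods; the helper names `has_key`, `has_value` and `has_pair` are made up for illustration.

```rust
use std::collections::HashMap;

// Key lookup: a single hashed probe, O(1) on average.
fn has_key(h: &HashMap<u32, u32>, k: u32) -> bool {
    h.contains_key(&k)
}

// Value lookup: a linear scan over the borrowed values, O(n),
// but no intermediate collection is built.
fn has_value(h: &HashMap<u32, u32>, v: u32) -> bool {
    h.values().any(|&x| x == v)
}

// Key-value pair lookup: one hashed probe plus one comparison.
fn has_pair(h: &HashMap<u32, u32>, k: u32, v: u32) -> bool {
    h.get(&k) == Some(&v)
}

fn main() {
    let mut h = HashMap::new();
    h.insert(10, 20);
    h.insert(30, 40);

    assert!(has_key(&h, 30));
    assert!(!has_key(&h, 40)); // 40 is a value, not a key
    assert!(has_value(&h, 40));
    assert!(has_pair(&h, 30, 40));
    assert!(!has_pair(&h, 30, 20));
}
```

Whether an `in` operator should cover the second and third forms, or only the first, is exactly the open question raised here.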

newpavlov · October 12, 2018, 3:41pm

At first glance it could look like this:

trait InOp<Value: PartialEq> {
    fn is_in(&self, value: &Value) -> bool;
}

Here are some impls from the OP examples. It’s debatable whether we want a PartialEq or an Eq bound: we probably want this feature to work with floats, but searching for NaN will always produce false, which could be quite surprising for beginners. We could also add a generic implementation for Iterator+Clone, but then we would have to use specialization for other types.
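Since the linked impls are not reproduced in this thread, here is a rough sketch of what a few of them might look like. The choice of types and the impl bodies are assumptions made for illustration; only the trait itself is taken from the post above. The last assertion shows the NaN behaviour that a PartialEq bound allows.

```rust
use std::collections::{HashMap, HashSet};
use std::hash::Hash;
use std::ops::Range;

// The trait as proposed above (hypothetical, not part of std).
trait InOp<Value: PartialEq> {
    fn is_in(&self, value: &Value) -> bool;
}

// Hash set: hashed lookup.
impl<T: Eq + Hash> InOp<T> for HashSet<T> {
    fn is_in(&self, value: &T) -> bool {
        self.contains(value)
    }
}

// Hash map: this sketch looks at keys only, as D and Python do.
impl<K: Eq + Hash, V> InOp<K> for HashMap<K, V> {
    fn is_in(&self, value: &K) -> bool {
        self.contains_key(value)
    }
}

// Slices (and Vec through deref): linear scan.
impl<T: PartialEq> InOp<T> for [T] {
    fn is_in(&self, value: &T) -> bool {
        self.contains(value)
    }
}

// Half-open ranges: two comparisons.
impl<T: PartialOrd> InOp<T> for Range<T> {
    fn is_in(&self, value: &T) -> bool {
        self.contains(value)
    }
}

fn main() {
    let s: HashSet<u32> = vec![1, 2, 3].into_iter().collect();
    assert!(s.is_in(&1));

    let mut h = HashMap::new();
    h.insert(10u32, 20u32);
    assert!(h.is_in(&10));
    assert!(!h.is_in(&20)); // 20 is a value, not a key

    let v = vec![10, 20, 30];
    assert!(v.is_in(&20));

    assert!((1..10).is_in(&3));

    // NaN is never equal to itself, so it is never "in" anything.
    let floats: &[f64] = &[1.0, f64::NAN];
    assert!(!floats.is_in(&f64::NAN));
}
```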

I agree that we probably don’t want !in, but overall I think this feature could be useful.

H2CO3 · October 12, 2018, 4:25pm

I’d strongly favor adding missing .contains() methods to the relevant types instead. Yes, having to write .any(|elem| elem == searched) is a pain, but it would be much easier to just add a slightly less general method wherever it’s missing – it wouldn’t require a language change.

leonardo · October 12, 2018, 5:04pm

But it's quite handy once you have "in". In Python you write "not in" (special-cased syntax, I think). !(x in y) isn't that handy... I think !in didn't cause problems in the D language.

That is, if in only looks at keys, are we going to be unnecessarily introducing temporary structures to use it for looking at values or key-value pairs instead? It would be a shame if in only allowed you to compare keys efficiently,

If "in" works on iterators too, then you could use it also for values and key-value pairs (with a linear search). But while in Python the "in" operator is used for linear searches too, in D language its usage is allowed only for constant or O(ln n) searches. This is an important detail.
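To make the complexity point concrete, here is a small sketch that asks the same membership question four ways using existing std methods. The function name `lookups` is made up for this example, and the sets are rebuilt on every call only to keep it short.

```rust
use std::collections::{BTreeSet, HashSet};

/// Answers "is `needle` in `sorted`?" four ways.
/// `sorted` must be in ascending order for the binary search to be valid.
/// Building the two sets is itself O(n); only the lookups are being compared.
fn lookups(sorted: &[u32], needle: u32) -> [bool; 4] {
    let linear = sorted.contains(&needle); // O(n) scan
    let binary = sorted.binary_search(&needle).is_ok(); // O(log n)

    let tree: BTreeSet<u32> = sorted.iter().copied().collect();
    let hash: HashSet<u32> = sorted.iter().copied().collect();

    [
        linear,
        binary,
        tree.contains(&needle), // O(log n)
        hash.contains(&needle), // O(1) on average
    ]
}

fn main() {
    let data: Vec<u32> = (0..1000).map(|x| x * 2).collect(); // sorted even numbers
    assert_eq!(lookups(&data, 500), [true; 4]);
    assert_eq!(lookups(&data, 501), [false; 4]);
}
```

Under the D-style rule, only the last three lookups would qualify for `in`; under the Python-style rule, all four would.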

Yes, I've also asked for a few of those recently:

github.com/rust-lang/rust

Subslice search

opened 12:13PM - 10 Oct 18 UTC

leonardo-m

T-libs-api C-feature-request

As enhancement, I think stdlib should contain functions that search a subslice inside a given slice:

```
fn contains_subslice<T: PartialEq>(data: &[T], needle: &[T]) -> bool {
    data
        .windows(needle.len())
        .any(|w| w == needle)
}

fn position_subslice<T: PartialEq>(data: &[T], needle: &[T]) -> Option<usize> {
    data
        .windows(needle.len())
        .enumerate()
        .find(|&(_, w)| w == needle)
        .map(|(i, _)| i)
}

fn main() {
    println!("{}", contains_subslice(b"hello", b"ll"));
    println!("{:?}", position_subslice(b"hello", b"ll"));
}
```

For the common case of T:Copy items the true stdlib functions should specialize using a smarter algorithm, like: https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm

(Similar functions are useful for iterators too).
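One detail worth noting about the functions in the issue text: `slice::windows` panics when the window size is 0, so an empty needle needs a guard. The variant below is a sketch, not the issue author's code; treating an empty needle as a match at index 0 is an assumption borrowed from how `str::find("")` behaves. It is still the naive O(n·m) search, not KMP.

```rust
/// Returns the index of the first occurrence of `needle` in `data`.
/// Assumption: an empty needle matches at index 0, like `str::find("")`.
fn position_subslice<T: PartialEq>(data: &[T], needle: &[T]) -> Option<usize> {
    if needle.is_empty() {
        return Some(0); // `windows(0)` would panic
    }
    data.windows(needle.len()).position(|w| w == needle)
}

/// Returns true if `needle` occurs anywhere in `data`.
fn contains_subslice<T: PartialEq>(data: &[T], needle: &[T]) -> bool {
    position_subslice(data, needle).is_some()
}

fn main() {
    println!("{}", contains_subslice(b"hello", b"ll"));
    println!("{:?}", position_subslice(b"hello", b"ll"));
    println!("{:?}", position_subslice(b"hello", b""));
}
```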

Centril · October 12, 2018, 6:15pm

I like this idea, needle in haystack reads really well!

Would it make sense to de-reference-ify the trait and make it this?

trait InOp<Value: PartialEq + ?Sized> {
    fn is_in(self, value: Value) -> bool;
}

Implementations could be: Rust Playground. Some of the lifetimes won't need to be written explicitly once you have the impl header elision that is soon on stable.
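The playground contents are not included in this thread, so the following is only a guess at the shape such impls could take. The by-value trait is implemented on references, so nothing is actually consumed. The `?Sized` bound is dropped here to keep the sketch simple, and all impls are illustrative assumptions.

```rust
use std::collections::HashSet;
use std::hash::Hash;

// By-value flavour of the hypothetical trait (without the `?Sized` bound).
trait InOp<Value: PartialEq> {
    fn is_in(self, value: Value) -> bool;
}

// Implemented for `&[T]`, so the slice is only borrowed.
impl<'a, 'b, T: PartialEq> InOp<&'b T> for &'a [T] {
    fn is_in(self, value: &'b T) -> bool {
        self.contains(value)
    }
}

impl<'a, 'b, T: Eq + Hash> InOp<&'b T> for &'a HashSet<T> {
    fn is_in(self, value: &'b T) -> bool {
        self.contains(value)
    }
}

// By-value needles let one haystack type accept several needle types.
impl<'a> InOp<char> for &'a str {
    fn is_in(self, value: char) -> bool {
        self.contains(value)
    }
}

impl<'a, 'b> InOp<&'b str> for &'a str {
    fn is_in(self, value: &'b str) -> bool {
        self.contains(value)
    }
}

fn main() {
    let v = vec![10, 20, 30];
    assert!(v.is_in(&20)); // auto-deref to [i32], then auto-ref
    assert!(v.is_in(&30)); // `v` is still usable afterwards

    let s: HashSet<u32> = vec![1, 2, 3].into_iter().collect();
    assert!(s.is_in(&1));

    assert!("Hello World".is_in('W'));
    assert!("Hello World".is_in("Wor"));
}
```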

It is a bit odd and special cased so maybe we shouldn't do it... but it is also more readable than !(x in y).

Presumably you'd also get the .contains() methods from doing this change if the trait method is named contains. Another advantage is that there's now a trait you can use in generic code.
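As a sketch of the "usable in generic code" point: with a trait, a function can be written once for any haystack type. The trait `Contains` and the function `count_present` below are hypothetical names, and the impls simply forward to the existing inherent `contains` methods.

```rust
use std::collections::HashSet;
use std::hash::Hash;

// Hypothetical trait with the method named `contains`.
trait Contains<T> {
    fn contains(&self, item: &T) -> bool;
}

impl<T: PartialEq> Contains<T> for [T] {
    fn contains(&self, item: &T) -> bool {
        <[T]>::contains(self, item) // forwards to the inherent slice method
    }
}

impl<T: Eq + Hash> Contains<T> for HashSet<T> {
    fn contains(&self, item: &T) -> bool {
        HashSet::contains(self, item) // forwards to the inherent HashSet method
    }
}

/// Counts how many of `needles` occur in any haystack implementing the trait.
fn count_present<T, H: Contains<T> + ?Sized>(haystack: &H, needles: &[T]) -> usize {
    needles.iter().filter(|&n| haystack.contains(n)).count()
}

fn main() {
    let v: Vec<i32> = vec![10, 20, 30];
    let s: HashSet<i32> = v.iter().copied().collect();
    let needles: [i32; 3] = [20, 99, 30];

    assert_eq!(count_present(v.as_slice(), &needles), 2);
    assert_eq!(count_present(&s, &needles), 2);
}
```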

But really, I think in as a syntax is good for code readability and ergonomics.

Finally, @kennytm has done some nice work on the Needle 3.0 API in https://github.com/rust-lang/rfcs/pull/2500 so I think their input is important here; what we do here should hopefully interact well with the needle API.

newpavlov · October 12, 2018, 6:46pm

I think generally by using in you don't want to consume tested collection and element, so in my opinion it makes sense to encode this by using &self. Plus impls will look a bit cleaner and we will be able to use this trait with trait objects. (dunno if we'll ever need the latter, but still)

Though self will make it possible to search iterators without cloning; not sure if it's worth it.
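The iterator trade-off can be sketched as follows. Both traits and their method names (`is_in_owned`, `is_in_ref`) are hypothetical and deliberately distinct so that the two blanket impls can coexist in one example.

```rust
// By-value flavour: consumes the iterator, so no `Clone` bound is needed.
trait InOpOwned<Value> {
    fn is_in_owned(self, value: Value) -> bool;
}

impl<I> InOpOwned<I::Item> for I
where
    I: Iterator,
    I::Item: PartialEq,
{
    fn is_in_owned(mut self, value: I::Item) -> bool {
        self.any(|x| x == value)
    }
}

// By-reference flavour: has to clone the iterator in order to walk it.
trait InOpRef<Value> {
    fn is_in_ref(&self, value: &Value) -> bool;
}

impl<I> InOpRef<I::Item> for I
where
    I: Iterator + Clone,
    I::Item: PartialEq,
{
    fn is_in_ref(&self, value: &I::Item) -> bool {
        self.clone().any(|x| x == *value)
    }
}

fn main() {
    let evens = (1..10).map(|x| x * 2);
    assert!(evens.is_in_ref(&6)); // works on a clone, `evens` stays usable
    assert!(evens.is_in_owned(6)); // consumes `evens`

    // `Drain` is not `Clone`, so only the by-value flavour applies.
    let mut v = vec![1, 2, 3];
    assert!(v.drain(..).is_in_owned(2));
}
```

Note that a blanket impl like this for all iterators would conflict with separate impls for collection types under today's coherence rules, which is the specialization concern mentioned earlier in the thread.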

Centril · October 12, 2018, 6:52pm

I thought we were relaxing things to allow self in trait objects? (see Implement by-value object safety by qnighy · Pull Request #54183 · rust-lang/rust · GitHub)

scottmcm · October 12, 2018, 7:12pm

We already have Range*::contains, so I'm skeptical about in as a full operator.

Why only for in? If this is good, why not have x !& y for bitwise nand, x !|| y for logical nor, etc? Doesn't even need a new trait. (But things like x -+ y for -(x+y) don't seem like a good idea...)

Centril · October 12, 2018, 7:23pm

Perhaps just the trait then (Contains), as a start?

leonardo · October 12, 2018, 7:40pm

Because those things don't look good to me.

scottmcm · October 12, 2018, 7:45pm

Or RangeBounds in core::ops - Rust?

That's not a helpful statement for deciding whether they're valuable. What do you like about !in, but not about !&? Couldn't someone else decide that !in "doesn't look good" to them?

leonardo · October 12, 2018, 7:54pm

I've seen the design history of the D language. They added "in" first and didn't want to add "!in" at the beginning. Later they added it because people using "in" found it handy to have the negated version too. It's the same reason Python has "in" and also "not in"; ask the Python designers why they added "not in". If you use "in" for some time and you sometimes write code like !(something in haystack), you probably start desiring the short version too.

But in the end the feature I've proposed here is mostly syntax sugar and doesn't add much to the language. That's why I've filed the enhancement request #54961: it's a true improvement, because currently searching for a sub-slice is slow and not obvious to write. Using ".contains()" instead of "in" isn't a large improvement. There are other features I'd like in Rust that are more useful than "in". Another problem with "in" is that it adds a second obvious way to do something (search in a container or sequence or range), and I agree with the Python Zen rule that a well-designed language should offer only one obvious way to do something.

I have also shown the other problem with "in", that perhaps we want it to work only when its complexity is sub-linear. So far no one has commented on this important point. I think this point is more important than discussing why we don't want "!in" once we like "in". "!in" is just a bit more of syntax sugar, it doesn't cause troubles (I think) and it just makes the code handier to write and read.

Centril · October 12, 2018, 8:10pm

That one doesn't seem general enough to admit the impls in @newpavlov's playground example. Specifically, there's no start_bound and end_bound to reasonably talk about for a HashMap...
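For context, this is roughly what `RangeBounds` offers in generic code. The helper names `in_bounds` and `describe` are made up; the trait methods `contains`, `start_bound` and `end_bound` are the real ones from `std::ops::RangeBounds`. Because every implementor has to supply a start and an end bound, a hash map or set has no sensible way to implement it.

```rust
use std::ops::{Bound, RangeBounds};

// Generic membership test over any range-like type.
fn in_bounds<R: RangeBounds<i32>>(range: &R, x: i32) -> bool {
    range.contains(&x)
}

// Every implementor must be describable by two bounds.
fn describe<R: RangeBounds<i32>>(range: &R) -> (Bound<&i32>, Bound<&i32>) {
    (range.start_bound(), range.end_bound())
}

fn main() {
    assert!(in_bounds(&(1..10), 3));
    assert!(!in_bounds(&(1..10), 10)); // the end is exclusive
    assert!(in_bounds(&(..=5), 5));
    assert!(in_bounds(&(7..), 100));

    assert_eq!(
        describe(&(1..10)),
        (Bound::Included(&1), Bound::Excluded(&10))
    );
}
```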

gbutler · October 12, 2018, 8:26pm

Perhaps this quote from Bjarne Stroustrup would be applicable:

Individually, many proposals make sense. Together they are insanity to the point of endangering the future of C++.

Substitute "Rust" for "C++" in the above statement.

leonardo · October 12, 2018, 8:28pm

Together they are insanity to the point of endangering the future of C++.

I think they aren't going to listen to him on this

Centril · October 12, 2018, 9:55pm

I think that compared to C++, and in particular our closer cousin Swift, Rust is quite micro-sized when it comes to syntactic language complexity of a modern general-purpose language. Sure, Rust is not some dependently typed language where you can just get rid of syntactic distinctions between types and terms (like Idris, Agda, or 1ML), but syntactically the language is still not so complex.

The main complexity in Rust derives instead from the type system and the provided attributes (e.g. #[repr(...)]), but a lot of that is warranted and we make sure that we get a lot of bang-for-buck for each addition there. In my opinion, we don't have a lot of baggage that should be removed from the language.

Therefore, I think that permitting in in more contexts (i.e. this is a keyword that is already used...) in a way that is instantly grokkable (i.e. I think people can understand if x in array { .. } immediately without having seen it before...) is quite a small syntactic addition.

Fears that this makes us C++, or that this would happen over time, I think are exaggerated. We are not the Vasa and the ship is not sinking.

djc · October 13, 2018, 12:40pm

In Python I really liked the in keyword, and I feel that the underlying contains() operation gets used enough that providing it with syntax sugar makes sense – and contains is also a good conceptual abstraction that has meaning with lots of things other than very straightforward collection types.

The best part is that it just reads very naturally in condition contexts like if needle in haystack { }. (Although negation will obviously be a bit less nice than the not in that Python has.) I wonder if there is some subtle point here about the comprehensibility of haystack contains needle versus needle in haystack and the difference in ordering.

I would say this is definitely worth writing up a full RFC and having the discussion about it more broadly.
