The in and !in operators?

gbutler · October 13, 2018, 1:50pm

I would consider it a “Performance Footgun” to have special, easy syntax for something so fundamental as “Is this thing in this arbitrarily sized collection?” Even restricting it to O(log(n)) lookups doesn’t help, because the constant factor and the overall size are just as important. It would work fine to be O(n) or O(n^2) for small collections, but O(log(n)) might suck for large collections if the constant factor is sufficiently large. What is wrong with “a.Contains(b)”? Why special syntax? What is gained? Making it easier to do something non-performant without thinking about it much? How many things that have perfectly valid function calls need another special, obfuscated syntactical operator or keyword? Where does this end?

Again, individually, proposals such as this seem reasonable; collectively, they’ll sink the ship.
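
To make the cost concern concrete, here is a minimal sketch in today's Rust. The helper names `in_slice` and `in_set` are made up for this illustration. The two `contains` calls look identical at the call site, yet one is a linear scan and the other a hash lookup:

```rust
use std::collections::HashSet;

// Hypothetical helpers, named only for this illustration.

/// Linear scan: compares `needle` against each element in turn, O(n).
fn in_slice(haystack: &[u32], needle: u32) -> bool {
    haystack.contains(&needle)
}

/// Hash lookup: expected O(1), at the price of hashing `needle`.
fn in_set(haystack: &HashSet<u32>, needle: u32) -> bool {
    haystack.contains(&needle)
}

fn main() {
    let v: Vec<u32> = (0..1_000).collect();
    let s: HashSet<u32> = v.iter().copied().collect();

    // Both calls read the same; only the collection type tells you the cost.
    assert!(in_slice(&v, 999));
    assert!(in_set(&s, 999));
    assert!(!in_slice(&v, 1_000));
    assert!(!in_set(&s, 1_000));
}
```

Which of the two is faster for a given size depends on the constant factors, which is the point being made above.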

newpavlov · October 13, 2018, 2:13pm

I believe that small isolated sugars like in will make Rust more pleasant to learn, read and write, at the cost of a slightly higher peak of the learning slope. Yes, indeed, such proposals must be carefully scrutinized regarding potential surprising interactions with other features and possible pitfalls, but I don’t see any such problems with this particular proposal. And ergonomic improvements should be carefully weighed against additional mental load, which for in is amazingly small.

Your arguments can be applied to various “quality of life” features which Rust already has (deref coercions, ?, impl Trait in argument position, type aliases, match ergonomics, etc., and soon NLL, async notation), without which writing and reading Rust would’ve been a much more tiresome endeavour.

I don’t understand your complexity argument: there is absolutely no difference between using in or contains, the former is just trivial sugar for the latter. Yes, one can argue that in could encourage potentially inefficient code, e.g. num in [10, 20, 30, 40] vs. num == 10 || num == 20 || num == 30 || num == 40, but I think the optimizer should be fairly good with such cases.
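
As a rough sketch of the two spellings in today's Rust (the function names are hypothetical, and `via_contains` is only what `num in [10, 20, 30, 40]` would presumably desugar to):

```rust
// Hypothetical function names, for illustration only.

/// What `num in [10, 20, 30, 40]` would presumably desugar to.
fn via_contains(num: i32) -> bool {
    [10, 20, 30, 40].contains(&num)
}

/// The hand-written chain of comparisons.
fn via_chain(num: i32) -> bool {
    num == 10 || num == 20 || num == 30 || num == 40
}

fn main() {
    // The two forms agree on every input in this range.
    for num in 0..50 {
        assert_eq!(via_contains(num), via_chain(num));
    }
}
```

This only checks that the two forms give the same answers. Whether they compile to the same machine code is a separate question that this sketch does not settle.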

gbutler · October 13, 2018, 2:52pm

Isn't that kind of the point? Every feature in isolation can have this said of it. You may be right. Probably are. I'm not an expert. But I have an opinion about it based on my experience (not that my experience is the be all, end all). I just think individual features being added must carry more weight than "a slightly less verbose" (by about 6 characters) way of doing the same thing to justify the "potential" downsides of adding more and more cruft to the language.

Past "mistakes" (if they are mistaken which is debatable) don't provide justification for additional choices of the same category. Even past "correct choices" don't necessarily provide justification for a new choice of the same relative kind. Just because some sugar is good, doesn't mean I won't die of a diabetic coma if I ingest a 5lb bag in 30 minutes. (a little hyperbole to make the point )

My point exactly. So why add it? Does it "Carry its weight"?

And to carry the metaphor forward: How much "sugar" can the language tolerate before we're forced to start regular blood sugar monitoring and infusions of insulin and other medical interventions to keep the eyes from going blind and the limbs from rotting off?

newpavlov · October 13, 2018, 3:18pm

So which of the listed features are mistakes in your opinion? I understand that impl Trait in argument position and match ergonomics were fairly controversial, but after writing code with them for a bit I think overall they improve the situation.

Yes, I believe so. Ergonomic improvements are obvious, it helps with the learning experience, while weight is negligible. We do not introduce a new keyword, this feature is well known in other languages and AFAIK is not perceived as a problem, and arguably follows the "zero cost"™ principle.

I believe that C++, which you keep bringing up as an example, is not a case of diabetic coma, but of mixing several sugars which, combined, transform into a lingering poison. In other words, I think we should be wary not of sugar accumulation, but of how different sugars interact with each other. And do not forget that we have editions to remove mistakes to some extent. (I don't argue that we should abuse them and try every suggested feature, but it's useful to keep in mind our differences from C++.)

Some have argued (e.g. in try fn threads) that such features hide what the language does in reality. But I don't think it's a bad thing. Making the learning slope less steep, at the price of a slightly higher final peak, without hindering performance, control and guarantees, is a good trade-off in my opinion. Otherwise we should deprecate for value in collection { .. } in favour of explicit loop { .. } over iterators.
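
For reference, a small sketch of what that would mean in practice. The function names are made up; the second function is roughly what the `for` sugar in the first one expands to:

```rust
/// Sums a slice using the `for ... in` sugar.
fn sum_with_for(values: &[i32]) -> i32 {
    let mut total = 0;
    for v in values {
        total += *v;
    }
    total
}

/// Roughly the same loop written out by hand with `loop` and an iterator.
fn sum_with_loop(values: &[i32]) -> i32 {
    let mut total = 0;
    let mut iter = IntoIterator::into_iter(values);
    loop {
        match iter.next() {
            Some(v) => total += *v,
            None => break,
        }
    }
    total
}

fn main() {
    let data = [1, 2, 3, 4];
    assert_eq!(sum_with_for(&data), 10);
    assert_eq!(sum_with_loop(&data), 10);
}
```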

leonardo · October 13, 2018, 3:21pm

I am still not convinced that impl Trait in argument position was a good idea for Rust...

gbutler · October 13, 2018, 3:24pm

Yes. In argument position it is universal/monomorphizing whereas in return position it is existential. That seems inconsistent and unhelpful. Having it in struct/field position might be consistent, but that isn’t permitted and has its own issues. That being said, I don’t think it is a huge baddy. And the argument can be made that it provides some surface level of consistency to have it allowed in both positions.
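
A minimal sketch of the two positions, with made-up function names. In argument position the caller picks the concrete type, exactly as with an ordinary generic parameter. In return position the function body picks one concrete type and the caller only knows the trait:

```rust
use std::fmt::Display;

/// Argument position: the caller chooses the type (universal).
fn describe(item: impl Display) -> String {
    format!("<{}>", item)
}

/// The same signature spelled with an explicit generic parameter.
fn describe_generic<T: Display>(item: T) -> String {
    format!("<{}>", item)
}

/// Return position: the body chooses one hidden type (existential).
fn evens(limit: u32) -> impl Iterator<Item = u32> {
    (0..limit).filter(|n| n % 2 == 0)
}

fn main() {
    // Callers may pass different types to `describe`.
    assert_eq!(describe(5), "<5>");
    assert_eq!(describe("five"), describe_generic("five"));

    // Callers of `evens` cannot name the returned type, only use it.
    let collected: Vec<u32> = evens(7).collect();
    assert_eq!(collected, vec![0, 2, 4, 6]);
}
```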

gbutler · October 13, 2018, 3:27pm

I fail to see this and disagree (that doesn't make me right though). I just don't see how saving 6 characters of typing provides enough ergonomic improvement to justify the weight of any language change whatsoever.

if a.contains( b ) { ... }
if b in a { ... }

In fact, the latter seems to fade into the background hiding a potentially expensive operation whereas the former draws the eye in to focus attention on the potentially performance killing operation.

gbutler · October 13, 2018, 3:32pm

It's interesting to me that you take the metaphor there. There is a pseudo-science idea that there are "good sugars" and "bad sugars" and the "mixture of sugars" makes the difference. That is universally recognized in the established medical community as bunk. I don't want to over-work the metaphor (we probably already have), but, I think there is a danger in thinking that only the wrong mixture is bad and ignoring the "just too damn much" idea.

newpavlov · October 13, 2018, 3:42pm

It's the same kind of improvement as:

for i in 10..20 { .. }
// vs
for( i = 10; i < 20; i++ ) { .. }

The former is more pleasant to read and write compared to the latter. Also note that iterators can be expensive as well, but we are still happy to use in in for loops, simply because there is no cheaper option for doing what we want.

I guess the main difference between us is that when reviewing proposed features I don't see "+1 feature" as a major demerit in itself. Yes, it's a demerit, but a minor one, especially if we consider the existence of Nightly and editions. I think being too conservative will hinder evolution of the language, but of course we shouldn't pull in every proposal either. Finding the golden mean is indeed not easy.

gbutler · October 13, 2018, 4:03pm

This, as has been argued in other threads, is kind of an issue in itself. The danger of new features is that, once added, they are very difficult to remove. Every feature added potentially closes off some other, more useful future proposal. It requires constant vigilance on behalf of those wanting to keep the language lean, whereas those wanting things added can fail 100 times and still get in 100’s of changes. In other words, the burden of proof for the true usefulness of a feature falls on those wanting it added, and that bar must be high. The burden of proof on those wanting to keep things out should be low. That is the only way to keep runaway complexity in check.

matt1985 · October 13, 2018, 4:54pm

Instead of adding meaning to the in keyword we could also use traits:

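trait Contains<T> {
    fn contains(self, element: T) -> bool;
}

// `Sized` is needed because the default method takes `self` by value.
trait In: Sized {
    fn in_<C>(self, collection: C) -> bool
    where
        C: Contains<Self>,
    {
        collection.contains(self)
    }
}

impl<This> In for This {}

// Assuming `Contains` is implemented for arrays, `Vec`, ranges and `&str`:
assert!(10.in_([10, 20, 30]));
assert!(20.in_(vec![20, 30]));
assert!(20.in_(10..40));
assert!(300.in_(10..=300));
assert!("hello".in_("hello world"));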

Edit: if we reuse the in keyword, these are the traits that would have to be added anyway.
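
For anyone who wants to try the idea, here is a self-contained sketch with the impls filled in. The `Contains` and `In` traits and every impl below are assumptions made for this example; none of them exist in std. The receivers carry an explicit `_i32` suffix only to keep type inference out of the picture.

```rust
use std::ops::{Range, RangeInclusive};

/// Hypothetical trait: collections that can answer a membership query.
trait Contains<T> {
    fn contains(self, element: T) -> bool;
}

/// Hypothetical trait: lets any sized value ask `value.in_(collection)`.
trait In: Sized {
    fn in_<C: Contains<Self>>(self, collection: C) -> bool {
        collection.contains(self)
    }
}

impl<T> In for T {}

// The impls below are guesses at what would be needed; std has none of them.
impl<T: PartialEq, const N: usize> Contains<T> for [T; N] {
    fn contains(self, element: T) -> bool {
        self.iter().any(|x| *x == element)
    }
}

impl<T: PartialEq> Contains<T> for Vec<T> {
    fn contains(self, element: T) -> bool {
        self.iter().any(|x| *x == element)
    }
}

impl<T: PartialOrd> Contains<T> for Range<T> {
    fn contains(self, element: T) -> bool {
        self.start <= element && element < self.end
    }
}

impl<T: PartialOrd> Contains<T> for RangeInclusive<T> {
    fn contains(self, element: T) -> bool {
        *self.start() <= element && element <= *self.end()
    }
}

impl<'a, 'b> Contains<&'b str> for &'a str {
    fn contains(self, element: &'b str) -> bool {
        // Call the inherent substring search explicitly, so that this
        // does not resolve back to the trait method being defined here.
        str::contains(self, element)
    }
}

fn main() {
    assert!(10_i32.in_([10, 20, 30]));
    assert!(20_i32.in_(vec![20, 30]));
    assert!(20_i32.in_(10..40));
    assert!(300_i32.in_(10..=300));
    assert!("hello".in_("hello world"));
}
```

Note that every call here consumes both the value and the collection, since both are taken by value.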

josh · October 13, 2018, 5:05pm

I feel like container.contains(elem) should suffice, rather than elem in container. And I’d prefer to avoid the latter for one key reason: it’s entirely unrelated to for elem in container.

for elem in container already exists, and it iterates over the container. For that reason, I’d like to avoid having elem in container also work in a boolean context.

(That’s in addition to the comments about wanting to ensure people have thought about it taking O(n) with some containers.)

kennytm · October 13, 2018, 6:34pm

As you can see from RFC 2500, the <[T]>::contains method is already incompatible with the Needle API. The only intersections are <str>::contains and <OsStr>::contains, which should be possible with

impl<P: for<'a> Needle<&'a str>> Contains<P> for str {
    fn contains(&self, needle: P) -> bool {
        core::needle::contains(self, needle)
    }
}

This totally does not work for overloading the in operator. If you write x in y, would x be moved, would y be moved?

trait Contains<Lhs> {
    fn contains(&self, lhs: &Lhs) -> bool; // ?
    fn contains(self, lhs: Lhs) -> bool; // ??
    fn contains(&self, lhs: Lhs) -> bool; // ????
    fn contains(self, lhs: &Lhs) -> bool; // ????????
}

If x would be passed by reference then it cannot support needles that are !Copy. OTOH if x would be passed by value it may be surprising that a value would be consumed by a boolean operator (a == b is not consuming, a + b is consuming). However it is very clear that y.contains(x) will move the x.
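
A short sketch of how today's operators and `Vec::contains` treat their operands, using made-up function names. `==` and `contains(&x)` only borrow, while `+` on `String` takes its left operand by value:

```rust
/// `==` borrows both operands, so both strings can be returned afterwards.
fn compare_then_reuse(a: String, b: String) -> (bool, String, String) {
    let same = a == b;
    (same, a, b)
}

/// `+` on `String` consumes the left operand; `a` is gone after this.
fn concat(a: String, b: &str) -> String {
    a + b
}

/// `Vec::contains` takes `&self` and `&T`, so nothing is moved.
fn check_then_reuse(haystack: Vec<String>, needle: String) -> (bool, Vec<String>, String) {
    let found = haystack.contains(&needle);
    (found, haystack, needle)
}

fn main() {
    let (same, a, _b) = compare_then_reuse("x".to_string(), "x".to_string());
    assert!(same);

    // `a` is moved into `concat` and cannot be used again here.
    assert_eq!(concat(a, "y"), "xy");

    let (found, haystack, needle) = check_then_reuse(vec!["k".to_string()], "k".to_string());
    assert!(found);
    assert_eq!(haystack.len(), 1);
    assert_eq!(needle, "k");
}
```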

If we introduce x in y using for a in b as the model, this means both x and y should be consumed, and you'll need to write &x in &y most of the time. Is this really still ergonomic compared with y.contains(&x)?
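
The `for a in b` model mentioned here behaves like this in today's Rust (a minimal sketch, function name made up): iterating over `&words` leaves the vector usable, iterating over `words` moves it.

```rust
/// Walks the same vector twice: first by reference, then by value.
fn total_len_twice(words: Vec<String>) -> (usize, usize) {
    let mut borrowed = 0;
    for w in &words {
        // `w` is a `&String`; `words` is still usable after this loop.
        borrowed += w.len();
    }

    let mut consumed = 0;
    for w in words {
        // `w` is a `String`; `words` has been moved into the loop.
        consumed += w.len();
    }

    (borrowed, consumed)
}

fn main() {
    let words = vec!["ab".to_string(), "cde".to_string()];
    assert_eq!(total_len_twice(words), (5, 5));
}
```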

Because of this I'm opposed to adding in as an operator in Rust.

CAD97 · October 13, 2018, 6:47pm

Another argument against from the grammar position:

<expr> in <expr> could potentially cause some issues. Currently this cannot be an expression, and instead, in serves as the separator between a pattern and an expression in the for syntax.

IIRC (did not check), the for pattern is not allowed to have an if guard, but there’s an open (pre?) RFC about allowing one. This would mean that in would now separate an expression from an expression, while being valid in expression context itself.

for <pattern> if <expr> in <expr> { .. }
for x if x in list2 in list1 { .. }

I don’t yet see how this could introduce an ambiguity, but it definitely requires infinite, complicated backtracking lookahead. Example:

for x if true in { .. } ..

Without looking past the braces for another in, you don’t know whether the braces are the IntoIterator to iterate over, or the collection to check for membership in the guard. Replace { … } with an arbitrarily complex expression and you’ve got infinite lookahead even with token trees instead of flat tokens.

lordan · October 14, 2018, 11:57am

Not arguing for or against an in operator, but why would an if guard not follow the (IMHO more readable) Python list comprehension syntax:

for x in list1 if x in list2 { .. }

AFAICT that wouldn’t have any of the lookahead issues (?).
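
For comparison, the same filtering can already be written with iterator adapters. A minimal sketch, with `common` as a made-up helper name:

```rust
/// Collects the elements of `list1` that also occur in `list2`.
fn common(list1: &[i32], list2: &[i32]) -> Vec<i32> {
    let mut out = Vec::new();
    // Today's spelling of `for x in list1 if x in list2 { .. }`.
    for x in list1.iter().filter(|x| list2.contains(x)) {
        out.push(*x);
    }
    out
}

fn main() {
    assert_eq!(common(&[1, 2, 3, 4], &[2, 4, 6]), vec![2, 4]);
}
```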

canndrew · October 14, 2018, 1:33pm

If we were going to add this shouldn’t it be something more than just a sugary method call which returns a bool? Like, if used with if it could pattern-match on the value the same way the in of a for-loop does.

let my_collection: Vec<Option<u32>> = ...;
if Some(x) in my_collection {
    // use x here
}

It still hardly seems worth adding extra syntax just for this though.
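
Something close to this can already be spelled with `if let` and iterator adapters. A minimal sketch, assuming "match" means "bind the first element that fits the pattern" (the function name is made up):

```rust
/// Returns the first `Some` payload in the collection, if any.
fn first_present(my_collection: &[Option<u32>]) -> Option<u32> {
    // `flatten()` skips the `None`s and yields the contents of each `Some`.
    if let Some(x) = my_collection.iter().flatten().next() {
        // use x here
        Some(*x)
    } else {
        None
    }
}

fn main() {
    assert_eq!(first_present(&[None, Some(7), Some(9)]), Some(7));
    assert_eq!(first_present(&[None, None]), None);
}
```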

kennytm · October 15, 2018, 2:11am

The in thing cannot simultaneously support a pattern and an expression on the LHS, otherwise x in y would be ambiguous (the x could mean a new binding matching anything or refer to an existing variable).

newpavlov · October 15, 2018, 2:20am

Can this be disambiguated by requiring let, which would also be useful from a consistency point of view?

if a == b { .. }
if let Some(a) == b { .. }
if a in b { .. }
if let Some(a) in b { .. }

Though I am not sure if matching on the first occurrence is that useful. I would imagine something like this instead:

for let Some(a) in b { .. }

But I guess this use-case can be covered by aforementioned if guards:

for x in b if let Some(a) == x { .. }

Though it becomes much more verbose.
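
For the specific `Option` case, both shapes can be written in today's Rust. A minimal sketch with made-up function names: the first uses `flatten()`, the second an `if let` inside the loop body.

```rust
/// Sums the `Some` payloads using `flatten()` to skip the `None`s.
fn sum_present(b: &[Option<u32>]) -> u32 {
    let mut total = 0;
    for a in b.iter().flatten() {
        total += *a;
    }
    total
}

/// The same loop with an explicit `if let` in the body.
fn sum_present_verbose(b: &[Option<u32>]) -> u32 {
    let mut total = 0;
    for x in b {
        if let Some(a) = x {
            total += *a;
        }
    }
    total
}

fn main() {
    let b = [Some(1), None, Some(4)];
    assert_eq!(sum_present(&b), 5);
    assert_eq!(sum_present_verbose(&b), 5);
}
```

`flatten()` only covers patterns like `Some(a)`; other patterns would need `filter_map` or the explicit `if let`.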

canndrew · October 15, 2018, 3:56am

I meant that we should only have it support a pattern and be used with if (and not be a general-purpose boolean operator since we already have .contains()).

Wyverald · October 17, 2018, 8:32am

The latter is not valid syntax (needs assignment, not comparison). This further highlights the possible confusion if in is used both as a test and a binding in for -- right now, in is more analogous to = (introducing bindings), as opposed to == (boolean test).

I guess it's obvious that I'm against this proposal. haystack.contains(needle) is good enough for me in every way.
