Clippy suggestion `range.contains` may cause Yoda condition - is my fix a good idea?

Nemo157 · June 19, 2022, 10:32am

Most of these functions will call other functions on the trait. If the extension method wasn’t dynamically dispatched then instead each of these internal calls to the trait would be dynamically dispatched. If there are a lot of these internal calls then that could be much more runtime overhead; and it would suppress a lot of potential optimizations that happen during the monomorphization of the extension method.

tczajka · June 19, 2022, 10:47am

Ah yes, OK. But if the same method is called multiple times, then only a single lookup in the vtable is necessary, so it's roughly a wash. I don't know if this optimization is done in practice.

But you're right that having the method in the same trait potentially allows inlining.

However it seems to me like this optimization should be done by the compiler (in theory it is possible to monomorphize even functions outside of the trait), rather than by making something overridable when it shouldn't be. The compiler would have better information about what can be overridden.

afetisov · June 19, 2022, 2:00pm

Indeed, adding any (including defaulted) methods to the standard traits would be a breaking change. If a method in_range is added to both PartialOrd and MyTrait, and both of them are in scope, then it would cause a method resolution failure.

cuviper · June 19, 2022, 2:39pm

Adding defaulted items is classified as a "minor" breaking change, which means it's allowed, but still deserves caution.

https://rust-lang.github.io/rfcs/1105-api-evolution.html#minor-change-adding-a-defaulted-item

Although this general policy allows some (very limited) breakage in minor releases, it is not a license to make these changes blindly. The breakage that this RFC permits, aside from being very simple to fix, is also unlikely to occur often in practice.

afetisov · June 19, 2022, 3:06pm

I think this should still require an edition change to contain the breakage and allow autofixes, yes?

CAD97 · June 19, 2022, 4:58pm

It's not required. Adding new methods to existing types and traits is explicitly allowed by RFC 1105. It is quite possibly desirable, but not required by current policy.

Jon-Davis · June 19, 2022, 9:15pm

Slight tangent, but I'll come back around to the topic at hand. I've run into this situation a few times, where I would want to make a function that was generic to a container. I had a trait that looked like

trait Container<T> {
   fn does_contain(&self, &T) -> bool, // as to not conflict with contains
}

My specific use case for this function was to filter HTTP requests so I could write code like:

request.filter_method(Method::GET)
request.filter_method([Method::GET, Method::POST])
response.filter_code(404)
response.filter_code(400..500)

It didn't matter what you passed in, as long as it was a Container of type T. I wrote the trait to work on T, Option, [T], Vec, String, Range, HashSet, and more.

If a trait like that would be generally useful, something like it could be added along with a trait like In, which could just reverses the call order.

trait In<C> {
   fn is_in(&self, container: &C) -> bool;
}

impl In<C> for T where C : Container<T> {
  fn is_in(&self, container: &C) -> bool { C.does_contain(self) }
}

Then a user would be able to write

if 5.is_in(0..100) { 
   ...
}

This might be a more generalized solution than InRange. It also would be quite nice for option.

option.filter(|x| x == y).is_some() // I've written this more times than I'd like to admit
y.is_in(option) // this is much nicer

steffahn · June 19, 2022, 9:52pm

There are caveats to the idea of “a consistent interface for collection membership”. Too abstract “collection” interfaces can make it hard to see what operations the data structure you’re using actually supports efficiently. Or, put differently, generic programming on some abstract C: Collection could easily implicitly assume it works with some sort of Set-like data structure where checking for .contains is significantly more efficient than linear time (w.r.t. the size of the collection).

Another problem, for which the example of Haskell’s Foldable trait (well… technically they are called “type class”, not “trait”) is a good demonstration, is that it’s hard to handle additional trait bounds for efficient .contains methods. Haskell’s Foldable is comparable to Rust’s trait IntoIterator, but it offers an elem function

elem :: Eq a => a -> t a -> Bool

i.e. in Rust syntax this would be as if IntoIterator has a method

fn contains(self, &Self::Item) -> bool
where
    Self::Item: Eq;

The important part and the problem is: the interface prescribes the Eq bound on the item type, and this means that all those collections that actually could implement the thing more efficiently (compared to the default of iterating the whole thing linearly) can only do so with additional bounds such as Hash or Ord on the item type. In the example of Haskell, the Set type (which is search-tree based, similar to BTreeSet, so it works only with an Ord bound) implements Foldable, but the elem function should probably never be used on it because it’s inefficient and can't be improved due to the lack of an Ord bound in the element type; instead one must use the member function, a concrete “method” of the Set type.

I wouldn’t want to copy this design to Rust. It’s good that the Rust version spells the inefficient approach as set.iter().find(&elem).is_some() or set.iter().any(|i| i == elem), so it’s clear that this is linear-time and undesirable.

I’m not saying that this problem can’t be avoided; presumably one could make set types simply not implement a Collection trait at all unless the item type is Hash or Ord, respectively. Rust’s traits are different from Haskell in this regard, as those tend to use their support for higher-kinded types (which Rust doesn’t have) and thus use a different API design, without an associated type. If nothing else this example does however point out, again, that .contains operations are significantly different between types that implement them efficiently in a special way, and types that just iterate through the whole collection and check every item for equality.

OOP languages such as Java “solve” this problem by having no traits at all, so everything is assumed to be Hash + Eq (even in cases where this assumption makes absolutely no sense whatsoever), and using the method from the Collection class for membership testing of HashSet just works™.

afetisov · June 19, 2022, 10:55pm

Performance of generic methods is indeed an issue, however I can't see why it would be a worse issue for generic collections than for any other kind of generic code (including generic arithmetic operators, where it is entirely unclear which of the by-value and by-ref operations is the more performant generically). A possible solution is to punt such problems to specialization (if you can do a better impl if Item: Hash, you can provide it along with a generic one). However, specialization is unstable since forever and doesn't currently have a clear way forward, so I won't invoke it.

However, the specific problem you mentioned mostly depends on the design of the collection traits. You wrote

which seems to imply that you expect to put all methods in a single trait together with a generic element, or do some trait hierarchy. This is not the only approach:

pub trait Contains<T> {
    fn contains(&self, item: &T) -> bool;
}

This definition entirely side-steps your issue, since a collection is free to decide which elements under which conditions it can contain (just like with other operator traits). It doesn't even need to have a single static bound for "containable elements", e.g. we can consider (ad-hoc) collection which contains u32 or String, and has two separate impls

impl Contains<u32> for Coll { ... }
impl Contains<String> for Coll { ... }

Trying to call a method with an unsupported element type would just give a nice compile error with a list of possible implementations.

droundy · June 22, 2022, 7:35pm

I think it's worth pointing out that the contains version from the original poster doesn't agree with the original if/else version of the code, it should run 1..42. I think this is a solid argument that clippy is leading you astray here from a clear and efficient version of the code to a hard to read version.

kornel · June 23, 2022, 12:47am

This should be response.filter_code(400..=499) or response.filter_code(400..500).

Kixunil · June 23, 2022, 5:34pm

Wrote it from memory, clippy does suggest the right thing. But yeah, it's funny and it's true that being used to "arrays start at zero", "range is usually used in arrays" could lead to mistakes in these cases.

tczajka · June 23, 2022, 9:35pm

The real cause is starting your versions at 1 instead of the much more natural 0.

Jon-Davis · June 23, 2022, 11:58pm

Thank you, I was a bit sloppy with my example I'll try and clean it up a bit.

I went ahead and just published a crate that does roughly what I was talking about: contains

With it the range code would look like this:

use contains::In;

if version.is_in(&0..42) {
  // ...
}

but you could just as easily use a vec, map, slice, option, result, ect.

system · September 21, 2022, 11:58pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
The in and !in operators? language design	41	5930	March 25, 2019
Add a `within` method to `PartialOrd` libs	24	1546	January 22, 2024
Pre-RFC: Fixing Range by 2027 language design	66	6703	March 27, 2024
Enhancing Rust Range?	34	3308	December 5, 2023
`...` vs `..=` for inclusive ranges ideas (deprecated)	104	27460	March 25, 2019

Clippy suggestion `range.contains` may cause Yoda condition - is my fix a good idea?

Related topics