Linting and match ergonomics


#21

That is an absolutely absurd suggestion, along with more zero-sum “win/lose” thinking. The priority is “we can’t fix a problem we don’t understand.” We all understand, very clearly, what you’re saying about local reasoning. But like the word “explicit,” there are many degrees of local reasoning, and Rust has always had less or more of it depending on the context.

Consider that Rust has always had situations where you can’t tell locally whether something is a reference. The type of a struct field, the return type of a function, the inferred type of a variable or complex expression, or even the written-out type of an argument sufficiently far away from its use. Match ergonomics certainly extends this to more scenarios!

But what we need is the next step- things that actually go wrong because we have insufficient or misleading local reasoning. We’ve seen some confusing error messages- that’s a good start! We can change error messages! But if that’s the whole problem, then disabling the feature is nothing more than a hack to bring back the old error messages.

So if “sit down right now and write an Underhanded Rust entry” is too high a barrier, then perhaps the solution is to take a step back, move forward with what we have, and keep an eye out for problematic cases. What do you stand to lose here?


#22

Against my better judgment, I’ll entertain an attempt at an example:

  1. Given the code below, where I’ve left out the definitions of foo1, foo2, bar1 and bar2, can you tell me what the type of s is in each case? Does any individual use have enough information to determine the types involved in that specific case?
  2. If you pretend that you’re shown a code review with code which contains one of the snippets below (the one without the * as a differentiator), how confident are you that you can spot potential performance concerns? Let’s pretend it’s in a loop that may be hot.
struct S(String);

struct T(S);

impl T {
    fn new() -> T {
        T(S(String::new()))
    }
}

fn main() {
    {
        let t = T::new();
        let S(s) = t.foo1();
        bar1(s);
    }
    {
        let t = T::new();
        let S(s) = t.foo2();
        bar2(s);
    }

    {
        let t = T::new();
        let S(s) = t.foo1();
        bar1(s);
    }
    {
        let t = T::new();
        let S(s) = t.foo2();
        bar1(s.clone());
    }

    {
        let t = T::new();
        let S(s) = t.foo1();
        bar2(&s);
    }
    {
        let t = T::new();
        let S(s) = t.foo2();
        bar2(&s);
    }
}

Notes:

  • I compiled this with rustc 1.28.0 (9634041f0 2018-07-30)
  • Please don’t suggest that other parts of Rust had this problem before match ergonomics. That’s irrelevant, because I’d want a lint for those cases too.
  • If you think this example doesn’t meet your standard for good examples, just ignore that I even tried. Arguing about it isn’t useful for either of us.
  • I’m confused by the notion that the bar for getting a lint is the same as the bar for a convincing Underhanded Rust entry. The bar should be that X% of users would find this construct confusing when reading Rust code, where X is higher for deny and is lower for warn and allow. Given that clippy has an assign_ops (allow) lint that prevents using += and friends because “Projects with many developers from languages without those operations may find them unreadable and not worth their weight.” and it’s implemented even though it has a caveat as strong as “Types implementing OpAssign don’t necessarily implement Op .” To make life more interesting, there is a (warn) lint which enforces the opposite, assign_ops_pattern. I.e. you’re allowing a lint to avoid a construct which is available in almost all C-like languages and many that are not, which potentially has no workaround other than locally disabling the lint. Any lint requiring match ergonomics or similar previous language features doesn’t have that issue, since the workaround would be adding a ref, *, & or similar. And surely, the level of potential confusion caused is significantly higher. And the number of lints I looked at before finding this one was 0.
  • And lastly, if I’m going to be labelled a match ergonomics dissenter, can I at least have the honor of being an ergonomics and assign ops dissenter? I.e. match ergonomics isn’t the only ergonomics I dislike and while I quite like assign ops, if I’m going to be labelled, I might as well get one that will make me smile every time I see it.

#23

This has been my argument from the start. Wanting to lint match ergonomics is not just overkill, it doesn’t actually solve anything the problems in a holistic manner, while denying valid use cases (edited). I have been trying the whole time to identify the actual underlying problems that have been amplified by match ergonomics.

The bar is this high because the amount of complaints we get about false positives (“but everything is super clear and obvious in 50% of the cases”) is a pain we feel a lot. We’ll take false negatives any day, but overwhelming false positives are not something we can just throw at users.

Especially not if, as stated above, this problem is actually not caused at all by match ergonomics. It is solely amplified by it.

This lint was added by me without adhering to the rules and I have apologized multiple times for it. I thought I opened an issue for removing it, but I seem to have forgotten that, too.

Now, back to the topic.

We’re either getting ownership of a string and passing that on, or the foo1 function returns a &S. There is no possible performance problem that can occur here that would be more obvious with full annotations

&S(ref s) does show you the reference, but there’s nothing you could have changed about it anyway, so knowing the type is not helpful.

same here

duplicate of the first?

Either you are

  • cloning a String value (wasteful since since you don’t use s after this)
  • are cloning from a &String, thus bar1 takes a String argument
  • are cloning from a &&String, thus clippy will lint about double references and your function foo2 is weird anyway

This snippet is often written as bar1(t.foo2().0.clone()), which also doesn’t show you any types. If you want explicit types in either case you need to specify them on the let binding.

Also there’s an open issue about detecting clone calls on values that are not used again. Imo this is the correct solution to the ambiguity and possible perf problem here.

This is always fine, although the & might be overkill if s is already a reference (which clippy would lint about). It might be that foo1 returns a String, but if the only way to create that value is via a heap allocation, you have no way to get such a value differently. If there is another function foo3 which work without allocating and gives you a reference, then it seems prudent to only expose that one and allow users to choose whether to clone or not.


#24

Please don’t phrase things like that. I’m preparing examples but it will take a bit. Telling us we’re wrong like we haven’t thought about it is not helpful.

It most certainly does solve things, the issue is wether or not you see the things as problems.


#25

That was indeed wrongly phrased, sorry

It doesn’t solve the things you are trying to solve in a holistic manner, while blocking valid use cases


#26

I’m not sure what this means, since clearly we don’t have the same argument?

I never explicitly stated what I wanted the lint to be, but the statement you quoted was meant to imply that I wanted something holistic and not limited to just ‘match ergonomics’. Note, the entire point of lints is to deny valid code and the entire point of having default allow lints is to let users pick whether they want to use that lint or not. I honestly can’t believe that we’re literally arguing about whether lints should deny valid use cases since that’s what they do by definition and can’t do anything else. All invalid use cases which can be efficiently detected are already compiler errors rather than lints.

As for the underlying problems with features like match ergonomics (I have similar issues with autoderef), at a high level is as follows. I remember Niko describing the motivation for autoderef (which has existed since before 1.0). He said that once he’d learned the language, having to type extra *s between & and exp was just busy work and so, it was decided to have the compiler do it for you. The oversight here (which may or may not be sufficient to override the decision, depending on the way you personally weight various things) was that part of the reason he learned the language and felt like the *s were busy work is because compiler errors are a teaching tool for the language. Something not compiling is a forcing function for reevaluating your mental model of the language. This means that whenever a potential syntactic construct can be valid or invalid, the potential for users to develop an inaccurate mental model of the language should be considered. Granted, it is hard to concretely account for this, way more so than it is to account for the number of users complaining about compiler errors and syntactic noise. Note, & and &mut when passing arguments to a function could be removed, C++ doesn’t require any additional syntactic noise when calling functions with a reference instead of a value, and yet, Rust decided that this noise was worth it. We may disagree about the weight of the problem and exactly where the line is on any given construct, but we should be able to understand that this spectrum exists.

Do people complain about the assign_ops lint? Do people even know it exists and if they used it, it would conflict with the assign_ops_pattern lint? Your objections aren’t grounded in the reality of a lint you yourself created and admitted to wanting to remove, but the lack of complaints meant that you not only didn’t remove it but you didn’t file an issue to remove it (and I assume no one else did). If the suggestion was for a deny or warn lint, I would understand the objection. But given the existing set of allow lints, I think you’re standing on quicksand by rejecting any lint that is worth multiple people spending hours merely discussing.

Are you suggesting that you want to make a breaking change to remove a lint from semi-official Rust project that would break users who depend on it without any evidence that people dislike it or want it removed?

Apparently my attempt at an example failed. Perhaps because I didn’t add enough context. My motivation for the example was some experience working on lalrpop and migrating from using the custom string interner to using one from servo (I think). The new one didn’t implement copy, so various value bindings needed to become reference bindings. It turns out there are matches all over lalrpop that involve interned strings. I suppose you would suggest that match ergonomics would result in correct code while not requiring me to have changed as much of it directly. And in this case, that’s true. But interestingly, I learned a lot about matching and bindings from changing that code and additionally, when I was done, I had high confidence that I knew what it did. If match ergonomics were available at the time, do I think the code would’ve become incorrect in some way? No, probably not. Can I imagine a scenario where the code (ignoring unsafe) would become incorrect with match ergonomics in play? Not really. The more I consider it, the more I realize that incorrect code isn’t the point. The primary reason I’m able to currently reason about how match ergonomics would impact that change is because I needed to type it out explicitly at the time. I simply wouldn’t have learned as much without the compiler complaining and having to search the source for every location and therefore learning even more. And that’s ignoring the fact that the process gave me confidence that the code was right. If having a lint (regardless of the specific cases) can result in bringing that possibility for learning/confidence back to the language, I’d vote yes.


#27

It does not fit in the rules specified in the clippy 1.0 RFC, so yes wrt removing, no wrt breaking change. No code will break, just have a lint fewer that is reported.

I have not even considered this perspective. All the discussion I’ve seen so far argued for the dangers of making mistakes due to the lack of clarity.

Having a lint category for full explicitness/maximal verbosity seems to fit into the various ideas for safety critical Rust code (MISRA-Rust and friends). It might not be necessary for correctness, but it will increase confidence in the correctness nontheless in that industry. Partially because “that’s how we’ve always done it”, but also a lot because you need to have much fewer features in your head when reviewing code.


#28

They are the same fundamental concern. One’s perspective on making mistakes is directly linked to one’s confidence in that code being correct. It doesn’t matter that Rust goes a long way to making sure it’s correct (in some cases, at least) and if this were a discussion about C++, no one would suggest that arguing for the dangers of making mistakes is somehow the wrong approach.

Unfortunately, Rust doesn’t stop all errors/mistakes and just because it does better than basically any other language doesn’t mean that we should shift toward stance where we reject defense in depth as a strategy (and more importantly, we shouldn’t reject advocates of that strategy).


#29

Personally I don’t see much harm in having a disabled-by-default clippy lint for locally “turning-off” match ergonomics. (which ideally will not be used for whole crates, but only for selected parts with complex matches) “Unergonomic” matches are still valid Rust and I think it’s somewhat wrong to effectively enforce “ergonomics” by denying tools for tracking it, while this feature is quite easy to slip and hard to notice. To reduce risk of abuse and “dialectization” we could adapt an official stance (e.g. via clippy documentation) which will say that “unergonomic” matches are considered non-idiomatic.


#30

Here are some demonstrations of issues caused by match ergonomics. I hope these will suffice as demonstrations of the kinds of guarantees I’m after.

Case A: T to &mut T

Example

In this case a type changes from an owned value to a mutable reference. Even though there is an API change, the same code dealing with the data still compiles and runs, simply giving different results.

The version with an owned value has this behavior:

has resource 23
finalizing ptr
releasing resources
assume related resource dropped
reusing ptr

While the mutable reference of course has different drop semantics:

has resource 23
finalizing ptr
assume related resource dropped
reusing ptr
releasing resources

When we perform this change in a version of Rust without match ergonomics, we get a clear error message:

error[E0308]: mismatched types
  --> src/main.rs:23:17
   |
23 |                 Some(resource) => {
   |                 ^^^^^^^^^^^^^^ expected mutable reference, found enum `std::option::Option`
...
85 |     fn_handler!();
   |     -------------- in this macro invocation
   |
   = note: expected type `&mut std::option::Option<Resource>`
              found type `std::option::Option<_>`

While this example uses raw pointers to mark the relevant parts, the issue of unexpected changes to drop semantics is certainly not unknown in Rust. FFI resources are simply one of the more severe cases of this problem, due to issues with debuggability and things being outside of the view of the Rust compiler.

The example involves a change from an owned value to a mutable reference. But the other direction is just as undiscoverable and just as problematic. An owned value might get released earlier than expected, leading to dangling pointers.

Case B: Lifetime Semantics Change

Example

In this case a type changes from a reference to a tuple, to a tuple of two references. The same code operating on a single reference with a single lifetime still applies even though now two separate lifetimes are involved.

With match ergonomics, the code compiles and the change is not noticable.

Without match ergonomics, Rust gives us a clear error for our initial written code, asking us to be explicit with our reference semantics:

error[E0308]: mismatched types
  --> src/main.rs:12:17
   |
12 |             let (a, b) = get(data);
   |                 ^^^^^^ expected &(i32, i32), found tuple
...
38 |     fn_handler!();
   |     -------------- in this macro invocation
   |
   = note: expected type `&(i32, i32)`
              found type `(_, _)

If that fix is applied, the change to a flat tuple with reference will cause another clear error:

error[E0308]: mismatched types
  --> src/main.rs:12:17
   |
12 |             let &(a, b) = get(data);
   |                 ^^^^^^^ expected tuple, found reference
...
59 |     fn_handler!();
   |     -------------- in this macro invocation
   |
   = note: expected type `(&i32, &i32)`
              found type `&_`

Communicating Intended Semantics

Example

This case isn’t about an issue due to code changing, but about debuggability and readability of the code.

It demonstrates a state enum containing multiple kinds of cases. It is a mix of values to mutate, values to only introspect, and primitives that are copied out.

The full-on match ergonomics version hides all those details. Copied, introspected and mutated parts of the state all look the same.

A third match is provided detailing how enforcement of having to write ref mut highlights the mutated part and makes it easier to spot something that should not be mutated.

Some notes:

  • Detecting useless &mut _ bindings will not solve the issue, since we might be trying to debug accidental mutation.
  • A stored &mut _ could still allow hidden mutation, but I can force the consumer to write an identifier for the field name, like stream_ref or stream_mut. This isn’t possible with normal patterns.

It should also be noted that since this is a convention-based use-case, there is no way for a lint targetting specific problems to be a proper solution. The value is the enforcement of writing in the conventional, explicit style, even when not necessary. Given the above example, the value comes from being able to enforce that every stream handler change still follows the convention and communicates intent. Occasional uniformity is not uniformity.

Deref Coercions

I do believe that deref-coercions fall into a similar category and might also be worth having an optional lint. There is also the issue of the ergonomics semantics stacking up. After all, it’s the combination of match and deref ergonomics that gives us

let Foo(x): &Foo<i32> = &&Foo(23);

In Defense of Defensive Programming

I think that there is lots of value in the ability to do defensive programming. I also believe that whenever certain guarantees no longer apply due to ergonomics or similar changes, a way to keep the defensive guarantees is of similar value.

We already have facilities for defensive programming in Rust:

  • We use .. when destructuring to mark that there are unmentioned fields in there.
  • We use mut to make bindings that change easily visible. Even with the existence of hidden &mut _ references or internal mutability, this is still a very valued feature by the community.
  • We have must_use to ensure things are explicitly discarded.

I’m sometimes wondering, if we had started out with match ergonomics, would a lint requiring one to explicitly state the reference semantics be this contested?

Other Lints

This is certainly not the only useful lint. There are others that would certainly be useful, like (non-exhaustive):

  • Linting against ref mut bindings that aren’t mutated.
  • Linting against plain match ergonomics bindings on &mut values when nothing or only parts are mutated.
  • Linting against treating single-variant enums as irrefutable.
  • Linting against deref coercions, as mentioned above.

Requiring explicit patterns is certainly not an answer to all the problems, but it certainly is to some. Given any piece of code I write, pre-match-ergonomics Rust certainly has more in-code guarantees than post-match-ergonomics Rust.

Summary

The examples are some specific highlights to issues that can come up during refactoring, but also during prototyping or simply because one misremembers an API or data structure.

Other things to consider are:

  • There might be multiple contributors of various experience levels.
  • The code might not be touched a lot.
  • Hidden semantics are always harder to catch during reviews.
  • The involved pieces might be far apart, even in different crates.
  • The code above uses raw pointers but no unsafe. The actual unsafe usage might be far away.
  • Parts of the code could be obscured by macros.
  • Some things like wrong raw pointer usage might accidentally keep working for a while, until a seemingly unrelated change causes obscure failures.
  • There’s a lot of pattern matching going on in Rust.
  • Patterns can be deeply nested.

I already mentioned that deref coercions can further complicate things. Some things in the future like in-band lifetimes or patterns performing deref coercions themselves would complicate things even further. And they will certailnly increase the cognitive burden for reviewing patches that will probably not include all relevant parts.

As a final note, thanks to @ahmedcharles for highlighting the fundamental issue here: Confidence in the code, guarantees that are upheld by the compiler, and minimizing the amount of features one has to keep in ones head while understanding the current and possible future implications of any given piece of code.


Lived Experiences: Strange match ergonomics
#31

Small nitpick, the general message is fine:

This lint would not be a very good one. If the desire is to have an enum where you can add more variants later, the correct solution is to mark it as such with #[non_exhaustive], not to have a lint prevent you from destructuring the single existing variant without a _ pattern.

Note that match val { Enum::Variant(val) => {} } is still treating the pattern as irrefutable. If it wasn’t, the compiler would complain about a nonexhaustive match. Which is exactly the functionality that #[non_exhaustive] is for.


#32

Yes, that might work. Though in this case I was more thinking about the consistent convention case. There you might want the match to not require a fallback arm, but still want the match to be used for all enums.

But in general, I agree. non_exhaustive is also an important strategy for future-proofing.


#33

I see the code has a call to drop there. I assume that’s not a placeholder, given the comment above it?

If so, maybe it makes sense to uplift this clippy lint that errors on the unmodified “case A” code:

error: calls to `std::mem::drop` with a reference instead of an owned value. Dropping a reference does nothing.
  --> src/main.rs:19:21
   |
19 |                     drop(resource);
   |                     ^^^^^^^^^^^^^^
...
76 |     fn_handler!();
   |     -------------- in this macro invocation
   |
   = note: #[deny(drop_ref)] on by default
note: argument has type &mut Resource
  --> src/main.rs:19:26
   |
19 |                     drop(resource);
   |                          ^^^^^^^^
...
76 |     fn_handler!();
   |     -------------- in this macro invocation
   = help: for further information visit https://rust-lang-nursery.github.io/rust-clippy/v0.0.212/index.html#drop_ref

Can you elaborate on this one? My intuition would be for the opposite: prefer let to match for irrefutable patterns.


#34

It doesn’t have to be an explicit drop call, that was just for the purpose of demonstrating that an explicit one doesn’t currently help in that scenario. But the assumption that something is dropped can exist without it being explicitly mentioned.

Though I agree that “You’re dropping things that aren’t useful to drop” would be a very useful lint as well. That would at least catch the ones where one thinks about explicitly marking the drop intention.

Edit: On the other end of the spectrum, mem::forget could probably use a lint for noop uses as well. Though I haven’t used that one in quite a while.

Assuming you have a bunch of related state types and you want to make sure the handler functions don’t suggest something isn’t an enum. It’s probably not one of the more useful possible additional lints. I just thought of it because I tend to always use match on things that are “semantically many”, even though in specific cases they might have only one case.

Tends to make it nicer to compare things when you open many of those function kinds next to each other. But I agree that that one is purely style.