[Pre-RFC] More use of pipe operator in match

XAMPPRocky · January 16, 2017, 5:22pm

Feature Name: More use of pipe in match.
Start Date: 2017-01-16

Summary

this RFC proposes allowing the | operator to be used within patterns in match statements, to allow for pattern matching with less boilerplate.

Detailed Design

The current pattern matching in match statements is very powerful, however there is a lot of boilerplate code needed to fully take advantage of it. Consider a situation where you are building a state machine that is iterating through chars_indices, and when you see ' ', '\n', '\r' or '\f', you want to change the state. Currently your match statement would look something like the following example. There is no great way of reducing that boilerplate, if anything that boilerplate only grows worse as you have more cases, and bigger tuples.

match iter.next() {
    Some(_, ' ') | Some(_, '\n') | Some(_, '\r') | Some(_, '\f') => {
        // Change state
    }
    Some(index, ch) => {
        // Look at char
    }
    None => return Err(Eof),
}

The solution to this would be to allow for | to be used within tuples. This will significantly reduce how much boilerplate is required to have conditional matches with tuples.

match iter.next() {
    Some(_, ' ' | '\n' | '\r' | '\f') => {
        // Change state
    }
    Some(index, ch) => {
        // Look at char
    }
    None => return Err(Eof),
}

How We Teach This

I think all that would be required is a few examples, and changing existing ones. The syntax is pretty intuitive.

Drawbacks

None that I can think of.

Alternatives

Keep syntax as is.

Unresolved Questions

None at the moment

sfackler · January 16, 2017, 5:40pm

I’m definitely in favor of this in the abstract, but there is already a lot of work to do on improving the existing pattern matching logic in the compiler.

petrochenkov · January 16, 2017, 6:02pm

There's a postponed RFC about this.

XAMPPRocky · January 16, 2017, 9:16pm

That RFC is a lot broader, and I’m not sure if all of that RFC would be accepted today. This is just specifically about | in match expressions.

mbrubeck · January 17, 2017, 3:31am

The solution to this would be to allow for | to be used within tuples.

This should be generalized to patterns other than tuple patterns, for example struct patterns:

Range { start: 0 | 1, end: 5 | 6 }

Would | also be usable within patterns outside of match expressions (for example, in if let expressions)? I hope the answer is yes, for consistency in both use and implementation.

The RFC should probably specify the new grammar rules for match expressions and patterns. Could pats_or be eliminated, replaced by just pat?

withoutboats · January 17, 2017, 4:25am

I think this is probably a good idea, but like @mbrubeck I think the detailed design should be more explicit in discussing what the pattern grammar would be, and should be fully generalized.

It seems like it is along the lines of making the pats_or term a pattern. Right now, 0 | 1 is not a pattern, its two patterns joined by |. But if its a pattern in itself, this all falls out of that.

eminence · January 17, 2017, 8:06pm

Could this be done with a macro?

nikomatsakis · January 19, 2017, 7:41am

I think that making pat | pat be part of the pattern grammar makes sense. I believe it is a reasonably straight-forward extension semantically – that is, at worst, one can imagine just desugaring patterns like this into the more limited form we support today, right?

mgeisler · January 22, 2017, 7:56pm

Nobody mentioned that there is already a syntax for guards in match arm patterns – probably because it’s obvious to people here But for completeness, the example can be changed to use that guard syntax:

match iter.next() {
    Some((_, ch)) if "\n\r\t".contains(ch) => {
        // Change state
    }
    Some((index, ch)) => {
        // Look at char
    }
    None => {
        // Handle EOF
    }
}

That looks pretty short and direct to me. It now scales well if more characters are added to the set of interesting characters – which could easily be defined outside the pattern for better readability and for documentation purposes.

I also quite like how this syntax is general and allows me to express more or less complex conditions for the patterns.

DanielKeep · January 22, 2017, 9:37pm

Question: what happens if/when Rust gets more general constant evaluation, and someone tries to match BitFlags(ONE_FLAG | ANOTHER_FLAG)?

withoutboats · January 23, 2017, 11:17pm

This already exists:

match x {
    ONE_FLAG | ANOTHER_FLAG => { .. }
}

Patterns aren’t expressions and so can’t be eval’d, at compile time or otherwise.

gasche · January 25, 2017, 2:24pm

OCaml has had first-class disjunctive patterns (or-patterns) for a while, and we can confirm that it is a very nice feature. There is one thing that you should be careful about, which is the ordering/preference semantics in case both sides match, and whether/how the specified semantics matches user expectations.

In OCaml, the semantics of | is strongly left-to-right: when you write p | q, if p matches, then q is never considered. In general language design, “strongly in reading order” is a good choice (usually miles better than “unspecified” for example), but my OCaml experience is that for patterns, this semantics does not always correspond to user expectations. It is a declarative operator, and users will naturally think of it as commutative (they may not think of the difference between p | q and q | p, and be surprised that there is).

In most cases, this is just fine. We (OCaml people) discovered and tackled last year a case where this is not fine, involving an interaction between or-patterns and boolean guards (when in OCaml, if in Rust).

(* syntactic trees describing formal expressions in a monoid over base constants of type 'a *)
type 'a monoid_exp =
  | Const of 'a
  | Op of ('a monoid_exp * 'a monoid_exp)

let simplify_op is_neutral exp = match exp with
  | Op ((Const n, e) | (e, Const n)) when is_neutral n -> e
  | e -> e

So this simplification function is parametrized over a is_neutral function that tells you which elements of the base type are neutral elements, that can be simplified away. The problem is with the line

  | Op ((Const n, e) | (e, Const n)) when is_neutral n -> e

which does an or-pattern on a pair, and on the outside a guard. Suppose that you have the addition monoid with neutral element 0, we found out that Op (Const 0, Const 1) would get simplified as expected, but that Op (Const 1, Const 0) does not. This surprises users, and yet it is perfectly normal according to the left-to-right semantics.

I worked, with Luc Maranget and Thomas Réfis, on designing a warning to raise when this specific problem can occur, which we presented at the ML Workshop 2016 (at ICFP); see our 2-pages extended abstract and slides.

This is an anecdotal story, but I don’t think that you should interpret it as “this may introduce many usability issues”. This is the single issue that is specific to having deep or-patterns that I can remember right now – there are not many issues with it. As a language designer, I would definitely recommend adding the feature, I am just trying to warn that it may occasionally require a bit of care in managing its good usability.

One solution to side-step the issue of commutativity vs. left-right choice is to require that the components of an or-pattern be disjoint. With that restriction, all or-patterns are commutative, and the left-right reading or the angelic-choice reading coincide and are correct. It is a bit restrictive in practice: (0, n) | (n, 0) -> ... is a very common clause whose components are overlapping, and asking users to write a separate (0, 0) clause first is a bit irritating.

zackw · January 25, 2017, 3:57pm

 | Op ((Const n, e) | (e, Const n)) when is_neutral n -> e
 | e -> e
Suppose that you have the addition monoid with neutral element 0, we found out that Op (Const 0, Const 1) would get simplified as expected, but that Op (Const 1, Const 0) does not.

Hypothetically, could this be addressed by backtracking within the disjunction when the guard fails? That is, Op (Const 1, Const 0) initially matches Op(Const n, e), but that match fails the guard, so instead of jumping straight to the outer e -> e alternative, consider the Op(e, Const n) alternative next.

gasche · January 25, 2017, 4:11pm

Hypothetically, yes, but that is not necessarily easy/efficient to implement, it may result in side-effectful computations being duplicated, and it requires a careful specification. (It is not difficult to specify, “each possible way to match the value is tried (in left-to-right order)”.)

I do not think that it is a bad idea: it gives a behavior that very closely matches the user expectations, which is a good thing.

XAMPPRocky · January 26, 2017, 4:06pm

I’m sorry, I don’t know OCaml, could you show a example of this problem in Rust’s syntax? As it is hard for me to tell exactly what the problem is, and to update the RFC to address it.

gasche · January 26, 2017, 5:34pm

Here would be a translation in approximate Rust (plus deep or-patterns):

match exp {
  | Op ((Const n, e) | (e, Const n)) if is_neutral n -> e
  | _ -> exp
}

(Note: If you are willing to work on a language design, I think it is also interesting to put in the effort to read code in other languages to be able to communicate across communities.)

comex · January 26, 2017, 8:32pm

That’s not really Rust syntax… here’s a better version:

match exp {
  (Op(Const(n), e) | Op(e, Const(n))) if n.is_neutral() => e,
  _ => exp,
}

(No offense, but if you’re going to criticize someone’s lack of understanding of your language’s syntax, you might as well get their language’s syntax right…)

edit: Er, except what I wrote is already valid Rust syntax if you just remove the outermost parentheses. In the OCaml example, Op syntactically takes a single argument which happens to be a tuple, since the language apparently doesn’t allow variants to act like ‘true’ multi-argument functions. That is, you can’t have Op : monoid_exp -> monoid_exp -> monoid_exp. So it makes sense to have the deep-or within Op, between the two tuples. In Rust, it would be more natural to have Op take multiple arguments, i.e. Op(Expr, Expr), so I switched to that, but then it doesn’t really make sense to have | within the parentheses.

Anyway, the reason I explained that rather than changing it back is that - duh, combining alternations with if clauses is already supported (it just can’t be arbitrarily nested), and it does backtrack, i.e. the clause is run on each alternative that matches. So any coherent extension for nested alternatives (and possibly nested conditions?) would also have to backtrack.

gasche · January 28, 2017, 8:20am

It is more natural in OCaml as well, but I wanted to give an example where the or-pattern occurs in depth rather than at the toplevel, because the backtracking semantics is less natural/obvious in the in-depth case. I hadn't tried the toplevel case with a guard; if Rust backtracks on these (this is not specificed in the Reference), then indeed you need backtracking in depth, and then the problem of mismatch between user expectations and behavior does not happen, but you have to be careful about performance, and some other things become a bit weird (due to the "duplication of effects" issue): for example, 1 .. 3 is not equivalent to 1 | 2 | 3 as one might expect.

XAMPPRocky · January 30, 2017, 2:51am

@gasche Well if I am understanding both yours, and @comex’s example, I wrote this small playground which should show how rust handles this currently. Which is that the left most binding is the one that Rust uses when both are true.

comex · January 30, 2017, 6:18am

The key is not what happens when the guard passes both ways, but what happens if the guard fails the first time: Rust backtracks and tries the guard again with the pattern on the right, while OCaml has already ‘committed’ to the pattern on the left and gives up.

Topic		Replies	Views
Pre-RFC: syntax sugar for `matches!` language design	42	2681	November 1, 2020
[Pre-RFC] Alternative syntax for pattern-matching based branching language design	23	2089	September 22, 2019
Pre-Pre-RFC: `let else match` statement language design	74	4699	March 26, 2023
RFC idea: let ... else, easy early returns when pattern matching	35	2941	January 3, 2020
Pre-RFC: loop/for/else match ideas (deprecated)	5	2620	March 25, 2019