Pre-RFC: View Patterns

TL;DR: Introduce patterns of the form f -> pattern, which have the semantics of applying f to the matched value and then matching pattern on the result. This feature generalizes pattern synonyms, box_patterns, advanced_slice_patterns, etc.


Pattern matching is a very powerful construct. It makes it very easy to work with complex structures like nested slices, lists, and ASTs.

However, pattern matching in its current form still has many limitations. For example, there is no stable way to match the contents of Box or an Rc. There is no way to abstract common patterns, short of using a macro (which has its own limitations; for example, macro-generated patterns defined in a module foo cannot match on private types from fooexample).

It would also be nice if libraries could provide pattern-matching constructs other than what are built in via enums. For example, one might want a way to match a slice that starts with a certain prefix and then bind a variable to the sub-slice that comes after that prefix. There is currently no way to do this.

In short: patterns currently can’t build abstractions.

What I propose is that we add a feature analogous to Haskell’s view patterns (or ‘the one pattern to rule them all’), with a few modifications to account for the specialties of Rust’s borrow checker.

Example

As an example of where this might be useful, consider the following code:

fn to_digit(c: char) -> Option<u32> {
    c.to_digit(10)
}
fn classify(c: char) -> CharClass {
    match c {
        ' ' | '\t' | '\n' => CharClass::Whitespace,
        c where c.is_ascii_alphabetic() => CharClass::Alpha,
        c where to_digit(c).is_some()
            => CharClass::Digit(to_digit(c).unwrap()),
        _ => CharClass::Other,
    }
}

The third arm – the one that checks for digits – is quite ugly. With view patterns, it could be rewritten as:

/* ... */
fn classify(c: char) -> CharClass {
    match c {
        ' ' | '\t' | '\n' => CharClass::Whitespace,
        c where c.is_ascii_alphabetic() => CharClass::Alpha,
        (to_digit -> Some(n)) => CharClass::Digit(n),
        _ => CharClass::Other,
    }
}

Interaction With Ownership

Rust distinguishes between patterns that move the value they match on and patterns that only borrow it. This is an important distinction because a construct like a @ subpattern can only work if subpattern doesn’t move the value it matches on.

Because of this distinction, we will need three different forms of view pattern: one where the view function takes its argument by value, on where it takes it by reference, and one by mutable reference. For this example, I’ll be using f -> p, ref f -> p, and ref mut f -> p respectively, but this is open to change (in in fact probably should be).

To be precise:

  • if let (f -> pat) = v {} is equivalent to if let pat = f(v) {}.
  • if let (ref f -> pat) = v {} is equivalent to if let pat = f(&v) {}.
  • if let (ref mut f -> pat) = v {} is equivalent to if let pat = f(&mut v) {}.

Exhaustiveness

A view pattern such as f -> pat should be considered irrefutable if and only if pat is irrefutable. Otherwise, no assumptions should be made about whether this will match.

Pros

  • This allows one to express some constructs that cannot currently be written. The example code shown above cannot currently be written with a match without a redundant unwrap because if let guards are not supported and there’s no way to fallthrough to the next arm of a match.

  • We can continue adding new pattern constructs such as range patterns, box patterns, advanced slice patterns, etc., but this feature subsumes all of them and also allows library authors to define their own.

    • Range patterns can be implemented using the following view function:

      fn range<'a, B: Ord>(start: &'a B, end: &'a B) -> impl Fn(&B) -> bool + 'a {
          move |e| start <= e && e < end
      }
      

      The pattern a..b becomes (ref range(&a, &b) -> true).

    • box p can be reduced to one of ((|b| *b) -> p), (ref Deref::deref -> p), or (ref mut DerefMut::deref_mut -> p).

    • advanced_slice_patterns takes more work (it would require return-type polymorphism and more complex type-level hackery), but it could be done; more importantly, it could be done without adding anything to the compiler but this one feature.

Cons

  • This adds more complexity to patterns.
  • The feature is potentially confusing and hard to search for.

There are probably more problems with this that I haven’t thought of.

Bikeshedding

  • Should the parentheses always be required around view patterns?
  • I believe the current syntax would cause infinite lookahead, so we probably want some kind of mandatory prefix.
  • It would be nice if there was a way to write a method call in place of f, or to otherwise include functions that don’t just take one parameter. You can use a lambda, but (|c| c.to_digit(10)) -> Some(n) looks a bit ugly.
  • It might be more ergonomic to have the difference between (f -> pat), (ref f -> pat), and (ref mut f -> pat) be inferred instead of explicit.
  • Should we have special syntax for f -> Some(p) and f -> true?
5 Likes

View patterns are quite a nice thing in Haskell but I think they pose challenges to overcome when adapting them for Rust. I think perhaps we've accepted some changes to pattern matching and control flow lately and giving it a breather before doing more changes would be good given our probable roadmap goals for 2019. I think also introducing pattern synonyms might be a step to take before this one. That said, here are some questions and notes for you to ponder.

-XPatternFamilies do, -XViewPatterns don't.

How do you get things to compose with move semantics and how do you get this to be efficient? E.g. consider:

enum E { A, B, C }
enum F { A, B }

fn foo(x: usize) -> Option<E> { ... }
fn bar(x: E) -> Option<F> { ... }

match foo(x) {
    Some(bar -> Some(F::A)) => a,
    Some(bar -> Some(F::B)) => b,
    _ => c,
}

Ideally, we want something like:

'_0: {
    match foo(x) {
        Some(_1) => match bar(_1) {
            Some(F::A) => break '_0 a,
            Some(F::B) => break '_0 b,
            _ => {},
        },
        _ => {},
    }

    c
}

but side-effects and poor desugarings could instead apply bar(x) twice.

I think it's also important that you consider how pat ::= ... | expr -> pat ; can nest arbitrarily as a pattern form.

See Tracking issue for RFC 2294, "if let guard" · Issue #51114 · rust-lang/rust · GitHub.

Can you say why? (I'm sure it's true I just haven't thought about it).

Sounds nice, how is this achieved?

Doesn't seem warranted. We don't have it in if let.

3 Likes

The points you make are probably a good argument for not adding (or at least delaying decision on) this feature.

-XPatternFamilies do, -XViewPatterns don’t.

Everything that can be done with pattern synonyms can be done with view patterns. For example, the view function


fn join<T, E1, E2>(r: Result<Result<T, E1>, E2>) -> Option<T> {
    match r {
        Ok(Ok(t)) => Some(t),
        _ => None,
    }
}

acts like the unidirectional pattern synonym pattern Join r = Right (Right r).

I didn't know about that. Given that if-let guards and view patterns are equally powerful I'd probably be fine with either one.

Regarding infinite lookahead: if it's ambiguous whether f is a pattern or an expression then the parser wouldn't know which one it is until it encounters the ->.

I don't know, but it seems like something that needs to be solved anyway if we intend for something like b @ box (ref n) to ever be valid (although it's quite possible that we don't).

  • |x| expr -> pat: is it (|x| expr) -> pat or | (x) | (expr -> pat)?

  • |x| -> T expr -> pat, is, hilariously, not ambiguous, afaict.

I have to ask, why? Why can't you just call the function with the value and pattern match on the return value? This seems redundant.

How so? Both types provide access to the underlying value (e.g. Deref); they would be pretty useless otherwise.

I have to disagree here. I would find code with custom pattern matching rules hard to read; I don't want arbitrary code to be disguised as a common language feature.

This is debatable at best; patterns are already an abstraction, but I don't think arguing semantics is particularly productive. Patterns are suited for one thing, they do it well, and I don't think we should hang more and more bells and whistles on them, they are already quite complex.

I disagree with this point. Among other things, patterns are hard to use with types that are semantically something like an enum but use a custom layout for higher efficiency. You can have a method to convert to a regular enum and then match on that, but for that to actually produce efficient code is a big ask for the optimizer. In lieu of that, if you want efficient layout, you have to sacrifice ergonomics, which is ideally something Rust would avoid.

I'm not sure how well this particular proposal would solve that, but extensible patterns in general are a feature I'm quite interested in.

3 Likes

You can't do this when nested inside another pattern. The code in the original post provides an example of a place where our current tools are very awkward.

Again, deref cannot be called when the value in question is nested inside of another pattern.

This is subjective, but I can see where you're coming from. The name of the function is there, but it can still be unclear. That said, Haskell code using view patterns tends to use them fairly sparingly and I haven't seen a place where it made things particularly unclear.

2 Likes

In that example, even though it's a simple one, it is quite confusing. Furthermore, .is_some() followed by unwrap() is an oft-encountered pattern which is trivially rewritten using if let or an inner match:

match c {
    ' ' | '\t' | '\n' => CharClass::Whitespace,
    c where c.is_ascii_alphabetic() => CharClass::Alpha,
    c => if let Some(digit) = to_digit(c) {
        CharClass::Digit(digit)
    } else {
        CharClass::Other
    }
}

This is perfectly clear, as short as it reasonably needs to be, and doesn't require any additional language features. I would say that there is no reason why you should have written the code in your first example in the first place, nor does this example illustrate well how the proposal is more useful than complicated.


To be honest, everytime someone tries to justify a language feature by arguing that it makes some convoluted code clearer, I invariably think that the code in question just needs a bit more thought and effort to write in a more considerate and systematic manner, using the already wide palette of existing elements of Rust.

It's possible to come up with virtually any number of (code snippet, feature) pairs where the former would be made "shorter" or "clearer" (for some, often subjective definition of these words) by the latter. But we shouldn't be so preoccupied with whether we can that we don't ask whether we should. In particular, I don't think it's healthy to continuously (and additively) adapt the language to individuals' taste, it is simply not a sustainable design and development model. Instead, users should become experienced in using existing features, learn idioms, common patterns of refactoring (and antipatterns to be refactored), and adapt their own way of thinking to the language, if they want to deal in the language.

2 Likes

Simpler:

match c {
    ' ' | '\t' | '\n' => CharClass::Whitespace,
    c if c.is_ascii_alphabetic() => CharClass::Alpha,
    c if let Some(digit) = to_digit(c) => CharClass::Digit(digit),
    _ => CharClass::Other,
}

What is nice about this is that you have less rightward drift, which is generally something you should fight for more readable code. One could easily imagine that the bodies of each arm is quite indented. Shaving off one level of indent makes it less likely that you go over the 80/100 limit and so that often makes code cleaner.

Why do you assume their code is convoluted? This extension exists in Haskell and is useful in real world code. There were occasions today when hacking on rustc where this would have been useful.

In my experience this just doesn't happen.

There's always a give and take. Expecting new users to just be completely assimilated is I think unrealistic. This certainly doesn't happen with natural languages and I don't see why formal ones would be much different.

I'd also note that this particular idiom comes from Haskell. Why is that relevant? Because Rust is similar to Haskell in many ways (at least when it doesn't pertain to strict/lazy...) and therefore it makes sense to ask whether something in Haskell would work well in Rust as well. That doesn't mean we automatically do it -- it needs to stand on its own merits, but Haskell is a good source of inspiration.

4 Likes

Because the example was. Testing for is_some and then unwrapping is strictly worse than pattern matching.

It's not as drastic as that. I don't know what you are trying to say with "complete assimilation" (as if it were something aggressive or forced), but certainly learning how to program in an idiomatic manner in a certain language shouldn't be considered much of a hassle or an expectation too high, if someone wants to program in said language. Sure, people coming from different backgrounds will have different styles, and it's fine; however, that in itself doesn't justify changing an important aspect of the language.

I do realize that. I don't think I ever argued against this feature because it comes from Haskell, I didn't even mention it in fact. I argued against it because I think its benefits don't outweigh its drawbacks.

3 Likes

That is all that is necessary, let's weigh the proposal on its merits... no need to add a "To be honest..." :wink: All you need to say is "I don't think this contributes to readability/unsufficiently-common/covered-well-by-other-things..." and then we can have differences of opinion wrt. that.

2 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.