Vibe check: equality patterns

Someone recently suggested the following idea: we could allow patterns that repeat variables.

match ... {
    (a, a) => 1, // covers (true, true) and (false, false)
    (true, false) => 2,
    (false, true) => 3,
}

(a, a) would behave like (a, b) if a == b except it would work with exhaustiveness checking.

I did some reasearch. These patterns are called "nonlinear" (using "linear" in the resource sense, like in "linear types"). A bunch of languages have this feature including Erlang, Miranda, Prolog.

Ocaml and Haskell do not have them, and afaict the reasons boil down to:

  • Not all values are comparable (see here) -> we can require Eq;
  • The == implementation may differ from structural equality (see here, here) -> we can require recursive StructuralEq;
  • This will be hard to specify (e.g. here) -> we can just compile (a, a) to (a, a2) if a == a2;
  • This may be a typo -> fair, exhaustiveness should catch common cases;
  • This is a complicated feature -> haha yep.

What I've not seen is discussion of how to check exhaustiveness for these patterns; I can't find anything in the literature. This seems to be the real complexity with this feature. I've thought about how to extend rust's exhaustiveness checker and I think I can reduce it to something like a multigraph coloring problem; not trivial but not impossible either.

How do we feel about that? Have you missed this feature before? Would it be worth the extra complexity?

8 Likes

This is the crux of the question, isn't it? Guards would start participating in exhaustiveness checking.

If this starts working:

    match (bool1, bool2) {
        (a, a2) if a == a2 => {}
        (true, false) => {}
        (false, true) => {}
    }

Shouldn't this?

    match (bool1, bool2) {
        (a, a2) if a == a2 => {}
        (a, a2) if a != a2 => {}
    }

And then people will expect this to work even more than they already do today:[1]

    match int {
        a if a > other_int => {}
        a if a < other_int => {}
        a if a == other_int => {}
    }

  1. ignore the float aspect in the issue as that's just the first example I found; I've seen it on URLO before too ↩︎

Ah, I shouldn't have said "sugar"; we can't have guards participating in exhaustiveness with any remote sense of consistency.

What I meant is: "(a, a) is a new pattern syntax that has the same runtime behavior as (a, b) if a == b and (as all patterns do) interacts with exhaustiveness".

6 Likes

It could be restricted to only types and patterns where it isn't observable which of the many bindings is used. Compared to current const patterns, that would at least rule out floats and binding by reference. We could recover the lost expressiveness by still allowing something like

(a @ x, x) => { use(a); }

where the binding x: &T exists only for pattern matching but isn't usable later.

I mean, yes, but it also seems perfectly fine to allow them for references if we consistently pick the syntactically first binding. People who care about the difference can write an @ pattern like you suggest.

I hadn't thought of using @ bindings like that, this would be useful for non-Copy types: (a @ ref x, b @ ref x) would compare for equality and give me ownership of both values.

1 Like

This feature is something I miss from Erlang. And I regularly find myself writing this sort of code and then getting annoyed that it doesn't work.

The workaround with guards is unsatisfactory since you regularly end up with a trailing _ => unreachable!(), and you now have to check by hand that you actually did cover all the cases. It is always better when the compiler can check your work.

Also, the guard based workaround is more verbose and less readable. It is just a worse alternative.

So I would love to see this feature.

1 Like

FWIW Swift has had Eq patterns from the start (actually it’s even more general than that, it’s a dedicated operation that Equatable is just one provider for). They do not participate in exhaustivity checking. Occasionally people complain about specific examples where they want the compiler to be smarter but mostly it’s fine, you just write (the equivalent of) unreachable!() if you really don’t have a catch-all case.

Rust, unfortunately, is a bit past that because it already allows consts to be used in an exhaustive way. But I don’t think it’s the worst idea; as noted it’s not really different from how arms with guards don’t figure into exhaustivity.

Maybe it would be fine, but I do see some potential for confusion there. Someone might expect that it's the last binding that takes effect since that's how shadowing would work.

Also, if you are trying to mutate the value behind that reference, it would be preferable to get an error that requires you to be explicit about the target:

let mut x = (0,0);
if let (a,a) = &mut x {
    *a += 1; // not obvious if this increments x.0 or x.1 (or both?!)
}
10 Likes

As the person who made the original suggestion, let me copy over my motivating example:

#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub enum LogLevel {
    Quiet = 0,
    Progress = 1,
    Debug = 2,
}

// E0004: non-exhaustive patterns: (Quiet, Quiet) not covered
pub fn describe_log_level_change(old_lvl: LogLevel, lvl: LogLevel) -> &'static str {
    match (old_lvl, lvl) {
        (a, b) if a == b        => "No change",
        (_, LogLevel::Debug)    => "Debug logging activated",
        (LogLevel::Debug, _)    => "Debug logging deactivated",
        (_, LogLevel::Progress) => "Progress reporting activated",
        (LogLevel::Progress, _) => "Progress reporting deactivated",
    }
}

My initial reaction to getting this error was "ugh why don't guards participate in exhaustiveness checking", but that's ruled out by Rice's Theorem unless we put severe restrictions on what can appear in a guard expression, and if I add a trailing (Quiet, Quiet) => unreachable!() arm LLVM emits code for it, so it looks like we don't have the infrastructure to do that even if we wanted to.

And then without really thinking about it I tried writing (a, a) => "No change" instead and got a multiple bindings error, which I present as testimonial that the suggested notation is intuitive and won't break existing code.

4 Likes

Since this is about exhaustiveness, maybe (a, a) should desugar to (true, true) | (false, false) instead of (a, b) if a == b? (with capturing of the value, of course). This does mean that it can only work for booleans and for enums that only have unit variants - but for other cases you can just use a guard since it does not help with the exhaustiveness.

2 Likes

What about if it only helps with exhaustiveness when it comes to those simple finite types? But can still be used for u8 etc.

1 Like

At least in theory, you can generalise this to all types where the compiler knows enough to exhaustively generate all possible values of a - u8, i64 etc can all be done by the equivalent of generating all possible values. You'd want to short-circuit that, because actually generating 264 match arms is going to be problematic for other reasons, but it works by the same argument.

Could exhaustiveness checking use the 2^n patterns (I assume exhaustiveness pattern has a compact way to represent it, otherwise, it would indeed be much to slow for n > 32), while code generation use (a, b) if a == b ?

Code generation can be smart about this, and doesn't have to actually lower this to existing code. Think about enums that don't have Eq (or even PartialEq) - they should still support this even if they can't a == b.

2 Likes

It's too clever IMHO.

Patterns already allow the same binding to appear multiple times in alternatives This(a) | That(a), so the proposed syntax gives a completely new meaning to only a subtly different variation of the syntax.

3 Likes

If I modify my "motivating example" code to list all the (x, x) pairs instead of using (a, b) if a == b, LLVM generates notably tighter code. Not only is the unreachable arm gone, but also it is able to merge the two final basic blocks. And I needed an #[inline(never)] annotation to get assembly code to be emitted at all for that version of the function, which suggests that the inliner's heuristics are assessing it as much lighter weight than the alternative.

@Nadrieril Since @kornel is concerned that (a, a) => ... and This(a) | That(a) => ... are confusingly similar despite very different behavior: do Erlang, Miranda, or Prolog do anything to make these cases more obviously different? I am not particularly familiar with any of them.

(They don't seem confusingly similar to me, but I have a lot of practice paying attention to small syntactic differences and a high tolerance for terse notation.)

3 Likes

It has been quite a while since I actually coded in Erlang. Well over 15 years by now. So take this with a grain of salt:

Erlang has assign-once read-only variables (I don't even think it has shadowing, don't quite remember). And pattern matching is everywhere, not just in match statement, but all assignments are also pattern matching. As such matching on for example (AVariable, AVariable) is idiomatic and will bind the first use of it, and then try to use that binding to check the second use of it.

I don't remember that Erlang has anything like This(a) | That(a). Looking through the docs I don't see anything like that either. And some of the examples (like days for a given calendar month) are written expanded. I would assume they would use such a more compact syntax if they had it.

Prolog is very similar here to Erlang, and since I have used it much more recently I'm almost certain it doesn't have This(a) | That(a) style syntax.

In both Prolog and Erlang (they are syntactically very similar, though semantics are very different) | is normally used to separate the head from the tail of a linked list (Lisp style cons list if that helps): [H|T] = MyList is a way to unpack a list of at least one element. So that operator definitely isn't it.

I'm not at all familiar with Miranda. Haskell might be worth taking a look at too, I never managed to get very far with learning it, but it does have simple pattern matching.

I do believe the match statement in Python supports | for or though. Might be worth checking?

Python does indeed support "this or that" matching with | as the operator... and it doesn't support bind-and-compare patterns:

In a given pattern, a given name can only be bound once. E.g. case x, x: ... is invalid while case [x] | x: ... is allowed.

Python 3.13 language reference, section 8.6.4.4 "Capture Patterns"

@Nadrieril said up top that Haskell doesn't have bind-and-compare either. I'll let someone else speak for Miranda.

By the way, it it necessary to add the new syntax (a, a) => … rather that just teaching the compiler that (a, b) if a == b => … can be optimized much better under some circumstances (which will also possibly speed up some existing program from free and not the new code using the new syntax)?

1 Like

I don't think this is a completely new meaning: I think it is a natural generalization of the same meaning, which is "wherever the same variable name appears, that refers to the same variable" (which is basically just the definition of "variables", in general).

The "difference" is that in This(a) | That(a), the actual object will only be one particular variant, while in (a, a), there's just a single case with 2 sub-values – but that's the natural difference between a sum type and a product type, and is a separate matter from whether patterns may refer to the same variable multiple times.

3 Likes