How to handle pattern matching on constants

I am not familiar with pattern matching in Scala. Does Scala’s extensible pattern matching use extensible equality? I can’t really imagine how.

Aren’t we saying exactly the same thing? Of course expression Add(TWO, TWO) does not match pattern Int(4), but constant expression TWO_AND_TWO does match constant pattern FOUR. Or am I misunderstanding?

The big problem with “extensible pattern matching” is that equality tests are not sufficient. For example, Option<T> implements PartialEq when T implements it. This means that if I pattern match on a constant Option, consistency demands that it use that?

enum Weird<T> {
    Yes(T),
    No
}
impl<T> PartialEq for Weird<T> {
    fn partial_eq(self, other: Self) -> bool {
        panic!();
    }
}
const OPT_C = Weird::Yes(3);

match Weird::Yes(3) {
    OPT_C => calls PartialEq under this proposal,
    Weird::Yes(_) => cannot call PartialEq,
    Weird::Yes(3) => should this call PartialEq?,
    Weird::No => should this call PartialEq?,
}

Scala allows you to define a function which will be used to match a value, so you can for example have a function that takes a string and parses it into an int, and the function can return Some(value) on success or None on failure, which doesn’t match. And then you can do a match which extracts the value of the int. For the record, I find this functionality to be more than a little weird in the way it works, because it doesn’t quite fit in with the rest of the language in various cases.

And I think matching something structurally is fundamentally different than doing a value match. If I do:

match x { 1 => 10, _ => 0 }

Then what I’m doing is asking for equality checks. The fact that the compiler may or may not be able to optimize this is largely an implementation detail, not a semantic one.

Whereas if I do:

match x { Some(1) => 10, Some(_) => 0, None => 0 }

I want something different. I want ‘structural’ matching on Option, but I want value matching on the thing inside Some, if it structurally matches Some, and therefore, it would use equality.

So, if you have a const value being matched, it uses value matching, since the compiler isn’t expected to introspect the structure, since there is no visible structure. Syntactically this makes sense, since the programmer didn’t specify structure, they specified a value. If you want to match the structure of something, you should type it out and make it obvious that’s what you want, right?

3 Likes

I think it would be like this:

match Weird::Yes(3) {
    OPT_C => // panics
    Weird::Yes(_) => // structurally matches against Weird, doesn't call PartialEq
    Weird::Yes(3) => // this can't run, given the above, but if it did, it would structurally match Weird, but call PartialEq for 3
    Weird::No => // structurally matches, without calling PartialEq
}

Note, this is consistent with the current behavior for builtin types, ignoring consts.

1 Like

As my other reply suggests, I agree with this completely.

Constants usually behave as if they were simply inlined at the use site (the reference even describes their functionality that way). It would seem odd if manually inlining the constant changed the behavior of a match (except perhaps by one version being forbidden), and strikes me as something people would screw up when refactoring.

That’s trivially not true, because it’s possible to have a constant with a value specified for a private field used outside the module which has access to the private field. Hence, if the code were inserted verbatim, it would fail to compile due to privacy errors.

(I did say one version might be forbidden.) But yes, I should have been more precise.

If both forms are valid Rust and all the tokens in the definition refer to the same things in both places, then I think they should behave the same way. Put another way: if the constant and its definition are equivalent when used as expressions, they should also be equivalent when used as patterns (if both are accepted).

(These are the same sorts of caveats that occur with macros, which really do expand to their definitions, but not quite in a literal text-copy-paste way.)

This is primarily because I expect using a constant as an expression to be more common, and I think it would be bad if you could think of a constant as a literal value almost everywhere, but in patterns doing so would give you incorrect behavior, often without even a warning from the compiler.

1 Like

Edit: after rereading the OP, I support tweaks to the current behavior. Fix the holes. Don’t attempt to base match on PartialEq, because that’s not powerful enough to do everything, and doing it sometimes is inconsistent.


I’ve been trying to think of a behavior that avoids leaking private members, gives consistent behavior with Weird (which pretty much has to be structural matching for everything), and isn’t stupidly complicated:

PatternMatch(P: Pattern, X: Value) -> bool:

  1. If P ends in @var, check if the rest of the P matches with X. If so, the pattern matches and X is bound to var.
  2. If P is _, P always matches X.
  3. If P is a single variable, P always matches X. Also bind X to the variable P (the complexities of borrow checking this are identical to the current implementation).
  4. If P is a different type than X, throw a type error. Type inference is performed, but structural pattern matching between different types makes no sense.
  5. If their type is a primitive, pattern matching is done by direct comparison.
  6. If their type has any private member variables in its definition, or is a generic (and thus we don’t know one way or the other), throw an access error.
  7. If their type is an enum, they do not match if their discriminants are different. If their discriminants are the same, they match if their variant is C-like, or if PatternMatch(P’, X’) returns true, where P’ and X’ are the tuple or struct after the discriminant.
  8. If their type is a tuple or struct, they match if, for every member M, PatternMatch(P.M, X.M) returns true.

I guess I can summarize this as: structural equality is identical to equality for primitives, and is identical to destructuring and comparing all the member variables for structural equality for compound types, including the restrictions on destructuring (privacy, parametricity, etc).

Thanks @petrochenkov. It’s interesting that all these uses are fairly simple data, although I can’t imagine how to categorise these.

(To be honest, I’m surprised that you’re surprised! My understanding of / intuition regarding how const should behave has been in alignment with what the current implementation does for some time. I think it was in the “oh noes, patterns are interpreted as matching against existing consts or enum variants if they’re in scope, and as introducing a new binding otherwise” and later the "split static into const and static" debates where these crystallized for me.)

The only one of the described issues that really bothers me is the interaction with associated constants. I think forbidding the use of associated constants in patterns would be a prudent way to address it. Or more generally and precisely: only allow use of constants in patterns when the value of the constant is known at that location.

(I don’t really like const fns either, so I’m not sure which part of that interaction I would identify as being the problematic one, if any.)

I think PartialEq (or Eq) is a red herring here. Consider a struct (let’s call it Foo) whose fields are all public (let’s call them a and b), and which has a manual PartialEq impl which is not the same as the one which would be derived (i.e., the structural one). As a client, I can write either foo1 == foo2 or foo1.a == foo2.a && foo1.b == foo2.b, and they will have different behavior, which may surprise me. Notably, this is the same issue as described for consts, without involving consts. I think it is fundamentally an API (rather than necessarily language) design flaw to expose multiple, interchangeable, but inconsistent ways of performing equality tests on a type. I think the appropriate remedy here is to lint against (a) defining a non-structural (or any manual) PartialEq impl on a type with all-public (all-PartialEq, or all-matchable) fields, and against (b) exposing both a non-structural/manual PartialEq impl as well as consts for a type with private fields.

Matching against consts of a type with private fields also doesn’t bother me (though if there’s also a PartialEq impl, then that should be a lint on the provider’s side, as described above), any more than having a smart constructor fn for a type with private fields bothers me. I think patterns should be thought of as just the mirror image of (“dual to”) literals: both are language primitives, and more fundamental than things like PartialEq which can be built on top in user-written code. Construction and deconstruction.

I think transparent vs. opaque constants is an interesting question though, even if it’s clear that the established semantics of the consts currently in the language is transparency. An intuitive way to think about it is, “should the value of the constant appear in the generated documentation?”. There’s an important parallel with type aliases, both of which are mechanisms for abstracting over (or, at least, making shorthands for) “compile-time things”. I believe the distinction between transparent and opaque constants corresponds directly to the distinction between type and abstract type. In other words, both our current type and const are transparent, and just as with abstract type, we could also introduce abstract const, which would be an opaque constant. For MLers (which I am only barely), this corresponds to the distinction between transparent and opaque types in signatures (sig type Foo end vs. sig type Foo = Int end), except that in Rust we don’t write module signatures (“types for modules”), instead writing pub (and potentially abstract) modifiers inline in module definitions.

Finally, I think our consts also correspond to a restricted subset (no bindings, only bidirectional) of the pattern synonyms which have been semi-recently added to GHC.

2 Likes

To connect some dots in and refine the above:

  • W.r.t. associated constants, I think “a constant whose value is not known at a particular location” (following the duality between existential and universal quantification - between abstract and generic types) is just the same thing as “a const which is opaque as seen from that location”. Which should therefore follow whatever policy we choose to have for opaque constants in general (either "can’t match", or “can match, but only non-exhaustively”, which seems better). So like associated types, they would be transparent when used concretely/monomorphically (the impl to use is uniquely determined), and opaque when used generically (abstracted over impls). (And in both cases, we could allow writing abstract type or abstract const in the impl to make it opaque even if used concretely/monomorphically.)

  • W.r.t. a lint for “exposing both a non-structural/manual PartialEq impl as well as consts for a type with private fields”, of course this would only be for transparent consts (which are the only ones we have, right now). (Edit: actually, this only holds if our matching-on-opaque-consts policy is “can’t”.)

Rust’s type system is based on algebraic data types, which results in being able to take ‘unit’, ‘product’ and ‘sum’ and construct any other time. This implies that fundamentally, a structural equivalence should be done at this level, in theory. If structurally equivalent types aren’t actually equivalent, that has to be due to a ‘user’ defined operation which understands more information and makes the distinction. The same is true for the opposite, structurally nonequivalent types that are actually equivalent.

The following program, while somewhat silly, does demonstrate that Rust is inconsistent in how it applies pattern matching, when considering primitive types vs user defined types. Fundamentally, a f64 is a u64 with different operations. Rusts documentation suggests that the reason for this is because f64 and u64 have different impls of PartialEq (https://doc.rust-lang.org/stable/std/primitive.f64.html).

#[derive(Debug)]
struct Float {
    value: u64,
}

impl std::cmp::PartialEq for Float {
    fn eq(&self, other: &Float) -> bool {
        unsafe {
            let a: f64 = std::mem::transmute(self.value);
            let b: f64 = std::mem::transmute(other.value);
            a == b
        }
    }
}

fn main() {
    {
        const P0: f64 = 0.0;
        const N0: f64 = -0.0;
        // Positive 0 and negative 0 are equal.
        assert_eq!(P0, N0);
        let p0: u64 = unsafe { std::mem::transmute(P0) };
        let n0: u64 = unsafe { std::mem::transmute(N0) };
        // Positive 0 and negative 0 have different structure.
        assert!(p0 != n0);

        // Constants + Pattern matching should let me compare something structurally.
        // Therefore, these should be different.
        match N0 {
            P0 => println!("equal"),
            _ => println!("not equal"),
        }
    }
    {
        const P0: Float = Float { value: 0 };
        const N0: Float = Float { value: 9223372036854775808 };
        // Positive 0 and negative 0 are equal.
        assert_eq!(P0, N0);
        let p0: u64 = unsafe { std::mem::transmute(P0) };
        let n0: u64 = unsafe { std::mem::transmute(N0) };
        // Positive 0 and negative 0 have different structure.
        assert!(p0 != n0);

        // Constants + Pattern matching should let me compare something structurally.
        // Therefore, these should be different.
        match N0 {
            P0 => println!("equal"),
            _ => println!("not equal"),
        }
    }
}

Output:

equal
not equal

So, my fundamental question is, why can’t I implement a float that behaves like a float? And if pattern matching should be structural, why does it currently use PartialEq for floats, rather than bit-wise equality?

And I think matching something structurally is fundamentally different than doing a value match.

Here is an idea, inspired by the above statement.

Introduce a new kind of expression, using the keyword switch. Switch expression does value matching and runs ==. Match expression does structural matching and keeps the current semantics.

I think this is a good example, but I don't take quite the same lessons from it. That is, if I write "foo1 == foo2", I feel like I am explicitly requesting equality against the foo values, whereas when I write foo1.a == foo2.a && foo1.b == foo2.b, I am opting out of that. In the same way, as @ahmedcharles wrote, I feel like matching against Foo { a: A, b: B } feels quite different from matching against F where const F = Foo(A, B). (Now, obviously, any PartialEq impl which differs from straight-up structural equality has the potential to be surprising, but that seems like an orthogonal issue. Sometimes it's the right thing to do nonetheless.)

Private fields would not bother me except that I feel like the current treatment is bypassing the abstraction that the user has constructed (PartialEq). (Whereas calling new is not doing so.) But this seems to come back to how you view uses of FOO as a pattern -- to me, they are asking "is this value equal to the constant FOO"?

I agree with this.

But not with this. That is, I don't think there is much about const that makes it particularly transparent (apart from the pattern matching semantics we are discussing here). But perhaps that is all you mean, in which case I do agree that this is what the code is currently doing (though not necessarily with the proposed addition of abstract const).

In any case, I think the next step is clear: someone (presumably me :slight_smile: should do a crater run using the intersection semantics, just to get a feeling for what's happening "in the wild".

I feel like the intersection solution - banning matching on constants of non-primitive types - would be the right solution.

OTOH, public fields in const are completely transparent, as you can easily access them in array lengths and enum discriminants.

I would also prefer to use bitwise equality for float matching, but the ship might have sailed for that one.

Right. While to me, it's asking "does this value match the pattern FOO"? Again, the Glasgow Haskell folks have implemented this very same capability (much-extended), and they call it a feature: pattern synonyms. I've always assumed our consts were in the same spirit.

And I mean... why would we expect something used as a pattern to be interpreted as anything other than a pattern? I can see how someone who has not had much exposure to pattern matching might fall back on intuitions about == for a time, given that the two often coincide for simple cases, but I don't see why importing some of that lack of clarity into the language itself would improve anything.

I could see the appeal of a cautious, conservative, least common denominator approach of limiting it to primitive types where match and == are the same, so nobody's intuitions can be violated (not even the wrong ones)... in a pre-1.0 world, maybe. Making a breaking change for it after promising not to break things any more does add some emphasis to the "over" in "overabundance of caution", though, in my humble opinion.

It's not, although the most obvious impact of opacity vs. transparency hadn't yet become apparent to me when I last wrote. Which is: (outside of the given scope), does the typechecker know that this (type/constant) is equal to what is written on its RHS? Transparent: yes, opaque: no. If a module m exports a transparent type T = bool;, then clients may say let foo: m::T = true;. If it's an opaque abstract type T = bool;, however, they cannot. Likewise, if the module exports a transparent const C: usize = 9;, then clients may do let arr: [i32; m::C] = [8; 9];. Whereas if it were an opaque abstract const, then they couldn't. So it's easy to see that our existing consts are quite definitely transparent. This is also what I was talking about with respect to associated consts (and likewise types) in generic code (and also just plain type parameters of generic functions): if the RHS is not available, then it can only be treated as opaque, exactly the same as an abstract type (resp. const).

The behavior of constants in pattern matches is just an extension of this: if the typechecker knows its RHS, then it can use it to check for exhaustiveness. Otherwise (if it's opaque), then it can't assume anything.

(I'm not necessarily advocating for adding abstract const to the language, but it doesn't have to exist in rustc in order for it to exist in our heads, and be useful there.)

1 Like

As a small side note, I have run into cases where I’ve matched all possible u8 values, and still required an extra _ arm.