How to handle pattern matching on constants

#34

Rust’s type system is based on algebraic data types, which results in being able to take ‘unit’, ‘product’ and ‘sum’ and construct any other time. This implies that fundamentally, a structural equivalence should be done at this level, in theory. If structurally equivalent types aren’t actually equivalent, that has to be due to a ‘user’ defined operation which understands more information and makes the distinction. The same is true for the opposite, structurally nonequivalent types that are actually equivalent.

The following program, while somewhat silly, does demonstrate that Rust is inconsistent in how it applies pattern matching, when considering primitive types vs user defined types. Fundamentally, a f64 is a u64 with different operations. Rusts documentation suggests that the reason for this is because f64 and u64 have different impls of PartialEq (https://doc.rust-lang.org/stable/std/primitive.f64.html).

#[derive(Debug)]
struct Float {
    value: u64,
}

impl std::cmp::PartialEq for Float {
    fn eq(&self, other: &Float) -> bool {
        unsafe {
            let a: f64 = std::mem::transmute(self.value);
            let b: f64 = std::mem::transmute(other.value);
            a == b
        }
    }
}

fn main() {
    {
        const P0: f64 = 0.0;
        const N0: f64 = -0.0;
        // Positive 0 and negative 0 are equal.
        assert_eq!(P0, N0);
        let p0: u64 = unsafe { std::mem::transmute(P0) };
        let n0: u64 = unsafe { std::mem::transmute(N0) };
        // Positive 0 and negative 0 have different structure.
        assert!(p0 != n0);

        // Constants + Pattern matching should let me compare something structurally.
        // Therefore, these should be different.
        match N0 {
            P0 => println!("equal"),
            _ => println!("not equal"),
        }
    }
    {
        const P0: Float = Float { value: 0 };
        const N0: Float = Float { value: 9223372036854775808 };
        // Positive 0 and negative 0 are equal.
        assert_eq!(P0, N0);
        let p0: u64 = unsafe { std::mem::transmute(P0) };
        let n0: u64 = unsafe { std::mem::transmute(N0) };
        // Positive 0 and negative 0 have different structure.
        assert!(p0 != n0);

        // Constants + Pattern matching should let me compare something structurally.
        // Therefore, these should be different.
        match N0 {
            P0 => println!("equal"),
            _ => println!("not equal"),
        }
    }
}

Output:

equal
not equal

So, my fundamental question is, why can’t I implement a float that behaves like a float? And if pattern matching should be structural, why does it currently use PartialEq for floats, rather than bit-wise equality?

#35

And I think matching something structurally is fundamentally different than doing a value match.

Here is an idea, inspired by the above statement.

Introduce a new kind of expression, using the keyword switch. Switch expression does value matching and runs ==. Match expression does structural matching and keeps the current semantics.

#36

I think this is a good example, but I don’t take quite the same lessons from it. That is, if I write “foo1 == foo2”, I feel like I am explicitly requesting equality against the foo values, whereas when I write foo1.a == foo2.a && foo1.b == foo2.b, I am opting out of that. In the same way, as @ahmedcharles wrote, I feel like matching against Foo { a: A, b: B } feels quite different from matching against F where const F = Foo(A, B). (Now, obviously, any PartialEq impl which differs from straight-up structural equality has the potential to be surprising, but that seems like an orthogonal issue. Sometimes it’s the right thing to do nonetheless.)

Private fields would not bother me except that I feel like the current treatment is bypassing the abstraction that the user has constructed (PartialEq). (Whereas calling new is not doing so.) But this seems to come back to how you view uses of FOO as a pattern – to me, they are asking "is this value equal to the constant FOO"?

I agree with this.

But not with this. That is, I don’t think there is much about const that makes it particularly transparent (apart from the pattern matching semantics we are discussing here). But perhaps that is all you mean, in which case I do agree that this is what the code is currently doing (though not necessarily with the proposed addition of abstract const).

In any case, I think the next step is clear: someone (presumably me :slight_smile: should do a crater run using the intersection semantics, just to get a feeling for what’s happening “in the wild”.

#37

I feel like the intersection solution - banning matching on constants of non-primitive types - would be the right solution.

OTOH, public fields in const are completely transparent, as you can easily access them in array lengths and enum discriminants.

#38

I would also prefer to use bitwise equality for float matching, but the ship might have sailed for that one.

#39

Right. While to me, it’s asking "does this value match the pattern FOO"? Again, the Glasgow Haskell folks have implemented this very same capability (much-extended), and they call it a feature: pattern synonyms. I’ve always assumed our consts were in the same spirit.

And I mean… why would we expect something used as a pattern to be interpreted as anything other than a pattern? I can see how someone who has not had much exposure to pattern matching might fall back on intuitions about == for a time, given that the two often coincide for simple cases, but I don’t see why importing some of that lack of clarity into the language itself would improve anything.

I could see the appeal of a cautious, conservative, least common denominator approach of limiting it to primitive types where match and == are the same, so nobody’s intuitions can be violated (not even the wrong ones)… in a pre-1.0 world, maybe. Making a breaking change for it after promising not to break things any more does add some emphasis to the “over” in “overabundance of caution”, though, in my humble opinion.

It’s not, although the most obvious impact of opacity vs. transparency hadn’t yet become apparent to me when I last wrote. Which is: (outside of the given scope), does the typechecker know that this (type/constant) is equal to what is written on its RHS? Transparent: yes, opaque: no. If a module m exports a transparent type T = bool;, then clients may say let foo: m::T = true;. If it’s an opaque abstract type T = bool;, however, they cannot. Likewise, if the module exports a transparent const C: usize = 9;, then clients may do let arr: [i32; m::C] = [8; 9];. Whereas if it were an opaque abstract const, then they couldn’t. So it’s easy to see that our existing consts are quite definitely transparent. This is also what I was talking about with respect to associated consts (and likewise types) in generic code (and also just plain type parameters of generic functions): if the RHS is not available, then it can only be treated as opaque, exactly the same as an abstract type (resp. const).

The behavior of constants in pattern matches is just an extension of this: if the typechecker knows its RHS, then it can use it to check for exhaustiveness. Otherwise (if it’s opaque), then it can’t assume anything.

(I’m not necessarily advocating for adding abstract const to the language, but it doesn’t have to exist in rustc in order for it to exist in our heads, and be useful there.)

#40

As a small side note, I have run into cases where I’ve matched all possible u8 values, and still required an extra _ arm.

#41

@glaebhoerl Does Haskell allow pattern synonyms to be used as variables?

Anyways, given the following pattern, Some(<>) and assuming Some is the enum variant of Option, whatever replaces <> in the code can be interpreted in two ways:

  1. It can be a previously defined value, in which case PartialEq is used to determine it’s value, at least if it’s a primitive type (my previous code demonstrates this).
  2. It can introduce a new binding which will refer to the value inside the Some, if it matches.

If const introduces a value, then it should consistently behave that way, meaning, the behavior of primitive types today. If const actually did introduce a pattern synonym, then it would have to allow for the second case above as well, meaning I could do something like this:

const C: Option<(usize, usize)> = Some(1, x);
let a = match Some((1, 2)) {
    C => x,
    _ => 0,
}

But, this doesn’t work and if it did, C couldn’t be used as a variable. And it would have the subjectively unfortunate side effect that x would be introduced without any way to know at the point of the pattern match itself.

Overall, the goal of allowing extensible values within patterns seems valuable, since it allows user defined types to achieve the same syntax and semantics that primitive types enjoy today.

Unfortunately, I don’t see any benefits of allowing const which declares values (as opposed to pattern synonyms) to act like a structural match rather than a value match, when the behavior would be restricted when compared to what I’d want from pattern synonyms while also causing an inconsistency when compared to primitive types.

That said, pattern synonyms do sound interesting and perhaps they deserve their own place in the language?

#42

See here: https://github.com/rust-lang/rust/issues/12483

1 Like
#43

I feel like this could be resolved by allowing pattern macros, which would let us do something like

define_match_expander! C {
     ($(x:pat) ) => Some(1, $(x)) 
}
 
let a = match Some((1, 2)) {
    C!(x) => x,
    _ => 0, 
}

I’m drawing from Racket’s define-match-expander here, since it’s quite nice. The article which I found the best use of match-expanders was Matt Might’s on red-black trees, where he has some patterns which match on black nodes or red nodes only. Ex.

(define-match-expander B
  (syntax-rules ()
    [(_)              (or (T _ 'B _ _ _ _)
                          (L _))]
    [(_ cmp)          (or (T cmp 'B _ _ _ _)
                          (L cmp))]
    [(_ l r)          (T _ 'B l _ _ r)]
    [(_ l k v r)      (T _ 'B l k v r)]
    [(_ cmp l k v r)  (T cmp 'B l k v r)]))

So just asking for a node to match B!() would say it’s either a interior node painted black, or a leaf node. It makes the actual presentation quite nice.

So, I am in favour of letting Rust’s match grow past the simple case statement in ML and Haskell, and into something more powerful and expressive like Racket’s match.

#44

OK, as promised, I whipped up some crater numbers. The result: 14 crates (0.6%) use constants that fall outside the “intersection” semantics (iow, constants whose types are not ints, floats, or tuples thereof). I plan to analyze the uses of those 14 a bit more but have not had time.

Here is the report and the branch that I tested. The interesting thing in this report is the total regressions – this measurement was done in such a way that constants which appeared in dependencies generated warnings, but constants appearing in the “main crate” being tested reported errors, so we can accurately assess the total number of crates that will need to be changed.

#45

I find it interesting that none of the uses if pattern matching in the compiler required structural matching on the constant, since you transformed them all to use == which runs PartialEq. Granted, this is only anecdotal.

#46

It looks like all the constants from crater and from rustc are newtypes of primitive types a la struct MyId(u32).
E.g. with the intersection semantics if someone wants to achieve some additional type safety by using a "domain specific u32" through a newtype, they will lose the ability to match on it.

#47

Yes. To be clear, I am not a fan of the intersection semantics as our “long term plan”. But I would ideally like to make a “positive” choice about how to go beyond the intersection semantics. That said, the fact that floats are currently defined to use PartialEq (e.g., matching on 0 also matches on -0) is pretty interesting. I probably should have excluded it from the “intersection”.

#48

I think const makes a pretty poor pattern synonym. For example, you can’t handle binding. Also, the intent is quite unclear. Constants are primarily used as values, but they also double as a pattern synonym – which is like a value, but different, for the reasons I outlined in the OP (it behaves differently in various cases, exposes details that would not otherwise be exposed, etc).

Looking at the wiki page on Haskell’s pattern synonyms, they seem very different from consts. Also (unsurprisingly) they seem to address the shortcomings I listed above. For example, the intention is quite clear: you declare them with a pattern keyword:

   pattern Arrow t1 t2 = App "->" [t1, t2]

Similarly, as you can include bindings (the t1 and t2 above). Also, when you choose to make use of a pattern synonym, it seems like you must (or can) be very explicit about importing it:

   module Foo (pattern Arrow) where ...

Now, there is SOME intersection. For example, you can define pattern synonyms that also function as constructor functions or as expressions (and, presumably, thus something like a constant today, if the constructor takes no arguments). But in all these cases, you prefix the definition with pattern, making it very clear that you are defining something that plays multiple roles.

So I guess my feeling is: pattern synonyms seem like they could be useful! I’d consider implementing them someday (perhaps as an alternative to overridable patterns). But if we want pattern synonyms, let’s do it right, and not abuse poor constants.

I don’t know what this really means. I think of patterns as a convenient notation for specifying “test and extract value” operations. So, Some(<pattern>) compiles to “test that the variant is Some and extract the value”. We then recursively apply <pattern> to that value. Sometimes there are no values to extract: if you match a nullary enum variant like None, it just means "test that the variant is None". So, along these lines, I simply see the meaning of a const pattern like FOO to be “test that the value is PartialEq to FOO and extract no values”. That is to say, this is not “treating constants as something other than a pattern”, it’s “defining what it means for a const to appear in a pattern”.

It is worth reiterating that floats behave exactly this way today. That is, a value of -0 will match a pattern of 0. Similarly, we give errors if you match on NaN – any of the various possible NaN interpretations – presumably because the arm would never match. But under the current interpretation of constants in patterns, there is no way to make a user-defined type that acts this way! I could write a struct Float { mantissa, exponent } that was identical to f64 in every way from the point of view of expressions, but matching on constants like const ZERO: Float = ... and const NEG_ZERO: Float = ... would behave differently, as well as matching on const NAN: Float.

So, to clarify: do you think that matching on NaN should succeed if it happens to be the same bitpattern for NaN, but fail if it is not? Now, it’s an error today, so you can dodge the question, but then you have to wonder WHY it is an error – that only makes sense if you assume a PartialEq semantics. And, of course, why aren’t user-defined abstractions afforded the same courtesy that float is.

2 Likes
#49

I don’t like how “float matching” works - in fact, we even have a lint that warns on its unintuitiveness. Patterns are supposed to be doing structural matching (OTOH I don’t really care about constant patterns being allowed).

#50

In general, equality isn’t very useful on floats, since you don’t get precise results — so I’m inclined to agree that having them in a match is not necessarily a good idea, and I think this is why one might want a lint. But I think that’s a separate sort of concern, and applies equally to == expressions.

I feel that all variants (no pun intended) are doing structural matching – the question is how deeply the structure extends. Under the PartialEq interpretation, the structure being matched extends precisely as deep as the structure that appears in the match itself. Under the current interpretation, the structure extends all the way down until a binding (or a float, I guess) is encountered.

#51

In general, equality isn’t very useful on floats, since you don’t get precise results — so I’m inclined to agree that having them in a match is not necessarily a good idea, and I think this is why one might want a lint. But I think that’s a separate sort of concern, and applies equally to == expressions.

Of course, IEEE754 equality is rarely a good idea, because of its fragility, but bitwise equality does make sense in some situations (e.g. state machine replication) - not that I can imagine a use for matching against a constant float, but I won’t be particularly surprised to be wrong.

I feel that all variants (no pun intended) are doing structural matching – the question is how deeply the structure extends. Under the PartialEq interpretation, the structure being matched extends precisely as deep as the structure that appears in the match itself. Under the current interpretation, the structure extends all the way down until a binding (or a float, I guess) is encountered.

My feeling is that PartialEq and pattern matching are supposed to be distinct things.

#52

I agree. I’d like to see this done right and I’m glad to see you’re also interested (long term).

This is exactly how I feel pattern matching ‘should’ work (though, I suspect that was clear already).

#53

Opened an RFC arguing for the intersection semantics: https://github.com/rust-lang/rfcs/pull/1445