How to handle pattern matching on constants

#27

Constants usually behave as if they were simply inlined at the use site (the reference even describes their functionality that way). It would seem odd if manually inlining the constant changed the behavior of a match (except perhaps by one version being forbidden), and strikes me as something people would screw up when refactoring.

#28

That’s trivially not true, because a module can define a constant whose value specifies a private field, and that constant can then be used outside the module, where the private field is not accessible. Hence, if the definition were inserted verbatim at the use site, it would fail to compile due to privacy errors.

#29

(I did say one version might be forbidden.) But yes, I should have been more precise.

If both forms are valid Rust and all the tokens in the definition refer to the same things in both places, then I think they should behave the same way. Put another way: if the constant and its definition are equivalent when used as expressions, they should also be equivalent when used as patterns (if both are accepted).

(These are the same sorts of caveats that occur with macros, which really do expand to their definitions, but not quite in a literal text-copy-paste way.)

This is primarily because I expect using a constant as an expression to be more common, and I think it would be bad if you could think of a constant as a literal value almost everywhere, but in patterns doing so would give you incorrect behavior, often without even a warning from the compiler.

1 Like
#30

Edit: after rereading the OP, I support tweaks to the current behavior. Fix the holes. Don’t attempt to base match on PartialEq, because that’s not powerful enough to do everything, and doing it sometimes is inconsistent.


I’ve been trying to think of a behavior that avoids leaking private members, gives consistent behavior with Weird (which pretty much has to be structural matching for everything), and isn’t stupidly complicated:

PatternMatch(P: Pattern, X: Value) -> bool:

  1. If P is of the form var @ P′, check whether P′ matches X. If so, the pattern matches and X is bound to var.
  2. If P is _, P always matches X.
  3. If P is a single variable, P always matches X. Also bind X to the variable P (the complexities of borrow checking this are identical to the current implementation).
  4. If P is a different type than X, throw a type error. Type inference is performed, but structural pattern matching between different types makes no sense.
  5. If their type is a primitive, pattern matching is done by direct comparison.
  6. If their type has any private member variables in its definition, or is a generic (and thus we don’t know one way or the other), throw an access error.
  7. If their type is an enum, they do not match if their discriminants are different. If their discriminants are the same, they match if their variant is C-like, or if PatternMatch(P’, X’) returns true, where P’ and X’ are the tuple or struct after the discriminant.
  8. If their type is a tuple or struct, they match if, for every member M, PatternMatch(P.M, X.M) returns true.

I guess I can summarize this as: structural equality is identical to equality for primitives, and is identical to destructuring and comparing all the member variables for structural equality for compound types, including the restrictions on destructuring (privacy, parametricity, etc).

#31

Thanks @petrochenkov. It’s interesting that all these uses are fairly simple data, although I can’t imagine how to categorise these.

#32

(To be honest, I’m surprised that you’re surprised! My understanding of / intuition regarding how const should behave has been in alignment with what the current implementation does for some time. I think it was in the “oh noes, patterns are interpreted as matching against existing consts or enum variants if they’re in scope, and as introducing a new binding otherwise” and later the "split static into const and static" debates where these crystallized for me.)

The only one of the described issues that really bothers me is the interaction with associated constants. I think forbidding the use of associated constants in patterns would be a prudent way to address it. Or more generally and precisely: only allow use of constants in patterns when the value of the constant is known at that location.

(I don’t really like const fns either, so I’m not sure which part of that interaction I would identify as being the problematic one, if any.)

I think PartialEq (or Eq) is a red herring here. Consider a struct (let’s call it Foo) whose fields are all public (let’s call them a and b), and which has a manual PartialEq impl which is not the same as the one which would be derived (i.e., the structural one). As a client, I can write either foo1 == foo2 or foo1.a == foo2.a && foo1.b == foo2.b, and they will have different behavior, which may surprise me. Notably, this is the same issue as described for consts, without involving consts. I think it is fundamentally an API (rather than necessarily language) design flaw to expose multiple, interchangeable, but inconsistent ways of performing equality tests on a type. I think the appropriate remedy here is to lint against (a) defining a non-structural (or any manual) PartialEq impl on a type with all-public (all-PartialEq, or all-matchable) fields, and against (b) exposing both a non-structural/manual PartialEq impl as well as consts for a type with private fields.

Matching against consts of a type with private fields also doesn’t bother me (though if there’s also a PartialEq impl, then that should be a lint on the provider’s side, as described above), any more than having a smart constructor fn for a type with private fields bothers me. I think patterns should be thought of as just the mirror image of (“dual to”) literals: both are language primitives, and more fundamental than things like PartialEq which can be built on top in user-written code. Construction and deconstruction.

I think transparent vs. opaque constants is an interesting question though, even if it’s clear that the established semantics of the consts currently in the language is transparency. An intuitive way to think about it is, “should the value of the constant appear in the generated documentation?”. There’s an important parallel with type aliases, both of which are mechanisms for abstracting over (or, at least, making shorthands for) “compile-time things”. I believe the distinction between transparent and opaque constants corresponds directly to the distinction between type and abstract type. In other words, both our current type and const are transparent, and just as with abstract type, we could also introduce abstract const, which would be an opaque constant. For MLers (which I am only barely), this corresponds to the distinction between transparent and opaque types in signatures (sig type Foo = Int end vs. sig type Foo end), except that in Rust we don’t write module signatures (“types for modules”), instead writing pub (and potentially abstract) modifiers inline in module definitions.

Finally, I think our consts also correspond to a restricted subset (no bindings, only bidirectional) of the pattern synonyms which have been semi-recently added to GHC.

2 Likes
#33

To connect some dots in and refine the above:

  • W.r.t. associated constants, I think “a constant whose value is not known at a particular location” (following the duality between existential and universal quantification - between abstract and generic types) is just the same thing as “a const which is opaque as seen from that location”. Which should therefore follow whatever policy we choose to have for opaque constants in general (either "can’t match", or “can match, but only non-exhaustively”, which seems better). So like associated types, they would be transparent when used concretely/monomorphically (the impl to use is uniquely determined), and opaque when used generically (abstracted over impls). (And in both cases, we could allow writing abstract type or abstract const in the impl to make it opaque even if used concretely/monomorphically.)

  • W.r.t. a lint for “exposing both a non-structural/manual PartialEq impl as well as consts for a type with private fields”, of course this would only be for transparent consts (which are the only ones we have, right now). (Edit: actually, this only holds if our matching-on-opaque-consts policy is “can’t”.)

#34

Rust’s type system is based on algebraic data types, which means that from ‘unit’, ‘product’ and ‘sum’ one can construct any other type. This implies that, fundamentally, structural equivalence should in theory be decided at this level. If structurally equivalent values aren’t actually equivalent, that has to be due to a ‘user’-defined operation which understands more information and makes the distinction. The same is true for the opposite: structurally nonequivalent values that are actually equivalent.

The following program, while somewhat silly, does demonstrate that Rust is inconsistent in how it applies pattern matching when comparing primitive types with user-defined types. Fundamentally, an f64 is a u64 with different operations. Rust’s documentation suggests that the reason for this is that f64 and u64 have different impls of PartialEq (https://doc.rust-lang.org/stable/std/primitive.f64.html).

#[derive(Debug)]
struct Float {
    value: u64,
}

impl std::cmp::PartialEq for Float {
    fn eq(&self, other: &Float) -> bool {
        unsafe {
            let a: f64 = std::mem::transmute(self.value);
            let b: f64 = std::mem::transmute(other.value);
            a == b
        }
    }
}

fn main() {
    {
        const P0: f64 = 0.0;
        const N0: f64 = -0.0;
        // Positive 0 and negative 0 are equal.
        assert_eq!(P0, N0);
        let p0: u64 = unsafe { std::mem::transmute(P0) };
        let n0: u64 = unsafe { std::mem::transmute(N0) };
        // Positive 0 and negative 0 have different structure.
        assert!(p0 != n0);

        // Constants + Pattern matching should let me compare something structurally.
        // Therefore, these should be different.
        match N0 {
            P0 => println!("equal"),
            _ => println!("not equal"),
        }
    }
    {
        const P0: Float = Float { value: 0 };
        const N0: Float = Float { value: 9223372036854775808 };
        // Positive 0 and negative 0 are equal.
        assert_eq!(P0, N0);
        let p0: u64 = unsafe { std::mem::transmute(P0) };
        let n0: u64 = unsafe { std::mem::transmute(N0) };
        // Positive 0 and negative 0 have different structure.
        assert!(p0 != n0);

        // Constants + Pattern matching should let me compare something structurally.
        // Therefore, these should be different.
        match N0 {
            P0 => println!("equal"),
            _ => println!("not equal"),
        }
    }
}

Output:

equal
not equal

So, my fundamental question is, why can’t I implement a float that behaves like a float? And if pattern matching should be structural, why does it currently use PartialEq for floats, rather than bit-wise equality?

#35

And I think matching something structurally is fundamentally different than doing a value match.

Here is an idea, inspired by the above statement.

Introduce a new kind of expression, using the keyword switch. A switch expression does value matching and runs ==; a match expression does structural matching and keeps the current semantics.

#36

I think this is a good example, but I don’t take quite the same lessons from it. That is, if I write “foo1 == foo2”, I feel like I am explicitly requesting equality against the foo values, whereas when I write foo1.a == foo2.a && foo1.b == foo2.b, I am opting out of that. In the same way, as @ahmedcharles wrote, I feel like matching against Foo { a: A, b: B } feels quite different from matching against F where const F = Foo(A, B). (Now, obviously, any PartialEq impl which differs from straight-up structural equality has the potential to be surprising, but that seems like an orthogonal issue. Sometimes it’s the right thing to do nonetheless.)

Private fields would not bother me except that I feel like the current treatment is bypassing the abstraction that the user has constructed (PartialEq). (Whereas calling new is not doing so.) But this seems to come back to how you view uses of FOO as a pattern – to me, they are asking "is this value equal to the constant FOO"?

I agree with this.

But not with this. That is, I don’t think there is much about const that makes it particularly transparent (apart from the pattern matching semantics we are discussing here). But perhaps that is all you mean, in which case I do agree that this is what the code is currently doing (though not necessarily with the proposed addition of abstract const).

In any case, I think the next step is clear: someone (presumably me :slight_smile:) should do a crater run using the intersection semantics, just to get a feeling for what’s happening “in the wild”.

#37

I feel like the intersection solution - banning matching on constants of non-primitive types - would be the right solution.

OTOH, public fields in const are completely transparent, as you can easily access them in array lengths and enum discriminants.

#38

I would also prefer to use bitwise equality for float matching, but the ship might have sailed for that one.

#39

Right. While to me, it’s asking "does this value match the pattern FOO"? Again, the Glasgow Haskell folks have implemented this very same capability (much-extended), and they call it a feature: pattern synonyms. I’ve always assumed our consts were in the same spirit.

And I mean… why would we expect something used as a pattern to be interpreted as anything other than a pattern? I can see how someone who has not had much exposure to pattern matching might fall back on intuitions about == for a time, given that the two often coincide for simple cases, but I don’t see why importing some of that lack of clarity into the language itself would improve anything.

I could see the appeal of a cautious, conservative, least common denominator approach of limiting it to primitive types where match and == are the same, so nobody’s intuitions can be violated (not even the wrong ones)… in a pre-1.0 world, maybe. Making a breaking change for it after promising not to break things any more does add some emphasis to the “over” in “overabundance of caution”, though, in my humble opinion.

It’s not, although the most obvious impact of opacity vs. transparency hadn’t yet become apparent to me when I last wrote. Which is: (outside of the given scope), does the typechecker know that this (type/constant) is equal to what is written on its RHS? Transparent: yes, opaque: no. If a module m exports a transparent type T = bool;, then clients may say let foo: m::T = true;. If it’s an opaque abstract type T = bool;, however, they cannot. Likewise, if the module exports a transparent const C: usize = 9;, then clients may do let arr: [i32; m::C] = [8; 9];. Whereas if it were an opaque abstract const, then they couldn’t. So it’s easy to see that our existing consts are quite definitely transparent. This is also what I was talking about with respect to associated consts (and likewise types) in generic code (and also just plain type parameters of generic functions): if the RHS is not available, then it can only be treated as opaque, exactly the same as an abstract type (resp. const).

The behavior of constants in pattern matches is just an extension of this: if the typechecker knows its RHS, then it can use it to check for exhaustiveness. Otherwise (if it’s opaque), then it can’t assume anything.

(I’m not necessarily advocating for adding abstract const to the language, but it doesn’t have to exist in rustc in order for it to exist in our heads, and be useful there.)

#40

As a small side note, I have run into cases where I’ve matched all possible u8 values, and still required an extra _ arm.

#41

@glaebhoerl Does Haskell allow pattern synonyms to be used as variables?

Anyways, given the pattern Some(<>), and assuming Some is the enum variant of Option, whatever replaces <> in the code can be interpreted in two ways:

  1. It can be a previously defined value, in which case PartialEq is used to test for equality, at least if it’s a primitive type (my previous code demonstrates this).
  2. It can introduce a new binding which will refer to the value inside the Some, if it matches.

If const introduces a value, then it should consistently behave that way, meaning, the behavior of primitive types today. If const actually did introduce a pattern synonym, then it would have to allow for the second case above as well, meaning I could do something like this:

const C: Option<(usize, usize)> = Some((1, x));
let a = match Some((1, 2)) {
    C => x,
    _ => 0,
};

But this doesn’t work, and if it did, C couldn’t be used as a variable. And it would have the subjectively unfortunate side effect that x would be introduced without any way to know that at the point of the pattern match itself.

Overall, the goal of allowing extensible values within patterns seems valuable, since it allows user defined types to achieve the same syntax and semantics that primitive types enjoy today.

Unfortunately, I don’t see any benefits of allowing const which declares values (as opposed to pattern synonyms) to act like a structural match rather than a value match, when the behavior would be restricted when compared to what I’d want from pattern synonyms while also causing an inconsistency when compared to primitive types.

That said, pattern synonyms do sound interesting and perhaps they deserve their own place in the language?

#42

See here: https://github.com/rust-lang/rust/issues/12483

1 Like
#43

I feel like this could be resolved by allowing pattern macros, which would let us do something like

define_match_expander! C {
    ($x:pat) => Some((1, $x))
}

let a = match Some((1, 2)) {
    C!(x) => x,
    _ => 0,
};

I’m drawing from Racket’s define-match-expander here, since it’s quite nice. The article in which I found the best use of match expanders was Matt Might’s on red-black trees, where he has some patterns which match on black nodes or red nodes only. E.g.:

(define-match-expander B
  (syntax-rules ()
    [(_)              (or (T _ 'B _ _ _ _)
                          (L _))]
    [(_ cmp)          (or (T cmp 'B _ _ _ _)
                          (L cmp))]
    [(_ l r)          (T _ 'B l _ _ r)]
    [(_ l k v r)      (T _ 'B l k v r)]
    [(_ cmp l k v r)  (T cmp 'B l k v r)]))

So just asking for a node to match B!() would say it’s either an interior node painted black, or a leaf node. It makes the actual presentation quite nice.

So, I am in favour of letting Rust’s match grow past the simple case statement in ML and Haskell, and into something more powerful and expressive like Racket’s match.

#44

OK, as promised, I whipped up some crater numbers. The result: 14 crates (0.6%) use constants that fall outside the “intersection” semantics (in other words, constants whose types are not ints, floats, or tuples thereof). I plan to analyze the uses of those 14 a bit more but have not had time.

Here is the report and the branch that I tested. The interesting thing in this report is the total regressions – this measurement was done in such a way that constants which appeared in dependencies generated warnings, but constants appearing in the “main crate” being tested reported errors, so we can accurately assess the total number of crates that will need to be changed.

#45

I find it interesting that none of the uses of pattern matching in the compiler required structural matching on the constant, since you transformed them all to use ==, which runs PartialEq. Granted, this is only anecdotal.

#46

It looks like all the constants from crater and from rustc are newtypes of primitive types, a la struct MyId(u32).
E.g., with the intersection semantics, if someone wants to achieve some additional type safety by using a "domain-specific u32" through a newtype, they will lose the ability to match on it.