How to handle pattern matching on constants

So, in the process of doing work on MIR, I’ve also encountered some surprises (to me, at least) in how patterns are currently handled by rustc, and in particular the way that pattern matching treats constants. I am not convinced that the current behavior is what we want, though it has its merits to be sure, and I think it is something we ought to discuss and consider more broadly.

The immediate concern: matching on constants

The thing which I am most prominently worried about is the semantics of matching against a constant. Consider a program like this one:

struct SomeType {
    a: u32,
    b: u32,
}

const SOME_CONSTANT: SomeType = SomeType { a: 22+22, b: 44+44 };

fn test(v: SomeType) {
    match v {
        SOME_CONSTANT => println!("Yes"),
        _ => println!("No"),
    }
}

The question at hand is what do we expect this match to do, precisely? Personally, I expect it to print Yes if v == SOME_CONSTANT, and No otherwise. In fact, since SomeType does not implement PartialEq, I expect this code not to compile. But it does! So what does it do?

Today what we do is to first evaluate SOME_CONSTANT to a value. So we would reduce the expression:

const SOME_CONSTANT: SomeType = SomeType { a: 22+22, b: 44+44 };

to the value SomeType { a: 44, b: 88 }. We then expand the pattern SOME_CONSTANT as though you had typed this value in place (well, almost as though, read on for some complications around privacy). Thus the match statement above is equivalent to:

match v {
    SomeType { a: 44, b: 88 } => println!(Yes),
    _ => println!("No"),
}

So, effectively, patterns define a kind of “structural equality” that is distinct from the == operator. It is not related to the PartialEq trait, or really any user-defined code, but rather tied directly to the structure of the type being matched.

Pros and cons of the current behavior

The current behavior has both pros and cons, from my point of view. Let’s look at some of these.

Pros

Better optimization. One of the biggest “pros” is that it can potentially enable nice optimization. For example, if I have:

struct Value { x: u32 }
const V1: Value = Value { x: 0 };
const V2: Value = Value { x: 1 };
const V3: Value = Value { x: 2 };
const V4: Value = Value { x: 3 };
const V5: Value = Value { x: 4 };

and I have a match pattern like:

match v {
    V1 => ..., 
    ...,
    V5 => ...,
}

Then because pattern matching is always a process of structurally extracting values, we can compile this to code that reads the field x (which is a u32) and does an appropriate switch on that value. Some of the alternatives we’ll see below would potentially force a more conservative compilation strategy.

Better exhautiveness and dead-code checking. Similarly, we can do more thorough exhaustiveness and dead-code checking. So for example if I have a struct like:

struct Value { field: bool }
const TRUE: Value { field: true };
const FALSE: Value { field: false };

and a match pattern like:

match v { TRUE => .., FALSE => .. }

then we can prove that this match is exhaustive. Similarly, we can prove that the following match contains dead-code:

const A: Value { field: true };
match v {
    TRUE => ...,
    A => ...,
}

Again, some of the alternatives might not allow this. (But note the cons, which also raise the question of exhaustiveness checking.)

Nullary variants and constants are (more) equivalent. Currently, there is a sort of equivalence between enum variants and constants, at least with respect to pattern matching. Consider a C-like enum:

enum Modes {
    Happy = 22,
    Shiny = 44,
    People = 66,
    Holding = 88,
    Hands = 110,
}

const C: Modes = Modes::Happy;

Now if I match against Modes::Happy, that is matching against an enum variant, and under all the proposals I will discuss below, it will check the actual variant of the value being matched (regardless of whether Modes implements PartialEq, which it does not here). On the other hand, if matching against C were to require a PartialEq impl, then it would be illegal. Therefore matching against an enum variant is distinct from matching against a constant.

Backwards compatible. Not to be overlooked, the current behavior is, well, the current behavior. There are probably programs relying on it, so we would want to measure and consider what will happen to those programs if we make changes. That said, the current behavior was added in July of 2014 and unfortunately never went through the RFC process; I consider changing it to fall under the "underspecific language semantics" clause of RFC 1122. But clearly we want to be careful here.

Cons

But there are some downsides to this approach as well.

Privacy-busting. The current constant expansion code does not consider privacy. In other words, constants are expanded into equivalent patterns, but those patterns may not have been something the user could have typed because of privacy rules.

Consider a module like:

mod foo {
    pub struct Foo { b: bool }
    pub const V1: Foo = Foo { b: true };
    pub const V2: Foo = Foo { b: false };
}

Note that there is an abstraction boundary here: b is a private field. But now if I wrote code from another module that matches on a value of type Foo, that abstraction boundary is pierced:

fn bar(f: x::Foo) {
    // rustc knows this is exhaustive because if expanded `V1` into equivalent
    // patterns; patterns you could not write by hand!
    match f {
        x::V1 => { }
        x::V2 => { }
    }
}

This seems obviously bad. Now if I add new fields to Foo, or perhaps change that b field to an i32, suddenly those examples stop compiling.

Encapsulation-busting. The current notion of patterns also has the potential to interact poorly with extensions like const fn. In particular, it renders the algorithm used by a const fn “transparent” the consumer. Imagine for example that I define a fn hash that, given a previous hash and a value, produces a new hash. Because I am lazy and prototyping my system, I decide for now to just ignore the new value and pass the old hash through:

const fn add_to_hash(prev_hash: u64, _value: u64) -> u64 {
    prev_hash
}

Now I have some consumers of my library and they define a few constants:

const HASH_ZERO: add_to_hash(0, 0);
const HASH_ONE: add_to_hash(0, 1);

And at some point they write a match statement:

fn process_hash(h: u64) {
    match h {
        HASH_ZERO => /* do something */,
        HASH_ONE => /* do something else */,
        _ => /* do something else again */,
}

In fact, up until very recently, this would not have compiled, because, for historical reasons, pattern matching uses a simplified version of constant evaluation, and that version did not yet support const fn. But this was not for any principled reason, it was just an artifact of the current implementation, and in fact PR #26848 just landed which lifts this restriction. (Though the example currently ICEs for some reason.)

Anyhow, once we get all those details sorted out, then what you would get for this match is a dead-code error, because the compiler can see that HASH_ZERO and HASH_ONE are the same value. (As as aside, I think this should be a lint warning and not an error, but either way is bad here.) You can make similar examples that lead to exhaustiveness errors.

This seems bad to me: it is exposing the details of my hash algorithm to clients, whereas I probably intended for clients to treat the algorithm as opaque and subject to change.

Leaky abstractions; or, two notions of equality. We already have a user-defined notion of equality, which is the PartialEq trait. But this interpretation of constants ignores that and defines a second, stricter notion of equality. This will cause problems for user-defined types that rely on PartialEq to define equality.

Consider a simple duration type:

#[derive(Copy, Clone)]
pub struct Duration {
    pub seconds: u32,
    pub minutes: u32,
}  

Let’s say that this Duration type wishes to represent a span of time, but it also wishes to preserve whether that time was expressed in seconds or minutes. In other words, 60 seconds and 1 minute are equal values, but we don’t want to normalize 60 seconds into 1 minute; perhaps because it comes from user input and we wish to keep things just as the user chose to express it.

We might implement PartialEq like so (actually the PartialEq trait is slightly different, but you get the idea):

impl PartialEq for Duration {
    fn eq(&self, other: &Duration) -> bool {
        let s1 = (self.seconds as u64) + (self.minutes as u64 * 60);
        let s2 = (other.seconds as u64) + (other.minutes as u64 * 60);
        s1 == s2
    }
}

Now imagine I have some constants:

const TWENTY_TWO_SECONDS: Duration = Duration { seconds: 22, minutes: 0 };
const ONE_MINUTE: Duration = Duration { seconds: 0, minutes: 1 };

And I write a match statement using those constants:

fn detect_some_case_or_other(d: Duration) {
    match d {
        TWENTY_TWO_SECONDS => /* do something */,
        ONE_MINUTE => /* do something else */,
        _ => /* do something else again */,
    }
}

Now this code is, in all probability, buggy. Probably I meant to use the notion of equality that Duration defined, where seconds and minutes are normalized. But that is not the behavior I will see – instead I will use a pure structural match. What’s worse, this means the code will probably work in my local tests, since I like to say "one minute", but it will break when I demo it for my customer, since she prefers to write “60 seconds”. Grrr.

(If you don’t like this example, then consider a string type that retains case but compares in a case-insensitive fashion, such as you might find for a Mac or Windows filesystem.)

Associated constants or generic integers. Rewriting constants into patterns requires that we can fully evaluate the constant at the time of exhaustiveness checking. For associated constants and type-level integers, that is not possible. Consider:

trait SomeTrait {
    const A: bool;
    const B: bool;
}

fn foo<T:SomeTrait>(x: bool) {
    match x {
        T::A => println!("A"),
        T::B => println!("B"),
    }
}

impl SomeTrait for i32 {
    const A: bool = true;
    const B: bool = true;
}    

impl SomeTrait for u32 {
    const A: bool = true;
    const B: bool = false;
}    

Is this match exhaustive? Does it contain dead code? The answer will depend on whether T=i32 or T=u32, of course. But we try to run exhaustiveness checking just once, for all types, so that we can say definitely whether a fn is correct or not, regardless of what T winds up being.

The big alternatives

So what could we do differently? I think there are three big choices. Within each choice there are naturally some tweaks one might consider.

  1. Pattern conversion: The current behavior, with some tweaks.
  2. PartialEq comparison: We could compile matches against a constant to compare using the == operator, so that match foo { BAR => ... } is equivalent to match foo { bar if bar == BAR => ... }
  3. Intersection: We could make it illegal to match against constants unless they are of built-in type, like u32 or something like that f32. These types behave the same for both options 1 and 2.

UPDATE: There is a fourth case I had not considered: Pattern conversion only at trans time. Described loosely in this comment below.

Tweaks to the current behavior

If we wanted to stick with the idea that referencing a constant within a match pattern is short-hand for pattern expansion – or, to put it another way, is a kind of built-in structural equality check – then this does imply that we can never permit associated constants to appear in match patterns. As I described above, associated constants are simply incompatible with this approach, since they cannot be resolved to a specific value.

We can probably tweak some of the other cons by imposing other restrictions:

  • We can fix privacy-busting by checking the expanded patterns for privacy violations, as suggested by @petrochenkov . This means that we would prohibit constant patterns if the constant value (not the type of the constant value) contains any private fields.
  • We could prohibit constant patterns if they are defined with a const fn. This would fix encapsulation-busting.

You see a theme here: basically the current approach means that there are classes and families of constants we may want to exclude from patterns.

PartialEq comparison

The current MIR translation actually implements a variant on patterns where matching against a constant compiles down to a call to PartialEq::eq. This means that a match statement like:

match foo {
    BAR => ...
}

is compiled in roughly the same was as one that uses an if guard:

match foo {
    x if x == BAR => ...
}

Clearly this resolves the abstraction-busting point, since all equality comparisons go through ==. However, it also has other implications:

  1. Exhaustiveness checking must treat constants opaquely. This is both good and bad. For example, it resolves the privacy-busting and encapsulation-busting points, but it may also be annoying in some cases and require an extra _ arm. I consider this to be a net improvement however, particularly since it is unusual that one can fully and exhaustively match against constants. Basically it only happens with bool values.
  2. Generated code can be less efficient. Because the == operator is defined by user code, we have to be more conservative; we can’t extract out the value of a field and switch on that, instead we have to consecutively execute comparisons (after all, we don’t know that the PartialEq code considers that particular field). We might be able to solve this by detecting when PartialEq is defined by deriving and optimizing those cases, since then we know the semantics.
  3. Associated constants – indeed all constants – work just fine. Because we are not trying to build a pattern, we can wait until trans time to evaluate the constant, and hence it works fine with associated constants and other similar extensions.
  4. Some constants will never match. As a simple example, consider matching against NaN (remember that NaN == NaN is false). Today, we report this as a compilation error, but that is not possible if we haven’t fully evaluated every constant to a value. So instead it would just never match. The same might be true for broken PartialEq impls.

Even here, there are also some variants one might consider. In particular, @pnkfelix proposed at one point restricting constants to types that implement Eq rather than PartialEq, which would rule out matching on floating point (probably not a best practice in any case).

One consideration though is that making this change could cause silent changes to the semantics of existing programs. In particular, there may be programs taking active advantage of the abstraction busting property of today’s approach, and using matches to achieve a "structural equality" and == to achieve “semantic equality”. This is hard to know. I think if we were to transition to PartialEq, we would probably want to consider first implementing the "intersection" semantics described below, and then expanding to PartialEq. This means that programs relying on abstraction-busting would be forced to rewrite in the interim.

Finally, another consideration is that this change would imply that pattern matching is not pure. That is, user-defined code can execute. This does not bother me personally. In fact, it seems rather inconsistent to try and enforce purity around match expressions when we have steadily and surely been moving away from purity in the rest of the language. For years now, Rust has been migrating to a library-defined, extensible language. We replaced builtin pointer types with library-defined smart pointers, for example, and we now use Vec and String everywhere rather than types built into the language. As part of that transition, we have added traits like Deref, Index, and PartialEq that allow the user to override virtually every operator. This means that when I see an expression like *x, it may well be running arbitrary code. In practice, this is No Big Deal – no doubt there is the occasional crazy Deref impl, but for the most part they act like you expect, and the world keeps turning. Moreover, I expect sooner or later we will add deref patterns (sometimes called box patterns), which would also violate purity.

Intersection: Restrict constants to builtin types

We could prohibit matching on constants unless the type is a built-in type like i32 or (u32, i32) or something. This is probably not the behavior we want long-term, but it has the advance of being a conservative change that would not change the semantics of any existing programs (though it may cause compilation errors). In any case, it makes sense to implement this change and test it with a crater run (which I have not yet done) to get some sense of how many programs would potentially be affected by any changes we make here.

(As a somewhat amusing side note, for a long time, this is what I thought that the compiler did. When I discovered otherwise, I filed #20489. But then I wrote a commit with the message “do not try to fix #20489”, and GH promptly closed the issue in response (doh). So then I forgot about it until recently. Doh!)

Conclusion

None as of yet. Obviously I would have preferred to have based the semantics on ==, as you can probably tell, but I’m not sure what’s the best thing to do now, given that pattern matching on constants in its current form has been stable for some time.

5 Likes

Some statistics. These are all the constants in patterns used by rustc and standard library.
115 free constants, 0 associated constants (inherent associated constants can currently be used in patterns)
109 constants with built-in types, 6 constants with user defined types

rustc: x86_64-pc-windows-gnu/stage1/bin/rustlib/x86_64-pc-windows-gnu/lib/libfmt_macros
rustc: x86_64-pc-windows-gnu/stage1/bin/rustlib/x86_64-pc-windows-gnu/lib/libsyntax
Const pat(96839: NO_EXPANSION) codemap::ExpnId
Const pat(96840: COMMAND_LINE_EXPN) codemap::ExpnId
Const pat(104601: COMMAND_LINE_SP) codemap::Span
Const pat(126050: token::SELF_KEYWORD_NAME) ast::Name
Const pat(216296: NO_EXPANSION) codemap::ExpnId
rustc: x86_64-pc-windows-gnu/stage1/bin/rustlib/x86_64-pc-windows-gnu/lib/librustc_front
rustc: x86_64-pc-windows-gnu/stage1/bin/rustlib/x86_64-pc-windows-gnu/lib/librustc_llvm
rustc: x86_64-pc-windows-gnu/stage1/bin/rustlib/x86_64-pc-windows-gnu/lib/librustc_back
rustc: x86_64-pc-windows-gnu/stage1/bin/rustlib/x86_64-pc-windows-gnu/lib/librustc_data_structures
rustc: x86_64-pc-windows-gnu/stage1/bin/rustlib/x86_64-pc-windows-gnu/lib/librustc
Const pat(44975: ast::DUMMY_NODE_ID) u32
Const pat(45139: ast::DUMMY_NODE_ID) u32
Const pat(149428: ROOT_CODE_EXTENT) middle::region::CodeExtent
rustc: x86_64-pc-windows-gnu/stage1/bin/rustlib/x86_64-pc-windows-gnu/lib/librustc_borrowck
rustc: x86_64-pc-windows-gnu/stage1/bin/rustlib/x86_64-pc-windows-gnu/lib/librustc_platform_intrinsics
rustc: x86_64-pc-windows-gnu/stage1/bin/rustlib/x86_64-pc-windows-gnu/lib/librustc_typeck
rustc: x86_64-pc-windows-gnu/stage1/bin/rustlib/x86_64-pc-windows-gnu/lib/librustc_mir
rustc: x86_64-pc-windows-gnu/stage1/bin/rustlib/x86_64-pc-windows-gnu/lib/librustc_resolve
rustc: x86_64-pc-windows-gnu/stage1/bin/rustlib/x86_64-pc-windows-gnu/lib/librustc_trans
rustc: x86_64-pc-windows-gnu/stage1/bin/rustlib/x86_64-pc-windows-gnu/lib/librustc_privacy
Const pat(3067: ast::DUMMY_NODE_ID) u32
rustc: x86_64-pc-windows-gnu/stage1/bin/rustlib/x86_64-pc-windows-gnu/lib/librustc_lint
rustc: x86_64-pc-windows-gnu/stage1/bin/rustlib/x86_64-pc-windows-gnu/lib/librustc_driver
rustc: x86_64-pc-windows-gnu/stage1/bin/rustlib/x86_64-pc-windows-gnu/bin/rustc.exe
rustc: x86_64-pc-windows-gnu/stage2/bin/rustlib/x86_64-pc-windows-gnu/lib/libcore
Const pat(1027: EXP_MASK) u32
Const pat(1031: EXP_MASK) u32
Const pat(2031: EXP_MASK) u64
Const pat(2035: EXP_MASK) u64
Const pat(138158: WRITING) usize
Const pat(138160: UNUSED) usize
Const pat(138378: WRITING) usize
Const pat(138812: UNUSED) usize
Const pat(170006: TAG_CONT_U8) u8
Const pat(170014: TAG_CONT_U8) u8
Const pat(170021: TAG_CONT_U8) u8
Const pat(170029: TAG_CONT_U8) u8
Const pat(170123: TAG_CONT_U8) u8
Const pat(170124: TAG_CONT_U8) u8
Const pat(170132: TAG_CONT_U8) u8
Const pat(170133: TAG_CONT_U8) u8
Const pat(170140: TAG_CONT_U8) u8
Const pat(170141: TAG_CONT_U8) u8
rustc: x86_64-pc-windows-gnu/stage2/bin/rustlib/x86_64-pc-windows-gnu/lib/liblibc
rustc: x86_64-pc-windows-gnu/stage2/bin/rustlib/x86_64-pc-windows-gnu/lib/librand
rustc: x86_64-pc-windows-gnu/stage2/bin/rustlib/x86_64-pc-windows-gnu/lib/liballoc_system
rustc: x86_64-pc-windows-gnu/stage2/bin/rustlib/x86_64-pc-windows-gnu/lib/liballoc
rustc: x86_64-pc-windows-gnu/stage2/bin/rustlib/x86_64-pc-windows-gnu/lib/librustc_unicode
rustc: x86_64-pc-windows-gnu/stage2/bin/rustlib/x86_64-pc-windows-gnu/lib/libcollections
rustc: x86_64-pc-windows-gnu/stage2/bin/rustlib/x86_64-pc-windows-gnu/lib/liballoc_jemalloc
rustc: x86_64-pc-windows-gnu/stage2/bin/rustlib/x86_64-pc-windows-gnu/lib/libstd
Const pat(1817: NEG_INFINITY) f32
Const pat(2832: NEG_INFINITY) f64
Const pat(6363: EMPTY_BUCKET) u64
Const pat(48186: EMPTY) usize
Const pat(48190: DISCONNECTED) usize
Const pat(48199: DATA) usize
Const pat(48352: EMPTY) usize
Const pat(48356: DATA) usize
Const pat(48397: DISCONNECTED) usize
Const pat(48507: DATA) usize
Const pat(48508: EMPTY) usize
Const pat(48510: DISCONNECTED) usize
Const pat(48539: DATA) usize
Const pat(48540: DISCONNECTED) usize
Const pat(48541: EMPTY) usize
Const pat(48564: DISCONNECTED) usize
Const pat(48565: EMPTY) usize
Const pat(48568: DATA) usize
Const pat(48613: EMPTY) usize
Const pat(48617: DATA) usize
Const pat(48621: DISCONNECTED) usize
Const pat(48628: DISCONNECTED) usize
Const pat(48701: EMPTY) usize
Const pat(48703: DATA) usize
Const pat(48715: DISCONNECTED) usize
Const pat(48730: DISCONNECTED) usize
Const pat(48810: EMPTY) usize
Const pat(48812: DATA) usize
Const pat(48814: DISCONNECTED) usize
Const pat(48825: EMPTY) usize
Const pat(48847: DATA) usize
Const pat(48851: DISCONNECTED) usize
Const pat(50635: DISCONNECTED) isize
Const pat(50791: DISCONNECTED) isize
Const pat(51006: DISCONNECTED) isize
Const pat(51182: DISCONNECTED) isize
Const pat(51971: DISCONNECTED) isize
Const pat(52233: DISCONNECTED) isize
Const pat(52382: DISCONNECTED) isize
Const pat(52520: DISCONNECTED) isize
Const pat(52674: DISCONNECTED) isize
Const pat(64823: DW_EH_PE_absptr) u8
Const pat(64827: DW_EH_PE_uleb128) u8
Const pat(64832: DW_EH_PE_udata2) u8
Const pat(64838: DW_EH_PE_udata4) u8
Const pat(64844: DW_EH_PE_udata8) u8
Const pat(64850: DW_EH_PE_sleb128) u8
Const pat(64855: DW_EH_PE_sdata2) u8
Const pat(64861: DW_EH_PE_sdata4) u8
Const pat(64867: DW_EH_PE_sdata8) u8
Const pat(64902: DW_EH_PE_absptr) u8
Const pat(64904: DW_EH_PE_pcrel) u8
Const pat(64909: DW_EH_PE_textrel) u8
Const pat(64941: DW_EH_PE_datarel) u8
Const pat(64973: DW_EH_PE_funcrel) u8
Const pat(65959: libc::AF_INET) i32
Const pat(66009: libc::AF_INET6) i32
Const pat(80872: c::IO_REPARSE_TAG_SYMLINK) u32
Const pat(80874: c::IO_REPARSE_TAG_MOUNT_POINT) u32
Const pat(82933: INVALID_SOCKET) u64
Const pat(83006: INVALID_SOCKET) u64
Const pat(83108: INVALID_SOCKET) u64
Const pat(85893: WAIT_OBJECT_0) u32
Const pat(88459: libc::ERROR_ACCESS_DENIED) i32
Const pat(88461: libc::ERROR_ALREADY_EXISTS) i32
Const pat(88463: libc::ERROR_BROKEN_PIPE) i32
Const pat(88465: libc::ERROR_FILE_NOT_FOUND) i32
Const pat(88467: c::ERROR_PATH_NOT_FOUND) i32
Const pat(88469: libc::ERROR_NO_DATA) i32
Const pat(88471: libc::ERROR_OPERATION_ABORTED) i32
Const pat(88473: libc::WSAEACCES) i32
Const pat(88475: libc::WSAEADDRINUSE) i32
Const pat(88477: libc::WSAEADDRNOTAVAIL) i32
Const pat(88479: libc::WSAECONNABORTED) i32
Const pat(88481: libc::WSAECONNREFUSED) i32
Const pat(88483: libc::WSAECONNRESET) i32
Const pat(88485: libc::WSAEINVAL) i32
Const pat(88487: libc::WSAENOTCONN) i32
Const pat(88489: libc::WSAEWOULDBLOCK) i32
Const pat(88491: libc::WSAETIMEDOUT) i32
rustc: x86_64-pc-windows-gnu/stage2/bin/rustlib/x86_64-pc-windows-gnu/lib/libflate
rustc: x86_64-pc-windows-gnu/stage2/bin/rustlib/x86_64-pc-windows-gnu/lib/libarena
rustc: x86_64-pc-windows-gnu/stage2/bin/rustlib/x86_64-pc-windows-gnu/lib/liblog
rustc: x86_64-pc-windows-gnu/stage2/bin/rustlib/x86_64-pc-windows-gnu/lib/libterm
Const pat(7721: color::BLACK) u16
Const pat(7723: color::BLUE) u16
Const pat(7725: color::GREEN) u16
Const pat(7727: color::RED) u16
Const pat(7729: color::YELLOW) u16
Const pat(7733: color::MAGENTA) u16
Const pat(7737: color::CYAN) u16
Const pat(7741: color::WHITE) u16
rustc: x86_64-pc-windows-gnu/stage2/bin/rustlib/x86_64-pc-windows-gnu/lib/libserialize
rustc: x86_64-pc-windows-gnu/stage2/bin/rustlib/x86_64-pc-windows-gnu/lib/libgetopts
rustc: x86_64-pc-windows-gnu/stage2/bin/rustlib/x86_64-pc-windows-gnu/lib/librbml
rustc: x86_64-pc-windows-gnu/stage2/bin/rustlib/x86_64-pc-windows-gnu/lib/libtest
rustc: x86_64-pc-windows-gnu/stage2/bin/rustlib/x86_64-pc-windows-gnu/lib/libgraphviz
rustc: x86_64-pc-windows-gnu/stage2/bin/rustlib/x86_64-pc-windows-gnu/lib/librustc_bitflags

Given this statistics it may be possible to largely circumvent the problem by restricting constants in patterns to built-in types. (A crater run would still give a better picture)
To respect backward compatibility, a warning may be issued in rare cases where matching on complex constants is used - “you are matching on a complex constant, it may not respect equality and perform unwanted exhaustiveness checking, manually use == instead”.

For me, I never expected pattern matching to run ==, so I am against such changes. Current semantic roughly agrees with my intuition.

I agree with forbidding associated constants and function calls in constants for pattern matching. I am not sure about privacy checking. In my mind, there can be a public constant whose value contains private fields. I don’t see it as busting privacy, because after all, for the case to trigger, constants need to be public.

It is unclear how big a problem this will be in practice, but the interaction with exhaustiveness definitly adds new and exciting ways for you to break clients. I guess there is another middle-ground that I hadn't considered, which is to preserve the existing runtime semantics, but remove the interaction with exhaustiveness checking etc. This would even permit associated constants at some point.

To elaborate a bit more on this alternative, which I will call Pattern conversion only at trans time. The idea would be that we treat constants as opaque when checking for exhaustiveness and dead-code, which ameliorates the privacy-busting and encapsulation-busting cons. It also permits associated constants to appear, since we only need to do pattern expansion during trans, at which point we are fully monomorphized.

It does not, however, resolve the fact that we have two notions of equality, so there is still abstraction-busting that lets you distinguish “deeply structural equal” and “semantically equal” (even if you didn’t intend to do so).

In my mind, exhaustiveness backward compatibility hazard is an orthogonal problem to constants in pattern matching. While the interaction with exhaustiveness is certainly surprising and undesirable, adding a new enum variant can break clients by exhaustiveness today and the interaction doesn’t seem worse than that.

We had RFC 757 to address exhaustiveness problems and I think it is a good solution for this too. If you have public constants TRUE and FALSE but do not want clients to see them as exhaustive, you can make them an extensible enum.

I disagree that these problems are the same. It is well-known that if you export a public enum you can break clients, yes, but if you can break clients by changing private details or your structure, that is another thing.

EDIT:

There is no enum here to mark as extensible or inextensible, though.

The "private field matching" problem is annoying as it can break encapsulation, and I think the best way to prevent it is to prohibit matching on values with private fields. I also don't care much for projections in match - you can always write the guard version.

On the other hand, I don't think the const fn problem is really problematic - callers can observe their callee's precise behaviour, and if the call occurs at compile time, they can observe it at compile time, with it not being much worse than [u8; 1/(HASH_ONE^HASH_ZERO)].

+1 for making unreachable arm detection a lint - this is required for improving check_match backwards-compatibly.

trans-time expansion still allows you to do structural equality on types with private fields, which breaks parametricity in possibly-unintended ways (not that we will have any parametricity left after specialization anyway, but this seems more "accidental").

I would prefer that matching always do structural equality, even on types with floating point.

What are these uses? I'm keen to understand the use cases. (sorry, I couldn't parse the long list of data very well).

(To my mind, I like the idea of banning pattern matching on non-built-in types (perhaps also allow public C-like enums?), but I would like to understand the reasons it is used before committing to that position).

https://github.com/rust-lang/rust/blob/master/src/libsyntax/codemap.rs#L977

https://github.com/rust-lang/rust/blob/master/src/libsyntax/diagnostic.rs#L810

https://github.com/rust-lang/rust/blob/master/src/libsyntax/parse/parser.rs#L2072

https://github.com/rust-lang/rust/blob/master/src/libsyntax/ext/base.rs#L683

https://github.com/rust-lang/rust/blob/master/src/librustc/middle/region.rs#L650

1 Like

@petrochenkov thanks for gathering that data. To my eye, these mostly look like they are intended to compare for equality, and so would continue to work the same under any set of rules (presuming Eq impls exist, of course).

Yes, agreed. I guess saying that this "solves" the privacy question was an overstatement -- it addresses the "downstream clients continue to build" part of privacy, but the value of private fields can still be "observed" in a sense. This seems to be the overlap of what I was calling "privacy busting" and "abstraction busting".

If we could get away with it, I’d ideally like to move towards a “you can only pattern match on constants” world (the intersection alternative). This gives us maximum flexibility later on and may not actually break code in practice. Cases which use structure constants can switch to the manual desugaring of guards if required.

I definitely don’t think we can do this if it’s heavily used, however, so I’d at the very least expect a crater run, 1-2 cycles of warnings of usage, and then turning it into an error.

2 Likes

In the short term, I think the intersection approach would be best.

Long term, I think using PartialEq would match what I expect most. While I’m perhaps not very familiar with pattern matching due to mostly being familiar with C++, I expect pattern matching to be user extensible and respect the existing ways that users can change the behavior of a type. If I define PartialEq for a type, I expect the semantics of the entire language to respect it, not just parts of the language. That’s probably the biggest consideration for me, since I don’t like special cases (if I wanted lots of special cases to remember, I’d use C++ rather than Rust :slight_smile: )

I expect pattern matching to be user extensible.

I think this may be the primary difference. I am familiar with pattern matching, and I do not expect pattern matching to be user extensible. As far as I know, neither OCaml nor Haskell have user extensible pattern matching. On the other hand, I think familiarity argument is a moot, since people familiar with pattern matching is a minority, and Rust does not target that minority. In that sense I am interested in what people expect.

Continuing on my expectation, even if pattern matching were user extensible, I do not expect it to use user extensible equality. This is because equality is not sufficent to match patterns.

Consider Expr type, with values like Int(4) and Add(Int(2), Int(2)). Assume user extensible equality for Expr is defined such that expressions are compared after evaluation. (This is probably not a good idea for this particular case, but it is a stand-in for user extensible equality here.) Following constants are defined.

const ONE: Expr = Int(1);
const TWO: Expr = Int(2);
const THREE: Expr = Int(3);
const FOUR: Expr = Int(4);
const TWO_AND_TWO: Expr = Add(TWO, TWO);
const ONE_AND_THREE: Expr = Add(ONE, THREE);

Using user extensible equality, following patterns match TWO_AND_TWO: FOUR, TWO_AND_TWO, ONE_AND_THREE, Add(TWO, TWO). This pattern is not matched: Add(ONE, THREE). This pattern is matched: Add(x, y), but equality does not provide what should be the value of x and y.

1 Like

Not sure if you've seen it, but Scala has nice support for extensible patterns. I'd be interested in pursuing similar solutions at some point, though I don't think it's something we need in short term or in medium term. Still, while I don't think we have to go out of our way to make patterns 100% extensible, I think it makes sense for pattern matching to respect the extensibility that we DO have.

This is also my current opinion. It seems best if we can make this decision in an "affirmative" way -- that is, picking the semantics that we want freely, without feeling constrained. I will gin up a branch that enforces this condition and do a crater run to see what the results are.

That's not quite what I had in mind. In particular, Add(TWO, TWO) would not match against Int(4), because matching the pattern Add(TWO, TWO) is distinct from matching the constant TWO_AND_TWO. The former would test that the variant is Add, and then test whether the two arguments are equal to TWO. The latter would check whether the enum as a whole is equal to TWO_AND_TWO.