So, in the process of doing work on MIR, I’ve also encountered some surprises (to me, at least) in how patterns are currently handled by rustc, and in particular the way that pattern matching treats constants. I am not convinced that the current behavior is what we want, though it has its merits to be sure, and I think it is something we ought to discuss and consider more broadly.
The immediate concern: matching on constants
The thing which I am most prominently worried about is the semantics of matching against a constant. Consider a program like this one:
struct SomeType {
a: u32,
b: u32,
}
const SOME_CONSTANT: SomeType = SomeType { a: 22+22, b: 44+44 };
fn test(v: SomeType) {
match v {
SOME_CONSTANT => println!("Yes"),
_ => println!("No"),
}
}
The question at hand is what do we expect this match to do, precisely?
Personally, I expect it to print Yes
if v == SOME_CONSTANT
, and
No
otherwise. In fact, since SomeType
does not implement
PartialEq
, I expect this code not to compile. But it does! So what
does it do?
Today what we do is to first evaluate SOME_CONSTANT
to a value. So we would
reduce the expression:
const SOME_CONSTANT: SomeType = SomeType { a: 22+22, b: 44+44 };
to the value SomeType { a: 44, b: 88 }
. We then expand the pattern
SOME_CONSTANT
as though you had typed this value in place (well,
almost as though, read on for some complications around privacy).
Thus the match statement above is equivalent to:
match v {
SomeType { a: 44, b: 88 } => println!(Yes),
_ => println!("No"),
}
So, effectively, patterns define a kind of “structural equality” that
is distinct from the ==
operator. It is not related to the
PartialEq
trait, or really any user-defined code, but rather tied
directly to the structure of the type being matched.
Pros and cons of the current behavior
The current behavior has both pros and cons, from my point of view. Let’s look at some of these.
Pros
Better optimization. One of the biggest “pros” is that it can potentially enable nice optimization. For example, if I have:
struct Value { x: u32 }
const V1: Value = Value { x: 0 };
const V2: Value = Value { x: 1 };
const V3: Value = Value { x: 2 };
const V4: Value = Value { x: 3 };
const V5: Value = Value { x: 4 };
and I have a match pattern like:
match v {
V1 => ...,
...,
V5 => ...,
}
Then because pattern matching is always a process of structurally
extracting values, we can compile this to code that reads the field
x
(which is a u32
) and does an appropriate switch on that
value. Some of the alternatives we’ll see below would potentially
force a more conservative compilation strategy.
Better exhautiveness and dead-code checking. Similarly, we can do more thorough exhaustiveness and dead-code checking. So for example if I have a struct like:
struct Value { field: bool }
const TRUE: Value { field: true };
const FALSE: Value { field: false };
and a match pattern like:
match v { TRUE => .., FALSE => .. }
then we can prove that this match is exhaustive. Similarly, we can prove that the following match contains dead-code:
const A: Value { field: true };
match v {
TRUE => ...,
A => ...,
}
Again, some of the alternatives might not allow this. (But note the cons, which also raise the question of exhaustiveness checking.)
Nullary variants and constants are (more) equivalent. Currently, there is a sort of equivalence between enum variants and constants, at least with respect to pattern matching. Consider a C-like enum:
enum Modes {
Happy = 22,
Shiny = 44,
People = 66,
Holding = 88,
Hands = 110,
}
const C: Modes = Modes::Happy;
Now if I match against Modes::Happy
, that is matching against an
enum variant, and under all the proposals I will discuss below, it
will check the actual variant of the value being matched (regardless
of whether Modes
implements PartialEq
, which it does not here). On
the other hand, if matching against C
were to require a PartialEq
impl, then it would be illegal. Therefore matching against an enum
variant is distinct from matching against a constant.
Backwards compatible. Not to be overlooked, the current behavior is, well, the current behavior. There are probably programs relying on it, so we would want to measure and consider what will happen to those programs if we make changes. That said, the current behavior was added in July of 2014 and unfortunately never went through the RFC process; I consider changing it to fall under the "underspecific language semantics" clause of RFC 1122. But clearly we want to be careful here.
Cons
But there are some downsides to this approach as well.
Privacy-busting. The current constant expansion code does not consider privacy. In other words, constants are expanded into equivalent patterns, but those patterns may not have been something the user could have typed because of privacy rules.
Consider a module like:
mod foo {
pub struct Foo { b: bool }
pub const V1: Foo = Foo { b: true };
pub const V2: Foo = Foo { b: false };
}
Note that there is an abstraction boundary here: b is a private field. But now if I wrote code from another module that matches on a value of type Foo, that abstraction boundary is pierced:
fn bar(f: x::Foo) {
// rustc knows this is exhaustive because if expanded `V1` into equivalent
// patterns; patterns you could not write by hand!
match f {
x::V1 => { }
x::V2 => { }
}
}
This seems obviously bad. Now if I add new fields to Foo, or perhaps change that b field to an i32, suddenly those examples stop compiling.
Encapsulation-busting. The current notion of patterns also has the
potential to interact poorly with extensions like const fn
. In
particular, it renders the algorithm used by a const fn
“transparent” the consumer. Imagine for example that I define a fn
hash
that, given a previous hash and a value, produces a new hash.
Because I am lazy and prototyping my system, I decide for now to just
ignore the new value and pass the old hash through:
const fn add_to_hash(prev_hash: u64, _value: u64) -> u64 {
prev_hash
}
Now I have some consumers of my library and they define a few constants:
const HASH_ZERO: add_to_hash(0, 0);
const HASH_ONE: add_to_hash(0, 1);
And at some point they write a match statement:
fn process_hash(h: u64) {
match h {
HASH_ZERO => /* do something */,
HASH_ONE => /* do something else */,
_ => /* do something else again */,
}
In fact, up until very recently, this would not have compiled,
because, for historical reasons, pattern matching uses a simplified
version of constant evaluation, and that version did not yet support
const fn
. But this was not for any principled reason, it was just an
artifact of the current implementation, and in fact PR #26848 just
landed which lifts this restriction. (Though the example
currently ICEs for some reason.)
Anyhow, once we get all those details sorted out, then what you would
get for this match is a dead-code error, because the compiler can
see that HASH_ZERO
and HASH_ONE
are the same value. (As as aside,
I think this should be a lint warning and not an error, but either way
is bad here.) You can make similar examples that lead to
exhaustiveness errors.
This seems bad to me: it is exposing the details of my hash algorithm to clients, whereas I probably intended for clients to treat the algorithm as opaque and subject to change.
Leaky abstractions; or, two notions of equality. We already have a
user-defined notion of equality, which is the PartialEq
trait. But
this interpretation of constants ignores that and defines a second,
stricter notion of equality. This will cause problems for user-defined
types that rely on PartialEq
to define equality.
Consider a simple duration type:
#[derive(Copy, Clone)]
pub struct Duration {
pub seconds: u32,
pub minutes: u32,
}
Let’s say that this Duration
type wishes to represent a span of
time, but it also wishes to preserve whether that time was expressed
in seconds or minutes. In other words, 60 seconds and 1 minute are
equal values, but we don’t want to normalize 60 seconds into 1 minute;
perhaps because it comes from user input and we wish to keep things
just as the user chose to express it.
We might implement PartialEq
like so (actually the PartialEq
trait
is slightly different, but you get the idea):
impl PartialEq for Duration {
fn eq(&self, other: &Duration) -> bool {
let s1 = (self.seconds as u64) + (self.minutes as u64 * 60);
let s2 = (other.seconds as u64) + (other.minutes as u64 * 60);
s1 == s2
}
}
Now imagine I have some constants:
const TWENTY_TWO_SECONDS: Duration = Duration { seconds: 22, minutes: 0 };
const ONE_MINUTE: Duration = Duration { seconds: 0, minutes: 1 };
And I write a match statement using those constants:
fn detect_some_case_or_other(d: Duration) {
match d {
TWENTY_TWO_SECONDS => /* do something */,
ONE_MINUTE => /* do something else */,
_ => /* do something else again */,
}
}
Now this code is, in all probability, buggy. Probably I meant to use
the notion of equality that Duration
defined, where seconds and
minutes are normalized. But that is not the behavior I will see –
instead I will use a pure structural match. What’s worse, this means
the code will probably work in my local tests, since I like to say
"one minute", but it will break when I demo it for my customer, since
she prefers to write “60 seconds”. Grrr.
(If you don’t like this example, then consider a string type that retains case but compares in a case-insensitive fashion, such as you might find for a Mac or Windows filesystem.)
Associated constants or generic integers. Rewriting constants into patterns requires that we can fully evaluate the constant at the time of exhaustiveness checking. For associated constants and type-level integers, that is not possible. Consider:
trait SomeTrait {
const A: bool;
const B: bool;
}
fn foo<T:SomeTrait>(x: bool) {
match x {
T::A => println!("A"),
T::B => println!("B"),
}
}
impl SomeTrait for i32 {
const A: bool = true;
const B: bool = true;
}
impl SomeTrait for u32 {
const A: bool = true;
const B: bool = false;
}
Is this match exhaustive? Does it contain dead code? The answer will
depend on whether T=i32
or T=u32
, of course. But we try to run exhaustiveness
checking just once, for all types, so that we can say definitely whether a fn
is correct or not, regardless of what T
winds up being.
The big alternatives
So what could we do differently? I think there are three big choices. Within each choice there are naturally some tweaks one might consider.
- Pattern conversion: The current behavior, with some tweaks.
-
PartialEq comparison: We could compile matches against a constant
to compare using the
==
operator, so thatmatch foo { BAR => ... }
is equivalent tomatch foo { bar if bar == BAR => ... }
-
Intersection: We could make it illegal to match against
constants unless they are of built-in type, like
u32
or something like thatf32
. These types behave the same for both options 1 and 2.
UPDATE: There is a fourth case I had not considered: Pattern conversion only at trans time. Described loosely in this comment below.
Tweaks to the current behavior
If we wanted to stick with the idea that referencing a constant within a match pattern is short-hand for pattern expansion – or, to put it another way, is a kind of built-in structural equality check – then this does imply that we can never permit associated constants to appear in match patterns. As I described above, associated constants are simply incompatible with this approach, since they cannot be resolved to a specific value.
We can probably tweak some of the other cons by imposing other restrictions:
- We can fix privacy-busting by checking the expanded patterns for privacy violations, as suggested by @petrochenkov . This means that we would prohibit constant patterns if the constant value (not the type of the constant value) contains any private fields.
- We could prohibit constant patterns if they are defined with a
const fn
. This would fix encapsulation-busting.
You see a theme here: basically the current approach means that there are classes and families of constants we may want to exclude from patterns.
PartialEq comparison
The current MIR translation actually implements a variant on patterns
where matching against a constant compiles down to a call to
PartialEq::eq
. This means that a match statement like:
match foo {
BAR => ...
}
is compiled in roughly the same was as one that uses an if
guard:
match foo {
x if x == BAR => ...
}
Clearly this resolves the abstraction-busting point, since all
equality comparisons go through ==
. However, it also has other implications:
-
Exhaustiveness checking must treat constants opaquely. This is
both good and bad. For example, it resolves the privacy-busting and
encapsulation-busting points, but it may also be annoying in some
cases and require an extra
_
arm. I consider this to be a net improvement however, particularly since it is unusual that one can fully and exhaustively match against constants. Basically it only happens withbool
values. -
Generated code can be less efficient. Because the
==
operator is defined by user code, we have to be more conservative; we can’t extract out the value of a field and switch on that, instead we have to consecutively execute comparisons (after all, we don’t know that thePartialEq
code considers that particular field). We might be able to solve this by detecting whenPartialEq
is defined byderiving
and optimizing those cases, since then we know the semantics. - Associated constants – indeed all constants – work just fine. Because we are not trying to build a pattern, we can wait until trans time to evaluate the constant, and hence it works fine with associated constants and other similar extensions.
-
Some constants will never match. As a simple example, consider
matching against
NaN
(remember thatNaN == NaN
is false). Today, we report this as a compilation error, but that is not possible if we haven’t fully evaluated every constant to a value. So instead it would just never match. The same might be true for brokenPartialEq
impls.
Even here, there are also some variants one might consider. In
particular, @pnkfelix proposed at one point restricting constants to
types that implement Eq
rather than PartialEq
, which would rule
out matching on floating point (probably not a best practice in any
case).
One consideration though is that making this change could cause silent
changes to the semantics of existing programs. In particular, there
may be programs taking active advantage of the abstraction busting
property of today’s approach, and using matches to achieve a
"structural equality" and ==
to achieve “semantic equality”. This is
hard to know. I think if we were to transition to PartialEq
, we
would probably want to consider first implementing the
"intersection" semantics described below, and then expanding to
PartialEq
. This means that programs relying on abstraction-busting
would be forced to rewrite in the interim.
Finally, another consideration is that this change would imply that
pattern matching is not pure. That is, user-defined code can
execute. This does not bother me personally. In fact, it seems rather
inconsistent to try and enforce purity around match expressions when
we have steadily and surely been moving away from purity in the rest
of the language. For years now, Rust has been migrating to a
library-defined, extensible language. We replaced builtin pointer
types with library-defined smart pointers, for example, and we now use
Vec
and String
everywhere rather than types built into the
language. As part of that transition, we have added traits like
Deref
, Index
, and PartialEq
that allow the user to override
virtually every operator. This means that when I see an expression
like *x
, it may well be running arbitrary code. In practice, this is
No Big Deal – no doubt there is the occasional crazy Deref
impl,
but for the most part they act like you expect, and the world keeps
turning. Moreover, I expect sooner or later we will add deref patterns
(sometimes called box
patterns), which would also violate purity.
Intersection: Restrict constants to builtin types
We could prohibit matching on constants unless the type is a built-in
type like i32
or (u32, i32)
or something. This is probably not the
behavior we want long-term, but it has the advance of being a
conservative change that would not change the semantics of any
existing programs (though it may cause compilation errors). In any
case, it makes sense to implement this change and test it with a
crater run (which I have not yet done) to get some sense of how many
programs would potentially be affected by any changes we make here.
(As a somewhat amusing side note, for a long time, this is what I thought that the compiler did. When I discovered otherwise, I filed #20489. But then I wrote a commit with the message “do not try to fix #20489”, and GH promptly closed the issue in response (doh). So then I forgot about it until recently. Doh!)
Conclusion
None as of yet. Obviously I would have preferred to have based the semantics on ==
, as you can probably tell, but I’m not sure what’s the best thing to do now, given that pattern matching on constants in its current form has been stable for some time.