Pre-RFC: lint empty arms to stabilize `exhaustive_patterns`

I'd quite like to move forward on the stabilization of the exhaustive_patterns feature, which allows omitting match arms for empty types, e.g.:

let x: Result<u8, !> = Ok(5);
match x {
  Ok(y) => y,
  // no need to add a `Err` pattern, as `!` is uninhabited.
}

Summary

I propose a lint that warns when we omit an empty arm. Unsafe users can turn it on to avoid footguns, safe users can leave it off.

Motivation

The main blocker for exhaustive_patterns is the interaction of omitted match arms with unsafe code. This was raised, along with a proposed solution, by @nikomatsakis here.

Here's how I understand it: an empty type is a type that has no valid values. Around unsafe code, we can have invalid or uninitialized values, so we must be careful about which code assumes/implies validity.

First example (playground):

#[derive(Copy, Clone)]
enum Void {}
union Uninit<T: Copy> {
    value: T,
    uninit: (),
}
unsafe {
    let mut x: Uninit<(u32, Void)> = Uninit { uninit: () };
    x.value.0 = 22; // initialize first part of the tuple

    match x.value {
        (v, _) => println!("{v}"),
    }
}

Here we get a perfectly reachable arm of type (u32, Void), because the tuple was partially initialized. Omitting the arm could also be ok, and would assert validity of the second element of the tuple. In safe code however this arm is truly unreachable.

The second example (taken from @RalfJung here) is x: &!. Here match x {} implicitly dereferences x, checks the discriminant, and declares UB because there can be no valid discriminant. Compare with match x { y => ... } which is essentially just a let-binding. Here the absence of an arm triggers the dereference and validity assertion. This is a footgun.

The conclusion is that something must be done to treat empty patterns specially around unsafe code.

Proposal

The original proposal proposed special patterns called "never patterns" to denote unreachable arms that assert/require validity of some empty type. No progress happened on this since however. My proposal is more humble.

I propose a new allow-by-default omitted_empty_arm (bikesheddable name) lint, gated under exhaustive_patterns. It warns the user if they omitted an empty arm. For example:

let x: Result<T, !> = ...;
#[allow(omitted_empty_arm)]
match x {
  Ok(y) => ...,
  // all good
}
#[deny(omitted_empty_arm)]
match x {
  Ok(y) => ...,
  // ERROR: add a "Err(_)" arm
}

In other words, whenever a match would be accepted under current exhaustive_patterns but rejected without the feature, instead of emitting a "match is not exhaustive" error, we emit a "match should mention empty patterns explicitly" warning.

The intention is that users of unsafe would turn it on to avoid accidental soundness mishaps, and users of safe code would leave it off. If a user of unsafe does want to assert validity they can turn the lint off locally, which acts as a reminder that something fishy is going on. This is in the vein of @nikomatsakis's "let users choose what kind of help they want from the compiler".

This isn't as elegant as the full never patterns proposal, but has the advantage that I can implement it myself in a few days if this gets positive feedback. This is also forwards-compatible with never patterns.

Drawbacks

Having the lint allow-by-default risks that unsafe users forget to turn it on and shoot themself in the foot. This could be solved with profiles.

Alternatives

Instead of an optional lint, we could set it deny-by-default. This has the same behavior (modulo different diagnostics) as leaving exhaustive_patterns off. This could allow to stabilize the feature without changing which match statements are allowed. Warn-by-default is an interesting middle-ground too.

I don't consider never patterns as an alternative since this is forwards-compatible with it. I know of no other alternative, apart from never stabilizing exhaustive_patterns.

Open questions

  • Should the lint be allow-by-default (and unsafe users must remember to turn it on), or warn/deny-by-default (and safe users will turn it off if they don't do unsafe)?
  • When the lint is off, should we warn that empty arms are unreachable?
  • Instead of/in addition to setting the lint level by hand, should we turn it on automatically within unsafe blocks? If so, how do we still letusers control it?

EDIT: clarified my intentions around allow-by-default

1 Like

There's no UB here, at least once this finally lands. x.value constructs a place and performs projection to the field, which is fine because it is in-bounds. Binding that to (v, _) loads x.value.0 but that is initialized. x.value.1 is never loaded so we never construct an invalid value.

This example also doesn't even rely on exhaustive_patterns.

References are the only case I currently know where the absence of an arm actually causes UB. So we could also start by stabilizing a version of exhaustive_patterns that does not recurse through references (nor through unions).

4 Likes

Indeed! What I mean is that under exhaustive_patterns the arm is currently warned as unreachable, and omitting the arm is considered exhaustive. That's surprising since the arm can be actually reachable. The lint would let the user make an informed choice.

EDIT: I clarified the OP a bit to make this clearer

EDIT: oh I thought it caused UB to omit that arm, amended again

EDIT: MIRI doesn't like it when I omit the arm (playground), are you sure it isn't UB?

Another non-reference case where removing the arm looks UB to MIRI, adapted from here (playground):

#[derive(Copy, Clone)]
enum Void {}
union Uninit<T: Copy> {
    value: T,
    uninit: (),
}
unsafe {
    let x: Uninit<Void> = Uninit { uninit: () };
    match x.value {
        _ => println!("hi from the void!"),
    }
}

Naïve thought: All of the examples here contain unions. Is there some type-based rule that could distinguish them from the safe case, so that code not touching any unions can omit the arm?

1 Like

It is? IMO that's a serious bug.

According to Miri the code is reachable without UB. So if we are saying it's unreachable then that is just wrong. In fact we should guarantee that the print is reached! Everything else would be a codegen bug.

When I run this in Miri on the playground, it doesn't report UB?

EDIT: Oh, you said "hen removing the arm", sorry.

Ok. Further question: if that match was not within an unsafe block, would the arm be guaranteed to be truly unreachable? Because of some validity constraint in safe code or something? If not then I will never warn around unreachability here.

Further further question: what about:

unsafe {
  let mut x: Uninit<Result<u32, Void>> = Uninit { uninit: () };
  // do some magic
  match x {
    Ok(_) => println!("ok"),
    Err(_) => println!("err"),
  }
}

Can Err be reachable here ever?

Indeed, the code I linked is ok under the "match with a wildcard doesn't touch the value" rule. What I mean is that it becomes UB when you remove the arm.

unsafe blocks make no difference for whether some code has UB or not.

But if the match was not in an unsafe block then it couldn't project into union fields or load from raw pointers. That's why I think it is those operations that should be considered here.

I assume you mean match x.value { ...?

No. This match has to read the discriminant of x.value, and reading the discriminant will never return the discriminant of an uninhabited variant. Therefore the Err variant is truly unreachable.

I don't quite understand your answer. Allow me to rephrase: I propose that, under the exhaustive_patterns feature, given a match on a value of type (u32, Void) whose match keyword is not within an unsafe block, rustc would warn that its arms (if any) are all unreachable. Is this warning good?

Indeed oops

Great, thank you

The discussion about unsoundness of how we currently handle empty matches has moved to here: Incorrect "unreachable match arm" warning · Issue #117119 · rust-lang/rust · GitHub

Note that match works on places, not values. That's part of the confusion here. You can match on an uninitialized place, and if you only use the _ pattern then that's still completely fine since no invalid value is ever constructed.

I'm not sure the presence or absence of unsafe should be the deciding point, since unsafe doesn't change program behavior. I definitely think we shouldn't even use exhaustive_patterns if the place is inside a union or a raw pointer -- so for those cases we don't need a lint, we get a hard error.

Could you give some examples on which you want the lint to fire?

Oh! That does clarify things a lot, ty.

This might be naive, but: can I rely on any &T found outside an unsafe block to be pointing to a valid value? IOW, is it UB to let a non-safe value escape into safe code? This is the sense in which I thought unsafe could be the deciding point, particularly regarding &!.

You mean the lint I propose in the OP? My initial idea was: whatever difference there is between enabling exhaustive_patterns and not enabling it, turn that difference into a lint.

Having discussed things more with you, I think I'd now sum up my lint as: instead of a using a never pattern to indicate that an unreachable arm is accessing things, use #[allow(omitted_empty_arm)] to signify that.

So I'd want it to fire anywhere we'd want the compiler to require a never pattern, which seems to be: if the scrutinee is within a union or a ptr deref and we omitted an arm for an uninhabited type. As a stretch goal it would be nice to also allow omitting arms of type &!, and the lint would fire if we're not sure the reference is safe for &!, e.g. because we're in an unsafe block.

I'm hoping it can get us to experiment and iron things out without needing to fully RFC never patterns yet. Get things moving a bit. Particularly since I can implement such a lint on my own.

Yeah that's the eventual goal. We can only make it a hard error if we have never patterns to resolve the error however. Here's I'm using #[allow(..)] as a substitute.

It may make more sense to RFC never patterns for real at this point honestly, what do you think?

The distinction is generally “known code” vs “unknown code”. You can allow both safe or unsafe code to observe a value that is violating its safety invariants when you know that code does not rely on those invariants. Once you allow the value to be used by arbitrary unknown code then you no longer know whether it will rely on those safety invariants to avoid UB.

I may sound like a broken record, but unsafe blocks do not alter semantics. You can never change whether a program has UB by adding or removing or changing the size of an unsafe block. I am running out of different ways to say this. :wink:

Under the semantics implemented by Miri (which for this case, are different from what we document, to avoid making a commitment): having an &bool is not UB, but loading a bool from it. This is the validity invariant. The safety invariant says that &bool points to a valid bool, so passing a bad &bool to outside safe code is unsound -- but that's different from it being UB.

If you are confused by the distinction of safety invariant and validity invariant, here's a blog post on the topic:

That would mean accepting the code without the match arm though, which I'd rather not do. Until we have stronger motivation, I'd prefer if we stabilize only a minimal part of exhaustive_patterns: the one where a _ arm at the end is truly unreachable if the safety invariant of the base of the place expression is satisfied. For raw pointers and unions we don't know their safety invariant so we shouldn't assume anything about it.

2 Likes

Oh I had not understood that, that clarifies so many things. And safety only needs to be enforced at the abstraction boundary if I follow correctly.

This all answers my question about &!: I can't reliably tell if a &! is empty in exhaustiveness checking. I'll leave that for future work then.

Phew, thank you both for taking the time to explain, I'm slowly making sense of this. It was helpful to re-read your blog post.

Wait, don't you mean the validity invariant? The way I understand things, if we assume safety we would allow match x {} on a x: &!.

I'm with you on stabilizing a minimal part though. At the very least it would allow let Ok(x) = ... for Result<T, !> which is what we want most.

1 Like

Hm, yeah. But if we say validity then we maybe shouldn't allow match *x {} for x: &!; the validity invariant for that case is still being debated.

:+1:

1 Like

+1

This would be possible with feature partial Sum Types

I am stopping this in favor of doing never patterns properly: Tracking Issue for Never Patterns · Issue #118155 · rust-lang/rust · GitHub

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.