Blog post: never patterns, exhaustive matching, and uninhabited types

I wrote a blog post:

RFC 1216 introduced ! as the sort of “canonical” uninhabited type in Rust, but actually one can readily make an uninhabited type of your very own just by declared an enum with no variants (e.g., enum Void { }). Since such an enum can never be instantiated, the type cannot have any values. Done.

However, ever since the introduction of !, we’ve wrestled with some of its implications, particularly around exhaustiveness checking – that is, the checks the compiler does to ensure that when you write a match, you have covered every possibility. As we’ll see a bit later, there are some annoying tensions – particularly between the needs of “safe” and “unsafe” code – that are tricky to resolve.

Recently, though, @RalfJung and I were having a chat and we came up with an interesting idea I wanted to write about. This idea offers a possibility for a “third way” that lets us resolve some of these tensions, I believe.

Read the rest.

11 Likes

I like the idea of having ! psuedo-patterns and making them optional in safe code but lint-recommended in unsafe code. That seems like a really good trade-off.

I still don’t get the point about tuples though. With the current exhaustive_patterns feature you can do exactly what’s written in the example code:

unsafe {
    let x: Uninit<(u32, T)> = Uninit { uninit: () };
    x.value.0 = 22; // initialize first part of the tuple
    ...
    match x.value {
        (v, _) => {
            // access only first part of the tuple
        }
    }
    ...
}

It’s just that, if T is known to be !, you can also leave out the (v, _) branch entirely. I don’t understand why this is problematic. If you’re in safe code then you should never have a value of type (u32, !), and if you’re in unsafe code then this will raise a warning.

I’m not sure whether the match statement above is, or should be, undefined behaviour in the case of T = !. But this seems completely orthogonal to whether we auto-never (u32, !) or not. What am I missing?

1 Like

Nice post.

What I enjoyed about it is the idea of auto-never elaboration to a fully elaborated in-language syntax with ! patterns. Hopefully these ! patterns would satisfy the identity pat | ! = pat per @canndrew’s comment here.

However, not applying auto-never elaboration to product types (tuples, etc.) struck me as imposing a surprise tax on users of safe Rust. I personally think this would be a mistake.

What I would instead propose is that we do the full auto-never elaboration for sum types, products, and references (as per RFC 1872) in safe code and simply disallow (at least initially) any auto-never elaboration in unsafe code. An exception will have to be made for 0-deep cases like (match expr: UninhabitedType {}) in unsafe because it is already permitted and stable.

I think this ergonomics tax for users of unsafe code to have to write out the ! patterns themselves is a good priority because the extra clarity is beneficial (you are doing dangerous things, so leaving less things inferred should be helpful). I also think making safe code ergonomic should be our priority, so this works out nicely.

2 Likes

I still firmly believe that we need auto-derived non-implementable Inhabited and Uninhabited traits. (while I dislike the latter, it can be made an alias when and if we’ll get trait aliases, negative bounds and proper detection of uninhabitness for corner cases) Ok, maybe auto-bound will create too much churn to be practical, so it makes sense to drop it.

Having two traits should solve problem with obscure corner cases with recursive uninhabited types, as they will not implement any of those traits. In addition to providing more guarantees in generic unsafe code (with Uninit you still can create uninhabited type values), it will also allow us to write the following code:

impl<T: Inhabited, E: Uninhabited> Result<T, E> {
    /// Safely unwraps Ok variant
    fn always_ok(self) -> T { .. }
}

impl<T: Uninhabited, E: Inhabited> Result<T, E> {
    fn always_err(self) -> E { .. }
}

Which I think is somewhat better and easier to understand than let Ok(value) = result; magic.

This is exactly the question. We want the match expression you wrote to NOT be UB. However, if you leave away the only branch, then it must be UB as there is no code we could otherwise execute.

Basically, your match is equivalent to

{
  let v = x.value.0;
  // access only first part of the tuple
}

and hence does not ever touch a T, so whether there is a valid T stored there does not matter.

Note that focusing entirely on ! is a bit of a red herring. If T was bool, the following would be UB:

    match x.value {
        (v, true) => {
            // access only first part of the tuple
        }
        _ => {}
    }

because you are "touching" an uninitialized and hence invalid bool. However, your match is still fine because it does not care about the value of the bool. The situation is entirely symmetric for !, and pattern that "touches" the uninhabited field is !.

Basically we only want to auto-never if there cannot ever be any valid data anywhere in the thing you are not matching. Matching a (u32, !) can yield valid data, and hence it does not get auto-nevered. It is still uninhbaited though so you can manually apply the never pattern:

match x.value {
  (_, !) // one arm has !, no code needed
}

See Uninhabited types are ZSTs despite partial initialization being allowed in safe code. · Issue #49298 · rust-lang/rust · GitHub for an example of how treating products with one uninhabited field as entirely uninhabited has caused trouble. That was a compiler bug, of course, but partial initialization is something we want unsafe code to be able to do.


This is not entirely simple -- how do you define "unsafe code"? Checking if the match itself is in an unsafe block is not enough, people often use very localized unsafe blocks. So you have to at least check the entire function. But maybe better check the entire module?

I also think it would be rather surprising to have unsafe blocks affect code generation at all.

Anyway, a nice aspect of this proposal is that we can start with conservative rules for auto-nevering and expand them later to auto-never in more cases, if this is really such a common case.

5 Likes

I think I maybe-sorta get what problem we are actually trying to address here now (w.r.t. partially initialized product types), which is new. I don’t really mind the proposed solution - at least it’d work in the way I would “naively” expect in most cases - though I don’t feel entirely convinced that the added complexity is justified…

Couldn’t/shouldn’t this, rather than being tied to uninhabited types specifically, be a general feature of pattern matching which lets you assert “and there are no more cases” any time you have previously covered all the possible ones? E.g. match my_bool { true => thing, false => other_thing, ! } would also be legal. Which feels really redundant, but not more so than when the number of other possible cases was zero, so. (Is this the same thing @canndrew was proposing in the comment @Centril linked?)

I don't see how allowing

match x { }

makes

match x {
    (v, _) => ...,
}

undefined behaviour. What's the connection here?

Yeah but... why? You can still write (v, _) => ... if you want, and we can allow this to be valid. But why shouldn't I be able to leave that branch off entirely in safe code (and in doing so, "touch" the uninhabited member of the tuple)? It would still raise a warning in unsafe code where that tuple member might be uninitialized.

Sorry if I'm being thick :confused:

It wasn't, but that's a good observation.

My mental model is, (T,!) or any structs contains uninhabited types should be uninhabited unconditionaly at least in safe code. Unsafe code can use it, but only non-UB when access only the initialized parts.

Originaly, type theory says that uninhabited types have negative infinite size, which effectively makes every struct types contains uninhabitied types to have the same negative infinite size. Although in fact ! was implemented as a zero size, I think we can still special case it such that it behave like this at least in safe code. I was further suggest that the alignment shall follow some simular rules so such structs will never have valid memory address.

So this means, writing x.value.0 itself require unsafe code block if x.value is uninhabited, because x.value does not even have a valid memory address.

(FWIW, AFAIA it doesn't say anything about this - negative infinity is just a choice which seems to make the logic work out)

1 Like

This is fully in accordance with everything I said -- so if you brought this up to disagree with me, please clarify where you see a disagreement :slight_smile: .

The problem is defining "access". We are left with a situation where the following is NOT an access:

let v = MaybeUninit::<!>::uninitialized();
match v.get_mut() {
  x => ...
}

This is just a let-binding. On the other hand, the following IS an access:

let v = MaybeUninit::<!>::uninitialized();
match v.get_mut() {
}

That seems rather strange, to make something access more by writing strictly less. I think by saying that the latter is actually sugar for

let v = MaybeUninit::<!>::uninitialized();
match v.get_mut() {
  &!
}

we have a much better way of talking about this, and something we can point at that does the access.

Never-patterns do not show up in type theory because pattern matching in type theory can do much less: It can do case distinction on an enum, done. No let-binding like what I showed above, no _ pattern to opt-out of accessing certain data, no & or ref mut patterns -- that's all Rust specific.

Originally, type theory does not have a concept of the "size" of a type... that is an implementation detail. But anyway, that discussion has little to do with this topic. :wink:

I assume you are saying we can make (u32, !) a ZST? We cannot. In fact we did and it was wrong. I linked to the issue above, here it is again: Uninhabited types are ZSTs despite partial initialization being allowed in safe code. · Issue #49298 · rust-lang/rust · GitHub.

1 Like

My point was this. I wanted it to be that if you write no arms at all, then the only valid pattern you could write would be _. But in this case:

unsafe {
    let x: Uninit<(u32, T)> = Uninit { uninit: () };
    x.value.0 = 22; // initialize first part of the tuple
    ...
    match x.value { } // look ma, no arms
}

you could write (_, !). However, I agree we could potentially finesse the situation via a lint in the same I proposed for &! -- originally, I thought that the lint would only apply to if we inserted a & pattern as part of the auto-never transformation.

I just want to point out that "unsafe code" is not a well-defined concept -- it's rather a sort of fuzzy concept. This is (in part) why I wanted it to be a lint.

Unsafe blocks are a precise concept. But unsafe code is larger than the blocks. The precise boundary though can be tricky to get your finger on.

1 Like

No I am not saying that. In fact, what I said was: keep what we have already doing today, but make sure in at least safe code, type check it as if (u32,!) is uninhabited and therefore do not mark anything like x.0 as safe if x is uninhabited even it can be partially initialized in unsafe code.

In this model, match { (v,_) => ... } on uninhabited tuple is also unsafe. What ever we are going to do in unsafe code is not my interest; I only interest in safe code.

If I recall, this was a point of disagreement for us before: I still prefer that, to the extent possible, "leaving off a branch" should not be considered a "significant" act by the user. Put another way, I expect the compiler to tell me when I'm missing branches -- but I can choose between

match x { (y, !) => .. }

which successfully matches and extracts data and

match x { }

which is UB to execute, then it seems like I am making a decision here. That means if I had a Uninit<!> type before but I refactor to Uninit<(T, !)> I may introduce bugs because there are cases that before were unreachable but now no longer are?

It sounds like a long shot, I agree, but it still makes me uncomfortable. I wonder if there are other cases where it could occur.

(For that matter, the &! case is tricky, because there auto-never is significant, but I think there is a difference in that there is no useful data I could extract out. The only pattern I could write, I believe, is _ -- since even a &_ pattern would assert that the pointer is valid, and it cannot be.)

In any case, as @RalfJung said, it's also true that we can start with a conservative auto-never and expand.

The entire point of the ! pattern as proposed is to say explicitly when we are "loading the discriminant of an empty type".

In a match like match y { &(ref x) => ... }, it is clear that we are not loading the discriminant of anything, we are just dereferencing a pointer and taking a reference again -- this is a NOP and should not be UB even if y: &!. In a match like match y { true => ..., false => ... } it is clear that we are matching on the discriminant of a boolean, so this better be UB if the discriminant is not valid for bool. With empty types, I think it is extremely useful to have something to say explicitly "and now load its discriminant", to have something to point to in the code that does this. Otherwise we are in a situation where it is really hard to explain under which circumstances which discriminant is accessed.

In fact, this has lead to bugs already: Matching on uninhabited unsafe places (union fields, raw pointer dereferences, etc.) allowed in safe code. · Issue #47412 · rust-lang/rust · GitHub was caused by the problem that there is no code that actually performs the "access" to an uninhabited type. We now have a crude hack to work around this. Had we used ! patterns, I think this would not have happened: ! patterns desugar to code that calls the MIR intrinsic for loading a discriminant of the empty type, which would have triggered the unsafety checker to actually see an access happening here.

Later phases of the pipeline can then translate "load discriminant of empty enum" to "unreachable".

Sorry, my second "it" referred to the transformed code. The first is UB, the second is not. I think that is really surprising, because just looking at the code the first does strictly less than the second. The magic is in what we didn't write, and being able to talk about it ("there is a (_, !) pattern automatically inserted") makes it easier to teach and easier to implement (see the bug I quoted above).


All of this is entirely orthogonal to the exact rules for auto-never. I am a bit confused in the discussion here about who is generally debating the usefulness of the ! pattern, and who just wants to see different rules for auto-never. The purpose of the post (as I see it) is mostly to establish that we should have a ! pattern and some rules. This gives us a clear road ahead in terms of defining when which data must be valid, it provides us with better terminology for discussion and I hope we can also adapt code generation the way I proposed above, to get rid of the horrible hack that we currently need to work around our lack of a ! pattern.

As usual, we'll probably want auto-never to be conservative at first, and then slowly expand its applicability where that seems useful. But first we need consensus that we even want the ! pattern and some auto-never mechanism.

I'd like to drill into this a bit more. I suppose I could imagine some generic adapters that convert from Result<T, E> to Result<T, (E, F)> or something -- in which case, if E is !, then you lose some ergonomics (you must now write (!, _) when matching).

I wonder if we could thread the needle a bit in another way as well -- e.g., if the value in question does not lie in a union field, then perhaps it must be initialized? This seems like something we could potentially know, but I'm not sure.

It's possible to define uninhabited non-ZST types, e.g.:

struct Endless<'a>(&'a mut Endless, &'a mut Endless);

Granted they will not generate trap instructions today, but it can be viewed as compiler just not being smart enough yet.

Regarding (u32, !) I agree that it should be unconditionally considered uninhabited type. Otherwise struct Foo { a: u32, b: ! } will be inhabited type as well, which will make me very uncomfortable.

Good point, we should focus. =)

But this is legal safe code on nightly:

let mut x : (u32, !);
x.0 = 4;

So, what you are saying already cannot work.

Do you mean unsafe, or undefined behavior...? Changing the rules for what is unsafe (i.e., for what requires an unsafe block) is not part of the discussion, and I do not see how it would help at all.
However, what is part of the discussion is what happens when you perform matches around unsafe code, where you might have fabricated a &(u32, !) or so.

Well then maybe you are at the wrong place here -- this discussion cannot be held without taking the considerations of unsafe code into account. It is an important part of Rust.

I disagree, this type is NOT uninhabited in the sense that "constructing an element of this type is instantaneous UB". Notice that &! is not uninhabited either.

Of course, these are just my personal views, just as your statement is your personal view -- the rules for validity of references are not settled yet. It also depends on your notion of "uninhabited".