Blog post: never patterns, exhaustive matching, and uninhabited types

Also the latter (pattern ! matching type (u32, !)) shouldn't compile -- at least in my view, ! as a pattern should only match an empty enum or the ! type (potentially applying the usual match ergonomics auto (de)ref), not any type that would otherwise be auto-nevered.

Yes, thanks for the correction. =)

Originally, I had in mind that the "autonever lint" would fire only when autonever generated & patterns. I agree that if we expand the lint to fire when it generates _ patterns as well (so that it fires for (_, !)) -- or perhaps for any use of autonever at all -- then this may be good enough.

I am teetering on the edge of being in favor of the more aggressive def'n of auto-never. What would definitely push me over the edge would allowing autonever to generate (_, !) (or some such pattern) would make some safe code more ergonomic. I can imagine such scenarios in the abstract, for sure, but I don't have one in mind that is as concrete as the as_ref example is.

(I agree, I was going to point out the same thing.)

A few things:

First, I think that as long as we are using explicit ! patterns, there is no problem. As @RalfJung pointed out, the second example -- matching a (T, !) with a ! pattern -- would not compile. This is why -- to me -- the idea of using lints makes a difference -- it points people in that direction.

If we do not have a lint, though, then people might just write match x { } -- and then I would still think there is some potential danger. That is, I think that this refactoring could have introduced a bug that was not there before.

In particular, in your example where we started with this:

let x: Uninit<!> = ...;

... // do some stuff

match x.value { }

and then refactored x to have the type Uninit<(u32, !)>, I would assume that we have also made changes to the ... ("do some stuff") part of the program. And those changes may mean that the match x.value { } match is now reachable when it was not before -- and (partially) initialized.

I would just give some thought about what should we do for the uninhabited types.

Technically, we have a way to define "strong uninhabited" recursively:

  1. Empty enums and ! are uninhabited.
  2. Enums contains only uninhabited variants are uninhabited.
  3. Structs contains at least one uninhabited field are uninhabited.

&!, in this definition, is NOT strongly uninhabited; but as like the example of struct T<'a>(&'a mut T<'a>,&'a mut T<'a>), it is de-facto uninhabited. The difference here is, those types can be created "uninitialized" and have a well behaved size. This is different from the strongly uninhibited types above: they don't have a well defined size according to the "negative infinity" rule.

I need to clarify, that the "size" I mean here is NOT the size we will send to LLVM IR, it is something only in the mental model, and the rust code without unsafe blocks will be legally assume. So in unsafe code blocks strongly uninhabited types can be ZST (not necessory), but in safe code, they will not have a valid size.

To make it clear we can image the following intrinsic function:

// in std::intrinsics
fn safe_size_of<T>() -> Option<usize>;

// in user code
assert!(safe_size_of::<!>().is_none());
assert_eq!(safe_size_of::<&!>(), Some(size_of::<&u32>()));
// theoretically size_of::<&!>() can be 0, because of optimization
// so this may fail:
assert_eq!(safe_size_of::<&!>(), Some(size_of::<&!>()));
//Optimization may decrease the size, but not increase
assert!(safe_size_of::<MyInhabitedType>().unwrap()>=size_of::<MyInhabitedType>()))

In this theory, if we add the ! pattern (which I would vote for), a match { ! } should match all strongly uninhabited types, don't have to write match { !,! } on Result<!,!> or match { (_,!) } on (i32,!). Furthermore, I would expect if all inhabited variants are covered, a single ! at the end of the match pattern will be enough to tell the compiler "no more variants are reachable", for example match { NotReady => .., ! } on AsyncResult<!,!>.

In this point of view, both Uninit<!> and Uninit<(u32,!)> are strongly uninhabited, so we should never write things like let x: Uninit<!> = ... in safe code, because it does not have a good theoretical size. Instead, if we ever need to bind a Uninit value we can write Uninit<&!> instead, which will then require an explicit match { &! } pattern.

But unfortunately, today's stable rust allows partial initialize on strong uninhabited types. I strongly believe we should follow the "lint, deprecate, deny-by-default, disable" path to not allow any access to strong uninhabited type fields. Because, I couldn't see any value of enabling this hazard operation. Is it help writing better code? Seems no. Is it what normally programmers will do? No at least for me. On the other hand, I can see how easy for new users to incidentally use empty enums when they actually wanted an empty struct.

I am also happy to enforce we always using explicit ! patterns.

Now look at niko's example,

There is a quick fix of the above:

enum Void { }

fn main() {
    let mut x: (u32, &Void);
    x.0 = 22;
}

By change the type from (u32,Void) to (u32,&Void), we can do whatever we can already before, without violate the rules I introduced here. Which means, we may be able to write cargo fix scripts like this on existing code, to minimize breakage.

Interesting follow up questions

The above formulation seems to imply a very different view of &! in Rust. By definition ! have zero possible values, so it does not have a valid size. Now &! have 1 possible values: being uninitialized, so its natural size is 0, this is why I add a comment in code above to say that sizeof::<&!>() can be optimized to 0.

But this means &! is isomorphic to ()! This is interesting and lead me to think what things like &&! means. With the same logic this would be isomorphic to a type that have two values: uninitialized, or initialized to an uninitialized uninhabited value, so this is isomorphic to bool!

So we can reduce the type system with only ! and reference types?

2 Likes

This can happen in generic code: using a struct (i32, T), we can do partial initialization. Later, T may turn out to be !.

So your proposal would exclude uninhabited types from being used as generic arguments in many cases. I think this also answers your question about the "value of enabling this": It's a matter of uniformity. We allow it for general types, so we should allow it for uninhabited types in particular.

Note that there is no hazard here, the compiler makes sure that you are not actually creating an element of an uninhabited type.

2 Likes

If this happened, the previously inhabited struct is now uninhabited. So the partial initialization path should now must be in an unreachable path by simply apply the rules of "strongly uninhabited" (the type inference engine concluded some uninhabited values as the input), and so the compiler ignores this path to allow the code that do partial initialization. If this is not the case, of cause the user should get an compile error.

For example, if a piece of code is manupulating a value of type (i32,T), this value must come fome somewhere else. If the calling code passes a (i32,!), the compiler immidiately sees this is strongly uninhabited, so it conclues this call will never happen. So monophization can ignore this function altogether, ensuring partial initialization can never happen in (strongly) uninhabited types.

To prove of this can be done in a sound way, remember the size_of is implementation detail. So imagine we ensure safe_size_of is the real size of types, or if it returns None, usize::MAX is used. Then in the code gen, as the compiler need to calculate local offsets, it will inevidablely run into trouble and results in error when partial initialization on strongly uninhabited types happens. So there must be a way to detect this in compile time. In such a case, generic codes should not be blamed as they supposed to work on all types with suitable sizes. So the problem must be in the code that feeds the generic code with abnormaly sized types.

I really want to see a real world good use case of this. For me, it is really wierd to have ! in structs and do partial initializations. Even if we do want this, &! can play as a workaround, as mentioned.

But the point is that partial initialization on uninhabited types creates challenges right now, and we are talking about it.

fn silly<T, U, FT: Fn() -> T, FU: Fn() -> U>(t: FT, u: FU) -> (T, U) {
    let tu: (T, U);
    tu.0 = t();
    tu.1 = u();
    // tu // error[E0381]: use of possibly uninitialized variable: `tu` - is this accurate?
    (tu.0, tu.1)
}

#[derive(Debug)]
enum Never {}

fn main() {
    println!("{:?}", silly::<u32, u32, _, _>(|| 5, || 10));
    println!("{:?}", silly::<u32, Never, _, _>(|| 5, || panic!()));
}

This is valid code today on nightly edition 2018 with no other feature flags (just NLL). Are you trying to suggest that partial initialization of any generic type should not be allowed? Because any generic type can be uninhabited if it always diverges to obtain a value.

This doesn’t show a use case of partial initialization, but it at least shows that forbidding it would be surprising.

Or is this actually fine because of requiring referring to the individual places in tu individually for some reason? That would also be interesting behavior to have to maintain, because currently my intuition says that it’s a bug that tu isn’t considered initialized yet.

Also, this works fine if we use a concrete inhabited type in the call to silly() instead of a type isomorphic to !, since ! coerces to any type. If that’s valid and using ! isn’t I’d be even more surprised.

My proposal suggest to blame this

println!("{:?}", silly::<u32, Never, _, _>(|| 5, || panic!()));

As it attempt to instance a generic function that do partial initialization. It also suggest a workaround to fix:

println!("{:?}", silly::<u32, &Never, _, _>(|| 5, || panic!()));

This allow to keep the semantic the same.

In another post I suggested to have trait bound T: Into<!> for this purpose. So Into<!> will be implemented for all empty enums, and types like &!.

After reading the responses I come up with an modification on my previous proposal. It is not yet a well formed RFC. I will try to formalize this into a proper pre-RFC tonight.

Definition

A type is Maximum Sized Type (MST) if and only if either of the following:

  1. It is !

  2. It is an non-empty enum and all variants are MST

  3. It is a struct that contains MST fields

  4. It is a union that contains MST fields

Rules

MSTs does not have a valid size. In monorphization they cannot have valid offsets and addresses to its internal, so partial initialization of MSTs are strictly forbidden. This pressure does not apply to the generic functions itself; instead, the rule is that user can only send non-MST generic parameters to functions that will do partial initialization.

MST is a strict subset of uninhabited types. Other uninhabited types, like the empty enum, do have valid size and so can be part of a type with partial initialization allowed.

Pros and Cons

  • As the empty enum is now excluded from MST, partial initilzation like
enum Never{};
let v:(u32,Never);
v.0=1;

is allowed, as in the current stable. It also consistent with the implementation that Never is ZST. So no more incompatibilities. Everything new is the ! type, which is unstable.

  • However, as a trade off the definition of MST have to be worded carefully to exclude the empty enum, which makes it less beautiful. Also the distinguish between ! and empty enum seems arbitrary.

Observations and reasonal

  • The restriction on MST is very similar to the rules we already have with DSTs. We simply cannot do certain things if Sized is not implied. And yet the workaround is similar too: add an indirection & will make it useable again. So now we need to extend the idea to MSTs.

  • The name of MST is from the fact that if a type T can coerce (or just transmute without unconditional UB) to U, the size of T must be greater or equal to U. So in this definition, as ! can coerce to any type, its size must be greater than any types.

  • The rule that empty enum is not MST can be justified as it is not able to coerce empty enum to any other types automatically. And if empty enums are ZST as it is today, transmute to another non-ZST must unconditionally UB. In fact user have to write a match block to do so safely, and this can be seen as an indirection operation, like deferencing a reference.

  • Unions can be think of just another way to do transmute. So if a union can coerce to its MST field types, it must itself be MST.

  • This idea is surprisingly consistent. For example, with the “negative infinity” point of view we cannot use usize::Max without special arithematics. However, if empty enum is not MST, then the rule “enums/unions picks the maximum size, structs add all sizes together (cap at usize::MAX)” fits well with usize::Max model. EDIT: ignore this; I know there is always some inconsistency as the rules are different in enums and in unions.

Unresolved issues

  • We can do the following:
let v:Result<!,!>;
match v { ! }
let v: (i32,!);
match v { ! };
let v: AsyncResult<!,!>;
match v { NotReady => {}, ! };
let v: &!;
match v { &! }

But how do we want to justify the following:

enum Never{};
let v: Never;
match v {!};

Is it just a special rule? This breaks the assumption that ! in match arms means match to MST.

  • Do we need a trait to claim “this type is sized and not MST” so we can write code for explicit partial initialization? How should we extend the DST rules to partial initialization?

  • Are there a better name than MST? after all, the actual size is unimportant; we only need to make sure code that involves MST (after monorphization) never being generated.

This is a fresh take based on just the current nightly 2018 edition behavior, (what I recall of) this blog post, and (what I recall of) the discussion around an access-based model.


I think the important thing to note is that types like (u32, !) are very unlikely to show up in non-generic code. Thus, the most important thing when considering how the concrete type should be when its used in a generic context, which is where it’s likely to be.

We want the ! to effectively make any code after it dead after it exists. We need it to not change anything before it exists. And we want to allow partial initialization to work in generic code, as that is something people are currently doing and we have no reason to break. Furthermore, we want to allow, by methodology of !-pattern elaboration, to allow matchs to ignore !-dead enum variants.

To start with, let us say that a match statement contains an automatically-inserted ! pattern which will catch any “strongly uninhabited” pattern as unreachable. I will define this term along with what it accesses and means for code as we go along.

The trivial base case is the ! type and the empty enum – the types that you can write an empty match over today.

let never = panic!();
match never { }

// elaborated

let never: ! = panic!();
match never {
    !,
}

The obvious generic case is enum Either<Left, Right> { Left(Left), Right(Right) }. We want to be able to use Either<_, !> to delete the Either::Right variant.

let x = either(|| 10, || panic!());
let Either::Left(x) = x;

// elaborated

let x: Either<i32, !> = either::<i32, !>(|| -> i32 { 10 }, || -> ! { panic!() });
let x: i32 = match x {
    Either::Left(x) => x,
    !,
};

From this behavior we derive that a newtype variant around a strongly uninhabited type must also be strongly uninhabited. If a newtype variant is just a 1-tuple variant, then that would also imply that n-tuple variants containing a ! should be strongly uninhabited, but let’s treat the newtype as special currently.

Here the ! pattern destructures the 1-tuple variant into a strongly uninhabited type. As this cannot exist, that branch is dead code.

Nothing so far should be contentious. This is all strongly desired behavior from the ! type.


Now let’s consider the common pattern of *mut ! to fill the same use case as C void*. Obviously, moving forward an extern type should be preferred – an extern type is not considered uninhabited, just unintrospectable by Rust code. However, *mut ! is not problematic – a pointer can be null or dangling and Rust doesn’t care. Rust only cares about the validity of a raw pointer when you dereference it. Dereferencing a *mut ! is unsafe and instant UB, as it produces a strongly uninhabited type.

But what about &!? The trivial types-as-contracts position is that &! should be strongly uninhabited as well – as there cannot be a valid !, there cannot be a valid reference to !. However, the access-based model that we’re leaning towards currently says that the validity of the reference only matters when it is used. (Note that passing it into a safe function definitely counts as a use, as does returning it from any safe function.)

But how much does this matter in practice? In safe code, it should be impossible to get to a reference-to-!, as that would require having a ! in the first place. This author is of the position that &! can be considered strongly uninhabited safely – unsafe code authors should use a pointer type or extern types. We should probably lint against creating uninhabited enums anyway, and suggest using ! or an extern type when possible.

The beneficial result of this classification is that Either<&_, &!> continues to have it’s never side deleted.

let x = either(|| 10, || panic!());
let x = x.as_ref();
let Either::Left(x) = x;

// elaborated

let x: Either<i32, !> = either::<i32, !>(|| -> i32 { 10 }, || -> i32 { panic!() });
let x: Either<&i32, &!> = x.as_ref();
let x: &i32 = match x {
    Either::Left(x) => x,
    !,
};

This can, however be accomplished without deciding that &! is definitely strongly uninhabited – by relying on match binding modes!


At this point I feel like I’ve just restated the blog post, honestly. (I’m a bit too tired to be thinking this hard.) I think !-elaboration is a good way to talk about how the compiler “catches” the unreachable patterns, and that the auto-never rules can describe which patterns are considered strongly uninhabited.

I think it’s desirable that a struct containing a ! field should be considered strongly uninhabited. Consider a routine that provides additional error information:

type ParseResult<T, E> = Result<T, ParseError<E>>;
struct ParseError<E> {
    cause: E,
    location: (u32, u32),
}

I’d still want to be able to use ! to delete the error case. If !-uninhabitedness passes through struct (and struct variant) membership, I’d expect the same from tuple (and tuple variant) membership.

I’ve reasoned myself into a corner and I’m not sure what the best way out is anymore. Considering these uninhabited and allowing !-elaboration to catch and delete these arms feels right. Probably any ! elaboration should raise a lint “in and around” unsafe blocks.

Whatever the solution is, it shouldn’t rely on leaking implementation details of the generic function, however. Whether the function does its internal workings via piecewise initialization or separate stack values put together at the end of the function should not matter for what the generic arguments are allowed to be.

Acquiring the ! is the point of UB where code can start getting optimized out. The question is what are the requirements to observe that the ! exists, and thus invoke UB and optimize out code.

The most important part of a (pre-)RFC, motivation, is missing for me. It's not only unclear to me why we should care to prohibit partial initialization when it involves uninhabited types, it's especially unclear why this proposal doesn't actually do it for all uninhabited types but only for those rooted in !. Put differently: whatever problem this restriction is intended to solve, surely the same problem can be encountered with enum Never {}?

Beyond that, another fundamental issue with this proposal is that it introduces monomorphization-time errors. These are quite problematic, making it hard to reason about and solve compile errors locally, and in many cases including this one being a hazard to semver because refactorings of a library's internals can trigger the error in downstream code that was previously fine. For these and other reasons, the language goes to great lengths to avoid them in other places.

4 Likes

Since pattern matching also applies to let, would that mean that the following would become valid Rust?

fn foo() -> ! {
    panic!()
}

fn main() {
    let ! = foo(); // <-- oh my
}

What are the implications of such a syntactic construct?

That’s saying that any code after the let ! = foo(); statement is guaranteed unreachable (because foo() never returns). Just the same as if you have

fn main() {
    foo();
}

but explicit to a reader of the code without knowing foo's signature.

Another place the never pattern could be useful is in error conversion as I mentioned a while ago, you can be more type safe when declaring potential error branches unreachable:

let foo: Result<String, (!, i32)> = Ok("hello".to_owned());
let bar: Result<String, String> = foo.map_err(|(!, _)| unreachable!());

or if it allows expression-less closures when it appears in the arguments (not sure if the grammar would support it though)

let bar: Result<String, String> = foo.map_err(|(!, _)|);

Interesting! It might be worth looking at why Agda made a similar design decision, There, for nested uninhabited types, you write things like:

   case x of
      Ok(y) -> f(y)
      Err(())

Agda has to do this because of non-trivial computation in type expressions, but it may be that Rust associated types already introduce enough type-level computation to make this necessary.

I had no idea Agda does something like this! I did a brief google search but couldn’t find out how they call this; do you have a pointer to the reference or so?

Agda’s pattern matching is also much more powerful than what you usually have in type theory, so they might want it for reasons somewhat similar to us.

Having finally written that blog post on validity of values, I can now clarify why I think there are multiple notions of “uninhabited”: Some types do not have any valid inhabitants – they are as uninhabited as it gets. Examples are ! or any empty enum, but also (i32, !).

Other types do not have any safe inhabitants, but they might have valid inhabitants. An example of such a type is

struct Recursive<'a>(&'a mut Recursive<'a>);

According to my definition, Recursive(MaybeUninit::<Recursive>::get_mut()) is valid for Recursive but not safe – there is no safe inhabitant.

4 Likes

So this keep me thinking what &&! means.

&! does have a valid inhabitants, but no safe inhabitant.

&&! on the other hand, should have a safe inhabitant that point to a valid but unsafe inhabitant.

Is it correct?

No, &&! has no safe inhabitant either.

Remember, for a value to be safe means that safe code can do arbitrary things with it without causing UB. Safe code can dereference a &&! twice, so that type cannot have a safe value.

&T is only safe if it points to something safe.

2 Likes