Recent change to make exhaustiveness and uninhabited types play nicer together

So the recent change in PR 38069 to make exhaustiveness checking treat uninhabited types specially caused a lot of fallout. Much of it comes from crates that use #[deny(warnings)] and now fail because arms whose patterns used to be considered inhabited are now considered uninhabited (and hence unreachable), but some of it does not. Lint failures are not in and of themselves worrisome, but I am a bit worried that we have changed the semantics of code in unacceptable ways. Here is a list of the issues tallying the regressions that I am aware of:

  • https://github.com/rust-lang/rust/issues/38889 – Subtle breaking change with match patterns on uninhabited types
    • Points out that empty enums were being used intentionally to make “placeholder” enum variants, but these placeholders are now recognized as impossible.
    • There exists, I think, a better pattern that could be used here.
  • https://github.com/rust-lang/rust/issues/38969 – ICE in empty-0.0.4, Rust 1.16, unreachable for-loop pattern
    • A problem because it is a hard error for a for loop pattern to be unreachable. This is fixable and should not be a hard error anyway, in my opinion.
  • https://github.com/rust-lang/rust/issues/38972 – Irrefutable while-let pattern in log4rs-rolling-file-0.2.0, Rust 1.16
    • A problem because it is a hard error for a while let pattern to be irrefutable. This is fixable and should not be a hard error anyway, in my opinion.
  • https://github.com/rust-lang/rust/issues/38977 – Unreachable expression in guard-0.3.1, Rust 1.15
    • A problem only because the code is used in macros that get exported to clients, which may have #[deny(warnings)]. This seems like a failure of cap-lints more than anything else, particularly since this crate could benefit from the feature in question. However, it does seem to be hard for @durka to find a formulation that achieves their goal of guaranteeing divergence and works across all versions of rustc.
  • https://github.com/rust-lang/rust/issues/38975 – Unreachable expression in void-1.0.2, Rust 1.16
    • This doesn’t seem like a problem. It’s a contained lint error that won’t affect clients, and the existence of this crate can be considered a feature request for precisely the changes we have made. =)

Reviewing this list, I think there are two bugs, and we should probably just fix those, but most of the remaining impact seems all right. It’d be nice though to find a way for @durka to express the pattern they are looking for so that it checks without warnings on stable/nightly, I guess?

I do think in retrospect we didn’t spend enough energy evaluating the impact of this change. It’d be good to have phased it in more gently, at minimum by contacting crate owners. I think this is my fault as the reviewer of the PR for not double-checking that we had run crater and so forth.

That said, there is one thing that I am worried about. Some will recall this prior discussion about the best way to think about uninhabited types and mem::uninitialized. In that discussion, interestingly, we came to some conclusions that seem somewhat at odds with the current exhaustiveness checking changes. In particular, we talked about whether it ever made sense to have a value of type &!, and under what conditions that could be considered UB. The general consensus there was that it only became UB if the reference was dereferenced – which implies that unsafe code CAN create values of type &! (say) so long as that reference never escapes to safe code and is never dereferenced by the unsafe code.

However, in nightly today, &! is considered uninhabited (as is &Void), which means that code like this typechecks:

enum Void { }
fn foo(x: Result<i32, &Void>) {
    let Ok(x) = x;
}

This seems wrong to me given the tentative outcome of the unsafe-code-guidelines discussion above, no? It also seems to be a backwards incompatibility hazard for crates that were using *const Void to represent void* pointers, even if they ought not to have been doing that. (But, given the outcome of the UCG discussion, this doesn’t seem wrong, though it’s not what I would do.)
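For contrast, here is a minimal sketch of the by-value case, which does not depend on how references to empty types are treated. The function name `unwrap_infallible` is made up for illustration. The explicit `Err` arm with an empty inner match is a formulation that should type-check whether or not the checker treats `Err` as impossible, although newer compilers may warn that the arm is unreachable.

```rust
/// An empty enum: it has no variants, so no value of it can be constructed.
enum Void {}

/// Unwraps a `Result` whose error type is uninhabited (hypothetical helper).
fn unwrap_infallible(r: Result<i32, Void>) -> i32 {
    match r {
        Ok(v) => v,
        // `e` has type `Void`. An empty match on it needs zero arms and
        // diverges, so this arm type-checks as `i32`.
        Err(e) => match e {},
    }
}
```

The open question in this thread is whether the same reasoning may be applied when the payload is `&Void` rather than `Void`.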

cc @canndrew @arielb1 @eddyb @ubsan @brson

Thanks for putting these issues in perspective @nikomatsakis. It does look to me like there’s more legwork to do here to smooth over the transition. Whatever the outcome, please let’s make it happen before beta branches on the 31st (I think).

@nikomatsakis I think these might be two separate questions which are being conflated. I suspect that &! should be considered potentially inhabited to the optimizer (and, under a tootsie pop model, only in the vicinity of unsafe blocks!), but not to the typechecker.

I understand why &! and &Void are uninhabited, but why does that create problems for *const Void? Aren’t pointers allowed to have arbitrary/impossible values?

In a meeting now, so writing hastily:

But I’m having some second thoughts. My feeling is that this experimentation with ! and inhabitedness is – in a sense! – going great. We’re encountering a lot of interesting questions. But I feel like we haven’t found the final answers yet, and I am wary that we are pushing this process forward a bit too chaotically.

I think we should consider trying to restore the old behavior around empty enums temporarily. The idea would be that true uninhabitedness derives purely from ! for now (which is gated). Then we can get the semantics how we want them for ! and – when we feel ready – enable them for empty enums.

In the meantime, we can start doing warning periods around empty enums for things and patterns we expect to change.

But I am very wary that we will (e.g.) accept some matches now (perhaps some that use Result<(), &Void>) that we would not want to accept later.

I think you are right that there is an interaction with unsafe code here, but I disagree that the type-checker should consider types like &! uninhabited. Perhaps it is the case (depending on what ultimate rules we decide upon) that the type-checker would consider a type like &! uninhabited but only in safe code, however. The interaction of all of this reasoning with the unsafe code guidelines is all the more reason, I think, to try and isolate it to !, so that we can experiment until we are happy, and not consider empty enums to be uninhabited until we know what we want.

So actually *const Void seems to be inarguably inhabited (by null). I should probably have written &Void in my example.
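A minimal sketch of that point, assuming a local empty enum named `Void` and a hypothetical helper `null_void`: safe code can produce a perfectly ordinary value of type `*const Void`, namely the null pointer.

```rust
use std::ptr;

/// An empty enum with no values of its own.
enum Void {}

/// Returns a value of type `*const Void` using only safe code.
fn null_void() -> *const Void {
    // Creating the pointer is safe. Only dereferencing it would need
    // `unsafe`, and there is nothing valid it could point to.
    ptr::null()
}
```

So the pointer type has at least one inhabitant even though the pointee type has none.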

I agree this is prudent. There doesn't seem much reason to rush this particular feature.

After discussing in the compiler-team meeting, we were thinking that the way to do this is to have both the old + new exhaustiveness checking code, and to use the never_type feature gate to decide which one we use.

I don't understand how, even in unsafe code, &! could be inhabited. When converting from *const ! to &'a !, aren't you promising to the compiler that *const ! is a valid pointer to a ! for all of 'a? I don't see how that could be possible if ! is uninhabited.

Well, are you? That question has not yet been conclusively answered. Certainly, if you pass a value of type &! to an unknown external function, you are promising something along these lines. But if the value is only passed from one internal function to another, maybe no promises are made.

What would be the practical use of defining &T as anything other than a pointer to a value of type T? If the pointer may not currently be valid, why not keep it as *const T? It seems to me that there are also a number of possible optimizations that would be broken by relaxing this definition.

@nikomatsakis So, as I just remembered - the thing with the memory model / unsafe code guidelines was that: for any type T, &T may not be assumed to always refer to a valid instance of T [by the optimizer, in the vicinity of unsafe]. &! is just an instantiation that brings it into particularly stark relief, because ! otherwise happens to have no valid instances. But the logic itself is uniform across all types.

Should then the type system also require me to add an extra wildcard _ => {} arm to this match?

fn example(arg: &bool) {
    match arg {
        &true => {},
        &false => {}
    }
}

After all, if &bool cannot be assumed to refer to a valid instance of bool, then that match is not really exhaustive.

And the logic for &! is exactly the same, the only difference is the number of valid instances the type does have, which is a free parameter of the whole question.

(I had been hoping to try to elucidate why it feels absurd to me to require the typechecker to follow the same rules here as the optimizer, but failing that for now, I hope this example at least shows that something is not quite right with that assumption.)

Maybe the more general point is this: the purpose of the memory model / unsafe code guidelines is to figure out when the optimizer can and cannot trust the type system. Taking the answers to that question and then applying them to the type system, rather than the optimizer, seems like it gets things mixed up.

I think that the type-checker and optimizer are certainly connected. If you are in unsafe code, after all, it is presumably possible (and legal) to have an Err(x) value (where x: &!), but in safe code that should never happen (because only unsafe code could construct such a value, and it ought not to have released it to you). Put another way, there are some circumstances in which the optimizer would be able to figure out that it would be UB if the Err were inhabited, and hence it can assume that it is not; in some subset of those circumstances, we might make the typechecker reason in the same fashion. I say subset because presumably the typechecker rules should be conservative and easier to explain.

It's better to just make it impossible to construct a value of type &!. There are no values of type ! so any *const ! or *mut ! must be NULL and so attempting to make a &! is statically detectable to be nonsense. (This is clear if *const T is defined to be isomorphic to Option<&T> and *mut T is defined to be isomorphic to Option<&mut T>.)

My code does do enum Foo {} to define types that are opaque to Rust but not C (that is, C code can create an object of type Foo, return a pointer/reference to a heap-allocated Foo, free such an object given a pointer/reference, and inspect and manipulate its internals, but Rust cannot do anything except pass a pointer/reference to such a thing to C code). However, I think it is much more important for the type system to be logically sound, and I agree enum Foo {} is logically uninhabited. I would be happy to immediately change my code to stop using this pattern and I encourage other people to do the same, in order to help the language team give Rust reasonable semantics.

The only question I have is this: What exactly should wrappers around C libraries replace enum Foo {} with?
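One shape that is commonly suggested for this (I believe the Rustonomicon's FFI chapter recommends something along these lines) is a `#[repr(C)]` struct with only private, zero-sized fields. The sketch below is an assumption-laden illustration: the field names and the helper `is_live_handle` are invented here, and `Foo` stands in for whatever C type is being wrapped.

```rust
use std::marker::{PhantomData, PhantomPinned};

/// Opaque handle for a C type. Rust code never looks inside it and only
/// passes pointers to it back to C.
#[repr(C)]
pub struct Foo {
    // Private zero-sized field: code outside this module cannot construct
    // a `Foo`, but the type is not uninhabited as far as the type checker
    // is concerned.
    _data: [u8; 0],
    // Raw-pointer marker opts out of `Send` and `Sync`; `PhantomPinned`
    // opts out of `Unpin`.
    _marker: PhantomData<(*mut u8, PhantomPinned)>,
}

/// Hypothetical helper: checks a handle returned from C without ever
/// dereferencing it.
pub fn is_live_handle(p: *mut Foo) -> bool {
    !p.is_null()
}
```

Because `Foo` is a struct rather than an empty enum, a `&Foo` or `*mut Foo` is not affected by whatever rules end up applying to uninhabited types.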

I don't think this relationship holds, given that nonsense like 42 as *const T is safe. Pointers can be anything, so I think even *const ! must be allowed to be non-NULL.
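To make that concrete, here is a small sketch (the names `Void` and `bogus_pointer` are just for illustration) showing that the cast needs no `unsafe` block and yields a non-null pointer to an uninhabited type:

```rust
/// An empty enum: no value of this type exists.
enum Void {}

/// Casts an arbitrary integer to a raw pointer in safe code.
fn bogus_pointer() -> *const Void {
    // The cast only produces an address. It makes no claim that a `Void`
    // lives there; dereferencing the result would require `unsafe`.
    42usize as *const Void
}
```

Nothing like this exists for `Option<&T>`: safe code has no way to conjure a `Some(&T)` from an integer.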

It depends on how the language is ultimately defined, especially regarding what happens when as_ref() or as &T or as &mut T is applied to a pointer that doesn't point to an object (of the right type). I would hope that we will be able to define at least a proper subset of Rust that operates only on references that actually refer to objects, i.e. a subset that doesn't contain as_ref() or as &T or as &mut T, or a subset that doesn't contain pointers at all.

Also, a Rust compiler could reasonably reject 42 as *const ! and 42 as *mut ! statically, since it knows there's no way an object of type ! can exist at address 42 by the definition of type !. Similarly, we could statically reject any extern function that returns *const ! or *mut ! since it isn't useful (AFAICT) to have an extern function that always returns a null pointer to an uninhabited type, and it wouldn't make sense for such a function to return anything other than null. More generally, we could say that by definition the only way to construct a *const ! or *mut ! is via core::ptr::null() or core::ptr::null_mut(), respectively.

Not only are arbitrary non-NULL raw pointers valid as @cuviper said (you can even construct them in safe code!), they also don’t carry the same aliasing guarantees as Option<&T> and Option<&mut T> even when they actually refer to valid values! Life is easier if you just accept that raw pointers have no inherent meaning and only have some limited obligations if they are actually dereferenced.

But let’s turn back to the question of &!. As weird as it feels to ever have a value of that type, the various partial proposals for unsafe code guidelines already imply that, in the presence of unsafe code, a reference may not carry all the same implications as a reference normally does. For example, several proposals care only about whether illegal memory accesses (for varying models of legality) actually happen at run time, so that a mutably aliased or dangling or otherwise “illegal” &T may very well exist, as long as it is not used.

The other question is whether it is useful to try and make provisions for unsafe code turning raw pointers to ! into references. The use of empty enums for opaque types would be one reason to do so. While it’s probably unavoidable that some currently existing code will be ruled UB by whatever rules are eventually adopted, this pattern is rather common and so it probably shouldn’t be broken without a measurable benefit (e.g., better optimization capabilities).

Yes, because "normally" the optimizer can trust the results of the typechecker, and optimize based on them.

Right - in unsafe code, the optimizer can't always trust the typechecker, so although it should be able to assume that &! is unreachable, it refrains from doing so.

This feels backwards. The typechecker should always be able to reason that &! is uninhabited - that follows directly from the definition of those types. The optimizer not trusting the conclusions of the typechecker is an unfortunate (if highly prudent!) concession to practical reality. The typechecker not trusting itself seems plain insanity.

Again, I want to emphasize that the real question here is not "can the typechecker conclude that &! is uninhabited?", it's "when can the typechecker conclude that a match on &T is exhaustive?". And right now we say that it's exhaustive just as long as, inside the & pattern, it matches T itself exhaustively. Now, ! is not magical in this respect. It's just a type that happens to have two fewer inhabitants than bool, and one less than (). If a match on &bool needs two (non-wildcard) arms to be exhaustive and &() needs one, then &! should require zero. What is the motivation for treating it inconsistently? I contend there is none.
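The counting argument can be sketched as follows, using an empty enum `Void` as a stand-in for ! and made-up function names. Note one assumption: for the empty case I match on the dereferenced place `*v`, which I believe current compilers accept, because whether a zero-arm match directly on the reference should be accepted is exactly the point under debate.

```rust
/// Stand-in for `!`: a type with zero inhabitants.
enum Void {}

/// `bool` has two inhabitants, so two arms are needed.
fn on_bool(b: &bool) -> u8 {
    match b {
        &true => 1,
        &false => 0,
    }
}

/// `()` has one inhabitant, so one arm is needed.
fn on_unit(u: &()) -> u8 {
    match u {
        &() => 0,
    }
}

/// `Void` has no inhabitants, so zero arms are needed.
fn on_void(v: &Void) -> u8 {
    // The match diverges, so it type-checks at any return type.
    match *v {}
}
```

Safe code can call `on_bool` and `on_unit`, but it has no way to produce an argument for `on_void`.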

Also again, the optimizer not trusting the validity of & values is not in any way specific to &! either - it applies equally to all &T. So that isn't a motivation. If we wanted to reflect this in the type system, we would also have to say that a match on &bool with &true and &false patterns is not exhaustive, which is plainly silly (along with backwards incompatible).

[I feel like we're not getting through to each other here, but don't know why?]
