Recent change to make exhaustiveness and uninhabited types play nicer together

Unreachable patterns should be a warning, and this change can go through because of that.

In fact, IIUC the exhaustiveness checker and/or the optimizer do assume that given #[repr(c)] enum E { A, B, C }, a value of type E may only have the values A, B, or C, which is known to be invalid as C's enum types are allowed to take values that aren't labeled, and #[repr(c)] enum will typically be auto-generated from C enums. See Match on repr(C) enum returned from C library with unknown value leads to UB ¡ Issue #36927 ¡ rust-lang/rust ¡ GitHub. Given that that is currently done, it seems totally reasonable for the exhaustiveness checker and optimizer to assume that !, a type that was specifically created to be uninhabited, doesn't have any values, since there's no reason to use ! in a program otherwise.

Regarding the semantics of pointers that don't point to objects, and references formed from such pointers and/or what happens when such things are deferenced, let's just recognize that it's all unspecified. Until it is specified and two independent implementations of the language agree, IMO it's prudent to try to avoid allowing programs to make references to uninhabited types, or at least avoid attributing any meaning to such programs.

this pattern is rather common and so it probably shouldn't be broken without a measurable benefit (e.g., better optimization capabilities).

Making the formal semantics of the language easier/possible to specify is a measurable benefit, and that's my goal with arguing that we should statically reject nonsense when possible.

2 Likes

While I understand the practical desire to support existing patterns, I agree with @briansmith that this isn’t properly defined semantics. After all, these C types are not uninhabited, merely external and opaque for rustc to reason about. It is semantically incorrect to denote them with uninhabited types in rust. Instead we should have a way to say to the compiler that the type is defined elsewhere. Something like an extern attribute.

2 Likes

It is true that this is currently unspecified. So are almost all finer details of how unsafe code should behave. Nailing down these details — specifying what is UB and what is fine — is precisely the reason for having the unsafe code guidelines effort. So saying “it’s not specified so let’s be conservative” is kind of a cop-out: Yes, it’s unspecified, that’s the problem we’re trying to solve here.

I could understand saying “we can always relax this again later so let’s be strict for now” though (despite not fully agreeing).

So, as the perpertrator of this, sorry for taking a while to respond. Luckily though, if I put off arguing for sanity long enough @glaebhoerl will usually step in and do it for me. So I'm not sure I have anything more to add, but I'll try..

First though,

I do think in retrospect we didn't spend enough energy evaluating the impact of this change. It'd be good to have phased it in more gently, at minimum by contacting crate owners. I think this is my fault as the reviewer of the PR for not double-checking that we had run crater and so forth.

This is also my fault for complaining and rushing you and not reminding you that we probably need to do a crater run first. So sorry for that.

I agree this is prudent. There doesn't seem much reason to rush this particular feature.

We should try and ease these changes in rather than throwing them at people, sure, but keep in mind that every month we drag our feet before making a breaking change to the language more code gets written that will eventually break. The bug for this had been open since Feb 2014. So I think another lesson to learn from this is to pay more attention to language inconsistencies and put a high priority on fixing them sooner rather than later. Cool new features and fulfilling the community's wishlist is great and all but we shouldn't leave backwards-incompatibility landmines sitting around for three years.

interestingly, we came to some conclusions that seem somewhat at odds with the current exhaustiveness checking changes.

I wasn't under the impression that we ever really reached a conclusion there. Or at least it didn't feel that way to me. But it was stupid of me to implement this the only way that made sense to me rather than try to finish the discussion. So I'm sorry for that aswell.

Speaking of making sense though...

Why would we ever want to say that &Void or &! are not uninhabited. I'm still not getting it. The whole point of reference types (as opposed to raw pointers) is that they always point to valid data. The fact that &T and T have the same cardinality is so fundamental to Rust that it's baked into the syntax. This is why we can write things like match bool_ref { &true => ..., &false => ... } without having an extra wildcard pattern. match void_ref {} is exactly the same thing. I get that the optimizer may want to treat &T as *const T but the typechecker doesn't, even in unsafe code.

The argument in favor of making &! inhabited seems to be that, at least in unsafe code, we don't want to assume that references really give the guarantees they say do or else we might break things by over-optimizing. But there's an important difference here between &bool and &!: Given a &bool in an unsafe block we can't assume that it's valid. But given an &! in an unsafe block we can assume that it's invalid. This kind of assumption is what allows us to say that a function which returns ! is diverging and a patttern match on ! is unfullfilable, rather than saying that we don't know "which" ! we have and so we shouldn't assume that code with a ! in scope is dead code. The same exact same logic applies to &!, Void and &Void - they're statically impossible, ergo their patterns are unfulfillable. In the case of a &bool in an unsafe context we don't really know what black magic the programmer has done around it so we need to be conservative to avoid breaking things. In the case of &! we know for a fact that the code is already dead or broken. Is there a subtly to this I'm not seeing?

I think we should consider trying to restore the old behavior around empty enums temporarily

The thing is - other than fixing those bugs - I can't see how we can roll this out any more painlessly. If we stick it behind a feature gate then, eventually, the feature will gate open and people will get the same warnings they're getting now. Is there any reason for rolling back or gating the PR other than some confusion around whether &Void is uninhabited?

11 Likes
  1. Document what people should replace enum $name {} with.
  2. Make a lint recommending against empty enums, set to "allow" by default.
  3. Encourage and/or assist people in getting their code to stop triggering the lint.
  4. Make the lint "warn" by default.
  5. Make the lint "deny" by default.
  6. Carry on with this good change.

Note that #1 may require an addition to the language to add support for defining opaque (struct) types that can't be instantiated in Rust but which may be instantiated in non-Rust code.

@briansmith Does pub struct $name(()); (that is, with a single private field preventing it from being instantiated outside the mod) not work for your purposes?

Consider Rust Playground

mod foo { #[repr(C)] pub struct Foo(()); }
use foo::Foo;

extern {
    fn foo_new() -> *mut Foo;
    fn foo_delete(x: *mut Foo) -> *mut Foo;
}

#[test]
fn test_foo_new_delete() {
   let foo = unsafe { foo_new() };
   unsafe { foo_delete(foo); }
}

For that, I get "warning: found non-foreign-function-safe member in struct marked #[repr(C)]: found Rust tuple type in foreign module; consider using a struct instead, #[warn(improper_ctypes)] on by default"

Now consider my workaround for that, Rust Playground

mod foo { #[repr(C)] pub struct Foo(u8); }
use foo::Foo;

extern {
    fn foo_new() -> *mut Foo;
    fn foo_delete(x: *mut Foo) -> *mut Foo;
}

#[test]
fn test_foo_new_delete() {
   let foo = unsafe { foo_new() };
   unsafe { foo_delete(foo); }
}

#[test]
fn test_foo_move() {
   let foo = unsafe { foo_new() };
   let foo = unsafe { &*foo };
   // The following isn't safe since the representation of `Foo` isn't really
   // a `u8` as specified above, tet the compiler allows it. It's not a fault
   // of the compiler; rather, we shouldn't have lied to the compiler about the
   // representation of `Foo`.
   let _ = *foo;
}

In particular, a value of an opaque type must not be moved since the Rust specification of the type is a lie, but such workarounds allow such values to be moved.

Thus I don't know of any workaround for the lack of opaque types that doesn't have some serious deficiency.

Document what people should replace enum $name {} with.

You mean for opaque types that live behind a pointer? This can work:

struct Opaque {
    _priv: !,
}

It's uninhabited, but it's uninhabitedness is private. You can use ! in C APIs aswell (since a function with a ! argument can't be called).

struct T { _t: &T }

It’s also uninhabited.

I found these threads on the issue tracker which seem potentially related:

https://github.com/rust-lang/rust/issues/27303

FWIW, I think both *const () and *const ! - and their user-defined equivalents - are equally valid as types for foreign pointers which should not be dereferenced in Rust, for opposite reasons. A pointer to an uninhabited type should never be dereferenced; a pointer to a unit type can be dereferenced any time, but it is a no-op to do so (it accesses 0 bits of memory). That's in theory, the improper_ctypes lint may modify the equation (though, tbh, it really shouldn't).

That's as long as it's not dereferenced and therefore also never converted to an & reference. If doing that is a requirement, then only the unit type approach seems sound. If improper_ctypes complains about that, then IMHO that should be fixed!

(Using a unit type also has the advantage that it won't blow your foot off if you "accidentally" dereference it - it's just a no-op.)

struct T { _t: &T }

It's also uninhabited.

That depends... does an (unsafely constructed) T that refers to itself count as a valid value? In the implementation I was conservative about this and said "yes"; that's what this line does.

But I just wrote some demo code to show that T isn't treated as uninhabited by the new matching code and instead it ICEd the compiler. So thanks for bringing this up.

That depends... does an (unsafely constructed) T that refers to itself count as a valid value?

It would interact very badly with Rust's aliasing semantics, since you're trying to take a shared reference and mutate the same object at once; I'm not sure if there's a way to do it without triggering UB.

You can construct it without unsafe code on nightly:

#![feature(static_recursion)]
struct T<'a> { _t: &'a T<'a> }
static ST: T<'static> = T { _t: &ST };

Also, while this doesn’t work…

let mut x: T;
x._t = &x;

…it might in the future. I think the borrow checker ought to accept the second line someday (connected to “borrowing for the future”), and while it’s not currently possible to use an originally uninitialized struct after filling in all the fields, that should be fixed.

1 Like

How can this possibly work? That object references itself, which means it cannot be moved without invalidating the reference.

The object is borrowed, obviously it can’t be moved until the borrow ends.

There's a PR here to fix those bugs and add some tests (@arielb1, @nikomatsakis).

So... what now? I know people want to hide the new match behaviour behind a feature-gate but given that all the new code does is raise extra warnings it seems like that would just be delaying the inevitable. Eventually the gate will be opened and people will be getting the same warnings they're getting now.

If the only reason to feature gate is that we don't want to commit to saying that &Void and &! are treated as uninhabitable by matches then I'd like to restate my position that: yes, we definitely do want to commit to that.

Edit:

then I'd like to restate my position that ...

By which I mean I'll implement it if y'all insist but I'd like yous to reconsider one more time first.

2 Likes

I still think we should hide the behavior around empty enums behind a gate. I've not had time to catch up on this thread (3-day weekend here, and i'm traveling now), but I continue to feel that the current semantics around inhabitedness are unclear, particularly in the face of unsafe code. The problem with considering &EmptyEnum to be uninhabited is that we are potentially forced to accept that interpretation forever. This is not true for !, which remains gated.

I am not saying that the type-checker should not trust itself. But imagine that there may exist values of type &Void in unsafe code. One could be created via uninitialized memory, for example. In that case, it does not trigger UB if you merely shuttle such a value around, but not to dereference it. Now imagine that there is some (unsafe) module that returns Ok(x) where x: &'static Void. Now some other code matches on that value, but it has no arm for Ok -- this will be considered a valid match, even though it is not exhaustive. So what happens at runtime when this arm is taken?

I guess it comes back to a deeper understanding of the scenarios where such theoretically uninhabited types may arise. While it seems silly to have types like &!, such types can indeed arise in generic code, which is what led to the previous thread.

In any case, I'm amenable to a conclusion that says that we should consider &Void uninhabited. I think that may be the correct decision. But I don't want to have this debate with a ticking clock in the background. I want to discuss it freely without the fear of stabilizing code that is permitted to treat &Void as uninhabited.

The "main" rule about inhabited types is that values must always contain valid values, which for empty enums etc. translates to valid values not existing.

The reason for that is to allow you to pass an &mut T reference pointing to uninitialized memory to an (unsafe) function without fear of "random" UB.

I think that having every reference cause UB if there is any memory "reachable" that is not of the right lifetime would confuse unsafe code authors.

Of course, we can keep the old rules about UB, but have let assert that enums are always come from the right variant - the "new" possibility of UB is not that bad, because let can already UB when dereferencing things.

Imagine there is some (unsafe) module that returns Ok(42) where 42: bool. Now some other code matches on that value but it has no arm for Ok(42) -- this will be considered a valid match, even though it is not exhaustive. So what happens at runtime when this arm is taken?

We don't want UB so clearly we need to change rust to make this code invalid:

match my_bool {
    true => ... ,
    false => ... ,
}

We need another branch to handle the wildcard pattern in case someone hands us an uninitialized bool, right?

match my_bool {
    true => ... ,
    false => ... ,
    _ => ... ,
}

I mean, this is the exact same logic.

When you have an empty match on a &Void you are implicitly dereferencing it and writing out an arm for each of it's 0 variants. You can also not dereference it and write a match arm for Ok(x) where x: &'static Void if you like - you'll just get an unreachable code warning.

Okay. I'll write a PR. :weary:

2 Likes