Recent change to make exhaustiveness and uninhabited types play nicer together

extern type Foo;

Thinking about this for the last 10 minutes: I really like this idea and I think we should add it right away.

I don’t see how it’s different from a zero-sized struct with a private field.

It reflects what’s actually going on rather than being a hack.

I’d guess it’d be treated somewhat like a DST, so it can’t be created, returned, moved, size_of’d, etc, even inside the declaring module.

Exactly, you shouldn’t be able to ptr::read() on an FFI type for example. But you can with @arielb1’s FFIType.
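To make the `ptr::read()` point concrete, here is a minimal sketch. `FfiType` is a hypothetical stand-in for the zero-sized-struct-with-a-private-field approach, not the actual definition from the earlier post. Because the struct is `Sized`, nothing stops code from reading it by value, which is exactly what an opaque foreign type should forbid.

```rust
use std::mem::size_of;
use std::ptr;

pub mod ffi {
    /// Hypothetical opaque handle: a zero-sized struct with a private field.
    #[repr(C)]
    pub struct FfiType {
        _private: [u8; 0],
    }
}

/// The compiler believes it knows the size of the foreign type (zero bytes).
pub fn ffi_type_size() -> usize {
    size_of::<ffi::FfiType>()
}

/// This compiles because `FfiType` is `Sized`; a true extern type would reject it.
///
/// Safety: `p` must be non-null and aligned for `FfiType`.
pub unsafe fn read_by_value(p: *const ffi::FfiType) -> ffi::FfiType {
    unsafe { ptr::read(p) }
}
```

The by-value read copies zero bytes, so it is harmless in itself, but the type system has no way to say that the operation is meaningless for a type whose real layout lives on the C side.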

Edit[0]: As I see it, it would just be an opaque type which is ?Sized but which still has thin pointers.

Edit[1]: Although… with all the other DSTs we can at least calculate their size at runtime. With an extern type we couldn’t even do that. I dunno how much of a problem this would be. You’re supposed to be able to stick DSTs at the end of structs for example.
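For comparison, a small sketch of what today's DSTs give you: fat pointers, a size that can be computed at runtime, and the ability to sit at the end of a struct. `Packet` is a made-up type for illustration only.

```rust
use std::mem::{size_of, size_of_val};

/// Made-up struct with a possibly-unsized tail, as existing DSTs allow.
pub struct Packet<T: ?Sized> {
    pub header: u16,
    pub payload: T,
}

/// The size of a slice is recovered at runtime from the length in its fat pointer.
pub fn slice_bytes(s: &[u32]) -> usize {
    size_of_val(s)
}

/// Unsized coercion: `Packet<[u8; 6]>` becomes `Packet<[u8]>` behind the `Box`.
pub fn boxed_packet() -> Box<Packet<[u8]>> {
    Box::new(Packet { header: 7, payload: [0u8; 6] })
}

/// Widths of a thin reference and a slice (fat) reference.
pub fn ref_widths() -> (usize, usize) {
    (size_of::<&u8>(), size_of::<&[u8]>())
}
```

An extern type as described above would keep the thin pointer width but could not support `size_of_val`, which is where the struct-tail question comes from.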

Yes, see the custom DST RFC. We need to split the trait so there’s “referent types” and actual DSTs (with {size, align}_of_val) as a subtrait. Opaque C types and C’s void-in-void* (plain void is unrelated and is (), ahhh!!!) would use this.

So I basically agree with @glaebhoerl here, and think bool as an analogue is very compelling.

I'd like to thank @arielb1 for addressing the bool case:

Our previous consensus was that values must always be valid, so Some(42): Option<bool> is not possible to have even in unsafe code without UB, but say Some(&42): Option<&bool> is possible, and matching on it results in UB.

Loosely inspired by Agda's absurd patterns, we can actually make a coherent policy out of this by forcing one to get the value in a "match arm stub", e.g.:

match void_ref_result {
    Err(e) => ... ,
    Ok(&_), // no arm needed
}

This would add clarity when the uninhabited type is deeply nested. I'd consider this an excellent compromise everywhere, or convenient sugar in unsafe code for:

match void_ref_result {
    Err(e) => ... ,
    Ok(&r) => match r { },
}

Where one relies on UB to remove the extra branching instead of being explicit.
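The explicit form can be written in Rust as it stands. A minimal sketch, where `Void` and `error_message` are illustrative names and not part of any proposal:

```rust
/// An uninhabited type: no value of `Void` can be constructed.
pub enum Void {}

/// The `Ok` arm can never run for a valid input, so an empty match on the
/// dereferenced value stands in for the arm body.
pub fn error_message(r: Result<&Void, String>) -> String {
    match r {
        Err(e) => e,
        // An empty match has any type, so it can fill in for the missing `String`.
        Ok(v) => match *v {},
    }
}
```

The stub syntax proposed above would let the `Ok` arm drop its body while still naming the pattern that justifies it.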

There is no way to call this function without triggering undefined behavior. Therefore it is better to make it impossible to define this function, statically. Note that this is true even without the match:

enum X { }
fn x(a: &X) { }

Regarding the match, the only clearly reasonable body of a match on such values is the empty body, which is a no-op. Therefore, it makes sense for the compiler to statically reject such matches too.

This would be clearer if there were Inhabited and Uninhabited traits. That would give a clear path forward for generic code that needs to be generic over possibly-uninhabited types, where the code for uninhabited types does something different (probably nothing) from the code for all inhabited types, as such code could just define separate implementations for T: Inhabited and T: Uninhabited. Then match expressions, function arguments, and related things could be defined to have an implicit T: Inhabited bound.

One reason to define such a function could be satisfying a trait:

enum NoError {}
// necessary to allow unwrap() on Result<T, NoError>
impl fmt::Debug for NoError {
    fn fmt(&self, _f: &mut fmt::Formatter) -> Result<(), fmt::Error> {
        match *self {}
    }
}

(Usually better to use ! for this, but not in all cases, and your reasoning seems to work just as well for functions taking &!. There’s a proposal to have ! magically impl all the things, but that won’t work for many traits, such as those with static methods.)
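A sketch of why static methods get in the way of any automatic impl. `Describe` is a made-up trait for illustration: the method taking `&self` is uncallable for an uninhabited type, but the static one is perfectly callable and has to return something real.

```rust
pub enum NoError {}

/// Made-up trait with one static method and one method taking `&self`.
pub trait Describe {
    fn type_name() -> &'static str;
    fn describe(&self) -> String;
}

impl Describe for NoError {
    // Callable without any value of `NoError`, so it needs a genuine body.
    fn type_name() -> &'static str {
        "NoError"
    }

    // Uncallable: no `&NoError` can exist, so an empty match suffices.
    fn describe(&self) -> String {
        match *self {}
    }
}
```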

Also, it’s pretty easy for free generic functions to end up instantiated with arguments of uninhabited types. For example, in the following, futures::err is uncallable if E is uninhabited:

fn dumb_result_to_future<T, E>(r: Result<T, E>) -> futures::BoxFuture<T, E> {
    match r {
        Ok(t) => futures::ok(t).boxed(),
        Err(e) => futures::err(e).boxed(),
    }
}

But there’s no reason to forbid either example. The functions can’t actually be called without undefined behavior, but they never will be; they’re just there to satisfy the type system.

IMO, this would be better:

// No need to implement any methods for implementations of traits
// by uninhabited types, since there are no values of `Self` or
// `&Self`. Instead the compiler will automatically derive no-op
// implementations.
impl<T: Uninhabited> fmt::Debug for T {}

fn dumb_result_to_future<T, E: Inhabited>(r: Result<T, E>) -> futures::BoxFuture<T, E> {
    match r {
        Ok(t) => futures::ok(t).boxed(),
        Err(e) => futures::err(e).boxed(),
    }
}

fn dumb_result_to_future<T, E: Uninhabited>(r: Result<T, E>) -> futures::BoxFuture<T, E> {
    match r {
        Ok(t) => futures::ok(t).boxed(),
        // Err(e) is an impossible case for Uninhabited types.
    }
}

Here’s another way to think about things. Imagine:

enum Result<V, E> {
    Ok(V),
    Err(E) if E: Inhabited,
}

I imagine there are lots of type-parameterized enums where some variants don’t make sense for some types of parameters, so maybe it makes sense to go this direction.

I just discovered that there's even an RFC for this already: "Allow uncallable method impls to be omitted" by canndrew, Pull Request #1699 on rust-lang/rfcs.

But what is the point of requiring everyone to special-case uninhabited types if things work fine without? That would make Result<T, !> pretty much useless (the only reason to use it over just T is to make the same code work for both fallible and infallible operations). In fact, arguably it would make uninhabited types fairly useless as a whole, which seems to conflict with the acceptance of RFC 1216…
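A rough sketch of the "same code for fallible and infallible operations" point. It uses `std::convert::Infallible` as a stable stand-in for `!`; `sum_with`, `byte_len`, `parse_decimal`, and `total_len` are made-up names.

```rust
use std::convert::Infallible;
use std::num::ParseIntError;

/// Generic over the error type; never special-cases an uninhabited `E`.
pub fn sum_with<E>(items: &[&str], op: impl Fn(&str) -> Result<u32, E>) -> Result<u32, E> {
    let mut total = 0;
    for &item in items {
        total += op(item)?;
    }
    Ok(total)
}

/// An infallible operation: the error type has no values.
pub fn byte_len(s: &str) -> Result<u32, Infallible> {
    Ok(s.len() as u32)
}

/// A fallible operation driven through the same generic code.
pub fn parse_decimal(s: &str) -> Result<u32, ParseIntError> {
    s.parse()
}

/// The caller of the infallible version discharges the error with an empty match.
pub fn total_len(items: &[&str]) -> u32 {
    sum_with(items, byte_len).unwrap_or_else(|never| match never {})
}
```

If every generic function needed a separate `E: Uninhabited` version, `sum_with` would have to be written twice for no gain.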

I’m trying to understand the issue here, so I decided to write some FFI code:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct UninhabitedTest {
  uint64_t data;
};

struct UninhabitedTest* uninhabited_test_new(uint64_t d) {
  struct UninhabitedTest* ut = malloc(sizeof(struct UninhabitedTest));
  ut->data = d;
  puts("new");
  return ut;
}

void uninhabited_test_delete(struct UninhabitedTest* ut) {
  puts("delete");
  free(ut);
}

uint64_t uninhabited_test_get_data(struct UninhabitedTest* ut) {
  puts("get_data");
  return ut->data;
}

extern crate libc;

mod ut {
    mod ffi {
        #[repr(C)]
        pub struct UninhabitedTestImpl {
            _priv: u8,
        }

        extern {
            pub fn uninhabited_test_new(d: ::libc::uint64_t) -> *mut UninhabitedTestImpl;
            // A C function returning `void` has no return type on the Rust side.
            pub fn uninhabited_test_delete(ut: *mut UninhabitedTestImpl);
            pub fn uninhabited_test_get_data(ut: *mut UninhabitedTestImpl) -> ::libc::uint64_t;
        }
    }

    pub struct UninhabitedTest {
        data: *mut ffi::UninhabitedTestImpl,
    }

    impl UninhabitedTest {
        pub fn new(d: u64) -> UninhabitedTest {
            let ut = unsafe { ffi::uninhabited_test_new(d) };
            if ut.is_null() { panic!("ran out of memory"); }
            UninhabitedTest { data: ut }
        }

        pub fn data(&self) -> u64 {
            unsafe { ffi::uninhabited_test_get_data(self.data) }
        }
    }

    impl Drop for UninhabitedTest {
        fn drop(&mut self) {
            unsafe { ffi::uninhabited_test_delete(self.data) };
        }
    }
}

fn main() {
    use ut::UninhabitedTest;

    let input = 4;
    let ut = UninhabitedTest::new(input);
    let output = ut.data();
    println!("input: {}", input);
    println!("output: {}", output);
}

I’ll admit that having to use a u8 as a member in order to stop lints from giving me warnings is a bit annoying, but as far as I can tell, this code is correct?

Rust isn’t allowed to dereference a *mut _ automatically and this code never does it manually, so only the C code will ever do it.

The only problem would be if the author of the ut module leaks a *mut UninhabitedTestImpl or *const UninhabitedTestImpl or creates &UninhabitedTestImpl or &mut UninhabitedTestImpl. I.e. authors of unsafe code have to be careful and know what they are doing. I don’t see why that is an unreasonable burden, unless I’m missing something?
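Here is a variant of the wrapper above that avoids the dummy u8 field, using a zero-length array plus a marker, which is the pattern the Rustonomicon's FFI chapter suggests for opaque structs. So that the sketch runs without a C library, the three "foreign" functions are stand-ins written in Rust, and `RealImpl` plays the role of the struct the C side would define; both are assumptions for illustration.

```rust
use std::marker::{PhantomData, PhantomPinned};

/// Opaque handle: zero-sized, not constructible outside this module,
/// and opted out of Send, Sync, and Unpin by the marker.
#[repr(C)]
pub struct UninhabitedTestImpl {
    _data: [u8; 0],
    _marker: PhantomData<(*mut u8, PhantomPinned)>,
}

// Stand-in for the C side (assumption): only these functions know the layout.
struct RealImpl {
    data: u64,
}

extern "C" fn uninhabited_test_new(d: u64) -> *mut UninhabitedTestImpl {
    Box::into_raw(Box::new(RealImpl { data: d })) as *mut UninhabitedTestImpl
}

unsafe extern "C" fn uninhabited_test_get_data(ut: *mut UninhabitedTestImpl) -> u64 {
    unsafe { (*(ut as *mut RealImpl)).data }
}

unsafe extern "C" fn uninhabited_test_delete(ut: *mut UninhabitedTestImpl) {
    unsafe { drop(Box::from_raw(ut as *mut RealImpl)) }
}

/// Safe wrapper: the raw pointer never leaves this type and is never
/// turned into a Rust reference.
pub struct UninhabitedTest {
    raw: *mut UninhabitedTestImpl,
}

impl UninhabitedTest {
    pub fn new(d: u64) -> UninhabitedTest {
        UninhabitedTest { raw: uninhabited_test_new(d) }
    }

    pub fn data(&self) -> u64 {
        unsafe { uninhabited_test_get_data(self.raw) }
    }
}

impl Drop for UninhabitedTest {
    fn drop(&mut self) {
        unsafe { uninhabited_test_delete(self.raw) };
    }
}
```

The handle is still `Sized` with size zero, so this only hides the type by convention; it does not give the compiler the "size unknown" information an extern type would.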

   let foo = unsafe { &*foo };

The issue here is that you’re creating a ‘safe’ reference to a raw pointer. There’s no reason to do this if what you want is an opaque type and there’s no reason that users of the library/module should have the access required to do this either.

I do this literally all the time in my Rust wrappers around C types. Whenever possible I try to use references instead of pointers as the types of parameters in my FFI functions because references denote aliasing and non-null requirements that pointers don't. For example, I have fn add(result: &mut BIGNUM, a: &BIGNUM, b: &BIGNUM) which indicates that none of the parameters may be NULL, and result may not alias either a or b. Therefore my wrapper around BIGNUM has as_ref(&self) -> &BIGNUM { unsafe { &*self.0 } }, which apparently (and surprisingly) is dreadfully dangerous.

If Rust had a true opaque type mechanism like extern { type BIGNUM; } then this would work perfectly safely, AFAICT.

I’ll just go with a straight C example. The API for Lua has a lua_State as an opaque pointer by virtue of it being declared but not defined in the public header file. What you’re suggesting is equivalent to dereferencing an incomplete type in C. Why should Rust allow that when even C doesn’t?

If that’s the basis of this ‘problem’ with Rust, then I don’t see why the discussion is still happening, because the goals of having opaque pointers for use in FFI and being able to form safe references to the same types in Rust seem directly opposed.

I’m happy to admit I’m not doing it the right way. Please tell me what the right way of creating a reference to an incomplete type is, such as in this C++ example, which compiles just fine, https://godbolt.org/g/M2jdnP:

extern "C" {
    struct BIGNUM;
    BIGNUM *new_bignum();
    void delete_bignum(BIGNUM *);
}

void foo() {
    BIGNUM *b = new_bignum();
    BIGNUM &b_ref = *b;
    delete_bignum(b);
}

That is, in fact, conforming C++. C++ allows using the ‘indirection’ operator on a pointer to an incomplete type in limited cases, one of which is to form a reference. But I fail to see how this gives you anything useful. You can’t call functions on it directly and it can’t be used in a way that would require an lvalue-to-rvalue conversion. I suppose that leaves passing it as a reference to a function that is defined in a context where the type is complete, or taking the address and turning it back into a pointer. It can also be used in some, but not all, metaprogramming techniques.

I still fail to see how having a reference here is useful. The fact that it can’t be null isn’t really useful when you can’t actually do anything with it while it’s incomplete.

In the case of Rust, what point is there in proving that they can’t alias if Rust can’t do anything with the pointers other than pass them to an FFI function? It also doesn’t matter if it can or can’t be null if you can’t dereference it at all.

What benefit are you actually trying to achieve?

First of all, that would be a massively breaking change. Empty matches are used all over the place as a stable construct that generates an unreachable intrinsic.

Second of all, people need to write code generators sometimes. That's why RFC 218 was accepted and implemented. I, at least, want to be able to invoke quick_error! with no arguments and get a data type that implements all the error traits, but happens to be impossible to construct because it doesn't have any variants.
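A toy illustration of the code-generator case. `simple_error!` is a made-up macro, far simpler than the real quick_error!, but it shows the same shape: the generated `Display` impl is a match with one arm per variant, which degenerates to an empty match when the macro is invoked with no variants.

```rust
/// Made-up macro: generates an error enum with Debug, Display, and Error impls.
macro_rules! simple_error {
    ($name:ident { $($variant:ident => $msg:expr),* $(,)? }) => {
        #[derive(Debug)]
        #[allow(dead_code)]
        pub enum $name { $($variant),* }

        impl ::std::fmt::Display for $name {
            #[allow(unused_variables)]
            fn fmt(&self, f: &mut ::std::fmt::Formatter<'_>) -> ::std::fmt::Result {
                // With zero variants this expands to `match *self {}`.
                match *self {
                    $($name::$variant => f.write_str($msg),)*
                }
            }
        }

        impl ::std::error::Error for $name {}
    };
}

simple_error!(IoLike { NotFound => "not found", Denied => "permission denied" });
simple_error!(Never {});

/// Compile-time check that a type implements `Error`.
pub fn requires_error<E: ::std::error::Error>() {}
```

If empty matches were rejected, the macro would need a separate rule for the zero-variant case.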