Recent change to make exhaustiveness and uninhabited types play nicer together

Here’s a another way to think about things. Imagine:

struct Result<V, E> {
    Ok(V),
    Err(E) if E: Inhabited,
}

I imagine there are lots of type-parameterized enums where some variants don’t make sense for some types of parameters, so maybe it makes sense to go this direction.

I just discovered that there's even an RFC for this already: Allow uncallable method impls to be omitted by canndrew · Pull Request #1699 · rust-lang/rfcs · GitHub.

But what is the point of requiring everyone to special-case uninhabited types if things work fine without? That would make Result<T, !> pretty much useless (the only reason to use it over just T is to make the same code work for both fallible and infallible operations). In fact, arguably it would make uninhabited types fairly useless as a whole, which seems to conflict with the acceptance of RFC 1216…

2 Likes

I’m trying to understand the issue here, so I decided to write some FFI code:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct UninhabitedTest {
  uint64_t data;
};

struct UninhabitedTest* uninhabited_test_new(uint64_t d) {
  struct UninhabitedTest* ut = malloc(sizeof(struct UninhabitedTest));
  ut->data = d;
  puts("new");
  return ut;
}

void uninhabited_test_delete(struct UninhabitedTest* ut) {
  puts("delete");
  free(ut);
}

uint64_t uninhabited_test_get_data(struct UninhabitedTest* ut) {
  puts("get_data");
  return ut->data;
}
extern crate libc;

mod ut {
    mod ffi {
        #[repr(C)]
        pub struct UninhabitedTestImpl {
            _priv: u8,
        }

        extern {
            pub fn uninhabited_test_new(d: ::libc::uint64_t) -> *mut UninhabitedTestImpl;
            pub fn uninhabited_test_delete(ut: *mut UninhabitedTestImpl) -> ::libc::c_void;
            pub fn uninhabited_test_get_data(ut: *mut UninhabitedTestImpl) -> ::libc::uint64_t;
        }
    }

    pub struct UninhabitedTest {
        data: *mut ffi::UninhabitedTestImpl,
    }

    impl UninhabitedTest {
        pub fn new(d: u64) -> UninhabitedTest {
            let ut = unsafe { ffi::uninhabited_test_new(d) };
            if ut.is_null() { panic!("ran out of memory"); }
            UninhabitedTest { data: ut }
        }

        pub fn data(&self) -> u64 {
            unsafe { ffi::uninhabited_test_get_data(self.data) }
        }
    }

    impl Drop for UninhabitedTest {
        fn drop(&mut self) {
            unsafe { ffi::uninhabited_test_delete(self.data) };
        }
    }
}

fn main() {
    use ut::UninhabitedTest;

    let input = 4;
    let ut = UninhabitedTest::new(input);
    let output = ut.data();
    println!("input: {}", input);
    println!("output: {}", output);
}

I’d admit that having to use a u8 as a member in order to stop lints from giving me warnings is a bit annoying, but as far as I can tell, this code is correct?

Rust isn’t allowed to dereference a *mut _ automatically and this code never does it manually, so only the C code will ever do it.

The only problem would be if the author of the ut module leaks a *mut UninhabitedTestImpl or *const UninhabitedTestImpl or creates &UninhabitedTestImpl or &mut UninhabitedTestImpl. I.e. authors of unsafe code have to be careful and know what they are doing. I don’t see why that is an unreasonable burden, unless I’m missing something?

   let foo = unsafe { &*foo };

The issue here is that you’re creating a ‘safe’ reference to a raw pointer. There’s no reason to do this if what you want is an opaque type and there’s no reason that users of the library/module should have the access required to do this either.

I do this literally all the time in my Rust wrappers around C types. Whenever possible I try to use references instead of pointers as the types of parameters in my FFI functions because references denote aliasing and non-null requirements that pointers don't. For example, I have fn add(result: &mut BIGNUM, a: &BIGNUM, b: &BIGNUM) which indicates that none of the parameters may be NULL, and result may not alias either a or b. Therefore my wrapper around BIGNUM has as_ref(&self) -> &BIGNUM { unsafe { &*self.0 } }, which apparently (and surprisingly) is dreadfully dangerous.

If Rust had a true opaque type mechanism like extern { type BIGNUM; } then this would work perfectly safely, AFAICT.

1 Like

I’ll just go with a straight C example. The API for Lua has a lua_State as an opaque pointer by virtue of it being declared but not defined in the public header file. What you’re suggesting is equivalent to dereferencing an incomplete type in C. Why should Rust allow that when even C doesn’t?

If that’s the basis of this ‘problem’ with Rust, then I don’t see why the discussion is still happening, because the goals of having opaque pointers for use in FFI and being able to form safe references to the same types in Rust seem directly opposed.

I’m happy to admit I’m not doing it the right way. Please tell me what the right way of creating a reference to an incomplete type, such as in this C++ example, which compiles just fine, https://godbolt.org/g/M2jdnP:

extern "C" {
    struct BIGNUM;
    BIGNUM *new_bignum();
    void delete_bignum(BIGNUM *);    
}

void foo() {
    BIGNUM *b = new_bignum();
    BIGNUM &b_ref = *b;
    delete_bignum(b);
}

That is in fact, conforming C++. C++ allows using the ‘indirection’ operator on a pointer to incomplete type in limited cases, one of which is to form a reference. But, I fail to see how this gives you anything useful. You can’t call functions on it directly and it can’t be used in a way that would require a lvalue-to-rvalue conversion. I suppose that leaves, passing it as a reference to a function that is defined in a context where the type is complete or to take the address and turn it back into a pointer. It can also be used in some, but not all, metaprogramming techniques.

I still fail to see how having a reference here is useful. The fact that it can’t be null isn’t really useful when you can’t actually do anything with it while it’s incomplete.

In the case of Rust, what point is there in proving that they can’t alias if Rust can’t do anything with the pointers other than pass them to an FFI function? It also doesn’t matter if it can or can’t be null if you can’t dereference it at all.

What benefit are you actually trying to achieve?

First of all, that would be a massively breaking change. Empty matches are used all over the place as a stable construct that generates an unreachable intrinsic.

Second of all, people need to write code generators sometimes. That's why RFC 218 was accepted and implemented. I, at least, want to be able to invoke quick_error! with no arguments and get a data type that implements all the error traits, but happens to be impossible to construct because it doesn't have any variants.

The workaround I currently use for opaque types is also really awful:

mod foo { #[repr(C)] pub struct Foo([u8;0]); }

Which, like your solution using u8 instead of [u8;0] also runs into the moving problems, but has the advantage of being zero sized which means that things like mem::swap-ing between pointers of that type is a no-op.

My suggested solution at one point to this problem was [Pre-RFC] Opaque Structs - but the thread didn’t seem to go anywhere.

Using a [u8;0] to make it 0 size is a nice trick.

I think it would be reasonable to have a way to declare a type which can only be used like *const T and *mut T, can’t be dereferenced and is `#[repr©] by default. This would allow having opaque pointers to something on the other side of the FFI boundary without any risk of Rust (or users) invoking UB, since the pointer can only be stored and passed through FFI to other code.

1 Like

That actually sounds like a hole in the improper_ctypes lint. Not a hole that I’ll like fixed, in any case.

That makes extern type a rather low priority feature.

1 Like

I slapped together an RFC for the extern type proposal: https://github.com/rust-lang/rfcs/pull/1861

That’s probably the best place to discuss this as this thread is getting a little off-topic.

2 Likes

I’ve talked about the &mut issue with @nikomatsakis, and it seems that the following desirable properties are incompatible:

  1. match exhaustiveness is identical between safe and unsafe code
  2. &! is matchck-uninhabited in safe code
  3. matches don’t assert deep validity in unsafe code
  4. uninhabited types are not treated specially for UB
  5. if a match is exhaustive, every arm you add to its end can never be reached

The problem is: Because of (2), you can write this code:

    let x: &! = get();
    match x {
    }

From (1), we also get

    unsafe {
        let x: &! = get();
        match x {
        }
    }

On the other hand, we also want this to be non-UB - aka (3)

    unsafe {
        let x: &Whatever = get();
        match x {
            y => println!("hi {:?}", y as *const _)
        }
    }

Because of (4), this is equivalent to

    unsafe {
        let x: &! = get();
        match x {
            y => println!("hi {:?}", y as *const _)
        }   
    }

Which contradicts (5)

1 Like

There’s a PR to roll-back and feature gate all the changes surrounding uninhabited types in matches here. I also just posted and RFC arguing that these changes should be re-instated here (before having seen @arielb1’s comment).

@arielb1

There’s two ways out of this that I can see: One is to say that we hit UB as soon as we try to return a &!. A value returned by get() isn’t just invalid, it’s provably invalid, statically. This is different to trying to return a NULL &u32 - we may not be able to prove that this value is valid but can’t prove that it’s invalid either. We could make a distinction between safe and unsafe code where, in safe code, we need to be able to prove that all accessible data is valid to guarantee safety whereas, in unsafe code, we just need to not be able to prove that some accessible data is invalid.

The other is to recognize that (5) is true - unless your types are lying to you. We shouldn’t force people to consider that possibility. If they do want to consider that possibility, we can say they’re welcome to and this code will run fine:

unsafe {
    let x: &! = get();
    match x {
        y => println!("hi {:?}", y as *const _)
    }   
}

But the uninhabitedness changes also allow them to write this code:

unsafe {
    let x: &! = get();
    match x {
    }   
}

Why shouldn’t they be allowed to write this? Yes it’s broken but it’s broken because they fucked up using unsafe to create an impossible value and then matched on it. I mean, how much leeway to we need to give people, really? If this code shouldn’t be allowed then neither should exhaustively matching on a bool by testing for both true and false. What if they create a 42: bool and try to match on it? We don’t want their code to break! Better force them to match on every possible bit-pattern of a bool instead,

The current consensus is that you can’t have a NULL: &u8 or a 42: bool any more than you can have a 256: u8 - any attempt to create such is instant UB.

Yes, and you can’t have empty_vector_of_bits: ! either. So if you match on that it’s instant UB. This code:

unsafe {
    let x: &! = get();
    match x {
    }   
}

dereferences the pointer and matches on the contained !. That’s UB.

That’s the !(5) position. Personally, I am split between !(5) and !(1).