Recent change to make exhaustiveness and uninhabited types play nicer together


#66

I do this literally all the time in my Rust wrappers around C types. Whenever possible I try to use references instead of pointers as the types of parameters in my FFI functions because references denote aliasing and non-null requirements that pointers don’t. For example, I have fn add(result: &mut BIGNUM, a: &BIGNUM, b: &BIGNUM) which indicates that none of the parameters may be NULL, and result may not alias either a or b. Therefore my wrapper around BIGNUM has as_ref(&self) -> &BIGNUM { unsafe { &*self.0 } }, which apparently (and surprisingly) is dreadfully dangerous.

If Rust had a true opaque type mechanism like extern { type BIGNUM; } then this would work perfectly safely, AFAICT.


#67

I’ll just go with a straight C example. The API for Lua has a lua_State as an opaque pointer by virtue of it being declared but not defined in the public header file. What you’re suggesting is equivalent to dereferencing an incomplete type in C. Why should Rust allow that when even C doesn’t?

If that’s the basis of this ‘problem’ with Rust, then I don’t see why the discussion is still happening, because the goals of having opaque pointers for use in FFI and being able to form safe references to the same types in Rust seem directly opposed.


#68

I’m happy to admit I’m not doing it the right way. Please tell me what the right way of creating a reference to an incomplete type, such as in this C++ example, which compiles just fine, https://godbolt.org/g/M2jdnP:

extern "C" {
    struct BIGNUM;
    BIGNUM *new_bignum();
    void delete_bignum(BIGNUM *);    
}

void foo() {
    BIGNUM *b = new_bignum();
    BIGNUM &b_ref = *b;
    delete_bignum(b);
}

#69

That is in fact, conforming C++. C++ allows using the ‘indirection’ operator on a pointer to incomplete type in limited cases, one of which is to form a reference. But, I fail to see how this gives you anything useful. You can’t call functions on it directly and it can’t be used in a way that would require a lvalue-to-rvalue conversion. I suppose that leaves, passing it as a reference to a function that is defined in a context where the type is complete or to take the address and turn it back into a pointer. It can also be used in some, but not all, metaprogramming techniques.

I still fail to see how having a reference here is useful. The fact that it can’t be null isn’t really useful when you can’t actually do anything with it while it’s incomplete.

In the case of Rust, what point is there in proving that they can’t alias if Rust can’t do anything with the pointers other than pass them to an FFI function? It also doesn’t matter if it can or can’t be null if you can’t dereference it at all.

What benefit are you actually trying to achieve?


#70

First of all, that would be a massively breaking change. Empty matches are used all over the place as a stable construct that generates an unreachable intrinsic.

Second of all, people need to write code generators sometimes. That’s why RFC 218 was accepted and implemented. I, at least, want to be able to invoke quick_error! with no arguments and get a data type that implements all the error traits, but happens to be impossible to construct because it doesn’t have any variants.


#71

The workaround I currently use for opaque types is also really awful:

mod foo { #[repr(C)] pub struct Foo([u8;0]); }

Which, like your solution using u8 instead of [u8;0] also runs into the moving problems, but has the advantage of being zero sized which means that things like mem::swap-ing between pointers of that type is a no-op.

My suggested solution at one point to this problem was [Pre-RFC] Opaque Structs - but the thread didn’t seem to go anywhere.


#72

Using a [u8;0] to make it 0 size is a nice trick.

I think it would be reasonable to have a way to declare a type which can only be used like *const T and *mut T, can’t be dereferenced and is `#[repr©] by default. This would allow having opaque pointers to something on the other side of the FFI boundary without any risk of Rust (or users) invoking UB, since the pointer can only be stored and passed through FFI to other code.


#73

That actually sounds like a hole in the improper_ctypes lint. Not a hole that I’ll like fixed, in any case.

That makes extern type a rather low priority feature.


#74

I slapped together an RFC for the extern type proposal: https://github.com/rust-lang/rfcs/pull/1861

That’s probably the best place to discuss this as this thread is getting a little off-topic.


#75

I’ve talked about the &mut issue with @nikomatsakis, and it seems that the following desirable properties are incompatible:

  1. match exhaustiveness is identical between safe and unsafe code
  2. &! is matchck-uninhabited in safe code
  3. matches don’t assert deep validity in unsafe code
  4. uninhabited types are not treated specially for UB
  5. if a match is exhaustive, every arm you add to its end can never be reached

The problem is: Because of (2), you can write this code:

    let x: &! = get();
    match x {
    }

From (1), we also get

    unsafe {
        let x: &! = get();
        match x {
        }
    }

On the other hand, we also want this to be non-UB - aka (3)

    unsafe {
        let x: &Whatever = get();
        match x {
            y => println!("hi {:?}", y as *const _)
        }
    }

Because of (4), this is equivalent to

    unsafe {
        let x: &! = get();
        match x {
            y => println!("hi {:?}", y as *const _)
        }   
    }

Which contradicts (5)


#76

There’s a PR to roll-back and feature gate all the changes surrounding uninhabited types in matches here. I also just posted and RFC arguing that these changes should be re-instated here (before having seen @arielb1’s comment).


#77

@arielb1

There’s two ways out of this that I can see: One is to say that we hit UB as soon as we try to return a &!. A value returned by get() isn’t just invalid, it’s provably invalid, statically. This is different to trying to return a NULL &u32 - we may not be able to prove that this value is valid but can’t prove that it’s invalid either. We could make a distinction between safe and unsafe code where, in safe code, we need to be able to prove that all accessible data is valid to guarantee safety whereas, in unsafe code, we just need to not be able to prove that some accessible data is invalid.

The other is to recognize that (5) is true - unless your types are lying to you. We shouldn’t force people to consider that possibility. If they do want to consider that possibility, we can say they’re welcome to and this code will run fine:

unsafe {
    let x: &! = get();
    match x {
        y => println!("hi {:?}", y as *const _)
    }   
}

But the uninhabitedness changes also allow them to write this code:

unsafe {
    let x: &! = get();
    match x {
    }   
}

Why shouldn’t they be allowed to write this? Yes it’s broken but it’s broken because they fucked up using unsafe to create an impossible value and then matched on it. I mean, how much leeway to we need to give people, really? If this code shouldn’t be allowed then neither should exhaustively matching on a bool by testing for both true and false. What if they create a 42: bool and try to match on it? We don’t want their code to break! Better force them to match on every possible bit-pattern of a bool instead,


#78

The current consensus is that you can’t have a NULL: &u8 or a 42: bool any more than you can have a 256: u8 - any attempt to create such is instant UB.


#79

Yes, and you can’t have empty_vector_of_bits: ! either. So if you match on that it’s instant UB. This code:

unsafe {
    let x: &! = get();
    match x {
    }   
}

dereferences the pointer and matches on the contained !. That’s UB.


#80

That’s the !(5) position. Personally, I am split between !(5) and !(1).


#81

I’ve argued (sort-of) for !(5) in that RFC. Or at least !(5) when you have invalid data exposed and pattern-matchable through an enclosing piece of data that you’re matching on.


#82

We added such a feature in OCaml in the last version, precisely for the purpose of better handling empty types appearing deeply nested (otherwise there is a termination issue, of how deep you do your case-analysis to test for inhabitation, and a practical issue of performance of exhaustivity checking). I think this is a very good construction because it both guides the implementation (of the exhaustiveness checker) and expresses programmer’s intent (“here is the reason why this case is dead”).

.

Regarding the larger discussion: it is very common in programming languages that support modularity to have a notion of “abstract type”, whose definition is not known. The client cannot build a value of such a type, but that does not mean that it is not inhabited (using an empty sum type for this is a very bad idea). Once you start reasoning on type equalities (for GADTs for example), the same phenomenon occurs, there are equalities that you can prove are true (from available definitions), equalities that you can prove are false (from available definitions), and equalities that you neither prove or disprove as they involve unknown/abstract types. In general it is important to carefully distinguish “I cannot prove X” and “I can prove not(X)”: the closed-world assumption that lets you confuse the two notions is anti-modular.

One of the good things with the ML module systems is that they force the language design to be honest with respect to abstract type: modules have an interface that is separated from the implementation. You need abstract types in the language as soon as you allow encapsulation, as they describe the interface of private types. Rust does not have an interface language, so it does not ask itself the question of “what is the interface of a type whose definition is private?” – the only natural answer would be “an abstract type”. If you add the extern type Foo; syntax to the language, it will be useful not only to think of types defined through the FFI, but also to think of types from other crates that are visible but whose definition are private.


#83

Can’t overstate how important this is to support very abstract approaches.

If one looks at the similarities between Scala’s underscore and Rust’s, it feels like Rust should be able to source some additional flexibility from there: http://stackoverflow.com/questions/8000903/what-are-all-the-uses-of-an-underscore-in-scala Existential types

def foo(l: List[Option[_]]) = ...

Higher kinded type parameters

case class A[K[_],T](a: K[T])

#84

Could you elaborate here? There’s no arbitrary recursion limits anywhere in the inhabitedness-checking code but I don’t see how it could go into an infinite loop. We also cache the result of the inhabitedness check but there hasn’t been any benchmarks comparing it with being disabled.


#85

I did not participate in the performance measurements, but Jacques Garrigue reported his number in the relevant bugtracker entry, PR#6437. The costs seemed to be non-neglectible, see for example (Exh is for exhaustivity checking, while Unus for analysis of unused cases):

Here are a few more times, for make library on another machine. The number indicates the “exploding depth”, i.e. the number of nested GADTs one may explode for analysis. Exh(2)+Unus(2): 77.0s Exh(1)+Unus(1): 6.55s Exh(1)+Unus(0): 6.35s Exh(0)+Unus(0): 5.95s Exh(1): 6.19s Exh(0): 5.75s

Re. non-termination: a naive implementation of inhabitation will say that if the type is a tuple/product type, then it is inhabited if all components are inhabited, and if is a sum/enumeration type, then it is inhabited if at least one component is inhabited. If you have a recursive type, repeatedly applying this strategy can loop infinitely. For a regular recursive type (type nat = Z | S of nat), you can detect that you are looping (a subgoal is the same as the original goal), and terminate there (deciding inhabitedness or not depending on whether your language allows recursive values). But if the recursive type is parametrized and its definition instantiates its parameter in an non-regular way (say type 'a fulltree = Leaf of 'a | Node of ('a * 'a) fulltree), then there is no clear termination strategy. Your language design might forbid that (those types also make monomorphization difficult), so maybe the performance aspect will speak more to you.