Let's discuss the Inhabited trait

RFC 1892 suggests deprecating std::mem::uninitialized. The main motivation is to solve the unsoundness around creating values of uninhabited types (e.g. ! or Void). In my opinion this looks like plugging holes without solving the underlying issue: currently we have no way to distinguish inhabited and uninhabited types at the type level, so constraints that would make code safer cannot be expressed properly. The RFC lists “the old Inhabited trait proposal” as an alternative, but unfortunately I couldn’t find it.

IIUC the main issue with introducing an Inhabited trait is that, to make it work, it has to be an automatic trait bound, so e.g. for Result we would have to explicitly put a ?Inhabited bound on the types of its variants. But this change can’t be made in a straightforward fashion, as it could break a lot of code. The most common uses of uninhabited types include:

  • FFI types (arguably a dirty hack, should be replaced with extern types)
  • Marker types (think e.g. of the BigEndian and LittleEndian types from the byteorder crate); these types can benefit from explicit !Inhabited bounds
  • Void-like types (which will be superseded by the never type); arguably the only sensible use for them is in enum variants, to denote “impossible” cases.

So how can an Inhabited trait be introduced in a backwards-compatible way? As I see it, the solution is to make violations of the Inhabited auto-bound issue warnings instead of compile-time errors in the Rust 2018 edition, and to make them hard errors in the next edition. While this approach raises difficult questions about how it can be implemented, I think that in the long run we are better off with a proper Inhabited trait than with plugging holes here and there.

Why would we want an Inhabited trait? What (useful) code is enabled by having this trait? What other issues are lurking that can’t be fixed with the proposed MaybeUninitialized type? There’s mem::zeroed which has similar issues with uninhabited types, but that one can also produce invalid bit patterns for inhabited types, so it can’t soundly be used in generic code anyway (without extra bounds that are more restrictive than, and would imply, Inhabited).

Also keep in mind that one possible outcome of the unsafe code guidelines is that it’s straight up UB to ever use mem::uninitialized for anything except maybe ZSTs. In that case we’d want MaybeUninitialized anyway.
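
For reference, the type proposed in the RFC was later stabilized as std::mem::MaybeUninit. A minimal sketch of how it replaces the mem::uninitialized pattern; fill_squares and Void are hypothetical names used only for illustration:

```rust
use std::mem::MaybeUninit;

/// Hypothetical empty enum standing in for an uninhabited type.
#[allow(dead_code)]
enum Void {}

/// Hypothetical helper: builds `[0, 1, 4, 9]` without zeroing the array first.
fn fill_squares() -> [u32; 4] {
    // An array of `MaybeUninit<u32>` may legally hold uninitialized memory.
    let mut buf = [MaybeUninit::<u32>::uninit(); 4];
    for (i, slot) in buf.iter_mut().enumerate() {
        slot.write((i * i) as u32);
    }
    // SAFETY: every slot was written in the loop above.
    buf.map(|slot| unsafe { slot.assume_init() })
}

fn main() {
    assert_eq!(fill_squares(), [0, 1, 4, 9]);

    // Reserving a slot for an uninhabited type is fine: no `Void` value
    // exists unless someone (unsoundly) calls `assume_init` on it.
    let _slot: MaybeUninit<Void> = MaybeUninit::uninit();
}
```

The wrapper itself is always inhabited, whatever T is, so the question of inhabitedness only comes up at the unsafe assume_init call.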

Because uninitialized and zeroed are not the only ways to create values of uninhabited types, e.g. you can write:

let a: Void = unsafe { mem::transmute(ZstType) };

And I think there are other holes like that, which I can’t recall right now. So shouldn’t we simply place Inhabited trait bounds on functions and let the type system handle it from there? And in some cases you may want to use a !Inhabited trait bound, as with marker types, or, if we take Result:

// Inhabited bound is redundant, I'll use it for explicitness
impl<T: Inhabited, E: !Inhabited> Result<T, E> {
    /// Safely unwraps Ok variant
    fn always_ok(self) -> T {
        // I think it will not work today, but in future compiler may
        // prove that Result<T, !> is equivalent to T
        unsafe { mem::transmute(self) }
    }
}

impl<T: !Inhabited, E: Inhabited> Result<T, E> {
    fn always_err(self) -> E { .. }
}

One of the arguments I’ve heard against the Inhabited trait bound is that it would make things like Box<[!]> require explicit ?Inhabited bounds, which could infect a lot of code bases. But I haven’t heard an explanation of why we need such strange types in the first place.

Please elaborate on this. I think an automatic trait bound with this much churn is difficult to justify. Moreover, most functions never need to care that they're handling an uninhabited type, because those functions will get optimized away (since they usually can't be called). It is the uncommon case where we want to explicitly ban uninhabited types.

Having a value of an uninhabited type is UB. Period. We simply can’t rely on “they’ll get optimized away”. So functions should very much take care not to get into such cases, and the auto-bound will handle this for most code, without changing much for most Rustaceans.

That's UB in any case, and introducing an Inhabited trait won't prevent all those misuses from occurring. Obviously unsafe code can easily cause horrible UB, but that's neither news nor specific to inhabitedness. Deprecating uninitialized is more of a lint, not a soundness fix.

We don't have negative bounds, though, and it's far from clear whether we'll ever get them.

How would transmute<T1: Inhabited, T2: Inhabited>(v: T1) -> T2 not prevent the misuse of transmuting a ZST into a value of an uninhabited type? The compiler would simply reject code that uses a ?Inhabited type for T2. Yes, it will not prevent all possible problems with uninhabited types, but as I see it most of them will be handled. Authors will have to explicitly opt into the possibility of using uninhabited types and think about the consequences.
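
To make the idea concrete, here is a rough sketch of such a bounded transmute on today's Rust. The Inhabited trait here is a hand-written, hypothetical marker (the proposal would have the compiler implement it automatically), and transmute_inhabited is an illustrative name, not an existing API:

```rust
use std::mem::{self, ManuallyDrop};

/// Hypothetical marker: "this type has at least one value".
/// Implemented by hand here, which is why it has to be `unsafe`.
unsafe trait Inhabited {}

unsafe impl Inhabited for u32 {}
unsafe impl Inhabited for f32 {}
unsafe impl Inhabited for () {}

/// An empty enum: deliberately NOT `Inhabited`.
#[allow(dead_code)]
enum Void {}

/// Transmute restricted to inhabited source and target types.
///
/// # Safety
/// The bits of `v` must form a valid `T2`. The bound only rules out
/// uninhabited targets, nothing else.
unsafe fn transmute_inhabited<T1: Inhabited, T2: Inhabited>(v: T1) -> T2 {
    assert_eq!(mem::size_of::<T1>(), mem::size_of::<T2>());
    // Prevent `v` from being dropped after its bits have been copied out.
    let v = ManuallyDrop::new(v);
    // `mem::transmute` cannot be used on generic parameters, so copy the bits.
    unsafe { mem::transmute_copy::<T1, T2>(&*v) }
}

fn main() {
    let one: f32 = unsafe { transmute_inhabited::<u32, f32>(0x3f80_0000) };
    assert_eq!(one, 1.0);

    // Rejected at compile time: the bound `Void: Inhabited` is not satisfied.
    // let v: Void = unsafe { transmute_inhabited::<(), Void>(()) };
}
```

The function still has to be unsafe: the bound removes one class of misuse but says nothing about bit-pattern validity.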

For the time being I thought it could work the same way Sync does.

I don’t think Rust has a tradition of using marker traits for storage representation?

I’d imagine something like this:

trait TypeInfo {
    const SIZE_OF: usize;
    const INHABITED: bool;
    ...
}
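
A sketch of how such a trait might be filled in by hand, with the elided items left out. The impls and the describe helper are assumptions made for illustration; nothing like this exists in std:

```rust
use std::convert::Infallible;
use std::mem;

/// The trait from above, limited to the two constants that were spelled out.
trait TypeInfo {
    const SIZE_OF: usize;
    const INHABITED: bool;
}

impl TypeInfo for u64 {
    const SIZE_OF: usize = mem::size_of::<u64>();
    const INHABITED: bool = true;
}

// `Infallible` is std's empty enum, so it has no values.
impl TypeInfo for Infallible {
    const SIZE_OF: usize = mem::size_of::<Infallible>();
    const INHABITED: bool = false;
}

// A pair is inhabited only if both of its components are.
impl<A: TypeInfo, B: TypeInfo> TypeInfo for (A, B) {
    const SIZE_OF: usize = mem::size_of::<(A, B)>();
    const INHABITED: bool = A::INHABITED && B::INHABITED;
}

/// Hypothetical helper: generic code can branch on the constant.
fn describe<T: TypeInfo>() -> &'static str {
    if T::INHABITED { "inhabited" } else { "uninhabited" }
}

fn main() {
    assert_eq!(describe::<u64>(), "inhabited");
    assert_eq!(describe::<(u64, Infallible)>(), "uninhabited");
    assert_eq!(<Infallible as TypeInfo>::SIZE_OF, 0);
}
```

One difference from a trait bound: a constant like this can be inspected by generic code, but it cannot by itself make the type checker reject a call.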

Having a value of uninhabited type indicates that that code cannot be reached safely and that the linker can safely delete that code from the binary. The typeck does not know about panicking or anything like that, only about types, which it manipulates abstractly. For example, we can materialize a ! with

fn make_never() -> ! { loop {} }

I can create a reference to it, and dereference it, because ! is Copy:

let ref_never: &! = &make_never();
let never = *ref_never;

None of this code is UB.

Of course, at this point the compiler can assume that this code will never run because it manipulates empty types. As you know, there is no safe way to return a value from make_never. There are unsafe ways, and that is how you get UB. Usually, the compiler will insert a halt-and-catch-fire into these functions in debug mode, but will completely remove them in release mode.

It is completely silly to ban empty types by default, since empty types can be used in a generic context to express something that never happens. For example, if we get an analogue to C++'s ptr-to-member, you could imagine that T::*U could be made uninhabited if T has no field of type U. This adds a safety guarantee for calling functions with generic ptr-to-members, which would, in your proposal, generate bizarre errors.

Ah, I think I get what you meant. I should've added "a value of uninhabited type in code which will run". Yes, your example is not UB, in the same way that the error branch is not UB for E=!:

match result { Ok(v) => { .. }, Err(e) => { .. } }

So if I understand you correctly, your worry is that code in the error branch would have to use a ?Inhabited bound, is that right? It's a tricky one, but can't the typechecker treat e as an Inhabited type if it can be proved that the respective block will never be reached?

I am not familiar with C++, but why not make it an Option instead? To me your ptr-to-member example looks like an excellent way to shoot yourself in the foot.

This is backwards incompatible, including for code that has no UB. You can transmute from an uninhabited type to any other type -- it'll be dead code, but it'll be fine.

Furthermore, there are plenty of ways to write transmute-like operations without using transmute. Pointer casts are one option, unions are another.
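
As a small illustration of both routes, here is a sketch that reinterprets the bits of a u32 as an f32 without ever calling transmute. The names Bits, via_union and via_pointer_cast are made up for this example:

```rust
/// Two views of the same four bytes.
union Bits {
    int: u32,
    float: f32,
}

/// Reinterpret through a union field read.
fn via_union(x: u32) -> f32 {
    let b = Bits { int: x };
    // SAFETY: every 32-bit pattern is a valid f32.
    unsafe { b.float }
}

/// Reinterpret through a raw pointer cast.
fn via_pointer_cast(x: u32) -> f32 {
    let p = &x as *const u32 as *const f32;
    // SAFETY: u32 and f32 have the same size and alignment, `p` points to a
    // live u32, and every 32-bit pattern is a valid f32.
    unsafe { *p }
}

fn main() {
    // 0x4000_0000 is the IEEE 754 encoding of 2.0.
    assert_eq!(via_union(0x4000_0000), 2.0);
    assert_eq!(via_pointer_cast(0x4000_0000), 2.0);
    // Both agree with the safe std conversion.
    assert_eq!(via_union(0x4000_0000), f32::from_bits(0x4000_0000));
}
```

Neither function mentions transmute, so a bound placed on transmute alone would not see them.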

And finally, I'll point out that catching a few misuses of transmute really doesn't seem like a sufficient reason to add a new magic trait and a new default bound. It would be much more compelling if the trait enabled any useful safe abstractions.

This is why I've proposed making the Inhabited check a compiler error only in post-2018 editions, and implementing it as a warning in the 2018 edition. With an auto-trait bound, code without UB will not have any problems migrating. Yes, arguably it's a bit too much magic for the liking of some, but I think the lack of an Inhabited trait will be more harmful in the long run, considering Rust's safety priorities.

This is exactly why you want an Inhabited bound: to be able to reason about whether a cast is safe by knowing that the type will always be Inhabited. As for unions, if you don't allow ?Inhabited on unions, you won't have the problem, as you won't be able to construct a union with an uninhabited variant. Of course we have an unfixable (?) hole with an extern fn which returns !, but IMO one rare hole is better than 5 common ones.

Inhabited, as I see it, is not only, and not so much, about catching transmute misuses, but about the ability to properly encode invariants at the type-system level.

If we wanted to support use cases such as:

// Inhabited bound is redundant, I'll use it for explicitness
impl<T: Inhabited, E: !Inhabited> Result<T, E> {
    /// Safely unwraps Ok variant
    fn always_ok(self) -> T {
        // I think it will not work today, but in future compiler may
        // prove that Result<T, !> is equivalent to T
        unsafe { mem::transmute(self) }
    }
}

impl<T: !Inhabited, E: Inhabited> Result<T, E> {
    fn always_err(self) -> E { .. }
}

then one way to do that perhaps is to introduce an auto trait Uninhabited which looks like this:

pub unsafe auto trait Uninhabited {
    // will need to work around:
    // error[E0380]: auto traits cannot have methods or associated items
    fn absurd<T>(self) -> T;
}

we can then write:

impl<T, E: Uninhabited> Result<T, E> {
    /// Safely unwraps Ok variant
    fn always_ok(self) -> T {
        match self {
            Ok(x) => x,
            Err(x) => x.absurd(),
        }
    }
}

...

This involves no negative bounds.
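
The auto trait above does not compile today (see E0380), but the shape of the idea can be tried on stable Rust with an ordinary trait. This is only a sketch under that substitution: Uninhabited is implemented by hand for std::convert::Infallible, and AlwaysOk is a hypothetical extension trait, needed because inherent impls on Result are not allowed outside std:

```rust
use std::convert::Infallible;

/// Ordinary (non-auto) stand-in for the proposed trait.
trait Uninhabited: Sized {
    fn absurd<T>(self) -> T;
}

impl Uninhabited for Infallible {
    fn absurd<T>(self) -> T {
        // An empty match is exhaustive because `Infallible` has no variants.
        match self {}
    }
}

/// Hypothetical extension trait carrying `always_ok`.
trait AlwaysOk<T> {
    fn always_ok(self) -> T;
}

impl<T, E: Uninhabited> AlwaysOk<T> for Result<T, E> {
    /// Safely unwraps the Ok variant; the Err arm can never run.
    fn always_ok(self) -> T {
        match self {
            Ok(x) => x,
            Err(e) => e.absurd(),
        }
    }
}

/// Example of an operation that cannot fail but still returns a Result.
fn shout(s: &str) -> Result<String, Infallible> {
    Ok(s.to_uppercase())
}

fn main() {
    assert_eq!(shout("ok").always_ok(), "OK");
}
```

No unsafe code is involved; the empty match is what convinces the compiler that the error arm is unreachable.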

Sorry, I missed that, but then you're still breaking legitimate and possibly-useful transmutes in Rust 2018.

(I don't care much about transmute since I want it deprecated and replaced anyway, but it's not encouraging if there are such glaring false positives.)

The proposal seems to grow in scope, amount of churn, and backwards compatibility breaks with every back-and-forth here. I'm going to cut this short now and just say that I remain unconvinced this trait is worthwhile.

I’ve reviewed the RFC in question and the OP here, and I’m unconvinced that either the original proposal or your addendum justifies removing mem::uninitialized. I’m one of those (likely few and reckless) people who actually use mem::uninitialized in their algorithms, usually in conjunction with mem::swap, and I’d be loath to hear that this tool was removed from my toolbox because it raised some questions in a corner of the type system (uninhabited types) that doesn’t actually affect my usage.

Perhaps I should comment on the RFC? It’s already sprawling and in its FCP and I suspect I wouldn’t be heeded anyway…

I don’t know if the negation is what we want. I think it’s more helpful to be able to say “I want a type that is inhabited”; in an ideal world, everyone switches over to ! instead of messing about with empty enums or bizarre pathological types.

There are a bunch of types which are uninhabited which are derived types of !, such as [!; N > 0] (which I assume is what you mean by “pathological”)… feels useful to be able to talk about all uninhabited types generally…

With mutually exclusive traits, you can also model the inhabited / uninhabited dichotomy perfectly.

Yeah, this is what I figured you'd have to do. I'm not sure that there's a neat way to enforce this that isn't "make both traits unsafe".

@Centril

Uninhabited will solve only a part of the problem, and it will be a significantly less general solution. (btw I don't really get why negative trait bounds or at least mutually exclusive traits haven't got much traction...) Also, what would happen with your proposal in the case of Result<!, !>?

Also, can you explain why we need those derived "pathological" types? As I wrote earlier, I don't see any real use cases for them.

@hanna-kruppe

Ehm, in my proposal a violation of the Inhabited trait bound in Rust 2018 will cause a warning, to allow a smooth transition, so it will not break anything. If that's too much magic, then the Inhabited trait can be introduced in post-2018 editions only and cause compilation errors from the start.

And can you elaborate on what you mean by "glaring false positives"?

How exactly does what I wrote "grow the proposal in scope"? The part about pointers is just a logical consequence of having the Inhabited auto-bound, so in generic code without ?Inhabited you can be sure that you can dereference without worrying about potential UB related to uninhabited types.

As for unions, not allowing uninhabited types in union variants seems quite logical to me; for me a union with an uninhabited variant is just another "pathological" case without any real use case.

@acmcarther

Can you elaborate on why you don't like the Inhabited trait approach?

My usages don't usually cross function boundaries (that aren't FFI), and I suspect that the trait approach will reduce clarity in both cases. Usually I write Rust in one of two modes: "C-like" for FFI code or very low-level, data-structure-y code, and "actual Rust", where I'm building on top of that foundational code. I'd be likely to care about initialized-ness in the first mode and demand explicitness, but not in the second mode, where I'd accept the implicitness brought by trait magic.

I know that's not a well structured argument. I can supply some code snippets (:

  1. Initializing an FFI object (potentially in a multi step process where partial initialization is expected)

The user doesn't always control the shape of the data structures they're working with. They also may not control the way that they're initialized. In this case, I'm instructed to pass a partially initialized struct to a foreign function. I think Rust should not diverge too much from what the C usage would look like in these cases, if possible...
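
A sketch of what such a multi-step initialization could look like with MaybeUninit instead of mem::uninitialized. DeviceInfo and query_device are hypothetical stand-ins (a real foreign function would be declared in an extern "C" block rather than defined in Rust), so this only shows the shape of the pattern:

```rust
use std::mem::MaybeUninit;
use std::os::raw::c_int;
use std::ptr::addr_of_mut;

/// Hypothetical C-layout struct: the caller fills `version`,
/// the callee fills the rest.
#[repr(C)]
struct DeviceInfo {
    version: c_int,
    id: c_int,
    flags: c_int,
}

/// Stand-in for the foreign function. Returns 0 on success.
unsafe extern "C" fn query_device(info: *mut DeviceInfo) -> c_int {
    unsafe {
        // Only `version` is initialized on entry, so only `version` is read.
        let version = (*info).version;
        addr_of_mut!((*info).id).write(42);
        addr_of_mut!((*info).flags).write(version * 2);
    }
    0
}

fn get_device_info() -> Option<DeviceInfo> {
    let mut info = MaybeUninit::<DeviceInfo>::uninit();
    let p = info.as_mut_ptr();
    unsafe {
        // Step 1: write just the field the caller is responsible for,
        // without creating a reference to the half-initialized struct.
        addr_of_mut!((*p).version).write(3);
        // Step 2: let the callee fill in the remaining fields.
        if query_device(p) != 0 {
            return None;
        }
        // SAFETY: all three fields have been written at this point.
        Some(info.assume_init())
    }
}

fn main() {
    let info = get_device_info().expect("query_device reported an error");
    assert_eq!((info.version, info.id, info.flags), (3, 42, 6));
}
```

The call sequence stays close to what the C usage would look like; the difference is that the "not yet initialized" state is visible in the type until assume_init.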

Here's another, simpler example (that I think the trait might be able to handle better?)

  2. Internal data structure logic

I also use mem::uninitialized in contexts not related to FFI. Below is an example of an octree implementation where I perform a resize by extracting the "inner node" of the tree, and reinserting it as one octant of a larger octree:

In this case, the "uninitialized"-ness doesn't cross any API boundary, and I don't really care to communicate that it's initialized or not -- I just want to implement the algorithm.
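
For comparison, the same kind of resize can be written without uninitialized memory. A minimal sketch, assuming a hypothetical and heavily simplified Octree with an Empty variant (this is not the implementation referred to above), which uses mem::replace with a placeholder where mem::uninitialized plus mem::swap would otherwise go:

```rust
use std::mem;

/// Hypothetical, heavily simplified octree node.
#[derive(Debug, PartialEq)]
enum Node {
    Empty,
    Leaf(u32),
    Branch(Box<[Node; 8]>),
}

struct Octree {
    root: Node,
    /// Edge length of the cube covered by `root`.
    size: u32,
}

impl Octree {
    /// Doubles the edge length; the old root becomes octant `octant`
    /// of the new, larger root.
    fn grow(&mut self, octant: usize) {
        // Take the old root out, leaving a cheap placeholder behind.
        let inner = mem::replace(&mut self.root, Node::Empty);
        // Build eight empty octants, then reinsert the old root as one of them.
        let mut children: [Node; 8] = std::array::from_fn(|_| Node::Empty);
        children[octant] = inner;
        self.root = Node::Branch(Box::new(children));
        self.size *= 2;
    }
}

fn main() {
    let mut tree = Octree { root: Node::Leaf(7), size: 1 };
    tree.grow(3);
    assert_eq!(tree.size, 2);
    match &tree.root {
        Node::Branch(children) => assert_eq!(children[3], Node::Leaf(7)),
        other => panic!("expected a branch, got {:?}", other),
    }
}
```

This relies on the node type having a cheap placeholder value; where no such value exists, MaybeUninit or Option::take would be the closer substitutes.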