Size of uninhabited types

Hi all,

This post is inspired by the following ICE bug: https://github.com/rust-lang/rust/issues/36479. Currently, it is semantically possible to construct a value of type ! (or your favorite enum Void {}) unsafely, with

unsafe { mem::transmute::<(), !>(()) }

Of course, materialization of an uninhabited type usually emits a halt-and-catch-fire; in principle, however, such a transmute should be impossible, since () is an inhabited ZST, while ! is uninhabited (but also a ZST, since size_of::<!>() == 0). I assume materializing a ! should be UB anyway, but I've yet to find anywhere that says so.

The RFC for the ! type briefly comments that, in analogy with size_of::<()>() == 0 == log(1), we'd expect size_of::<!>() == -Infty == log(0). size_of returns usize though, so we can't actually have it do this, but the above semantic bug (and accompanying ICE) really makes me feel like we should make a stronger distinction between ZSTs and "never-sized types" (NSTs, if you like).

The only practical solution I can think of is to make transmute allergic to NSTs, though I think it should be legal to transmute between NSTs, which is dead code that the linker would optimize out anyways.

We're kind of stuck with the current size_of situation though, since we can neither change the signature of size_of nor define something ridiculous like size_of::<!>() == usize::MAX, as either would be a breaking change. And while I have a bit of trouble figuring out what situations it'd actually matter in, it might come across as strange that size_of::<Option<!>>() == 0.

I thought it was the case that transmuting from an inhabited to an uninhabited type was an error. I seem to remember implementing that at some point. Maybe I only imagined it or maybe it got taken out. :confused:

Note that transmute doesn't have to follow the rule that transmuting from A to B is valid if size_of::<A>() == size_of::<B>(). Since it already ignores the trait system we can add other wacky restrictions if we want.

I agree that we could add such a check to transmute, but I am skeptical whether that change would be carrying its weight. Is there any data on how many bugs would be caught by this? I've seen a couple bugs caused by creating values of uninhabited types, but all I can recall were via mem::uninitialized or mem::zeroed or other more complex code patterns, not via transmute.

Aside from the question of uninhabited types specifically, I am unhappy with transmute as a whole and its existing special casing (both in the compiler and in many people's mental models). It's too powerful and too liberally used for these extra checks to make a dent in how dangerous it is. I'd rather deprecate it entirely (replacing it with a suite of more targeted, limited operations) than add more and more special treatment.

Just for reference, creating an instance of an uninhabited type is UB: https://doc.rust-lang.org/reference/behavior-considered-undefined.html. That probably falls under "invalid values in primitive types", namely "A discriminant in an enum not included in the type definition."

An ICE is probably not the best outcome, but I suspect even that falls within the bounds of what UB allows.

I'm not sure exactly what you're saying, but this strange fact (size_of::<Option<!>>() == 0) is already true! It shows how ! actually differs from an ordinary ZST, because size_of::<Option<()>>() == 1.
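
For anyone who wants to check this locally, here is a minimal sketch. It assumes stable Rust, so it uses std::convert::Infallible and a hand-written empty enum (the name Void is just for illustration) as stand-ins for !. The helper sizes is hypothetical, and the numbers reflect how current compilers lay these types out rather than anything the language promises.

```rust
use std::convert::Infallible;
use std::mem::size_of;

/// A user-defined empty enum, standing in for `!` on stable Rust.
enum Void {}

/// Sizes in bytes of `T` and `Option<T>`, as the compiler in use lays them out.
fn sizes<T>() -> (usize, usize) {
    (size_of::<T>(), size_of::<Option<T>>())
}

fn main() {
    // An inhabited ZST: `Option<()>` still needs room for its discriminant.
    println!("()         -> {:?}", sizes::<()>());
    // Uninhabited types: `Some` can never exist, so only `None` remains.
    println!("Void       -> {:?}", sizes::<Void>());
    println!("Infallible -> {:?}", sizes::<Infallible>());
}
```

The difference only shows up once the type is wrapped: size_of alone cannot tell () and an empty enum apart.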

I definitely agree that transmute is a footgun that we should be trying to get rid of, and making "transmute as it stands more reasonable" isn't the most compelling reason to think about something.

My main point was about this particular paragraph in the RFC, which is the only mention of the size of !:

They have no logical machine-level representation. One way to think about this is to consider the number of bits required to store a value of a given type. A value of type bool can be in two possible states (true and false). Therefore to specify which state a bool is in we need log2(2) ==> 1 bit of information. A value of type () can only be in one possible state (()). Therefore to specify which state a () is in we need log2(1) ==> 0 bits of information. A value of type Never has no possible states it can be in. Therefore to ask which of these states it is in is a meaningless question and we have log2(0) ==> undefined (or -∞). Having no representation is not problematic as safe code never has reason nor ability to handle data of an empty type (as such data can never exist). In practice, Rust currently treats empty types as having size 0.

It seems rather unfortunate to me that, while uninhabited types are eligible for optimizations like flattening Option<!> into a ZST, it isn't possible for a user to check that a type is uninhabited, like we can presently do for ZSTs. I concede to rkruppe that I can't immediately think of a situation in which this prevents bugs, but it bothers me that size_of::<!>() is not specified to be 0, and that this current-behavior-that-may-change isn't noted in the documentation of size_of. Maybe I overthought this whole thing and noting it in the docs is all that's necessary (fixing the transmute ICE notwithstanding).
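
To make the RFC's counting argument concrete, here is a small sketch. The function bits_needed is a hypothetical helper, not anything from std or the RFC; it models "undefined" as None instead of trying to squeeze -∞ into an unsigned integer, which is exactly the option size_of does not have.

```rust
/// Bits needed to distinguish `states` values: ceil(log2(states)).
/// Returns `None` for zero states, where log2(0) is undefined.
fn bits_needed(states: u64) -> Option<u32> {
    if states == 0 {
        return None;
    }
    // For states >= 1, ceil(log2(states)) is the bit length of (states - 1).
    Some(u64::BITS - (states - 1).leading_zeros())
}

fn main() {
    println!("bool:  {:?}", bits_needed(2)); // two states
    println!("():    {:?}", bits_needed(1)); // one state
    println!("Never: {:?}", bits_needed(0)); // no states at all
}
```

Since size_of returns a plain usize, it has no room for the None case, so empty types currently end up reported as size 0 alongside ().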

It's not clear to me that this is the case with !, since it's not an enum. Maybe it's worth noting separately?

Note that all type sizes are subject to change in the future, unless explicitly promised otherwise.

See for example the discussions about bool in https://github.com/rust-lang/rfcs/pull/954 or https://github.com/rust-lang/rust/pull/46156.

Eh, just because generated code can do whatever it wants once UB is invoked, I'm not sure that any input being able to crash the compiler is what we want.

Not saying it's what we want, only that I believe it's allowed under the definition of UB. Not that Rust says much about what it means by UB, but AFAIK the C standard specifically lists "not compile" as one example of what it might mean.

Maybe it would make sense to have a mem::is_inhabited::<T>() and to clarify that transmute only works if the size and inhabitedness are the same.
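
To illustrate what that rule could look like, here is a rough sketch. Everything in it is hypothetical: there is no mem::is_inhabited in std, so the trait Inhabitedness and the function transmute_allowed below are made up, and the trait is implemented by hand for a few types, whereas a real version would need compiler support. Infallible stands in for ! on stable Rust.

```rust
use std::convert::Infallible;
use std::mem::size_of;

/// Hypothetical marker trait; nothing like it exists in std.
trait Inhabitedness {
    const INHABITED: bool;
}

impl Inhabitedness for () {
    const INHABITED: bool = true;
}
impl Inhabitedness for u8 {
    const INHABITED: bool = true;
}
impl Inhabitedness for Infallible {
    const INHABITED: bool = false;
}
// `None` always exists, so `Option<T>` is inhabited for every `T`.
impl<T> Inhabitedness for Option<T> {
    const INHABITED: bool = true;
}
// A pair needs a value of each component.
impl<A: Inhabitedness, B: Inhabitedness> Inhabitedness for (A, B) {
    const INHABITED: bool = A::INHABITED && B::INHABITED;
}

/// The rule suggested above: size and inhabitedness must both match.
fn transmute_allowed<A: Inhabitedness, B: Inhabitedness>() -> bool {
    size_of::<A>() == size_of::<B>() && A::INHABITED == B::INHABITED
}

fn main() {
    println!("() -> Infallible: {}", transmute_allowed::<(), Infallible>());
    println!("Infallible -> Infallible: {}", transmute_allowed::<Infallible, Infallible>());
    println!("u8 -> Option<()>: {}", transmute_allowed::<u8, Option<()>>());
}
```

Under this rule the transmute from () to an empty type is rejected even though both sizes are 0, while transmuting between two empty types stays legal.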

This is my biggest feeling about the question, and about providing a way to detect uninhabited things at all. Things that break with ! often break with any type with limited valid values -- which is most of them -- it's just more obvious with !.

One thing about uninhabitedness is that, like with ZST-ness, it never matters in safe code, and should only rarely matter in unsafe code. And rather than needing to match on size (or inhabitedness), it would be perfectly well-defined to transmute something uninhabited into literally anything.

I think it's pretty questionable that an ICE is an acceptable implementation of UB, since the standard error message asserts that this is a bug and that it should be reported.

Agreed; even in the non-trivial example of Vec, which has to deal with ZSTs, you get Vec<!> working-as-intended for free just from handling "inhabited" ZSTs correctly.
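
A quick way to see this on stable Rust is the sketch below, which assumes Infallible as a stand-in for !. The helper count_elements is hypothetical and exists only so there is something to call.

```rust
use std::convert::Infallible;

/// Consumes the vector and reports how many elements it held.
/// The loop body type-checks but can never run.
fn count_elements(v: Vec<Infallible>) -> usize {
    let len = v.len();
    for x in v {
        // An empty match is exhaustive because `Infallible` has no variants.
        match x {}
    }
    len
}

fn main() {
    let v: Vec<Infallible> = Vec::new();
    // Like any vector of zero-sized elements, it never needs to allocate.
    println!("len = {}, capacity = {}", v.len(), v.capacity());
    println!("elements = {}", count_elements(v));
}
```

Nothing here is special-cased for empty types: the vector takes the same zero-sized-element path that Vec<()> does.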

Even with standardese, I'd interpret that as "will give a compile error". No textual input should ever crash the compiler.

I also find the reasoning of "but the C standard says it's fine" less than ideal. C does lots of things I wouldn't like to see repeated in Rust :wink:

ā€œItā€™s UBā€ rarely or never justifies ICEs, not because of any standardese or philosophical convictions but simply because UB that is never executed isnā€™t UB. For example, if 1 > 2 { transmute::<(), !>(()); } is perfectly well-defined (to do nothing), so it should be compiled to an executable that does nothing. ICE-ing only if the UB is actually executed is generally impossible. End of story.

This conversation is absurd.

  • An ICE is never a legitimate answer to an input source file. An ICE is a bug in the compiler that needs to be fixed. Period. Maybe the fix is to emit a user-directed error and abort.
  • If something fails to compile (in a legitimate manner), there's no point discussing if it is UB. What would the reference say, "transmuting data to an uninhabited type is undefined, and oh, by the way, you can't do it, so never mind?" If a tree falls in a forest and nobody is around...

It could still be UB in cases where it's not feasible to diagnose AoT. (This is essentially always the case with UB, because if it could be diagnosed reliably it wouldn't need to be UB.)

Well... that depends on what "diagnosed reliably" is. In C, the following is UB, even though it's something that, in principle, could always be caught (modulo my poor knowledge of C outside of good practices).

int f(void) {
    /* no return statement */
}

int main(void) {
    int x = f(); /* using the missing return value is UB */
    return x;
}

In Rust, the following transmute is explicitly UB, but the compiler will catch it and trigger a deny-by-default lint (though you can still be clever with pointer casts and (*mut T)::as_mut()):

use std::mem::transmute;

// Rejected by a deny-by-default lint: turning &T into &mut T is UB.
unsafe fn make_me_mut<'a, T>(ptr: &'a T) -> &'a mut T {
    transmute(ptr)
}

"Fails to compile with error" is a totally reasonable way to handle UB, such as in the case above, where we try our best to stop people from materializing mut from the aether but just let LLVM sort it out if we don't catch it (i.e., allow it or halt-and-catch-fire). It is, after all, undefined behavior; we can do whatever we want if we notice it, though an ICE is not something we agree is an acceptable response.
