Actually I was make a stronger claim: access to any fields inside a uninhabited struct or enum discriminants should not be written outside an unsafe block.
FWIW, I've felt confused about the motivation for this core thesis since I read the post, as if several steps were missing somewhere, but I haven't managed to articulate that confusion into anything coherent yet. So maybe this amounts to the standard bikeshed effect, since it's very easy to have opinions on any specific form of auto-never rules. Maybe I should make time tonight to try harder at figuring this out.
Fortunately, there is one small piece of it I can articulate now, and the last several posts have given me the impression this is not as irrelevant as I previously thought: When is match x {}
UB? Why is it ever UB? How is that anything other than a no-op under any rules or models we've ever discussed? Why does the compiler have to accept it in whatever cases where it is UB?
So I claim that whenever I spoke "unsafe code" in this thread I mean "code that should be wrapped inside an unsafe block".
In my understanding (after discussion with @notriddle) type is uninhabitted if it's impossible to create valid value of this type in safe code. (well, types with private fields and unsafe constructors are out of scope) In other words, creating value of Endless
type should be considered UB, as it can't be a correct value.
In my interpretation having initialized value of uninhabited type in executable code is automatically UB, even without accessing its contents. So it's even stronger.
Thanks for bringing this up. Uninit
is indeed not the only way to get uninitialized values.
That said, just as a caveat, today with borrowck, there are various limitations -- the most basic being that you need let mut x
here (#21232) -- but also that we wouldn't permit match x { (y, _) => ... }
. Also !
is not stable.
But indeed this compiles on stable:
enum Void { }
fn main() {
let mut x: (u32, Void);
x.0 = 22;
}
Inconsistently, this does not (this is #21232 again):
enum Void { }
fn main() {
let mut x: (u32, Void);
x.0 = 22;
let (y, _) = x;
}
Though it does compile on Nightly if you enable NLL.
I think to discuss what is UB we should only look at desugared matches, i.e., where auto-never has happened. Your match is not a valid desugared match, because it is not exhaustive.
match x { true => ..., false => ... }
is UB if x
is, for example, the result of mem::transmute::<u8, bool>(3)
. (Actually likely the transmute would already be UB, but certainly the match is.) We generate code under the assumption that one of the two branches is being taken, and we justify this by saying that if this assumption is ever wrong, that must be the programmer's fault -- hence undefined behavior.
Does that help?
I assume you mean "impossible in safe code". But I also assume that you will permit using the safe part of libstd, which does involve tons of unsafe code. And maybe some other important crates as well?
So, this is not an easy-to-define concept. I also think it is not useful.
The notion of uninhabited that I think is most relevant for the discussion here is: A type is uninhabited when having an instance of this type leads to instantaneous UB. Under this definition, !
is uninhabited.
However, &!
is inhabited because we currently do not make it instantaneous UB to have a reference to an uninhabited type. It is still unsound to pass such a reference to safe code, but unsafe code can (and does, in practice) make carefully controlled use of this type. There are many examples of values and types that are okay to have in unsafe code, but not in safe code -- for example, a Vec
where the capacity field is wrong is okay to have temporarily in the implementation of Vec
(everything else would be a disaster), but it is not okay to ever hand out such a Vec
to safe code.
Yes, I've edited my post a bit before your reply. My current point of view is result of the discussion in this thread:
I agree that generally determining if type is inhabited or not is complicated (but I think is desired for proper model of the language), this is why here I've proposed to use two traits, which will allow to leave complicated corner cases out of scope for the time being.
In future Rust may become smarter and will be able optimize-out uninhabited non-ZST types as it currently does !
.
I don't think that it's a usable definition if our goal is to properly define semantics of the language. (although I admit initially I was following a similar one)
Isn't it the same story as with creating &mut T
to uninitialized memory? Unsafe code in practice does this, but nonetheless AFAIK doing so is considered UB.
UPD: As for &!
which is often used for FFI opaque pointers, I think many consider it an unfortunate hack, which should be replaced with extern type
s.
I think I'm already pretty clear on what is being proposed. My puzzlement is all about the why. I completely understand how match x {}
on an uninhabited type is deemed UB under the proposal in the original post. But why do we want it to be UB instead of a no-op or compiler error? In particular, some of the recent posts made it sound like this empty, trivially exhaustive match already has semantics on stable Rust that force a backcompat requirement on us, but I couldn't work out what that requirement was supposed to be.
Note that I am not even talking about determining whether a type is inhabited. I am talking about defining the term "inhabited".
Care to elaborate? Literally the only reason why I care about inhabitedness is for insta-UB. So I am pretty sure that this is exactly the definition we need for discussing match and UB. There might be other situations where other definitions are relevant, but this is a very specific discussion we are having here.
Well actually, it's probably not even so relevant. We are drifting off into "basic layout invariants" again. Which is closely related but not the same thing...
If unsafe code does this, then we either have to quickly go ahead and change all that unsafe code (it probably does this for a reason, so we would have to provide some better way of doing things), or we better do not make it UB.
There is no space for "this is UB but code does it anyway".
I assume you mean x: !
? How can it be a NOP? Imagine this function:
fn foo(x: !) -> i32 { match x {} }
We have to generate some code. So a NOP is not an option.
And a compiler error... well that is possible, but the thing is: We do want a way to express in safe code that "this is an impossible situation we are in, so I do not need to write any more code here". The function I wrote above looks funny because it is impossible to call, but that's just because I am writing concrete code. (Also this is all to some extend counterfactual reasoning, so intuition can get a bit wonky here if you haven't seen it in its "pure form", e.g. in type theory.) This is in fact very similar to the ()
type -- many other languages do not have it (void
in C, C# or Java is not a type, you cannot use it for arguments). And indeed it is totally useless to write a function that takes an argument x: ()
. However, once you get to generic code, you can see the price that C# is paying: Abstracting over function types has to always handle both the case "f
returns some type T
", and "f
returns void
". By making ()
a proper type, Rust has avoided this situation, making generic code so much easier to write -- now "there is no data here" is just a special case of "there is some data of type T
here" with T = ()
. Now we just have to also make !
a proper type so that "this is impossible" is also just a special case of "there is some data of type T
here" with T = !
. Then it'll be beautiful.
For example, say you have plenty of code that works with a generic Result<T, E>
, but then you have some specific code that knows that a certain action cannot go wrong, so it uses Result<T, !>
. Now you want to get the data out there, i.e., turn the Result<T, !>
into a T
. You know there cannot be an error, so this should be possible in safe code without writing panics or so, and without run-time checks. Here is how:
fn bar<T>(x: Result<T, !>) -> T {
match x { Ok(t) => t, Err(!) }
}
And then many people think that having to write the impossible cases is not very ergonomic, so they want to be able to write
fn bar<T>(x: Result<T, !>) -> T {
match x { Ok(t) => t }
}
This is particularly useful when combined with irrefutable patterns in let
:
fn bar<T>(x: Result<T, !>) -> T {
let Ok(t) = x;
// Now go on using `t`
t
}
The "completely empty" match x {}
is then again just the case that falls out of generalizing !
to be a type that is usable anywhere. It is an edge case, and it is very useful for discussing properties of types like !
but not very useful to motivate them.
I would've expected this to be a compiler error for a completely different reason: ()
is not a value of type i32 (or perhaps !
is not an i32; what type is an empty match anyway?).
Did you mean one of these more plausible cases?
fn foo(x: !) -> ! { match x {} }
fn foo(x: !) -> i32 { match x {}; 0i32 }
In the former, I don't see why any code would be generated at all. In the latter, obviously we need to generate something for the return value, but I'm still not seeing a reason for the match x {}
to not be a no-op. Nor why any of these cases have any significance for UB rules.
But this also feels like it's off in the weeds far away from your original point. I'm already familiar with the motivation for the !
type existing in the first place and why Result<Foo, !>
is a thing and all of that; the fact that I provoked you to reproduce all of that background implies I've completely failed to say anything comprehensible, so I'm just going to come back to this after I get off work and can focus on re-reading all the posts.
How about: "Type is uninhabited if it's impossible to create a valid instance of the type"? It's important to properly define term because it can open gates for additional optimizations. For example by making struct Foo { a: u32, b: ! }
uninhabited, compiler will be able to eliminate execution paths in which this type is used. To prevent this in generics contexts you will need to use Inhabited
trait bound. Ad-hoc approach can get us only so far, and may create problems in future.
Currently &mut T
to uninitialized memory is considered UB (even in unsafe code) because you can safely create &T
from it. Note that the real source of UB is possible read from uninitialized memory, but higher level rule is used, which is easier to track and enforce. It's matter of language philosophy: do we use higher level, easier to grasp rules to define UB, or finer-grained, lower-level rules which allow more freedom, but in return require significantly more knowledge and attention from programmer?
If we use higher-level approach, then &!
should be considered uninhabited and creating it in executable code will be UB. However *const Void
(in FFI code you usually see it, not &Void
) will be inhabited and safe to work with.
There is no ()
in my code...? And also no ;
that might produce a ()
implicitly.
And I meant the code exactly as written. Why would it be a compiler error? The target type does not matter, we can change this to
fn make_something_up<T>(x: !) -> T { match y {} }
The key is that we have, as input, a value that cannot exist. From that, we can derive a contradiction, which frees us from the obligation to produce a result.
This is the programming language version of "ex falso quodlibet", meaning that from something impossible we can derive anything.
Another way to think about it is: With enum Bool { True, False }
, when we match, we have two possible cases, corresponding to two match arms. So how many possible cases do we get when we match an enum Void {}
? None at all!
Now I am indeed confused, because from there to the match we are discussing here it is just a small step. I am looking forward to your comments after you had some more time to think about this
Then every type is inhabited, just use transmute
.
So this is about optimizations after all? But then you pretty much already bought into my definition: For optimization purposes, the only part we can really rely on is for the program not to have undefined behavior. So the only types we can assume are uninhabited are those where having them would cause undefined behavior instantaneously. This definition will be set once we decide on the basic invariants for every single type that the compiler (NOT the language!) offers.
What we do for match
, and what these basic invariants are, are separate discussions. This thread is mostly about the former. The main interaction is around &
, but it seems more and more likely that e.g. &bool
pointing to 3
is not UB until the data is loaded from memory, so similarly, having &!
is not UB until loaded.
"used" is again a fuzzy term, remember the example from above:
let x : Foo;
x.a = 4;
This is legal safe code (on nightly). Does this "use" the "uninhabited" x
?
I have yet to see a case where Inhabited
would help.
The only reason !
and "uninhabited" is special in the context of match
is that it leads to the situation that you do nothing (i.e., write no match arm) to state something ("this is impossible").
Once we go away from match, once we talk about which assumptions the compiler can make and which data may be "used", !
is not special and inhabitedness is just a red herring. These are important questions, but they are just as important and just as hard for types like bool
.
And this is not even about "uninitialized" data. It is 100% about valid data. For example, with this
#[repr(u32)]
enum Test { T = 1 }
Both mem::zeroed::<Test>()
and mem::zeroed::<!>()
are immediate UB, because in both cases we are creating an invalid value.
You keep bringing up inhabitedness for discussions where it really is not relevant.
No, that's not correct. We do not know if &mut T
to uninitialized memory is UB, but if it is, then (a) it has nothing to do with being initialized (the real question is whether the data in memory is valid; e.g., &mut Test
pointing to all-0 memory would be the same), and (b) it as nothing to do with &T
. Please stop making definite claims about things that are not decided upon yet.
So, let's go back to talk about !
patterns and match, shall we? We have drifted quite far away. And I am not even sure what you are actually proposing we do for match, i.e. whether you think that having a !
pattern to mark a "use" of an uninhabited type is a good idea or not. Judging from the fact that you talk about "uses" being important, I would think yes, but then I am not sure what we are arguing about.
You are missing "valid" keyword.
No, the difference between out definitions, is that you incorporate in you definition current behavior of compiler, which is a no-go in my opinion.
OP post covers larger set of issue than just match
statements, one which is "should we consider &! or not".
I was talking about "initialized value of uninhabited type in executable code is UB". So according to this definition this code is correct, because here we have only partial initialization, so if in future compiler will be able to track partial initializations (IIUC nightly does exactly this) you will be able to use x.a
for reads and writes
Uninhabited non-ZST types should be processed in the same way as !
. For example matching on Result<(), Foo>
should be no different from matching on Result<(), !>
.
No it's about determining if type can have valid values and possible uses of this information for optimizations.
std::mem::uninitialized
docs are pretty definite about it:
Note that it's written in general, even if you work with plain bytes. Initially I had a similar discussion (with opinion close to yours) with @gnzlbg in this issue.
Probably it will be better if moderators will split our discussion into a separate thread.
(This is my last reply in this off-topic part of the discussion.)
Oh, great, another undefined term. Now what is "valid"?
No I am not. Undefined behavior is an abstract concept defined as part of the language semantics, it is not defined by whatever the compiler does currently.
(You had another sentence here that just disappeared. Good, then I will not respond to it.^^)
Ah, finally something where I can see the relation to this discussion. So you are essentially saying one of these two things:
!
as a pattern should apply more generally to uninhabited types, not just specifically to empty enums and!
(which is nothing but a built-in empty enum). I disagree, because empty enums are a very simple concept that is easy to explain; uninhabitedness is much more subtle. Pattern matching is already very complicated, I'd like to avoid stuffing more complication into it.- Auto-never should generate the pattern
Foo { a: _, b: ! }
, letting you leave away match arms. That is a possible extension of auto-never, I do not have a strong opinion here.
I am not sure which of these positions you are arguing for though.
And maybe you are also arguing that self-referential structs like your earlier example should let you leave away match arms. While that has some appeal to it, it's also really complicated with (from what I have seen so far) no benefit. Having an analysis to detect this case, and having to explain this to everyone learning Rust, is a cost only worth paying if we get something out of this.
Pattern matching is, to a large extend, about enum
, so I feel pretty good about restricting its reasoning about uninhabitedness to enums. When teaching this, we should then probably avoid talking about uninhabitedness in general and related the !
pattern directly to empty enums specifically, not uninhabited types in general. I think that makes everything much simpler while getting us almost all of the benefits and staying coherent.
These docs are written extremely conservatively because we do not know the exact rules yet.
But I appreciate that it can be hard to see what is a conservative approximation, and what is definite. The sad part is that almost nothing is definite.
Why not ask what is value or type? As I've said defining inhabitness and incorporating it into language is not trivial. This is why I propose to start from "trivially" inhabited and uninhabited types and gradually expand definition from there.
So then you use in your definition fuzzy term UB. (which is not fully defined for Rust right now) In my construction UB is dependent on definition of uninhabitness, so in some sense both our definitions can be equivalent, just formulated from different ends.
Not quite (but certainly not the first variant), I argue that Err(e) => { .. }
should be the same for compiler for all uninhabited types, be it ZST (!
, Void
) or non-ZST (Foo
). And maybe compiler should be able to reduce non-ZST uninhabited types to effectively ZST. (In other words Result<T, Foo>
will be equivalent to just T
in memory)
Ideally, yes. But to start things off I propose to use auto-derived (in trivial cases) unsafe Inhabited
and Uninhabited
traits. In some corner cases compiler will not be able to perform derive, but it's not a big problem. (though programmer will be able to implement Inhabited
trait by hand) And if type is Uninhabited
for compiler it will be practically equivalent to !
, up to reduction to ZST outside of partial initialization contexts. Thus you will be able to use always_ok
or let Ok(v) = res;
on Result<T, Foo>
. This will aslo allow us to keep pattern matching complexity at bay.
Uninhabitedness is but a detail in the notion of "validity" that UB really cares about. Making uninhabitedness special for UB seems rather ad-hoc to me.
UB should not be fuzzy. It is, currently, but that's something we are working on.
All I really did was relating inhabitendess to UB, but of course that is not a full definition. So a better picture of what I imagine is:
- We have a notion of what makes a value "valid". This definition should be as simple as possible, because it must be in every unsafe programmer's head all the time and we do not want to make their work harder than necessary.
Notice that "safe code can construct the value" does not work as definition; at this point we have not yet defined what it even means to execute any code! So talking about what code does at this point leads to a cyclic definition. - We define what happens when code is executed; in particular, we make it UB to ever read an invalid value: In MIR, whenever we have evaluated an
Operand
and proceed to perform some operation on it (writing it to aPlace
, applying an arithmetic operation, passing it to a function -- whatever), that operand must be valid for the given type. T
is uninhabited if a value has no valid inhabitant and hence it is UB to ever access something of typeT
.
I get the sense we actually mostly agree on this rough structure. We just disagree on what it means for particular types to be valid, as in, step 1. Do you agree?
Specifically, to make your cyclic Foo
invalid would require a very complex analysis of the type that takes into account uniqueness of references, whereas I am proposing a much simpler definition (basically, "must be non-NULL and aligned"). But anyway the concrete definition should not matter much for what we do about match.
I think, yes, we mostly agree with each other. Unihabitness is indeed just a subset of “validity”-related UB. I think we disagree on the following points:
- Do we need traits to describe if type is inhabited or not? I think this not only allow some nice things (see
always_ok
andalways_err
), but also will make life of unsafe programmers a bit easier, as they will be able to slap trait bound and forget about uninhabited types corner case. - How “trivial” uninhabited non-ZSTs (e.g. structs or tuples with
!
element) should be handled? I believe that outside of partially initialized contexts they should be reduced to essentially!
. (currently Rust does not perform this optimization) - To consider having value of
&!
UB or not. I have a conservative stance and think that it will be easier to require&
and&mut
to point only to valid values even in anunsafe
code. - That ideally Rust should be able to recognize non-trivial uninhabited types and handle them in the same way as “trivial” uninhabited types. (and e.g. also warn user) But I am fine with postponing it as not practical enough.
There's already a topic for that. Please stop hijacking this one.
I'm not debating the usefulness of the !
pattern, I just want to (maybe) see
different rules for auto-never. The post specifically talks about (u32, !)
and how it's not going to be auto-nevered, but it doesn't really explain why.
Do you mean (y, _)
in the top example? Because if not I'm very confused
Assuming you are, consider these examples of allowing (u32, !)
to auto-never:
let mut x: (u32, !);
x.0 = 123;
match x {
// This is a compiler error. We are implicitly matching on x.1 and x.1 has
// not been initialized.
}
let mux x: Uninit<(u32, !)> = Uninit::uninit();
unsafe { x.value.0 = 123; }
match x.value {
// This is a compiler error. We are taking the value of a union member
// outside of an unsafe block
}
let mux x: Uninit<(u32, !)> = Uninit::uninit();
unsafe {
x.value.0 = 123;
match x.value {
// This is a compiler warning. We are auto-nevering a (u32, !) and
// auto-nevering is always a warning inside an unsafe block
}
}
In all these cases, the compiler does tell you when you're missing branches.
Is there any situation at all where auto-nevering (u32, !)
can be a footgun? I don't see how there can be since, in safe code any !
that the compiler thinks is initialized really must be initialized, and in unsafe code using auto-never is a warning.
If I have
let x: Uninit<!> = ...;
... // do some stuff
match x.value {
!, // assert that my ! has been initialized
}
and I refactor to
let x: Uninit<(u32, !)> = ...;
... // do some stuff
match x.value {
!, // assert that my (u32, !) has been initialized
}
Then I'm not introducing any bug that wasn't already there. The !
-match in the old code was already asserting that the !
was initialized.
I'm totally on-board with the !
pattern and the general idea of the auto-never mechanism. Well done for coming up with this .
It just don't see why we should start with an illogical rule that the blog post specifically advocates but doesn't justify.