mem::uninitialized, `!` and trap representations

That's what MaybeInitialized is for. If you are working with fire, at least put a warning sign.

Both of these examples WILL be miscompiled if, say, T = &u32: dereferenceable + undef = trouble.

This is, of course, totally fine, as long as mem::uninitialized is legal. The trap representation is never loaded in any way.

Of course the compiler can and does do LICM - it's not like your computer literally explodes when you load a trap representation. Even at the source-code level, you can call ptr::read(data as *const MaybeInitialized<T>).
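A sketch of that source-level read, assuming std's `MaybeUninit<T>` in place of the thread's `MaybeInitialized<T>`. They play the same role, and `MaybeUninit<T>` is documented to have the same size and alignment as `T`. `copy_maybe` is a hypothetical name; the point is that the bytes are copied without ever being used at type `T`:

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Hypothetical helper: copies whatever is behind `data` without asserting
/// that it is a valid `T`.
///
/// SAFETY (caller): `data` must be non-null, aligned and readable for
/// `size_of::<T>()` bytes.
unsafe fn copy_maybe<T>(data: *const T) -> MaybeUninit<T> {
    // MaybeUninit<T> has the same size and alignment as T, so the cast is fine,
    // and a MaybeUninit<T> places no validity requirement on the bytes read.
    unsafe { ptr::read(data as *const MaybeUninit<T>) }
}

fn main() {
    let x: &'static u32 = &7;
    let copy = unsafe { copy_maybe(&x as *const &u32) };
    // SAFETY: the source really did hold a valid `&u32`.
    let r: &u32 = unsafe { copy.assume_init() };
    assert_eq!(*r, 7);
}
```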

Since basically forever, the rule in Rust is that you can't have invalid values (a boolean with the value 42), but can have references to invalid values (a &bool that points to the number 42) as long as you don't dereference them. This seems to give the clearest and most useful semantics (specifying invalid values seems to require extra effort, and forbidding invalid pointees seems to require extra effort).
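To illustrate the distinction with something uncontroversial, here is a sketch that uses a raw `*const bool` rather than a `&bool`, since whether a reference may point at invalid data is exactly what is being debated here. `raw_byte` is a hypothetical helper: holding the pointer is fine, and inspecting the byte as a `u8` never creates an invalid `bool`:

```rust
/// Hypothetical helper: returns the raw byte behind a `*const bool`
/// without ever reading it as a `bool`.
///
/// SAFETY (caller): `p` must be non-null, aligned and point to one readable byte.
unsafe fn raw_byte(p: *const bool) -> u8 {
    // Reading through `*const u8` is fine for any bit pattern;
    // `*p` (a read at type `bool`) would be UB if the byte is 42.
    unsafe { *(p as *const u8) }
}

fn main() {
    let byte: u8 = 42;
    // Holding a pointer to an invalid `bool` is harmless on its own.
    let p = &byte as *const u8 as *const bool;
    let b = unsafe { raw_byte(p) };
    assert_eq!(b, 42);
    // Only 0 and 1 are valid `bool` representations.
    assert!(b > 1);
}
```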

I don't see any good reason for special treatment of uninhabited types in this context. They are 0-sized types with 1 trap representation and 0 non-trap representations.
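As a quick check of the "0-sized, 0 non-trap representations" description, a sketch with a user-defined empty enum (`Never`, a hypothetical name) and std's `Infallible`, which is itself an empty enum:

```rust
use std::convert::Infallible;
use std::mem::size_of;

/// A user-defined uninhabited type: no variants, so no valid values.
enum Never {}

/// With zero valid values, a `Never` can be turned into anything:
/// the empty match has no arms because there is nothing to match.
fn absurd<T>(n: Never) -> T {
    match n {}
}

fn main() {
    // Both are zero-sized.
    assert_eq!(size_of::<Never>(), 0);
    assert_eq!(size_of::<Infallible>(), 0);
    // `absurd` can be named but never called, since no `Never` exists.
    let _f: fn(Never) -> u32 = absurd::<u32>;
}
```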

The problem is that an &StructWithPadding can have arbitrary bit-patterns in its padding (that is, unless we place some type-based restriction on the padding's content, like #[single_repr] does), and copying it to an &mut StructWithPadding must be allowed to use memcpy, which would copy these arbitrary bit patterns into your enum's discriminant.
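To make the padding concrete, a sketch of one possible `StructWithPadding` (the field names are hypothetical), assuming `u32` has 4-byte alignment as it does on common targets:

```rust
use std::mem::{align_of, size_of};

/// `#[repr(C)]` fixes the layout: 1 byte of `tag`, 3 bytes of padding,
/// then a 4-byte `value` (assuming 4-byte alignment for `u32`).
#[repr(C)]
#[derive(Clone, Copy)]
struct StructWithPadding {
    tag: u8,
    value: u32,
}

fn main() {
    // `tag` at offset 0, `value` aligned to 4 => offset 4, total size 8.
    assert_eq!(align_of::<StructWithPadding>(), 4);
    assert_eq!(size_of::<StructWithPadding>(), 8);
    let padding = size_of::<StructWithPadding>() - size_of::<u8>() - size_of::<u32>();
    // Those 3 bytes have no defined contents; a whole-struct copy
    // (e.g. a memcpy of all 8 bytes) may carry arbitrary bits along.
    assert_eq!(padding, 3);
    let a = StructWithPadding { tag: 1, value: 2 };
    let b = a; // a bitwise copy of the whole struct, padding included
    assert_eq!((b.tag, b.value), (1, 2));
}
```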

And in any case, the reason I don't like the "closed" definition for MaybeUninitialized is because it has the effect of "basically works, but corrupts your data in some edge cases for no particularly good reason".

Do you have a source for that claim? The reference says that dangling references are not allowed. It does not go into detail as to what constitutes dangling, but I'm sure it's stricter than "contains the address of an allocated section of memory."

&!, as far as my limited analysis goes, is either always dangling or never is. "Always dangling" makes more sense, because we already have a perfectly good never-dangling reference called &().

Don't give much weight to the reference - it was never properly fact-checked, and that section specifically is pretty random. In any case, dangling references are references that don't refer to any valid allocation. The data behind these references has nothing to do with it.

Because ! is zero-sized, &! is indeed never dangling. That's a simple consequence.

In any case, dangling references are references that don't refer to any valid allocation

Why is it useful to bring allocation into the definition? I would have thought a dangling reference is any reference that no longer points to valid data. If the memory behind a reference got freed and then something else got allocated at the same address, does that make the reference no longer dangling?

No safe code can produce a &bool which points to 42. So surely the compiler can always assume that a &bool doesn't point to 42 (not that it can tell without dereferencing it). In the case of &! though, it can tell without dereferencing that the reference is dangling.

Very interesting thread so far. I think this is an interesting test case for the question of "to what extent can we break existing code if there is a good reason".

It seems inarguable that, if we have !, the most overall consistent semantics is to say that functions cannot return !, and to migrate people so that they use unions in place of mem::uninitialized<!>. I would imagine we might want a targeted lint saying that calling uninitialized for some type T that may not be instantiable (I would think these same concerns apply to empty enums) is deprecated, and to prefer unions. (The idea here was to try to avoid tagging too many projects that are using uninitialized as a poor man's out pointer.) The fact that uninitialized breaks in a loud way (panic) is a very good thing here, I would think. In any case, it seems like we would also want some kind of warning period; in general I don't feel like we have the "warning-period-then-deprecate" rhythm working especially smoothly right now.

OTOH, I think that @arielb1's original thoughts have a certain appeal as well. It seems to fit into the more "access-based" way of thinking. It certainly validates a lot of code that, at first glance, seems quite reasonable to me. My biggest concern is that while it makes some set of code work, it also makes other very reasonable patterns, like those cited by @gereeter, illegal. I'm not sure if this is a win.

That definitely can't be right, because there's lots of code that does the 0x01 as *const () as &() transformation. Vec, for example, does it. There's no allocation at address 1.
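A sketch of that transformation, using std's `NonNull::dangling()` to get a non-null, well-aligned address instead of spelling out `0x01` by hand; `unit_ref` is a hypothetical helper:

```rust
use std::ptr::NonNull;

/// Hypothetical helper: produces a `&()` from a dangling address,
/// the same kind of trick described above for `Vec`.
fn unit_ref() -> &'static () {
    // `NonNull::dangling()` is non-null and aligned for `()`; for a
    // zero-sized type that is all the reference needs, since no bytes are read.
    let p: NonNull<()> = NonNull::dangling();
    // SAFETY: a zero-sized referent needs no allocated bytes behind the address.
    unsafe { &*p.as_ptr() }
}

fn main() {
    let r = unit_ref();
    assert_eq!(*r, ());
    // The address is not null even though nothing was allocated there.
    assert!(!(r as *const ()).is_null());
}
```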

Doesn't every pointer point to a valid zero sized allocation?

I thought ! is -INF sized. Which makes any talks of the dangliness of &! irrelevant, because you cannot have a value of type &! ever.

@canndrew probably has more info on &!

Not according to jemalloc.

Not according to size_of::<!>().

jemalloc allocates somewhere and yields specific pointers; the question is, when you do have a pointer, whether that pointer is a legal pointer to a ZST allocation.

imho that should be a compile-time error (+ it's not too late to change that).

enum Never { }

size_of::<Never>()

should return the same result and it is too late to change that


The address 1 contains a valid 0-sized allocation. Probably every non-null address contains a valid 0-sized allocation (we still haven't decided on that).

mem::uninitialized::<&u8> has exactly the same conceptual problems as mem::uninitialized::<!>. The most overall consistent semantics is that mem::uninitialized is removed, and people have to use unions.
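What "use unions" could look like, as a sketch: a hypothetical `MaybeUninitSketch<T>` with an empty `uninit` field, so the union can exist without a valid `T`. It mirrors the shape std's `MaybeUninit<T>` ended up with, but it is an illustration, not that implementation:

```rust
use std::mem::ManuallyDrop;

/// Hypothetical union-based stand-in for `MaybeUninit<T>`.
/// The `uninit` field lets the union exist without holding a valid `T`.
union MaybeUninitSketch<T> {
    uninit: (),
    value: ManuallyDrop<T>,
}

impl<T> MaybeUninitSketch<T> {
    fn uninit() -> Self {
        MaybeUninitSketch { uninit: () }
    }

    fn write(&mut self, v: T) {
        // Assigning a `ManuallyDrop` union field is safe and never drops old contents.
        self.value = ManuallyDrop::new(v);
    }

    /// SAFETY (caller): `write` must have been called first.
    unsafe fn assume_init(self) -> T {
        ManuallyDrop::into_inner(unsafe { self.value })
    }
}

fn main() {
    let byte = 5u8;
    // Unlike `mem::uninitialized::<&u8>()`, no invalid `&u8` is ever produced:
    // until `write`, the union simply holds its `uninit: ()` field.
    let mut slot: MaybeUninitSketch<&u8> = MaybeUninitSketch::uninit();
    slot.write(&byte);
    // SAFETY: `write` was called above.
    let r = unsafe { slot.assume_init() };
    assert_eq!(*r, 5);
}
```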

It does not make any code illegal that wasn't already so. The SmallVec example was already playing with fire.

Does "as long as you don't dereference them" mean "not at all in the source code" or "not in any code that gets executed"? Because optimisations like LICM can make "code that gets executed" tricky to define and I suspect it's possible to construct an example where a reference to invalid values causes bad things to happen under optimisation.

Isn't it fairer to say that the code miscompiles rather than is illegal (since defining illegality is the hopeful eventual outcome of discussions like this?).

It means "not in any code that gets executed", as in all UB discussions. That's not such a big constraint on the optimizer: if it speculatively executes a load, it just has to not speculatively execute the "UB on invalid data" part.

What would be interesting to allow is always-safe indirect reads (*****x), but that is something we really are not sure about.

Sure. But it is orthogonal to the ! case that forced this discussion.

@ubsan

Could you elaborate on your intended semantics for unions?

@arielb1 and I were chatting about this today. I think we came to a few conclusions.

The key question comes down to: what constitutes "using" a value?

I think everyone agrees that "using" an invalid value should be illegal. As @arielb1 pointed out in his initial post, this applies to values of type !, but also to values of almost any type that has illegal values (e.g., &T, Box<T>, etc.). One critical point is whether returning a value counts as using it, and in particular whether mem::uninitialized deserves special status.

One might imagine a predicate like VALID(a: T) that says "the memory at address a can be typed as T". (Just bear with me for a bit.)
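To make the predicate concrete, a toy model (all names hypothetical) that asks whether a byte string can be typed as `T`. It also reflects the earlier point about uninhabited types: `()` has zero bytes and one valid representation, while an empty enum has zero bytes and none:

```rust
/// Hypothetical byte-level model of VALID(a: T):
/// "can these bytes be typed as T?"
trait Valid {
    fn valid(bytes: &[u8]) -> bool;
}

impl Valid for u8 {
    // Every bit pattern of a byte is a valid u8.
    fn valid(bytes: &[u8]) -> bool {
        bytes.len() == 1
    }
}

impl Valid for bool {
    // Only 0 and 1; the byte 42 is a trap representation.
    fn valid(bytes: &[u8]) -> bool {
        bytes.len() == 1 && bytes[0] <= 1
    }
}

impl Valid for () {
    // Zero bytes, exactly one valid representation.
    fn valid(bytes: &[u8]) -> bool {
        bytes.is_empty()
    }
}

/// Uninhabited: zero bytes, but no valid representation at all.
enum Never {}

impl Valid for Never {
    fn valid(_bytes: &[u8]) -> bool {
        false
    }
}

fn main() {
    assert!(<u8 as Valid>::valid(&[42]));
    assert!(<bool as Valid>::valid(&[1]));
    assert!(!<bool as Valid>::valid(&[42]));
    assert!(<() as Valid>::valid(&[]));
    assert!(!<Never as Valid>::valid(&[]));
}
```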

What is using the referent of a reference &T?

Another key question: under what circumstances is a &T required to point to memory of type T? This is another question that is really bigger than !. It seems clear that if an &T must have a valid referent and T = !, then this code should not be reachable, or something "UB-like" has happened.

So, for example, I think that in almost any model, if you read the referent of an &T, then the referent must be valid (i.e., let x = *r or let x = ptr::read(r)). Similarly, if you assign to the referent of a &mut T, the old value will be dropped, and hence the same is true for *r = ... (ptr::write is different in this respect, of course). But models will vary in terms of when a &T which is not dereferenced must be valid. For example, the memory it points at may have been freed. (We may want some rules around fn entry and exit, for example, as I talked about in this blog post.)
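The difference between `*r = ...` and `ptr::write` can be observed with a drop counter. A sketch with a hypothetical `Noisy` type: the assignment drops (and therefore uses) the old referent, while `ptr::write` leaves the old value alone and simply leaks it:

```rust
use std::cell::Cell;
use std::ptr;

/// Hypothetical type that counts how many times it is dropped.
struct Noisy<'a>(&'a Cell<u32>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Returns the drop count after an assignment, after a `ptr::write`,
/// and after finally dropping the slot.
fn count_drops() -> [u32; 3] {
    let drops = Cell::new(0);
    let mut slot = Noisy(&drops);

    // `*r = ...` drops the old referent, so it must be a valid value.
    let r = &mut slot;
    *r = Noisy(&drops);
    let after_assign = drops.get();

    // `ptr::write` overwrites without dropping (leaking) the old value.
    // SAFETY: `&mut slot` is a valid, aligned destination.
    unsafe { ptr::write(&mut slot, Noisy(&drops)) };
    let after_write = drops.get();

    drop(slot);
    [after_assign, after_write, drops.get()]
}

fn main() {
    assert_eq!(count_drops(), [1, 1, 2]);
}
```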

But I think the logic of "T is uninhabited, therefore &T is uninhabited" is not really valid unless we take a strict position that the referent of an &T must always be valid. I suspect we want rules that are quite a bit looser.

IOW, VALID(a: &T) would not necessarily imply that VALID(*a: T), though it would presumably require that "a is not null". Instead, VALID(*a: T) would only be required at some other times, such as when the pointer is dereferenced.

What about returning a value?

In general it seems like returning a value does require that the value be valid. The key question then is whether to exempt mem::uninitialized from this requirement. Currently we do not. But that makes mem::uninitialized unsuitable for almost any type, as @arielb1 pointed out; certainly for any type that is not a simple scalar like u8 with no illegal values. E.g., mem::uninitialized::<&u8>() is invalid because the returned value x does not (necessarily) satisfy "x is not null", and yet it was returned.
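One concrete way the "x is not null" requirement is already relied on: `Option<&u8>` represents `None` with the null bit pattern, so a null `&u8` could be observed as `None`. A sketch of the layout facts involved, as they hold on current compilers:

```rust
use std::mem::size_of;

fn main() {
    // `&u8` is never null, so `Option<&u8>` can use null to mean `None`
    // and needs no separate tag: it is exactly pointer-sized.
    assert_eq!(size_of::<Option<&u8>>(), size_of::<&u8>());
    // `u8` has no invalid bit patterns to spare, so `Option<u8>`
    // needs an extra tag byte.
    assert_eq!(size_of::<Option<u8>>(), 2);
}
```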

So what are our options?

  1. We could deprecate uninitialized, or at least deprecate it for types that we cannot statically see are reasonable. This would probably be some ad-hoc rules much like transmute. I have no idea how much code would be affected, but probably a non-trivial amount. Said code would want to be rewritten to use unions, or at least a MaybeInitialized type in libstd (which is implemented with unions).
  2. We could special-case uninitialized, as @arielb1 initially proposed. This means that returning a value from uninitialized is not considered a "use". It does mean that a trivial wrapper is impossible and so forth.
  3. We could also do both. =) This would preserve existing code while encouraging people to move off of uninitialized and onto more future-proof and well-behaved things, like a MaybeUninitialized type.
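For the "poor man's out pointer" use of uninitialized mentioned earlier, a sketch of the union-backed replacement using std's `MaybeUninit` (the stabilized form of the type discussed here). `fill_answer` and `answer` are hypothetical stand-ins for an FFI-style initializer and its caller:

```rust
use std::mem::MaybeUninit;

/// Hypothetical low-level initializer that reports its result through an out pointer.
///
/// SAFETY (caller): `out` must be valid for writes and properly aligned.
unsafe fn fill_answer(out: *mut u64) {
    unsafe { out.write(42) };
}

/// Old idiom, roughly: `let mut x: u64 = mem::uninitialized(); fill_answer(&mut x);`
/// Here the "not yet initialized" state is carried by the type instead.
fn answer() -> u64 {
    let mut slot = MaybeUninit::<u64>::uninit();
    // SAFETY: `as_mut_ptr` gives a valid, aligned pointer into the slot.
    unsafe { fill_answer(slot.as_mut_ptr()) };
    // SAFETY: `fill_answer` always initializes `*out`.
    unsafe { slot.assume_init() }
}

fn main() {
    assert_eq!(answer(), 42);
}
```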

In practical terms, the difference between 1 and 3 is that if we only do 1, then uninitialized::<!>() will still panic, whereas under options 2 or 3 it would not.

I would argue that at minimum we should pursue a MaybeUninitialized type in libstd based on unions and deprecate uninitialized. I am not yet sure whether we can get away without special-casing it, but it would be nice if we could. As ! is not yet stable, I guess we have some time to deliberate? (As @arielb1 pointed out, this applies more broadly, but the problem seems most acute for !.)

Sounds good, glad we are trying to avoid the special case. On the flip side, when ! lands, it would be a good time to stabilize an unreachable: ! function, as one can already write it in stable (albeit deprecated) code.
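For what it's worth, std later stabilized such a function as `std::hint::unreachable_unchecked`, an `unsafe fn() -> !`. A sketch of typical use, with a hypothetical `residue_name`:

```rust
use std::hint::unreachable_unchecked;

/// Hypothetical example: names the residue of `n` modulo 3.
fn residue_name(n: u32) -> &'static str {
    match n % 3 {
        0 => "zero",
        1 => "one",
        2 => "two",
        // SAFETY: `n % 3` is always 0, 1 or 2, so this arm never runs.
        // If it did, the behavior would be undefined (no panic, no check).
        _ => unsafe { unreachable_unchecked() },
    }
}

fn main() {
    assert_eq!(residue_name(7), "one");
    assert_eq!(residue_name(9), "zero");
}
```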

If someone wanted to implement a branch that checks for calls to uninitialized whose types are not known to be scalar or otherwise safe, it'd be great to do some crater runs and try to estimate the impact thereof.


Upon further thought, the one downside, I guess, is that most uses of uninitialized are static, so if we ever get the sort of stuff I propose in Stateful Mir, we'll have a double churn of T to MaybeInit<T> and back to T, which is fine but annoying.


MaybeUninitialized<T> would also presumably serve as the NoDrop<T> that some have asked for.
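A sketch of that property with `std::mem::MaybeUninit` (std also has `std::mem::ManuallyDrop` as the dedicated "don't drop this" wrapper): wrapping a value suppresses its destructor. `Noisy` and `drops_observed` are hypothetical:

```rust
use std::cell::Cell;
use std::mem::MaybeUninit;

/// Hypothetical type that counts how many times it is dropped.
struct Noisy<'a>(&'a Cell<u32>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Returns the drop count after a wrapped value leaves scope,
/// then after a plain value leaves scope.
fn drops_observed() -> (u32, u32) {
    let drops = Cell::new(0);
    {
        // MaybeUninit never runs the destructor of its contents,
        // even when they are initialized.
        let _wrapped = MaybeUninit::new(Noisy(&drops));
    }
    let after_wrapped = drops.get();
    {
        // A plain value is dropped at the end of its scope as usual.
        let _plain = Noisy(&drops);
    }
    (after_wrapped, drops.get())
}

fn main() {
    assert_eq!(drops_observed(), (0, 1));
}
```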