mem::uninitialized, `!` and trap representations

Why deprecated?

unsafe fn unreachable() -> ! {
    std::mem::transmute::<_, !>(());
}

Are you including returning composite types with uninitialized values in some of their fields here? (But the composite value was otherwise normally constructed).

Sure. A null (&u32, bool) is just as bad as a null &u32.

Gah! My "!::Size = -inf" gut instinct makes me forget that.

I don’t understand why returning a value would constitute accessing it. If you have a struct, and you encapsulate its field accesses behind a safe API, why would just moving it be an access? That seems a long way from where Rust de facto is today.

Because being able to assume that a pointer is not dangling is extremely useful. It allows us to turn this:

fn a() -> &T { bla bla bla }
fn b() {
    let n = a();
    c(); // remember, this function might panic or abort or something
    let p = *n;
}

into this:

fn a() -> &T { bla bla bla }
fn b() {
    let p = *a();
    c();
}

We don’t need to be able to make assumptions like that inside a function because we can just know.
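A minimal runnable sketch of the two orderings above, assuming a hypothetical a() that simply hands back a reference it was given and a c() that does nothing. The names mirror the snippet; the bodies are invented for illustration. The two versions of b are interchangeable only because a &u32 may be assumed to point at readable memory from the moment it is returned:

```rust
/// Hypothetical stand-in for `a()`: returns a reference that is valid by construction.
fn a(slot: &u32) -> &u32 {
    slot
}

/// Hypothetical stand-in for `c()`: in real code this might panic or abort.
fn c() {}

/// Original ordering: the load through `n` happens after the call to `c()`.
fn b_original(slot: &u32) -> u32 {
    let n = a(slot);
    c();
    *n
}

/// Reordered version: the load is hoisted above `c()`. This is equivalent
/// only if the reference returned by `a()` is always safe to read, even on
/// executions where `c()` would never have returned.
fn b_hoisted(slot: &u32) -> u32 {
    let p = *a(slot);
    c();
    p
}
```

If a() were allowed to return a dangling reference, b_hoisted could fault on an execution where b_original would have panicked cleanly inside c().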

Hmm. So this is an interesting point, but it (like &T) seems somewhat separable. That is, we can debate when we ought to recurse into the fields of structs and apply validity predicates there. I guess what you are suggesting is that a field would be required to be valid only when it is actually used?

Yes, only valid when it’s used. However, the user is not entirely in control of that, since enum layout optimization can turn an access to something else into an access to fields in your struct. So that would have to be exempted somehow.


Now I’ll try to make this more concrete by reimplementing “arrayvec” using the ideas in this thread.

The implementation is here (playground).

ArrayVector<[String; 8]> is there really implemented as ArrayVector { array: [MaybeUninit<String>; 8], len: usize } and I think that was what you wanted us to do. The “HKT” part of associating [T; 8] → [MaybeUninit<T>; 8] seems to be no harder than what the existing code does for “generic” arrays.

My question is the part about initializing part of an array, lines 85-101. There is no expression that allows us to create [MaybeUninit<T>; N] for arbitrary N, so we do as usual, create an uninitialized value and write into part of it. Then reinterpret it as [MaybeUninit<T>; N]. Is that valid? Is it an improvement over using mem::uninitialized() on line 91 (indicated below)?

fn make_array<A, I>(iter: I) -> ArrayVector<A>
    where A: ArrayExt<Item=I::Item>,
          I: IntoIterator,
{
    // Create an uninitialized array.
    // This is a MaybeUninit<[MaybeUninit<T>; N]> where T is the element type (I::Item)
    let mut array: MaybeUninit<A::RawArray> = MaybeUninit { empty: () };   /* LINE 91 */
    let mut len = 0;
    unsafe {
        // write into the uninitialized space
        for (element, slot) in iter.into_iter().zip(array.value.as_slice_mut()) {
            ptr::write(slot, MaybeUninit { value: element });
            len += 1;
        }
        ArrayVector {
            array: array.value,
            len: len,
        }
    }
}

Sure enough. array.value.as_slice_mut() takes array.value by mutable reference. So you get a (valid) mutable reference to each of the uninitialized slots without ever reading array.value as a value, and write a value to it.

Assuming no ABI issues, you end up with the memory represented like a correctly-partially-initialized array, which the read of array.value transforms into a value.
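For comparison, here is a minimal sketch of the same partial-initialization pattern written against std::mem::MaybeUninit and const generics, both of which were stabilized after this thread. The ArrayVector type, the from_iter_partial name, and the as_slice helper are hypothetical and are not the playground implementation discussed above:

```rust
use std::mem::MaybeUninit;

/// Hypothetical fixed-capacity vector: only `array[..len]` is initialized.
pub struct ArrayVector<T, const N: usize> {
    array: [MaybeUninit<T>; N],
    len: usize,
}

impl<T, const N: usize> ArrayVector<T, N> {
    /// Fills at most `N` slots from `iter`; the remaining slots stay uninitialized.
    pub fn from_iter_partial<I: IntoIterator<Item = T>>(iter: I) -> Self {
        // SAFETY: an array of `MaybeUninit<T>` places no validity requirement
        // on its elements, so the outer `assume_init` does not claim that any
        // `T` has been initialized.
        let mut array: [MaybeUninit<T>; N] =
            unsafe { MaybeUninit::<[MaybeUninit<T>; N]>::uninit().assume_init() };
        let mut len = 0;
        // Slots come first in the zip, so no element is pulled from `iter`
        // once the array is full.
        for (slot, element) in array.iter_mut().zip(iter) {
            slot.write(element);
            len += 1;
        }
        ArrayVector { array, len }
    }

    /// Views the initialized prefix as an ordinary slice.
    pub fn as_slice(&self) -> &[T] {
        // SAFETY: the first `len` slots were written in `from_iter_partial`,
        // and `MaybeUninit<T>` has the same layout as `T`.
        unsafe { std::slice::from_raw_parts(self.array.as_ptr() as *const T, self.len) }
    }
}

impl<T, const N: usize> Drop for ArrayVector<T, N> {
    fn drop(&mut self) {
        for slot in &mut self.array[..self.len] {
            // SAFETY: exactly the first `len` slots are initialized.
            unsafe { slot.assume_init_drop() };
        }
    }
}
```

As in the snippet above, the uninitialized slots are only ever reached through references to MaybeUninit slots, and no uninitialized T is read as a value.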

In your example, there is an explicit access to the value (*n), so it’s not the mere fact of returning it.

The compiler doesn’t necessarily know if it will get that far. Someone could write code so that c() panics whenever a() returns an invalid pointer, and be very surprised when the program SEGVs or worse because the read got moved over the function call.

mem::uninitialized::<&u8> has exactly the same conceptual problems as mem::uninitialized::<!>.

Not really. The compiler can't assume that a value of type &u8 is invalid. It can't make optimizations based on any assumptions about a &u8 other than it being non-NULL. It also can't make decisions about how to structure the program's control-flow-graph based on the mere presence of a &u8. But a &! can never be valid. So the compiler can assume that any code after a line of code that produces a &! is unreachable.

To put this another way: A value of an inhabited type may or may not be valid - we don't know until we "use" it at which point the compiler is allowed to assume it's valid and invoke UB otherwise. But uninhabited types are a special case because we don't even have to use them and we already know that they're not valid. Just having a value in scope means that someone called some dodgy unsafe code which is not observationally equivalent to any safe code. This should be enough to allow invoking UB.
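A small sketch of the "having a value in scope already means the code is unreachable" idea, using an empty enum as a stable-Rust stand-in for `!`. The names Void, absurd, absurd_ref and unwrap_infallible are hypothetical and chosen only for illustration:

```rust
/// Hypothetical stand-in for `!`: an enum with no variants has no values.
pub enum Void {}

/// A match with zero arms is accepted because no `Void` value can exist,
/// so this function can never actually be entered.
pub fn absurd(v: Void) -> ! {
    match v {}
}

/// The same reasoning through a reference: matching on the pointee `*r`,
/// whose type is `Void`, again needs no arms.
pub fn absurd_ref(r: &Void) -> ! {
    match *r {}
}

/// An error arm that carries a `Void` is statically dead code.
pub fn unwrap_infallible(r: Result<u32, Void>) -> u32 {
    match r {
        Ok(x) => x,
        Err(v) => absurd(v),
    }
}
```

Safe code can only ever take the Ok path here. Reaching absurd or absurd_ref would require unsafe code to have conjured a Void, which is the situation described above.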

But models will vary in terms of when a &T which is not dereferenced must be valid...

What's the value in taking a subtle approach to this? "! is uninhabited therefore &! is uninhabited" seems the most straight-forward approach. As pointed out, if we have a &! then we know for certain that something UB-like has happened, just like if we have a !. And being able to recognize that something UB-like has happened is what allows the compiler to interpret -> ! as "doesn't return". Can someone explain or give an example where treating &! and ! differently in this regard is useful?

Certainly any type that is not a simple scalar like u8 with no illegal values. e.g., mem::uninitialized::<&u8>() is invalid because the returned value x does not (necessarily) satisfy "x is not null", and yet it was returned. [..] We could deprecate uninitialized -- or at least deprecate it for types that we cannot statically see are reasonable.

Agreed that this is a good long-term option. But for now I think at the very least we should make uninitialized panic when given any known-uninhabited type. Producing definitely-invalid values is even more broken than producing maybe-invalid-until-you-look-at-them values.

I don't want to play this game. The rules for whether a type can be "safely" inhabited can get hairy when you have things like associated types and existentials. I would prefer to not involve them in the definition of what is defined.

Also, I can't find any good reason to have that rule. You don't want to have UB without a reason.

Sure enough. mem::uninitialized::<!>() should be a trap rather than an unreachable. With it being UB, translating it to a trap is fine.

But it can assume that a mem::uninitialized::<&u8> always returns an invalid &u8 (0, perhaps). uninitialized isn't like rand, where it's specified to return a random value. Its return value is simply not specified, and it's allowed to be invalid if that's what's most convenient to the compiler.

This is in fact what Ada (the GNAT flavour) does in debug mode: it initializes all uninitialized values with an explicit invalid value and inserts runtime checks at various use sites.

I mentioned this discussion in this thread, which discusses the impact of a recent change in exhaustiveness checking – it seems somewhat at odds with what we were discussing here.
