Make invalid primitive values that are not read valid


#1

Currently the reference states under “Behaviour Considered Undefined”:

Invalid values in primitive types, even in private fields/locals:

  • Dangling/null references or boxes
  • A value other than false (0) or true (1) in a bool
  • A discriminant in an enum not included in the type definition
  • A value in a char which is a surrogate or above char::MAX
  • Non-UTF-8 byte sequences in a str

However, I would like to argue that it's valid to do any of these things as long as the value is never observed. That is, each of these should be equivalent to mem::uninitialized(). This is simply a matter of consistency: there does not seem to be any fundamental difference between

  • let x: bool = mem::transmute(3);
  • let x: bool = mem::uninitialized();
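For comparison, here is a minimal sketch in today's Rust, which later added std::mem::MaybeUninit. A MaybeUninit<bool> may hold any byte, including 3, and only assume_init asserts that the contents are a valid bool. The helper name bool_from_raw_byte is hypothetical.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: stores a raw byte in a `MaybeUninit<bool>` slot and
/// only treats it as a `bool` after checking that it is 0 or 1.
fn bool_from_raw_byte(byte: u8) -> Option<bool> {
    let mut slot = MaybeUninit::<bool>::uninit();
    // Writing any byte is allowed: a `MaybeUninit<bool>` may hold bit patterns
    // that are invalid for `bool`, as long as nobody reads it as a `bool`.
    unsafe { slot.as_mut_ptr().cast::<u8>().write(byte) };
    if byte <= 1 {
        // SAFETY: 0 and 1 are the only valid `bool` representations.
        Some(unsafe { slot.assume_init() })
    } else {
        // The invalid byte is never observed as a `bool`.
        None
    }
}
```

The invalid value 3 sits in memory but is never observed as a bool, which is roughly the "valid as long as it's not observed" semantics argued for above, made explicit in the type.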

#2

Once upon a time you could observe all private fields with {:?} formatting and the compiler-inserted introspection it used. I assume that was one reason for requiring private fields to hold valid values. Is there anything else in current Rust that still relies on it, though?


#3

Oh, and of course I discovered that observation might happen where the user does not expect it:

Some(mem::uninitialized::<&T>())

Reading the enum tag is the same as reading the "inner" value, because of the enum layout optimization (None is represented by the null pointer). Consider which struct and enum layout optimizations would still be valid if we lifted the private-fields restriction.
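A quick way to see the layout optimization at work: the standard library documents that Option<&T> and Option<Box<T>> have the same size as the pointer itself, so there is no separate tag to inspect and checking Some vs None must read the pointer. The helper sizes is a hypothetical name used only for this sketch.

```rust
use std::mem::size_of;

/// Hypothetical helper: returns `(size_of::<T>(), size_of::<Option<T>>())`.
fn sizes<T>() -> (usize, usize) {
    (size_of::<T>(), size_of::<Option<T>>())
}
```

For &u32 and Box<u64> the two sizes are equal, because None reuses the null bit pattern. A plain u32 has no spare bit pattern, so Option<u32> needs extra space for a tag.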

For a more vivid picture, imagine the Option is in one crate's code, and the uninitialized &T is nested inside a struct in another crate's code. This means that pointers must never be uninitialized (?).

How does arrayvec avoid this mess? It uses #[repr(u8)] enum NoDrop<T> { A(T), B }, and repr(u8) fortunately acts as a barrier to layout optimization between the inside and the outside of NoDrop<T> (which I tried to document).
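A small sketch of the effect, using an enum with the same shape as the NoDrop described above (not arrayvec's actual code). With repr(u8) the enum gets an explicit u8 tag, so NoDrop<&u32> is larger than &u32; the Plain comparison enum, which has no repr, is niche-optimized to pointer size in current rustc, though that particular layout is not a documented language guarantee.

```rust
use std::mem::size_of;

/// Same shape as the `NoDrop` enum described above; `repr(u8)` forces an
/// explicit tag byte instead of reusing a niche in `T`.
#[allow(dead_code)]
#[repr(u8)]
enum NoDrop<T> {
    A(T),
    B,
}

/// Identical enum without `repr(u8)`, for comparison.
#[allow(dead_code)]
enum Plain<T> {
    A(T),
    B,
}

/// Returns `(size of NoDrop<&u32>, size of Plain<&u32>, size of &u32)`.
fn layout_sizes() -> (usize, usize, usize) {
    (
        size_of::<NoDrop<&u32>>(),
        size_of::<Plain<&u32>>(),
        size_of::<&u32>(),
    )
}
```

Because the tag lives in its own byte, reading it never touches the (possibly uninitialized) T inside.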

Since both arrayvec and servo's smallvec need an uninitialized (or zeroed, which has the same problem) interior, it would be nice to solve this officially. There are some proposals in that direction in the manually drop RFC.
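For reference, Rust later gained std::mem::ManuallyDrop (from the manually drop RFC) and std::mem::MaybeUninit, and the latter is the sanctioned way to hold an uninitialized interior. Below is a minimal sketch of an arrayvec-style container built on it, assuming Rust 1.63 or newer; FixedVec is a hypothetical type, not arrayvec's or smallvec's actual implementation.

```rust
use std::mem::MaybeUninit;

/// Hypothetical fixed-capacity vector: only `items[..len]` is initialized.
struct FixedVec<T, const N: usize> {
    items: [MaybeUninit<T>; N],
    len: usize,
}

impl<T, const N: usize> FixedVec<T, N> {
    fn new() -> Self {
        // Every slot starts uninitialized; no `T` is read before it is written.
        FixedVec { items: std::array::from_fn(|_| MaybeUninit::uninit()), len: 0 }
    }

    /// Appends `value`, or hands it back if the vector is full.
    fn push(&mut self, value: T) -> Result<(), T> {
        if self.len == N {
            return Err(value);
        }
        self.items[self.len].write(value);
        self.len += 1;
        Ok(())
    }

    fn get(&self, index: usize) -> Option<&T> {
        if index < self.len {
            // SAFETY: slots below `len` were initialized by `push`.
            Some(unsafe { self.items[index].assume_init_ref() })
        } else {
            None
        }
    }
}

impl<T, const N: usize> Drop for FixedVec<T, N> {
    fn drop(&mut self) {
        for slot in &mut self.items[..self.len] {
            // SAFETY: only initialized slots are dropped, each exactly once.
            unsafe { slot.assume_init_drop() };
        }
    }
}
```

Since MaybeUninit<T> has no niche that an enclosing Option could exploit, wrapping FixedVec in an Option never reads the uninitialized slots, which avoids the problem described in this post.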

Edit: I think I prompted Gankro's question with a question of my own, so I'm shooting myself down here. I'm happy as long as we discuss these issues so that we get to know the language better, and that's exactly the goal of Gankro's current project too, right?


#4

One potential issue is that compilers are usually at liberty to propagate information about valid ranges of values, so if you had

let x: u8 = 3;
let y: bool = unsafe { mem::transmute(x) };
if x > 1 { panic!(); }

the compiler is usually at liberty to say: “sweet, x is either 0 or 1, so we can optimize out that panic”. I don’t know what LLVM actually does, but … ew.
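The sound way to write this is to check the range before the transmute, so any "x is 0 or 1" fact the optimizer derives is actually true. A minimal sketch; to_bool is a hypothetical helper, and in real code x != 0 or an explicit match is simpler than a transmute.

```rust
use std::mem;

/// Hypothetical helper: checks the range before transmuting, so a `bool` is
/// only ever created from a byte that really is 0 or 1.
fn to_bool(x: u8) -> bool {
    if x > 1 {
        panic!("{} is not a valid bool", x);
    }
    // SAFETY: x is 0 or 1 here, both valid bit patterns for `bool`.
    unsafe { mem::transmute::<u8, bool>(x) }
}
```

Here the panic cannot be optimized away, because no bool exists yet when the check runs.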


#5

Hmm, good point!

This is of course equivalent if you s/3/mem::uninitialized()/, but subtly different, because the uninit-ness back-propagates into things that were definitely initialized!

I think this is sufficient to establish that uninitialized memory is special because it is always invalid for every type. Although this doesn’t seem to affect str…?


#6

We currently mark arguments of reference type with the LLVM dereferenceable attribute, so even just passing a dangling/null reference to a no-op function can lead to UB.

Honoring the current version of the rule shouldn't be a big burden anyway: you can usually dodge it by storing a type that has no invalid values and transmuting only when needed. You can use raw pointers instead of references, u8 instead of bool, a struct with a layout equivalent to the enum instead of the enum, u32 instead of char, and &[u8] instead of &str.
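A minimal sketch of that pattern, using a hypothetical RawRecord type. For illustration it converts with checked std functions (char::from_u32, str::from_utf8) rather than transmute, so invalid contents are rejected instead of becoming UB.

```rust
use std::str;

/// Hypothetical record storing "raw" types that have no invalid values.
struct RawRecord<'a> {
    flag: u8,       // instead of bool
    letter: u32,    // instead of char
    text: &'a [u8], // instead of &str
}

impl<'a> RawRecord<'a> {
    /// Converts to the strict types, or returns `None` if any field is invalid.
    fn validate(&self) -> Option<(bool, char, &'a str)> {
        let flag = match self.flag {
            0 => false,
            1 => true,
            _ => return None,
        };
        let letter = char::from_u32(self.letter)?; // rejects surrogates and values above char::MAX
        let text = str::from_utf8(self.text).ok()?; // rejects non-UTF-8 bytes
        Some((flag, letter, text))
    }
}
```

The raw fields can hold anything at all, and each invalid case from the reference's list is caught at the single point of conversion.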


#7

I should say: I really like the idea of trying to thin down the amount of undefined behavior.

Here is the one holding me back (that I know of): "reads of undef (uninitialized) memory". When serializing slices of types that may contain padding bytes (e.g. (u8, u64)), a transmute to &[u8] exposes some undef values (the padding), but … I don't see why anyone cares. LLVM doesn't seem to care (it uses undef for the value rather than poisoning it), but maybe Rust really does.
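One way to sidestep the question entirely (a swapped-in alternative, not the approach described above) is to serialize field by field, so padding bytes are never read. A minimal sketch; serialize_pairs is a hypothetical helper, and little-endian is an arbitrary choice of byte order.

```rust
/// Hypothetical helper: writes each `(u8, u64)` as 1 + 8 bytes, skipping the
/// padding that the in-memory layout of the tuple may contain.
fn serialize_pairs(pairs: &[(u8, u64)]) -> Vec<u8> {
    let mut out = Vec::with_capacity(pairs.len() * 9);
    for &(a, b) in pairs {
        out.push(a);
        out.extend_from_slice(&b.to_le_bytes());
    }
    out
}
```

The cost is an extra copy compared to a transmute, but the output is fully initialized and independent of the tuple's layout.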

If the badness of this case were covered by the rule "invalid values in primitive types, even in private fields/locals", because undef is often invalid (though not for u8), that would make me very happy. I don't know whether it is, nor whose feet to hold to the fire to find out… 🙂