I don’t know of a black-box test case that will prove this for a character type. I can black-box demonstrate that both LLVM and GCC ascribe convenient values to uninitialized char, e.g.
int main(void) { unsigned char x; return x != 0; }
will produce machine code equivalent to
int main(void) { return 0; }
but they could still do that if this only constituted use of an unspecified value. More compelling is the observation that both compilers’ IRs can explicitly represent “this variable is uninitialized” (which provokes UB if used) but not “this variable has an unspecified value”, and if you look at IR dumps you will see the “this variable is uninitialized” sentinel (undef in the LLVM dumps below) used for unsigned char quantities:
*** IR Dump Before SROA ***
; Function Attrs: nounwind uwtable
define i32 @main() #0 {
%1 = alloca i32, align 4
%x = alloca i8, align 1
store i32 0, i32* %1, align 4
%2 = load i8, i8* %x, align 1
%3 = zext i8 %2 to i32
%4 = icmp ne i32 %3, 0
%5 = zext i1 %4 to i32
ret i32 %5
}
*** IR Dump Before Early CSE ***
; Function Attrs: nounwind uwtable
define i32 @main() #0 {
%1 = zext i8 undef to i32
%2 = icmp ne i32 %1, 0
%3 = zext i1 %2 to i32
ret i32 %3
}
*** IR Dump Before Lower 'expect' Intrinsics ***
; Function Attrs: nounwind uwtable
define i32 @main() #0 {
ret i32 0
}
And further note that this still happens if you force x to be in memory with e.g.
static int foo(unsigned char *p) { return *p; }
int main(void) { unsigned char x; return foo(&x) != 0; }
the inliner reduces it to the original and the same thing happens.
I think memcpy was not intended to be allowed to copy indeterminate values, and was intended to have UB when applied to fully uninitialized storage.
I also think that the “padding takes on unspecified values” language discussed earlier was intended to imply that it doesn’t have UB when applied to certain forms of partially uninitialized storage, e.g.
struct S { int x; double d; };
struct S s, t;
s.x = 1;
memcpy(&t, &s, sizeof(struct S)); // intended to be well-defined
but I am no longer certain whether the specification achieves that goal, and anyway it wouldn’t appear to apply to anything other than struct and union types, so
int x[10], y[10];
x[0] = 1;
memcpy(y, x, 10 * sizeof(int)); // probably still UB
This is probably turning into another case of the “what exactly does the C standard mean by ‘object’?” argument that has never been resolved to anyone’s satisfaction, despite, again, multiple attempts to fix the wording.
I think the proper conclusion to be drawn from all of this is that the C standard’s concepts of “indeterminate values” and “trap representations” are ill-specified and should not be used as a template for similar concepts in Rust.
I think we should maybe take a few steps back and try to pin down what we want out of mem::uninitialized and similar Rust language features. The main thing I know mem::uninitialized is useful for is result buffers for C library primitives, e.g. nix’ signal.rs has
pub fn empty() -> SigSet {
let mut sigset: libc::sigset_t = unsafe { mem::uninitialized() };
let _ = unsafe { libc::sigemptyset(&mut sigset as *mut libc::sigset_t) };
SigSet { sigset: sigset }
}
This is effectively a promise to the compiler that sigset was somehow initialized by the time we get to reading it from safe code, so it does not have to emit code to initialize it. I would really like to find a way to express that promise in a way that the compiler actually understands, but it’s a complex thing to express, especially when you start thinking about system primitives like read and recvmsg that can fill in some, but not all, of an uninitialized buffer (and that buffer might be a scatter vector, for extra headaches). Maybe the people doing formal proofs of correctness on unsafe code have some ideas?
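For what it's worth, the shape of that promise can be sketched with std::mem::MaybeUninit, which was added to the standard library after this discussion. This is only a sketch under stated assumptions: RawSet is a hypothetical stand-in for libc::sigset_t and fill_set is a hypothetical stand-in for a C primitive like sigemptyset; neither is nix's actual code, and it does nothing for the partially-filled-buffer case.

```rust
use std::mem::MaybeUninit;

/// Hypothetical stand-in for an opaque C type such as `libc::sigset_t`.
#[repr(C)]
#[derive(Debug, PartialEq)]
pub struct RawSet {
    bits: [u64; 2],
}

/// Hypothetical stand-in for a C primitive that fully initializes `*out`
/// and returns 0 on success.
///
/// # Safety
/// `out` must be valid for writes of one `RawSet`.
pub unsafe fn fill_set(out: *mut RawSet) -> i32 {
    // Write through the raw pointer without reading the old contents.
    unsafe { out.write(RawSet { bits: [0; 2] }) };
    0
}

pub fn empty() -> RawSet {
    // Reserve storage without claiming that it holds a valid value yet.
    let mut set = MaybeUninit::<RawSet>::uninit();
    // SAFETY: `as_mut_ptr` yields a pointer valid for writes of one `RawSet`.
    let rc = unsafe { fill_set(set.as_mut_ptr()) };
    assert_eq!(rc, 0, "fill_set reported failure");
    // SAFETY: `fill_set` returned 0, so all of `set` has been written.
    unsafe { set.assume_init() }
}
```

The promise is now localized in the assume_init call: the type system tracks "maybe not initialized" up to that point, and the programmer vouches for the value only there.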
There’s been a lot of talk in this thread about the safety of copying partially uninitialized aggregates. Here I think we need to have a conversation with the LLVM team about what their expectations for that sort of thing are — I did a bunch of tea-leaf-reading above, but I don’t actually know. A rule that would make sense to me as someone writing code in Rust would be: copying is always safe, but preserves uninitializedness: e.g. the equivalent of the C above
struct S { i: i32, d: f64 }
let mut s: S = unsafe { S { i: 0, d: mem::uninitialized() } };
let mut t = s;
does not have UB in and of itself, but the field t.d is still uninitialized and a subsequent read will have UB. This is what valgrind does, which indicates that it’s been found useful in separating real bugs from chaff.
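A rough sketch of that rule, assuming the field's type is allowed to say "may be uninitialized" via std::mem::MaybeUninit (a later standard-library addition, not something available when the example above was written). The helper names copy_partially_initialized and finish are made up for illustration.

```rust
use std::mem::MaybeUninit;

#[derive(Clone, Copy)]
struct S {
    i: i32,
    // `d` may or may not hold a valid f64; the type records that fact.
    d: MaybeUninit<f64>,
}

fn copy_partially_initialized() -> S {
    let s = S { i: 0, d: MaybeUninit::uninit() };
    // Copying the aggregate is fine: no f64 value is ever produced from `d`,
    // so the copy simply carries the uninitialized field along.
    let t = s;
    t
}

fn finish(mut t: S, value: f64) -> f64 {
    // Initialize the field before anything reads it as an f64.
    t.d.write(value);
    // SAFETY: `d` was initialized on the line above.
    unsafe { t.d.assume_init() }
}
```

Reading t.i is always fine here; calling assume_init on t.d before the write would be the UB read described above.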
(I would also be perfectly fine with the existence of a std::mem::byte type such that it was always safe to transmute a datum, initialized or not, to &[byte] and examine the contents. I don’t think either i8 or u8 should have that property, though.)
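(The closest approximation I can sketch is a view as &[MaybeUninit<u8>], using the later std::mem::MaybeUninit. This is not the byte type proposed above: it lets you take the view of any datum, but you may only examine bytes you know to be initialized. The function name as_maybe_uninit_bytes is made up for illustration.)

```rust
use std::mem::{size_of, MaybeUninit};
use std::slice;

/// View any value as a slice of bytes that may or may not be initialized.
fn as_maybe_uninit_bytes<T>(value: &T) -> &[MaybeUninit<u8>] {
    let ptr = value as *const T as *const MaybeUninit<u8>;
    // SAFETY: `ptr` is valid for reads of `size_of::<T>()` bytes for as long
    // as `value` is borrowed, a byte has alignment 1, and `MaybeUninit<u8>`
    // makes no claim that any particular byte (e.g. padding) is initialized.
    unsafe { slice::from_raw_parts(ptr, size_of::<T>()) }
}
```

For a fully initialized value such as a u32, each element can then be read with assume_init; for padding or uninitialized fields it cannot.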
(These proposals might cause difficulties for code generation on ia64, where uninitializedness is sort-of visible in the hardware (the NaT bits on registers); but frankly as far as I’m concerned ia64 can go jump in a lake.)