Are there any definitive statements anywhere about the rules Rust follows for testing equality of dangling pointers? I think this is something that should be covered in the language reference.
Concretely, consider the following function:
fn compare_dangling_pointers()
{
    let p1: *const i32;
    {
        let x = Box::new(0);
        p1 = &*x;
    } // x is dropped here, so p1 now dangles
    {
        let y = Box::new(0); // may reuse the allocation that x occupied
        let p2: *const _ = &*y;
        let b = p1 == p2;
        println!("Are they equal? {} {} {}", b, b, p1 == p2);
    }
}
With “rustc 1.5.0-beta.2 (b0f440163 2015-10-28)”, this prints “true true true”. I assume “false false false” would also have been allowed, showing that even safe Rust allows some non-determinism here.
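For contrast, here is a minimal sketch of the case where the answer is not in doubt. While both boxes are still alive they occupy distinct allocations, so their addresses must differ; the surprise above only arises because x has already been freed by the time p2 is created. The function name compare_live_pointers is made up for illustration.

```rust
/// Compares pointers into two boxes that are both still alive.
/// Returns (same object compared with itself, two distinct objects compared).
fn compare_live_pointers() -> (bool, bool) {
    let x = Box::new(0);
    let y = Box::new(0);
    let p1: *const i32 = &*x;
    let p2: *const i32 = &*y;
    let p1_again: *const i32 = &*x;
    // x and y are both live here, so their heap allocations cannot overlap.
    (p1 == p1_again, p1 == p2)
}
```

Calling compare_live_pointers() should always yield (true, false); no such guarantee seems to exist once one of the two pointers dangles.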
Personally, I found that very surprising: In C, once an object is deallocated, all pointers to it have an indeterminate value. This is the same as, or at least closely related to, the status of uninitialized variables and padding bytes. Generally, Rust successfully shields its users from all the complications around indeterminate values - with the exception of dangling pointer comparison, it seems.
Complications around indeterminate values
The rules around what is and what is not allowed for such values are fairly unclear, to the point that one of the very few formalizations of C [1] makes pretty much anything here (including comparison of an indeterminate value with anything else) undefined.
In LLVM, as far as I know, indeterminate values become “undef”, admitting many surprising effects. There are attempts to formalize the intended semantics of “undef” [2], with interesting consequences: Under these semantics of LLVM, the above Rust program could output any of the 8 combinations of three boolean values. In particular, not only is the result of comparing dangling pointers indeterminate and hence non-deterministic, it is actually kind of “lazily picked”: Multiple comparisons can have different results. The non-deterministic choice is also not made at the point where the equality test takes place: Using the same variable that contains such an indeterminate value multiple times can result in different outcomes, and this propagates through further arithmetic operations (think of let b2 = b | false; now b2 could still have different values when used multiple times - that’s assuming BitOr is implemented in the obvious way for bool) and potentially even through memory.
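To make the difference concrete, here is a toy model (not LLVM itself) that simply enumerates which outputs the program above could print under two readings of an indeterminate comparison result. Under the “eager” reading, each evaluation of p1 == p2 picks a value once, so b is an ordinary bool afterwards. Under the “lazy” reading, every use of b may pick again. Both function names and the model itself are a simplification made up for illustration.

```rust
use std::collections::BTreeSet;

/// Outputs possible if each `p1 == p2` picks one value when it is evaluated.
/// `b` is then an ordinary bool, so both uses of `b` agree.
fn eager_outcomes() -> BTreeSet<(bool, bool, bool)> {
    let mut out = BTreeSet::new();
    for first_cmp in [false, true] {
        for second_cmp in [false, true] {
            let b = first_cmp;
            out.insert((b, b, second_cmp));
        }
    }
    out
}

/// Outputs possible if `b` stays indeterminate and every use may pick afresh.
fn lazy_outcomes() -> BTreeSet<(bool, bool, bool)> {
    let mut out = BTreeSet::new();
    for first_use in [false, true] {
        for second_use in [false, true] {
            for second_cmp in [false, true] {
                out.insert((first_use, second_use, second_cmp));
            }
        }
    }
    out
}
```

In this model the eager reading admits 4 outputs and the lazy reading admits all 8; an output such as “true false true” is only possible under the lazy one.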
I will have to do some more digging to find the official LLVM documentation related to testing equality of dangling pointers, to figure out whether that is really modeled as “undef” or whether they treat this indeterminate value differently. I am reasonably sure that “undef” is used for uninitialized variables, which also have indeterminate values according to C.
[1] http://robbertkrebbers.nl/thesis.html
[2] https://www.cis.upenn.edu/acg/papers/popl12_vellvm.pdf
Indeterminate values in Rust
I was unable to find out whether Rust actually admits the other 6 combinations of output from the above program.
However, unless extra measures have been taken in the translation, I would assume that LLVM actually permits all of them.
I think this is important (and strange) enough to be actually documented in the Rust semantics. In particular, as far as I am aware, this is the only piece of underspecification that Rust inherits from C. Generally, Rust has been very careful not to have anything strange like that, either by ruling out certain cases through its type system (like dereferencing dangling pointers, or using uninitialized variables) or by explicitly defining way more than C does (i32 is 32 bits wide and naturally wraps on overflow - or panics, but nothing else). So, most of the time, there is no great need to define the exact semantics of safe Rust programs; they mostly have only one “obvious” behavior. For unsafe Rust, the answer is “go check LLVM’s semantics”, and that’s fine. However, I think it’d be a shame if that were also needed for (any piece of) safe Rust.
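As a small illustration of how much Rust pins down for i32, the sketch below uses the standard library's explicit overflow methods, wrapping_add and checked_add. The function name i32_overflow_facts is made up for illustration.

```rust
/// Returns (size of i32 in bytes, MAX + 1 with wrapping, MAX + 1 with checking).
fn i32_overflow_facts() -> (usize, i32, Option<i32>) {
    let size = std::mem::size_of::<i32>();
    // Wrapping arithmetic is two's complement: MAX + 1 wraps around to MIN.
    let wrapped = i32::MAX.wrapping_add(1);
    // Checked arithmetic reports the overflow instead of wrapping.
    let checked = i32::MAX.checked_add(1);
    (size, wrapped, checked)
}
```

Every component of the result is fixed by the language and standard library: 4 bytes, i32::MIN, and None.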
Summary
By allowing comparison of raw pointers in safe code, Rust actually opens a can of worms that - in C - is pretty much equivalent to uninitialized variables. This is not documented in the official documentation (to my knowledge), and it’s unlikely to be widely known. It may be too late to close this Pandora’s box and disallow comparing raw pointers in safe code (this would be a breaking change), but at the very least the intended semantics of such comparisons should be carefully documented, and should be checked to actually match what LLVM does.
Or maybe I am entirely missing a point here, in which case I’d be happy to be enlightened.