I was almost ready to do that, but yesterday I talked to some other people about an upcoming paper and learned some new things. I haven't yet fully digested what this means for the alloca icmp folding, but there are some more fundamental observations. Namely, folding int2ptr(ptr2int(x))
to x
is unsound -- assuming we also permit the optimization where after if (x == y)
(with x
, y
of integer type), we can replace x
by y
. (Notice that permitting this replacement at pointer type is also unsound if you want to do any kind of provenance tracking, so treating the two kinds of comparisons equivalent is not sound, either -- unless of course you don't even permit this kind of replacement for integers.)
The long story is at https://bugs.llvm.org/show_bug.cgi?id=34548, here is the short story:
Start with the following C program (on 32bit, because I am too lazy to write uintptr_t
all the time):
void f() {
int p[1], q;
int ip = (int) &p[1]
int iq = (int) &q
if (ip == iq) {
*(int*)iq = 0
}
}
This is not UB. Notice that the &p[1]
is subtle because this is out-of-bounds, but it is one past the end of an array object, so taking this address is legal. Comparing this address at pointer-type results in unspecified behavior, which I would assume to mean the comparison is non-deterministic.
Let us first use substitution based on the ip == iq
to replace iq
by ip
, and then propagate the definition of ip
, and we arrive at
void f() {
int p[1], q;
int ip = (int) &p[1]
int iq = (int) &q
if (ip == iq) {
*(int*)((int) (&p[1])) = 0
}
}
Notice that, because comparison at pointer-type is non-deterministic in some cases, this substitution cannot happen when comparing at pointer type (the comparison could evaluate to true once and to false later). But we probably don't want such non-determinism when comparing at integer type, because we do want to be able to do this optimization at integer type.
If we can now fold the casts away, we have
void f() {
int p[1], q;
int ip = (int) &p[1]
int iq = (int) &q
if (ip == iq) {
p[1] = 0
}
}
This last program is UB if it ever enters the conditional, which is a real possibility. (The bugreport has observable miscompilations based on this.)
Essentially, the problem is that just because ptr2int(p) == ptr2int(q)
, we cannot deduce p
and q
to be usable interchangably -- the two pointers could have very different provenance. Together with the special rule for one-past-the-end pointers, we can have pointers to different "objects" (pointing one past the end of p
vs. pointing to q
) have the same integer representation.
If you allow the substitution based on ==
even for pointer types, then the above counterexample does not need int-ptr-casts. As a consequence, either the substitution is never legal, or comparing at integer vs. comparing at pointer type is a very different operation.