Types as Contracts

Oh, right, I completely forgot that optimizing away a “dead” ptr-to-int is not justifiable in intptrcast. And the same goes for dead loads, because a load from memory may happen to load a pointer at integer type and thus also perform a ptr-to-int cast. That is clearly not desirable. I think I did not realize until during this discussion how many optimizations are lost in intptrcast.

So, it doesn’t seem like we currently have any formal/checkable model that would actually be sound wrt. LLVM and support the full range of integer operations on pointers. The best proposal I can come up with right now is to implement two modes in miri:

  • A sound, but incomplete mode. “sound” means that (as far as we know) this detects all UB. “incomplete” means that it may flag programs as having UB that do not actually have UB. As far as the pointer model goes, the model currently implemented by miri should suffice (see below). However, soundly modelling noalias annotations remains an open problem, and there are probably other subtle sources of UB in LLVM – so soundness may actually not be possible, but it would be the goal.
  • A complete, but unsound mode. “complete” means that if the checker detects UB, then we really see this as being UB under the Rust rules. “unsound” means that some UB might fail to be detected. miri+intptrcast would be the starting point here.

This would provide an “overapproximation” and an “underapproximation” of the UB we really want, and hopefully, over time, we can close that gap. It is not very satisfying though…


Currently, miri treats memory locations as being abstract (block ID + offset). This can model the fact that locations are unforgeable, and that offsetting a pointer cannot cross from block to block. However, it only supports very limited arithmetic on integers obtained from pointers: you can add/subtract an “actual” integer, and you can do bit-ops on the bits that are “below” the alignment (this makes HashMap work). However, you cannot even add/subtract two pointers. You cannot currently compare pointers to different locations for inequalities (less-than etc.), but that could be fixed.

Also, there are some open questions here around overflow: Since the actual base address of the allocation is left undetermined, we cannot tell if (ptr as usize) + 50000 overflows or not, which is something that safe code can observe. Currently, the overflow behavior is as if the base address was actually 0, for any pointer. (ptr as usize) - 1 overflows, but (ptr as usize) + x never overflows for any x: usize.

If at all possible, I would very much like types to be trustworthy outside of explicit unsafe blocks. That is, given this function:

fn foo(bar: &mut Bar) {
  // A
  unsafe { func_a(); }
  // B
  func_b();
  // C
}

bar might change between A and B, but I still want to be sure that it doesn't change between B and C, and I want optimizations to take advantage of that.

I don't have any experience with formal modeling, but if we were to model the unsafe block as a separate function, I would expect it to be treated as though everything flows into it (and gets released), and then everything that is still live flows back out (and gets reacquired). Based on my understanding from reading other blog posts (which may well be inaccurate), I'm under the impression that MIR already tracks liveness, so it seems like it should be fairly straightforward to implement the appropriate locks. (Though I understand integrating that kind of analysis into the formal model may be more difficult.)

Thanks for giving a concrete example! I like this idea, though I am not sure if I can implement it. :wink: Basically, we could make use of the fact that unsafe is a block and hence always well-nested. Let's assume we have a lifetime for every unsafe block. Upon entering an unsafe block that does not have a surrounding unsafe block, we could suspend all our variables (it should be possible somehow to figure out which ones these are) for this block's lifetime. So all the locks would come back when the unsafe block ends. That would handle your example.

One problem with this proposal is that suddenly, the exact extent of unsafe blocks matters. We could change this to do the suspend-reacquire around calling any unsafe function (ignoring unsafe blocks), but then e.g. split_at_mut would still be UB.

Personally, I think it makes far more sense for the extent of an unsafe block to matter than the function containing it. Unsafe blocks are already places where rules are relaxed and additional operations are allowed.

I do think we would only want to reacquire variables that are still live after the block. This would nicely mirror NLL, and avoid some situations where code that might appear correct was actually UB. (I feel like I had a nice example of this last month, but I neglected to respond and have since forgotten it.)

Perhaps a good example would be split_at_mut. Since self isn’t live after the unsafe block, it would be correct using my formulation. swap would also be correct, as both arguments would be released entering the unsafe block.

@hanna-kruppe I had to think about this some more, and came up with this litmus test: https://godbolt.org/g/AkZYVy.

We have a function that compares an integer against all possible integers using comparison at pointer-type:

static void compare_to_all(uintptr_t i) {
    uintptr_t cur = 0;
    while(true) {
        if ((int*)i == (int*)cur) return;
        if (cur == UINTPTR_MAX) break;
        ++cur;
    }
    unreachable();
}

and then we call that on the address of a local:

void test() {
    int i = 1;
    compare_to_all((uintptr_t)&i);
}

This optimizes to

test(): # @test()
  jmp unreachable() # TAILCALL

Even I can read this piece of assembly. :slight_smile: (gcc, btw, produces the same final assembly.)

At first I thought this was definitely a miscompilation. However, I was then reminded that casting out-of-range pointers to integers is UB in C. So gcc is actually perfectly within its rights to do whatever it wants; for LLVM, we have to look at the IR of the comparison:

  %5 = load i64, i64* %2, align 8, !dbg !29
  %6 = inttoptr i64 %5 to i32*, !dbg !32
  %7 = load i64, i64* %3, align 8, !dbg !33
  %8 = inttoptr i64 %7 to i32*, !dbg !34
  %9 = icmp eq i32* %6, %8, !dbg !35

From what I can tell, inttoptr is never UB. However, all of this made me realize something I had missed before: This alloca-optimization we talked about only kicks in when icmp works at pointer type! That’s great, we can have a rule saying something like comparisons at pointer type require both pointers to be allocated or 0; otherwise the result is non-deterministic.

Does this make sense?


There are other troublesome examples that do not involve comparisons at pointer type, but instead involve noalias. I am not surprised by noalias making trouble. :wink: However, this does show the problems of pure lock-based approaches: [Playpen]

#[inline(never)]
fn id(x: usize) -> usize { x }

fn main() {
    let mut x = Box::new(0u32);
    let xp = &mut *x as *mut _ as usize;
    drop(x);
    let mut y = Box::new(0u32);
    let yp = &mut *y as *mut _ as usize;

    if id(xp) == yp {
        unsafe { *(yp as *mut u32) = 2u32; }
        unsafe { *(xp as *mut u32) = 1u32; }
        println!("{}", unsafe { *(yp as *mut u32) });
    }
}

When compiled with optimizations, this prints 2. IOW, the pointers compare equal (at integer type!), but the write to xp does not affect the following read from yp. I cannot come up with any explanation for this other than somehow making this program UB. (If this were a comparison at pointer type, since xp is deallocated, the comparison would be non-deterministic and could evaluate to true even if the pointers are different, explaining the behavior. But when comparing at integer type, I don’t think such a rule can be made to work. After all, 3 == 2 will probably compare two un-allocated pointers, and we certainly want the result to be deterministic.)

However, with a lock-based approach satisfying the principle “if you can access an address, it doesn’t matter which alias you use”, the above program cannot be UB – it is certainly fine to access yp, and hence because xp is an alias in the conditional, accessing xp cannot be UB. (Unless, again, we find some justification for the conditional to be non-deterministic.)

Interestingly enough, I was not able to reproduce this kind of optimization in C.


Another point that surprised me here is that LLVM has special optimizations for alloca that seem mostly orthogonal to noalias-based optimizations. I would have expected alloca to just have a noalias annotation and then get all of its optimizations from there.

LLVM happily equivocates between pointer and integer comparisons; in particular, it will turn icmp iN (ptrtoint %p), (ptrtoint %q) into icmp T* %p, %q [1] during routine canonicalization. So no, it doesn't really make sense from an LLVM perspective to strongly distinguish pointer comparisons and integer comparisons. It does sound like a good rule for Rust's semantics, I just expect it to be superseded by a more general rule that handles more LLVM craziness.

[1] This is currently sometimes inhibited if the pointers have different pointee types, but pointee types are a fiction and on the way out anyway.

Interesting example, though I can't say it surprises me (AFAICT it's a natural consequence of the icmp-of-ptrtoint simplification mentioned above).

Well, not any way that doesn't include provenance tracking. We could make integers "derived" from pointers as wobbly and non-deterministic as the underlying pointers are /me waves hands

Allocas are considered to not alias many other pointers (see BasicAA), just not via the noalias attribute, so optimizations based on "does this alias that" will fire for allocas. I'd guess some optimizations are just simpler or cheaper to do for allocas than for arbitrary other pointers. Consider, for example, how the alloca comparison folding checks whether the alloca escapes.

Is this possibly related to the "based on" rules for pointers in LLVM? The pointers computed in compare_to_all are based on only the null pointer, which is associated with no address.

So I think the rule can be stronger than just "allocated"; it can be non-deterministic for any pointers not into the same "allocated object" (in the GEP inbounds sense, whatever that precisely is).

Oh. But then, shouldn't it be possible to change my example to use integer comparison, and still have the entire loop optimized away? Also, why does this alloca optimization check for pointer types if that doesn't matter?

Well, given the canonicalization you mentioned, it would be a wrong rule.

Also, I realized in the meantime that even the rule I gave is not sufficient to explain the optimization with pointer types: The loop will definitely cover the actual address of the local, and casting that to a pointer and comparing it cannot be non-deterministic.

The point is, provenance tracking would be used to explain why the program is UB. Or are you suggesting provenance tracking would legalize that optimization without making the source program UB?

I see. However, the point remains that there are extra optimizations that fire only for allocas, like the comparison folding. It is those optimizations that I am having trouble with right now. :slight_smile:

I haven't found a clause in which this is related to pointer comparison. Also, "based on" IIRC is about noalias while the alloca optimization is independent of that.

I disagree; comparing whether two valid pointers from different allocations are equal should be guaranteed to return "not equal".

This is getting into information flow control, which is in fact not dynamically enforceable. :wink: For example, is the result of this function "derived" from its argument?

fn id(x: usize) -> usize {
  let mut y = 0;
  loop {
    if x == y { return y; }
    y += 1;
  }
}

LLVM's provenance tracking says "no". However, LLVM still optimizes this the same way. Strangely enough, this only applies if I keep the #[inline(never)]. If I let LLVM do inlining or remove the call to id altogether, it optimizes the comparison away entirely. Seems like it does do noalias-based optimizations of comparisons, after all?

Or was it smart enough to turn the boxes into allocas and then the alloca-icmp-folding kicked in? But this happens even if I add other comparisons, so I don't think this can be it.

Well, the loop is optimized away, just into a return; instead of unreachable();. There could be a few reasons for that, from optimizations being intentionally conservative to not break real C and C++ programs, to missing optimizations, to phase ordering issues (optimizing compare_to_all before inlining and seeing that i is the address of a local), to conflicting optimizations (e.g., if you cast to int* and back to uintptr_t, those two conversions may be cancelled out early, putting you back at square one and losing the information that there was a pointer comparison).

That something is a pointer does matter; what doesn't matter is what type of value it points at. The plan is for future LLVM versions to have a simple type ptr, and to only describe/type the memory at that address when actually accessing it. This gets rid of pointless bitcasts (which impede optimizations, unless they specifically go through the trouble of seeing through bitcasts) and doesn't lose any information (because of the aforementioned bitcast).

Well, not wrong, just insufficient. What I meant is not "let's adopt this rule" (clearly we need something stronger) but that strikes me as a reasonable property to have.

Perhaps I'm stretching the term "provenance tracking" here. In plainer terms, it's conceivable to say that integer comparisons are non-deterministic if at least one of the operands is "derived from" (leaving aside that we still don't have a good definition of this, this is a general problem) a pointer and the equivalent pointer comparison would also be non-deterministic.

However, this sort of thing is treacherous ground, so I don't want to put any more weight than "it's conceivable" behind this idea.

The fact that these optimizations are implemented for allocas doesn't have to mean that allocas have special semantics. It may just mean that some optimizations are less practical (e.g., break too much real C and C++ code, or would be disproportionately harder to implement, or don't bring enough benefit, or don't neatly fit into the existing pass pipeline) to implement in general than for allocas specifically. Or nobody ever bothered to implement them.

Doesn't seem like it: Compiler Explorer

Can you share the version with multiple comparisons?

Even then, icmp ptr and icmp i64 will be syntactically different. You said above, however, that the two are considered equivalent. The alloca optimization has a specific check to distinguish these two cases.

Here you go: [Playpen]

I made it extra-complicated to try and make sure that the extra comparisons don't just get optimized away.

Come to think of it again, I don't think equivalent is quite right, apologies. LLVM prefers pointer comparisons to integer comparisons even when it could remove inttoptr operations by using integer comparisons (see Compiler Explorer), though I'm not sure if that's because this form is more convenient (e.g., easier for pointer-based optimizations to handle) or if there is some reason this transformation would be semantically problematic.

Anyway, I have two replies for this point:

  1. The icmp (ptrtoint %p), (ptrtoint %q) -> icmp %p, %q transformation is also part of instcombine, and instcombine is run to a fixed point, so (even a single run of) instcombine can perform the alloca comparison folding even if the input IR does it as an integer comparison.
  2. When the above does not apply (e.g., comparing an alloca with an integer constant), yes, it may miss possible optimizations, but missed optimizations are a fact of life. Specifically, the implementation uses GetUnderlyingObject — a very cheap and limited procedure that does not look through ptrtoint casts. Furthermore, int<->ptr casts are considered (by omission) to escape the alloca. So even if the check for pointer types was removed, the optimization still wouldn't fire. This raises the question of why the check is there at all. My best guesses are "for compile time" and "author wasn't sure / didn't want to rely on what GetUnderlyingObject does with non-pointer values".

Furthermore consider that optimizing icmp iN (ptrtoint %alloca), %n may not help any real code bases (recall that the version that is in LLVM trunk measurably decreased binary size of Chromium).

Now, don't get me wrong, it's entirely possible that there were also concerns about optimizing comparisons of ptrtoint'd allocas and it's deliberately not done. I'm just one guy with some LLVM expertise. But everything you and I have found in the code and discussed here, as well as the discussion on the patch that introduced it (⚙ D13358 InstCombine: Fold comparisons between unguessable allocas and other pointers), seems consistent with "the stronger optimization would be valid, but wasn't implemented for boring reasons". Still, maybe it's time for a mail to llvm-dev asking for clarification?

Thanks! Unfortunately I can't get godbolt to optimize the IR as rustc does (maybe because Rust's allocation functions aren't recognized by stock LLVM?), so I'll have to get more creative about analysing which passes do what here (worst case: -print-after-all, blegh).

I was almost ready to do that, but yesterday I talked to some other people about an upcoming paper and learned some new things. :slight_smile: I haven't yet fully digested what this means for the alloca icmp folding, but there are some more fundamental observations. Namely, folding int2ptr(ptr2int(x)) to x is unsound -- assuming we also permit the optimization where after if (x == y) (with x, y of integer type), we can replace x by y. (Notice that permitting this replacement at pointer type is also unsound if you want to do any kind of provenance tracking, so treating the two kinds of comparisons equivalent is not sound, either -- unless of course you don't even permit this kind of replacement for integers.) The long story is at https://bugs.llvm.org/show_bug.cgi?id=34548, here is the short story: Start with the following C program (on 32bit, because I am too lazy to write uintptr_t all the time):

void f() {
  int p[1], q;
  int ip = (int) &p[1];
  int iq = (int) &q;
  if (ip == iq) {
       *(int*)iq = 0;
  }
}

This is not UB. Notice that the &p[1] is subtle because this is out-of-bounds, but it is one past the end of an array object, so taking this address is legal. Comparing this address at pointer-type results in unspecified behavior, which I would assume to mean the comparison is non-deterministic.

Let us first use substitution based on the ip == iq to replace iq by ip, and then propagate the definition of ip, and we arrive at

void f() {
  int p[1], q;
  int ip = (int) &p[1];
  int iq = (int) &q;
  if (ip == iq) {
       *(int*)((int) (&p[1])) = 0;
  }
}

Notice that, because comparison at pointer-type is non-deterministic in some cases, this substitution cannot happen when comparing at pointer type (the comparison could evaluate to true once and to false later). But we probably don't want such non-determinism when comparing at integer type, because we do want to be able to do this optimization at integer type.

If we can now fold the casts away, we have

void f() {
  int p[1], q;
  int ip = (int) &p[1];
  int iq = (int) &q;
  if (ip == iq) {
       p[1] = 0;
  }
}

This last program is UB if it ever enters the conditional, which is a real possibility. (The bug report has observable miscompilations based on this.)

Essentially, the problem is that just because ptr2int(p) == ptr2int(q), we cannot deduce p and q to be usable interchangeably -- the two pointers could have very different provenance. Together with the special rule for one-past-the-end pointers, we can have pointers to different "objects" (pointing one past the end of p vs. pointing to q) that have the same integer representation.

If you allow the substitution based on == even for pointer types, then the above counterexample does not need int-ptr-casts. As a consequence, either the substitution is never legal, or comparing at integer vs. comparing at pointer type is a very different operation.

Did you mean to write &q?


Yes, sorry. I will fix the post.


*wince* Oh, that's absolutely horrible. I love it.

Nitpick:

Notice that, because comparison at pointer-type is non-deterministic in some cases, this substitution cannot happen when comparing at pointer type (the comparison could evaluate to true once and to false later).

Well, LLVM is allowed to make stronger assumptions about its pointer-comparison operator than the C or C++ spec does, if it wants to. I'd say it's more that the problem with substituting in this case and the non-determinism of pointer comparisons have the same root cause: the fact that compilers want to be able to treat pointers differently even if they have the same numerical value.

For anyone else reading - the fact that this is unspecified and not undefined behavior is significant, because if it were undefined, the misoptimization would be justifiable (since anything can happen in a program containing undefined behavior).

By the way, the C standard is more vague than the C++ standard, but the wording suggests this might not be allowed to be nondeterministic:

Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space. 109)

109) Two objects may be adjacent in memory because they are adjacent elements of a larger array or adjacent members of a structure with no padding between them, or because the implementation chose to place them so, even though they are unrelated. If prior invalid pointer operations (such as accesses outside array bounds) produced undefined behavior, subsequent comparisons also produce undefined behavior.

But that doesn't really matter, since the issue isn't related to the compiler folding away ip == iq (to either true or false).

Anyway, I'm quite happy to learn about this bug, because whatever the eventual fix is, it seems likely to make our problem at least slightly more tractable!

:wink:

Oh, of course. But the assumptions I made about LLVM here are fairly small: I assumed that the translation from C to LLVM in the beginning is correct, and then in the following I just used C code as a more readable representation of LLVM IR. For the final code, I assume that accessing an inbounds getelementptr that's one-past-the-end is UB even if that address is otherwise valid, which I am fairly certain is how LLVM would look at this.

Agreed. However, then we would have comparison be UB (or at least an unspecified value, branching on which may be considered UB), which would be a big problem -- you could no longer do things like hoist comparisons out of loops. Comparisons would be at least as problematic as integer division when it comes to code movement.

:+1:

Btw, not being able to fold away int2ptr(ptr2int(x)) has some further consequences. Consider the following, and assume we turned off C’s type-based alias analysis (because we are actually looking for a Rust semantics here and just using C syntax):

int *foo(int *a) {
  int **p = &a;
  int x = *(int*)p;
  *(int*)p = x;
  return a;
}

Loading through an int* casts the value to int, so this function effectively returns int2ptr(ptr2int(a)). We would like to optimize the load and store of x away to a NOP. However, that would let us optimize the function to just return a immediately, so we optimized away an int2ptr(ptr2int(a)) – which we can’t do.

So maybe it is a mistake to say that loads perform a cast, but that just shifts the problem. Consider

int *foo(int *a) {
  int **p = &a;
  int x = *(int*)p;
  *(int*)p = (x * 3 - x * 2);
  return a;
}

Certainly, multiplying by 2 or 3 has to perform a cast; abstract (non-integer) locations cannot be multiplied. But the compiler should be able to optimize (x * 3 - x * 2) to x, and then go on as above, and then we have the same problem again.

Or, if this doesn’t convince you, what about

int *foo(int *a) {
  int **p = &a;
  int x = *(int*)p;
  *(int*)p = (int)(int*)x;
  return a;
}

We probably want to permit optimizing (int)(int*)x to x but then we can again go on as above. Many, many integer optimizations have this problem, and the source of all this is being able to put a pointer into an integer variable without casting.