Pointers Are Complicated II, or: We need better language specs

Unfortunately not; the optimization pays close attention to whether the pointers escape. The most striking examples I can get are things like:

pub fn test(dxy: isize, dyz: isize, dzw: isize) -> bool {
    let x = [1u8; 1 << 31 - 1];
    let y = [1u8; 1 << 31 - 1];
    let z = [1u8; 1 << 31 - 1];
    let w = [1u8; 1 << 31 - 1];
    let px = x.as_ptr().wrapping_offset(dxy + dyz + dzw) as isize;
    let py = y.as_ptr().wrapping_offset(dyz + dzw) as isize;
    let pz = z.as_ptr().wrapping_offset(dzw) as isize;
    let pw = w.as_ptr() as isize;
    px == py || px == pz || py == pw
}

which optimizes to return false, meaning the compiler believes that for no setting of the values dxy, dyz, dzw can any of the comparisons evaluate to true, which is physically impossible in a 32-bit address space. (It's harder to play the same game on 64-bit because the usable address space is only 48 bits, so you would have to use arrays of size 1 << 47 - 1, and the 64-bit virtual address space still has room to spare.)

1 Like

Miri knows what's up here though; it will give a UB error at the write to p, saying that it does not have permission to write where it did, namely on top of y, because the mutable borrow of y used to create p has expired due to the first write y[0] = 1. Rust's aliasing rules are strong enough to eliminate examples of this form.
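
For readers who haven't seen the earlier post, here is a minimal sketch of the pattern being described (my own reconstruction, not the exact code from that post): p is a raw pointer created through a mutable borrow of y, and the later direct write to y expires that borrow, so Miri rejects the write through p.

fn main() {
    let mut y = [0u8; 1];
    let p = &mut y[0] as *mut u8; // raw pointer created via a mutable borrow of y
    y[0] = 1;                     // this direct write through y expires that borrow
    unsafe { *p = 2 };            // Miri (Stacked Borrows): UB, p no longer has write permission
    println!("{}", y[0]);
}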

What if p is a raw pointer which doesn't have borrow restrictions? [note: example reworked, in next post, to avoid the need to do anything even remotely resembling aliasing].

The core of the issue is that clang thinks that the flag is false, so you can put anything in the if statement and clang will just optimize it away - and this is not licensed by Rust's operational model.

Sorry if the machine code didn't make the problem clear. The problem wasn't just that the compiler assumes the if is false; rather, the code generated by gcc performs the test, and if it's true it performs the write to *p but then unconditionally returns 1, even though y[0] may now hold a different value (which would be allowable, since p has semantics akin to what I understand of raw pointer semantics in Rust).

I think I've reformulated the example to avoid any need to use any borrowed references:

extern "C" {
    pub static mut X: [u16; 2]; // = [0, 0];
    pub static mut Y: [u16; 2]; // = [0, 0];
}

pub unsafe fn test(i: usize) -> u16 {
    let px1: *const u16 = &X[0];
    let pyi: *const u16 = &Y[0];
    let flag = px1.add(2) == pyi.add(i);
    Y[0] = 3;
    if flag {
        Y[i] = 4
    }
    return Y[0]
}

which generates the code:

example::test:
    push    rax
    mov     rcx, qword ptr [rip + Y@GOTPCREL]
    lea     rdx, [rcx + 2*rdi]
    mov     rax, qword ptr [rip + X@GOTPCREL]
    lea     rsi, [rax + 4]
    mov     word ptr [rcx], 3
    cmp     rdx, rsi
    je      .LBB0_1
    mov     ax, 3
    pop     rcx
    ret
.LBB0_1:
    cmp     rdi, 1
    ja      .LBB0_4
    mov     word ptr [rax + 4], 4
    mov     ax, 3
    pop     rcx
    ret
.LBB0_4:
    lea     rdx, [rip + .L__unnamed_1]
    mov     esi, 2
    call    qword ptr [rip + core::panicking::panic_bounds_check@GOTPCREL]
    ud2

.L__unnamed_2:
    .ascii  "./example.rs"

.L__unnamed_1:
    .quad   .L__unnamed_2
    .asciz  "\f\000\000\000\000\000\000\000\f\000\000\000\r\000\000"

The observation that the addresses Y+i and X+2 are equal causes the indexing expression Y[i] to be effectively replaced with X[2], which is then assumed to be incapable of aliasing Y[0], even though in all defined circumstances where Y+i and X+2 could be equal, i would have to be 0.

1 Like

The & produces an immutable reference. Using the unstable &raw const to avoid any references produces the same result though.

Just as a suggestion, you can clean up the panicking-related code from your example by replacing Y[i] = 4 with *Y.get_unchecked_mut(i) = 4;. In this case, taking the argument i as a bool and casting it to usize would also have the same effect of removing the bounds check.
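
For example, a sketch of that bool-taking variant (assuming the same extern X and Y statics as above); since i as usize is provably 0 or 1, the bounds check on Y[i] goes away:

pub unsafe fn test(i: bool) -> u16 {
    let i = i as usize;           // i is now provably 0 or 1, so Y[i] needs no bounds check
    let px1: *const u16 = &X[0];
    let pyi: *const u16 = &Y[0];
    let flag = px1.add(2) == pyi.add(i);
    Y[0] = 3;
    if flag {
        Y[i] = 4
    }
    return Y[0]
}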

As does directly taking pointers with X.as_ptr().

Here's another reformulation, whose behavior, so far as I can tell, would only be justifiable if converting a just-past pointer to an integer is considered Undefined Behavior.

extern "C" {
    pub static mut X: [u16; 2]; // = [0, 0];
    pub static mut Y: [u16; 2]; // = [0, 0];
}

pub unsafe fn test(i: usize) -> u16 {
    let px: usize = X.as_ptr().add(2) as *const u16 as usize;
    let py: usize = Y.as_ptr().add(i) as *const u16 as usize;
    let flag = (px ^ 123) == (py ^ 123);
    Y[0] = 3;
    if flag {
        Y[i] = 4
    }
    return Y[0]
}

The compiler's behavior would be correct in the language it seeks to process if, in that language, conversion of a just-past pointer to an integer is UB. Is there any other justification for such behavior?

This does look like a miscompilation. Removing the bounds check on Y[i] (i.e. assuming i < 2) gives us this (asm annotated by me, with assistance from godbolt):

Rust

extern "C" {
    pub static mut X: [u16; 2]; // = [0, 0];
    pub static mut Y: [u16; 2]; // = [0, 0];
}

pub unsafe fn test(i: usize) -> u16 {
    let px: usize = X.as_ptr().add(2) as *const u16 as usize;
    let py: usize = Y.as_ptr().add(i) as *const u16 as usize;
    let flag = (px ^ 123) == (py ^ 123);
    Y[0] = 3;
    if flag {
        *Y.get_unchecked_mut(i) = 4
    }
    return Y[0]
}

LLVM-IR (optimized)

define i16 @_ZN7example4test17h716b2f3df427f079E(i64 %i) unnamed_addr #0 !dbg !6 {
  %0 = getelementptr inbounds [2 x i16], [2 x i16]* @Y, i64 0, i64 %i, !dbg !10
  %flag = icmp eq i16* %0, getelementptr inbounds ([2 x i16], [2 x i16]* @X, i64 1, i64 0), !dbg !20
  store i16 3, i16* getelementptr inbounds ([2 x i16], [2 x i16]* @Y, i64 0, i64 0), align 2, !dbg !21
  br i1 %flag, label %bb6, label %bb8, !dbg !22

bb6:                                              ; preds = %start
  store i16 4, i16* getelementptr inbounds ([2 x i16], [2 x i16]* @X, i64 1, i64 0), align 2, !dbg !23
  br label %bb8, !dbg !22

bb8:                                              ; preds = %start, %bb6
  ret i16 3, !dbg !24
}

ASM

example::test:
        ; rcx <- address of Y
        mov     rcx, qword ptr [rip + Y@GOTPCREL]
        ; rdx <- address of Y, offset rdi u16           ; rdx is py
        lea     rdx, [rcx + 2*rdi]
        ; rax <- address of X
        mov     rax, qword ptr [rip + X@GOTPCREL]
        ; rsi <- address of X, offset 2 u16             ; rsi is px
        lea     rsi, [rax + 4]
        ; Y[0] <- 3
        mov     word ptr [rcx], 3
        ; compare rdx (py) and rsi (px)
        cmp     rdx, rsi
        ; if equal, jump to .FLAG_IS_TRUE
        je      .FLAG_IS_TRUE
        ; return 3
        mov     ax, 3
        ret
.FLAG_IS_TRUE:
        ; X[2] <- 4
        mov     word ptr [rax + 4], 4
        ; return 3
        mov     ax, 3
        ret

The fact that it's storing to X[2] instead of Y[0] is fine, because the check shows that the two addresses are identical. The issue is that we then return 3.

This is, if I'm not mistaken, the exact same issue that Ralf pointed out in the original blog post: a write through one pointer is exchanged for a write through another (because the two are shown to be equal), and the new pointer is then assumed not to alias memory accessed through the original one. Obviously this combination of optimization passes is incorrect.

This is just another edge-case miscompilation bug, though. LLVM is a complicated beast, and because LLVM IR has largely been informally specified up to this point, it's not surprising that some optimization passes combine to cause miscompilations. All software has bugs.

It's precisely because combinations of optimizations like this cause end-to-end miscompilations that Ralf makes his point in the blog post: we need to actually decide what the semantics of LLVM IR are, so that we have a hope of showing optimization passes individually correct.

2 Likes

This problem was reported ages ago, at least with respect to C. Both gcc and clang treat the fact that two pointers compare equal (gcc only does so when the pointers themselves compare equal, rather than when integer-converted pointers compare equal) as license to substitute one for the other, without regard for whether they have the same aliasing sets. That suggests the behavior was designed in without regard for whether language specifications actually allow it, rather than being a "bug" as such. If it were viewed as a bug, it could be fixed by disabling the use of comparisons to infer equivalence in cases that can't be proven sound. The lack of any mode to disable unsound optimizations, short of disabling all optimizations, suggests to me that this discrepancy between the compiler's behavior and the language specification is not viewed as a bug in the compiler, but rather as a defect in the language spec.

I agree about the need to define semantics, but I don't have much hope of a consensus between people who implemented the optimization and don't want to forbid it, and those who want LLVM to have semantics consistent with languages like Rust.

1 Like

This is Ralf's point though: LLVM's semantics is inconsistent with itself, and this manifests as end-to-end miscompilations regardless of the source language. This happens because some optimizations assume the semantics works one way and other parts of LLVM assume it works a different way, and when they interact they implicitly assume they are using the same semantics, so a contradiction results. The solution is to make the devs commit to a particular semantics; it's not even that important what it is, but any answer to the question will enable follow-up bug reports that need not be end-to-end miscompilations but can instead be of the form "optimization X is inconsistent with the semantics we agreed on", and progress can be made. As long as LLVM is noncommittal about its semantics, every optimization can just argue for its favored interpretation of the semantics and nothing will get done about these "holistic" bugs.

Once the semantics is pinned down, it becomes much easier for Rust or C or any other language to target it and locally check its annotations for correctness. It doesn't matter if LLVM has a different semantics than C or Rust as long as frontends can insert operations to make up the difference. As a simple example, if language X decided that uninitialized memory has zeros in it, it would just have to ensure that any uninitialized memory is zero-initialized by inserting extra instructions when it can't prove that the initialization is unnecessary.
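
As a rough illustration of that kind of frontend lowering (the source language and its zero-initialization semantics here are hypothetical, not anything Rust or LLVM actually specifies):

// Hypothetical source language X: `let buf: [u8; 4];` where reads of
// uninitialized memory are defined to return 0.
// What its frontend would emit for a backend where such reads are poison,
// written here in Rust for concreteness:
fn lowered() -> u8 {
    let buf = [0u8; 4]; // explicit zero-initialization inserted by the frontend
    buf[2]              // now well-defined under the backend's semantics too
}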

In the present case we're talking about pointer provenance, and Rust requires that integer types can be compared like integers, regardless of whether they are derived from pointers, so that has to be communicated to LLVM. As long as LLVM uses "nice-to-have" style semantics in terms of allowable optimizations like "ptrtoint(x) == ptrtoint(y) is equivalent to x == y" and "x == y is false if x and y are derived from different allocations" and "x + n is derived from x, for any constant n" then this is very difficult to achieve, because the rules I just gave are sufficient to prove that equality of integers is always false. (So if we throw in x == x is equivalent to true then we can prove a contradiction and get miscompilations, from any source language.)
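
To make this concrete, here is a self-contained sketch in the spirit of the earlier examples (the statics and the function name are mine, purely for illustration): under the three rules above, the comparison below can be folded to false even for inputs where the two integers are equal at run time, while reflexivity says it must then be true.

static X: [u16; 2] = [0, 0];
static Y: [u16; 2] = [0, 0];

pub fn collide(i: usize) -> bool {
    let a = X.as_ptr().wrapping_add(2) as usize; // integer derived from one-past-the-end of X
    let b = Y.as_ptr().wrapping_add(i) as usize; // integer derived from a pointer into Y
    // "ptrtoint(x) == ptrtoint(y) is equivalent to x == y" turns this into a pointer
    // comparison; "pointers derived from different allocations are unequal" then folds
    // it to false, even for a layout and an i where a == b holds at run time, where
    // "x == x is true" says the comparison must be true.
    a == b
}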

If we instead have a semantics that speaks about the underlying mathematical model for the operations, for example "u64 is the type of 64-bit integers with an additional element called poison" and "addition of elements of type u64 adds modulo 2^64 unless one element is poison, in which case it is poison", then we can actually say with confidence how to compile operations; in this case compiling Rust pointer equality would mean converting the pointers to ints and then using int equality, because the above description implies that ints have no provenance. If instead the semantics said ints have provenance but there is an operation erase_provenance to remove it, then we would have to compile pointer equality to erase_provenance(x) == erase_provenance(y). And so on.

The exact form of the source and target semantics determine how to construct a correct lowering of operations, unless the choice of semantics means that there is no correct lowering, which is probably an expressivity bug on LLVM's end unless your source language is truly weird. (For example, if your source language has a primitive read_next(x) that reads the value that will next be written to a variable, then you might have trouble compiling to LLVM which has only partial support for time travel.)

11 Likes

How do you propose to make developers whose "optimizations" rely upon certain corner cases not being defined agree to changes to the language spec that would block those "optimizations", or make developers whose code relies upon those cases being defined agree to changes that would declare them "erroneous"?

(So if we throw in x == x is equivalent to true then we can prove a contradiction and get miscompilations, from any source language.)

I don't think the language that LLVM actually processes specifies that x==y is false if x and y are derived from different allocations, but rather treats comparisons as yielding UB.

If instead the semantics said ints have provenance but there is an operation erase_provenance to remove it, then we would have to compile pointer equality to erase_provenance(x) == erase_provenance(y). And so on.

From what I've seen, proposals to add explicit means of erasing provenance to C or C++ get opposed on the grounds that they would "block optimizations". I'm not sure if the reason for that opposition is a failure to realize that the blocked "optimizations" are phony, or the opposition stems from the fact that optimizers don't actually track provenance but instead infer it based on address equality, and would thus have no means of reliably laundering pointers that are coincidentally equal.

In my example, unlike Ralf's, there is nothing in the code to suggest that the expression used to access Y[0] is derived not from Y but from X. To the contrary, the expression Y[i] is clearly and unambiguously derived from Y, but the fact that it is coincidentally equal to X.add(2) is sufficient to make the compiler infer that its provenance derives from X.

What law in the universe is there that requires that LLVM be suitable for reliably expressing the semantics of any language other than LLVM? I certainly think that any quality back-end language should be capable of doing so for a wide range of source languages, and languages that can't do so for a particular source language should not be viewed as being quality back-end languages for that source language, but I don't think anything requires that LLVM be a quality language suitable for any particular purpose.

If the corner cases are corner enough then ideally the faulty optimizations can be modified to detect the situation, with the situation being sufficiently rare that no one will notice or miss it. But I concede that there isn't much that is harder than taking away someone's "illicit gains".

In that case, the contradiction stems from inttoptr(x) == inttoptr(y) being equivalent to x == y and x == y being defined whenever x and y are integral types. An equivalence isn't allowed to introduce UB.

I'm not talking about C/C++ here, I meant that as notation for an LLVM operation. I doubt that C or C++ would get an operation which explicitly talks about provenance, even though that seems like a very reasonable course of action.

I think your example is a pretty clear-cut example of miscompilation. What LLVM is doing there is nonsense under any reasonable definition of pointer provenance. (That said, I also find pointer provenance dubious as a concept, because people have many intuitions about what it means, and if you combine all the intuitive things you can say, you get a contradiction. In fact, Ralf's post does essentially this.)

I wouldn't call it a law of the universe so much as a quality metric of the toolchain. I think that it should be general purpose and capable of supporting things that can reasonably be performed in most hardware, but on individual features intended to support the details of the various source languages I don't have much opinion; if lots of people want it then ok. What should be avoided is a situation where the source language has a simple construct for something the hardware supports, but LLVM doesn't and so it has to be encoded and decoded and sometimes decoding fails and you get bad code.

Really interesting topic and a great post!

My own intuition is that casting an integer to a non-volatile pointer should be disallowed. Or at least, subsequent accesses (in all branches) to such a pointer should be treated as volatile; accesses after the volatile access could perhaps be optimized, but only based on information known after the cast. This is probably equivalent to obtaining a pointer from a black-box memory allocator (unknown allocation algorithm, impossible to optimize).

Whether the pointer should be assumed to alias is a hard question, but I think the person who wrote such code should be required to mark it appropriately (a bad marker would be UB).

People mentioned that storing stuff in lower bits of aligned pointers is useful. To help optimize this, we could pretend that a pointer is internally something like this:

// Conceptual pseudocode: NBitType<N> stands for an N-bit integer field.
struct Pointer<T> {
    #[has_provenance]
    actual_pointer: NBitType<8 * size_of::<usize>() - log2(align_of::<T>())>,
    extra_bits: NBitType<log2(align_of::<T>())>,
}

Then we optimize pointer accesses on actual_pointer. Dereferencing it would just assume extra_bits is all zeroes. extra_bits is treated as an integer, and overflowing it is UB. For convenience, a safe type with operations on extra_bits could be defined explicitly so that no unsafe casting is actually needed.

This roughly matches the current rule that misaligned access is UB.
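
For comparison, here is how the low-bits trick is usually written today, with plain casts (a sketch; these helper functions are mine, not part of the proposal above). Note that it round-trips the pointer through an integer, which is exactly where the provenance questions in this thread come up:

use std::mem::align_of;

fn pack<T>(p: *const T, tag: usize) -> usize {
    let mask = align_of::<T>() - 1;
    debug_assert_eq!(p as usize & mask, 0); // p must be aligned for T
    debug_assert!(tag <= mask);             // tag must fit in the spare low bits
    (p as usize) | tag
}

fn unpack<T>(packed: usize) -> (*const T, usize) {
    let mask = align_of::<T>() - 1;
    ((packed & !mask) as *const T, packed & mask)
}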

As far as I can tell, in the language LLVM seeks to process, i = ptrtoint(p) is defined only in cases where p is null or identifies an object and is free of aliasing restrictions, or various other conditions hold, and p = inttoptr(i) only has defined behavior in constructs where i is visibly formed directly from a pointer, or when various hard-to-describe conditions hold. In that language, attempting to convert a just-past pointer to an integer (as in both Ralf's example and mine) would invoke UB, rendering any notions of equivalence after that point moot, and meaning that nothing that happens afterward can create a problematic contradiction.

While an equivalence between integers and pointers can be handy for some purposes, a language can be useful for many purposes without treating pointers and integers as equivalent. I think that having integer-to-pointer conversions perform an implicit laundering operation would generally be useful, since the cost of laundering in cases where it isn't actually needed for a program to behave usefully would generally be less than the value of supporting existing code that doesn't include not-yet-invented laundering directives.

The C++ standard actually includes a launder operation. Unfortunately, compiler vendors have insisted that its effects be extremely narrowly focused, lest it impede optimizations.

If one were to specify that attempted pointer-to-integer conversions invoke UB if the pointer doesn't identify an object, then both Ralf's code and mine would invoke UB and there would be no contradiction.

The maintainers of clang, gcc, and LLVM have for years been aware of how they treat these situations. Given the number of compiler releases since then which continue to behave the same way, I think they view their treatment of such cases as being "by design". Many people might regard this treatment as "miscompilation", but I don't think the maintainers of clang, gcc, and LLVM do. The most charitable explanation I can figure is that they would view such programs as somehow invoking UB, and the most logical place to find that they invoke UB is at the pointer-to-integer conversion.

Unfortunately, language designs get caught between a number of contradictory goals. I think much of my earlier discussion with Ralf went in circles because we prioritize the following goals differently:

  1. A language should be designed so as to allow automated verification that the behavior of a correct program given correct data would be defined on all correct implementations.

  2. If multiple possible behaviors by a program would all satisfy application requirements, a language intended to facilitate optimization should allow programmers to invite compilers to select freely from among them.

If one is seeking to satisfy the former goal, then all actions which could be performed by correct programs should have precisely defined behavior. If one is seeking to satisfy the latter goal, then correct programs should be able to perform many actions whose behavior is more loosely specified.

An ability to automatically test that a program would have defined behavior on all correct implementations, at least when fed a particular data set, can certainly be useful, but would in many cases require foregoing optimal performance. In some cases, the benefits of automated verification may outweigh the costs, but in many other cases they do not. The only way I can see for a language to facilitate both automated testing and the generation of maximally-efficient code would be to recognize a category of programs which, in order to be correct, would need to satisfy some constraints that facilitate automatic testing, but also recognize a category which are correct even though they cannot be automatically verified as such.

Or perhaps the LLVM developers have been prioritizing higher-impact bugs and language support over fixing one (of many!) long-standing miscompilations. If it's stood for as long as you say without someone reporting it happening in code not deliberately constructed to trigger the miscomp, then it's obviously not as critical to fix as you make it out to be.

The three optimizations Ralf points out are all powerful optimizations that typically don't cause problems. Obviously they do in this case, so they need to be reformulated, but it's not as simple as just disabling the problematic optimization. For one, it's not immediately obvious which one is to blame (because of the aforementioned lack of defined LLVM-IR semantics). More importantly, though, all of them have significant positive impact when optimizing real programs. "Just disabling" the broken optimization would noticeably lower the quality of optimization, which LLVM would obviously like to avoid.

The miscomp existing doesn't mean that LLVM isn't fit or that the LLVM devs don't care about it; it means that it's a complicated issue whose fix has nontrivial implications, and one that just hasn't been prioritized.

1 Like

To add to what @CAD97 said, note that these bug reports are still open (as in, they have not been closed with "wontfix"), and some of them (like this one) contain a lot of discussion. While I'd love it if the LLVM devs would give these issues higher priority, and while it is frustrating how long it takes for some bugs to get fixed, I think it is unfair to say that LLVM devs consider these things not-a-bug. In my experience, LLVM devs would love to fix all these bugs and have a consistent, precisely definable semantics -- they just do not have this concern at the top of their priority list, and they want to sacrifice the absolute minimum of performance for this task.


Sorry for the fuss that I caused here. I usually link to the Rust forum instead of Reddit for discussion based on some feedback that I got about how welcome people feel in either community and which one is worth promoting more. Clearly, I should pay more mind to how Rust-specific the blog post is, and link to more widely-scoped discussion venues for my more general posts. I will take this into account in the future.

2 Likes

I would expect that a reputable compiler vendor who becomes aware that a certain optimization sometimes causes miscompilation would disable the optimization except when the compiler is invoked with a mode which requests dangerous optimizations, or at minimum offer a mode that would limit a compiler to sound optimizations.

The ability to draw inferences based on comparisons may be "clever", but when applied to pointers, even in cases where it doesn't cause problems, I think the benefits would often be fairly limited. Replacing a store to 4*rax+y with a store to rip+x+8 doesn't really seem like much of a savings.

The ability to elide a round-trip cast sequence would only be a useful optimization in cases where it was necessary to use pointer-integer-pointer round trips for purposes that would not require laundering the pointers, e.g. allowing a function to sometimes return a number and sometimes return a pointer, depending upon the value of a "mode" parameter. Such conversions are used far more often, as in Ralf's code, for the purpose of laundering pointers; in cases where omitting the laundering step would allow the generated code to be more "efficient", it would typically replace useful behavior that would be defined if "laundering" were recognized as an exception to aliasing rules with broken behavior.
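
A sketch of that kind of round trip (the names here are mine, purely illustrative); the integer is just a way to return either kind of value, not a means of laundering anything:

enum Mode { Number, Pointer }

// Returns either a plain number or a pointer smuggled through usize,
// depending on `mode`; the caller casts back with `as *const u16` when
// it knows a pointer was returned.
fn result_for(mode: Mode, n: usize, p: *const u16) -> usize {
    match mode {
        Mode::Number => n,
        Mode::Pointer => p as usize,
    }
}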

Reordering an access to q across a store that, in the code as written, is performed via a non-laundered pointer derived from p is a good optimization. For the optimization to be correct, however, the compiler must know whether the code as written used a non-laundered pointer derived from p.

A reputable compiler manufacturer should only make optimizations which they know to be sound, according to the dialect they are configured to process. If a programmer uses a compiler option or directive to indicate that a compiler may treat certain weird pointer-related constructs as Undefined Behavior, that may cause otherwise-unsound optimizations to be sound. Avoidance of Heisenbugs should be a top priority, and if optimizations complicate a compiler so much as to defy analysis, that should be recognized as a danger sign.

There's a quote from a prominent LLVM developer in the early 2010s saying, in effect, that he didn't want a formal semantics for LLVM. That said, the work that Nuno Lopes et al. have done around Alive and Alive2 has been well-received, and the poison changes needed to make a lot of stuff sound have been accepted (even if it's taken years).

My experience is that, when you bring up a miscompilation, almost every compiler vendor responds with "Does this affect real code? No? Not going to worry about it then!" A patch that fixes a soundness bug but regresses benchmarks will usually be rejected, which is part of the reason why several of the LLVM soundness bugs have been languishing for years.

2 Likes

No wonder that nobody can trust any software then... it starts with the compilers.

2 Likes

Thank you for this summary, it is certainly helpful. Let me first, however, correct my own point as reflected there: verification is the key, automation doesn't really matter. I am also totally okay with manual verification approaches. Given enough time, anything that's feasible for manual verification will be automatable, at least for the common cases.

But what I am demanding is that there be some principled way for a programmer to ensure that a given program "does the right thing" on all correct implementations of the language. IMO, a spec that does not achieve this goal is worthless and should be thrown away. The entire point of a spec is that I can program against the spec, without knowing the details of the implementation. If I have to know the details of the implementation to do anything, I might as well not have a spec.

Is there a single example of a specification that turned out to be useful in practice without satisfying this condition? I am not aware of one. Specifications are a form of abstraction, and as is the case for all abstractions, if we make them too leaky, they become useless.


Secondly, I will comment on your summary of your own position:

If multiple possible behaviors by a program would all satisfy application requirements, a language intended to facilitate optimization should allow programmers to invite compilers to select freely from among them.

I do not see a contradiction with my own position here. In fact, to make your goal useful, there needs to be some way to know whether "multiple possible behaviors by a program would all satisfy application requirements". How do you think one can know whether this is the case? My point is exactly about being able to ensure, by reading the specification, that a program satisfies application requirements! I do not see a way to satisfy your goal (except vacuously) without also satisfying my requirement.

But I do see a contradiction between your stated goal here and your proposal of specifying a language through a list of permissible optimizations. To work with the kind of spec you are asking for, the programmer has to consider every possible way to apply these optimizations anywhere in the entire program and, for each of them, consider if all of the resulting executions would "satisfy application requirements". The number of programs to consider here grows exponentially with the size of the program. This is clearly entirely intractable.

This is related to the point made above several times that the behavior of a program after a bunch of reorderings is basically unpredictable, and has nothing to do with the original program any more.

In other words, the "multiple" in your claim does all the work. If "multiple" were 10, I would agree with you. As things stand, "multiple" will probably be closer to 10^100, since the number of "possible behaviors" we have to consider is exponentially large in the size of the program. Sure, if they all satisfy application requirements, then we'd be good, but it is totally impossible to tell if that is the case since there are too many of them! I am not disagreeing with your position on any fundamental level, but what I am saying is that you need a better way to express the "multiple possible behaviors". Just expressing them via "take this source program and apply any of these optimizations any number of times in any order" leads to an unusable system.

So, which approaches do we have to express the "multiple possible behaviors" that are all permissible, in a way that does not lead to combinatorial explosion? I know of two of them.

  • Program synthesis. This is very different from C or any other "normal" programming language. Here the programmer describes their intent, IOW they describe the "application requirements", in some machine-readable way, and then the compiler generates the code that is required to satisfy this specification. This is the most direct realization of your goal. Sadly, with the current state of the art this is also impractical for anything but fairly simple specifications.

  • Abstract machine. Clearly, considering every possible sequence of optimizations is not an option, but maybe there's a pattern? Maybe there is some nice way to characterize all the resulting behaviors that can result from all sequences of optimizations at once, so that by considering that "abstract behavior", I do not need to worry any more about all the individual behaviors of the individual optimized programs? It turns out yes, there is, at least for some cases of sufficiently well-behaved optimizations -- and this is exactly the "Abstract Machine" approach I am proposing, and that C, C++, and Rust are following!

    To give a concrete example in the context of Rust -- as you probably read by now, Rust's reference types &mut and & come with very strong aliasing guarantees, akin to restrict in C. We both agree that the way that restrict in C is specified is useless, so that's not what we are going to do in Rust. Your approach, if I understood correctly, would be to list the reorderings that the compiler is allowed to do, and call that the "aliasing model". I think that this is a bad idea, for the reasons given above (there are simply too many reorderings to consider, so it becomes impossible for a programmer to tell if their program meets "application requirements"). So instead, I propose we find some abstract way to characterize programs that are not affected by the reorderings. This way, we only have to consider this single, abstract execution, and if that execution "passes" (no "aliasing reordering violation" -- no Undefined Behavior), then we know that none of the reorderings can affect this program. This is exactly what Stacked Borrows does.

    Another example of this principle in action is the handling of weak-memory concurrency. This is often explained in terms of reorderings, but the actual model under the hood again works by characterizing programs whose behavior is not changed under these reorderings. This model (flawed as it still is) provided, for the first time, the ability for a programmer to be sure that their concurrent code actually was correct (without having to inspect the resulting assembly code or resorting to implementation-specific assumptions). Again, without this model, it was not practically possible to achieve your goal of ensuring that the program meets application requirements, at least not in any precise mathematical sense (sure, you can do a lot of testing and then claim "it works in practice", but I hope we agree that this is not satisfying). (A small release/acquire sketch follows after this list.)

    Of course, it can also turn out to be the case that there is no nice abstract way to characterize the summed-up effect of a bunch of optimizations. In that case, I'd say that this set of optimizations is inherently broken (such as the three optimizations in my blog post), and writing a program that will definitely meet application requirements after being subject to these reorderings is near impossible. So defining something like Stacked Borrows is also a crucial sanity check for the reorderings -- a language that does not pass this sanity check is not fit for use in any setting where correct behavior is required.
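
To illustrate the concurrency point with a concrete, entirely standard example in Rust (this sketch is mine, not taken from the post): the reasoning is done directly against the model's release/acquire guarantee, not by enumerating reorderings. This program may print Some(42) or None depending on scheduling, but it can never print Some(0).

use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::thread;

static DATA: AtomicU32 = AtomicU32::new(0);
static READY: AtomicBool = AtomicBool::new(false);

fn writer() {
    DATA.store(42, Ordering::Relaxed);
    READY.store(true, Ordering::Release); // everything before this store is visible...
}

fn reader() -> Option<u32> {
    if READY.load(Ordering::Acquire) {    // ...to anyone who observes this load as true
        Some(DATA.load(Ordering::Relaxed))
    } else {
        None
    }
}

fn main() {
    let w = thread::spawn(writer);
    let r = thread::spawn(reader);
    w.join().unwrap();
    println!("{:?}", r.join().unwrap());
}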

6 Likes

FWIW, sequence points in C are not specified via non-determinism. When two unsequenced operations conflict, this is UB. Accordingly, a Miri-like tool could be built fairly easily that detects C sequence point violations. For example, this paper describes an operational semantics for sequence points.

There is also non-determinism (if there is a sequence point between the LHS and the RHS of an operator, then the execution order is non-deterministic -- if I understood correctly). That is harder to explore, but one could still imagine doing things similar to what model checkers for concurrent programs do: detect equivalence of various interleavings, and only explore interleavings that will provide some new information.

I think a Miri-like tool for C could still be done, even if it cannot explore all non-deterministic executions. (Even for Rust, Miri cannot explore all non-deterministic executions when concurrency is involved, or when the base addresses of allocations matter.) In fact, various such tools exist, with a different degree of coverage of the standard. A much bigger problem is that the standard has so many open questions that any claim of covering all of it is rather questionable. Another problem is that in C, there is no such thing as a "safe abstraction", so there is no way to solidly test the API surface of one library and then be sure that it is being used correctly by another library. You always have to test the entire program, which of course is often infeasible (the checker will simply be too slow).

3 Likes