Operational semantics and high-level vs low-level

Here's a simple, high-level overview of where I think the disconnect between you and @RalfJung is.

Compilers have two main jobs: compile code to get the correct result, and compile code to get the correct result quickly.

The former is more or less specified by the language specification, in whatever form it exists: a somewhat formal definition, or just the behavior of the existing compiler(s).

The latter is achieved by applying optimizations to the generated code/IR. This is made possible almost exclusively by the "dead space" in the language semantics: undefined behavior.

It is impossible to add new (meaningful) optimizations without exploiting this dead space. This means that any new optimization (or bug fix!) will lead to some code that ran on the old compiler giving a different result. (At the very minimum, timing is different, and that is technically observable.)
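A minimal sketch of what that dead space buys (the function names here are mine, for illustration): in Rust, producing a `&i32` whose pointee is invalid, or mutating the pointee behind the compiler's back while the reference is live, is already UB, so the optimizer is free to fold repeated loads.

```rust
// Sketch: because misusing a `&i32` is UB, the optimizer may assume
// the pointee is valid and unchanged, and read memory only once.
fn read_twice(p: &i32) -> (i32, i32) {
    let a = *p;
    let b = *p; // provably equal to `a`; the second load can be elided
    (a, b)
}

fn main() {
    let x = 7;
    assert_eq!(read_twice(&x), (7, 7));
}
```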

Though it may not be your position, it comes off as if you're effectively advocating for any change in the result of a compiled program to be disallowed. Once a compiler is given to the entire world to work with, it is effectively as if every possible program had been written against it. That argument simplifies down to: no more optimizations, permanently, because any change will break someone.

There's a reason that there aren't any (good?) raw assembly optimizers: it's downright impossible to optimize "what the machine does" programmatically. Any higher-level language (here: one with some abstraction layer between it and the hardware) adds extra rules (and thus undefined behavior for breaking those rules). That dead space is exactly what allows optimizations to work.
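As a hedged illustration of such an "extra rule" (the function name is hypothetical): Rust promises that two live `&mut` references never alias, and breaking that promise is UB. The optimizer can lean on this in a way no assembly-level tool could, because at the machine level two pointers aliasing is perfectly ordinary.

```rust
// Sketch: `a` and `b` may never alias (aliasing would require UB at
// the call site), so the store to `*b` cannot affect `*a`, and `*b`
// can be folded to the constant just written.
fn update(a: &mut i32, b: &mut i32) -> i32 {
    *b = 5;
    *a + *b // may be compiled as `*a + 5`
}

fn main() {
    let (mut x, mut y) = (1, 0);
    assert_eq!(update(&mut x, &mut y), 6);
}
```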

Your argument is basically that the formalization effort needs to consider practical uses and lend itself to comprehensibility. Nobody disagrees with you there. In any language that exposes a way to write something that is UB, it should be as simple as possible for the author (with the help of static tooling) to avoid causing UB.
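One concrete example of such tooling in Rust (strictly speaking Miri is an interpreter rather than a static analyzer, so take this as an adjacent illustration): running a UB-carrying program under Miri, e.g. with `cargo +nightly miri run` once the Miri component is installed, reports the violation instead of silently producing some result.

```rust
// Sketch: this program reads through a dangling pointer. A normal
// build may happen to print 42; Miri instead reports the UB.
fn main() {
    let p = {
        let x = 42;
        &x as *const i32
    }; // `x` goes out of scope here, so `p` now dangles
    unsafe { println!("{}", *p) } // UB: read of deallocated memory
}
```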

At this point I'm going in circles, so I won't take this thread any further off its point.

But the key point of this is that a feature like this, which is intrinsically dependent on the actual hardware (as it sits exclusively within the designed dead space of the language), is not and cannot be specified/guaranteed by the language specification. At most, it can be promised by a specific implementation/backend to fulfil specific hardware-level goals.

Optimizations don't break code; they make logical inferences, based on the impossibility of UB, that make sense locally and transform the code to be quicker. This breaks down when the author does not come in with the exact same assumptions. And not all optimizations are even solely modelable as source transformations.
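A sketch of that kind of local inference (hypothetical function, in the spirit of the classic null-check example): a null `p` would make the first read UB, so the optimizer may treat the later check as dead code. An author reasoning about "what the machine does" with address 0 comes in with different assumptions and gets surprised; neither party is wrong about the hardware, only about the language's rules.

```rust
// Sketch: under the language's rules, `*p` is UB when `p` is null, so
// the optimizer may conclude `p` is non-null and delete the branch.
unsafe fn read_checked(p: *const i32) -> i32 {
    let v = *p; // UB if `p` is null
    if p.is_null() {
        return -1; // unreachable under the language's rules
    }
    v
}

fn main() {
    let x = 3;
    assert_eq!(unsafe { read_checked(&x) }, 3);
}
```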
