Formal specs and optimizations in general

This basically requires you to formalize the application requirements in a way that the compiler can understand, and then the compiler would have to do some code synthesis or automatic theorem proving or some such to make sure that this specification is always upheld. I don't think anything like this is even remotely realistic with the current state of the art.

It isn't necessary to express all possible sets of application requirements perfectly, in a manner that would avoid mandating anything beyond what is actually needed. The more accurately programmers can express what is and isn't needed, however, the more efficiently an optimizer should be able to meet the actual application requirements.

My complaint is that the developers of compilers, as well as of LLVM itself, seem to have been operating on the assumption that the C Standard sought to fully and accurately describe all the things that applications should need to do, and thus modeled the range of concepts that can be expressed in LLVM on those that could be specified in C without UB. This ignores the fact that the authors of the Standard stated back in 1992 that UB was intended to, among other things, "identify areas of conforming language extension", and that many real-world tasks that wouldn't be supportable on all imaginable platforms would be performed using such "popular extensions". The design of LLVM thus suffers from being focused more upon what the C Standard mandates than upon the varied needs of actual applications, and the designs of languages which generate LLVM are severely constrained as a consequence.

The concept of having non-deterministic values, but giving programmers the means to "freeze" them, would allow optimizers a huge amount of flexibility, while being fairly easy to validate provided programmers forcibly freeze things at the places necessary to keep the range of possible behaviors from exploding beyond comprehension. If a huge number of objects might end up in non-deterministic states, but their values will ultimately be frozen or ignored, it should often be possible to reason about the overall behavior of the code that generates them (namely: it fills the objects with unspecified frozen values) without having to examine the behavior of all the individual steps therein. Such reasoning will be trivial if none of the individual steps could have any side effects beyond yielding possibly non-deterministic values. It will be massively more difficult, however, if any of the individual steps could produce "regard as impossible any situation that would lead to this" UB.

Further, there are many situations where a programmer might know that a situation will never arise when given valid data that the program would be required to process usefully, but not be certain that it couldn't arise as a consequence of invalid data. Such situations could be most usefully handled by a directive whose meaning would be "in debug mode, trap; in release mode, at the compiler's convenience, either ignore this directive or trap so as to make the following code unreachable". In cases where an "assume unreachable" directive would have yielded big benefits, a compiler could still reap those benefits minus the cost of the trap; but unlike an "assume unreachable" directive, which could arbitrarily alter program behavior, this alternative directive would have only two possible consequences: (1) trap, or (2) do nothing. IMHO, for many purposes, that would offer a much better compromise between safety and performance than an "assume unreachable" directive.

A compiler need not find all possible benefits that could be reaped from such directives in order for them to be useful. In many cases the directives would provide a concise way of documenting preconditions that would need to hold for a piece of code to behave usefully, even if compilers ignored them. In many cases it would also be fairly easy for a compiler to reap benefits: evaluate the cost of processing the following code when e.g. x is known to be even [which would, among other things, allow x/2 to be replaced with x>>1], compare it with the cost of processing the code when that isn't known, and, if the difference exceeds the cost of a bit test and trap, generate a bit test and trap. Note that even if a compiler guessed wrong about which course of action would be most efficient, both behaviors would be correct; proving the correctness of the transformation would thus be trivial.
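
To make this concrete, here is a minimal Rust sketch of what such a directive might look like. The macro name check_or_assume! is invented; the release-mode arm shown here simply ignores the hint, which is one of the two behaviors the directive allows.

macro_rules! check_or_assume {
    ($cond:expr) => {
        if cfg!(debug_assertions) {
            // Debug mode: trap (here, panic) if the stated precondition fails.
            assert!($cond);
        }
        // Release mode: a compiler honoring the directive could either insert a
        // trap that makes the following code unreachable, or do nothing at all.
    };
}

fn halve(x: i32) -> i32 {
    check_or_assume!(x % 2 == 0); // documents the precondition "x is even"
    x / 2 // with the hint honored, this could compile to an arithmetic shift
}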

If languages sought to give programmers ways to invite optimizations, there are many things they could do which would be far easier to prove correct than optimizations based upon UB. There may be a few applications for which UB-based optimizations would be a good fit, but for most applications they make things needlessly difficult for programmers and compiler writers alike, compared with what better languages could offer.

I think part of the disconnect here is that all these schemes you present are focused on enabling the frontend to let the developer hint the compiler without adding UB.

And that's a fine goal! But in the middle-end, the simplest way to implement and process your assert_if_useful! is to have regions where some predicate is assumed to be true, i.e. UB if it isn't true.

Even if your frontend is completely well-defined, and has no cases of UB or even nondeterminism, UB is still a useful tool in the middle- and backend to encode states which are impossible to reach.
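
As a hedged illustration of this point in Rust: std::hint::unreachable_unchecked is essentially a frontend-level way to hand the middle-end exactly that kind of "this state is impossible" fact, and as far as I understand it lowers to an LLVM unreachable terminator. The function and its contract below are invented for illustration.

use std::hint::unreachable_unchecked;

fn decode(tag: u8) -> &'static str {
    match tag {
        0 => "zero",
        1 => "one",
        // Safety: the caller promises to pass only 0 or 1. Encoding the
        // impossible branch as UB lets the optimizer drop the extra check.
        _ => unsafe { unreachable_unchecked() },
    }
}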


On top of what @CAD97 said, let me also note that UB in LLVM has some major differences to UB in C, so I don't think it is fair to say that LLVM is restricting itself to the expressiveness of C here. LLVM diverges from C in almost all interesting aspects of UB -- which is quite natural given that the design goals of an optimizing IR ("middle end", as @CAD97 put it) differ greatly from those of a surface language.

If someone has good ideas for adding surface / front-end-level operations to the language that let the developer hint things to the compiler, but that do not rely on UB, that would be quite interesting. I am quite certain that this could be added to Rust without fundamentally changing the way we think about compiler correctness or UB. But this is best discussed by showing concrete APIs and code using them -- things that have been curiously absent from the discussion here so far.


Even if your frontend is completely well-defined, and has no cases of UB or even nondeterminism, UB is still a useful tool in the middle- and backend to encode states which are impossible to reach.

What sorts of useful mid-end or back-end optimizations do you see that could be facilitated by UB that could not be facilitated just as well by more localized non-deterministic behavior? The primary "unique feature" of UB compared with localized non-determinism is that it allows compilers to make inferences about an operation's preconditions. This can sometimes allow optimizations which would not be available via other means, but if none of the inputs to a program would cause UB in the source language, any inferences a sound compiler might make about the preconditions to any stage would need to be supportable based upon post-conditions from the previous stage, and thus analysis based upon pre-conditions wouldn't allow a sound compiler to do anything it couldn't do without such analysis.

For example, even if a source language has precise wrapping integer semantics, a 32-bit or 64-bit compiler given something like:

#include <limits.h>

int test(int dx, int dy)
{
  if (dx < -1000 || dx > 1000 || dy < -1000 || dy > 1000)
    return INT_MAX;
  /* Largest possible intermediate value is (1000*1000 + 1000*1000)*1000
     = 2,000,000,000, which fits in a 32-bit int. */
  return (dx*dx+dy*dy)*1000/250;
}

could safely factor out the division because the largest possible intermediate value would be 2,000,000,000, which on a 32-bit system would be less than INT_MAX. On the other hand, factoring out the division wouldn't require that an out-of-range multiplication yield UB; it would be just as legitimate if the upper bits produced by overflowed computations were considered non-deterministic. The fact that integer overflow would produce UB would entitle an implementation to infer that the multiplication will only be reachable if dy is in the range -1071 to +1071, but I would think there would be an easier way for it to make such an inference.

If integer overflow didn't have UB in the source language, the only way a sound transform could generate a "multiply but treat overflow as UB" instruction would be if it knew that the values being multiplied couldn't generate an overflow. Thus, in any situation where it could infer that operand2 would always be in the range INT_MIN/operand1 to INT_MAX/operand1, such inference would either be something the compiler knew anyway (not useful), something that was wrong (because the transform was unsound), or something the compiler knew about but forgot (maybe useful, but probably not as useful as having the compiler remember such things).

By my understanding, LLVM is designed to draw both forward and backward inferences from the fact that an operation would produce UB. Not only does it assume as a post-condition that an operation will never produce values that could not have been produced via defined means, but it also assumes that no sequence of operations will ever receive as inputs any values that would "inevitably" result in UB.

Does it not do that? How is what it does different from what the C Standard would allow (but was never intended to encourage)?

If you mark a function as noreturn, then any optimization that takes place assuming that the function doesn't return is relying on that particular scenario being undefined. Similar rules apply to virtually any annotation of a function's behavior.

One key difference (especially as it relates to integer overflow, since that seems to be the only UB that you're willing to discuss) is that integer overflow is not automatically UB. Instead, the relevant operations produce poison values, which are UB only if used in some ways (but not all ways; the freeze instruction, for example, converts a poison value to an unspecified, frozen, indeterminate value and ceases poison propagation).


Well, here is one proposal that could answer some of the requests to enable writing code with a less strict specification, mainly for integer semantics: Add a way to obtain some unspecified value of a given integer type, obtained in any side effect free way at the compiler's discretion. For the sake of the example I'll use an associated function i32::unspecified_value().

Based on my cursory reading of LLVM's reference, the corresponding IR for the Rust

let x = i32::unspecified_value();

could be simply

%x = freeze i32 undef

To quote the reference, (edited for the specific case of undef) "[freeze <ty> undef] returns an arbitrary, but fixed, value of type ‘ty’. All uses of a value returned by the same ‘freeze’ instruction are guaranteed to always observe the same value, while different ‘freeze’ instructions may yield different values."

In theory, this should enable optimizing e.g.

x.checked_mul(100)
 .map(|y| y/50)
 .unwrap_or(i32::unspecified_value())

into e.g.

x.wrapping_add(x)

partially in the same way as

.unwrap_or_else(|| core::hint::unreachable_unchecked())

should. Unfortunately, it doesn't look like this currently has the intended effect, possibly due to how Rust implements the checked operations.

It might be that in practice LLVM will have trouble actually optimizing such code, given that freeze is a relatively new addition, and maybe this kind of optimization is simply more difficult in the overall model.

Another pattern this would enable would be non-deterministic execution paths:

match i32::unspecified_value() {
    0 => implementation1(),
    _ => implementation2(),
}

This would let the compiler freely choose the implementation to use for a given target depending on its own analysis, use different implementations at different inlined locations, or even use some method of runtime selection. Hypothetically it could use the union of UB of both possible paths to optimize both.

It's not clear to me if such code is desired or should be encouraged, but currently it doesn't even look possible in Rust.

A few of these hints already exist, albeit with some notable usability issues.

For example, instead of just casting a known memory address to a pointer, dereferencing it, and thereby ignoring provenance, a correctly sized (extern) static may be declared and placed at the right address by the linker. Although this might lose some optimization, since such an address is no longer a constant without LTO, it informs the compiler of this extra memory location, of which it then assumes the right properties. The attributes on the item, such as no_mangle and link_section, together with its type and layout attributes (alignment, packed, freeze'ness), are arguably an existing and correct mechanism by which the programmer's knowledge is transferred into semantically meaningful constructs. (Some aren't unsafe as required, but that's an oversight/known deficiency that's being worked on separately.) Might this be a model for other kinds of programmer assertions? I don't know.
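
As a hedged sketch of that pattern (the symbol name, register, and linker placement below are invented for illustration): a memory-mapped register can be exposed as an extern static whose address a linker script pins, rather than by casting an integer to a pointer.

use core::ptr;

extern "C" {
    // Assumed to be placed at the register's address by the linker script,
    // e.g. `PROVIDE(UART0_DR = 0x4000C000);`; both the name and the address
    // are made-up examples, not a real device definition.
    static mut UART0_DR: u32;
}

fn write_byte(b: u8) {
    // Volatile write so the access is neither elided nor reordered.
    unsafe { ptr::write_volatile(ptr::addr_of_mut!(UART0_DR), b as u32) }
}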

There also already is a canonical hint for local assumptions in the form of unreachable_unchecked and assume, but the results are sometimes negligible, they don't have a particularly predictable effect on LLVM, and in some cases using them leads to worse optimization ¹ ². Even the unstable internal attributes for marking types as only containing certain value ranges do not have their full intended effect on optimization ³.
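
For reference, the canonical shape of such a local assumption with stable APIs looks like the sketch below; the function and its contract are invented for illustration.

use std::hint::unreachable_unchecked;

/// Safety: the caller must guarantee `d != 0`.
unsafe fn div_assuming_nonzero(n: u32, d: u32) -> u32 {
    if d == 0 {
        // Marking this branch unreachable is the "assume" hint: the optimizer
        // may now drop its own zero check before the division.
        unsafe { unreachable_unchecked() };
    }
    n / d
}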

It seems fairly involved to add more hints across all layers down to LLVM. When the existing ones already do not work in general and/or do not have a strictly additive effect in combination, that time seems better spent implementing different things, even libraries that encapsulate common algorithms with handwritten optimizations instead. Or coming up with a fundamentally different model of applying such hints in the first place, but see the above timespan of 3 years for a single IR instruction to curb your expectations of such a project coming to fruition with LLVM.

A noreturn qualifier is an explicit description of a function's postconditions. My squawk is with the use of UB to infer preconditions. If inferences flow only "downstream", then it's possible to validate a program by examining blocks individually to ensure that every block will satisfy all of its postconditions in all cases where its preconditions are satisfied, and by examining the interconnections between blocks to ensure that the preconditions of every block can be matched up with corresponding postconditions of the blocks that feed it. The only way a block's precondition could fail to be satisfied would be if there weren't a corresponding postcondition on a preceding block, or if a preceding block failed to satisfy its postconditions despite all of its preconditions being satisfied. Either of those would represent a structural defect in the program.

Trying to back-propagate preconditions through blocks makes validation much more difficult, since block preconditions and postconditions can no longer be evaluated in context-free fashion. Suppose a value produced by block A feeds two other blocks B and C, and block B doubles that value and passes it to block D, whose precondition specifies that the value must be less than 100. Back-propagation would allow a compiler to infer that if the program is sound, A must as a postcondition guarantee that its output will be less than 50, and thus that replacing C with an operation that assumes as a precondition that its input is less than 50 would not make the program unsound.
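
A hedged restatement of that scenario in code (all names are invented, and the "blocks" are shown as functions for readability):

fn a() -> i32 { 7 }                       // block A: produces some value
fn d(v: i32) { debug_assert!(v < 100); }  // block D: documented precondition v < 100
fn c(x: i32) { let _ = x; }               // block C: no stated precondition of its own

fn program() {
    let x = a();  // A's output feeds both B and C
    d(x * 2);     // block B: doubles x and feeds D, so soundness needs x < 50
    // Back-propagating D's precondition through B would let a compiler replace
    // `c` with a variant that assumes its argument is below 50.
    c(x);
}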

On the other hand, determining that the validated program was sound would require validating that, as a postcondition, A will always produce values less than 50. If one can do that, that would allow C to be replaced with an operation that assumes its operand is less than 50 without having to draw any inferences based upon B and D. Back-propagating inferences may sometimes allow one to prove that a transform will not turn a sound program into an unsound one in cases where one couldn't prove whether the program was sound either before or after the transformation, but I don't see how that could be useful outside cases where a programmer knows that a program is sound but is unable to prove it. Even in those cases, I would think the effort spent on back-propagating preconditions could be better spent providing programmers with tools that would allow a wider range of programs to be provably sound.

One key difference (especially as it relates to integer overflow, since that seems to be the only UB that you're willing to discuss) is that integer overflow is not automatically UB. Instead, the relevant operations produce poison values, which are UB only if used in some ways (but not all ways; the freeze instruction, for example, converts a poison value to an unspecified, frozen, indeterminate value and ceases poison propagation).

Aliasing-related issues should be defined in terms of program structure rather than program state. Threading-related issues should be defined in terms of when various sequences of operations should be considered equivalent, which should in turn also be described in terms of program structure. About the only form of potential UB which would really be "data-dependent" would be side-effect-free endless loops, and I'm not sure what else I'd want to say that I haven't already.

Integer overflow is probably the form of data-dependent UB that would benefit most from having a robust model for non-deterministic values. A problem with poison values is that they add side effects to what would otherwise be side-effect-free operations. Even if those side effects don't "usually" trigger, an operation that may produce UB for some operand values can only substitute for one that can't if one can prove that the function will never receive such values. One could avoid such issues by having every function freeze all inputs that could trigger UB if they were poison, or by having every function freeze all outputs that might potentially be poison when given non-poison inputs, but doing so would negate any optimization benefits that poison was supposed to provide.


To me personally, one of the issues with modern languages and compilers is that there isn't enough integration with the linker. I would definitely prefer that memory-mapped registers at known addresses be represented by symbols at well-known addresses, which I believe requires custom linker scripts to effect. Another feature (unrelated to anything else discussed in the thread) would be getting section start/end addresses, although a more minimal accumulate-all-symbols-of-type feature (à la LLVM's appending linkage) would suffice for most use cases; there have been a few discussions about this in previous messages.

To me personally, one of the issues with modern languages and compilers is that there isn't enough integration with the linker.

A big problem is language designers' paranoia about including features that may not be supportable on all implementations. A lot of the weird goofy rules related to external linkage in C are a result of an unwillingness to say "Implementations should generally be configurable to do things in fashion X when practical, but implementations may do things in other ways provided they predefine macros to indicate deviations from recommended practices." Although the ability to support things like weak symbols isn't universal, it's pretty common, and there's no reason a language shouldn't allow such features to be accessible in consistent fashion when they exist.

Another issue is that many language standards don't include any way for a programmer to tell a compiler to process a function and a call to it with the same semantics that would be produced by a machine-code implementation of the function that knew nothing about its caller, and a machine-code implementation of the caller that knew nothing about the called function other than its prototype. In the days before link-time optimization there was no need to specify that cross-module calls should have such semantics, since implementations couldn't really process them any other way. Unfortunately, there's no way to distinguish situations where such semantics would represent an accidental and needless barrier to optimization from those where they would cause actions which the Standard regards as UB to be processed "in a documented fashion characteristic of the environment".

Quite a few of them; see this blog post for another example.

Oh, the general "UB framework" is the same, that's true. The concrete cases that trigger UB are very different though.

Also, ever since strict aliasing is part of the C standard, I think it is safe to say that the standard is very explicitly intended to encourage these kinds of optimizations. So "never intended to encourage", I think, might have been true in the 70s, but it stopped being true a long time ago. The C standards committee is full of compiler writers, and they do intend to encourage exactly these kinds of optimizations.

Note that non-determinism is a side-effect though, so freeze poison is not side-effect-free. In particular, performing a side-effect-free operation twice is equivalent to performing it once and re-using the result (let x = f(); (x, x) is equivalent to (f(), f())); that is not true for freeze poison.
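
To spell that out with the hypothetical i32::unspecified_value() from earlier in the thread (the stand-in function below is just a placeholder so the snippet compiles; a real implementation would lower to freeze i32 undef):

// Placeholder for the hypothetical operation; imagine it yields an arbitrary i32.
fn unspecified_i32() -> i32 { 0 }

fn reuse() -> (i32, i32) {
    let x = unspecified_i32();
    (x, x)                                 // the two components are guaranteed equal
}

fn duplicate() -> (i32, i32) {
    (unspecified_i32(), unspecified_i32()) // each call may yield a different value,
                                           // so this is NOT equivalent to reuse()
}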

But yes, exposing such "unspecified values" to Rust could be a useful feature. However, it also causes some interesting issues so care should be taken before adding such a feature.


Quite a few of them; see this blog post for another example.

That doesn't strike me as much of an example, since a call to an uninitialized function pointer would seem like a literal "anything could happen" situation, with or without optimization, so the fact that such a pointer might happen to hold the address of a function whose address is never taken shouldn't be surprising.

Also, ever since strict aliasing is part of the C standard, I think it is safe to say that the standard is very explicitly intended to encourage these kinds of optimizations.

If one looks at the published Rationale document, it's clear the authors intended to encourage optimizations in cases where there was no apparent relationship whatsoever between a double* and a static-duration object of type int. The Standard makes no attempt to specify when quality compilers should recognize that pointers are related, because the extent to which a compiler should be expected to notice when things are related depends upon how aggressively it tries to exploit cases where they're not. Consider something like:

int i,j,*p;
short *q;

*p = 1;
do
{
  i=*p;
  *q = 2;
  j=*p;
  ...

Should a single-shot compiler pessimistically make allowances for the fact that *q might alias *p on the second or subsequent iterations of the loop? How would one write a standard so as to require that a compiler recognize the interaction between p and q in:

*p = 1;
q = (short*)p;
*q = 1;
i = *p;

I think it could be done, but I would think compiler writers who make an honest effort to notice relationships between objects could do so just fine without such a detailed spec.

Further, the authors of the Standard wanted to encourage "variety" among implementations, because they recognized that different implementations should be expected to process constructs in different [presumably useful] ways that the Standard couldn't fully describe. If they had intended that no implementation be expected to do anything other than behave in uselessly unpredictable fashion, that wouldn't really represent any useful form of "variety".

Note that non-determinism is a side-effect though, so freeze poison is not side-effect-free. In particular, performing a side-effect-free operation twice is equivalent to performing it once and re-using the result (let x = f(); (x, x) is equivalent to (f(), f())); that is not true for freeze poison.

Code that performs the sequence "x = y; freeze x; return someFunction(x);" could be safely replaced by code which does it once and reuses the result, even though the reverse would not be true. I don't think substitutions should in general be expected to be bi-directional, however.

There is no call to an uninitialized function pointer involved. The function pointer in said example is a static, so it's guaranteed to be initialized to NULL.

It's disingenuous at best for you to say that locally nondeterministic behavior can handle everything better than UB, ask for a counterexample, and, when presented with a potential counterexample, basically say "yes, that's unbounded arbitrary behavior (aka UB)".

If the value were actually uninitialized, I could sort of see your point: calling an effectively random address is unquestionably catastrophic. But I have a hard time accepting that "calling function pointer 0 calls some nondeterministic pointer as a function, treating arbitrary memory as machine code" is somehow "better" than "calling function pointer 0 is UB (see glossary)".

UB is fundamentally a way for the middle and backends to encode state that is impossible to reach. Even if it is proven in an earlier stage that said state is actually unreachable, the simplest way to maintain that information through further passes is to say that it's UB.


There is no call to an uninitialized function pointer involved. The function pointer in said example is a static, so it's guaranteed to be initialized to NULL.

Sorry, I should have said a null function pointer. In many cases, unless a platform specifies the behavior of attempting to read the bytes at a null address, replacing an operation that would read the bytes at a null address with one that yields an arbitrary bit pattern would not be considered surprising. This principle would apply whether the bytes were being read as code or as data.

I would regard the transformation being performed here as replacing:

void (*proc)(void);
...
proc();

with

void (*proc)(void);
...
if (proc) proc(); else __behave_arbitrarily();

A compiler which knows that, unless proc is written via means the compiler isn't required to recognize, it will either equal NULL or someFunction, and which is allowed to regard any object as holding the last value written via recognized means, could replace that with:

if (!proc)
  someFunction();  // Allowed form of arbitrary behavior when calling a null pointer
else if (proc == someFunction)
  someFunction(); // Natural implication of comparison
else
  someFunction(); // Last legitimate write will have been either null or `someFunction`.

which could in turn, of course, be transformed into an unconditional call to someFunction(). Note that the key observation necessary to make this work (that the only legitimate writes to the object store either NULL or someFunction) doesn't require that a compiler use UB to regard anything as unreachable. All the compiler needs to perform the optimization is (1) permission to treat the null address as pointing to bits that hold unpredictable values, and (2) permission to have a read of an object that was modified via unrecognized means yield either the last value that was actually stored by any means, or the last value that was legitimately stored to it.

If an implementation extends the language to specify that attempting to invoke a null function pointer will trap, then an attempt to invoke the function pointer in this example when it is null should cause a trap as specified by that extension. If an implementation doesn't specify anything about how it treats calls to null pointers, however, treating them as inviting arbitrary behavior would not be unreasonable. Which behavior is better would depend upon the nature of the application, and while implementations seeking to be as broadly useful as possible should be configurable to trap on the null-call scenario, that doesn't mean that other configurations shouldn't also be available.

If you define __behave_arbitrarily to mean "Undefined Behavior" (which is how the C standard views it), you have now replicated what C, Rust, and LLVM do here -- calling NULL function pointers is UB. In the source code of Miri, you will find something very similar to this if, except that it raises a clear error message instead of behaving arbitrarily.


Prior to Annex L, the C Standard made no effort to distinguish actions which implementations intended for various purposes should be expected to process "in a documented fashion characteristic of the environment" when doing so would usefully serve those purposes, versus actions whose effects should not generally be considered predictable even in non-portable code. Annex L attempts to subcategorize actions into those that invoke Critical Undefined Behavior versus Bounded Undefined Behavior, but unfortunately fails to offer any meaningful requirements or even recommendations about what programmers should generally be entitled to expect of the latter.

On some platforms, a jump to address zero might be meaningful and useful. On others, the effects would be very unlikely to be useful except under contrived circumstances. While there should be a means of conveying the notion "make a call to this address, without regard for whether it might happen to be zero", given the lack of any explicit syntax for that I think it's reasonable to have the choice between "call this address whether or not it's zero", "call this address if non-zero, or trap if zero", and "call this address if valid, or behave in not-necessarily-predictable fashion otherwise" specified through compiler configuration.

Note that unless an implementation specifies how function pointers are stored, it may be entirely reasonable (and for some purposes useful) for an implementation to store function pointers using indices into a target address table rather than direct pointers. On some popular 8-bit platforms, this could allow function pointers to be stored using one byte each if a program doesn't take the address of (depending upon the target) more than 64, 85, or 128 functions of any particular type. If an implementation doesn't specify that a call to a default-initialized function pointer will be processed as a call to address zero, it shouldn't be surprising for a compiler to treat it as a call to some other function.
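
A hedged sketch of that representation (the table, index type, and functions below are all invented for illustration):

type Thunk = fn();

fn f0() { /* some address-taken function */ }
fn f1() { /* another address-taken function */ }

// An implementation for a small target could represent a "function pointer"
// as a one-byte index into a table of the address-taken functions.
static DISPATCH: [Thunk; 2] = [f0, f1];

#[derive(Clone, Copy, Default)]
struct FnIndex(u8);   // the default-initialized value is index 0

fn call(p: FnIndex) {
    // A call through a default-initialized FnIndex reaches DISPATCH[0], not
    // address zero, which is the kind of behavior described above.
    DISPATCH[p.0 as usize]();
}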

Annex L was pretty much a mistake. I know of no major compiler vendor that actually implements it. The issues, in my opinion, are two-fold: 1) it limits optimizations compilers can perform in some cases of UB, 2) it provides the fallacy that you can reason about "Bounded UB" to programmers, and reasoning about UB is the last thing you should ever do (other last thing is rationalizing UB). It's always a footgun.

  1. it limits optimizations compilers can perform in some cases of UB,

A transform which replaces a program that meets application requirements with one that does not meet application requirements is not an optimization, even if the program wouldn't take nearly as long to perform a useless task as the original would have taken to perform a useful one.

  2. it provides the fallacy that you can reason about "Bounded UB" to programmers, and reasoning about UB is the last thing you should ever do (other last thing is rationalizing UB). It's always a footgun.

Of what category of behavior do the authors of the Standard say: "It also identifies areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior"? I'll give you a hint: its initials are not IDB.

Is there any evidence suggesting whether or not the authors of the Standard wished to preclude the use of C as a form of "portable assembler"?

The C Standard was never intended to specify everything an implementation must do to be suitable for any particular purpose. When it says that implementations may process UB "in a documented manner characteristic of the environment", that wasn't just suggested as a vague hypothetical. As the authors of the Standard noted:

C code can be non-portable. Although it strove to give programmers the opportunity to write truly portable programs, the C89 Committee did not want to force programmers into writing portably, to preclude the use of C as a “high-level assembler”: the ability to write machine-specific code is one of the strengths of C. It is this principle which largely motivates drawing the distinction between strictly conforming program and conforming program.

If the Standard were to specify ways of telling an implementation to either process particular actions in a documented fashion characteristic of the environment, or reject a program if it's unclear what those actions should mean in the target environment, then it might make sense to deprecate programs that rely upon such treatment without demanding it. For that to happen, however, the authors of the Standard would need to recognize the existence of actions which many implementations should be expected to process meaningfully, but which others might not be able to.

As it is, the Standard waives jurisdiction over such actions with the expectation that the marketplace will fill in the gaps, but the language has been taken over by people who are immune from marketplace pressures by virtue of their compilers' default presence on many systems. Further, the authors of the Standard probably also expected that any implementation which upheld all the corner cases mandated by the Standard would also uphold many other useful corner cases, making it unnecessary for the Standard to specify them. So far as I can tell, they may have been correct, since the compilers that most aggressively apply aliasing optimizations also handle unambiguous corner cases wrongly.

void fatal_error(void);

/* Reads each byte, scrambles it, writes unrelated values, then writes back
   values that reproduce the original bytes: 253*85, 251*51, 239*15, and 57*9
   are all congruent to 1 (mod 256), so every byte ends up holding the value
   it started with, but only after having been modified in between. */
void erase_effective_type_of_four_byte_value(void *p)
{
    unsigned char a,b,c,d;

    if (sizeof (float) != 4 || sizeof(int) != 4)
        fatal_error();
    unsigned char *pp;
    pp = p;
    a = pp[0] * 253;
    b = pp[1] * 251;
    c = pp[2] * 239;
    d = pp[3] * 57;
    pp[0] = 10;
    pp[1] = 20;
    pp[2] = 30;
    pp[3] = 40;
    pp[2] = c*15;
    pp[1] = b*51;
    pp[3] = d*9;
    pp[0] = a*85;
}
float test(float *p, int i, int j)
{
    float *p1 = p+i;
    *p1 = 1.0;
    erase_effective_type_of_four_byte_value(p1);
    unsigned *p2 = (unsigned*)(p+j);
    erase_effective_type_of_four_byte_value(p2);
    *p2 += 1;
    erase_effective_type_of_four_byte_value(p2);
    float *p3 = p+i;
    erase_effective_type_of_four_byte_value(p3);
    return *p3;
}

As the Standard is written, the only justification for a compiler to refrain from treating erase_effective_type_of_four_byte_value as though it erases the effective type of the four bytes at the target would be if one absurdly stretches the concept "or is copied as an array of character type", or says that the actions of changing the value stored in a byte and later changing it to a value that coincidentally matches the original do not "modify the value". I suspect clang is adopting the latter interpretation, but saying that writing a byte with a value it coincidentally held when the object had a different type may negate the effect of intervening modifications that changed the Effective Type seems like crazy semantics.

A while ago, when I downloaded a chip vendor's IDE based on gcc, I ported a lot of existing code to it. When I reconfigured the project to use the commercial compiler I'd been doing all my development with, the resulting program was more efficient than what gcc could produce even with maximum (and possibly-breaking) optimizations enabled. The commercial compiler wasn't doing anything particularly exotic. It simply did a better job with things like register management and opcode selection on my particular target platform (Cortex-M3). For many purposes, a compilation mode which simply went after low-hanging fruit while still supporting the "popular extensions" alluded to in the Rationale would be vastly more useful than one which pursues aggressive optimizations while ignoring low-hanging fruit.


I fail to see how "programs may be nonportable" and "implementations may define language extensions" imply that "the existence of UB as an optimization tool is bad," as you seem to be arguing.

If the operation weren't UB, the implementation wouldn't be allowed to define it however is convenient for it, anyway! It'd be stuck with what the standard says.
