Arguably, when non-terminating code is able to fulfill a type system obligation, you don't have "code which cannot be easily shown to terminate" anymore. Well... technically, taken literally, you still have "code that cannot easily be shown to terminate", but it's really "code which can easily be shown to not terminate", which feels (to me) like something @flatffinger wouldn't want to include in their "code which cannot be easily shown to terminate" category.
Edit: I have a (in my opinion) better example (and it's one that doesn't involve any type system obligations):
let mut v = <Vec<i32>>::new();
loop {
    if v.len() > 10 {
        break;
    }
    let value = side_effect_free_computation();
    // forgot to call v.push(value) here
}
let x = unsafe {
    // should be safe since we checked v.len() > 10 before getting here
    v.get_unchecked(10)
};
Since we forgot to update v at all in the code above, the loop body turns out to be side-effect free (reading a local (i.e. non-volatile, non-atomic) variable is not a side effect). So the whole loop { ... } does not have any side effects. On the other hand, our program structure suggests that calling v.get_unchecked(10) is safe, since the only execution path leading there tested for v.len() > 10. Hence it would be a very bad idea if the compiler could just optimize the loop away.
That's not the only way the type system can be coaxed into generating arbitrary types. You can do it with a self-recursive generic function fn foo<T>() -> T { foo() } as well, and it's not necessarily easy to tell if this is going to terminate or not:
fn maybe_panic<T>(i: i32) -> T {
    if i == 0 { panic!() } else if i > 0 { maybe_panic(i - 1) } else { maybe_panic(i) }
}
If this function is passed a value >= 0, it will panic. Otherwise, it will recurse forever (which may or may not eventually segfault, depending on whether it does proper tail calls). But in spite of the fact that no code is ever explicitly marked "unreachable," the fact that the function never returns normally is the reason why it is sound to summon arbitrary T from thin air.
There might be even more complicated settings, I agree. In this case, another question is whether a panic is side-effect free. (IMO it should actually count as a side effect.) Without panics, mutually recursive functions (e.g. function a calling function b, b calling c, and c calling a) can probably become hard to analyze (not necessarily impossible, but at least impractical).
Another question here is return values. If I re-interpret a function call with a return value as passing a pointer to an uninitialized return variable to that function, expecting it to fill that return variable, then the function execution is not side-effect free (since it writes to an outside variable). In fact, in the maybe_panic example above, if the panic were e.g. replaced by a loop {}, I'm not sure whether there would be any part of the code that is actually side-effect free.
..a few minutes later..
On second thought, taking the "returning from a function is like writing to a return-value pointer that was given to the function as an argument" intuition, maybe_panic actually never writes to that return-value pointer. So, in a sense similar to my example above, without any kind of special-case analysis of return values in the compiler, we would have a situation where the maybe_panic code should fulfill the condition "if it returns, then the return value has been written", but if it is "optimized" away due to being side-effect free, then that condition is violated. That is quite similar to the "vector length should be greater than 10" example I gave above.
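To make that intuition concrete, here is a rough Rust sketch of maybe_panic under the out-pointer reading (the MaybeUninit-based desugaring and the maybe_panic_out name are just my illustration, not anything the compiler literally does):

use std::mem::MaybeUninit;

// "Returning a T" is re-interpreted as "writing a T through `out`
// and then returning ()".
fn maybe_panic_out<T>(i: i32, out: &mut MaybeUninit<T>) {
    if i == 0 {
        panic!()
    } else if i > 0 {
        maybe_panic_out(i - 1, out)
    } else {
        maybe_panic_out(i, out)
    }
    // No path in this function ever performs an `out.write(..)`, so under
    // this reading it has no side effect on the caller's return slot at
    // all -- yet the caller relies on "if the call returned, `out` has
    // been initialized".
}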
Does it matter? We probably want constant propagation and dead code elimination, after all, so if someone invokes maybe_panic(-1)...
fn maybe_panic<T>(i: i32) -> T {
    if i == 0 { panic!() } else if i > 0 { maybe_panic(i - 1) } else { maybe_panic(i) }
}
let v = maybe_panic(-1);
fn maybe_panic_n1<T>() -> T {
    if -1 == 0 { panic!() } else if -1 > 0 { maybe_panic(-1 - 1) } else { maybe_panic_n1() }
}
let v = maybe_panic_n1();
... it quickly doesn't have any side-effects any more ...
fn maybe_panic_n1<T>() -> T {
    maybe_panic_n1()
}
let v = maybe_panic_n1();
In other words, anything the optimizer is "allowed to do" has to be accounted for in the type system, or Safe Rust won't work properly (for example, disallowing data races accounts for optimizations in the CPU memory hierarchy). And terminating out of a mutually-recursive function is not accounted for.
Currently you have to provide an example of an actual miscompilation for a fix to be made in LLVM. I guess this happens partly because it is otherwise hard for the LLVM crew to judge what is a bug and what is not.
Hopefully, once the IR semantics is formalized, merely pointing out that a given optimization violates it will be enough for a bug report to be acted upon.
Actually, I would argue very much the opposite. What you are proposing, effectively, is to legislate nondeterministic behavior by specifying how the compiler (or, in some cases, the hardware) is permitted to transform code, and you are arguing that this is easier to understand for both compiler writers and programmers than a more direct operational semantics. This does not agree with my experience: given a choice between an indirect, axiomatic semantics and a direct operational semantics, I will always prefer the latter.
First off, it's easy to omit preconditions for transformations--especially when you're working off of a naïve "obvious" core semantics. Going off the original example here, there are three separate transformations that all, individually, seem to make sense, yet in combination, their result is nonsensical. Clearly, we shouldn't have applied one of those transformations, which means one of those transformations has a necessary precondition that's been omitted. Discovering that you've omitted preconditions is difficult; you essentially need what are still pretty advanced tools to generate test cases capable of covering the corner conditions that expose these results.
Secondly, axiomatic semantics tend to be harder to reason about, especially if you're comparing them to an operational semantics. Take the C++11 memory model, which is written in an axiomatic style, unlike the more operational semantics that dominates the rest of the specification. Even though memory models were already decently understood before it was written, and even though the specification itself drew from existing experience, the result still managed to be incorrect. It proved to be simultaneously too strong (it couldn't be efficiently implemented on POWER and ARM) and too weak (it permitted out-of-thin-air behavior). If experts have a hard time understanding the actual consequences of this kind of semantics, what hope do mere mortals have?
A third point to bring up is that the code you see in the middle of an optimizer is only vaguely similar to the original source code. Describing the impact of optimizations in terms of the original source code can be difficult. Control flow is a big culprit (the definition of natural loops is not isomorphic with language-level looping constructs!), but the biggest pain in my experience involves memory. Every major programming language pretends that every variable lives in memory, yet reality is that most variables are expected to be registers, and when you want or need to assign more attributes to the load and store instructions, actually separating out which variables are which can be immensely challenging.
It is possible in principle to constrain UB so that it's not quite so all-reaching, and indeed, LLVM's definitions of UB-like constructs do do this to some degree. However, when you actually execute an UB statement, what you usually end up with is a program which is seemingly self-inconsistent: a variable may appear to have different values in nearby sections of the program without anything appearing to cause it to change value. Trying to constrain such inconsistency is challenging, and that work would probably be better spent making it easier for users to realize they're actually hitting UB. And UB is generally incorrect code anyways, even if you gave it defined semantics (signed integer overflow is an example here; >99% of the time, the overflow is already a bug in the code, although C's behavior is problematic here since there's no way to actually safely do it in the rare instance it actually would be correct).
OK, I'm going to throw out a straw-man, just to see how much it can solve.
We can logically define a program as a pair consisting of its State and its Actions. State is memory, file system, etc., and Actions are pure functions that operate on state. For those of you familiar with it, this is basically just a Moore machine; see the schematic for a simple explanation.
Now the reason I'm talking about this is that the only part of the machine that really matters is the output, because it's the only observable part (it's a partially observable system). Optimization is the work involved in mutating one chunk of transition logic into a different chunk such that the observable outputs remain the same and the transition logic is 'better' in some measurable manner.
So that forms the basis of my definition of what is allowed and what isn't allowed; so long as the observable output remains the same, a mutation is permitted. It is up to the optimizer and compiler to decide if a given mutation is better in some way.
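To make the straw-man a bit more concrete, here is a minimal Rust sketch of that model (my own naming, purely illustrative):

// A Moore-machine-style model: `step` is the pure transition logic, and the
// only thing the outside world can see is `output(state)`.
struct Machine<S, I, O> {
    state: S,
    step: fn(&S, &I) -> S, // an Action: pure function of (state, input)
    output: fn(&S) -> O,   // the observable part
}

impl<S, I, O> Machine<S, I, O> {
    fn tick(&mut self, input: &I) -> O {
        let next = (self.step)(&self.state, input);
        self.state = next;
        (self.output)(&self.state)
    }
}

// An optimizer may replace `step` with any "better" transition logic, as
// long as the sequence of outputs stays the same for every input sequence.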
Issues with this model that I know of off the top of my head:
Undecidability. As @notriddle and @steffahn have shown, there are examples of code where it is either difficult or utterly impossible to decide whether a particular transition will occur, and therefore what its output will be. I don't know if that will affect this model or not.
Observability. How do we decide what is observable and what isn't? This is more complex than saying that if it changes something outside of the crate, then it's observable. Examples:
If I mutate the code so that it uses less (or more) power, I can observe the change in various ways. Does this count?
Likewise for memory, time, disk space usage, etc.
Using a debugger means that everything is observable. I like my release code being fast, not limited because a debugger now made it 'observable'.
Memory is weird. Memory-mapped I/O means that some memory really isn't memory. Some types you can read but not write, some you can write but not read, and some you can do both. Some types of memory mapped I/O allow you to write to them, but they ignore the write completely. So, if a write is performed, but it has no effect on the state of the system, was it ever really done?
Rust's compilation unit is the crate. But what about link-time optimization (LTO) and profile-guided optimization (PGO)? As I understand it, LTO can cross crate boundaries, blurring what is observable state, and PGO is making measurements of what is 'unobservable' to make better decisions in the future on what optimizations to perform. How does this model deal with all of that?
These are all good tools, but what I meant is that we can, theoretically, never be sure that something works the way we want, even when we try to prove it.
It ends in an infinite regress of proving that physics follows the rules formulated in mathematics, and so on.
Practically, we need to make assumptions; the lower-level the assumption, the better.
Say we formulated the semantics of machine code, and we believe that every x86_64 CPU follows it; then we could prove that each frontend follows its defined semantics by proving that the backend's mapping into machine code preserves semantics. This is possible but very cumbersome. The same could be done between LLVM and Rust.
I think this would give us enough confidence to be sure that every optimization/code generation doesn't change semantics, though I don't think it outweighs the costs incurred by this formalization.
On the other hand, our program structure suggests that calling v.get_unchecked(10) is safe, since the only execution path leading there tested for v.len() > 10. Hence it would be a very bad idea if the compiler could just optimize the loop away.
Consider something like:
while(!(x & 1))
    x = someValue(x);
If the value x is never used after the loop exits, treating an endless loop as observable behavior would mean that its behavior would be essentially:
if iterating x = someValue(x) would eventually yield an odd number
    do nothing
otherwise
    hang
Although there are times when such behavior would be useful, in most cases where a program would use a loop of that sort, the purpose would be to compute the value of x in case downstream code needs it, and if downstream code doesn't need the value of x, it would be just as well to skip the computation entirely.
If a programming language were to specify that an implementation is allowed to omit the execution of a loop like the above unless either x is observed by later code, or later code executes a "treat the value of x as though it is observed" intrinsic, then it could deprecate constructs that rely upon the "hang if x would never become odd" semantics of the original without indicating such reliance. It could then recognize the concept of "compatibility mode" configurations which support deprecated constructs, and "new-code-only" implementations which require that any programs used with them not rely upon deprecated constructs for correct operation.
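In Rust terms, std::hint::black_box is roughly this kind of "treat the value of x as though it is observed" marker, although it is documented as a best-effort hint rather than a semantic guarantee. A sketch (some_value is a made-up placeholder):

use std::hint::black_box;

fn some_value(x: u32) -> u32 {
    x.wrapping_mul(3).wrapping_add(1) // placeholder computation
}

fn wait_for_odd(mut x: u32) {
    while (x & 1) == 0 {
        x = some_value(x);
    }
    // `x` is otherwise dead after the loop; under the proposed semantics the
    // whole loop could then be elided. Marking the value as observed forces
    // the computation (and the potential hang) to be preserved.
    black_box(x);
}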
Integer overflow detection is another kind of side effect which languages should accommodate with semantics that allow reordering or omission of computations. There are many situations where it may be necessary that a program never produce incorrect output as a consequence of integer overflow (instead reporting that the overflow occurred), but where ignoring an overflow and still producing correct output would be no less useful than reporting the overflow. If a program computes x = y*z; but x is never used, determining whether y*z overflows would likely be a waste of time unless such information would be useful for some other purpose. Further, if after constant propagation a compiler can see that a program is computing p=q*(10*r)/(5*r);, that calculation is equivalent to p=q+q; in all cases where neither expression overflows, and even in cases where the first expression would overflow, the behavior of the latter would be just as useful.
Many languages provide no features to automatically detect integer overflow because, as conventionally implemented, such detection would be incompatible with many optimizations. Accepting the idea that integer overflow only needs to be detected in cases where it would yield results that were observably numerically incorrect would greatly reduce the cost of integer overflow detection, but that would only be practical with language-level support for such semantics. If a programmer has to write something like:
int mul_with_overflow_check(int x, int y)
{
    long long temp = (long long)x * y;
    if (temp < INT_MIN || temp > INT_MAX)
        integer_overflow_occurred = 1;
    return (int)temp;
}
and p = mul_with_overflow_check(q, mul_with_overflow_check(r, 10))/mul_with_overflow_check(r, 5);, there would be no way for an optimizer to avoid a bounds check on r as well as a full multiply with overflow check, and it would be hard to avoid an extra multiply and divide as well. If the code had been written to use wrapping arithmetic, that would make things even worse for the optimizer. If the language were responsible for overflow detection, and were allowed to simply compute correct results even when an overflow would have occurred in the code as written, then even in cases where the result of the computation is observed, it could be simplified to a single add with a branch-on-overflow.
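For contrast, here is a rough sketch (in Rust, with a made-up function name) of the "let the language do the checking, and let it use a wider type" idea for the p = q*(10*r)/(5*r) example:

// Widen to a type in which q*(10*r) cannot overflow for any i32 inputs, and
// report overflow only if the final result itself doesn't fit in an i32.
// (Mathematically the result is simply 2*q whenever r != 0.)
fn mul10_div5(q: i32, r: i32) -> Option<i32> {
    if r == 0 {
        return None; // the expression as written would divide by zero
    }
    let wide = (q as i128) * 10 * (r as i128) / (5 * (r as i128));
    i32::try_from(wide).ok()
}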
We don't need to prove that the LLVM implementation matches the IR spec, for that spec to be useful.
An essential part of trying to formalize a language spec is identifying the range of tasks for which it is intended to be suitable. If some task could be easily accomplished in a language which translates constructs into machine code in a context-free fashion, and defines the behavior of the source language as that of the resulting machine code, languages suitable for such a task should allow it to be accomplished essentially as easily.
The "undefined behavior" parts of the LLVM spec may be useful for tasks which would impose no requirements on program behavior in response to maliciously-constructed input, or for those which would precisely specify all responses to all inputs, but I would regard them as unsuitable for tasks where many responses to invalid input would be equally acceptable, but some possible responses would be intolerable.
Unless there is a consensus about the range of tasks for which a language is supposed to be suitable, it will be difficult to reach a consensus as to what its corner-case semantics should be.
it could be disastrous if the compiler simply inlined all these function calls with constant arguments, evaluated them at compile time (under the assumption that making u8 sometimes behave like it won't ever overflow will always yield "correct" results) and made all the tests pass.
Many programs are subject to two application requirements:
When possible, behave in useful fashion.
When it is not possible to behave usefully (either because of invalid input or resource limits), behave in tolerably-useless fashion.
The situations the Standard characterizes as UB generally arise for three reasons:
A program executes a construct which would be erroneous even when given valid data.
A program which would process valid data correctly, receives invalid data.
A non-portable program needs to exploit a characteristic behavior of the underlying environment in circumstances which the target environment would process in predictable documented fashion, but some other environments might process unpredictably.
In the first situation, the erroneous code should be fixed, and languages should focus on helping programmers fix the erroneous code rather than trying to limp along in spite of it. In the other two situations, however, a much better remedy would be to have the language define more behaviors, at least loosely, and recognize that different tasks may benefit from stronger or looser semantic guarantees.
For example, on many platforms it would almost never cost anything to guarantee that any read of a word-sized object, no matter when it occurs, will yield either the object's default initialization value or some value that has been written to it sometime in its lifetime, and to say that writes to a word-sized object will have no side effects beyond adding to the pool of possible values that might be read and perhaps influencing the choice of which values are yielded by which reads. Note that these semantics are looser than even the "relaxed-memory-order" atomics, but would nonetheless be adequate for some purposes.
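In Rust terms, the nearest existing vocabulary for such word-sized shared accesses is a relaxed atomic, whose guarantee is actually still stronger than the loose semantics described above (a relaxed load must return one specific previously written value, and each location has a total modification order). A minimal sketch:

use std::sync::atomic::{AtomicUsize, Ordering};

static WORD: AtomicUsize = AtomicUsize::new(0); // the "default initialization value"

fn writer(v: usize) {
    // Under the loose semantics described above, this would merely add `v`
    // to the pool of values that later reads may observe.
    WORD.store(v, Ordering::Relaxed);
}

fn reader() -> usize {
    // Guaranteed (even under the loose semantics) to be 0 or some value
    // previously written -- never an out-of-thin-air bit pattern.
    WORD.load(Ordering::Relaxed)
}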
Consider, however, what code would be needed for the following functions on the 8088 (with its 8-bit bus):
int x;

int test1(int v)
{
    int temp;
    temp = x;
    if (!temp)
        x = 1;
    return temp;
}

int test2(int v)
{
    int temp;
    temp = x;
    if (!temp)
        x = 0xFFFF;
    return temp;
}
If a compiler is allowed to assume that x will equal zero when the assignments are performed, the most efficient ways of processing those assignments would be inc byte ptr [_x] and dec word ptr [_x], respectively, but if multiple threads were to process the if tests and then all processed the assignments, that could leave x holding a wide range of possible values other than 1 or 0xFFFF.
If an implementation will be used for purposes that would benefit from a guarantee that reads will yield some value that was previously written, the value of that guarantee could easily exceed the possible benefits that a compiler might reap by replacing mov instructions with inc and dec. If, however, a program would receive no benefit from such a guarantee, then the inc and dec form would be better. I think it would be better for a language like LLVM to focus on providing instructions which would uphold stronger guarantees, and then add instructions which could either be processed the same way or in ways that wouldn't always uphold those guarantees.
If kernel code is supposed to receive pointers to user-code objects and act upon them, it will generally be impractical for it to ensure that the objects will not be accessed by a user-mode thread at the same time as kernel code is accessing them. If such accesses occur, it would be reasonable for the kernel to specify that they may corrupt the user-mode object. If the user-mode code doesn't want its object corrupted, it should ensure that no thread tries to access it when the kernel is doing so. It would not, however, be reasonable to say that such improper usage could arbitrarily disrupt the state of kernel objects to which user code isn't supposed to have access.
Instructions that could behave arbitrarily if objects are written at unexpected times would be inappropriate for use by kernel code accessing user-code objects. On the other hand, requiring that kernel code only use strongly-sequenced memory operations would needlessly degrade efficiency. A compilation mode which would implicitly uphold some weak guarantees without having to uphold strong ones would allow more efficient code generation than one which required that everything be precisely specified.
it could be disastrous if the compiler simply inlined all these function calls with constant arguments, evaluated them at compile time (under the assumption that making u8 sometimes behave like it won't ever overflow will always yield "correct" results) and made all the tests pass.
Is the situation any worse than the status quo where calculations that overflow can trigger arbitrary UB, or endless loops can do likewise?
Under my proposed handling for endless loops, given something like:
a compiler that actually executes the loop as written would be allowed to treat the expression (x & 1) within the if as a constant 1, create a quasi-dependency on someFlag, and then invoke doSomething(1) without bothering to actually examine the value of someFlag or requiring that it actually be computed. Alternatively, a compiler could examine someFlag before the loop and skip the loop if the value is observed as zero, but if someFlag is non-zero the compiler would have to actually run the loop rather than merely assuming that (x & 1) would be non-zero.
Implementations or configurations intended for diagnostic purposes should regard a much wider range of things as potentially observable than those which aren't, and program validation tests should be performed with such implementations in addition to the implementations/configurations with which the programs will actually be used. No single configuration should be expected to be suitable for all purposes simultaneously.
If we're talking about Rust semantics, the status quo would be that overflow results in a panic in debug mode and in wrapping operations (modulo 2^N, where N is the bit width of the integer type) in release mode. Of course, when considering something like C/C++ where (AFAIK) the semantics say overflow is UB, then - yes - the situation would not be worse than the status quo.
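A tiny illustration of that status quo, using the explicit standard-library operations so both behaviors are visible regardless of build profile:

fn main() {
    let x: u8 = 255;
    // With the default profiles, `x + 1` panics under debug (overflow-checks
    // on) and wraps to 0 under release (overflow-checks off). The explicit
    // methods spell out those two behaviors:
    assert_eq!(x.wrapping_add(1), 0);     // wraps modulo 2^8
    assert!(x.checked_add(1).is_none());  // overflow detected
}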
If a language includes both wrapping and non-wrapping types or operations, and specifies that non-wrapping operations may be performed either at the indicated type or at any larger type of the compiler's convenience, that would facilitate many useful optimizations with minimal "astonishment", while allowing programmers and compilers to work together to efficiently handle situations where a program may receive a mixture of valid and invalid inputs and produce a range of outputs, only some of which will matter.
Given something like:
void test(int x)
{
    int temp = x*100/50;
    if (temp < INT_MAX/10)
        doSomething(temp);
}
Under such rules, a compiler would be allowed, at its leisure, either to compute temp in a way that is incapable of producing a result larger than INT_MAX/10 and then omit the comparison, or to set temp to x*2 and then perform the comparison; but a compiler would only be allowed to assume that the result of the division couldn't exceed the range INT_MIN/50..INT_MAX/50 if it performed the computation in a way that actually guarantees that. Such a thing might be accomplished in an intermediate language by having a divide instruction which allows substitutions and one that doesn't, and by having a transform that replaces a reducible divide-by-50 with a combination of an irreducible divide-int32-by-50 and a blindly-assume-value-in-range instruction. If a compiler finds some transform that turns the divide-by-50 into something else before considering the transform that would generate the assume-in-range instruction, the divide-int32-by-50 would no longer exist, thus preventing the generation of the now-erroneous assumption. If it would consider transforming the division in some fashion after generating the assumption, such an erroneous transform would be blocked by the use of the irreducible-division operation.
Sure, one could simply mandate the use of wrapping int32 semantics everywhere, but that would impede many optimizations that could be useful. Further, even if one wants overflows to be detected, semantics that are guaranteed to detect every overflow which would lead to an incorrect result would for many purposes be just as useful as semantics that treat every computation which might overflow as a side effect in and of itself, without regard for whether its result is actually used.
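As a rough illustration of "only overflows that would make an actually-used result incorrect need to be reported", here is the earlier test example rewritten in Rust (purely a sketch of the intended semantics, not anything a compiler does today; do_something and report-by-panic are placeholders):

fn do_something(_v: i32) {}

fn test(x: i32) {
    // Perform the arithmetic at a wider type, where x*100 cannot overflow;
    // the wide result is exactly 2*x.
    let temp = (x as i64) * 100 / 50;
    if temp < (i32::MAX / 10) as i64 {
        match i32::try_from(temp) {
            // The result is used and representable: behave as written.
            Ok(t) => do_something(t),
            // The result is used but would be numerically wrong as an i32:
            // this is the only case that needs to be reported.
            Err(_) => panic!("integer overflow in a result that is actually used"),
        }
    }
    // If the guard fails, the (possibly out-of-range) value is never used,
    // so no overflow needs to be reported at all.
}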
With respect to those comments about skipping computations.
Is there currently any notion of a "lazy block" that can be safely deleted when not needed, in Rust, MIR, or LLVM?
In general, calls to external functions can never be omitted, but in many cases it could be useful for the programmer to mark them as lazy. For example, a call to an external rand function could be omitted if its output is discarded and the programmer does not care whether the state of the RNG changes.
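For illustration, one way to get something in that spirit in today's Rust, without new compiler support, is to make the laziness explicit by passing the call as a closure, so it only runs (and the RNG state only advances) when the result is actually needed (the names here and the use of the rand crate are just assumptions for the example):

use rand::RngCore;

// `sample` is only invoked if the value is actually used, so when `verbose`
// is false the RNG call is trivially "deleted" -- it simply never runs.
fn maybe_report(verbose: bool, sample: impl FnOnce() -> u64) {
    if verbose {
        println!("sample = {}", sample());
    }
}

fn run(rng: &mut dyn RngCore, verbose: bool) {
    maybe_report(verbose, || rng.next_u64());
}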
Is there currently any notion of a "lazy block" that can be safely deleted when not needed, in Rust, MIR, or LLVM?
I don't know whether any of those languages recognize such a concept, but people seeking to write optimizers should focus on optimizations which are easy for compilers to process correctly and with which programs can easily be made compatible. Adding support for lazy blocks would allow the efficiency of many tasks to be improved more easily and safely than attempting to impose other kinds of more aggressive optimizations, especially if it's easy to explicitly specify various kinds of dependencies, including "compute this value, whether or not the compiler can see any reason that it would actually be needed" or "ensure that any side effects from the computation of this value, or any values upon which it could depend, occur prior to this point if they occur at all, but feel free to omit the computation altogether so long as those dependencies are still recognized".
Fundamentally, the goal of an optimizer should be to produce the most efficient machine-code program that meets an application's requirements. The better an implementation can be informed of what an application's requirements are, the better it will be able to meet them. Somehow, compiler writers have latched onto the idea that the people who wrote the C Language Standard in 1989 knew more about what programs need to do than the people actually writing such programs, and the design of LLVM appears to be based on the idea that if the C Standard didn't provide for something, programmers shouldn't need it.
If LLVM adds support for useful constructs such as lazy blocks, high-level languages can follow suit. It will be difficult for high-level languages to usefully support such constructs, though, if LLVM doesn't.
Rereading what you said, I'm not sure you understood what I was saying. My point was that if a program evaluates e.g. x = y*30/15;, what a programmer will care about when code is executing in production is whether the program's output matches what would have been produced with infinite-precision integer math that could never overflow. If so, the program should output the results of its computations in a manner that suggests they are trustworthy. If not, it should produce output that cannot be mistaken for a trustworthy result (it might just output an error message, or it might output the results of its computations along with a message indicating that they should not be trusted unless validated by other means, depending upon e.g. whether it may be possible to validate an untrustworthy result in less time than would be required to produce a trustworthy result from scratch).
IMHO, programs that are running in production really should have integer overflow checks. The cost of such checks need not be great if compilers are allowed to perform computations in ways that might avoid overflow. As another example, suppose that code is supposed to add together a bunch of int32 values to produce an int32 result with an overflow indication. There are two approaches a compiler could take: check for overflow after each addition, or compute the sum using an int64 type and then check whether the final result was within the range of an int32. For each approach, there are some platforms where it could be accomplished about as fast as an int32 sum without overflow checking(*) but where the other would take twice as long. If compilers would be free to either report overflow or not, at their leisure, in cases where intermediate results exceed the range of int32 but the final result doesn't, the cost of overflow checks would be minimal on both kinds of platforms.
should work quite nicely. If the instruction after an if-then instruction is a single word (as would be the case for all of them), its execution time will be zero. Enabling overflow checking would require adding some code to see why the loop exited, and setting up the counter so the terminal count would be at 0x80000000. If it weren't for the need to perform overflow checking, the loop could be unrolled a bit more, but the overhead of overflow checking would be small enough that it should be tolerable even in a high-performance production system.
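A rough Rust sketch of the two strategies mentioned above (ordinary source-level code, just to make the trade-off concrete):

// Strategy 1: check after every addition; any intermediate overflow is reported.
fn sum_checked(vals: &[i32]) -> Option<i32> {
    vals.iter().try_fold(0i32, |acc, &v| acc.checked_add(v))
}

// Strategy 2: accumulate in i64 and range-check once at the end. This also
// accepts inputs whose intermediate sums leave the i32 range while the final
// total is still representable -- the looser behavior described above.
// (The i64 accumulator is assumed not to overflow for realistic input sizes.)
fn sum_widened(vals: &[i32]) -> Option<i32> {
    let total: i64 = vals.iter().map(|&v| v as i64).sum();
    i32::try_from(total).ok()
}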
That would make whether an overflow error is emitted depend on optimizations. This is going to be very confusing for programmers and will make it really easy for different compiler versions or optimization levels to break programs.