"The optimizer may further assume that allocation is infallible"

It leads to logical contradictions. For instance, using all the specs I have seen so far, I can prove that this program is sound:

int *x = malloc(-1);
int *y = malloc(-1);
if (x && y) {
  // Here we have two disjoint memory ranges, each of size 2^64 - 1, fitting
  // into an address space of size 2^64. From this we can prove "False", and
  // hence we can formally verify that the following operation is safe.
  unreachable_unchecked();
}

But LLVM will actually compile this to a SIGILL. So either there's a bug in LLVM or there's a bug in the spec, and so far I haven't seen a proposal for how one could fix the spec.

1 Like

This proof relies on the assumption that malloc always returns "real" pointers on success. In my understanding, if the compiler can prove that the memory behind a malloc'd pointer is never observed, then it can substitute any non-null value for the pointer. Alternatively, we can say that if the compiler can prove unobservability, it may insert a virtual free call immediately after the malloc.

I agree that it's worth improving the malloc specification to make it more precise and better reflect the implemented compiler behavior, but I also think this is a relatively minor corner-case issue in a desirable optimization, one that does not cause any problems in practice.

Yes. That is how malloc is documented.

It is completely unclear what "pointer is not observed" even means, given that the compiler can do other optimizations first. We can change the program like this and the optimization may still happen:

int *x = malloc(-1);
int *y = malloc(-1);

// "Observe" the pointer.
// May be optimized to just printing "0" twice.
*x = 0;
printf("%d", *x);
*y = 0;
printf("%d", *y);

if (x && y) {
  // Here we have two disjoint memory ranges, each of size 2^64 - 1, fitting
  // into an address space of size 2^64. From this we can prove "False", and
  // hence we can formally verify that the following operation is safe.
  unreachable_unchecked();
}

So right now, as far as I can see, there is no coherent spec we can give to malloc that allows this optimization. That's why I think this optimization is just wrong and should be removed.

Note that I wrote about observing the memory behind the pointer, not the value of the pointer itself. Otherwise a null check could be considered an observation of the pointer, which would be unnecessarily strict. Additionally, an observation has to have side effects; simple reads and writes will be eliminated, as can be seen here.

For obvious reasons, the compiler does not account for cases like "we printed the pointer value and the user then typed it back in manually".

Meanwhile I think this is another example of why operational semantics are not always the answer, and sometimes it’s better to just say “the compiler may perform X optimization” in the spec rather than try to reverse engineer operational semantics that justify X.

Perhaps not coincidentally, the other example I had in mind also had to do with enumerating the entire address space…

4 Likes

On the other other hand, this thread started from basically exactly that case: a statement that "the compiler may assume performing this optimization is valid", worded in a problematically vague way.

(We also understand reasonably well how to resolve this specific problem, at least in the abstract; you "just" need to permit virtual allocation addresses to overlap. The more difficult part is saying exactly when they can't overlap and must actually correspond to a nonabstract memory address.)

I am not aware of a credible alternative to operational or axiomatic semantics. (There is denotational semantics of course but that's not what we are discussing here.)

One of the fundamental goals of a specification is to allow users to reason about their programs. There must be a way of proving a bunch of theorems and then concluding that all valid compilations of a program will do the right thing. A spec that does not achieve this goal is not worth calling a spec. Both axiomatic and operational semantics have extensive lines of work demonstrating their ability to achieve this.

Listing allowed optimizations has not been demonstrated to achieve that goal. I am not aware of a single work that would even properly formalize such a specification, let alone for any non-trivial language, let alone provide the tools needed to perform rigorous reasoning about programs written in such a language.

Even a C-style axiomatic spec is better, by orders of magnitude, than having to reason about the combinatorial explosion of a set of optimizations being applied in any order.

rather than try to reverse engineer operational semantics that justify X.

This "reverse engineering" provides deep insights into what the suggested transformations entail and how they interact with other transformations. This is not some grunt work we do to handle some deficiency of operational semantics; it is part of understanding whether that transformation makes any sense and whether programmers can be reasonably expected to be able to deal with it. Any optimization that does not have such a justification should be treated with suspicion; it may cause surprising actions at a distance (such as accidentally introducing pointer provenance as a concept programmers have to care about without people even realizing that that is what they were doing).

10 Likes

I don't think it makes sense to consider listing permitted transformations as an alternative to operational semantics.

But I think defining a small number of permitted transformations between the lowering from surface Rust and the execution on the abstract machine is much less worrying, and it might turn out to be necessary.

The problem of modelling the way compilers remove memory allocations is difficult enough without ruling out any options ahead of time.

1 Like

It is completely unclear what "pointer is not observed" even means

For starters, in strict provenance terms, not only the pointer's address but also its provenance would have to escape the local execution. Printing its address is not sufficient for that.

I don't think that observing the memory is the right criterion here, since the entire memory could be "scratch space" during compile-time execution that gets reduced to a single value. Think of filling a slice of memory with 0..n, summing over it, and then returning the sum: it might get optimized to the closed form.

I have a quibble. Ignoring issues of maximum array size[1], I agree that in

int *x = malloc(-1);
int *y = malloc(-1);
if (x && y) {
  // Here we have two disjoint memory ranges, each of size 2^64 - 1, fitting
  // into an address space of size 2^64. From this we can prove "False", and
  // hence we can formally verify that the following operation is safe.
  unreachable_unchecked();
}

the if condition is always false.[2] However, the same logic dictates that this code fragment

int *x = malloc(-1);
int *y = malloc(-1);

// "Observe" the pointer.
// May be optimized to just printing "0" twice.
*x = 0;
printf("%d\n", *x);
*y = 0;
printf("%d\n", *y);

if (x && y) {
  // Here we have two disjoint memory ranges, each of size 2^64 - 1, fitting
  // into an address space of size 2^64. From this we can prove "False", and
  // hence we can formally verify that the following operation is safe.
  unreachable_unchecked();
}

must have dereferenced at least one NULL pointer before it ever gets to the if. The only way the if can be reached is if x and y are both non-NULL, and therefore the condition is always true in the (empty) set of cases where the condition is reached.


I'm not skilled at writing operational semantics. In terms of "this optimization is allowed", how much does it help if we say that the compiler is allowed to shrink a heap allocation if it can prove that not all of that allocation is used? For example, given

int *p = calloc(4, sizeof(int));
use(p[0]);
free(p);

we could say that it is valid to optimize this to

int *p = calloc(1, sizeof(int));
use(p[0]);
free(p);

And that then makes LLVM's current behavior correct for both your examples, since in neither case are all 2^64-1 bytes of memory actually used from both allocations.


  1. Isn't it the case that under at least some interpretations of the C standard, no array is allowed to be larger than the maximum signed distance between two pointers representable by ptrdiff_t, and therefore malloc(-1) should always fail? ↩︎

  2. You seem to be hesitant to describe things LLVM currently does as incorrect, and I don't understand why. "LLVM is buggy" should be a lower bar to reach than "the spec is incorrect". ↩︎

1 Like

This could actually already be the case operationally; it's just the as-if rule: the compiler can lie and cheat all it wants so long as it ensures you never find out[1], as long as you stick to the conventional interpretation that allocation is not an "observable" event. It comes down to the definition of what malloc actually does.

The working definition as I recall it is that either

  • the unmodified allocation events are passed along to a runtime allocator provided to the AM (i.e. the #[global_allocator]); or
  • the entire allocated object is managed directly by the AM for the entire lifetime of the allocation (until free including any realloc, with a QOI promise to not utilize physical memory for this).

This permits the elision that optimizers do in practice (assuming these AM managed allocations are allowed to exist at addresses that don't fit in usize) but disallows more adventurous transformations with even greater surprise factor, such as merging/reusing allocations, eliding realloc, or shrinking allocations.

It fixes this specific example, but the underlying issue remains. You've plugged the simple way of demonstrating the presence of the magical out-of-bounds AM-managed allocations, but it's still possible to successfully observe “too much” allocation being live concurrently, just more difficult. (“Just” touch the allocated memory in a way transparently dead to the optimizer.)


  1. Except by cheating yourself, e.g. inspecting the output assembly directly. ↩︎

1 Like

I think the actual rule should be: the optimizer is allowed to transform the code in ways that reduce resource consumption, and thus sometimes remove errors caused by resource exhaustion. This means you can never prove that a program definitively hits any kind of resource exhaustion (maybe it does with the current compiler, then ceases to error out in a future version with better optimizers).

I mean, this doesn't happen just with malloc / heap allocation; it also happens with stack overflow: inlining functions, and sometimes performing tail call optimization, may reduce the size of the stack.

Another example of removing resource exhaustion I can think of is platforms or runtime systems that kill and restart your program if it spends too much time doing CPU-bound work without reporting liveness (that is, systems that run under a watchdog). You could have a program that fails because it's stuck in an expensive loop, and a newer version of LLVM could optimize out that loop so the program no longer fails.

edit: maybe that's not a good example but CPU time can be viewed as a resource and a CPU limit could be enforced by some platform

7 Likes

We can just use 3 allocations that are each slightly smaller than half the address space, if we want to not talk about "larger than ptrdiff_t" objects.

Yeah, good point. I should guard each of these parts with a further if. It makes no difference.

Here's an example where we are using the first and last bytes of 3 allocations that each fill half the address space, and still they all seem to exist at the same time: Compiler Explorer

If I had more time I'd craft an example where we are using every single byte, and still LLVM optimizes it all out. It's just a matter of writing the code in a transparent enough way that LLVM can statically determine the results of all the loads.


The issue is that memory is finite, so there are cases where logically, resources must be exhausted. And the spec of Rust allows programs to directly observe the finiteness of memory by looking at the addresses that data lives at.

The stack of the Abstract Machine lives "outside memory" and cannot be observed by the program. So we can easily just pretend that it is unbounded in size, and then treat resource exhaustion the normal way: every function call may or may not, non-deterministically, overflow the stack.

The same trick does not work for memory because so much about the finite nature of memory is exposed to programs.

1 Like

Why is it a problem that a program can be optimised such that it works where before it ran out of resources? No sane program depends on failing to complete its work due to running out of resources.

The rule that @dlight proposed seems perfectly reasonable and well defined to me. I don't understand the actual problem that such a rule would apparently cause. Perhaps providing an example of a reasonable program that would actually break would help explain the issue. Or is the problem simply that it is hard to express this rule in whatever formal mathematical framework you are working in?

1 Like

The problem is that the rule as proposed by @dlight does not allow the optimization of removing a memory allocation -- the problem with that optimization isn't that it provides virtual resources; the problem is that it modifies behavior in a way that is not allowed by the Abstract Machine. Separately allocated memory must have disjoint addresses in the Abstract Machine.

Unfortunately, we can't tell programmers "only sane programs will be compiled correctly".

So you have to find some way to rationalize what happens here; if you declare this program as "not sane" then you better define precisely what makes it "not sane" and how we can make sure that real-world programs are sane.

The problem is that the rule is self-referential, and leads to logical contradictions when put into formal mathematics. You first have to define the behavior of a program before you can even ask whether any given execution of the program fits into any given resource bounds. For the tricky ("not sane") programs in question, there is no such behavior we can define. (Well, there are very complicated proposals that @CAD97 alluded to above, but whether they actually work is an interesting open question. They involve putting multiple allocations "on top of each other" in the address space.)

Accidentally self-referential definitions are one of the oldest mathematical and logical troubles: "This sentence is false." Not everything that you can say in words and that seems to make sense, actually makes sense.

3 Likes

But this can't be true. On the real machine, the stack lives in the same address space as the heap and is just as logically finite as the heap, since you can observe addresses on the stack, and addresses must all be different. How could it be correct for the abstract machine to be modeled like this?

(I understand that the abstract machine is a mathematical model that may differ from the real machine, but the abstract machine is a model to reason about the program and in practice, programs can observe both the finiteness of stack and the heap)

3 Likes

I may be talking complete nonsense here, I'm not a mathematician. But:

This problem reminds me of Gödel's incompleteness theorem, in that every non-trivial internally consistent mathematical model has true statements that it can't prove. Perhaps we need to extend this mathematical model of Rust with some additional rule/axiom to cover this case.

I'm not sure what the problem is with saying "memory allocations that are not observed may be elided, memory allocations may be made smaller than requested (under as-if behaviour), and it is not possible to reliably observe running out of resources".

Basically, for the purposes of resource exhaustion we model there being infinite resources (as is already done for the stack, apparently), but those might run out at some unspecified point at runtime.

You might argue that there should be a lower bound (you can allocate at least as much memory as exists in the physical machine), but that isn't true either. The OS might not give us all the RAM. It might not even give us all virtual addresses. And even in a no-OS no-std setting, how many resources you actually have might depend on how well the compiler optimises the binary (smaller binary, more resources left over for allocations).

Maybe that doesn't make sense as a mathematical model, but that is how it works in practice, and I would argue the model is deficient and needs to be augmented to properly model the real world.

But your program doesn't observe where the allocations live. Adding an if(x<y) printf("x<y\n"); prevents eliding the allocations.

Could the rule then be that allocations are allowed to live outside the finite address space if their specific address does not affect program behaviour?

1 Like

I think the problem here is of logical inconsistency: it was a logical conclusion that the program would terminate due to address space exhaustion, yet it didn't.

But if this were the case, I don't see why the program couldn't observe address space exhaustion from stack growth too (since the stack is in the same address space).

It seems that the abstract machine models the heap and stack differently, but this modeling provides undesirable results in this case. Rather, for the abstract machine to be useful, it should recognize that both stack and heap can cause memory exhaustion, and that such exhaustion is nondeterministic. It should model both forms of memory exactly the same, from the point of view of how observable memory exhaustion is.

So I think the best solution here is to change the abstract machine and say that if you don't actually observe the addresses of allocations (either heap or stack), then you can't draw logical conclusions about address space exhaustion. So a function that internally allocates space on the heap but doesn't leak it in any way can be optimized to remove the heap allocation (for example, if you have a Vec in a local variable and it doesn't escape, it is an implementation detail and the allocation can be completely elided by the optimizer). This holds even if the allocated memory would otherwise be too big to fit in RAM.

2 Likes