"The optimizer may further assume that allocation is infallible"

Here is what I think might work to allow such optimizations:

  • pointers can have values outside of what usize can represent (any arbitrary integers), and allocations are allowed to be placed at such addresses
  • if a pointer is ever cast to usize, it is guaranteed that it actually had a value that fits in usize

So the abstract machine can "guess the future" and risk allocating at magic addresses, but it's predicting the future in a smart way such that it never gets caught having to convert such an address to usize.

4 Likes

So I infer what you're saying is that the allocations would be allowed to reside at the same address, so long as you don't check? Okay, so I'll test if (x != y) printf("x != y\n") instead. Adding that test doesn't prevent allocation elision yet.

So how much "observation" is needed? When does the allocation go from elidable to observed? Answering that question is the problem point. And the only way to answer it fully reliably in the face of diverse optimization is a model of what allocation does in the abstract sense when it doesn't have an observed address.

[godbolt]

This is the effective working proposal, but made a bit more complicated in how it's actually defined (address are still usize, using "allocation planes" for extended address space). This is mostly to resolve the fact that we really want transmute_copy::<pointer, usize> to be a (pseudo) no-op (that discards provenance). You also necessarily run into the yet-undecided question of what exactly exposing a pointer entails.

And also now this has made me wonder about whether ptr.wrapping_offset(isize::MIN).wrapping_offset(isize::MIN) is valid to dereference... "infinite abstract address space" would answer no, but "separate usize address planes" would answer yes. (And answering "no" is a potential surprise to a method with "wrapping" in the name.)

1 Like

I don't think they are saying that at all. The allocations live at unique addresses. Those addresses may be outside the normal address space until you actually observe them. They may be in any order (after all there is no guarantee that the heap grows in a certain direction, most allocators use pools, mmap etc anyway).

This makes equality work as expected without making the address observed. In addition, inequalities return a result (which should be consistent and commutative for any pair of allocations and preferably transitive as well, unless LLVM already messes with the transitive property of course, which would be really unfortunate).

1 Like

I think it's a good direction for thinking about this problem. It also looks relevant to the provenance discussion.

let p1 = malloc(32);
free(p1);
let p2 = malloc(32);
free(p2);

Semantically, allocations p1 and p2 are different, even if p1 as usize is equal to p2 as usize. So we can say that the abstract machine works with "infinite" memory (so stack and heap will be treated equally) and pointers in it contain not only offset from null (the part which you get when you cast pointers to usize), but also a unique "allocation ID" (an arbitrary integer), which does not get represented during lowering to machine code, but it's relevant to the notion of temporal memory safety.

So with the malloc(-1) examples we do indeed get disjoint memory regions on the abstract machine level, because each allocation gets created with different allocation IDs. Compiler then proves that one of allocations does not have any side effects and thus can be eliminated.

2 Likes

Those rules can't explain this program, which only observes the pointers by comparing them against each other and (int*)0xffffffffffffffffUL, yet still concludes there's more memory allocated than fits in the address space

I would think that comparing to (int*)0xffffffffffffffffUL (or any other specific value) would indeed count as observing the value of the pointer.

But even so, wouldn't it fall under the suggested rule that the program cannot reliably observe resource exhaustion as the compiler is allowed to optimise away allocations that aren't actually used?

I don't think any of the proposed rules can be used in isolation, they probably have to be combined to get the effect we want.

From what I understood, it only make sense to compare the address of objects within the same allocation. If this rule is true, the allocator could totally be allowed to always return the same address for each new allocations, and store the relevant information of where this allocation is physically made in its metadata. And this also means that it’s absolutely not possible to observe abstract machine heap exhaustion.

2 Likes

Hang on, even if you use allocation planes there is no legal address for malloc(-1) due to the per-platform alignment guarantee, that malloc returns are either 8 or 16 byte aligned. LLVM could easily prove that malloc must fail on certain platforms. You need to alloc(-1, 1) instead...

I was talking only about the "control stack", i.e. what would be the frame pointer and return pointer. The contents of the local variables are just stored in memory. There is only one big memory, no stack/heap separation.

I guess we could imagine a compiler optimizing away local variables of size half the address space while still pretending that there actually is memory backing them. That would be similar to the malloc issue. I have not seen examples that would demonstrate this. And it's not even a different problem, malloc and local variables are just two ways to allocate blocks of memory.

My answer above referred to the typical stack overflow, which occurs not because of address space exhaustion but because the stack grows to collide with memory used for different purposes. This is a completely arbitrary target limitation and not properly observable in the AM. It is captured by the specification saying that any function call can fail with an "out of stack" error, and having an unbounded (control) stack itself. Optimizations that reduce the size of the stack are fine since we're not even saying that the stack fits into some articular memory range.

No, there is no relation. (Also, it's about the logic not being able to prove things that are true in the model. The model is used to define truth and falsehood, the logic is used as a tool to derive truth. So saying a "model has true things it can't prove" is mixing up the terminology a bunch.)

It observes that there exists addresses in 0 .. 2^64 where they live, because that's how large the address of a pointer is. There's no need to print these addresses, they must logical exist for memory reads/writes to happen.

No luck, adding casts makes no difference.

Since this is something LLVM does and the best minds of Rust opsem can't come up with a model for it, can we appeal to/defer to whatever specification LLVM itself has? I believe there is a LLVM language reference or similar, though I'm not sure if this would be documented there or elsewhere.

Perhaps you could ask for clarification about what their model is if their documentation is lacking.

The LLVM LangRef is incomplete and does not cover several important aspects of the language -- including the one we are discussing here.

3 Likes

Yeah, that's really the problem.

I had a whole response I was going to send to your post last month, which boiled down to: people reason in terms of optimizations all the time, and opsem is the one that 'has not been demonstrated to achieve' the goal of actually being able to reason about basic optimizations like removing dead local variables. That's especially problematic in combination with recursion elimination, which can turn O(n) stack allocations into O(1).

But I didn't send the response because I had second thoughts. I still think reasonable code is never going to break due to these optimizations. But Rust of course aims to be sound even in the face of unreasonable code, as long as it's safe. I don't know if LLVM is exploitable today using this optimization. But LLVM does things that would be exploitable if it were just better at propagating facts about integers.

For instance, here is a simple example where we start with 9 32-bit integers, and we check that all of them are multiples of 0x2000_0000, but none of them are equal to each other – a logical contradiction. I actually tried to turn this into an exploit, but couldn't find a way. LLVM is just not very smart sometimes… just look at this missed optimization.

Still, it's untenable to rely on LLVM not getting smarter about this in the future (or any other codegen backend, for that matter).

But then what's the solution? As @CAD97 suggested, it may be possible to specify conditions under which variables "can't overlap and must actually correspond to a nonabstract memory address" – and thus can't be eliminated if dead. But I suspect it is hard to do that without having a big impact on optimizability.

3 Likes

The relevant conjecture is that given code written to be provenance aware, the set of cases where allocated objects must exist should be the same as the set of cases where the object's address is exposed, which should be the same as the set of cases where the address has escaped analysis and the compiler already has to be maximally conservative anyway.

Of course, this is entirely conjecture at the time being. Pointer mucking code uses ptrtoint and inttoptr, not the provenance-aware map_addr. LLVM implements some weird variant of PVI where pointers aren't escaped until the address-dependent computation escapes. So in the current world, it would mean treating more objects as escaped and blocking optimizations that are currently done.

C/C++ doesn't not care about this, but because of the compilation model and rare application of C restrict, it's quickly not a problem in practice; calling a function defined in a different file means any provided pointers are effectively permanently exposed as far as optimizations can rely on.

As optimizers get stronger and languages and libraries provide stronger guarantees to the optimizer, these concerns become more practical. The examples we use usually look silly, but that's to make understanding the discussed property reasonable. Most optimization is peephole fixing up "silly" code evident after inlining, anyway.

2 Likes

Well, then what happens if someone calls .addr() without exposing, or simply uses pointer comparisons? Would the compiler have to assume that different (simultaneously live) objects can have the same address and have their pointers compare equal?

I don't see how this makes a difference. None of the litmus tests I've seen in this thread depend on restrict. And I wouldn't say inlining is particularly more common in Rust than in C++.

Then the question seems to become: do we want the optimisations or the stricter behaviour. As an embedded developer I definitely want the optimisations here (at least until they break my code, hah!).

But that leads to my next point: I have yet too see any credible example (either actual code, or description of a scenario) of actual real world code that would be broken by the optimisations described in this thread. Having at least one such example might help motivate us who are more practically inclined.

As it is, all examples are of the form "panic if allocations have been optimised to fit in address space". That is the whole point of this optimisation though! And I don't see a way you could have tail recursion optimisation (TCO) without that. Logically separate allocations get merged into one across call frames.

Imagine a case of TCO where I print the address of a local buffer to stdout. Should that printing inhibit TCO? I would argue not, I might want that value for debugging purposes, and it should not affect the escape analysis.

So yes, I would like to see an example of code breaking that isn't just "ooh I allocated too much logically, let's panic!".

1 Like

FWIW –

Demonstrations of miscompilations do tend to look artificial regardless of how likely the miscompilations are to occur in real code. But I would argue that, nevertheless, some miscompilations are (much) more likely to occur in real code than others. And address-space-exhaustion-based miscompilations are near the bottom of the list.

(Where I'm defining 'miscompilation' loosely as 'the compiler did something the programmer didn't want'.)

I would consider this factor relevant if I were designing a C compiler (which is not to say everyone who works on C compilers would agree with me!). But the factor doesn't make much difference here, because Rust tries to maintain soundness even in artificial examples, something which I agree is valuable.

3 Likes

It does currently!

pub fn test() {
    let x = [0u64; 128];  // sub	rsp, 1104
    // let x = [0u64; 8]; // sub	rsp, 152
    println!("{:p}", &x);
    test() // call	qword ptr [rip + playground::test@GOTPCREL]
}

Formatting IO is a bad example to pick; it's extremely opaque to the optimizer and blocks most optimization as if it were a fully unknown extern fn.

Except that's exactly what's under question. The question is how to justify optimizing out resource consumption in the face of code being able to observe states which would imply resource exhaustion has occurred.

No practical program is going to exhaust the memory space in a way that both the program can tell and the optimizer can keep it from happening. But if you want your compiler to be correct, "practical" isn't an argument; you need to be correct for all valid programs.

The "solution" is "don't do that." But Rust exists because we dare to believe that "don't do that" isn't a great solution, and there are better options available.

In non-generic code, it is, due to the division of compilation units. For generic/templated code, it's comparable.

malloc semantics in LLVM are actually achieved by putting noalias (restrict) on the return type, so it's present, just invisible; the semantics are inferred rather than added by the user.

Either we say live objects can have the same .addr() or .addr() becomes sideeffectful. Since .addr() semantically only returns part of the pointer, it's much simpler to justify .addr() overlap than it is .expose_addr() overlap.

Sure, on one hand it's just shifting the problem. But in shifting the problem, it's shifting it into a paradigm that acknowledges (more of) the problem. After all, what gets more people to acknowledge the difference is CHERI making pointers 128+1 bits and .addr() still 64 bits, which is obviously a modular reduction of pointer space.

I agree here, which is why object overlapping / resource (non) exhaustion are relatively lower priority "holes" in the model. We very much would prefer to plug them so it's possible to fully justify optimizing transforms as being correct, but working with the softer world of pretending it's a non-issue typically is sufficient.

2 Likes

How about this for a model: since pointer provenance means you really can't use pointers from one allocation with another, we conceptually say that each allocation lives in a separate address space. The optimiser may map them into the same address space in practise, and comparing pointers from different allocations may give unexpected results (though it isnt UB, since the operation is safe).

In particular, the compiler may say that allocations are inequal if they come from different allocations, even if their unobserved actual value would be equal. There may not be a consistent total ordering when comparing multiple allocations with lt/gt, though any specific comparison should give a stable result.

Since there is already precedence for multiple address spaces with identical integer representations of pointers (wasm, AVR) this could be the way to go? Possibly with some tweaks and modifications (rather than something invented in 5 minutes late at night). In particular you might need some language about the null pointer being identical between such virtual address spaces.

1 Like

Congratulations, you've reinvented the "allocation planes" model that's been alluded to multiple times (in various detail) prior in the thread. There's good reason why it's the leading working model.

Unless someone were to come forward with a full formalization and proof of correctness for their model, it's extremely unlikely you'll add something relevantly new to the discussion at this point on the model side of things.

(Wrapping the address space isn't a concern for Rust at least, since we have a maximum object size of isize::MAX and maximum alignment of usize::BITS.)

1 Like

Please consider your tone in the quoted post. Yes, some of this may have been discussed, but I don't think that discussion reached any real conclusions, and it was based on a somewhat outdated notion of provenance.

Anyway…

I can buy addr() overlapping, but I do wonder what happens to pointer comparisons for distinct live pointers with equal addr().

Do they also compare equal (or is it nondeterministic)? Then you completely lose the ability to tell whether two pointers point to the same object.

Do they compare non-equal? It feels like it could be possible for a compiler to guarantee that elided allocations compare non-equal to each other and to real allocations, but I'm not sure. If it is possible, then I think pointers would need to be modeled as having three parts: address, provenance (which does not affect comparisons), and some extra chunk of hidden state that makes distinct live pointers compare non-equal. A bit odd, but maybe workable.

That also leaves the question of what happens to things like offset_from or the gt/lt comparisons that Vorpal mentioned. Is there a way for subslice_offset to be sound?

2 Likes