"The optimizer may further assume that allocation is infallible"

aside

Mea culpa; it was intended to stay positive/constructive but I can see how it could be interpreted otherwise. More "and that's how we got here" than anything else. The "working model" is a lot looser here than SB/TB as "working model" for the dynamic borrow rules.

It's only doing safe operations, so it's necessarily sound. It might potentially be incorrect and improperly return Some if object addresses are allowed to overlap, but it'll always be sound.

The most provenance-aware way of writing that would be

fn subslice_offset<T>(outer: &[T], inner: &[T]) -> Option<usize> {
    let outer_range = outer.as_ptr_range();
    let inner_ptr = inner.as_ptr();
    if outer_range.contains(inner_ptr) {
        Some((inner_ptr.addr() - outer_range.start.addr()) / size_of::<T>())
    } else {
        None
    }
}

(doing comparisons directly with pointers) but it does seem improper to require the use of wrapping_ semantics here instead of sub_ptr.

As you observe in the earlier example (many highly aligned allocs), today's compilers will break the idea of .addr() existing instead of having address overlapping happen. That example is actually the most I've questioned the viability of a strict PNVI model, since "obviously" p1.addr() == p2.addr() should fold to false when p1 and p2 are separate live allocations, but if you do that peephole optimization enough times, suddenly you're implying that more numbers exist than do.

The general expectation has been that pointers would always maintain a strict ordering (thus a < b && b < c where a and c are within the same object implies b is also within that same object if within any object). The typical current implementation is that testing for a specific order between allocations realizes them (puts them in the concrete allocation plane).

I'm now wondering if the only fully coherent model is for .addr() to effectively have a side-effect of realizing the allocation. It'd be quite unfortunate to lose purity, but it's still a meaningfully less impactful side-effect than .expose_addr().

Or maybe we end up with some model between PVI and PNVI that can better approximate current behavior, e.g. (I just made up) "provenance via integers only if convenient" where integers may still have provenance attached for a demonically nondeterministic period after being derived from provenance-carrying pointers.

1 Like

Opsem is the only approach that has been shown to actually work. There exist two verified compilers, CompCert (for a dialect of C with less UB and no pointer-integer-casts) and CakeML (for ML), and both use opsem. They both, to my knowledge, are able to remove (some) dead local variables.

I have not seen a single example of rigorous (mathematical, formal) reasoning in terms of optimizations about any optimization, including removing dead local variables. Please point me to one if you have. We have seen plenty of examples of faulty reasoning in terms of optimizations. So yes, people reason in terms of optimizations all the time, and they get it wrong.

So the evidence doesn't support your claim, it supports mine.

(I'm on a workshop this week, won't be able to read or reply to the rest of this thread.)

2 Likes

Without pointer-integer conversions, or some other way to observe the finiteness of the address space, this isn't a hard problem. But CompCert doesn't even go very far with the easier version of the problem. If you write int a; int *pa = &a;, then it can't remove a. It doesn’t optimize &a == &b to false (and in fact doesn’t even optimize &a == &a to true). So yes, technically it satisfies the criterion of "opsem-based reasoning that can remove dead local variables", but in such a narrow sense as to be meaningless to the discussion.

(To be clear, I find CompCert super impressive despite those limitations. Formal verification is hard.)

Anyway, this is getting off-topic. And it's moot, because I already said that I changed my mind somewhat.

4 Likes

Your statement about "people reason in terms of optimizations all the time" can be expanded to show why opsem is the current front-runner for defining what the language means, and thus what optimizations are sound.

We do two sorts of reasoning in terms of optimizations; it's routine for any skilled developer to look at "sane" code (by their understanding of the word) and reason about what optimizations can be applied to get to "optimal" code, and what optimizations cannot be applied. When we're doing this sort of reasoning, lots of things are ignored, because they fall outside the "sane" boundary - no program, for example, deliberately seeks out the limits of resources just to see if the compiler will let it find those limits.

This is a really useful form of reasoning when hand-optimizing code - "the intended meaning of this fragment is this, ergo I can swap it out for this and get the same effect" - which is part of why skilled developers do it routinely. However, it doesn't consider all the possible meanings of the code; instead, it relies on the optimization developer correctly deducing the intended meaning of the code and only applying optimizations that work in this context.

Lots of optimization developers work this way, too - it's a considerably easier way to work: "this code can be transformed into this faster code by doing this" - but it's led to bugs where chaining optimizations together outside the original context results in the optimization changing the semantics of the code - Ralf linked to a talk that describes how this happened in LLVM, and was found by people trying to formalise the semantics of LLVM.

Fortunately, this isn't the only way to reason about optimizations; the other way to reason, which opsem formalises, is to say that the code as written before optimization has one set of allowed interpretations, and the code after each optimization applies has a set of allowed interpretations that is equal to or a subset of the allowed interpretations before applying that optimization.

The downside of the opsem model is that it's a much harder place to work. Instead of saying "here is an example of unoptimized code, and this transform converts it into more optimal code", you have to define a formal model in which you can say "the semantics of this unoptimized code permit the following outcomes; this transform removes these outcomes from the set of possible outcomes, and does not add new outcomes".

I, personally, find this second form of reasoning much harder to get right; I'm still trying to come up with a suitable way to describe my first example where I want to allow the optimizer to change an allocation of 256 items followed by a free of 128 items into a realloc from 128 items to 256 items in an opsem-compatible style. It is obvious, when looking at my dumb example, that it should be OK to make this type of transform in all the code that I would consider "sane", but actually pinning down what the semantics are before the transform in such a way that the transform is legal turns out to be very challenging - not least because the simple semantics will not permit that transform (since s1 cannot be freed before v is allocated, but I want the optimizer to be permitted to determine that it can reuse s1's allocation for v if it can show that the elements of s1 are dropped or moved before

2 Likes

Can this not be solved by the as-if rule (assuming infinite memory so you can't see running out of ram): Does the program generate the same observable IO interactions before and after optimisation?

You have to assume infinite memory, because otherwise you don't risk running out of just address space, but also actual ram. And you can't know in advance how much ram the program will use (Rice's theorem) or how much ram the target system even has installed.

So resource exhaustion cannot be considered observable if you want to do any optimisation at all. Besides a new version of the standard library, compiler or OS could change the pre-optimisation baseline anyway, which would then become a breaking change (maybe?) .

Some of the embedded systems I code for has less than 1 MB of ram. Being able to optimise away dead variables is quite important.

That's the whole thing that opsem is trying to turn into a formal description; what, exactly, does the "as-if rule" actually mean if you go from intuitive meaning (where we agree that it applies here, and it "should" allow the compiler to transform my examples the way I'm suggesting) to a formalized meaning that allows simple verification that the transform does not break the as-if rule.

Indeed, the following is a restatement of the as-if rule, but in terms of how you say things within the opsem model of optimizations:

In this restatement, lots of things that are a bit hand-wavy in the original formulation are made explicit; instead of "observable I/O interactions" and "assuming infinite memory", I have the set of possible interpretations of my original code - these will include things like "OOM" and "generates no I/O", as well as others like "causes I/O to swap objects out of memory to make room for my new allocation". I then have the transform, which can provably only remove interpretations from that set or leave it unchanged, but not add to it; for example, it can remove OOM from the set in some cases, and might remove I/O involved in swapping, but it won't add new interpretations.

In turn, this means that we need to define the semantics of the source code in a way that allows for important optimizations, while not surprising readers; we want the formal opsem definition of what my Rust source means to both have lots of interpretations, so that most transforms are allowed, while also not having so many interpretations that I get surprised when I reason about my Rust code less formally than opsem requires.

For example, what is "the same observable IO interactions"? It's possible for memory allocations to result in an I/O interaction - but does this count as an "observable I/O interaction" or not, given that it's part of the I/O deliberately performed by the program, and not something the platform does behind your back to emulate infinite memory? Or can we declare it as "not observable", since it's something the program is doing to emulate infinite memory? Bear in mind in both cases that we're talking about an embedded system, where the allocator and associated I/O interactions are part of the source program we're optimizing, and not part of the platform - in Rust terms, we'd be supplying the allocator via #[global_allocator] since there would be no allocator if we didn't supply one.

This is the sort of detail that opsem models force us to clarify; by forcing us to have a formal model of what possible interpretations of my original code are permissible, and what interpretations are not, it forces us down a route where we have a very clear definition of what I/O interactions are observable, and which ones are incidental.

4 Likes

[quote="farnz, post:64, topic:19780"] Ralf linked to a talk that describes how this happened in LLVM, and was found by people trying to formalise the semantics of LLVM. [/quote]

Sorry, which link is it?

Ralf linked to his blog post which links to Chung-Kil Hur's slides for a talk at OOPSLA 18 on "Reconciling High-Level Optimizations and Low-Level Code in LLVM".

The "Conclusion" section in Ralf's blog post also has several interesting links to follow - two to papers, and one to the "Alive" project.

Ah, that's a fun one. :smiley:

Yes I think there can be basically arbitrary logical contradictions from this.

The question is, do you want a compiler that's held together by duct tape and "we haven't seen a counterexample yet", or a compiler that's held together by solid engineering and systematically approaching the problems involved? I know what I prefer.

AFAIK the plan for TCO is that it be explicit, via some sort of become keyword? In that case the spec can say that the caller stack frame is deallocated before the callee stack frame is created, so this is not a problem.

But yes of course optimizations that remove unnecessary allocations are highly desirable. Even mem2reg is an optimization of that form, turning memory allocations into non-addressable registers.

Agreed. It's a problem, but not a high-priority one.

In which sense is it less impactful?

Sorry for that. I was slightly triggered by the paragraph I quoted, and felt the urge to reply before having time to read the entire thread.

You are right, it's hard! Doing a proper proof of such optimizations requires pretty advanced PL techniques. If you dig into the rabbit hole of how we proved things about Stacked Borrows, you might find that it is surprisingly deep.

2 Likes

Not super significantly — since the allocation events have to happen, the path can't be completely eliminated, making it still a painful constraint on things like map_addr based pointer tagging — but it never constrains the contents of the memory like exposing provenance does (i.e. when the exposed provenance cannot be proven to

However, do note that in a world where .addr() realizes the allocation, I imagine transmute should still be fully pure. This is to allow storing a pointer to memory to not necessitate allocation realization. The intent is to capture a rule for the concept of "if you can concretely answer information about an allocation's address

Another way of putting it: Let the domain of allocation addresses be all integers. Transmuting pointer bytes to integer bytes produces the address wrapped to the domain of usize. ptr.addr() and comparison over pointers with addresses outside the domain of usize is non-behavior. [The implementation must prevent non-behavior from occuring.] All live objects' addresses are disjoint. [If execution would require address overlap, then resource exhaustion must terminate execution before that point.]

This wording doesn't ever require allocation to actually occur, so the lowering could still elide allocation so long as it still somehow reserves address space for the objects (perhaps utilizing addresses that can be known to never hold an actual allocation, e.g. non-canonical ones), but the most straightforward way of ensuring disjointness is to let the allocator handle it. (A system allocation mode that does everything except make the pointer actually valid for loads is an interesting concept. Probably never useful, but interesting, and maybe possible by abusing lazy pages and/or overcommit.) The extent of non-behavior of addresses outside the domain of usize is a way of controlling how restrictive the approach is to elision.

But honestly I think that if it comes down to it, simply permitting objects to have overlapping addresses is probably a more desirable end state than requiring developers to use unsafe to promise they won't make conclusions from address exhaustion. There's already necessarily other QOI promises around allocation and resource exhaustion at runtime, so it's not much to add on a QOI promise to not cause address overlap outside of degenerate scenarios to the pile. Such promises aren't sufficient for the compiler to make justifications, but developers relying on them (e.g. ignoring the existence of I-unsound issues) is generally fine.


Distinct objects' addresses overlapping is also a possibility if the allocation provider is in-model since the minting of the "allocated object" happens between two parts of the code which could also communicate not through that border. But for that case, while the objects do exist concurrently, at least they aren't concurrently accessible.

Off topic note with no time to put a better place:

IIRC, since deallocation makes the allocated object go away, it's very difficult to justify changing deallocation timing at the AM level. You must first lower to an IR relaxing "inbounds" requirements to "nowrap" if provenance of the allocation was ever potentially exposed, as while retagging can revoke the provenance's read permissions, the "offset permission" is only revoked on deallocation. Never revoking provenance's "inbounds" would've been a different way to address the provenance monotonicity problems with deallocated provenance being more restricted than no provenance, as an alternative to the specific blessing for zero sized access/offset we did. Though I don't claim to know whether that's compatible with LLVM semantics.

2 Likes

Not quite; LLVM inbounds actually means "inbounds of a live or dead allocation", so getelementptr inbounds can be freely reordered with deallocation.

1 Like

But note that this involved stack allocations, not heap allocations.

Which goes back to my point that stack and heap should have the same semantics in the abstract machine because they actually share an address space.

The addressable variables on the stack work exactly like heap allocations, yes.

The "control" part of the stack, and in particular the actual "stack" part, is completely separate. I'm not sure how readable that is, but you can see it spelled out for MiniRust here.

1 Like