Documentation should only refine types (or why Pin::get_unchecked_mut is a ticking bomb)

Yeah, that's the kind of stuff that happens when semantics are defined axiomatically rather than operationally...

2 Likes

Yes, that's what I'm proposing.

What is the issue with unsafe fn? This is a type annotation that extends the semantics of the function type. For example, if |fn(T)| is defined as |fn|(|T|), i.e. the semantics of functions taking one argument applied to the semantics of that argument, then |unsafe fn(T)| is the union of |fn|(s) for all s ⊆ |T|. Two concrete examples would be:

/// Pre: i < xs.len()
unsafe fn get_unchecked(xs: &[u8], i: usize) -> u8;

/// Pre: p is valid for read, aligned, and initialized
unsafe fn read(p: *const u8) -> u8;

The implementation of those functions can't be typed without unsafe because they would need to support all possible inputs. Using the unsafe keyword, the semantics is extended to also allow functions that only support refinements of the inputs.
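To make the first signature concrete, here is a sketch of a possible body and a call site discharging the documented precondition (the wrapper `first_or_zero` is illustrative):

```rust
/// Pre: i < xs.len()
unsafe fn get_unchecked(xs: &[u8], i: usize) -> u8 {
    // The body relies on the documented precondition; it is not
    // well-typed for arbitrary `i`.
    unsafe { *xs.get_unchecked(i) }
}

fn first_or_zero(xs: &[u8]) -> u8 {
    if xs.is_empty() {
        0
    } else {
        // SAFETY: on this branch 0 < xs.len(), so the precondition holds.
        unsafe { get_unchecked(xs, 0) }
    }
}

fn main() {
    assert_eq!(first_or_zero(&[7, 8]), 7);
    assert_eq!(first_or_zero(&[]), 0);
}
```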

Yes, that's a real issue. But my hope is that it's possible to have multiple layers in the language. And users only need to bother with the levels they need. I'm not expecting anyone to write low-level code. But I hope anyone can write high-level code. How to achieve this is challenging but I don't see (yet) any theoretical reasons why it wouldn't be possible.

I think this is ok. This is an issue, but it can be worked around. The idea is that when we realize we made a bad semantics choice for a syntactic type, we introduce a new syntactic type with the new semantics. If there is a subtyping relation between the two (usually there is), we introduce it. This should help incrementally migrating code to the new type. Those migrations may be huge, in particular if we want to change the semantics of &mut T, which ideally we eventually would.

My hope is that people don't write low-level code. This should be left to experts. The level hierarchy should break this dichotomy into a continuum where there is not a single notion of low-level expert but a multitude of low-level expertise. This is needed because low-level stuff can greatly vary and I don't expect anyone to understand all of it. Besides the level hierarchy/lattice, some composability between the low-level semantics is needed. That's also challenging. And that one might actually be impossible (unlike the level lattice).

Yes, I agree this is much safer and a good decision. It's just that in that case, Rust is not my dream language and I'll keep looking for it.

It's not just C. Rust did the same. It didn't come upfront with sound tooling testing whether the escape hatches are used correctly. There was first adoption with a lot of unsafe code, then came Miri. It's the same with C but a different timescale. Tools like ASan, TSan, MSan, UBSan, Valgrind, or tis-interpreter (I believe all those tools predate Miri) came after the language was adopted. Rust was just faster to react, thanks to the learnings of C (and the work at RustBelt).

If I have time (after August) I'll try to see if I can come up with a toy language that demonstrates the type lattice I have in mind (e.g. for Pin). I'm curious to see if low-level details leak between levels, which would mean the language won't be accessible and will probably be unusable.

I think a primary thing your system disallows is temporary relaxation.

If all types are by-value, this is done in your system by the extensive refinement subtyping rules, such that if I have e.g. u32 in 123..456, I can just transparently relax it to u32.

But we don't only have by-value types. Most types are by-pointer.

I think your design could work for a pure functional language where mutability isn't a thing. But where mutability is a thing, you have the very problematic case of taking &mut subtype and using it as &mut supertype being 100% valid so long as you put back a value in the subtype set before the &mut supertype is visible again.

This is in effect exactly the "postcondition antipattern" which you want to avoid. But APIs like str::as_bytes_mut are important. There is no problem with writing arbitrary bytes into a string so long as you fix it back up into wf UTF-8 before anyone uses it as a string.
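In today's Rust the pattern looks like this (a minimal sketch; `capitalize_ascii` is an illustrative name):

```rust
// Temporarily relax &mut str to &mut [u8], restoring well-formed
// UTF-8 before the loan is released.
fn capitalize_ascii(s: &mut String) {
    // SAFETY: we only write an ASCII byte, so the buffer is valid
    // UTF-8 again when the `&mut [u8]` loan ends.
    let bytes = unsafe { s.as_mut_str().as_bytes_mut() };
    if let Some(b) = bytes.first_mut() {
        *b = b.to_ascii_uppercase();
    }
}

fn main() {
    let mut s = String::from("hello");
    capitalize_ascii(&mut s);
    assert_eq!(s, "Hello");
}
```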

This just isn't something that is expressible only with subtype relationships. I think your model works in a pure functional paradigm (the land of formally clean designs) but fails once non-SSA things start happening.

(To that end, it's interesting to note that LLVM scalar attributes work much like you describe.)

And that's still ignoring the fact that we want the language to be a distinct unit to the library. The language rules are and should be distinct from what library APIs document. (And libraries can and do "discover new possible optimizations" all of the time.)

So if I understand correctly, an unsafe fn(T) is correct if there exists ("union") some precondition stronger than T that makes the function correct? Great, I pick s := ∅, i.e., precondition False. This means all unsafe fn are trivially "semantically well-typed", which makes that claim rather useless.

I wasn't talking about having picked the wrong semantics for a particular type. I was talking about having set up the entire structure of the semantics in such a way that we accidentally ruled out some things we meant to allow. This can happen easily; e.g., the original RustBelt would not have allowed GhostCell. The only way to fix this is a global change and re-checking all proofs -- in other words, a breaking change, Rust 2.0.

Mistakes have certainly been made, but I think it is fair to say that Rust's UB story never was nearly as bad as that of C. Miri did not exist from the start, but there were also no choices made that would make Miri impossible. (For example, in C, an infinite loop can be UB under some circumstances. That kind of UB is a very bad idea IMO and Rust never had it.) To be clear, I am not blaming C here; the very understanding of what UB even is changed during the lifetime of C, so some early choices have not aged well.

I think those are 2 different things:

  1. The post-condition anti-pattern I'm talking about is when you take a syntactic type and document it to have a bigger semantics type. This usually happens as post-condition of unsafe functions like Pin::get_unchecked_mut but is not specific to post-conditions.
  2. What you're talking about is documenting some condition (post or pre doesn't matter like above) but it's still within the syntactic type although it might look bigger compared to something else (in your case the same type but at a different program point). For example a function taking a &mut [u8] such that it's UTF-8 before and/or after the function.

The refinement still works through pointers (otherwise I agree it would be useless). The &mut [u8] such that it's UTF-8 is a good example. Note that with mutable pointers there are multiple ways to refine since you can ask for UTF-8 validity at different time points: at beginning of function, after function, after borrow lifetime, or any combination. Each of those points is a different invariant. If you want, a function taking a &mut T can change the type. Here are some clarifying examples:

/// Pre: xs is valid UTF-8
/// Post: xs is valid UTF-8
unsafe fn make_ascii_uppercase(xs: &mut [u8]);
// This function can be called with the subtype where xs is not necessarily
// valid UTF-8 after reborrow.
// In other words the call-site doesn't care that xs is still UTF-8 after
// the call.

/// Post: xs is valid UTF-8
fn make_utf8_lossy(xs: &mut [u8]);  // overwrite with null bytes when needed
// This function can be called with the subtype where xs is valid UTF-8 before
// reborrow and not necessarily valid after.
// It's debatable why this function would be called at all, but maybe
// the fact that xs is valid UTF-8 after the reborrow is not necessary
// for soundness, just for functional correctness.

But it makes those functions useless. The art of typing a program is to find the correct invariants at all program points such that all pieces of the puzzle fit together. You want to find a semantics that contains the union of the semantics of all call-sites while being included in the semantics of the function definition (choosing a semantics within those constraints is called API design, and usually involves anticipating new usages, i.e. call-sites, and new implementations, i.e. definitions). If you know you will never call the function you can use the empty set, but then I'm not sure why one would even write such a function.

I see. I'm assuming there's a canonical semantics framework where all valid programs can be typed. The idea being that the semantics framework is as expressive as the theory you're using, i.e. you can encode arbitrary invariants in types. Maybe that's not possible. The way I see the RustBelt/GhostCell issue is that RustBelt is not as expressive as Coq (maybe due to Iris?). There's some amount of assumptions made to simplify things. Maybe those assumptions are actually needed (at least some of them), I don't know. But my intuition is that it should be possible to interpret types directly in your theory (Coq, ZFC, whatever). This way, when you extend the type-system syntax, the interpretations of the new syntax already interoperate with the existing ones.

Yes, I also agree the Rust situation is much better for many reasons (better starting position, shorter reaction time, etc).

I fail to see how "&mut _ but you must not move from it" is a distinct kind of type refinement from "&mut [u8] but you must put well-formed UTF-8 back into it before releasing the loan".

Important note: I am presuming that, as is the current case and intent, "can be &mut aliased" is a property of the pointee type. That is, whether annotating &mut _ as LLVM noalias is valid is not determined by whether a Pin has been constructed, but by a property of the pointee type. This is currently determined by the Unpin autotrait, but might change in the future to act like UnsafeCell.

This makes "&mut impl ?Unpin but it has not been pinned yet" a wf subtype of "&mut impl ?Unpin and it might be pinned".

The only inversion then is that safe code is not syntactically restricted[1] from treating "&mut impl ?Unpin and it might be pinned" as "&mut impl ?Unpin but it has not been pinned yet." IIUC, this is the core issue you are raising: that code is allowed to assume a syntactically allowed operation is not done. But this is, in fact, the purpose of unsafe: to put further syntactic restrictions on safe code.
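For concreteness, this is roughly what the operation under discussion looks like in today's Rust (a hedged sketch; `addr_of_pinned` is an illustrative helper that only observes the pinned value):

```rust
use std::pin::Pin;

// Pin::get_unchecked_mut hands back the plain &mut T; the documentation
// obliges the caller not to move the pinned value through it.
fn addr_of_pinned<T>(mut p: Pin<&mut T>) -> *const T {
    // SAFETY: we only take the address; we never move out of the
    // reference, so the pinning invariant is preserved.
    let r: &mut T = unsafe { p.as_mut().get_unchecked_mut() };
    r as *const T
}

fn main() {
    let mut x = 5i32;
    // Pin::new is fine here because i32: Unpin.
    let addr = addr_of_pinned(Pin::new(&mut x));
    assert_eq!(addr, &x as *const i32);
}
```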

Rust does not support type refinement, and I suspect it never will support in-place type refinement. You did not make it clear up front that you were assuming so much type system expressivity; you indicated subtype


To your core point of "discovering optimizations," though: it is still always valid to upgrade "library UB" to "language UB" in a new version of the library. (That is the point of "library UB.")

What is not a valid "discovered optimization" is taking existing user code which is not "language UB" and making it "language UB" when no library code is updated.

Rust guarantees address stability[2] of rust allocated objects, so a new version of the compiler cannot violate that guarantee. However, it would be valid (however impractical) to introduce a new primitive[3] which didn't.
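A small illustration of that guarantee, assuming only that a Box's heap allocation keeps its address when the handle itself is moved:

```rust
// The heap allocation behind a Box keeps its address when the Box
// value (the handle) is moved.
fn box_is_address_stable() -> bool {
    let b = Box::new(1u8);
    let before = &*b as *const u8;
    let moved = b; // moves the Box handle, not the heap allocation
    let after = &*moved as *const u8;
    before == after
}

fn main() {
    assert!(box_is_address_stable());
}
```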


  1. I use this term because I think this usage matches yours, though I disagree that validity of reference use is a "syntactic" restriction. The syntax of Rust is purely what is accepted under #[cfg(FALSE)]. Any restriction of the type system is a semantic restriction. (And restrictions placed via documentation do not exist at a language level, only at a library level.) ↩︎

  2. All the normal caveats apply; if you can't tell the difference and it behaves as-if the guarantee held, all bets are off. But the cheapest way to provide the appearance of address stability is just to use stable addresses. ↩︎

  3. Or, in fact, it's quite possible to define a library type that provides a non-address-stable reference with small-object optimization. ↩︎

1 Like

This thread is not about Rust anymore. So while "&mut T but you can't repurpose" is a subtype of |&mut T|.val, it is not a subtype of |&mut T|.typ. The thread is now about whether a type system with val = typ is feasible/usable. Maybe that's where the confusion comes from? (In particular, the "post-condition anti-pattern" makes a compiler incorrect if val = typ. Rust doesn't have this issue because of the nice trick of having typ ⊊ val for some well-chosen types.)

Again, the thread is not about Rust anymore. The Rust incorrectness was fixed with typ ≠ val. What remains is the lack of some subset of optimizations due to this gap between typ and val. The thread is whether we can theoretically do better (it doesn't need to be a future Rust edition and may just be a future language) and whether it's even useful in practice (do those optimizations really matter?), although I'm more interested in the theoretical question, because practical questions just matter now and we don't know what may happen in the future.

Yes exactly. That's one of the points I'm arguing.

There's a link to a discussion at the Summer 2019 meeting. ISO operates under the Chatham House Rule, so the minutes are not public. Other than that, the paper seems to have stalled AFAICT (it is in a "needs revision" state).

Take the perspective of the compiler. And it's a modular compiler, it sees a function but doesn't see the callers (because those are in a different crate). All it knows is that the function has been proven correct somehow.

Under your proposal, if this is an unsafe function, that means the compiler knows absolutely nothing. Sure, if the precondition was false that would make it useless, but then maybe it is useless? No way for the compiler to tell. Or maybe the precondition is "this number must be 42", or "this number must be a prime", or whatever. This means that it is impossible for the compiler to do any interesting optimizations in this function, because there is no meaningful proof obligation that the programmer has to satisfy. If I tell you "there exists some proposition that is true", then you will not be able to deduce any fact from that. (Logically speaking: ∃ (P: Prop), P is a tautology.)
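The tautology can be checked mechanically; for instance, in Lean:

```lean
-- "There exists some proposition that is true" is provable outright,
-- so learning it conveys no information.
example : ∃ (P : Prop), P := ⟨True, True.intro⟩
```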

In contrast, under the scheme Rust actually uses, the compiler still has no clue what the actual precondition is, but it knows for sure that the function will not cause UB under that precondition. And this kind of UB-freedom is the only assumption the compiler is actually allowed to make. This works because UB is defined directly within the operational semantics.

You misunderstood. This has nothing to do with expressiveness. It has to do with making choices. When we defined RustBelt we inadvertently made some choices that made GhostCell impossible. When we verified GhostCell we hence had to go back and readjust some core parts of our semantic type system. A more expressive logic would not have made any difference. It's impossible not to make choices since the space of semantic type systems is so much bigger than the space of syntactic type systems.


@CAD97 You have a footnote regarding the terminology "syntactic" vs "semantic". Unfortunately there does not seem to be a way to quote footnotes so I will reply with a quotation: this is standard terminology. "Syntactic" refers to everything that is defined purely on the syntax of the language, which includes the type system. The type system after all is just a fairly straightforward induction over the syntactic representation of the program. "Semantic" refers to the operational semantics and everything that involves it, everything that involves the "behavior" of programs. If you are into denotational semantics, then the "denotation" is what makes things semantic; the type system works on the syntactic representation, not its denotation. (Often ill-typed programs do not even have a denotation.)

Admittedly, sometimes type systems are referred to as "static semantics" (vs "dynamic semantics" for operational semantics). I am not a big fan of that terminology. But specifically when talking about type systems both in terms of a "type checking algorithm" and in terms of the actual run-time invariants upheld by a type system, the standard terminology is "syntactic typing" vs "semantic typing". See for example these lecture notes or this article.

This goes back (at least) to John Reynolds (1983):

type structure is a syntactic discipline for enforcing levels of abstraction

This is from one of the most influential PL papers of all time.

Benjamin Pierce also uses this terminology in "Types and Programming Languages", one of the standard textbooks:

A type system is a syntactic method for automatically checking the absence of certain erroneous behaviors by classifying program phrases according to the kinds of values they compute.

1 Like

It knows what the type says. For example, within an unsafe fn foo(p: *const u8) the compiler knows that p is a *const u8.

If you mean a higher-order function like fn foo(f: unsafe fn(*const u8)), then it knows that f is a function taking one argument which must be a *const u8, but which doesn't need to support all such *const u8 (compared to if it were taking fn(*const u8)). This means the compiler can't call f on arbitrary *const u8 values unless the programmer did (but that wasn't the case anyway due to possible side effects; the compiler would need to know whether f is pure, and purity is currently not part of Rust's type system).
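A sketch of such a higher-order situation (`read_byte` and `apply` are illustrative names):

```rust
// The compiler knows f's signature, but not its documented
// precondition, so calling it requires an unsafe block.
unsafe fn read_byte(p: *const u8) -> u8 {
    // Documented pre (invisible to the compiler): p is valid for
    // reads, aligned, and initialized.
    unsafe { *p }
}

fn apply(f: unsafe fn(*const u8) -> u8, x: &u8) -> u8 {
    // SAFETY: a shared reference is always a valid, aligned,
    // initialized pointer to its referent.
    unsafe { f(x) }
}

fn main() {
    assert_eq!(apply(read_byte, &42), 42);
}
```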

The programmer has to satisfy the type because the compiler sees the type. It's just the documentation that it doesn't see. This is the same in Rust. If an unsafe function takes &[u8] assuming it's UTF-8, the compiler doesn't know the slice is UTF-8 and can't optimize based on that. This is working as intended; to improve the situation, new types need to be introduced to capture that documentation (e.g. &str).

I see that in the Rust community, UB is used to mean 3 different things and that's confusing:

  1. UB as defined by the calculus (that's the real UB)
  2. UB as defined by the type system, usually called language UB (that's primitive-type semantics)
  3. UB as defined by the programmer, usually called library UB (that's custom-type semantics)

I think in your sentence you speak about language UB. This is what the semantics of types are. So you're essentially saying the same as me, the compiler can use the type semantics to optimize. It's just that in your case (Stacked Borrows) you define the semantics of types using an operational semantics.

Mmm, I think I have an idea what's going on (I'm still gonna read the whole article instead of just the intro and conclusion). I was working under the assumption that I control the type system. I guess what happened with GhostCell is that you tried to match existing Rust type system with RustBelt. Out of all possible sets of type interpretations that satisfy the typing rules, you chose one that doesn't account for some particularities (within unsafe probably). So you had to revise it.

I don't think this argument matters in my case (and more generally in theory), since type systems must come with a semantics (to be able to support sound escape hatches). Rust, being a practical language, just came up with a type checker without really thinking about the semantics it implements. That's also something that makes the current situation in Rust complicated, and why the task of RustBelt is much more complicated than if it were designing the language. It's some kind of reverse engineering. Essentially, RustBelt is trying to accommodate what the compiler and programmers are doing instead of designing for what they want to do.

What I am saying is: the information content of saying "foo is semantically well-typed" is 0 bits. It's about as useful as saying that "42 is even". It's true, but I learned nothing that I didn't already know before. Every unsafe fn is semantically well-typed (under some precondition).

UB is all about giving information to the compiler so that it can optimize better, and for unsafe fn, saying that it is semantically well-typed fails to achieve this goal.

The first item is language UB. The second item does not exist, the type system defines no UB.

Indeed UB exists on the language level and library level, and I am not happy with that terminology. As far as I am concerned, we have only discussed language UB in this thread.

It's more fundamental than that. It's a common pattern that there are two libraries that internally use unsafe, and having either one of them is perfectly sound, but having both is clearly unsound. (The oldest and most famous example of that in Rust is the pre-1.0 scoped thread API, vs Rc/Arc.) Whenever that happens, Rust needs to deliberate to figure out which library we want to allow.

By defining a semantic type system, you are giving an answer to all such possible conflicts, because at most one of these two libraries can be semantically well-typed. And, crucially, you are doing that without even knowing what the things are that you are ruling about, skipping all the case-by-case deliberation. In other words, you are inadvertently making tons of choices -- choices like allowing GhostCell, vs allowing some other library that is fundamentally incompatible with GhostCell. (For GhostCell I do not know of such a library, but that doesn't mean none exists.)

1 Like

I see. I was not talking about whether foo is semantically well-typed; I was talking about foo having a specific semantics. The compiler can use that information. And then I was mentioning that the semantics of an unsafe function is much larger than the semantics of its equivalent (i.e. same-signature) safe version. This means the compiler cannot assume too much about what an unsafe function does, since it can do much more than a safe function, but it can still assume something, namely what is defined by the semantics of its type (without documentation). And if we use Stacked Borrows to define the semantics of pointer types like *const u8, and the unsafe function takes such a pointer as argument (regardless of the additional restrictions in its documentation), then the compiler can assume that the function won't do anything outside what is possible on such a type (except that in Rust this uses the validity semantics instead of the actual semantics of the type). I suspect this is the same as what you say, just looked at from a different angle.

What you call UB here is what I call the semantics of the type (actually, typing) at that program point (i.e. |Γ ⊢ τ|). This is why I believe we say the same thing with different vocabulary.

Ok so that's a big vocabulary problem. In particular if you think the second item does not exist. That's exactly the information the compiler is using to optimize in my design (which I believe is the usual PL design). A concrete example would be:

  • true.0 is calculus UB because you can't project the first component of a boolean (no such reduction rule).
  • &mut *ptr::null_mut::<u8>() is language UB because |&mut u8| doesn't contain null pointers.
  • NonNull::new_unchecked(ptr::null_mut()) is library UB because of the documentation.
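The third bullet can be made concrete; here the documented precondition is discharged by construction (`wrap_nonnull` is an illustrative helper):

```rust
use std::ptr::NonNull;

// Library UB lives in documentation: NonNull::new_unchecked's
// precondition is "the pointer must be non-null". Passing null would
// violate it, even though the type system cannot see that.
fn wrap_nonnull(x: &mut u8) -> NonNull<u8> {
    // SAFETY: a pointer derived from a reference is never null.
    unsafe { NonNull::new_unchecked(x as *mut u8) }
}

fn main() {
    let mut x = 5u8;
    let nn = wrap_nonnull(&mut x);
    assert_eq!(unsafe { *nn.as_ptr() }, 5);
}
```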

But I guess the distinction probably comes from the fact that you include references in the calculus, while for me they are part of the type system. In other words, &mut *ptr::null_mut::<u8>() is just a cast of the null pointer from *mut u8 to &mut u8 and has no trace after type erasure. In other words, the program in the calculus is just 0, which doesn't have UB (yet). However, because of the invalid cast, it is ill-typed, which means it doesn't belong to the semantics of that type. The rest of the program that relies on that type semantics may have UB (e.g. the next dereference of the &mut u8).

Yes, and I think that's a good thing. Sure, you will make mistakes (it's impossible to get the design right from the start, and probably impossible to ever get it fully right), but because your type system is well defined, you can just add new types with better semantics and migrate to them (using subtyping as needed).

I guess the fundamental problem you refer to is when the way you define your semantics prevents something (for example you probably need to close your semantics under reduction and other properties that you need for the final property you want all your programs to have, usually bug-free for some definition of bug). However, if you always impose the least amount of restrictions on how you close your semantics (leaving the maximum amount of freedom to define different types), in particular just what you need to prove that well-typed programs are bug-free, then you shouldn't face the issue that the whole semantics framework has to be redone. One way to measure this property I'm talking about is to "count" how many different types you can define in your semantics framework. The more you can, the more chance it's robust.

This part of the discussion literally derived from me saying, I quote:

Being "semantically well-typed" only really makes sense as a notion on a safe fn. So it is unclear what (if anything) is required for unsafe fn here. But we need to require something so that the compiler can do its optimizations.

You then responded to that here, and since then I have been trying to convince you that your response does not solve the problem that I posited. If you are not replying to this concern then you switched topics somewhere halfway through...

This makes no sense. Stacked Borrows is an operational model, it strictly follows the approach I have been describing for Rust above: define UB in the operational semantics.

The "semantics" of *const u8 are: it must be a pointer. End of story. There is nothing else this type promises. Of course most (but not all) unsafe fn that have a *const u8 argument require more than this, but that's extra preconditions, not inherent to this type.

If you want to define UB your way, you don't want or need Stacked Borrows, and it has to be replaced by something altogether different. I don't know by what. It would probably involve the lifetime logic.

No, it's not. The usual design in the formal study of programming languages is to define an operational semantics where terms can "get stuck" (which is another way to say they have Undefined Behavior), and then to define a type system and to prove that "well-typed terms do not get stuck" (this is called type soundness). That's also the approach I am following for Rust.

If a compiler is considered, compiler correctness is usually defined via contextual refinement, which is defined on the operational semantics. No "semantic" notion of types is involved in any of this.

If you don't ever allow programmers to write unsafe code, then this approach still gives you all the flexibility you are asking for: you can discover new type system invariants, and since every program ever written is type-checked, you can immediately exploit those invariants. Of course that does not work in Rust since unsafe code is effectively un-typed code. If you want to exploit invariants you have to define all of them upfront -- either my way, by putting UB directly into the underlying operational semantics, or your way, by defining a semantic model. Either way, every invariant is fixed and decided.

No you cannot and I said that above.

Sorry, this is going in circles, so please accept this as my final words in this thread: the mistakes you make can be in the very foundation of the formal semantics. You cannot just add a new type. You have to change the very notion of what a type even is, or change some fundamental part of the type system that appears everywhere. It is not possible to fix these problems in a backwards-compatible way; existing proofs have to be rechecked.

1 Like

I guess it's because your sentence has 2 different statements:

  • unsafe fn can't be semantically well-typed: to that I respond that documentation provides the semantics to unsafe functions such that the whole program can be well-typed.
  • we require something so that compilers can optimize: to that I respond that the type semantics of unsafe function (i.e. the semantics of the type unsafe fn(...) -> ... without documentation) is already enough to optimize.

In other words:

  • doc is the semantics needed for the program to be well-typed
  • typ is the semantics used by the compiler (I have doc ⊆ typ = val = opt) which is not necessarily enough for the program to be well-typed (unless doc = typ and the function is essentially safe)

Yes it would involve some form of lifetime or anything useful to describe properties of a program (ghost state etc). Those properties are part of the type system, not the semantics which should really just be sets of terms with context (e.g. memory, variable assignments, function definitions, etc), i.e. all program states that we can reach at that program point.

The reason I mention Stacked Borrows with such design is because I believe there is a way to define the same semantics as Stacked Borrows with sets of terms with context. This is quite frequent to show that the same notion can be described in different ways. Stacked Borrows probably chose an operational semantics because that's the most useful (e.g. Miri) and is possibly simpler.

Yes, ultimately the compiler needs only the calculus. But having type information makes it possible to benefit from cheap analysis (faster compilation, modular compilation). And most compilers use the semantics of the type (e.g. T *restrict in C and &mut T where T: Unpin in Rust, both of which provide some noalias property).

I guess in your case, you already changed the definition of the calculus at that point. Instead of saying that you define a higher-level calculus with more UB that trivially compiles, via the identity function, to the actual calculus, you say that the actual calculus is the higher-level one, conflating the two notions. So indeed, the compiler can use that, since typing implies no UB in the high-level calculus. But it also means that programs that don't actually have low-level UB but do have high-level UB can't be written. You argue this is ok because doing it with type semantics would also forbid some programs. That's where I'm skeptical, and until I have some concrete example (showing that it's possible to define a canonical semantics, regardless of type system, for a non-trivial calculus where every program without UB is in at least one semantics (not all sets are valid semantics)), I agree there's not much we can discuss; it would just be opinion against opinion without facts. I'll try to build on top of TAL and/or Cyclone since those are the closest to what I have in mind.

To reiterate, this is the main disagreement of definitions.

A program execution is either defined or it isn't. On the definition of the AM, there's no useful open middle where a program { is ill typed | violates opsem | has high-level UB } but { is still defined | has no low-level UB }.

Or rather, there is the concept of an ill-typed program; one which violates the documented library preconditions. But by definition, violating a library precondition is not something the AM itself can possibly know about.

However, the library can and does take library preconditions and in a new library version communicate them to the language and elevate violations to an AM violation.[1] The library has an open middle where it can be used in violation of the library preconditions but not in violation of the language semantics by knowledge of when the library actually relies on the invariants.

I believe this actually does satisfy your "discovered optimization". The difference with language primitives is that they don't have a library definition of what is sound to do with them distinct from the language requirements, because they are the language, and the language either defines operations on them or doesn't.

So I think the difference may come down to how separated the AM and the semantics of primitive types are; for Ralf, myself, and Miri, the AM is the opsem of using the primitive types. Whereas (I believe) for you, the AM calculus is separate enough from the language-provided types that you can/should/are expected to introduce new primitive types[2], rather than just relying on combining and refining the existing primitives.


  1. A recent example fresh in my mind: Layout has had the library precondition that alignment is a power of two since forever. A somewhat recent change to std changed the representation of Layout so that it can only represent power-of-two alignments, making a non-power-of-two alignment an immediate AM fault. ↩︎

  2. Types for which the compiler/optimizer fundamentally understands/defines the opsem; (ideally) T.doc == T.typ == T.val == T.opt ↩︎


Yes, exactly. For me, the calculus comes before the type system (see this earlier comment). In particular, the calculus is the source of truth, can't be modified, and doesn't contain any types (types are made up and can change over time; they are just automated proofs, and with research we get better at automating proofs).

It is possible, though, to define a higher-level calculus, possibly with types and more UB, that otherwise preserves the operational semantics of the true calculus. This high-level calculus is just a way to forbid some programs. That's what Stacked Borrows is in my design: a high-level calculus in which all well-typed programs run without (high-level) UB. Even if the type system were complete, some programs without (low-level) UB in the true calculus couldn't be well-typed because of the restrictions of the high-level calculus (and that is a different issue from typ ⊆ val, which is about optimization, not about valid programs that can't be well-typed, i.e. incompleteness).
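A minimal Rust sketch of that incompleteness (the program and names are mine, for illustration): the commented-out lines describe a machine-level execution that is harmless, just two writes to the same location, yet that program cannot be well-typed, and the aliasing it creates would also run afoul of Stacked Borrows.

```rust
fn main() {
    let mut x = 0u32;
    // Rejected by the type system (E0499: cannot borrow `x` as mutable
    // more than once at a time), even though the concrete machine would
    // simply perform two sequential writes:
    //     let r1 = &mut x;
    //     let r2 = &mut x;
    //     *r1 = 1; *r2 = 2;
    // The well-typed version sequences the borrows instead:
    let r1 = &mut x;
    *r1 = 1;
    let r2 = &mut x;
    *r2 = 2;
    println!("{x}"); // prints 2
}
```

The rejected variant is exactly a program without low-level UB that the high-level calculus rules out.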

The Rust AM then is the higher-level calculus which introduces more types adding shadow state and (operational) restrictions over the target concrete machine. This is the level at which the Stacked Borrows shadow state (roughly, provenance) exists.

Fundamentally, though, it doesn't matter how many levels of lowering you go through, because the level you are using defines what is a valid, defined operation, and what is UB. This is why MIR can lower to LLVM IR or GIMPLE or Cranelift IR or WASM or x86_64, despite them all existing on different AMs with different rules; we just have to ensure that the translation preserves all defined behaviors.

Whatever calculus you're programming against can't be modified, and is the only calculus which matters, because it defines what programs/executions are valid. Once defined, the calculus must be monotonic[1] and never change the behavior of an existing defined execution.

Perhaps this gives us a workable definition of "low-level language": one with surface-level UB, resulting from letting you write and compile/execute programs that violate the calculus you're programming against.
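By that definition, Rust's `unsafe` is precisely what makes it low-level: you can write and run programs that violate the calculus. A sketch, reusing `get_unchecked` from earlier in the thread (the out-of-bounds variant is left commented out, since executing it would be UB):

```rust
fn main() {
    let xs = [1u8, 2, 3];
    // In bounds: the call satisfies its precondition, so this execution
    // is defined by the calculus.
    let y = unsafe { *xs.get_unchecked(2) };
    assert_eq!(y, 3);
    // The out-of-bounds variant also compiles, and might even appear to
    // "work" on a concrete machine, but it violates the calculus (UB):
    //     let z = unsafe { *xs.get_unchecked(3) };
}
```

A safe language in this sense would reject (or dynamically trap) the second variant instead of compiling it.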

I consider this topic fully resolved and will not be replying further.


  1. This is not as simple as only adding new types and operations, because given (nearly) any calculus, an execution can be valid only because it is not valid to perform an operation that would make it invalid. The canonical example is the following WASM, which could trap if an operation to modify running stacks were added. Insert your favorite invalid operation here.

    (func
      i32.const 0              ;; the condition is always 0
      (if (then unreachable))  ;; so `unreachable` never executes: valid
    )
    
    ↩︎

It does, because each level removes control from the level below. A low-level programming language tries to have a calculus that is as low as possible. There are of course compromises to be made (e.g. between control and complexity), but it remains that if you target a high-level calculus for simplicity, you lose control.

I agree. The initial topic was resolved a long time ago when I learned about the val trick, which is an option I hadn't thought of (I was assuming Rust made no compromises). Then the topic moved to whether val is necessary. That topic didn't come to a conclusion, due to lack of understanding, and I think it won't make progress unless I come up with a proper calculus and associated semantics, with a proof that all valid programs are in at least one semantics.

This topic was automatically closed 540 days after the last reply. New replies are no longer allowed.