There’s another ingredient typically used when doing refinement – not only does the context have to be safe, it also has to use the “hole” (i.e., the unsafe code) at a particular type. The same piece of code can be safe at one type and unsafe at another type. The type, in some sense, describes the interface between the safe and the unsafe code, it says what the safe code is allowed to do. (This is btw. how I often describe the virtue of Rust to type system people – even though the type checker is fairly limited in what it can understand, and there’s lot of code you can’t write if you follow its rule, the type system is actually pretty expressive without being excessive in terms of number of concepts or so. It can concisely describe complicated contracts like “As you iterate over the Vec, you must not mutate it” or “After you release the Mutex, you must no longer use any pointers you acquires to its interior while you held it”. Then we can put those types at the holes of a safe context and fill the hole with unsafe code that we merely claim has this type; the type checker can check that the user follows the interface we write without having any clue about the subtle invariants it expresses. To me, that’s the true power of Rust.)
In all the papers I read on contextual refinement, P ⊑ Q means "Every behavior of P is also a behavior of Q" / “P has fewer behaviors than Q” (which is why P is “more refined”), so it corresponds to ⟦P⟧ ⊆ ⟦Q⟧ and the two inclusion actually match up (assuming those semantic brackets give the set of possible behaviors of the program). For example, see page 7 of this paper.
A typical way to use this would be start from a “rough sketch” of the program you want, maybe just pre- and postconditions, and then you stepwise refine this until you got a program with the efficiency you want. (There’s people literally doing that in a proof assistant, with the proofs going all the way up to some assembly code.) Clearly, if you do this, you’d only want to remove behaviors and hence make sure that your final program actually satisfies the spec you started with. So, “more refined = fewer behaviors”.
However, I “grew up” in a world dominated by operational, small-step semantics. It is possible that domain theorists use an inverted notation.