Well, then, go to B2.3, which is more explicit about ordering relationships.
The keys here are the definition of Location and Same Location.
Location:
A Location is a byte that is associated with an address in the physical address space.
Same Location:
An effect E1 and an effect E2 are to the Same Location if one of the following applies:
- All of the following apply:
- E1 is a Memory effect.
- E2 is a Memory effect.
- E1 and E2 are to the physical address (PA), or
[...snip other conditions relevant to non-Memory effects...]
I think this is missing the word "same" before "physical address". But the point is that the definition depends on physical address, not virtual address. More broadly, everything in B2.3 talks about physical rather than virtual addresses, except for one clause related to TLBI instructions.
That means that whatever path through B2.3's thicket of guarantees is normally responsible for guaranteeing same-thread coherence, it should apply with equal force whether the accesses are using multiple mappings or the same mapping.
Okay, but what is that path? B2.3 is extremely complicated, and this is my first time diving into it. There might be a simpler way to use its rules to prove same-thread coherence. But this is what I found:
Look at "Deriving Reads-from-memory and Coherence order from the Completes-before order" in B2.3.9.2. This section gives us two sets of sufficient conditions for an effect (in this case, the load) to "Read-from-memory" another effect (in this case, the store). Which set of conditions to use depends on which effect "Completes-before" the other. We know that one of the effects must Complete-before the other, because the Completes-before order is defined as "a total order" earlier in B2.3.9.2. I will show that regardless of which Completes-before the other, the store must "Read-from-memory" the load.
First suppose that the store Completes-before the load. Then we're on this set of conditions, where E2 is the store and E1 is the load:
[quote edited to replace bullets with numbers]
If all of the following apply:
- There is a Memory Write effect E2 (possibly representing the writing of the initial value to the Location).
- E2 and E1 are to the Same Location.
- E2 Completes-before E1.
- It is not the case that E2 Completes-before a Memory Write effect E3 and E1 is the local read successor of E3.
- There is no Memory Write effect E4 to the Same Location in the Completes-before order between E2 and E1.
- There is no Memory Write effect E5 to the Same Location in the Completes-before order between E2 and a Memory Read effect E6 to the Same Location that appears in program-order before E1.
then it must be that E1 Reads-from-memory E2.
Condition 1 is satisfied. Condition 2 is satisfied because the load and store use the same physical address (again, regardless of whether they use the same mapping). Condition 3 is satisfied by assumption. Conditions 4, 5, and 6 are satisfied because we're assuming that there are no other writes to the same location. (Or if there were any earlier writes, we're assuming that they appear earlier in the Completes-before order than any of the current operations. Proving correctness in the presence of multiple writes is possible but more complicated.)
Alternately, suppose that the load Completes-before the store. Then we're on this set of conditions, again with E2 as the store and E1 as the load:
[quote edited to replace bullets with numbers]
If all of the following apply:
- There is a Memory Write effect E2.
- E1 is a Local read successor of E2.
- E1 Completes-before E2.
- There is no Memory Write effect E3 to the Same Location in the Completes-before order between E2 and an Explicit Memory Read effect E4 to the Same Location that appears in program-order before E1.
then it must be that E1 Reads-from-memory E2.
Condition 1 is satisfied. Condition 3 is satisfied by assumption. Condition 4 is satisfied because, again, we're assuming that any other writes must appear earlier in the Completes-before order. And condition 2 is satisfied by the definition of "Local read successor":
A Memory Read effect E2 of a Location is the Local read successor of a Memory Write effect E1 to the same Location if E1 appears in program order before E2 and there is no Memory Write effect E3 to the same Location appearing in program order between E1 and E2.
QED.
--
One thing about B2.3 is that it doesn't contain an explicit exception for mismatched memory attributes. I think that is a mistake. But on that subject, here is another quote to provide a third "proof" that ordering is guaranteed in your example. This "proof" is more informal and implicit than the others, but I thought it was still compelling enough to tack on. From B2.16, "Mismatched memory attributes":
When a memory Location is accessed with mismatched attributes, the only software visible effects are one or more of the following:
- Uniprocessor semantics for reads and writes to that memory Location might be lost. This means:
- A read of the memory Location by one agent might not return the value most recently written to that memory Location by the same agent.
- Multiple writes to the memory Location by one agent with different memory attributes might not be ordered in program order.
[...snip other effects...]
Accessing the same Location with mismatched attributes essentially requires multiple mappings. Therefore, this language strongly implies that in the general case where we have multiple mappings but not mismatched attributes, "uniprocessor semantics" are not "lost". And in particular, "a read of the memory Location by one agent" does need to "return the value most recently written to that memory Location by the same agent". (As long as there are no relevant writes by other agents, an assumption which is not stated explicitly.)