Are the proofs in question publicly available? It would be interesting to look at them to see whether the new operation would invalidate any of the existing steps in the proof. (I had intended to prove it to be a sound extension to the memory model by quantifying over all possible implementations and showing that none could be affected, but it would probably be easier to check the existing proofs to make sure that they still work.)
https://plv.mpi-sws.org/scfix/paper.pdf is the paper with a model that comes closest to the C++/Rust memory model. Strangely they don't seem to have a link to the actual proofs, I will ask the authors about that.
Ah I found it, it was just a bit hidden: Repairing Sequential Consistency in C/C++11. The "full paper with appendix" has the proofs.
So it turns out that this paper already proves a model in which racing an atomic read against a non-atomic write produces an undefined value rather than undefined behaviour, in a very trivial way β it makes no distinction between relaxed-atomic writes and non-atomic writes at all (they compile into the same code on every system it mentions, and none of the proofs treat non-atomic writes and relaxed writes any differently). It does make a distinction between non-atomic and relaxed-atomic reads for performance purposes, but the distinction is specific to reads (the proof basically works by proving that any cycle that they want to disallow contains a raced non-atomic read, and thus would be undefined behavior under C11 β but the fact that raced non-atomic writes are also undefined behaviour under C11 is not involved in the proof).
As such, under this memory model, seqlocks should be possible to implement simply by using a relaxed-atomic read in the potentially racy step (a byte at a time for opsem purposes, but optimisable into reading all the bytes at once), because the write that it reads from can be interpreted as though it were a relaxed-atomic write (even if it were non-atomic in the source code, it would have been compiled as a relaxed-atomic write because the memory model makes no distinction between the two cases). In the situation where no race occurred, it will read the correct value; and in the situation where there was a race, it will read some arbitrary value (without violating any assumptions of the memory model proofs) which is then ignored by the program, so the actual value read doesn't matter.
That said, the paper only proves the correctness of its own compilation scheme (together with that of a few simple optimisations). Atomics have two reasons to exist (ensuring that the hardware doesn't break the semantics of the program and ensuring that compiler optimsiations don't break the semantics of the program), so in addition to showing that the memory model allows an atomic read to race against a non-atomic write (like the paper does), you would also have to show that the compiler also allows this. Fortunately, LLVM is intended to do so (its documentation specifically states that a raced read reads undef rather than producing undefined behavior), so this is another problem which turns out to already have been solved in advance.
Reading arbitrary values can be UB if the values have validity invariants, such as bool. The model would have to promise only ever reading a value that has ever been written. Or seqlocks would have to be limited to MaybeUninit or perhaps zerocopy::FromBytes types.
At the risk of rehashing one of our previous discussions, I believe that inline assembly is a perfectly valid escape hatch for SeqLock. In other words, something like this could be considered sound in the context of SeqLock:
unsafe fn racy_read<T>(src_ptr: *const T) -> MaybeUninit<T> {
let mut dst: MaybeUninit<T> = MaybeUninit::uninit();
unsafe {
core::arch::asm!(
"<read T from src_ptr into dst>",
in(reg) src_ptr,
in(reg) size_of::<T>(),
in(reg) &mut dst,
// Ideally, this also would use `readonly` and `pure`,
// but it's impossible to return a generic `T` from `asm!`
options(preserves_flags),
);
}
dst
}
Yes, we can not properly describe what the block does in terms of AM, but we can say that it reads something into dst and the resulting data can be interpreted as T (by calling assume_init) only if the SeqLock validity check has passed. The write probably also should be accompanied by an (empty) asm block to "observe" the write.
Validity of this reasoning relies on "external" knowledge of knowing that the assembly block does on the given CPU arch, i.e. we rely on racy reads being well defined on the assembly level.
Whether such operation should be or could be exposed on the language level is a separate question.
The idea is that you read into a MaybeUninit (using a special command designed to do possibly racy reads, that contains any necessary synchronization) and only assume_init it once you determine that the read is unraced.
Doing this is technically UB according to C11, but nothing seems to actually rely on the UB (the proofs about the memory model don't rely on it and the compiler doesn't exploit it either), so it would be possible to define Rust to make it not UB without needing any actual changes to anything other than documentation. This thread is, from my point of view, discussing whether it should continue to be UB or whether we should change the rules.
Storing the value as initialised (rather than MaybeUninit) without checking to see if the read has been raced on would be UB even under the current compiler, pretty much for the reasons you mention.
I don't think this is right. The model uses catch-fire semantics where any data race is UB. Where are you getting this from?
Note in particular that the proof of Lemma D.1 relies on executions with unordered non-atomic read-write pairs to be UB.
It hasn't been solved. The LLVM model is not even precisely specified, let alone proven correct.
I don't agree since I don't think you can write a sound story for this inline asm block. You might be tempted to pick the story "picks an arbitrary value that satisfies the validity invariant of T" (after proving that such a value exists), but then clients using this library could not, for soundness purposes, rely on actually getting back the value they wrote. If you come up with a story, you have to use the same story consistently for all Rust-level reasoning.
Yes they do. The paper explicitly mentions this:
The model was intended for a world with catch-fire semantics where any data race is UB, but this is actually a stronger assumption than the proofs require β the proofs only actually rely on a special case (where the read that is part of the race is non-atomic). If you change the model's semantics to "a race of a non-atomic read against any sort of write is UB/catch-fire, but a race of an atomic read against a non-atomic write reads an undefined value", then all the proofs continue to work with no changes.
So, if you change the data model to allow races of atomic reads against non-atomic writes (i.e. effectively LLVM's model), the resulting model can be proved correct by using the exact same proofs, and will have all the desirable properties that the previous model did (together with the additional desirable property of allowing you to write seqlocks and other related sorts of concurrency).
I went through all the proofs in the paper and verified that they were still valid even in the new model. (My plan was originally to come up with a new model and adapt the proofs β proving something about a new memory model from scratch is hard, but adapting existing proofs to allow for a small change in the model is often easy. In this case, it turned out to be even easier than I expected.)
This also acts as a proof of correctness for LLVM's model (although this of course doesn't prove that LLVM correctly implements that model β just that the model itself is correct).
Yes they do. The paper explicitly mentions this:
It does rely on one case of races being UB (the case where the read is non-atomic). (I get the impresion that the authors were trying to avoid relying on race-UB at all, but had to compromise, and this is why they included the section in question.) It does not rely on the case of races being UB where only the write is non-atomic. (Presumably the authors didn't realise that the weaker condition would be useful for some people, and just stated the stronger "races are UB" condition without noting that a weaker condition would still work.)
to allow races of atomic reads against non-atomic writes (i.e. effectively LLVM's model)
That is not LLVM's model. LLVM's model is that all read-write races are allowed. So this includes non-atomic reads against atomic writes, and non-atomic reads against non-atomic writes. (Well, LLVM's model is something like that. Not even LLVM themselves know what their model is, so we probably shouldn't use the term "LLVM's model", though I am guilty of that myself.)
That's still an interesting claim you are making, if it is correct. Unfortunately I lack the time to verify those proofs.
Ah, you are right β LLVM allows nonatomic reads to race with nonatomic writes too (again producing an undefined value).
You might be tempted to pick the story "picks an arbitrary value that satisfies the validity invariant of
T" (after proving that such a value exists), but then clients using this library could not, for soundness purposes, rely on actually getting back the value they wrote.
The "story" here is that we read an arbitrary value if the check has not passed, and the value of T which we wrote for the checked sequence number otherwise (i.e. the read did not race). As I wrote in my comment, we "prove" it by using external knowledge about assembler and hardware behavior. From the language point of view it's indeed an arbitrary value which has absolutely no relation to the value written previously (which is inevitable since asm! is a black box for the compiler), but IMO it's fine.
We as programmers know that we read the expected value and handle it under this assumption (e.g. by calling assume_init). To the clients we just document "pinky promise, it's the same value which you wrote previously" and it's sufficient as any other library-level safety promise.
The "stories" which we tell each other have nothing to do with how the compiler actually functions. It's just a tool to convince ourselves that the inline assembly behaves as we expect while interacting with the rest of Rust code. I know you don't like such arguments, but I can not see how asm!-based racy read could badly interact with compiler optimizations in practice (ofc only under the "black box" assumption).
The "story" here is that we read an arbitrary value if the check has not passed,
The story cannot depend on a check that has not happened yet. It has to be expressible in Rust code, after all.
The story cannot depend on a check that has not happened yet.
[citation needed] ![]()
It's the story which tell ourselves, not to the compiler. The assembly block reads some value which we know can be interpreted as T if the validity check has passed. We communicate it to the compiler by calling assume_init only after the check. Until then the value is completely opaque to the compiler, so the Schrodinger-like reliance on the post-factum check in our story is fine.
Such reasoning may not fit into the reductionist "tell me story in terms of AM" approach, but it's used in practice and by the looks of it works more or less reliably.
[citation needed]
That's not how this works. You can't just assume something is sound and expect me to cite a refutation You have to argue why depending on future choices is sound. The argument I laid out in my blog post crucially depends on the story being expressible as actual Rust code. So if you want to make future-dependent choices, you'll need to develop an entirely different framework for justifying inline assembly blocks. I look forward to your blog post and correctness argument on that subject. ![]()
The argument I laid out in my blog post crucially depends on the story being expressible as actual Rust code.
We could move the validity check into the assembly block, but honestly it feels like an unnecessary (and error-prone) busy-work to create a purer "story".
I look forward to your blog post and correctness argument on that subject.
Nah. I am more of practitioner and the practice shows that my line of reasoning works. I and some other people find my "story" to be satisfactory enough and it's fine if you find it vague or hand-wavy. It's not like your "correctness argument" is formally verifiable either, sure it's simpler and easier to believe, but it's still a way to convince others.
Unfortunately, in practice some of us have to deal with such grey areas (another example is IPC based on doubly mapped shared memory) and we can not wait for a more complex model to be developed. Sure, one day I could find myself in shoes of those C guys who curse compiler optimizations which break their UB-ridden code, but I consider probability of this happening sufficiently small, especially considering that no one presented even a contrived scenario in which my reasoning may fail.
It's not like your "correctness argument" is formally verifiable either,
Oh it definitely is.
Nah. I am more of practitioner and the practice shows that my line of reasoning works.
I am sure I can come up with counterexamples (EDIT: to the general approach of allowing the asm story to predict the future). But then you'll dismiss them as artificial.
Obviously for code you write you get to choose whatever tradeoffs you prefer for how thorough you want to be with your correctness arguments. But for project-official guidance I think we should have pretty high standards.
FWIW, I think the reason that "seqlock with inline asm for the data reads and writes" works in practice is that there is another story, one that actually works in my framework (assuming a language extension to Rust that we'll hopefully get one day): consider those accesses to be relaxed bytewise atomic. I don't know if your intended implementation uses inline asm for both the read and the write path, but if it does, then I'd say your code is correct, just your story / correctness argument isn't.
Oh it definitely is.
For it we would need to formally prove equivalence between the told "story" and the respective asm! block. Without it we are still in the realm of hand-waving.
I am sure I can come up with counterexamples
You should've started with it without assuming my dismissal. A good counter-example can eliminate a lot of unnecessary back-and-forth discussions (like MADV_FREE in the context of uninitialized memory).
consider those accesses to be relaxed bytewise atomic
Great, now you potentially break CPU soundness requirements (at least on Intel) which forbid simultaneous differently sized atomic operations on the same memory. This restriction may apply only to synchronizing atomic operations, IIRC the Intel manual was somewhat vague in this regard and Linux just ignores it, but still.
UDP: Ah, you mean that both reads and writes should be bytwise atomic.
For it we would need to formally prove equivalence between the told "story" and the respective
asm!block. Without it we are still in the realm of hand-waving.
Obviously. Your claim was that the approach is "not formally verifiable". I deny that claim, the entire thing could be formally verified, all the ingredients are there. That's in contrast to your approach which so far has not been expressed precisely enough to even enable the possibility of formally studying it.
If you had claimed that the approach is "not formally verified", I would have agreed. But that's true for basically every part of the toolchain so it's an unreasonably high bar for most of it.
Maybe I misunderstood what you mean by "verifiable". I mean "could be verified", not "has been verified". I think that is in line with the usual meaning of the "-able" suffix in English.
You should've started with it without assuming my dismissal
I have come up with counterexamples to multiple proposals that involve the program predicting the future (e.g. here and here). I cannot come up with a counterexample to the concrete proposal for "seqlock with inline asm" since that code is correct, just your story isn't. And I cannot come up with a counterexample to your more general approach since you haven't written down the exact rules of what is and isn't allowed in inline asm. Therefore my claim is that if you were to write down those rules, and if they were to permit predicting the future, I'd most likely be able to come up with a counterexample.
UDP: Ah, you mean that both reads and writes should be bytwise atomic.
![]()
I have come up with counterexamples to multiple proposals that involve the program predicting the future (e.g. here and here).
I find the first example here really interesting because I think it demonstrates a difference between two different sorts of future-prediction / angelic choice (and also think that it's effectively the same situation as the sequence-lock situation being discussed in this thread).
From my point of view, the reordering of the let b and let x lines in the the first example is allowed even with angelic choice being used, and the reason is that the sequence of events is as follows:
xis created from a pointer using exposed provenance. For the time being, the provenance is opaque, in the sense that the compiler's behaviour does not depend on it.bis chosen non-deterministically.- Depending on the value of
b,xis offset. Its provenance is still opaque (but the same as what it was previously). xis dereferenced. At this point, we determine what its provenance must have been, on the basis of which address is being accessed. It is considered to have had that provenance all along (e.g. trying to access a different allocation through a copy ofxwould be undefined behavior).
Or in other words, I don't see the angelic choices as being made immediately. Rather, I think that sort of choice works like variables in Prolog: initially nothing is known about them, as they are used it places more constraints on what they are, if the constraints can't be fulfilled simultaneously you get a failure (in Prolog) or undefined behavior (in Rust semantics). For people who haven't seen Prolog before, it lets you write this sort of program:
A = B,
write(B),
A = 2,
write(B).
and get an output saying that B's value is unknown from the first write, and an output saying that B is 2 from the second write. (The program is evaluated in order from start to end, and yet learning more about A nonetheless lets us know more about what B is/was, without any futher information being given about B directly.)
The basic idea is that just as a Prolog implementation can't make assumptions about the value of B until it sees some evidence about what the value is, a Rust compiler can't make assumptions about what provenance is obtained from from_exposed_provenance until it sees the pointer being dereferenced in a way that puts constraints on the provenance, or about whether or not an inline-assembly block actually read memory until it can prove that the read would have been a race.
I think that this is conceptually the sort of reasoning that people want to apply to both the seqlock case (where we determine whether or not we have read from the memory, depending on whether we subsequently treat the resulting value as initialised or not) and the exposed-provenance case (where we determine which provenance the pointer had, based on which addresses it is subsequently used to access). That said, I am not convinced that this sort of reasoning is, in the abstract, always correct β based on my previous attempts to prove seqlocks correct, I suspect that it's almost right, but that merely using this Prolog-style reasoning isn't automatically enough to prove the code sound, there may be other constraints too which haven't been formalised yet.
Or in other words, this sort of argument from opaqueness is probably the most natural approach to solve this sort of problem (as opposed to doing things like changing the memory model, which is easier to prove correct), and many people seem to intuitively feel that it is obviously correct, but more effort will be needed to determine whether it actually does work, and to formally prove why. (There's an attractive informal argument why it works β while the angelically-determined information is opaque and so the compiler has no evidence to constrain it, it can't make optimisations that depend on its properties β but this argument is hard to formalize properly.)
