Loads and stores to/from outside the memory model

I think that in a language as young as Rust, where many important "implementation details" are still in flux, it's important to keep a clear distinction between "what is UB" (i.e. the language does not specify what happens; hardware catching fire is standard-compliant) and "what current compiler + hardware implementations do".

For this, I'll steal @comex's nice "in theory" vs "in practice" wording.

UB is a purely theoretical concept. The fact that hardware is allowed to catch fire does not mean that it will. Sometimes, compilers and CPUs provide stronger guarantees than programming languages alone do. The danger, of course, is that this aspect of hardware & compiler behavior may not be covered by any future compatibility guarantee.

In theory, concurrent volatile accesses are UB because in the C++11 memory model, unsynchronized non-atomic accesses to the same location, at least one of which is a write, constitute a data race (notice that volatile appears nowhere in this definition), and data races are UB.
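To make that concrete, here is a minimal sketch (my own toy example, not from any real codebase) of the kind of program the model condemns. It compiles, but it deliberately contains a data race, and swapping the raw accesses for read_volatile/write_volatile would not change the formal verdict:

```rust
use std::cell::UnsafeCell;

// A deliberately racy wrapper; the unsafe Sync impl is what lets two
// threads reach the same non-atomic location at once.
struct Racy(UnsafeCell<u32>);
unsafe impl Sync for Racy {}

static CELL: Racy = Racy(UnsafeCell::new(0));

fn main() {
    // One unsynchronized write and one unsynchronized read of the same
    // location: a data race under the C++11 definition, hence UB.
    let writer = std::thread::spawn(|| unsafe { *CELL.0.get() = 1 });
    let _read = unsafe { *CELL.0.get() };
    writer.join().unwrap();
}
```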

In practice, the purpose of volatile's existence is to enable memory-mapped I/O, which is (1) outside of any sane programming language's memory model and (2) a form of concurrency between the CPU and other hardware. So volatile accesses are unlikely to result in surprising behavior on current compilers + hardware, and the standard's definition of a data race should arguably be amended accordingly to allow concurrent volatile accesses too.
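For illustration, here is roughly what that MMIO use case looks like in Rust; the register address and its meaning are entirely hypothetical, and such code is only meaningful on a bare-metal target where the address actually maps to a device:

```rust
use core::ptr;

// Hypothetical address of a memory-mapped device register.
const STATUS_REG: *mut u32 = 0x4000_0000 as *mut u32;

unsafe fn poll_device() -> u32 {
    // A volatile read: the compiler must perform exactly this access and
    // may not cache, reorder, or elide it, because the "memory" behind it
    // is really device state that can change on its own.
    ptr::read_volatile(STATUS_REG)
}

unsafe fn ack_device() {
    // A volatile write: likewise guaranteed to reach the bus.
    ptr::write_volatile(STATUS_REG, 1);
}
```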

This paper leans heavily on an "in practice" point of view. One of its core arguments is that if large C/++ codebases like the Linux kernel assume volatile to mean a certain thing, then any compiler that takes it to mean something different breaks those codebases, and will thus be deemed evil by programmers.

As a result, compilers are forced to follow the large codebases' world view in practice, and one can clarify this by integrating that world view into the C standard (thus turning what is currently undefined behavior from the language's PoV into the reasonably well-defined behavior that it actually is from the implementations' PoV).

We should be wary of blindly applying such reasoning to Rust, though, because...

  1. Rust is not C/++. It has already diverged from C/++ semantics in major areas such as struct layout, alias analysis and enum exhaustiveness, and may diverge further from those semantics in the future.
  2. Rust is much younger than C/++. There are fewer established large Rust codebases out there, and therefore rustc still has some wiggle room for changing implementation details before developers start demanding that they be set in stone.
  3. Rust only has one widely used compiler, which means that its semantics have been subjected to much less scrutiny than those of C/++.

For an example of the dangers of using C knowledge on Rust...

...notice that the world view of LLVM and Rust diverges from that of the C language here.

The C language has volatile variables and pointers, which come with the "no extra read" restriction that you have in mind. Rust, however, only has volatile accesses, which resemble accesses to volatile variables in C but importantly do not forbid the compiler from introducing extra non-volatile accesses to the memory location at hand.

As far as I know, said property can only be achieved through much discipline, e.g. making sure that no reference to the "volatile variable" is ever created (which turns simple things like accessing struct fields into unsolved language design problems).
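For instance, here is a sketch of the raw-pointer-only discipline, using core::ptr::addr_of_mut! (which nowadays covers at least the basic struct-field case); the register block and base address are hypothetical:

```rust
use core::ptr::{self, addr_of_mut};

// Hypothetical register block and base address, purely for illustration.
#[repr(C)]
struct UartRegs {
    data: u32,
    status: u32,
}

const UART_BASE: *mut UartRegs = 0x4000_1000 as *mut UartRegs;

unsafe fn read_status() -> u32 {
    // addr_of_mut! computes a raw pointer to the field directly, without
    // ever creating a &UartRegs or &mut u32 along the way, so the
    // compiler never gains license to insert its own accesses.
    let status: *mut u32 = addr_of_mut!((*UART_BASE).status);
    ptr::read_volatile(status)
}
```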

Make sure you truly need this guarantee, as it will be painful to provide in Rust.

UB is a language property; it is not architecture-dependent. It only means that the language spec doesn't specify what happens when a program does a certain thing. Sometimes, additional implementation knowledge of the compiler, OS and hardware will allow you to tell what happens in certain circumstances, but the UB is still there, as the behavior may still be different on a different compiler/OS/CPU.

Now, with that theoretical consideration out of the way, I fully agree with @comex's "practical" view here:

Without allowing some form of concurrent access, volatile isn't even good enough for its original use case of memory-mapped I/O. Therefore, "concurrent volatile is UB" language semantics are too weak and "effects of concurrent volatile are implementation-defined" would be better.

Is it allowed to create an infinite loop (which is UB) where there wasn't one before, though? I would personally find this surprising, but then again, "infinite loops are UB" is a C/++-specific concern and we're mostly discussing Rust here...

As @comex pointed out, the Itanium situation is even more complicated (it depends on whether the data is in RAM or in a register). But overall...

  1. UB is not hardware-specific. Future compiler updates are free to break UB-ish code, if they can find a good reason to do so and can convince their user base.
  2. All current compilers and hardware will do what you want.

So one reasonable stance would be to advocate for some reasonable wording about concurrent volatile access to be merged into the Unsafe Code Guidelines as life insurance, and be done with it.

Unfortunately, I don't know enough about Java's guarantees to comment on this. But in C++11, coalescing two consecutive relaxed reads of a single variable into one is generally legal, except in peculiar cases where it could break forward progress, e.g. by creating an infinite loop.
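For concreteness, here is a sketch of both halves of that statement; whether the two back-to-back loads below get merged is entirely up to the compiler:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

fn read_twice(flag: &AtomicU32) -> (u32, u32) {
    // Two back-to-back relaxed loads of the same variable: a C++11-style
    // compiler may legally coalesce them into a single load, so both
    // halves of the tuple can come from the same observed value.
    let a = flag.load(Ordering::Relaxed);
    let b = flag.load(Ordering::Relaxed);
    (a, b)
}

fn spin_until_set(flag: &AtomicU32) {
    // Here, hoisting the load out of the loop would create an infinite
    // loop that the source did not have, which is why "merge relaxed
    // reads" comes with a forward-progress carve-out.
    while flag.load(Ordering::Relaxed) == 0 {
        std::hint::spin_loop();
    }
}
```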

Going a bit faster...

...adding new writes to a shared memory location is generally forbidden in the C++11 memory model, for the reason that you state, so as long as the compiler correctly identifies your memory as shared (I guess that means there's an UnsafeCell somewhere in Rust) you don't need to worry about it.
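Here is a sketch of the transformation that this rule forbids; the synchronization protocol in the comments is hypothetical, but the shape is the classic one:

```rust
use std::cell::UnsafeCell;

// Hypothetical shared cell; some synchronization protocol (not shown)
// decides which thread may touch it at any given time.
struct Shared(UnsafeCell<u64>);
unsafe impl Sync for Shared {}

unsafe fn maybe_update(shared: &Shared, do_it: bool) {
    if do_it {
        // The caller's protocol guarantees exclusive access when do_it
        // is true, so this write is race-free.
        *shared.0.get() += 1;
    }
    // The compiler must NOT rewrite this into an unconditional
    // "read, then write back" sequence: when do_it is false, this thread
    // promised not to touch the cell, and an invented store could race
    // with whichever thread legitimately owns the cell right now.
}
```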

I'm not sure I understand; the parenthetical seems to contradict the original statement.

As @comex points out, excessive granularity is the problem that byte-wise atomic memcpy is meant to solve, but it may not immediately resolve the SIMD issue due to current LLVM/rustc limitations.
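Since no byte-wise atomic memcpy exists in stable Rust today, the following is only a hand-rolled sketch of the idea: copy one AtomicU8 at a time, so that a concurrently mutated buffer yields possibly torn but race-free data:

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Hand-rolled sketch of a byte-wise atomic copy: each byte is read with
/// a relaxed atomic load, so concurrent writers cause tearing at byte
/// granularity, but no data race and hence no UB.
fn atomic_copy_from(src: &[AtomicU8], dst: &mut [u8]) {
    assert_eq!(src.len(), dst.len());
    for (s, d) in src.iter().zip(dst.iter_mut()) {
        *d = s.load(Ordering::Relaxed);
    }
}
```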

Unfortunately, here we face the problem that volatile is about memory-mapped I/O and precise hardware control, which will always inhibit optimization. This is why I think unordered atomics would be a more promising direction.
