Implications of the memory model


#1

I have been thinking about the Rust memory model, and here are my conclusions:

  • Do not mark anything as UB unless one can provide a real program (not a microbenchmark) on which the optimization that it enables makes a real, substantial difference that could not otherwise be easily achieved. C and C++ violate this rule, badly, which forces programmers to spend time appeasing the compiler instead of on source-level optimizations that are much more effective.
  • Especially do not mark anything as UB if it forces programs to be slower than they would be if the behavior were well-defined. A good example is strict aliasing in C and C++: strict aliasing requires that unsigned char buffers be copied(!) before being passed to functions like strlen that take char* parameters. The copy (and the heap allocation that might be needed to hold it) is almost certainly much more expensive (in terms of execution time, let alone programmer time) than passing -fno-strict-aliasing to the compiler (which, along with -fwrapv and -fno-delete-null-pointer-checks, is in my CFLAGS and CXXFLAGS).
  • Do not mark something as UB if it forces contorted and difficult workarounds. The aforementioned example (strict aliasing requiring a copy) is one. Another would be disallowing a *mut pointer aliasing an &mut or & pointer, even if the raw pointer is never written to (or, in the &mut case, read from) while the reference is in scope. Other examples are probably in memory allocators, kernels, and garbage collectors.
  • Finally, make sure that anything that needs to be done can be done without invoking undefined behavior. That includes such dangerous behavior as reading an integer from a file, casting it to a function pointer, and jumping to the result! Yes, under normal circumstances this is a dangerous security vulnerability – but for a dynamic linker it might be exactly what is required. A more mundane example is jumping to JIT-generated machine code. It also includes various types being treated as essentially an array of raw bytes or machine words – see many garbage collectors.
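
As a rough illustration of the last point (types treated as an array of raw bytes), here is a minimal sketch in Rust. The helper name as_bytes is hypothetical and not part of the standard library. It assumes the element type is u32, which has no padding bytes, so every byte of the view is initialized; it is not a claim about what the eventual memory model permits for arbitrary types.

```rust
use std::mem::size_of;
use std::slice;

/// Hypothetical helper: view a slice of u32 words as raw bytes without copying.
fn as_bytes(words: &[u32]) -> &[u8] {
    // SAFETY: u32 has no padding bytes, u8 has alignment 1, the length in
    // bytes is exactly words.len() * 4, and the returned slice borrows
    // `words`, so it cannot outlive the underlying data.
    unsafe { slice::from_raw_parts(words.as_ptr() as *const u8, words.len() * size_of::<u32>()) }
}

fn main() {
    let words = [0x0403_0201u32, 0x0807_0605];
    let bytes = as_bytes(&words);

    // Two 4-byte words give an 8-byte view.
    assert_eq!(bytes.len(), 8);
    // The byte order depends on the target, so compare against the safe
    // native-endian conversion instead of hard-coding the bytes.
    assert_eq!(&bytes[..4], &words[0].to_ne_bytes()[..]);
    assert_eq!(&bytes[4..], &words[1].to_ne_bytes()[..]);
}
```

Going the other way (bytes to a typed value) is harder, because alignment and validity of the target type come into play; that direction is where a garbage collector or allocator most needs the memory model to give a clear answer.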

#2

Kinda off-topic, but still: this is not true. Strict aliasing rules don't apply to char types, to signed x / unsigned x pairs, or in some other cases, so copying the buffer is not necessary.


#3

cc @pcwalton @ubsan