The idea of treating heap memory and stack memory differently w.r.t. UB is an interesting one. It goes completely against my intuition, though it took me a while to articulate why. I believe the reason stack and heap memory should be the same in this regard is that all memory should be interchangeable. I don't mean that the optimization discussed earlier (demoting mallocs) should be legal; I mean that I want to choose how I allocate memory solely based on considerations like allocation lifetime, performance, and availability (e.g., whether I even have a heap), not by language lawyering about UB.
Especially when I'm doing low-level, unsafe work where I treat something as a bag of bits and do dangerous things with it, I don't want to care where in memory those bits reside. I have enough trouble getting the actual bit bashing right; I don't want to track down a misoptimization caused by moving a temporary buffer from a `Vec` to a stack-allocated array (or the other way around).
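To make the concern concrete, here is a minimal sketch (all names are mine, not from the discussion): the same unsafe helper fed once with a heap buffer and once with a stack buffer. Under a rule that distinguishes the two, one of these calls could become UB even though `sum_words` itself never changes.

```rust
// Identical raw-pointer "bag of bits" code over two kinds of buffers.
unsafe fn sum_words(p: *const u8, len: usize) -> u64 {
    let mut sum = 0u64;
    let mut i = 0;
    while i + 8 <= len {
        // Unaligned word-sized read, typical low-level bit bashing.
        sum = sum.wrapping_add((p.add(i) as *const u64).read_unaligned());
        i += 8;
    }
    sum
}

fn over_heap() -> u64 {
    let buf = vec![0u8; 64]; // heap-allocated
    unsafe { sum_words(buf.as_ptr(), buf.len()) }
}

fn over_stack() -> u64 {
    let buf = [0u8; 64]; // stack-allocated
    unsafe { sum_words(buf.as_ptr(), buf.len()) }
}
```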
The virtue of treating heap and stack alike also shows elsewhere: your proposed `memcpy64` is only well-defined (under the proposed "reading uninit heap memory is okay" rule) if it copies from a heap allocation into another heap allocation. IIUC, the `inline(never)` is supposed to force the compiler to conservatively assume the heap-to-heap case, but that's not how this works. Language semantics must be defined independently of "what the compiler can see", and if you say "reading uninitialized bits from the stack is UB", then calling `memcpy64` with stack buffers as arguments is UB, period. (And this will inevitably be exploited by some compiler author trying to squeeze a few percent out of a benchmark suite.) So a rule that distinguishes stack and heap memory actually makes it harder to write a correct memcpy, because it's easy to write something that is only UB for stack memory (i.e., where the UB is even harder to observe).
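For concreteness, here is roughly the shape of routine I understand `memcpy64` to be (a sketch under my own assumptions, not your actual code): it copies in 8-byte words, so its final read may touch uninitialized bytes past `len`, which is exactly what the "reading uninit heap memory is okay" rule is meant to permit.

```rust
// Sketch only; assumes the source allocation is large enough that the
// final over-read stays in bounds of the allocation.
#[inline(never)]
unsafe fn memcpy64(dst: *mut u8, src: *const u8, len: usize) {
    let mut i = 0;
    while i < len {
        // May read up to 7 bytes past the initialized region of `src`.
        let word = (src.add(i) as *const u64).read_unaligned();
        // Only the requested bytes are written to `dst`.
        let n = core::cmp::min(8, len - i);
        core::ptr::copy_nonoverlapping(&word as *const u64 as *const u8, dst.add(i), n);
        i += 8;
    }
}
```

The point stands regardless of the exact body: whether a call to this is UB cannot depend on what the compiler can see through `#[inline(never)]`, only on where `dst` and `src` actually point.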
Of course, one could invent arbitrary semantics that are aimed at preventing certain reasoning steps by the compiler. For example, one solution would be to draw a line at function boundaries (perhaps only those that are `inline(never)`) and say that pointer arguments are to be treated "as if" they were referring to heap memory. Besides being very inelegant and throwing the whole object model into disarray, this will inevitably have fallout for the compiler's ability to analyze and optimize. So I am not a fan of this approach either (at least in principle; I won't rule out that there's some twist that is relatively simple and brings a large advantage).
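A sketch of what that boundary rule might look like in practice (entirely hypothetical, including the names):

```rust
use std::mem::MaybeUninit;

// Hypothetical boundary rule: because this function is #[inline(never)],
// its pointer argument is treated "as if" it referred to heap memory,
// so reading uninitialized bytes through it would be allowed...
#[inline(never)]
unsafe fn first_word(p: *const u8) -> u64 {
    (p as *const u64).read_unaligned()
}

fn caller() -> u64 {
    // ...even when the caller actually passes an uninitialized stack buffer.
    let buf: MaybeUninit<[u8; 64]> = MaybeUninit::uninit();
    // The fallout: the compiler must now be conservative about `buf` (and
    // any other escaped locals) across this call, instead of reasoning
    // about them as ordinary stack slots.
    unsafe { first_word(buf.as_ptr() as *const u8) }
}
```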
PS:
That can easily happen in overly general code spread over multiple functions. But aside from the specifics, ruling out optimizations just because good code supposedly shouldn't need them doesn't sound very appealing to me.