"What The Hardware Does" is not What Your Program Does: Uninitialized Memory

This post is about uninitialized memory, but also about the semantics of highly optimized “low-level” languages in general. I will try to convince you that reasoning by “what the hardware does” is inherently flawed when talking about languages such as Rust, C or C++.

https://www.ralfj.de/blog/2019/07/14/uninit.html (edit history)

30 Likes

Having learnt just enough VHDL to be dangerous, the idea that bits are always zero or one got pushed out of my head; in VHDL modelling of real hardware, a single hardware bit can take on one of nine possible values.

There’s the expected 0 and 1; but there’s also weak 0 and 1 (which can be overridden by a forcing signal to 0, 1 or unknown), weak unknown and strong unknown, high impedance, don’t care and uninitialised. Of these, the last state is a simulation-only state, but the 8 preceding states are all needed to represent what can happen in a real digital chip.
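For reference, those are the nine values of IEEE 1164's std_logic. A rough sketch as a Rust enum, purely for illustration (the Rust names are made up; the characters in the comments are the standard's notation):

```rust
/// The nine values a single std_logic bit can take in VHDL simulation.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum StdLogic {
    Uninitialized,  // 'U' (simulation-only)
    ForcingUnknown, // 'X' (strong unknown)
    Zero,           // '0' (forcing 0)
    One,            // '1' (forcing 1)
    HighImpedance,  // 'Z'
    WeakUnknown,    // 'W'
    WeakZero,       // 'L'
    WeakOne,        // 'H'
    DontCare,       // '-'
}
```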

8 Likes

Don’t mean to nitpick but I’m a little confused by your repeated use of the range 0…256 for values stored within a single byte. I thought it was just a mistake in one of your articles but from what I can tell you’ve written this in multiple articles and comments. So I can’t see it as a one-off typo but rather a systematic concept you hold.

Can you explain this further?

That was written as 0..256, which is Rust syntax for a range from 0 inclusive to 256 exclusive, written as [0, 256) in math.
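For instance, as a quick sanity check:

```rust
fn main() {
    // 0..256 contains exactly the 256 values a byte can take: 0 through 255;
    // 256 itself is excluded.
    assert_eq!((0..256).count(), 256);
    assert!((0..256).contains(&255));
    assert!(!(0..256).contains(&256));
}
```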

6 Likes

I have a question regarding MaybeUninit. In the very common case where one wants to create an uninitialized array as a buffer to accept data from e.g. Read, they may do this:

use std::mem::uninitialized; // the function under discussion

let mut buf: [u8; 4096] = unsafe { uninitialized() };
loop {
    let len = reader.read(&mut buf)?;
    if len == 0 {
        break;
    }
    handle(&buf[..len]);
}

IIUC, at the point where uninitialized returns, it’s already UB.

What is the right way to do this, with MaybeUninit, then? (Maybe the Read API would need some improvement?)

1 Like

This is currently unspecified. Different ways of specifying this behavior are being discussed in the issue on the validity invariant for integers and floating-point types: https://github.com/rust-lang/unsafe-code-guidelines/issues/71 One of the options being discussed is specifying this to be undefined behavior.

1 Like

Wow, I had no idea! I guess that's the analog real world shining through?

@matt1985 has answered your question; I've added a short explanation in the post as well since you are the second person to ask. I guess Rust's range syntax is less widely known than I had expected. :wink:

To expand on what @gnzlbg said, there currently is no "right way" to do this with the given Read API. This use-case is one reason why I am arguing that uninitialized integers should be considered "valid" in the sense of not immediately causing UB. (Performing any operation on them, though, is still insta-UB.)

Also note that this pattern is only legal if you know the code that gets called by reader.read, and audited it to make sure it only writes into the output buffer! To quote the read docs:

Correspondingly, however, callers of this method may not assume any guarantees about how the implementation uses buf. The trait is safe to implement, so it is possible that the code that's supposed to write to the buffer might also read from it. It is your responsibility to make sure that buf is initialized before calling read. Calling read with an uninitialized buf (of the kind one obtains via MaybeUninit<T>) is not safe, and can lead to undefined behavior.
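In the meantime, a minimal sketch of the boring workaround (the function and handle names below are made up): just zero-initialize the buffer, trading a one-time memset for fully defined behavior.

```rust
use std::io::Read;

// Hypothetical stand-in for whatever actually processes the bytes.
fn handle(_data: &[u8]) {}

fn read_all(reader: &mut impl Read) -> std::io::Result<()> {
    // Zero-initializing side-steps the question entirely: `buf` is an
    // ordinary, fully initialized [u8; 4096], at the cost of one memset.
    let mut buf = [0u8; 4096];
    loop {
        let len = reader.read(&mut buf)?;
        if len == 0 {
            break;
        }
        handle(&buf[..len]);
    }
    Ok(())
}
```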

2 Likes

Thanks for that blog post! It will definitely come in handy when teaching Rust and explaining what UB is.

I also have a humble request: could you add a short paragraph about the “freeze” idea? “Why don’t we just allow the abstract machine to read unspecified, but fixed bytes?” seems like an “obvious” solution for a layperson like me, and the reasons why it doesn’t work (it can leak passwords, it is not true on the OS level due to MADV_FREE, and it plainly does not exist in LLVM) are far less straightforward.

3 Likes

According to the hardware engineer helping me learn, yes, it is. In the sorts of designs I was learning about (mostly to ensure that I could debug a former coworker's FPGA enough to fix the interface to my microcontroller software), 8 of those 9 states exist because digital circuits are designed to have high impedance inputs driven by either "weak" (low maximum current) or "strong" (high maximum current) outputs. The transistors in digital circuits are also kept in saturation modes - they are either fully on or fully off, and the state in between is unknown, as it will behave like both 0 and 1 depending on the exact analog characteristics of the circuit at the moment you inspect it (e.g. tiny variations in temperature, supply voltage, EMI, quantum effects etc. can affect the behaviour of a circuit driven by an "unknown" input).

What you actually care about in the analog world is voltage levels and current flow, and these 8 states are just a usable summary of the ways in which transistors in a digital circuit set voltages and current flow at nodes.

FWIW, DRAM is also a lot more complex than most of us software guys realise; you read a bit from DRAM by draining the capacitor that implements that bit, and seeing if it had enough charge to qualify as a 1 or a 0. You then have to immediately refill the capacitor to avoid losing the data. There's a funky gotcha here, in that there is always a state between 0 and 1, which becomes one or the other value on read. DRAM thus effectively has the LLVM poison semantics; there is an underdetermined value which, on a read or refresh in a high quality implementation, will be frozen into a 0 or a 1 at random despite being written as a half-way value. But, if your DRAM implementation is low quality, the rewrite is done from the sense state, not the output state, so you end up with unknown (poison) persisting until the bit is written.

10 Likes

2 posts were split to a new topic: Range syntax is confusing

I have added the freeze idea but not in the way you asked.^^ I added it to mention a data structure that is incompatible with "unstable" uninitialized memory, and how freeze would fix that.

I would love to have a clear, concise answer to "why is it beneficial for optimizations to make the abstract machine state of uninitialized memory distinct from any initialized bit pattern", but I do not. I know that LLVM propagates undef/poison through loads of things and that would indeed be much harder if it had to make sure that there exists a single "consistent" frozen value for any uninitialized byte, but I have no idea how much performance that really buys us.

I am not entirely surprised. DRAM does not even satisfy the digital contract that literally everything above assumes.

2 Likes

I thought you disliked that essay :slight_smile:

That's all well and good, and I have more sympathy for this view than some on this forum, but given that for Rust no such abstract machine has been formally defined yet, platform details and potential optimisations are the only thing that people can realistically talk about now. And let's face it, most Rust code is going to be compiled by naïve translation to imperative code later transformed by optimisation passes, so the whole 'abstract machine' business is in practice a fancy specification of invariants that all optimisations must uphold.

So I think it'd be nice for a change to have a discussion about what kinds of optimisations we want to support, so that we can finally come up with some sensible semantics. In particular, I'd like a more thorough explanation of the x < 150 || x > 120 example, because I find the concept of a value that defies universal quantifiers quite disturbing. I don't see a reason why I shouldn't consider this example an outright compiler bug. What is the specific sequence of optimisation passes that produces such code? How are their transformations justified?
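For context, the example from the post is roughly this:

```rust
// With optimizations enabled, this assertion can fail: both comparisons get
// folded to `false`, even though any actual u8 satisfies at least one of them.
fn always_returns_true(x: u8) -> bool {
    x < 150 || x > 120
}

fn main() {
    let x: u8 = unsafe { std::mem::uninitialized() };
    assert!(always_returns_true(x));
}
```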

(By the way, in the C++ example, if I replace a heap allocation with a stack allocation — which is uninitialized all the same — the function is optimised to always return true instead. This only serves to strengthen my 'compiler bug' intuition.)

And one more thing. If we want to discourage talking about target platform details in favour of abstract machine semantics, then the same thing should apply to LLVM intrinsics. Not only because it ties Rust to a particular compiler implementation, which isn't all that rigorously specified anyway, but also for the much more down-to-earth reason that relatively few people are familiar enough with LLVM internals to understand it. When someone tells me 'mem::uninitialized is just undef', I have no idea what is being said. And given the headaches arising from conflicting optimisations mentioned in the post, LLVM people don't necessarily have the best idea either.

The point is that x is not "a (particular) random bit pattern"; and in particular it doesn't need to behave consistently or as a constant value across uses. Therefore, it can yield false on both sides of the logical OR: first behaving as if it were a value of at least 150, then subsequently behaving as if it were a value of at most 120. In other words, the compiler is allowed to, at its convenience, decide to optimize a comparison expression involving an uninitialized value into a constant false, separately, in both subexpressions.

It is still definitely disturbing.

In C and C++, the answer is "because the standard says so". AFAICT there's no Rust standard, so I'm not sure what the correct answer in Rust is, but maybe we should stick to the definitions and allowed assumptions listed in the Nomicon.

1 Like

Forget about standards for a moment. This assembly is generated by a specific, deterministic (I assume) compiler consisting, like I said, of a frontend that generates naïve (but annotated) imperative code that does the 'obvious' thing, and optimisation passes that transform that code. The optimisation passes have to justify their transformations, i.e. prove that they maintain semantics of all code that was not UB in the first place. I want something of the form 'if this comparison returned true, that would imply condition X, which is UB and therefore cannot happen, so we can assume it returns false and fold it to a constant'; in other words, something like this. Then we can perhaps consider if this optimisation is worth the trade-offs and whether to adopt semantics that allow it, and to what extent. Mere 'because the standard says so' is not nearly enough. Why does the standard say so? Where? Does it actually? Maybe in fact the optimisation is wrong, and only happens in this case to produce code that is technically compliant.

1 Like
  1. The compiler assumes that UB never happens in your program;
  2. If x is uninitialized, then (the standard states that) it is UB to evaluate x < 150.

Given the two statements above, if the compiler can prove that x is uninitialized, it concludes that your program will not attempt to evaluate x < 150. “Will not attempt to evaluate” means that the code that performs this evaluation is unreachable, and thus can be eliminated.
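Spelled out in Rust (a minimal, hypothetical sketch of that chain of reasoning, using the same mem::uninitialized pattern as the post):

```rust
fn main() {
    // The step that produces the indeterminate value.
    let x: u8 = unsafe { std::mem::uninitialized() };

    // Because evaluating `x < 150` on an uninitialized value is UB, the
    // optimizer may assume this comparison is never meaningfully executed:
    // it can fold it to an arbitrary constant or treat the branch as dead.
    if x < 150 {
        println!("x is small");
    }
}
```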

1 Like

@matklad makes a very good point about how MADV_FREE can introduce non-determinism even at runtime. From the man page:

madvise(addr, len, MADV_FREE)

The application no longer requires the pages in the range specified by addr and len. The kernel can thus free these pages, but the freeing could be delayed until memory pressure occurs. For each of the pages that has been marked to be freed but has not yet been freed, the free operation will be canceled if the caller writes into the page. After a successful MADV_FREE operation, any stale data (i.e., dirty, unwritten pages) will be lost when the kernel frees the pages. However, subsequent writes to pages in the range will succeed and then kernel cannot free those dirtied pages, so that the caller can always see just written data. If there is no subsequent write, the kernel can free the pages at any time. Once pages in the range have been freed, the caller will see zero-fill-on-demand pages upon subsequent page references.

Since this is used by jemalloc internally, this means that uninitialized memory returned by the allocator will non-deterministically contain either zeroes or data from previously freed memory, and the OS could decide to change this at any point. A freeze operation as suggested in @RalfJung's post would have to read and write back 1 byte in each page of the frozen data to ensure that the OS doesn't change the data behind our backs.
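A rough sketch of what that would look like (hypothetical; it assumes 4 KiB pages and glosses over the fact that, at the abstract-machine level, reading the byte is exactly the operation whose semantics are in question):

```rust
use std::ptr;

/// Touch one byte per page so the kernel can no longer swap MADV_FREE'd pages
/// for zero-fill-on-demand pages underneath us. This only addresses the
/// OS-level non-determinism; it does nothing about LLVM-level undef, which is
/// the harder part of a hypothetical `freeze`.
unsafe fn freeze_pages(buf: *mut u8, len: usize) {
    const PAGE_SIZE: usize = 4096; // assumed; real code would query the OS
    let mut offset = 0;
    while offset < len {
        let p = buf.add(offset);
        // Writing the byte back dirties the page, which cancels any pending
        // MADV_FREE for it.
        let byte = ptr::read_volatile(p);
        ptr::write_volatile(p, byte);
        offset += PAGE_SIZE;
    }
}
```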

2 Likes

What happens in the Rust example is:

  1. The LLVM IR generated for mem::uninitialized() performs an alloca and then loads from it without storing a value. You can see this by building without optimizations.
  2. LLVM simplifies this to undef.
  3. always_returns_true is inlined into its caller, so the IR becomes the equivalent of undef < 150 || undef > 120.
  4. undef < 150 is replaced with false, an arbitrary boolean value (though interestingly, not undef itself). undef > 120 is also replaced with false. (To see what's going on in the playground, choose the "LLVM IR" output option.)
  5. false || false becomes false.

This is all on LLVM's end, so you will get the exact same result from Clang if you manage to get Clang to produce the same IR – but trivial differences in what IR the frontend generates, or a different LLVM version, may cause different results. After all, this specific outcome depends on things like the order of optimization passes (e.g. always_returns_true could have been optimized to true before inlining), and LLVM's arbitrary choice of false as the result of comparisons with undef (true would have worked just as well).

As for how it's justified –

For a C or C++ equivalent, the standard provisions that would apply are quite complicated and have changed over time, but it boils down to "executing the comparison is undefined behavior, therefore the program can do anything from that point on".

For Rust, as has been mentioned, there is not yet an official definition of what operations produce undefined behavior, but when such a definition is created it will likely follow similar principles. It sort of has to, because Rust depends on LLVM's optimizer which has those principles baked into it.

13 Likes

What I’d like to know is: why isn’t undef < 150 replaced with a trap instruction?

(I understand why the spec doesn’t mandate this. As far as the spec is concerned, undef when the compiler knows that you’re reading uninitialized memory is the same as the undefined value that results when you read a memory cell at the same time as another thread writes to it, even though in the latter case the compiler doesn’t know you’ve created an undefined value. I’m curious why, if the compiler knows that the value is undef, it doesn’t throw in a trap instruction when you branch on the result of comparing with it.)

2 Likes

I was assuming the question was "why this shouldn't be considered a compiler bug". The — arguably — surprising behavior of the compiler is not a bug if and only if it is permitted by the specification of the language. No more, no less. Why the specification is what it is is a different, equally valid question.

Alright, but reading an uninitialized value is UB, isn't that the whole point? An optimizing compiler at this point goes like, "hm, this guy is reading the value of an uninitialized variable, so I have the license to evaluate the expression as anything I please".

C99, §6.7.8.10:

If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.

§3.17.2:

indeterminate value

either an unspecified value or a trap representation

§3.1.1:

access

〈execution-time action〉 to read or modify the value of an object

§6.2.6.1.5

Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined.41) Such a representation is called a trap representation.

And if this weren't enough, it specifically says in Annex J.2:

J.2 Undefined behavior

[…]

— The value of an object with automatic storage duration is used while it is indeterminate (6.2.4, 6.7.8, 6.8).

[…]

Unfortunately, I don't have a copy of any recent C++ standard at hand.

2 Likes

Slotted into the existing framework of undef (of course one could side-step the whole matter by eradicating the notion of undef, with more far-reaching consequences), such trapping would mean the optimizer would have to tread extraordinarily carefully around optimizing comparisons [1]:

  • For most operations, it is infeasible to prove they don't ever get undef inputs.
  • Even if a given optimization pass can't prove the operand is undef, that doesn't mean a later pass won't be able to prove it.
  • In consequence, any comparison (without proof that neither operand could ever be undef) has to be treated as potentially trapping and thus as having side effects.

This restricts a variety of useful optimizations (even and especially those that don't need UB to be justified). The biggest issue is that causing a comparison to be executed at all when it previously wouldn't have been becomes off-limits, or predicated on the incredibly difficult task of proving that the inputs can't ever be undef. This affects a bunch of important (even classical textbook) optimizations, so scrapping the concept of undef and all the optimizations it enables entirely looks pretty appealing by comparison.
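For a concrete (made-up) illustration of the kind of transformation this would block:

```rust
// Loop-invariant code motion wants to hoist `threshold < 150` out of the
// loop, which means executing the comparison even when `data` is empty and
// the original program never evaluated it. If comparisons trapped on undef,
// that hoist would first require proving that the value feeding `threshold`
// can never be undef, which the optimizer generally cannot do.
fn count_small(data: &[u8], threshold: u8) -> usize {
    let mut n = 0;
    for &x in data {
        if threshold < 150 && x < threshold {
            n += 1;
        }
    }
    n
}
```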

In fact, as you can read about in section 2 (particularly 2.2 and 2.3) of Taming Undefined Behavior in LLVM, enabling "speculative" execution of UB-ful operations is (one of the) main motivation(s) for having these weird values in the first place.

[1] talking only about comparisons because that was your example; the same applies to most other arithmetic and logical operations if they'd similarly trap. I'll also only talk about undef although poison is essentially the same for the issues at hand.

7 Likes