"What The Hardware Does" is not What Your Program Does: Uninitialized Memory

Oh, right, duh. That makes sense.

2 Likes

There’s a small note in https://llvm.org/docs/LangRef.html#undefined-values:

In fact, %A and %C need to have the same semantics or the core LLVM “replace all uses with” concept would not hold.

So as I understand it, it’s less that this is an explicit choice and more that it’s a consequence of being able to do constant propagation.

2 Likes

Heh, my own words used against me. :wink: Well played!

I am not a fan of the indiscriminate use of the “anything is possible” shirt/slogan because it papers over why UB exists and why “anything is possible”. What do mysterious unicorns have to do with our compilers? I mean, the compiler won’t just generate unicorns into your program.^^

I am a fan of that article though.

It does not defy universal quantifiers. You just quantified over the wrong domain if you only considered 0..256.

Of course, that is surprising. But a “naive memory”, where every location just stores a value in 0..256, is IMO not achievable for a language that wants to be even remotely competitive in terms of performance. The post here argues this for uninitialized memory; Stacked Borrows needs to track pointer provenance which also is incompatible with a naive memory.

To recover naive memory, we would have to sacrifice all potential reference-based optimizations (say goodbye to Stacked Borrows) as well as all the already existing optimizations that are based on “pointer arithmetic cannot cross object boundaries”, as well as all the undef and poison-based optimizations LLVM does. I am content with assuming that that is prohibitively expensive even without having done the experiment.
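To make the distinction concrete, here is a minimal sketch (type and function names are illustrative, not from any real compiler) of a non-naive memory, where each location holds either a concrete byte or “uninitialized”:

```rust
// A "non-naive" memory byte: either a concrete value or uninitialized.
#[derive(Clone, Copy, PartialEq, Debug)]
enum AbstractByte {
    Init(u8),
    Uninit,
}

// Loading an uninitialized byte is where the abstract machine declares UB;
// here we model that as an error instead of some value in 0..256.
fn load(mem: &[AbstractByte], addr: usize) -> Result<u8, &'static str> {
    match mem[addr] {
        AbstractByte::Init(v) => Ok(v),
        AbstractByte::Uninit => Err("UB: load of uninitialized memory"),
    }
}
```

In a naive memory, every load would succeed with some value in 0..256; tracking `Uninit` as a distinct state is what lets the abstract machine declare such a load UB, which in turn is what licenses the optimizations discussed here.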

I don’t disagree. However, the optimizations do not exist either. You said “potential optimizations”, why is that any better than a “potential abstract machine”?

For the longest time, my impression has been that the optimizations came first and then someone tried to reverse-engineer an abstract machine for them. That does not work very well. I propose that this time we let the abstract machine come first.

Of course, this should be guided by desired optimizations, but the abstract machine defines which one of these we can really get.

Absolutely, that is an express goal of the UCG WG.

Fully agreed. All intrinsics should have a specification in terms of the Rust abstract machine.

Btw, mem::uninitialized is not implemented with an intrinsic any more.


Oh, so the following can actually happen when using jemalloc:

  • Allocate memory, deallocate it. MADV_FREE gets called.
  • Newly allocated memory reuses that storage. The new memory gets read.
  • The kernel decides to free the page.
  • Memory gets read again, giving a different result.
  • Memory gets written, now allocating a page again and persisting the result.

Is that correct?


Actually, an optimizing compiler will go “this guy is surely not reading the value of an uninitialized variable, so I will only consider the cases where that does not happen when checking if an optimization applies”. The compiler does not actively search for UB; it assumes it not to happen. The difference is that the compiler generally does not know if UB occurred, so it cannot warn you about it.

Not having to pick a consistent value for all uses of undef certainly helps. Indeed, if creating memory were non-deterministic, that would make it effectful (non-determinism is a side-effect), and constant propagation can only propagate “pure” computations. This is why freeze and constant propagation do not interact very well.
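In miniature, the tension between freeze and constant propagation can be sketched like this (a toy model, not LLVM’s actual implementation): each evaluation of `freeze(undef)` may pick a different value, so the expression is not pure and two textually identical occurrences are not interchangeable:

```rust
// Toy expression language: constants vs. freeze(undef).
#[derive(Clone, Copy)]
enum Expr {
    Const(i64),
    FreezeUndef,
}

// Evaluating a constant is pure: the result never depends on when or how
// often you evaluate it. Evaluating freeze(undef) picks *some* value
// (modeled here by a counter), so re-evaluation is not sound.
fn eval(e: Expr, fresh: &mut i64) -> i64 {
    match e {
        Expr::Const(c) => c,
        Expr::FreezeUndef => {
            *fresh += 1;
            *fresh
        }
    }
}
```

A constant propagator may duplicate or re-evaluate `Const` freely, but it must not do the same with `FreezeUndef`, which is exactly why freeze blocks that optimization.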

5 Likes

Definitely. Although I think in this particular case it’s way easier to justify the false with the behavior I described. Otherwise the “sensible” optimization would be to assume that x < 150 and x > 120 can’t both be false at the same time and thus collapse the logical OR expression into true.

I disagree. It is much simpler to do a pass over the entire IR and fold undef < 150 to false (this is a “peephole optimization”, very cheap to apply) than to do a range analysis that notices that this disjunction of two comparisons will always be true. Hence I am not surprised that the peephole optimization wins here.

But anyway, we are discussing why LLVM makes certain optimization choices, but that’s off-topic for my post. I think we agree that what LLVM does is legal according to the spec, and that making it illegal would put an undue burden onto the optimizer? And explaining why it is legal requires talking about the abstract machine and considering “weird” memory that tracks initialization.
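For illustration, the peephole fold in question can be sketched as a toy constant folder (hypothetical types, mirroring LLVM folding `undef < 150` to false):

```rust
// Toy IR values: either a known constant or undef.
#[derive(Clone, Copy)]
enum Val {
    Const(i64),
    Undef,
}

// Fold "a < b" at compile time. Undef may take any value, so the
// optimizer is free to pick whichever result is convenient for each
// use; this sketch, like LLVM, picks false.
fn fold_lt(a: Val, b: Val) -> bool {
    match (a, b) {
        (Val::Const(x), Val::Const(y)) => x < y,
        _ => false,
    }
}
```

Note that a second fold of `undef > 120` may independently pick false as well, which is how `x < 150 || x > 120` can evaluate to false without any range analysis.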

8 Likes

Pretty much. This is essentially a data race, which you can somewhat get around by using volatile loads. However the “cute data structure” will definitely not work in such conditions.

1 Like

Wait, I didn’t say this was what actually happened :slight_smile: Only that it’d be easier to explain. At this point though, it doesn’t really matter much.

Yes, absolutely.

On Reddit someone linked to this Facebook presentation where they talk about actually running into this MADV_FREE issue in practice.

Wow. And here I was thinking I finally found a case where you need to consider the abstract machine as there is no other direct way to explain what is going on.^^

5 Likes

One thing I do not understand about that Facebook presentation is that they say, when the “removed” page is written to, “it is restored with its old content”. So they describe behavior that’s different from what I described above: they first read a 0 and later a non-0 value. How can that be, given that new pages are 0-initialized? Where is that “old content” even being preserved, and why?

I think what happens is that after MADV_FREE, the page table points that range to a shared read-only page of zeros. When written, the page fault handler allocates a distinct page of physical memory again, which happens to be the one it had before (or one that was never truly freed). That physical memory was never cleared.

Picture this allocation setup:

| … | the string || \x?? | … |

Then jemalloc frees the second page with madvise(.., MADV_FREE). Now it appears to the program as zeroed, in that it’s unmapped from the translation table and reads are serviced by the kernel as returning zero, but it has not been physically zeroed. Or as the docs put it:

Once pages in the range have been freed, the caller will see [emph mine] zero-fill-on-demand pages upon subsequent page references.

I’m not totally sure here; the docs seem slightly ambiguous. They reference “pages in the range” without further qualifier. I take it to mean: once any page has been freed, the whole range of pages appears zeroed (even though other pages in the range have not been freed). If it does not mean this, then I do not understand that bit of the video either. So the program might observe

| … | the string || \x00 | … |

But the documentation of madvise also states:

The kernel can thus free these pages, but the freeing could be delayed until memory pressure occurs. For each of the pages that has been marked to be freed but has not yet been freed, the free operation will be canceled if the caller writes into the page.

If jemalloc reuses the page before it has actually been freed, and writes new data to some other part of it, it seems that it is never physically zeroed and still contains its old data. We are back at:

| … | the string || \x?? | … |
1 Like

@HeroicKatora thanks for the detailed explanation! That makes sense.

I was not aware that madvise would immediately replace the pages by zeroes, and then just delay the “behind-the-back deallocation” of the physical page. I thought the page would remain mapped in its old state until the time the kernel would reclaim it. There are probably good reasons though for what is actually happening, and it explains what the Facebook string team has been seeing.

Actually, the reason Facebook encountered this bug is that they use MAP_UNINITIALIZED. This allows the kernel to reuse a previously freed page without zeroing it.

1 Like

And also the kernel will only actually allocate the page once it gets written to? Is MAP_UNINITIALIZED on its own enough to cause “unstable” memory?

My understanding is that the kernel will map the zero page read-only while you’re only reading the data, but will allocate a new page once you try to write to it. If MAP_UNINITIALIZED is used, this new page will not be zeroed.

So yes, I think MAP_UNINITIALIZED on its own is enough to trigger Facebook’s string null terminator bug. However, this requires a custom kernel patch that is not in the mainline kernel.

MADV_FREE is a different issue where a non-zero value can turn into a zero because the kernel reclaimed the page.

This article is helpful for setting a frame of reference for discussing uninitialized memory. I would like a small clarification on the following statement, though:

Ruling out any operation on uninitialized values also makes it impossible to implement this cute data structure. The is-member function there relies on the assumption that “observing” an uninitialized value ( sparse[i] ) twice gives the same result, which as we have seen above is not the case.

According to the diagrams described in the link, the is-member function will operate correctly when sparse[i] provides inconsistent values for each read; the value is just Some(u8).

I think the argument being made is in terms of how the abstract machine is expected to behave when sparse[i] is uninitialized, providing None. E.g. it’s allowed to cause is-member to return true or false for an uninitialized sparse[i].

But what I read feels like it directly contradicts the algorithm provided in the link. My confusion comes from the interpretation that the quoted statement is talking about Some(u8) and not None, and that’s only because it talks about values changing between successive observations.

I do not think that is the case. Consider return sparse[i] < n && dense[sparse[i]] == i. Even if the first value is less than n, the value read the second time could be bigger, and then you have an out-of-bounds read.

The code is fine with the value changing between multiple invocations of is-member, but causes UB when it changes within is-member. Or did I miss something?

I am sorry I do not understand the point you are trying to make in the rest of your post. Why the emphasis on “value”?
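For concreteness, here is a sketch of the linked sparse-set data structure (names illustrative). To keep it free of UB, this version uses zero-initialized memory, but the two reads of sparse[v] in is-member are exactly the ones that must agree:

```rust
// Sparse-set sketch: sparse[v] is only meaningful if v is a member;
// dense[0..n] lists the members in insertion order.
struct SparseSet {
    sparse: Vec<usize>,
    dense: Vec<usize>,
    n: usize,
}

impl SparseSet {
    // Zero-initialize instead of leaving the arrays uninitialized,
    // so this example stays UB-free; the original trick skips this.
    fn new(capacity: usize) -> Self {
        SparseSet { sparse: vec![0; capacity], dense: vec![0; capacity], n: 0 }
    }

    fn insert(&mut self, v: usize) {
        if !self.is_member(v) {
            self.sparse[v] = self.n;
            self.dense[self.n] = v;
            self.n += 1;
        }
    }

    // Mirrors `return sparse[i] < n && dense[sparse[i]] == i`.
    // sparse[v] is read twice: if uninitialized memory could return
    // different values for the two reads, the first could pass the
    // bounds check while the second indexes dense out of bounds
    // (a panic in Rust; in C/C++, UB).
    fn is_member(&self, v: usize) -> bool {
        self.sparse[v] < self.n && self.dense[self.sparse[v]] == v
    }
}
```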

1 Like

Thank you! That’s exactly the point I was missing.

This raises the question: assuming sparse[i] is assigned to a variable and that variable is used in both places, it would still have undefined behavior due to the potential out-of-bounds access. (In Rust, a panic; in C/C++, anything can happen, like a page fault.)

So suppose you also bounds-check i before using it … Does is-member still have undefined behavior, even though it only reads uninitialized memory inside the array bounds? That’s the point I’m trying to make by emphasizing values.

That makes no difference! The variable will just have "value" 0xUUUUUUUU, meaning it will consist of 4 uninitialized bytes (assuming these are u32). Using that variable twice can still give you two different results.

Uninitialized memory does not get "frozen" just because you put it in a variable. If you do something like let x = *y;, then the compiler is allowed to replace every use of x by *y if it can show that there was no write overlapping with that memory; for this optimization to be correct, the act of loading memory into a variable cannot by itself be very meaningful.
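As a side note, the UB-free way to hold possibly uninitialized data in current Rust is to keep it behind MaybeUninit and only call assume_init once everything has been written. A minimal illustrative sketch:

```rust
use std::mem::MaybeUninit;

// Keep the "maybe uninitialized" state in the type rather than
// hoping a plain load will freeze the bytes.
fn demo() -> u32 {
    let mut x: MaybeUninit<u32> = MaybeUninit::uninit();
    // Copying or comparing x's contents before this write would be
    // exactly the kind of operation the abstract machine rules out.
    x.write(42);
    // Sound only because every byte of x has now been initialized.
    unsafe { x.assume_init() }
}
```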

Interesting! The abstract machine is more extreme than I imagined. But still enlightening. I am starting to better understand the implications, now.

1 Like