Writing down binary data... with padding bytes

I am puzzled by this assertion. The PLDI'17 paper proposed that, but you specifically talk about LLVM in the present tense, and I quoted the part of the language reference which says otherwise. I could be persuaded that this clause does not apply to write specifically (e.g., maybe you want to argue that an opaque call to write does not depend on the bytes in the buffer), but it seems plain as day to me that there is UB from poison beyond branching/selecting on it. For example, store volatile i8 poison, i8* %p is UB since:

  • "Values other than phi nodes depend on their operands", so the store instruction depends on poison.
  • Volatile accesses have side effects.
  • As quoted before, "undefined behavior occurs if a side effect depends on poison".
3 Likes

You yourself assumed LLVM-without-undef -- as in, LLVM with (some variant of) the PLDI'17 paper implemented -- in your post. I just followed suit. Maybe I should have been more clear.

I forgot one case: the address of a load/store must also not be poison, or else we have UB.

I see. I agree the most likely interpretation of the lang ref implies that this is UB. On the other hand, I do not know of anything LLVM would or could reasonably do that would rely on this, nor does it seem to make any sense to call this UB. So given that the lang ref is generally pretty bad about documenting UB, I wouldn't put too much weight on it here, either.

We were talking about write though, not volatile stores. Given that I can memcpy poison, I'd be very surprised if I couldn't also write it. Sure, it becomes a bit annoying to have to talk about this in the spec and to have to define what poison "looks like" when observed by the target environment -- but specifying that seems much, much easier than many other aspects of this spec, and we need to talk anyway about how to translate "Rust Abstract Machine memory" to "target memory" because of things like pointer provenance.

5 Likes

The current LangRef subsumes this under "operations which have UB on any input" (which also includes division and remainder, for example). That seems sensible both for lowering to machine code and for IR-to-IR transformations:

  • immediate UB (as opposed to producing undef or poison) often exists because on some machines certain operations on certain inputs can screw up the underlying machine state (e.g., by trapping), which precludes the speculative execution that undef and poison enable
  • replacements like poison -> freeze poison -> (any specific constant) should be sound, and clearly can trigger any IR-level UB that happens for some concrete choice of inputs

I agree that LangRef is often sloppy about these matters, but making side effects that depend on poison UB seems sensible to me. Side effects such as writing a value out to a file bring back (to a lesser extent) the "different uses see different values" semantics which play a part in making undef annoying to work with. For example, if %x in the following snippet turns out to be poison, it is possible under reasonable lowerings to machine code that the two putchar calls output different things.

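  ; if %x is poison, the two calls below need not print the same byte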
  call i32 @putchar(i32 %x)
  call i32 @putchar(i32 %x)

I don't think this would be inconsistent (with a suitable definition of what poison "looks like" when observed for these purposes) or break any middle-end optimizations, but when I work on an optimization pass I'd much rather not have to worry about such cases, and declare them UB instead. That allows me to think about poison values solely as intermediate values internal to the abstract machine. Note that memcpy is not a side effect and not externally visible, so these questions don't arise.

IMO the onus is on you to argue why we should bother making it defined, when programs where it matters could just freeze values and memory selectively. The more usual case is that a program has UB from the start and some of that UB is converted to "deferred UB" in the form of poison.

2 Likes

This sounds like a ridiculous thing for a compiler to do, but they do do it. It's called "rematerialization" in the literature. Suppose, right before register allocation, you have a sequence of IR operations like this:

load.w v1, (v0)
test v1
beq .early_exit
# ... many ops here, requiring many scratch registers ...
# ... v0 is used but v1 isn't ...
move arg0, v1
call somefunction

When there aren't enough hardware registers to implement the "many ops here" sequence while keeping both v0 and v1 around in registers, the register allocator may choose to discard v1 and re-load it from (v0) right before it's needed again, instead of spilling it to the stack and re-loading it from there. This makes the stack frame smaller, which is often considered valuable. And it's a correct transformation as long as nothing writes to the memory location (v0) in between the two loads...

6 Likes

I agree that they can output different things, but I do not agree that this brings back the bad parts of undef. This just means that things implicitly get frozen each time they are passed to the outside world.

While writing padding bytes to a file is probably a bad idea in many cases, at least for languages like LLVM I don't think they should be opinionated like that. There is no benefit that I know of that LLVM can glean from ruling out poison in data passed to a syscall, and one can imagine use-cases (say in controlled environments) where this is not an unreasonable thing to do.

Oh they totally are, we just freeze everything that leaves the abstract machine. That's what I meant above when I referred to "translating" Abstract Machine memory to whatever the syscall/FFI sees. We need such a translation anyway to handle pointer provenance, so it's not like ruling out poison here means we can pretend the outside world works directly with Abstract Machine memory.

I cannot imagine a situation in which this makes any optimization more complicated: you have to handle unknown function calls to other LLVM modules, and those certainly can pass poison around. If you handle that correctly, you will automatically handle syscalls correctly, too.

IMO the onus for UB is basically always on whoever proposes the UB. :wink: There's no question of "how do we make it defined"; to make it UB we have to add an extra clause in our list of things that are UB -- and that clause needs justification. If there's no good reason for UB, we shouldn't have it. I think I refuted the only argument you gave (one of simplicity).

I'd like to put the emphasis on sounds here. IMO this is a completely reasonable thing for compilers to do in certain circumstances, and you mentioned one. (I think that was your point, but the first sentence sounds like the opposite.)

5 Likes

The "implicitly get frozen each time" way of putting it is exactly why I worry. Duplicating a freeze instruction is unsound, so does that mean that duplicating instructions with side effects has the same issues? I can sort of see an argument that these "implicit freezes" have only one "use" each (the side effect itself, which must not be duplicated in any program trace even if it's syntactically duplicated by e.g. sinking a call into conditionals) so it should be fine.

But this seems to depend on what side effects are, formally, and whether they are at all malleable under optimizations. For example, C compilers sometimes replace printf("...\n") with puts("...") and puts calls with putchar calls, and depending on how you define the interface between the abstract machine and the real world and where you draw the line (e.g., do you peek into the libc?), these three calls can perform "different" sequences of side effects. Normally, care is taken that such changes don't affect the observable behavior, e.g. by reasoning that side effect A + side effect B is the same as side effect C. But if each individual side effect step freezes poison anew, that reasoning may not hold any more, and previously justified optimizations may introduce more observable behaviors. I'm not saying the specific case of printf/puts/putchar is a miscompile waiting to happen (clearly, we don't see such issues in practice with undef), but I hope you can see why I think the argument is more subtle than you claim.

2 Likes

...the worries about the formal semantics of write (the syscall wrapper) can be alleviated by writing it in ASM, right? Less radically, one could disassemble the compiled libc to make sure there is currently no UB, hope for that not to change in the future, and rewrite it in ASM if an issue is discovered?

You could still declare selected functions not-UB by your own decree in the language/library spec, and then undertake to keep it that way, whatever LLVM does?

Well, you obviously cannot duplicate instructions with side effects, as then you'd run the side effect twice.

It seems rather obvious to me that this transformation is also still correct with implicit freeze. Notice that duplicating a freeze whose result is unused is entirely sound.

It is slightly more subtle than I expected, true. But it still seems overall mostly straightforward to me, and not nearly as subtle as some of the other things we'll have to tackle, such as how pointer provenance interacts with transmutes. :wink:

No. Then we'll have to give formal semantics to the interaction of a Rust program with an ASM program, which is strictly harder.

But the frozen value is not unused; it's visible in the program output. So, as a clear-cut but unrealistic example:

  • a call to a function that outputs the same byte twice -- freezing it implicitly, once -- will always output the same value from 0..=255 twice
  • two calls to a function that outputs a single byte -- freezing it implicitly each time -- may output two different values
  • this makes output_twice(x) -> output_once(x); output_once(x); an incorrect transformation, even though it would be sound without poison or implicit freezing

The only reason I can't construct a semi-realistic example of this from a printf -> putchar transformation is that standard C's printf syntax doesn't let you refer to the same argument multiple times.

That would just be loop unrolling, right?

use libc::{c_char, c_int};

fn unoptimized(maybe_poison: c_char) {
    for _ in 0..2 {
        // libc::putchar takes a c_int and is unsafe to call
        unsafe { libc::putchar(maybe_poison as c_int) };
    }
}

fn unrolled(maybe_poison: c_char) {
    unsafe { libc::putchar(maybe_poison as c_int) };
    unsafe { libc::putchar(maybe_poison as c_int) };
}

That said, I think the original also freezes the maybe_poison twice, so this would be a valid transformation. It's key to note that the implicit freezing happens at the point of IO, so IIUC there is no way to observe the implicitly frozen value except by writing it to optimization-boundary IO and then reading it again. (Which, if I'm not mistaken, is approximately how freeze would be defined/implemented in the face of e.g. MADV_FREE (modulo optimization).)

No, loop unrolling is fine for the reasons you described. What I'm talking about is optimizations that replace a (sequence of) side effects with another sequence that would be equivalent, but isn't if poison is implicitly frozen for each individual side effect. I actually realized now that there are even existing syscalls that semi-realistically might exhibit this: vectored I/O, or gather/scatter I/O.

For example, this Linux program snippet (which surely has mistakes since I haven't tried it):

#include <sys/uio.h>   /* struct iovec, writev */
#include <unistd.h>    /* STDOUT_FILENO */

struct iovec bufs[2] = { { &c, 1 }, { &c, 1 } };
writev(STDOUT_FILENO, bufs, 2);

... makes a single system call to write the same character (from the same memory location, even) twice. It seems valid for a compiler to replace that with:

char buf[2] = { c, c };
write(STDOUT_FILENO, buf, 2);

which is really all-around better: it makes the same number of syscalls, uses a simpler and presumably faster syscall, and even consumes less stack space. But if the semantics of syscalls are to freeze memory, then this can e.g. output ab (because there are two bytes being frozen) while the first version can only output aa, bb, etc.

I don't claim this is an optimization that any compiler actually implements, but it seems like a reasonable optimization to me (at least, no less reasonable than other optimizations of standardized I/O functions) and I believe it is only correct if side effects depending on poison are UB.

2 Likes

I think you can still argue that that optimization is correct in the face of no-UB poison IO.

If c is poison, then you have a guarantee that two instances of &c point to the same memory location, but not that they point to the same value! In effect, because you pass &c to writev twice, it observes the poison two separate times, thus reads two unconnected indeterminate values.

1 Like

If the function is safe as written, it would be very sweet to have such a fact codified in the standard library.

This depends on how exactly you formalize this implicit freezing, so once again, note that all of this is awfully subtle.

My working assumption was that one would map the abstract machine heap to a huge byte array by picking concrete addresses for pointer values and for allocations whose address has not yet been fixed, freezing all poison, and filling non-allocated address ranges arbitrarily. Then clearly both iovecs would point at some concrete frozen byte. I don't really see (yet) what else one could do to get the behavior you propose.
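
To make that working assumption concrete, here is a minimal Rust sketch of the translation (all names are hypothetical, and the nondeterministic choice is modeled as an explicit callback):

// Toy model: mapping Abstract Machine bytes to target memory,
// freezing all poison once at the boundary.
#[derive(Clone, Copy)]
enum AmByte {
    Concrete(u8),
    Poison,
}

// `arbitrary` stands in for the nondeterministic choice the semantics
// makes; any byte is a valid outcome, but the choice is made only once
// per AM byte, when the heap is lowered.
fn lower_heap(heap: &[AmByte], mut arbitrary: impl FnMut() -> u8) -> Vec<u8> {
    heap.iter()
        .map(|&b| match b {
            AmByte::Concrete(v) => v,
            AmByte::Poison => arbitrary(),
        })
        .collect()
}

Under this model, both iovecs in the writev example alias the same already-frozen byte, so it can only output aa, bb, etc.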

I stand corrected

...wasn't poison introduced for cases like

  • LLVM replaced i : i32 with i : i64,
  • so i <= i32::max_value() no longer holds

?

  • MADV_FREE does not yield this kind of poison
  • it creates undef: a new value each time you read it

Perhaps undef is poison that gets freeze applied on each use?


Under this semantics, the suggested vectorized IO optimization is legitimate.


Apologies for the naive question: does LLVM actually optimize IO? E.g., does it treat write* as non-opaque?

The long-term plan is to soft-deprecate undef in LLVM and use poison instead, because reasoning about correct optimization in the face of undef semantics has proven to be nearly impossible. That, and LLVM's internal assumptions about undef are inconsistent due to how loosely it is specified in practice.

poison's semantics are, IIRC, that any calculation based on poison results in poison, and any branch or side effect dependent on poison is immediate UB. (There is almost certainly some subtlety I've missed here.)

The MADV_FREE issue means that freeze's lowering cannot just be a no-op. In the virtual semantic machine, freeze takes a value and, if it is poison, gives it an arbitrary non-poison value. Further use of the frozen value is consistent. (With MADV_FREE, by contrast, reading the same memory location twice can return different results without the program doing anything, at the asm level of abstraction.)

Anything down to the syscall level can be optimized by LLVM (especially with LTO). Actual syscall (intrinsic) usage is allowed to be adjusted so long as LLVM can justify the change under the as-if rule.

undef is, roughly, a fresh arbitrary value each time you read it -- so yes, it behaves much like poison with freeze applied on each use (except that undef must still be some valid value of its type). poison is the one where, if a branch or side effect depends on it, your program has UB and it's time to welcome the nasal demons. (In practice, LLVM will assume such a branch is impossible and remove it as aggressively as possible.)
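
A rough Rust sketch of that distinction, as I understand it (hypothetical names, details elided):

// Toy model of the two kinds of "bad" values.
#[derive(Clone, Copy)]
enum AbstractValue {
    Concrete(u8),
    Undef,  // each read may yield a fresh, arbitrary (but type-valid) value
    Poison, // deferred UB: propagates through most operations
}

fn observe(v: AbstractValue, mut arbitrary: impl FnMut() -> u8) -> Result<u8, &'static str> {
    match v {
        AbstractValue::Concrete(b) => Ok(b),
        // undef: a new nondeterministic choice at every observation
        AbstractValue::Undef => Ok(arbitrary()),
        // poison: letting a branch or side effect depend on it is UB
        AbstractValue::Poison => Err("UB: side effect or branch depends on poison"),
    }
}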

4 Likes

@CAD97, thx!

@rkruppe:

  • one could formalize the treatment of uninit memory passed to write* as poison that gets a fresh freeze applied on each use of each byte, right?
  • std could guarantee these semantics either via recourse to (inline?) ASM, or by relying on LLVM being reasonable, e.g. not branching on the contents of buffers passed to the syscall intrinsic, etc.?

The semantics of output syscalls could easily be that they freeze memory after reading it.

The semantics of each syscall needs to be defined separately anyway, so what you did is argue convincingly that writev should be specified to read-and-freeze each vector separately, instead of trying to be smart and deduplicate things.

No, we cannot let the outside world observe so much about the entire abstract machine heap. That would prohibit a lot of optimizations.

The biggest problem is that we are having this discussion in English, which (like most/all natural languages) is really not suited for precise nitty-gritty details like we are talking about here right now. This is what math is made for. But we have a lot of ground to cover to build up the necessary vocabulary to talk about all of this in a more precise way.

5 Likes

Seems it would be possible to create a version of write which takes &[MaybeUninit<u8>] and passes that as a pointer to a syscall.
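
A minimal sketch of what that could look like (the wrapper name is hypothetical, and this assumes the freeze-at-the-boundary semantics discussed above is actually sound):

use std::io;
use std::mem::MaybeUninit;
use std::os::unix::io::RawFd;

// Hypothetical wrapper: write a possibly-uninitialized buffer.
// Assumes the kernel only copies the bytes out, and that the syscall
// boundary "freezes" any uninitialized bytes it reads.
fn write_uninit(fd: RawFd, buf: &[MaybeUninit<u8>]) -> io::Result<usize> {
    let ret = unsafe { libc::write(fd, buf.as_ptr().cast(), buf.len()) };
    if ret < 0 {
        Err(io::Error::last_os_error())
    } else {
        Ok(ret as usize)
    }
}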

Question: will operating systems always be okay with receiving an uninitialized page within a buffer passed into such a syscall?

Searching online suggests Valgrind will get angry. But would it be right to get angry?

That is exactly what has been intensively discussed in this very thread. :wink: (And without reaching a firm conclusion.)

1 Like