Writing down binary data... with padding bytes

kornel · October 28, 2019, 11:14pm

It can't be a safe function, because types may contain uninitialized data that isn't padding (like MaybeUninit).

kornel · October 28, 2019, 11:17pm

Maybe there could be an auto trait for plain old data, which would exclude types with uninitialized memory? And zero_padding would return PlainDataWrapped<T> that implements PlainData for the type after cleaning padding, if T: PlainData.

scottmcm · October 29, 2019, 12:34am

I don't think the normal auto trait mechanism works for "has no padding", because it's affected by things like alignment that are invisible to auto traits. The other direction -- safe to set to any bit pattern -- would work using auto trait, though, since that property is true of padding.

atagunov · October 29, 2019, 12:38am

Could freeze intrinsic in rustc be implemented like this?

insert a call to an extern function LLVM knows nothing about, say rustFreeze(p) where p is a pointer/ref
LLVM will now assume that memory could have been written through p
add an extra hand-crafted LLVM pass suitably late in the chain to remove all calls to rustFreeze

This will not be a true freeze, because MADV_FREE and friends can still make the result of reading uninitialized memory unpredictable, LLVM undef style. However, at least LLVM will not compile the code to a NOP or make any other nasty assumptions?

Amanieu · October 29, 2019, 12:52am

I personally feel that freeze is not the right solution to this because you're still leaking uninitialized memory into a file. This can have security implications, hence my rule of thumb:

Uninitialized memory contains either your private keys or non-deterministic random data, whichever is worse.

If you are writing a struct to a file, you really want to make sure any uninitialized padding bytes are zeroed out.
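A minimal sketch of that advice, under assumptions: `Record` and `write_record` are hypothetical names, and the byte offsets rely on `#[repr(C)]` with the usual alignments (2 for `u16`, 4 for `u32`). Instead of viewing the struct as bytes, the fields are copied into a buffer that starts out zeroed, so the padding byte is always 0 in the output.

```rust
use std::io::{self, Write};
use std::mem::size_of;

/// Hypothetical record. With `repr(C)` and the usual alignments the layout is:
/// `tag` at offset 0, one padding byte at 1, `id` at 2..4, `len` at 4..8.
#[repr(C)]
pub struct Record {
    pub tag: u8,
    pub id: u16,
    pub len: u32,
}

/// Writes `r` with its padding byte forced to zero.
pub fn write_record<W: Write>(out: &mut W, r: &Record) -> io::Result<()> {
    // Start from an all-zero buffer so no uninitialized byte can reach the file.
    let mut buf = [0u8; size_of::<Record>()];
    buf[0] = r.tag;
    buf[2..4].copy_from_slice(&r.id.to_ne_bytes());
    buf[4..8].copy_from_slice(&r.len.to_ne_bytes());
    out.write_all(&buf)
}
```

This is not zero-overhead (each field is copied), but it needs no unsafe code and makes the output deterministic.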

atagunov · October 29, 2019, 1:02am

Whoever free()-ed RAM without zeroing passwords is the guilty party.
Padding is 4 bytes here and 3 bytes there, so hopefully not much space to leak info.

On a more serious note wouldn't it be nice to have both options?

Alternative to my earlier idea on rustFreeze:

write rustFreeze in asm
make it aware of page size somehow
for each page in range read first 8 bytes and write them back

This should tame even MADV_FREE right?

comex · October 29, 2019, 1:26am

LLVM could still theoretically assume that the memory will not change, e.g. by changing one load into two redundant loads and optimizing based on the assumption that both loads will return the same value. (Though it may be hard to find a case where such an optimization would be profitable, let alone actually performed.)

That would likely work. Caveats:

It requires a mutable slice; it can't work on immutable slices because it involves an assembly-level write, and immutable slices can point to pages marked read-only. Mutable slices may be fine for your use case, though.
It's not zero-overhead. That may be okay; we already have a zero-overhead option in the form of passing around &[MaybeUninit<u8>] slices, and this could complement it.
It doesn't help if you want a snapshot of memory that has another thread or process actively writing to it, as opposed to the only source of 'writes' being MADV_FREE weirdness. Again, that may be okay, since that use case may warrant other approaches (like the recently-discussed atomic memcpy).

atagunov · October 29, 2019, 2:02am

Back to zero-overhead: could Rust say these are not UB for padding bytes?

libc::memcpy
libc::write

Ways to get confidence:

examine compiled libc
re-implement manually in asm

scottmcm · October 29, 2019, 2:15am

We use memcpy (via ptr::copy_nonoverlapping) on padding bytes all the time -- that's what happens when you reallocate a Vec<(u16, u8)>, for example.
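To make that example concrete, here is a small sketch (the helper names `padding_per_element` and `grow` are made up for illustration). The tuple holds 3 bytes of data but its size must be a multiple of its alignment, so each element carries at least one padding byte, and every reallocation during `push` copies those bytes along with the data.

```rust
use std::mem::size_of;

/// Bytes per `(u16, u8)` element that belong to no field, i.e. padding.
pub fn padding_per_element() -> usize {
    size_of::<(u16, u8)>() - (size_of::<u16>() + size_of::<u8>())
}

/// Builds a vector starting from a tiny capacity, forcing several reallocations.
/// Each reallocation moves whole elements, padding included, to the new buffer.
pub fn grow(n: u16) -> Vec<(u16, u8)> {
    let mut v = Vec::with_capacity(1);
    for i in 0..n {
        v.push((i, (i % 251) as u8));
    }
    v
}
```

The field values survive the copies; what the padding bytes contain afterwards is simply unspecified.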

gnzlbg · October 29, 2019, 10:18am

~~Could you elaborate? Reading padding bytes as MaybeUninit<u8> should always be fine and safe.~~

EDIT: thanks to @bjorn3 below for the clarification, I misunderstood.

bjorn3 · October 29, 2019, 10:55am

I think @kornel meant the opposite: reading MaybeUninit<u8> as u8 is UB, even though it doesn't contain padding.
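A short sketch of the distinction (the function name `init_then_read` is just for illustration): turning a `MaybeUninit<u8>` into a `u8` is only sound once the byte has actually been written.

```rust
use std::mem::MaybeUninit;

/// Sound: the slot is written before it is read back as a `u8`.
pub fn init_then_read(value: u8) -> u8 {
    let mut slot = MaybeUninit::<u8>::uninit();
    slot.write(value);
    // SAFETY: `slot` was initialized by the `write` call above.
    unsafe { slot.assume_init() }
}

// Not sound, so it is left as a comment only:
// let b: u8 = unsafe { MaybeUninit::<u8>::uninit().assume_init() }; // UB
```

No padding is involved in either case, which is why a safe "view any T as bytes" function cannot exist for arbitrary types.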

kornel · October 29, 2019, 11:55am

Invisibility of padding to auto traits is an implementation issue, not a logical problem, isn't it?

If the term auto trait is supposed to mean a specific behavior, then scratch that, and let's call it magic trait, so that: u8: PlainData, u16: PlainData, (u8, u16): !PlainData.
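As a rough illustration of what such a trait would buy, here is a sketch that fakes it with a manually implemented unsafe marker trait. This is an assumption-laden stand-in: `PlainData` and `as_bytes` are hypothetical, and the compiler-provided "magic" (deriving the impls from the layout) is replaced by hand-written impls.

```rust
use std::mem::size_of;
use std::slice;

/// Hypothetical marker: every byte of `Self` is always initialized
/// (no padding, no `MaybeUninit` fields). Implemented by hand here;
/// the proposal is for the compiler to decide this instead.
pub unsafe trait PlainData: Sized {}

unsafe impl PlainData for u8 {}
unsafe impl PlainData for u16 {}
// Deliberately no impl for `(u8, u16)`: it contains a padding byte.

/// Safe byte view, available only for `PlainData` types.
pub fn as_bytes<T: PlainData>(x: &T) -> &[u8] {
    // SAFETY: `PlainData` promises all `size_of::<T>()` bytes are initialized.
    unsafe { slice::from_raw_parts(x as *const T as *const u8, size_of::<T>()) }
}
```

With this in place, `as_bytes(&(1u8, 2u16))` is rejected at compile time, which is the behavior the `(u8, u16): !PlainData` line asks for.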

atagunov · October 29, 2019, 2:45pm

v : Option<[u8; 8196 + 128]> = None

is the biggest problem here it seems..

8k+ bytes allocated
only first few written (tag)

the "tail" may well fall onto a MADV_FREE-cursed page..

"Solution" 1

Padding always falls onto a page that also contains some non-padding bytes. Had those bytes definitely been written to (unlike in the example above), MADV_FREE wouldn't be an issue. We could implement a zero-cost rustFreeze for structs that don't have this problem.

Solution 2

Some operations probably can be declared not UB for v : Option<[u8; 8196 + 128]> = None

Solution 3

A non-zero-cost rustFreeze could be implemented for non-const data.

scottmcm · October 29, 2019, 6:26pm

I thought you were referring to the auto trait syntax in nightly, yes: https://doc.rust-lang.org/nightly/unstable-book/language-features/optin-builtin-traits.html

RalfJung · November 2, 2019, 5:02pm

That is not true. MADV_FREE just means freeze is not a NOP but needs to actually touch the page (each page in the range, to be more accurate).

That's not entirely correct. For example, memcpy will preserve all bytes' contents; a move "forgets" the bytes stored in padding.

I don't think so; it says that you get an "indeterminate value". The rules for those are somewhat unclear but my interpretation is that you can load and store them, but doing anything else is UB.

Niches cannot use padding bytes.

But what about enums? Does this check the discriminant at run-time? If yes, what about unions?

They most likely are not UB. But that is not the problem. The problem is all the reference-based wrappers around them -- and the fact that this might leak private data, of course.

RalfJung · November 3, 2019, 11:17am

Just to have that recorded here, this is based on the following function that is, I think, safe:

use std::mem::{self, MaybeUninit};
use std::slice;

pub fn as_raw_bytes<T: ?Sized>(x: &T) -> &[MaybeUninit<u8>] {
    let size = mem::size_of_val(x);
    unsafe { slice::from_raw_parts(x as *const T as *const MaybeUninit<u8>, size) }
}
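A possible usage sketch follows. The function is restated so the snippet compiles on its own, and `copy_raw_bytes` is a hypothetical helper. The point is that the bytes, padding included, can be copied around as `MaybeUninit<u8>` without ever being read as `u8`.

```rust
use std::mem::{self, MaybeUninit};
use std::slice;

pub fn as_raw_bytes<T: ?Sized>(x: &T) -> &[MaybeUninit<u8>] {
    let size = mem::size_of_val(x);
    // SAFETY: `x` points to `size` readable bytes, and `MaybeUninit<u8>`
    // is allowed to hold uninitialized (e.g. padding) bytes.
    unsafe { slice::from_raw_parts(x as *const T as *const MaybeUninit<u8>, size) }
}

/// Hypothetical helper: snapshot the representation of `x`, padding and all.
pub fn copy_raw_bytes<T: ?Sized>(x: &T) -> Vec<MaybeUninit<u8>> {
    // `MaybeUninit<u8>` is `Copy`, so cloning the slice never inspects the bytes.
    as_raw_bytes(x).to_vec()
}
```

Only the length of the result is safe to observe in general; turning individual entries into `u8` still requires knowing that they are initialized.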

atagunov · November 3, 2019, 4:41pm

Ahhh! So hypothetical UB-free analogues of memcpy and write could take &[MaybeUninit<u8>]..
fn write<T>(..., data: &T) wouldn't make much sense for, say, Vec<T>, as Ralf noted.

hanna-kruppe · November 3, 2019, 5:01pm

I actually don't think libc::write can deal with uninitialized bytes, at least under (plausible future) LLVM semantics. I don't know what the C standard says under various readings, but at LLVM IR level, uninitialized memory will hopefully be poison (once undef is finally dead) and:

undefined behavior occurs if a side effect depends on poison

So, in particular, writing out padding bytes to a file (with libc::write or otherwise) would be UB.

atagunov · November 3, 2019, 5:03pm

...but you could write your own analogue in ASM; or confirm safety by reading compiled machine code for libc::write and hope it doesn't break in the future?

RalfJung · November 3, 2019, 5:03pm

In LLVM, the only place where poison turns into UB is a conditional jump (and maybe select). All other operations just propagate poison. So I disagree; calling write with a buffer containing poison should be just fine.
