Writing down binary data... with padding bytes

It can't be a safe function, because types may contain uninitialized data that isn't padding (like MaybeUninit).


Maybe there could be an auto trait for plain old data, which would exclude types with uninitialized memory? And zero_padding would return PlainDataWrapped<T> that implements PlainData for the type after cleaning padding, if T: PlainData.


I don't think the normal auto trait mechanism works for "has no padding", because it's affected by things like alignment that are invisible to auto traits. The other direction -- safe to set to any bit pattern -- would work using auto trait, though, since that property is true of padding.

Could freeze intrinsic in rustc be implemented like this?

  • insert a call to an extern function LLVM knows nothing about, say rustFreeze(p) where p is a pointer/ref
  • LLVM will now assume that memory could have been written through p
  • add an extra hand-crafted LLVM pass suitably late in the chain to remove all calls to rustFreeze

This will not be a true freeze because MADV_FREE and friends can still cause the result of reading uninitialized memory to be unpredictable, LLVM undef style. However at least LLVM will not compile the code to NOP or make any other nasty assumptions?

I personally feel that freeze is not the right solution to this because you're still leaking uninitialized memory into a file. This can have security implications, hence my rule of thumb:

Uninitialized memory contains either your private keys or non-deterministic random data, whichever is worse.

If you are writing a struct to a file, you really want to make sure any uninitialized padding bytes are zeroed out.
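One way to get zeroed padding without touching uninitialized memory at all is to serialize field by field into a buffer that starts out as zeros. This is only a minimal sketch: `Record` and `record_to_bytes` are hypothetical names, and the hard-coded offsets assume the usual `#[repr(C)]` layout where `u16` has alignment 2.

```rust
use std::mem::size_of;

/// Hypothetical record type; with #[repr(C)] there is one padding byte
/// between `tag` (offset 0) and `value` (offset 2, assuming align_of::<u16>() == 2).
#[repr(C)]
pub struct Record {
    pub tag: u8,
    pub value: u16,
}

/// Builds the on-disk image of `r` with every padding byte set to zero.
/// No byte of the original struct's padding is ever read.
pub fn record_to_bytes(r: &Record) -> [u8; size_of::<Record>()] {
    // Start from an all-zero buffer so padding is deterministic.
    let mut out = [0u8; size_of::<Record>()];
    out[0] = r.tag;
    // Offsets 2..4 are an assumption about the repr(C) layout noted above.
    out[2..4].copy_from_slice(&r.value.to_ne_bytes());
    out
}
```

The cost is a copy per record, so this is not the zero-overhead option being asked for, but the output never depends on whatever happened to be in the padding.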

  • whoever free()-ed RAM w/o 0-ing passwords is the guilty party :)
  • padding 4 bytes here and 3 bytes there - hopefully not that much space to leak info..

On a more serious note wouldn't it be nice to have both options?

Alternative to my earlier idea on rustFreeze:

  • write rustFreeze in asm
  • make it aware of page size somehow
  • for each page in range read first 8 bytes and write them back

This should tame even MADV_FREE right?
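For illustration only, here is roughly what the page-touching loop could look like in plain Rust rather than asm. `touch_pages` and `ASSUMED_PAGE_SIZE` are hypothetical names, the 4096-byte page size is an assumption (a real version would query the OS), and it touches one byte per page instead of eight. Volatile accesses are not a language-level freeze, so this shows only the shape of the idea, not a guarantee about what the optimizer may assume afterwards.

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Assumed page size; a real implementation would ask the OS.
pub const ASSUMED_PAGE_SIZE: usize = 4096;

/// Reads one byte in every page overlapped by `buf` and writes it back.
/// Returns the number of pages touched.
pub fn touch_pages(buf: &mut [MaybeUninit<u8>]) -> usize {
    let len = buf.len();
    let base = buf.as_mut_ptr();
    let addr = base as usize;
    let mut offset = 0usize;
    let mut touched = 0usize;
    while offset < len {
        // SAFETY: offset < len, so the pointer stays inside `buf`.
        // The byte is moved around as MaybeUninit<u8>, never as u8.
        unsafe {
            let p = base.add(offset);
            let byte = ptr::read_volatile(p);
            ptr::write_volatile(p, byte);
        }
        touched += 1;
        // Jump to the first byte of the next page.
        let next_page = ((addr + offset) / ASSUMED_PAGE_SIZE + 1) * ASSUMED_PAGE_SIZE;
        offset = next_page - addr;
    }
    touched
}
```

Note that the function needs `&mut`, since it performs a real store to each page.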

LLVM could still theoretically assume that the memory will not change, e.g. by changing one load into two redundant loads and optimizing based on the assumption that both loads will return the same value. (Though it may be hard to find a case where such an optimization would be profitable, let alone actually performed.)

That would likely work. Caveats:

  • It requires a mutable slice; it can't work on immutable slices because it involves an assembly-level write, and immutable slices can point to pages marked read-only. Mutable slices may be fine for your use case, though.
  • It's not zero-overhead. That may be okay; we already have a zero-overhead option in the form of passing around &[MaybeUninit<u8>] slices, and this could complement it.
  • It doesn't help if you want a snapshot of memory that has another thread or process actively writing to it, as opposed to the only source of 'writes' being MADV_FREE weirdness. Again, that may be okay, since that use case may warrant other approaches (like the recently-discussed atomic memcpy).

Back to zero-overhead: could Rust say these are not UB for padding bytes?

  • libc::memcpy
  • libc::write

Ways to get confidence:

  • examine compiled libc
  • re-implement manually in asm

We use memcpy (via ptr::copy_nonoverlapping) on padding bytes all the time -- that's what happens when you reallocate a Vec<(u16, u8)>, for example.
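A small sketch of that kind of copy, doing by hand roughly what a reallocation does. `copy_pairs` is a hypothetical helper; each `(u16, u8)` element is larger than the three bytes of its fields, so the bytewise copy necessarily carries padding along.

```rust
use std::ptr;

/// Copies `src` into a fresh Vec with one bulk copy.
/// The padding inside each (u16, u8) element is copied too, untyped.
pub fn copy_pairs(src: &[(u16, u8)]) -> Vec<(u16, u8)> {
    let mut dst: Vec<(u16, u8)> = Vec::with_capacity(src.len());
    // SAFETY: `dst` has capacity for src.len() elements, the two buffers
    // do not overlap, and (u16, u8) is Copy, so duplicating it is fine.
    unsafe {
        ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr(), src.len());
        dst.set_len(src.len());
    }
    dst
}
```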


Could you elaborate? Reading padding bytes as MaybeUninit<u8> should always be fine and safe.

EDIT: thanks to @bjorn3 below for the clarification, I misunderstood.

I think @kornel meant the opposite: reading MaybeUninit<u8> as u8 is UB, even though it doesn't contain padding.


Invisibility of padding to auto traits is an implementation issue, not a logical problem, is it?

If the term auto trait is supposed to mean a specific behavior, then scratch that, and let's call it magic trait, so that: u8: PlainData, u16: PlainData, (u8, u16): !PlainData.
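To make the idea concrete, here is a sketch with the trait written by hand, since no such magic trait exists today. `PlainData`, `Pair`, `pair_padding` and `as_bytes` are all hypothetical names; `Pair` is a `#[repr(C)]` stand-in for `(u8, u16)` so that its layout is well defined.

```rust
use std::mem::size_of;
use std::slice;

/// Hypothetical marker: every byte of the type is always initialized.
/// Unsafe to implement because the compiler cannot check it here.
pub unsafe trait PlainData: Copy {}
unsafe impl PlainData for u8 {}
unsafe impl PlainData for u16 {}

/// Stand-in for (u8, u16). Deliberately NOT PlainData: it has padding.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Pair {
    pub a: u8,
    pub b: u16,
}

/// Bytes of `Pair` not covered by any field, i.e. its padding.
pub const fn pair_padding() -> usize {
    size_of::<Pair>() - (size_of::<u8>() + size_of::<u16>())
}

/// Viewing a value as initialized bytes is only offered for PlainData types.
pub fn as_bytes<T: PlainData>(x: &T) -> &[u8] {
    // SAFETY: the PlainData contract promises all size_of::<T>() bytes are initialized.
    unsafe { slice::from_raw_parts(x as *const T as *const u8, size_of::<T>()) }
}
```

With this, `as_bytes(&1u16)` compiles, while `as_bytes(&Pair { a: 1, b: 2 })` is rejected because `Pair` does not implement the trait.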

v : Option<[u8; 8196 + 128]> = None

is the biggest problem here it seems..

  • 8k+ bytes allocated
  • only first few written (tag)

the "tail" may well fall onto a MADV_FREE-cursed page..
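The size of that value can be checked directly. This is a sketch with hypothetical names (`PAYLOAD`, `option_overhead`); the exact layout of `Option` is up to the compiler, so the only thing relied on here is that a `u8` array has no spare bit patterns and the discriminant therefore needs extra space.

```rust
use std::mem::size_of;

/// Payload size from the example above.
pub const PAYLOAD: usize = 8196 + 128;

/// Extra bytes Option adds on top of the array, i.e. room for the tag.
pub fn option_overhead() -> usize {
    size_of::<Option<[u8; PAYLOAD]>>() - PAYLOAD
}
```

Storing `None` only has to write those tag bytes, which leaves the remaining `PAYLOAD` bytes, spanning more than two 4 KiB pages, uninitialized.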

"Solution" 1

Padding always falls onto a page that also contains some non-padding bytes. Had those bytes definitely been written to (unlike in the example above), MADV_FREE wouldn't be an issue. We could implement a zero-cost rustFreeze for structs that don't have this problem.

Solution 2

Some operations probably can be declared not UB for v : Option<[u8; 8196 + 128]> = None

Solution 3

A non-zero-cost rustFreeze could be implemented for non-const data.


I thought you were referring to the auto trait syntax in nightly, yes: https://doc.rust-lang.org/nightly/unstable-book/language-features/optin-builtin-traits.html

That is not true. MADV_FREE just means freeze is not a NOP but needs to actually touch the page (each page in the range, to be more accurate).

That's not entirely correct. For example, memcpy will preserve all bytes' contents; a move "forgets" the bytes stored in padding.

I don't think so; it says that you get an "indeterminate value". The rules for those are somewhat unclear but my interpretation is that you can load and store them, but doing anything else is UB.

Niches cannot use padding bytes.

But what about enums? Does this check the discriminant at run-time? If yes, what about unions?

They most likely are not UB. But that is not the problem. The problem is all the reference-based wrappers around them -- and the fact that this might leak private data, of course.


Just to have that recorded here, this is based on the following function that is, I think, safe:

use std::mem::{self, MaybeUninit};
use std::slice;

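/// Views any value as its raw bytes without claiming they are initialized.
pub fn as_raw_bytes<T: ?Sized>(x: &T) -> &[MaybeUninit<u8>] {
    let size = mem::size_of_val(x);
    // The pointer is valid for `size` bytes for as long as `x` is borrowed,
    // and MaybeUninit<u8> may hold any byte, including padding.
    unsafe { slice::from_raw_parts(x as *const T as *const MaybeUninit<u8>, size) }
}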

Ahhh! So hypothetical UB-free analogues of memcpy and write could take &[MaybeUninit<u8>].

fn write<T>(..., data: &T) wouldn't make much sense for, say, Vec<T>, as Ralf noted.
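A memcpy analogue with that signature can be sketched in safe code, because MaybeUninit<u8> is Copy. `raw_bytes_of` and `copy_raw` are hypothetical names; `raw_bytes_of` is a sized-only variant of the helper above, repeated so the snippet stands on its own.

```rust
use std::mem::{self, MaybeUninit};
use std::slice;

/// Sized-only variant of the raw-bytes view.
pub fn raw_bytes_of<T>(x: &T) -> &[MaybeUninit<u8>] {
    // SAFETY: the pointer is valid for size_of::<T>() bytes while `x` is
    // borrowed, and MaybeUninit<u8> places no requirement on their contents.
    unsafe { slice::from_raw_parts(x as *const T as *const MaybeUninit<u8>, mem::size_of::<T>()) }
}

/// memcpy analogue: copies bytes without asserting they are initialized.
/// Panics if the two slices differ in length.
pub fn copy_raw(dst: &mut [MaybeUninit<u8>], src: &[MaybeUninit<u8>]) {
    dst.copy_from_slice(src);
}
```

A write analogue would additionally need the OS-facing call to accept such a buffer, which is exactly the open question in this thread.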

I actually don't think libc::write can deal with uninitialized bytes, at least under (plausible future) LLVM semantics. I don't know what the C standard says under various readings, but at LLVM IR level, uninitialized memory will hopefully be poison (once undef is finally dead) and:

undefined behavior occurs if a side effect depends on poison

So, in particular, writing out padding bytes to a file (with libc::write or otherwise) would be UB.

...but you could write your own analogue in ASM; or confirm safety by reading compiled machine code for libc::write and hope it doesn't break in the future?

In LLVM, the only place where poison turns into UB is a conditional jump (and maybe select). All other operations just propagate poison. So I disagree; calling write with a buffer containing poison should be just fine.