It can't be a safe function, because types may contain uninitialized data that isn't padding (like MaybeUninit).
Maybe there could be an auto trait for plain old data, which would exclude types with uninitialized memory? And zero_padding would return PlainDataWrapped<T> that implements PlainData for the type after cleaning padding, if T: PlainData.
I don't think the normal auto trait mechanism works for "has no padding", because it's affected by things like alignment that are invisible to auto traits. The other direction -- safe to set to any bit pattern -- would work using auto trait, though, since that property is true of padding.
Could a freeze intrinsic in rustc be implemented like this?
- insert a call to an extern function LLVM knows nothing about, say rustFreeze(p) where p is a pointer/ref - LLVM will now assume that memory could have been written through p
- add an extra hand-crafted LLVM pass suitably late in the chain to remove all calls to rustFreeze
This will not be a true freeze because MADV_FREE and friends can still cause the result of reading uninitialized memory to be unpredictable, LLVM undef style. However, at least LLVM will not compile the code to a NOP or make any other nasty assumptions?
I personally feel that freeze is not the right solution to this because you're still leaking uninitialized memory into a file. This can have security implications, hence my rule of thumb:
Uninitialized memory contains either your private keys or non-deterministic random data, whichever is worse.
If you are writing a struct to a file, you really want to make sure any uninitialized padding bytes are zeroed out.
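One way to follow that advice without any freeze-like primitive is to serialize field by field into a buffer that starts out zeroed, so the padding byte that reaches the file is always a known zero. The following is only a minimal sketch: Record and record_to_bytes are hypothetical names that do not appear in this thread, and the 4-byte layout assumes #[repr(C)] on a target where u16 has alignment 2.

```rust
use std::mem::size_of;

/// Hypothetical example type: 3 bytes of fields plus 1 trailing padding byte.
#[repr(C)]
pub struct Record {
    pub id: u16,
    pub flag: u8,
}

// Assumption check: fail the build if the layout is not the expected 4 bytes.
const _: () = assert!(size_of::<Record>() == 4);

/// Copies the fields into a zero-initialized buffer, so the byte that
/// corresponds to padding is always 0 rather than whatever was in memory.
pub fn record_to_bytes(r: &Record) -> [u8; 4] {
    let mut out = [0u8; 4];
    out[0..2].copy_from_slice(&r.id.to_ne_bytes());
    out[2] = r.flag;
    out
}
```

The resulting array contains only initialized bytes, so it can be handed to any ordinary writer as a plain &[u8].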
- whoever free()-ed RAM w/o 0-ing passwords is the guilty party
- padding 4 bytes here and 3 bytes there - hopefully not that much space to leak info..
On a more serious note wouldn't it be nice to have both options?
Alternative to my earlier idea on rustFreeze:
- write rustFreeze in asm
- make it aware of page size somehow
- for each page in range read first 8 bytes and write them back
This should tame even MADV_FREE, right?
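The page-touching loop can be sketched in Rust to make the control flow concrete. This is an illustration under loud assumptions, not the proposed implementation: touch_pages is a hypothetical name, PAGE_SIZE is hard-coded to 4096 instead of being queried from the OS, volatile accesses stand in for the asm the proposal calls for, and one byte per page is touched rather than eight. Whether this is enough to count as a freeze is exactly the open question here.

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Assumed page size; a real implementation would ask the OS.
pub const PAGE_SIZE: usize = 4096;

/// Reads one byte on every page the buffer overlaps and writes it back.
/// Returns the number of pages touched.
pub fn touch_pages(buf: &mut [MaybeUninit<u8>]) -> usize {
    let len = buf.len();
    let base = buf.as_mut_ptr();
    let start = base as usize;
    let mut offset = 0;
    let mut touched = 0;
    while offset < len {
        // SAFETY: offset < len, so the pointer stays inside the slice, and
        // MaybeUninit<u8> may legally hold an uninitialized byte.
        unsafe {
            let p = base.add(offset);
            let byte = ptr::read_volatile(p);
            ptr::write_volatile(p, byte);
        }
        touched += 1;
        // Advance to the first byte of the next page.
        let addr = start + offset;
        let next_page = (addr & !(PAGE_SIZE - 1)) + PAGE_SIZE;
        offset = next_page - start;
    }
    touched
}
```

The function takes a mutable slice because it performs a write to every page.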
LLVM could still theoretically assume that the memory will not change, e.g. by changing one load into two redundant loads and optimizing based on the assumption that both loads will return the same value. (Though it may be hard to find a case where such an optimization would be profitable, let alone actually performed.)
That would likely work. Caveats:
- It requires a mutable slice; it can't work on immutable slices because it involves an assembly-level write, and immutable slices can point to pages marked read-only. Mutable slices may be fine for your use case, though.
- It's not zero-overhead. That may be okay; we already have a zero-overhead option in the form of passing around &[MaybeUninit<u8>] slices, and this could complement it.
- It doesn't help if you want a snapshot of memory that has another thread or process actively writing to it, as opposed to the only source of 'writes' being MADV_FREE weirdness. Again, that may be okay, since that use case may warrant other approaches (like the recently-discussed atomic memcpy).
Back to zero-overhead: could Rust say these are not UB for padding bytes?
libc::memcpy
libc::write
Ways to get confidence:
- examine compiled libc
- re-implement manually in asm
We use memcpy (via ptr::copy_nonoverlapping) on padding bytes all the time -- that's what happens when you reallocate a Vec<(u16, u8)>, for example.
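To make the Vec<(u16, u8)> case concrete, here is a small sketch with hypothetical helper names (tuple_padding_bytes and grow). It assumes a typical target where u16 has alignment 2, so each element carries at least one padding byte, and every reallocation moves those padding bytes along with the fields.

```rust
use std::mem::size_of;

/// Bytes of (u16, u8) that are not covered by either field.
pub fn tuple_padding_bytes() -> usize {
    size_of::<(u16, u8)>() - (size_of::<u16>() + size_of::<u8>())
}

/// Starts with capacity 1 and pushes n elements, which forces the Vec to
/// reallocate and move its elements (padding included) several times.
pub fn grow(n: u16) -> Vec<(u16, u8)> {
    let mut v = Vec::with_capacity(1);
    for i in 0..n {
        v.push((i, i as u8));
    }
    v
}
```

None of this needs unsafe code on the caller's side; the copying of padding happens inside the standard library's reallocation path.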
Could you elaborate? Reading padding bytes as MaybeUninit<u8> should always be fine and safe.
EDIT: thanks to @bjorn3 below for the clarification, I misunderstood.
I think @kornel meant the opposite: Reading MaybeUninit<u8> as u8 is UB, even though it doesn't contain padding.
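A minimal sketch of that distinction, using a hypothetical function name (init_then_read): the MaybeUninit<u8> can be held and moved around while uninitialized, but it may only be read as a u8 after it has been written.

```rust
use std::mem::MaybeUninit;

pub fn init_then_read() -> u8 {
    let mut slot = MaybeUninit::<u8>::uninit();
    // Calling slot.assume_init() at this point would be UB: the byte is
    // uninitialized even though there is no padding involved.
    slot.write(42);
    // SAFETY: the byte was initialized by the write above.
    unsafe { slot.assume_init() }
}
```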
Invisibility of padding to auto traits is an implementation issue, not a logical problem, is it?
If the term auto trait is supposed to mean a specific behavior, then scratch that, and let's call it magic trait, so that: u8: PlainData, u16: PlainData, (u8, u16): !PlainData.
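Without compiler support, the closest approximation today is an unsafe marker trait implemented by hand. The sketch below is an assumption-laden illustration of that: PlainData is the name used in this thread, but here it is an ordinary user-defined trait rather than a magic one, and as_bytes is a hypothetical helper. The implementor promises that every byte of the type is initialized (no padding, no MaybeUninit).

```rust
use std::mem::size_of;
use std::slice;

/// Hand-written stand-in for the proposed magic trait.
/// Safety contract: every byte of Self is always initialized.
pub unsafe trait PlainData: Copy {}

unsafe impl PlainData for u8 {}
unsafe impl PlainData for u16 {}
unsafe impl<T: PlainData, const N: usize> PlainData for [T; N] {}
// Deliberately no impl for (u8, u16): it has padding on typical targets.

/// Views a padding-free value as initialized bytes.
pub fn as_bytes<T: PlainData>(x: &T) -> &[u8] {
    // SAFETY: PlainData guarantees all size_of::<T>() bytes are initialized.
    unsafe { slice::from_raw_parts(x as *const T as *const u8, size_of::<T>()) }
}
```

With this, as_bytes(&(1u8, 2u16)) is rejected at compile time, which is the behavior the magic trait is meant to provide automatically.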
v : Option<[u8; 8196 + 128]> = None is the biggest problem here it seems..
- 8k+ bytes allocated
- only first few written (tag)
the "tail" may well fall onto a MADV_FREE-cursed page..
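The size claim can be checked directly. In this sketch the function names are hypothetical and the 4096-byte page size is an assumption. Since every bit pattern of u8 is valid, the array offers no niche, so the Option needs a separate tag and the value is strictly larger than its payload.

```rust
use std::mem::size_of;

pub const PAYLOAD: usize = 8196 + 128;
/// Assumed page size; real systems may differ.
pub const PAGE_SIZE: usize = 4096;

/// Size of the Option, tag included.
pub fn option_size() -> usize {
    size_of::<Option<[u8; PAYLOAD]>>()
}

/// Smallest number of pages a value of that size can occupy.
pub fn min_pages_spanned() -> usize {
    (option_size() + PAGE_SIZE - 1) / PAGE_SIZE
}
```

Writing None initializes only the tag, so under these assumptions at least two of the pages the value spans may never be written.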
"Solution" 1
Padding always falls onto a page that also contains some non-padding bytes. Had they been surely written to (unlike the example above) MADV_FREE wouldn't be an issue. We could have implemented a zero-cost rustFreeze for struct-s that don't have this problem.
Solution 2
Some operations probably can be declared not UB for v : Option<[u8; 8196 + 128]> = None
Solution 3
A non-zero-cost rustFreeze could be implemented for non-const data.
I thought you were referring to the auto trait syntax in nightly, yes: https://doc.rust-lang.org/nightly/unstable-book/language-features/optin-builtin-traits.html
That is not true. MADV_FREE just means freeze is not a NOP but needs to actually touch the page (each page in the range, to be more accurate).
That's not entirely correct. For example, memcpy will preserve all bytes' contents; a move "forgets" the bytes stored in padding.
I don't think so; it says that you get an "indeterminate value". The rules for those are somewhat unclear but my interpretation is that you can load and store them, but doing anything else is UB.
Niches cannot use padding bytes.
But what about enums? Does this check the discriminant at run-time? If yes, what about unions?
They most likely are not UB. But that is not the problem. The problem is all the reference-based wrappers around them -- and the fact that this might leak private data, of course.
Just to have that recorded here, this is based on the following function that is, I think, safe:
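use std::mem::{self, MaybeUninit};
use std::slice;

pub fn as_raw_bytes<T: ?Sized>(x: &T) -> &[MaybeUninit<u8>] {
    let size = mem::size_of_val(x);
    // SAFETY: x is valid for reads of size bytes, and MaybeUninit<u8>
    // is allowed to hold uninitialized (e.g. padding) bytes.
    unsafe { slice::from_raw_parts(x as *const T as *const MaybeUninit<u8>, size) }
}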
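A short usage sketch follows. It repeats as_raw_bytes so that the snippet compiles on its own, and adds a hypothetical helper, u32_bytes, showing that individual bytes may be read out once the caller knows they are initialized, as is the case for a u32, which has no padding.

```rust
use std::mem::{self, MaybeUninit};
use std::slice;

pub fn as_raw_bytes<T: ?Sized>(x: &T) -> &[MaybeUninit<u8>] {
    let size = mem::size_of_val(x);
    // SAFETY: x is valid for reads of size bytes, and MaybeUninit<u8>
    // is allowed to hold uninitialized bytes.
    unsafe { slice::from_raw_parts(x as *const T as *const MaybeUninit<u8>, size) }
}

/// Hypothetical helper: u32 has no padding, so every raw byte is initialized.
pub fn u32_bytes(x: &u32) -> [u8; 4] {
    let raw = as_raw_bytes(x);
    let mut out = [0u8; 4];
    for (dst, src) in out.iter_mut().zip(raw) {
        // SAFETY: all four bytes of a u32 are initialized.
        *dst = unsafe { src.assume_init() };
    }
    out
}
```

For a type with padding, such as (u16, u8), the slice length still equals the size of the value, but calling assume_init on the padding byte would not be sound.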
Ahhh! So hypothetical UB-free analogues of memcpy and write could take &[MaybeUninit<u8>]..
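As a rough sketch of what such a memcpy analogue could look like in safe Rust: copy_raw is a hypothetical name, and nothing here is an existing libc or std API beyond the slice method it calls. Because MaybeUninit<u8> is Copy, the slice copy never has to interpret the bytes as u8.

```rust
use std::mem::MaybeUninit;

/// Copies as many bytes as fit from src into dst, initialized or not.
/// Returns the number of bytes copied.
pub fn copy_raw(dst: &mut [MaybeUninit<u8>], src: &[MaybeUninit<u8>]) -> usize {
    let n = dst.len().min(src.len());
    dst[..n].copy_from_slice(&src[..n]);
    n
}
```

A write analogue is harder, since the bytes eventually have to cross into a system call, which is where the discussion below picks up.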
fn write<T>(..., data: &T) wouldn't make much sense, say, for Vec<T>, as Ralf noted.
I actually don't think libc::write can deal with uninitialized bytes, at least under (plausible future) LLVM semantics. I don't know what the C standard says under various readings, but at LLVM IR level, uninitialized memory will hopefully be poison (once undef is finally dead) and:
undefined behavior occurs if a side effect depends on poison
So, in particular, writing out padding bytes to a file (with libc::write or otherwise) would be UB.
...but you could write your own analogue in ASM; or confirm safety by reading compiled machine code for libc::write and hope it doesn't break in the future?
In LLVM, the only place where poison turns into UB is a conditional jump (and maybe select). All other operations just propagate poison. So I disagree; calling write with a buffer containing poison should be just fine.