Writing down binary data... with padding bytes

wasn't this thread mostly about user-space behavior?
e.g. what would our app do?
rather than what the kernel would do?


please correct me if I'm wrong..

..my latest idea was that the write method in our app would probably be well-behaved if it used &[MaybeUninit<u8>] right up until the syscall; the only question left then would be whether the kernel would behave too

This thread was about the Rust spec. There is no kernel there, just "externally observable behavior".

I don't think anyone doubts that both proposals made here are implementable -- when linking code at the assembly level, uninit memory degenerates to arbitrary bit patterns (or your secret key, whichever is worse). So the kernel will be fine. And anyway, it would be a serious kernel bug if we could irritate it by putting "bad" data into the write syscall.

When discussing the spec, things like what the hardware does or what the kernel does are not very relevant.

3 Likes
  • aren't only specs that we can and will implement worthy of consideration?
  • what can be efficiently implemented depends on the capabilities of real-world kernels
  • ergo I would argue discussing them is relevant, even in this thread

There are other options for a kernel: kill the app/raise a signal/return an error.

If at least some of 32/64-bit Linux/FreeBSD/OpenWRT/Windows/... interpreted uninit pages as garbage and not an error, it would make sense to create an IO interface like

fn unixWrite(fd: u16, buf: &[MaybeUninit<u8>])

and to give it the semantics we discussed before. Job done. @HadrienG's quest fulfilled.

Right? Bonus points for considering it as the main Rust IO interface :smiley:

I think instead of playing with potentially uninitialized data and trying to define how it should behave, a better approach would be to find a good way to introduce a function for zeroizing padding bytes and getting &[u8; size_of::<T>()], as was proposed in the beginning of this discussion. Of course such functionality has to be integrated with the language, but it also should be extensible to user-defined types (e.g. structs which contain unions). So I guess it should be some kind of unsafe trait, with a magic derive with which the compiler would generate code for zeroing out padding bytes. It would introduce some runtime overhead, but I think it would be a better fit for Rust.
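Roughly, I imagine something along these lines (the names ZeroizePadding, zeroize_padding and as_init_bytes are placeholders, not an existing API; the hand-written impl below only shows what the derive would generate):

```rust
use core::mem::size_of;

/// Hypothetical sketch: an unsafe trait whose implementors promise that
/// `zeroize_padding` overwrites every padding byte of `Self`, so that the
/// whole object becomes initialized memory.
unsafe trait ZeroizePadding: Sized {
    fn zeroize_padding(&mut self);

    /// Once the padding is zeroed, viewing the object as plain bytes is sound.
    fn as_init_bytes(&mut self) -> &[u8] {
        self.zeroize_padding();
        unsafe {
            core::slice::from_raw_parts(self as *const Self as *const u8, size_of::<Self>())
        }
    }
}

#[repr(C)]
struct Packet {
    tag: u8,
    // 3 bytes of padding here
    value: u32,
}

// In the proposal this impl would be generated by the derive; it is written
// by hand here only to show what the generated code would have to do.
unsafe impl ZeroizePadding for Packet {
    fn zeroize_padding(&mut self) {
        let base = self as *mut Self as *mut u8;
        // Zero the 3 padding bytes between `tag` (offset 0) and `value` (offset 4).
        unsafe { core::ptr::write_bytes(base.add(1), 0, 3) };
    }
}
```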

1 Like

Wouldn't that have overhead, though, that might be undesirable? Say I have an array of structs with a couple of hundred fields in each struct, where maybe 1/4 to 1/3 of the bytes are padding for alignment purposes (which might not be unrealistic once you get into things that aren't just interacting with hardware). Do I want/need to pay the overhead of zeroing the padding bytes before I serialize the 1020 GB array I have in memory? And by serialize, I mean start handing pointers to blocks of that memory to the OS, which will do a DMA transfer to a high-throughput I/O device. Do I care that the padding contains random bits? I might, but I also might not. Is it "Safe" if I don't care? I would argue yes.

1 Like

You can still transmute a reference to your struct, or convert it to a raw pointer *const u8 and get its length using size_of. You would be walking a very gray area, and it would indeed be nice to clarify the language behavior for such cases, but in my opinion the zeroizing approach will cover most use cases in a reliable and easy-to-reason-about way, so it would be a worthwhile addition to the language.
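For what it's worth, the uncontroversial half of that gray area can already be expressed today; what you may do with the resulting view is the open question. A rough sketch (the helper name is made up):

```rust
use core::mem::{size_of, MaybeUninit};

/// View any value as a slice of maybe-uninitialized bytes. This view itself
/// is sound; turning it into `&[u8]` or handing it to `write` while the
/// padding is still uninitialized is the part this thread is debating.
fn as_maybe_uninit_bytes<T>(value: &T) -> &[MaybeUninit<u8>] {
    unsafe {
        core::slice::from_raw_parts(value as *const T as *const MaybeUninit<u8>, size_of::<T>())
    }
}
```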

UPD: One interesting alternative could be to introduce a marker trait which would force Rust to zeroize memory before placing a struct into the memory reserved for it, so padding bytes and bits would be initialized to zero. It would violate the property "move is always a simple memcpy", but that may not be too bad, since the property would become "move is always a simple memcpy, or, for types which implement the PlacementZeroize trait, a memset(ptr, 0, n) followed by a memcpy".

Such an addition would be a good match for a trait which would force the compiler to zeroize memory after data is moved or dropped. That would be a really, really great feature for cryptographic applications, which want to reliably erase secrets from memory but cannot do it right now because users can move those secrets (i.e. even if we zeroize secrets on Drop, they stay intact in their old memory locations).
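To make the move problem concrete, a minimal illustration (names are only for illustration) of why zeroizing on Drop alone is not enough:

```rust
struct Secret([u8; 32]);

impl Drop for Secret {
    fn drop(&mut self) {
        // Volatile writes so the wipe is not optimized away.
        for b in self.0.iter_mut() {
            unsafe { core::ptr::write_volatile(b, 0) };
        }
    }
}

fn consume(secret: Secret) {
    // The move into this function is a plain memcpy: the caller's old stack
    // slot still holds the secret bytes, and Drop only runs on this copy.
    drop(secret);
}
```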

3 Likes

I have the impression that this thread is starting to run in circles a bit between the "treat uninitialized data as nondeterministic bytes" point of view and the "provide a way to zero out padding bytes" point of view.

I have yet to see a proof that either of these solutions can resolve all use cases (perf-conscious ones and security-conscious ones), so it may be the case that we ultimately need to have both solutions available.

If so, it would be great if we could somehow layer one on top of the other, i.e. have the unsafe "dump maybe-uninitialized bytes to disk" primitive, the possibly-safe "zero out padding bytes" primitive, and maybe combine them into a safe "dump binary data to disk with zeroed out padding bytes" API which is the recommended one when max performance is not a concern.
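Something like this layering, with made-up names (the safe wrapper leans on a ZeroizePadding-style trait as sketched above):

```rust
use std::io::{self, Write};
use std::mem::MaybeUninit;

/// Unsafe primitive: the caller asserts it is acceptable for the sink to see
/// arbitrary values in the uninitialized (padding) positions. Whether this is
/// allowed at all is exactly what the thread is trying to pin down.
unsafe fn write_maybe_uninit(w: &mut impl Write, buf: &[MaybeUninit<u8>]) -> io::Result<()> {
    // SAFETY: relies on the caller's contract above.
    let bytes = unsafe { std::slice::from_raw_parts(buf.as_ptr() as *const u8, buf.len()) };
    w.write_all(bytes)
}

/// Safe combination: zero the padding first, so every byte is initialized
/// before it reaches the writer. Slower, but needs no unsafe at the call site.
fn write_with_zeroed_padding<T: ZeroizePadding>(w: &mut impl Write, value: &mut T) -> io::Result<()> {
    w.write_all(value.as_init_bytes())
}
```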

1 Like

To answer your question directly: using a raw syscall (libc::syscall(__NR_write, fd, ptr, len)) will work since the kernel uses volatile loads to read the input data (it actually uses inline asm, but the effect is the same in that undef is frozen).

Any wrappers above the syscall (e.g. libc::write, File::write) will probably work in practice, but their API doesn't actually guarantee that they don't touch the input data before passing it on to the kernel (which would be UB if there is undef in the input).
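For completeness, a minimal sketch of that raw-syscall route on Linux using the libc crate (SYS_write is the crate's name for __NR_write); whether this should be the long-term API is a separate question:

```rust
use std::mem::MaybeUninit;
use std::os::unix::io::RawFd;

/// Write a possibly-uninitialized buffer via a raw write(2) syscall,
/// bypassing any user-space wrapper that might inspect the bytes first.
unsafe fn raw_write(fd: RawFd, buf: &[MaybeUninit<u8>]) -> isize {
    unsafe {
        libc::syscall(
            libc::SYS_write,
            fd as libc::c_long,
            buf.as_ptr() as *const libc::c_void,
            buf.len(),
        ) as isize
    }
}
```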

3 Likes

I understand, for any existing wrappers. New ones, though, could be written to be fully safe. Hmm.. Windows?..

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.