This is a fun argument to have, but this is not the right place to be having it.
For other features, ranging from volatile
and asm
to boring old FFI, I think it's possible to formally specify their behavior, but I don't know if you can call the manner of specification "operational". You have to define a mapping between Rust Abstract Machine and the lower-level process state, and say that at certain points they're required to "sync up" to a certain extent.
For example, suppose you write some data to a buffer, then pass that buffer as an argument to a system call. The kernel obviously has to be able to read the data you wrote to the buffer. But it doesn't know about the Rust Abstract Machine; it has its own, lower-level model of process memory*. For the Rust program to behave correctly, the compiler needs to guarantee that when you perform the FFI call, the lower-level memory state contains data at the buffer's address corresponding to the data you wrote there within the Abstract Machine. The abstract and lower-level states aren't always the same; they can diverge due to compiler optimizations such as reordering writes or eliminating redundant ones. But they have to sync up when you perform the call.
But none of that matters in this case, because buffer clearing – at least in the form of C's memset_s
, the proposed secure_clear
for C++ that was mentioned, or any of the nonportable ways that C/C++ programs often do it in practice – is only best-effort anyway. That means we simply don't need to worry too much about precise guarantees.
Why is it best-effort? In C, if you call memset_s
or secure_clear
, the compiler may be forced to clear out some buffer in the lower-level memory state. But:
-
Any previous operations that operated on the buffer may have left traces of the data in registers, on the stack, etc. This isn't just a theoretical concern; in fact it almost always happens. Usually only parts of the buffer are leaked, e.g. the most recently accessed word or byte, rather than the entire thing; and usually the relevant registers or stack locations will be clobbered by other data eventually. But there are no guarantees.
-
On the more theoretical side of things, the compiler is generally within its rights to make multiple copies of the entire buffer, especially (but not only) if it's stored in a local variable whose address never escapes. In other words, a single buffer in the Rust Abstract Machine state can correspond to multiple buffers in lower-level state, and the memset will only clear out one of them. It's just that this is not usually a profitable optimization.
Despite those objections, best-effort buffer clearing is still a useful operation, because in practice it usually does reduce the amount of sensitive information left in memory. I think Rust should support it, but it should be clearly documented as best-effort.
Alternately, clearing the entire stack range rather than just one buffer – as has been discussed – would, I think, solve most of the practical leaks, but it's still playing with fire. If we implemented that natively in Rust (which it doesn't really need to be; it works fine as a library feature), I'd still be reluctant to call it more than best-effort, even if I were talking purely about 'how the implementation works today' as opposed to 'what we can guarantee forever'.
Now, if someone comes up with a design, based on Cranelift or something, that precisely tracks where sensitive data is stored and can truly guarantee when it's gone... that would be awesome, and much more elegant. Though even that wouldn't be perfect: even if the data is gone from userland's view of memory, it may be around in state only the kernel can see, e.g. swap files. Regardless, the delta from here to there is a lack of implementation, not philosophical questions about how low-level Rust is.
* Though note that the kernel's view of process memory is still several abstraction layers away from "the hardware", such as virtual memory, CPU caches, etc.