So I just learned that there are algorithms that involve reading from uninitialized memory. See here for an example. It doesn’t seem like it’s possible to implement this data structure in Rust because reading uninitialized bytes is always undefined behaviour. Could we maybe add an intrinsic which “initializes” a &mut [u8] by setting it to arbitrary bytes? (This would in practice be a no-op).
I don’t have a particular need for this, I just hate to think that this cool little data structure is unimplementable in Rust for such an easily-fixable reason.
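For what it's worth, here is a minimal sketch of the sparse-set layout that can be written in safe Rust today, assuming you accept zero-filling the `sparse` array up front. That gives up the skipped initialization, which is the whole point of the trick, but keeps O(1) insert, lookup and clear. The names `SparseSet`, `dense` and `sparse` are my own and not from the linked article.

```rust
/// Sparse set over the universe 0..capacity.
/// Assumption: `sparse` is zero-filled on construction, so no
/// uninitialized memory is ever read. The original trick skips this step.
struct SparseSet {
    dense: Vec<usize>,  // members, packed in insertion order
    sparse: Vec<usize>, // sparse[v] = index of v in `dense`, if v is a member
}

impl SparseSet {
    fn new(capacity: usize) -> Self {
        SparseSet {
            dense: Vec::with_capacity(capacity),
            sparse: vec![0; capacity],
        }
    }

    /// v is a member only if its sparse slot points at a live dense slot
    /// that points back at v. Stale or arbitrary slot values fail this check.
    fn contains(&self, v: usize) -> bool {
        match self.sparse.get(v) {
            Some(&i) => i < self.dense.len() && self.dense[i] == v,
            None => false,
        }
    }

    /// Returns true if v was newly inserted.
    fn insert(&mut self, v: usize) -> bool {
        if v >= self.sparse.len() || self.contains(v) {
            return false;
        }
        self.sparse[v] = self.dense.len();
        self.dense.push(v);
        true
    }

    /// O(1) clear: leftover entries in `sparse` are harmless.
    fn clear(&mut self) {
        self.dense.clear();
    }
}
```

The cross-check in `contains` is what lets the original algorithm tolerate garbage in `sparse`; here it only has to tolerate stale values left over after `clear`.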
It seems to me that this proposal would open a big security hole. If the sparse array is allocated on the heap, as seems likely, and that space formerly contained your credit card numbers and login username/password information for your various accounts, you’d probably be very unhappy that that information had just been exposed to the user whose program used this technique.
This is the job of the OS, not the language. For example, the Linux kernel zeroes pages allocated for user space by default (see the mmap MAP_UNINITIALIZED flag), unless you are on an embedded platform with certain kernel compile-time configurations.
Rustc keeps track of initialization to avoid undefined behavior bugs – a matter of memory safety, not security.
I think technically &mut T is supposed to point to an already-initialized T… this doesn’t compile. You’re thinking of &out T (or whatever else you call it), but we don’t have that yet.
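As a point of reference, later Rust versions added `std::mem::MaybeUninit`, which covers the "write before you read" half of this without needing `&out T`. A minimal sketch follows; `make_buffer` is a hypothetical helper of mine. Note that it does not make reading uninitialized bytes defined: every byte is written through a raw pointer before `assume_init` is called.

```rust
use std::mem::MaybeUninit;

/// Fills a 4-byte buffer without zeroing it first.
fn make_buffer(fill: u8) -> [u8; 4] {
    // The storage starts out uninitialized; no reference to its contents exists.
    let mut buf: MaybeUninit<[u8; 4]> = MaybeUninit::uninit();
    let p = buf.as_mut_ptr() as *mut u8;
    unsafe {
        // Write every byte through the raw pointer.
        for i in 0..4 {
            p.add(i).write(fill);
        }
        // Sound only because all 4 bytes were written above.
        buf.assume_init()
    }
}
```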
Who said that all allocations come directly from mmap? It’s not unheard of to have, say, malloc work by putting memory obtained from sbrk in a heap managed by the C library, and reusing memory that was previously freed. I haven’t looked at how exactly memory allocation is done in Rust, but if it goes through a similar intermediary (and I believe it does), then making sure that deallocated sensitive data is inaccessible does become a job of the language.
AFAIK the unsafe code guidelines aren’t anywhere near settled on this point, but making it UB is certainly the most conservative option. Besides, many reasonable notions of “undefined bytes” that allow some useful operations (e.g. memcpy of such values) still behave weirdly if you try to do any sort of computation on them. For example, in LLVM every use of undef can be a different arbitrary byte pattern, and with the proposed poison, it’s UB to branch on a value derived from poison (whereas with undef it would non-deterministically pick one branch). And this is not because LLVM is being antagonistic: giving stronger guarantees would severely curtail some desirable optimizations.
Trying to carve out a well-defined subset of operations on “undefined values” is fraught with issues, so it’s prudent to make it UB.
Related but distinct: something is needed to deal with padding bytes, but they could very well be special-cased. Even though we certainly want to be able to initialize a struct by initializing each field separately (and thus leaving padding uninitialized), uninitialized bytes in padding could be treated differently from uninitialized bytes that are not padding.
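To make the padding case concrete, here is a small sketch using `MaybeUninit` and `addr_of_mut!` from later Rust versions. `Padded`, `padding_bytes` and `init_fieldwise` are made-up names, and the sizes in the comments assume a target where `u32` is 4-byte aligned.

```rust
use std::mem::{size_of, MaybeUninit};
use std::ptr::addr_of_mut;

#[repr(C)]
struct Padded {
    tag: u8,    // offset 0, followed by 3 padding bytes (assuming align_of::<u32>() == 4)
    value: u32, // offset 4
}

/// Bytes of the struct that belong to no field.
fn padding_bytes() -> usize {
    size_of::<Padded>() - size_of::<u8>() - size_of::<u32>()
}

/// Initializes each field separately; the padding bytes are never written.
fn init_fieldwise(tag: u8, value: u32) -> Padded {
    let mut slot = MaybeUninit::<Padded>::uninit();
    let p = slot.as_mut_ptr();
    unsafe {
        // addr_of_mut! yields field pointers without creating a reference
        // to the not-yet-initialized struct.
        addr_of_mut!((*p).tag).write(tag);
        addr_of_mut!((*p).value).write(value);
        // Every field is initialized; only padding is left untouched.
        slot.assume_init()
    }
}
```

The struct is fully usable after `init_fieldwise`, yet its padding bytes hold whatever was in that memory before, which is why they need their own rule.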
This is precisely my concern. libstd/heap.rs contains four related memory deallocation/reallocation functions that each seem to have the potential to leak prior contents unless the released memory is overwritten as part of the release process. Here is the relevant code:
Replacing the standard underlying system allocator with a custom one that always block-overwrites released memory would be overkill. What seems desirable is some sort of indicator, perhaps via a new ZST, that specific structures (e.g., a specific vector containing sensitive information) should have their prior memory cleared whenever they are dropped or reallocated.
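Short of a language-level marker, something close to this can be approximated in a library with a wrapper type whose `Drop` impl wipes the buffer. This is only a sketch under assumptions: `ClearOnDrop` and `wipe` are hypothetical names, and it uses volatile stores because plain stores to memory that is about to be freed could be removed as dead stores.

```rust
use std::ptr;
use std::sync::atomic::{compiler_fence, Ordering};

/// Hypothetical wrapper: owns a byte buffer and zeroes it before release.
struct ClearOnDrop {
    bytes: Vec<u8>,
}

/// Overwrites every byte with zero using volatile stores.
fn wipe(buf: &mut [u8]) {
    for b in buf.iter_mut() {
        // A volatile write is not supposed to be elided by the compiler.
        unsafe { ptr::write_volatile(b, 0) };
    }
    // Keep the compiler from reordering later operations before the wipe.
    compiler_fence(Ordering::SeqCst);
}

impl ClearOnDrop {
    fn new(bytes: Vec<u8>) -> Self {
        ClearOnDrop { bytes }
    }

    fn as_slice(&self) -> &[u8] {
        &self.bytes
    }
}

impl Drop for ClearOnDrop {
    fn drop(&mut self) {
        // Runs before the Vec hands its buffer back to the allocator.
        wipe(&mut self.bytes);
    }
}
```

This only wipes the live `len()` bytes at drop time. It does nothing about copies left behind by earlier reallocations of the `Vec` or about spare capacity, so the reallocation half of the problem would still need allocator or language support.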
I have been unable to find the code for alloc_system::System, and thus unable to convince myself that this would be true for all ports of Rust, particularly for embedded ports to the various SoCs used in IoT.
Just to play devil’s advocate in the hope of learning something:
Why wouldn’t a sufficiently aware compiler treat writing zeros to memory that is immediately freed as dead stores, which it is free to optimize away?
With a side order of: does any compiler actually do that? How could you tell? It’s not something that you can write a test case for.