There's been a lot of discussion recently about uninitialized memory in Rust, as it relates to the IO changes and more generally. Quoting from a recent RFC:
Exactly what is guaranteed by safe Rust code is not entirely clear. There are some clear baseline guarantees: data-race freedom, memory safety, type safety. But what about cases like reading from an uninitialized, but allocated slice of scalars? These cases can be made memory and typesafe, but they carry security risks.
In particular, it may be possible to exploit a bug in safe Rust code that causes that code to reveal the contents of memory.
Consider the `std::io::Read` trait:
    pub trait Read {
        fn read(&mut self, buf: &mut [u8]) -> Result<usize>;

        fn read_to_end(&mut self, buf: &mut Vec<u8>) -> Result<()> { ... }
    }
The `read_to_end` convenience function will extend the given vector's capacity, then pass the resulting (allocated but uninitialized) memory to the underlying `read` method.
While the `read` method may be implemented in pure safe code, it is nonetheless given read access to uninitialized memory. The implementation of `read_to_end` guarantees that no UB will arise as a result. But nevertheless, an incorrect implementation of `read` -- for example, one that returned an incorrect number of bytes read -- could result in that memory being exposed (and then potentially sent over the wire).
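To make the failure mode concrete, here is a minimal sketch in purely safe code. The `OverReportingReader` type and the `leaked_view` function are hypothetical names made up for illustration, and a buffer holding stale bytes stands in for uninitialized memory (safe code cannot actually read uninitialized memory). The reader writes only what it has, but reports that it filled the whole buffer:

```rust
use std::io::{self, Read};

/// Hypothetical reader with a bug: it copies what it has, but claims to
/// have filled the entire buffer.
struct OverReportingReader {
    data: &'static [u8],
}

impl Read for OverReportingReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.data.len().min(buf.len());
        buf[..n].copy_from_slice(&self.data[..n]);
        self.data = &self.data[n..];
        // BUG: should be `Ok(n)`.
        Ok(buf.len())
    }
}

/// Returns the bytes a trusting caller would treat as freshly read.
fn leaked_view() -> Vec<u8> {
    // Assumption: stale contents stand in for allocated-but-uninitialized memory.
    let mut buf = b"SECRET-KEY-1234".to_vec();
    let mut reader = OverReportingReader { data: b"hi" };
    let claimed = reader.read(&mut buf).unwrap();
    // The caller trusts `claimed`, so the old bytes past the first two
    // are treated as input data.
    buf[..claimed].to_vec()
}
```

Nothing here is undefined behavior, yet a caller that forwards `leaked_view()` over the network would send the stale bytes along with the two bytes that were really read.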
That RFC, which is now closed, proposed a possibly too-radical change to address these concerns. I wanted to raise this issue in a less formal setting so that the various stakeholders can discuss it more broadly, and perhaps collectively produce a good solution.
As an important data point, zeroing in functions like `read_to_end` can have a 20-25% overhead in somewhat realistic benchmarks.
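For reference, the zeroing approach looks roughly like the sketch below. This is not the `std` implementation: `read_to_end_zeroed` is a hypothetical helper, the 4096-byte chunk size is an arbitrary assumption, and it returns the number of bytes appended purely for convenience. The `resize` call is where the extra cost comes from, since every chunk is written once with zeros before the reader writes it again:

```rust
use std::io::{self, Read};

/// Hypothetical helper: like `read_to_end`, but every byte handed to
/// `read` has been zeroed first.
fn read_to_end_zeroed<R: Read>(reader: &mut R, buf: &mut Vec<u8>) -> io::Result<usize> {
    const CHUNK: usize = 4096; // assumed chunk size
    let start = buf.len();
    loop {
        let filled = buf.len();
        // Zero-fill the new tail; this is the overhead being discussed.
        buf.resize(filled + CHUNK, 0);
        match reader.read(&mut buf[filled..]) {
            Ok(0) => {
                buf.truncate(filled);
                return Ok(filled - start);
            }
            Ok(n) => {
                // Clamp in case the reader over-reports; the worst it can
                // then expose is zeros.
                let n = n.min(CHUNK);
                buf.truncate(filled + n);
            }
            Err(ref e) if e.kind() == io::ErrorKind::Interrupted => buf.truncate(filled),
            Err(e) => {
                buf.truncate(filled);
                return Err(e);
            }
        }
    }
}
```

With this shape, a `read` implementation that returns the wrong count can only ever reveal zeros, which is the property the zeroing buys at the price of the extra pass over the buffer.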
Ideally, we would resolve the question about these IO functions by setting a policy at least at the level of `std`, if not for the language/library ecosystem as a whole. Some questions:
- Is it possible to set a policy that people will actually follow (and will not cause a mass use of `unsafe` functions to work around perf problems)?
- Is it even possible to set out a policy that can rule out a safe function like `read_to_end` while still allowing an `unsafe` variant? (And not inadvertently ruling out other things?)
- More broadly, this is all addressing just one narrow security problem. In general, what ambitions should we have to provide help here that goes beyond undefined behavior in LLVM? What are the tradeoffs, and how should we think about them?