Background reading
Current situation
The std::io::Read trait is the primary way to do the I half of I/O, such as reading from a file. It looks like this:
pub trait Read {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize>;
    fn read_to_end(&mut self, buf: &mut Vec<u8>) -> io::Result<usize> { ... }
    fn read_to_string(&mut self, buf: &mut String) -> io::Result<usize> { ... }
    // [Other default methods…]
}
A Read impl is expected to have its read method write to the start of buf, and return the number of bytes written. A well-behaved impl does nothing else, but there is nothing stopping a buggy impl from reading from buf, or from returning an integer greater than the number of bytes actually written.
For this reason, although a typical use case of Read is to read into a newly-allocated buffer, callers cannot safely call read with an uninitialized buffer in a generic context. They have to initialize the entire buffer first, for example with zeroes. Zeroing reportedly has a 20-25% overhead in somewhat realistic benchmarks. Only with a concrete type such as fs::File, whose Read impl is known to be “well-behaved”, can we avoid this overhead.
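To make the problem concrete, here is a minimal sketch in entirely safe code. Liar and read_zeroed are hypothetical names made up for this illustration, not anything from the standard library. Liar does both of the things a buggy impl is allowed to do, and read_zeroed is the kind of defensive generic caller that pays for zeroing and refuses to trust the returned count.

```rust
use std::io::{self, Read};

/// Hypothetical misbehaving reader, for illustration only.
pub struct Liar;

impl Read for Liar {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Reads from `buf` instead of only writing to it...
        let _peek: u8 = buf.iter().fold(0u8, |acc, &b| acc ^ b);
        // ...writes nothing, and claims more bytes than `buf` can hold.
        Ok(buf.len() + 1)
    }
}

/// Hypothetical generic caller: zeroes the whole buffer up front so that
/// no `Read` impl, however buggy, can observe uninitialized memory.
pub fn read_zeroed<R: Read>(reader: &mut R, capacity: usize) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; capacity];
    let n = reader.read(&mut buf)?;
    // The returned count is not trusted: clamp it to the buffer length.
    buf.truncate(n.min(capacity));
    Ok(buf)
}
```

With a well-behaved reader such as a byte slice, read_zeroed returns just the bytes that were read. With Liar it returns the zeroes it wrote itself, which is wrong but not memory-unsafe.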
This is exactly the situation with read_to_end and read_to_string: the default implementations spend time writing zeroes, and many Read impls in the standard library override these methods to use uninitialized memory instead.
This trick only works for Vec<u8> and String. Other buffer types are out of luck. A method I just wrote taking a generic R: Read type parameter has to be unsafe, leaving it up to users to ensure that whatever Read impl they use is well-behaved.
Proposal
Edit: comments below list several reasons why this doesn’t work at all at maintaining safety.
I’d like to add a new default method to Read that calls read after zeroing, and can be overridden by Read impls to skip zeroing when they also make sure that their read method is well-behaved.
This trait method must be unsafe to implement, but not to call. So the reverse of unsafe fn. This is what unsafe trait does, but applying it to Read would be a breaking change, and introducing a new trait would not help with all the existing Read impls (including those outside of the standard library).
So we have to get a bit creative.
pub trait Read {
    // [Existing methods…]
    fn read_into_uninitialized(&mut self, buf: *mut [u8]) -> io::Result<TrustedUsize> {
        unsafe {
            let slice = &mut *buf;
            // Zero the whole buffer before handing it to `read`.
            ptr::write_bytes(slice.as_mut_ptr(), 0, slice.len());
            // `slice` is now fully initialized
            // `TrustedUsize::new` is an `unsafe fn`, so it is called through a closure.
            self.read(slice).map(|n| TrustedUsize::new(n))
        }
    }
}
#[derive(Debug, Copy, Clone)]
pub struct TrustedUsize(usize);
impl TrustedUsize {
    pub unsafe fn new(x: usize) -> Self { Self(x) }
    pub fn get(self) -> usize { self.0 }
}
We use a raw slice (which is totally a thing, thanks to generalized DST!) so that reading from it (or writing to it) requires unsafe to dereference it. Introducing a whole new TrustedUsize type is unfortunate, but I don’t know how else to “protect” against a non-zero return value from an impl that doesn’t write anything to buf (and so never needs to unsafely dereference it).
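For illustration, here is roughly how an override was meant to look. This is only a sketch under assumptions: since a method cannot be added to std's Read from outside, a hypothetical ReadUninit trait stands in for it, and SliceReader and read_once are made-up names. It needs a toolchain recent enough to have len on raw slices. Keep in mind the edit note above: the design as a whole does not maintain safety, so treat this as a picture of the intent, not something to copy.

```rust
use std::io::{self, Read};
use std::ptr;

#[derive(Debug, Copy, Clone)]
pub struct TrustedUsize(usize);

impl TrustedUsize {
    pub unsafe fn new(x: usize) -> Self { Self(x) }
    pub fn get(self) -> usize { self.0 }
}

/// Hypothetical stand-in for the proposed default method on `Read`.
pub trait ReadUninit: Read {
    fn read_into_uninitialized(&mut self, buf: *mut [u8]) -> io::Result<TrustedUsize> {
        unsafe {
            // Zero through the raw pointer, then fall back to plain `read`.
            ptr::write_bytes(buf as *mut u8, 0, buf.len());
            self.read(&mut *buf).map(|n| TrustedUsize::new(n))
        }
    }
}

/// Hypothetical well-behaved reader over a borrowed byte slice.
pub struct SliceReader<'a> {
    data: &'a [u8],
}

impl Read for SliceReader<'_> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.data.len().min(buf.len());
        buf[..n].copy_from_slice(&self.data[..n]);
        self.data = &self.data[n..];
        Ok(n)
    }
}

impl ReadUninit for SliceReader<'_> {
    // Override: skip zeroing, write only through the raw pointer,
    // and vouch for the count with `TrustedUsize::new`.
    fn read_into_uninitialized(&mut self, buf: *mut [u8]) -> io::Result<TrustedUsize> {
        let n = self.data.len().min(buf.len());
        unsafe {
            ptr::copy_nonoverlapping(self.data.as_ptr(), buf as *mut u8, n);
        }
        self.data = &self.data[n..];
        Ok(unsafe { TrustedUsize::new(n) })
    }
}

/// Hypothetical caller: one read into the spare capacity of a fresh Vec.
pub fn read_once<R: ReadUninit>(reader: &mut R, capacity: usize) -> io::Result<Vec<u8>> {
    let mut out: Vec<u8> = Vec::with_capacity(capacity);
    let raw: *mut [u8] = ptr::slice_from_raw_parts_mut(out.as_mut_ptr(), capacity);
    let n = reader.read_into_uninitialized(raw)?.get();
    // Extra check added for this sketch so that `set_len` stays in bounds.
    assert!(n <= capacity);
    unsafe { out.set_len(n) };
    Ok(out)
}
```

The caller still ends up writing unsafe for set_len, and the assert is there because nothing in the default method stops a buggy read from producing an oversized TrustedUsize.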
Details subject to bikeshed, of course.
Alternatives
Tokio has a different approach.
I’ve edited it to cross it out.