Background reading
Current situation
The std::io::Read trait is the primary way to do the I half of I/O, such as reading from a file. It looks like this:
pub trait Read {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize>;
    fn read_to_end(&mut self, buf: &mut Vec<u8>) -> io::Result<usize> { ... }
    fn read_to_string(&mut self, buf: &mut String) -> io::Result<usize> { ... }
    // [Other default methods…]
}
A Read impl is expected to have its read method write to the start of buf, and return the number of bytes written. A well-behaved impl does nothing else, but there is nothing stopping a buggy impl from reading from buf, or from returning an integer greater than the number of bytes actually written.
For this reason, although a typical use case of Read is to read into a newly-allocated buffer, callers cannot safely call read with an uninitialized buffer in a generic context. They have to initialize the entire buffer first, for example with zeroes. Zeroing reportedly has a 20-25% overhead in somewhat realistic benchmarks. Only with a concrete type such as fs::File, whose Read impl is known to be “well-behaved”, can we avoid this overhead.
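To make the hazard concrete, here is a minimal sketch of the two kinds of misbehavior described above, plus a generic caller that stays safe by zeroing first. The names Overclaiming, Snooping, and read_zeroed are made up for illustration and are not part of std.

```rust
use std::io::{self, Read};

/// Hypothetical misbehaving reader: claims to have filled the whole
/// buffer without writing a single byte.
struct Overclaiming;

impl Read for Overclaiming {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        Ok(buf.len())
    }
}

/// Hypothetical misbehaving reader: inspects the caller's buffer
/// instead of only writing to it.
struct Snooping {
    seen: Vec<u8>,
}

impl Read for Snooping {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Reads whatever the caller left in the buffer.
        self.seen.extend_from_slice(buf);
        Ok(0)
    }
}

/// Illustrative helper: read up to `cap` bytes from any reader.
/// Because the buffer is zeroed first, neither reader above can
/// observe or expose uninitialized memory.
fn read_zeroed<R: Read>(reader: &mut R, cap: usize) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; cap]; // this is the zeroing cost discussed above
    let n = reader.read(&mut buf)?;
    buf.truncate(n.min(cap)); // clamp in case the impl over-reports
    Ok(buf)
}
```

With a zeroed buffer the worst either reader can do is hand back zeroes. Had the buffer been uninitialized, Snooping would read garbage and Overclaiming would trick the caller into exposing it.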
This is exactly the situation with read_to_end and read_to_string. The default implementations spend time writing zeroes, and many Read impls in the standard library override these methods to use uninitialized memory instead.
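For reference, this is roughly what calling these methods looks like from generic code. The helper name slurp is made up for this sketch. Whether zeroing happens is decided by the concrete reader's implementation, not by the caller.

```rust
use std::io::{self, Read};

/// Illustrative helper: drain any reader into a fresh Vec<u8>.
/// `read_to_end` appends to the vector and returns the number of
/// bytes it appended.
fn slurp<R: Read>(mut reader: R) -> io::Result<(Vec<u8>, usize)> {
    let mut out = Vec::new();
    let n = reader.read_to_end(&mut out)?;
    Ok((out, n))
}
```

A byte slice implements Read, so something like slurp(&b"hello"[..]) is an easy way to try it without touching the file system.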
This trick only works for Vec<u8> and String. Other buffer types are out of luck. A method I just wrote taking a generic R: Read type parameter has to be unsafe, leaving it up to users to ensure that whatever Read impl they use is well-behaved.
Proposal
Edit: comments below list several reasons why this doesn’t work at all at maintaining safety.
I’d like to add a new default method to Read that calls read after zeroing, and can be overridden by Read impls to skip zeroing when they also make sure that their read method is well-behaved.
This trait method must be unsafe to implement, but not to call, which is the reverse of unsafe fn. This is what unsafe trait does, but applying it to Read would be a breaking change, and introducing a new trait would not help with all the existing Read impls (including those outside of the standard library).
So we have to get a bit creative.
pub trait Read {
    // [Existing methods…]

    fn read_into_uninitialized(&mut self, buf: *mut [u8]) -> io::Result<TrustedUsize> {
        unsafe {
            let slice = &mut *buf;
            // Zero the whole buffer (`ptr` here is `std::ptr`).
            ptr::write_bytes(slice.as_mut_ptr(), 0, slice.len());
            // `slice` is now fully initialized
            // A closure is needed: an `unsafe fn` cannot be passed to `map` directly.
            self.read(slice).map(|n| TrustedUsize::new(n))
        }
    }
}

#[derive(Debug, Copy, Clone)]
pub struct TrustedUsize(usize);

impl TrustedUsize {
    pub unsafe fn new(x: usize) -> Self { Self(x) }
    pub fn get(self) -> usize { self.0 }
}
We use a raw slice (which is totally a thing! Thanks, generalized DST!) so that reading from it (or writing to it) requires unsafe to dereference it. Introducing a whole new TrustedUsize type is unfortunate, but I don’t know how else to “protect” from a non-zero return value in an impl that doesn’t write anything to buf (and so doesn’t need to unsafe’ly dereference it).
Details subject to bikeshed, of course.
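To show how an override and a caller might fit together, here is a rough sketch. Since Read itself cannot be modified from outside std, it uses a stand-in extension trait. ReadUninit, SliceReader, and read_some are hypothetical names invented for this example, and it assumes a toolchain where len() is available on raw slice pointers. It only illustrates the mechanics and does not address the soundness problems mentioned in the edit note above.

```rust
use std::io::{self, Read};
use std::ptr;

#[derive(Debug, Copy, Clone)]
pub struct TrustedUsize(usize);

impl TrustedUsize {
    /// Safety: the caller promises `x` bytes were written to the start of the buffer.
    pub unsafe fn new(x: usize) -> Self { Self(x) }
    pub fn get(self) -> usize { self.0 }
}

/// Hypothetical stand-in for the proposed addition to `Read`.
pub trait ReadUninit: Read {
    fn read_into_uninitialized(&mut self, buf: *mut [u8]) -> io::Result<TrustedUsize> {
        unsafe {
            // Zero through the raw pointer, then hand out an ordinary slice.
            ptr::write_bytes(buf as *mut u8, 0, buf.len());
            let slice = &mut *buf;
            self.read(slice).map(|n| TrustedUsize::new(n))
        }
    }
}

/// Hypothetical well-behaved reader over a borrowed byte slice.
pub struct SliceReader<'a>(pub &'a [u8]);

impl<'a> Read for SliceReader<'a> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.0.read(buf)
    }
}

impl<'a> ReadUninit for SliceReader<'a> {
    // Override: skip zeroing and only ever write to the buffer.
    fn read_into_uninitialized(&mut self, buf: *mut [u8]) -> io::Result<TrustedUsize> {
        let data = self.0;
        let n = data.len().min(buf.len());
        unsafe {
            ptr::copy_nonoverlapping(data.as_ptr(), buf as *mut u8, n);
            self.0 = &data[n..];
            // Exactly `n` bytes were written, so the count can be trusted.
            Ok(TrustedUsize::new(n))
        }
    }
}

/// Illustrative caller: read up to `cap` bytes without zeroing on the caller side.
pub fn read_some<R: ReadUninit>(reader: &mut R, cap: usize) -> io::Result<Vec<u8>> {
    let mut v: Vec<u8> = Vec::with_capacity(cap);
    let raw = ptr::slice_from_raw_parts_mut(v.as_mut_ptr(), cap);
    let n = reader.read_into_uninitialized(raw)?.get();
    // Relies on the TrustedUsize contract: the first `n` bytes are initialized.
    unsafe { v.set_len(n.min(cap)) };
    Ok(v)
}
```

Readers that do not override the method fall back to the zeroing default, so the caller's code is the same either way.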
Alternatives
Tokio has a different approach.