That is, it takes a mutable reference to a maybe-uninitialized buffer, and on success it gives you back a mutable reference to a subslice of that buffer which is now guaranteed to be initialized. It’s safe to call, but implementations will need internal unsafe blocks. I don’t know how to extend this to vectored reads, but I think it should take care of encapsulating the unsafety of partial buffer initialization without needing all the complexity of ReadBuf or any changes to LLVM.
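For concreteness, here is a sketch of the interface being described (the trait name is an assumption, chosen to match the example impls below):

use std::io;
use std::mem::MaybeUninit;

pub trait ReadIntoUninit {
    // Safe to call: on success, the returned subslice of `buf` is
    // guaranteed to have been initialized by the implementation.
    fn read_buf<'a>(&mut self, buf: &'a mut [MaybeUninit<u8>])
        -> io::Result<&'a mut [u8]>;
}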
Hypothetically, the safe and unsafe Read implementations from sfackler’s RFC would look like this using this interface:
impl ReadIntoUninit for MyReader {
    fn read_buf<'a>(&mut self, buf: &'a mut [MaybeUninit<u8>])
        -> io::Result<&'a mut [u8]> {
        // Fill the whole buffer with some nonsense.
        for (i, byte) in buf.iter_mut().enumerate() {
            byte.write(i as u8);
        }
        // And indicate that we've written the whole thing.
        Ok(unsafe { MaybeUninit::slice_get_mut(buf) })
    }
}
impl ReadIntoUninit for TcpStream {
    fn read_buf<'a>(&mut self, buf: &'a mut [MaybeUninit<u8>])
        -> io::Result<&'a mut [u8]> {
        unsafe {
            // We're just delegating to the libc read function, which returns an
            // `isize`: negative on error, otherwise the number of bytes read.
            let nread = libc::read(self.fd, buf.as_mut_ptr().cast::<libc::c_void>(), buf.len());
            if nread < 0 {
                Err(io::Error::last_os_error())
            } else {
                Ok(MaybeUninit::slice_get_mut(&mut buf[..nread as usize]))
            }
        }
    }
}
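A call site would then look something like this (hypothetical; `reader` is anything implementing the trait):

let mut buf = [MaybeUninit::<u8>::uninit(); 4096];
let filled: &mut [u8] = reader.read_buf(&mut buf)?;
// Only `filled` is safe to look at; the rest of `buf` may still be uninitialized.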
IIRC, the main issue with this proposal is that you can't implement read in terms of read_buf, because casting from &mut [u8] to &mut [MaybeUninit<u8>] is unsafe: through the MaybeUninit slice, a (safe) implementation could write uninitialized bytes into memory that the caller's &mut [u8] guarantees is initialized. We really want an &out abstraction for this to work properly.
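To spell the problem out, a generic read-on-top-of-read_buf adapter would have to do something like this (a sketch; the helper name is made up):

fn read_via_read_buf<R: ReadIntoUninit>(r: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    // Unsafe: a misbehaving (but safe) `read_buf` impl could write
    // `MaybeUninit::uninit()` into this slice, de-initializing memory that
    // the caller's `&mut [u8]` promises is initialized.
    let uninit: &mut [MaybeUninit<u8>] = unsafe {
        std::slice::from_raw_parts_mut(buf.as_mut_ptr().cast(), buf.len())
    };
    Ok(r.read_buf(uninit)?.len())
}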
Could this be done by extending the Read trait with a new method like this?
pub trait Read {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize>;

    // Name subject to bikeshedding. Let's ignore that unimportant detail.
    fn read_uninit<'a>(&mut self, buf: &'a mut [MaybeUninit<u8>]) -> Result<&'a mut [u8]> {
        let ptr = MaybeUninit::first_ptr_mut(buf);
        let init_buf;
        unsafe {
            // Zero the buffer so it is fully initialized before handing it to `read`.
            core::ptr::write_bytes(ptr, 0u8, buf.len());
            init_buf = MaybeUninit::slice_get_mut(buf);
        }
        let len = self.read(init_buf)?;
        Ok(&mut init_buf[..len])
    }
}
The default implementation is suboptimal because it has to zero-initialize the whole buffer up front, even though only a portion of it may end up being filled. But implementations that explicitly override read_uninit could be optimal.
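For example, an in-memory reader could override read_uninit to skip the zeroing pass entirely (a sketch, using the same nightly MaybeUninit helpers as above):

struct SliceReader<'d> { data: &'d [u8] }

impl Read for SliceReader<'_> {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize> {
        let n = self.data.len().min(buf.len());
        buf[..n].copy_from_slice(&self.data[..n]);
        self.data = &self.data[n..];
        Ok(n)
    }

    fn read_uninit<'a>(&mut self, buf: &'a mut [MaybeUninit<u8>]) -> Result<&'a mut [u8]> {
        let n = self.data.len().min(buf.len());
        // Initialize only the bytes we actually produce; no zeroing pass.
        for (dst, &src) in buf[..n].iter_mut().zip(self.data) {
            dst.write(src);
        }
        self.data = &self.data[n..];
        // Safety: the first `n` elements were just written above.
        Ok(unsafe { MaybeUninit::slice_get_mut(&mut buf[..n]) })
    }
}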
I know this is more or less what the Dropbox paper proposes and then discards. I'm not sure I fully understand Dropbox's concerns with the API. I think the read_uninit method could also return Result<(&'a mut [u8], &'a mut [MaybeUninit<u8>])>, which is basically what buf.split_at_mut(...) would give you.
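That variant's signature would be (sketch):

fn read_uninit<'a>(&mut self, buf: &'a mut [MaybeUninit<u8>])
    -> Result<(&'a mut [u8], &'a mut [MaybeUninit<u8>])>;

i.e. the initialized prefix plus the still-uninitialized tail.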
As for the vectored API: I've never used vectored I/O APIs. I'm sure they're useful, but I don't think this proposal should be abandoned just because the vectored API doesn't have an obvious solution. There are plenty of people like me who would benefit from a non-vectored API.
The Read trait is not unsafe, so we can’t assume it’s implemented as we’d expect. The implementation could read from the buffer, or return the wrong number of bytes read.
(quoting the document you linked) Since I think we'd have to make any new read_buf function be part of a new trait anyway, why not have it be an unsafe trait? Implementations inevitably need unsafe code internally, so it isn't that much of an extra requirement.
That's no obstacle to the blanket impl that would be in the stdlib, though, surely? Concretely, this compiles fine for me ...
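For reference, a standalone sketch of the kind of blanket impl under discussion, with a local stand-in for Read (a blanket impl of the real std::io::Read for all T would violate the orphan rule outside the stdlib; all names here are assumed):

use std::io;
use std::mem::MaybeUninit;
use std::slice;

pub trait Read {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize>;
}

pub unsafe trait ReadIntoUninit {
    // Contract of the unsafe trait: the returned slice is an initialized
    // prefix of `buf`, and nothing uninitialized is ever written through `buf`.
    fn read_into_uninit<'a>(&mut self, buf: &'a mut [MaybeUninit<u8>])
        -> io::Result<&'a mut [u8]>;
}

impl<T: ReadIntoUninit> Read for T {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // This cast is sound only because of the unsafe trait's contract.
        let uninit = unsafe {
            slice::from_raw_parts_mut(buf.as_mut_ptr().cast::<MaybeUninit<u8>>(), buf.len())
        };
        Ok(self.read_into_uninit(uninit)?.len())
    }
}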
Additionally, the method’s signature doesn’t provide enough guarantees for common cases. Imagine a read_exact method. It’s going to be repeatedly calling read2 with progressively smaller slices of the input buffer. For that to work, it needs to be the case that the returned slice comes from the start of the input buffer, not the middle, or end, or some other random region of memory. Since Read is not an unsafe trait, the read_exact implementation can’t make that assumption, and would have to manually check that the slice points at the right place.
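Concretely, the loop in question would look something like this (a sketch against the ReadIntoUninit interface from earlier; the function name is made up):

fn read_exact_uninit<R: ReadIntoUninit>(
    r: &mut R,
    mut buf: &mut [MaybeUninit<u8>],
) -> io::Result<()> {
    while !buf.is_empty() {
        let n = r.read_buf(buf)?.len();
        if n == 0 {
            return Err(io::ErrorKind::UnexpectedEof.into());
        }
        // This only advances correctly if the returned slice was a prefix
        // of `buf`. A safe, untrusted impl could have returned a slice into
        // any memory at all, so `n` alone proves nothing; we'd have to
        // compare pointers to be sure.
        let tmp = buf;
        buf = &mut tmp[n..];
    }
    Ok(())
}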
In addition, the default implementation has no other option than to zero out the entire buffer every time before passing it to read, which imposes a significant overhead when using a reader that doesn't override it.
I like this (it's certainly way more elegant, and causes way less churn, than a whole new kind of pointer). However, I have one concern: I'm not sure that making read_buf() safe to call is sound, because the implementation would then need to make sure not to read from the uninitialized buffer. Can this be resolved without marking read_buf() as unsafe?
The various guarantee issues seem like they'd all go away if the new extension trait were unsafe; what is the obstacle to that?
the default implementation has no other option than to zero out the entire buffer every time before passing it to read
Does there need to be a default implementation of read_into_uninit? There certainly needs to be a default implementation of Read for anything that implements the new trait, but ISTM we could get away without going the other way.
Why do you think the RFC is "stuck"? I don't see anything indicating that.
I could be wrong, but it was approved for merge almost two months ago, this was immediately followed by a report of problems with the implementation, discussion since then seems not to have resolved said problem, and the current state is "merging is blocked".
It doesn't need a default implementation if it's going to be a new trait, but then it's incompatible with anything that works with plain Read and doesn't want to require the new trait as well.
As my comment said, the issue I identified doesn't block the RFC from merging. It just needs someone with the right permissions to actually merge it.