Pre-RFC: Read::read_into_uninitialized

SimonSapin · May 14, 2017, 5:50pm

Background reading

Current situation

The std::io::Read trait is the primary way to do the I half of I/O, such as reading from a file. It looks like this:

pub trait Read {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize>;

    fn read_to_end(&mut self, buf: &mut Vec<u8>) -> io::Result<usize> { ... }
    fn read_to_string(&mut self, buf: &mut String) -> io::Result<usize> { ... }

    // [Other default methods…]
}

A Read impl is expected to have its read method write to the start of buf, and return the number of bytes written. A well-behave impl does nothing else, but there is nothing stopping a buggy impl from reading from buf, or from returning an integer greater than the number of bytes actually written.

For this reason, although a typical use case of Read is to read into a newly-allocated buffer, callers cannot safely call read with an uninitialized buffer in a generic context. It has to initialize the entire buffer first, for example with zeroes. Zeroing reportedly has a 20-25% overhead in somewhat realistic benchmarks. Only with a concrete type such as fs::File whose Read impl is known to be “well-behaved” can we avoid this overhead.

And this is exactly what read_to_end and read_to_string do. The default implementations spend time writing zeroes, and many Read impls in the standard library override these methods to use uninitialized memory instead.

This trick only works for Vec<u8> and String. Other buffer types are out of luck. A method I just wrote taking a generic R: Read type parameter has to be unsafe, leaving it up to users to unsure that whatever Read impl they use is well-behaved.

Proposal

Edit: comments below list several reasons why this doesn’t work at all at maintaining safety.

I’d like to add a new default method to Read that calls read after zeroing, and can be overridden by Read impls to skip zeroing whey they also make sure that their read method is well-behaved.

This trait method must be unsafe to implement, but not to call. So the reverse of unsafe fn. This is what unsafe trait does, but applying it to Read would be a breaking change, and introducing a new trait would not help with all the existing Read impls (including those outside of the standard library).

~~So we have to get a bit creative.~~

pub trait Read {
    // [Existing methods…]

    fn read_into_uninitialized(&mut self, buf: *mut [u8]) -> io::Result<TrustedUsize> {
        unsafe {
            let slice = &mut *buf;
            ptr::write_memory(slice.as_mut_ptr(), 0, slice.len());
            // `slice` is now full initialized
            self.read(slice).map(TrustedUsize::new)
        }
    }
}

#[derive(Debug, Copy, Clone)]
pub struct TrustedUsize(usize);

impl TrustedUsize {
    pub unsafe fn new(x: usize) -> Self { Self(x) }
    pub fn get(self) -> usize { self.0 }
}

We use a raw slice (Which is totally a thing! Thanks generalized DST!) so that reading form it (or writing to it) requires unsafe to dereference it. Introducing a whole new TrustedUsize type is unfortunate, but I don’t know how else to “protect” from a non-zero return value in an impl that doesn’t write anything to buf (and so doesn’t need to unsafe’ly dereference it).

Details subject to bikeshed, of course.

Alternatives

Tokio has a different approach.

Ekleog · May 14, 2017, 6:36pm

Hmm… What would prevent me from writing something like this?

struct MyWrongStuff {};

impl Read for MyWrongStuff {
    fn read_into_uninitialized(&mut self, buf: *mut [u8]) -> io::Result<TrustedUsize> {
        let x = SomeOKStruct {};
        x.read_into_uninitialized(buf.wrapping_offset(1)) // Now I'm returning a TrustedUsize just by using one generated by a well-behaving impl
    }
}

sfackler · May 14, 2017, 7:47pm

Rezeroing the buffer on every call by default is going to be pretty brutal performance-wise. If you have e.g. a BufferedReader over SomethingThatDoesntOverrideReadIntoUninitialized it’s going to be writing 8KB of zeroes and then reading probably way less than that.

SimonSapin · May 14, 2017, 8:04pm

Almost all impls in the standard library are well-behaved and so should override read_into_uninitialized to not do zeroing. BufferedReader in particular already has a zeroed vec![0; cap] buffer for calling the nested Read::read.

But yeah, it’s pretty heavy.

sfackler · May 14, 2017, 8:34pm

Every reasonable implementation of Read anywhere should be not reading from the buffer.

BufferedReader only has a zeroed buffer because read_into_uninitialized doesn’t already exist. If it did, we’d have to deal with the same question that everyone else would have to deal with - is it better to use read_into_uninitialized and have significantly worse behavior when interacting with code that hasn’t been updated to override read_into_uninitialize or continue using read and not take advantage of the implementations that do override it.

It seems like an unsafe trait TrustedRead {} together with specialization would be a more robust way of doing this, but that’s obviously blocked on specialization stabilizing.

Fylwind · May 15, 2017, 12:51am

This trait method must be unsafe to implement, but not to call

What would stop someone from doing this?

let buf: *mut [u8] = {
    let mut v = vec![0u8; 16];
    &mut v as &mut [u8]
    // v is dropped here
};
f.read_into_uninitialized(buf);

comex · May 15, 2017, 1:20am

It would be better to have an ‘uninitialized buffer’ trait rather than forcing every Read impl to take care of it (and use unsafe code). Something like

unsafe trait UninitializedBuffer<T> {
    fn get(&mut self) -> *mut [T];
    unsafe fn did_fill(&mut self, size: usize);
    // + safe helper methods to append, etc.
}
// example
unsafe impl<T> UninitializedBuffer<T> for Vec<T> {
    fn get(&mut self) -> *mut [T] {
        unsafe {
            slice::from_raw_parts_mut(self.as_mut_ptr().offset(self.len() as isize),
                                      self.capacity() - self.len()) as *mut [T]
        }
    }
    unsafe fn did_fill(&mut self, size: usize) {
        self.set_len(self.len() + size);
    }
}

sfackler · May 15, 2017, 1:34am

https://github.com/rust-lang/rust/pull/42002

SimonSapin · May 15, 2017, 7:57am

Ok, so my initial proposal doesn’t work at all I’ve edited it to cross it out.

@comex, How would UninitializedBuffer be used with Read?

jongiddy · May 16, 2017, 5:54am

If the read method parameter type can change, how about creating an InitializeOnReadSlice trait that provides slice-like operations, with two implementations:

A pass-through wrapper for an initialized slice which implements From<&mut [u8]>;
A wrapper for an uninitialized slice, which (partially?) initializes elements only if the slice is read, using clone of a default element.

Change read to accept mut T where T : Into<InitializeOnReadSlice> so that current caller code works.

Although all read implementations must be changed to the new signature, the unsafety is confined to the uninitialized wrapper implementation.

sfackler · May 16, 2017, 6:22am

We are not breaking every implementation of Read in existence.

zackw · May 16, 2017, 2:20pm

Please consider other “read-like” low-level APIs as well, such as recvmsg and inflate.

arielb1 · May 16, 2017, 7:36pm

I think you can save the initial proposal by using phantom lifetimes:

#![feature(unique)]

use std::marker::PhantomData;
use std::ptr::{self, Unique};

// invariant: `len` is equal to the length of the uninitialized
// buffer with the lifetime of `sticky`.
pub struct TrustedLen<'sticky> {
    len: usize,
    sticky: PhantomData<*mut &'sticky ()>
}

impl<'sticky> TrustedLen<'sticky> {
    fn get(self) -> usize { self.len }
}

pub struct UninitializedBuffer<'sticky, T> {
    ptr: Unique<T>,
    capacity: usize,
    len: usize,
    sticky: PhantomData<*mut &'sticky ()>, // IMPORTANT: invariant lifetime
}

impl<'sticky, T> UninitializedBuffer<'sticky, T> {
    // ... all non-resizing Vec methods, including `to_raw_parts`
    pub fn push(&mut self, value: T) {
        if self.len == self.capacity {
            panic!("ran out of space")
        }
        unsafe {
            let end = self.ptr.as_ptr().offset(self.len as isize);
            ptr::write(end, value);
            self.len += 1;
        }
    }

    /// important: this borrows the UninitializedBuffer for 'sticky, so you can't change the
    /// length afterwards (this can also return a &mut [T] if that is desired).
    pub fn trusted_len(&'sticky mut self) -> TrustedLen<'sticky> {
        TrustedLen { len: self.len, sticky: PhantomData }
    }

    // requirement: ptr..ptr+capacity must be modifiable memory for the duration
    // of this function
    pub unsafe fn with<F>(ptr: *mut T, len: usize, capacity: usize, f: F) -> usize
        where F: for<'s> FnOnce(&'s mut UninitializedBuffer<'s, T>) -> TrustedLen<'s>
    {
        let mut this = UninitializedBuffer {
            ptr: Unique::new(ptr),
            capacity: capacity,
            len: len,
            sticky: PhantomData,
        };
        f(&mut this).get()
    }
    
    pub fn with_vec<F>(vec: &mut Vec<T>, f: F)
        where F: for<'s> FnOnce(&'s mut UninitializedBuffer<'s, T>) -> TrustedLen<'s>
    {
        unsafe {
            // if we want to allow *shrinking* our Vec, we need to zeroize the length on a panic
            // but ATM it can only lead to leaks.
            let len_ = Self::with(vec.as_mut_ptr(), vec.len(), vec.capacity(), f);
            vec.set_len(len_)
        }
    }
}

pub fn evil() {
    let mut v = Vec::with_capacity(16);
    UninitializedBuffer::with_vec(&mut v, |buf0| {
        buf0.push("hello".to_owned());
        let mut w : Vec<String> = Vec::with_capacity(16);
        UninitializedBuffer::with_vec(&mut w, |buf1| {
            buf0.trusted_len() //~ ERROR cannot infer an appropriate lifetime for autoref
        });
        panic!("game over - {:?}", w.get(0)) // or is it?
    });
}

fn main() {
    // but this isn't evil, so it works OK.
    let mut v = Vec::with_capacity(16);
    UninitializedBuffer::with_vec(&mut v, |buf0| {
        buf0.push("hello".to_owned());
        buf0.trusted_len()
    });
    println!("{:?}", v)
}

Actually, all that complexity isn’t needed - you can just use UninitializedBuffer by itself.

Rumata · May 17, 2017, 2:46am

Isn’t it simply reading into an existing Vec?

It should be possible to have Buffer type that will be similar to Vec in having pre-allocated size and also the amount of bytes actually filled, but the difference would be that the slice is allocated on stack rather than on heap, and it doesn’t allow to change allocated amount.

arielb1 · May 17, 2017, 2:42pm

Nope, because this is agnostic to the buffer's location - it can be a pre-existing Vecs allocation, a stack buffer, a field in a struct, or anything else.

system · March 25, 2019, 8:28am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Uninitialized memory	57	10174	March 25, 2019
Reading into uninitialized buffers, yet again	11	2950	January 26, 2021
API to acquire arbitrarily initialized buffer?	17	2065	January 17, 2021
Why are the Read/Write traits so different libs	10	1454	March 25, 2019
ReadBuf as part of Rust edition 2021 libs	11	1684	June 13, 2021

Pre-RFC: Read::read_into_uninitialized

Background reading

Current situation

Proposal

Alternatives

Related topics