Add `BufReader::grow_buffer`

fintelia · April 15, 2022, 1:31pm

Right now, the BufReader type provides no guarantees about the minimum length of the slice returned by buffer / fill_buf. And contrary to the name, fill_buf doesn't actually increase the size of the internal buffer unless it is currently empty. This can be rather frustrating in cases where you want to decode directly from the buffer without adding your own second layer of buffering. A slightly contrived example:

use std::io::{BufRead, BufReader, Error, ErrorKind, Read};

// `hash` stands for some function from `&[u8]` to `u64`; it is not defined here.
struct ReadHashed<R: Read> {
    reader: BufReader<R>,
}
impl<R: Read> ReadHashed<R> {
    fn read(&mut self) -> std::io::Result<u64> {
        let buf = self.reader.fill_buf()?;
        if buf.is_empty() {
            Err(Error::from(ErrorKind::UnexpectedEof))
        } else if buf.len() >= 64 {
            let h = hash(&buf[..64]);
            self.reader.consume(64);
            Ok(h)
        } else {
            // XXX: I guess we have to do our own second level
            // of buffering here? This would be even worse if we
            // didn't know a precise number of bytes to pass to
            // read_exact.
            let mut v = vec![0; 64];
            self.reader.read_exact(&mut v)?;
            // Hash the copy: `buf` holds fewer than 64 bytes in this branch.
            Ok(hash(&v))
        }
    }
}
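
The fill_buf behavior described above can be checked with a small experiment. This is a minimal sketch, assuming the current standard library implementation (the documentation does not promise these exact lengths); `observe_fill_buf` is a name made up for illustration, and the 8-byte capacity is chosen only to keep the numbers small.

```rust
use std::io::{BufRead, BufReader, Cursor};

/// Returns the lengths of the slices `fill_buf` hands back before and after
/// part of the buffered data has been consumed.
fn observe_fill_buf() -> std::io::Result<(usize, usize)> {
    // 100 bytes of input behind an 8-byte internal buffer.
    let data = vec![7u8; 100];
    let mut reader = BufReader::with_capacity(8, Cursor::new(data));

    // The buffer starts out empty, so this call reads from the source.
    let first = reader.fill_buf()?.len();

    // Leave 2 unconsumed bytes in the buffer.
    reader.consume(6);

    // The buffer is not empty, so no further read is issued and only the
    // 2 leftover bytes are visible, although 92 more are available.
    let second = reader.fill_buf()?.len();

    Ok((first, second))
}
```

With an in-memory `Cursor` as the source, this yields `(8, 2)`: the second call exposes only the leftover bytes rather than topping the buffer back up.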

This situation could be alleviated with an additional function on BufReader (exact return type TBD):

impl<R: Read> BufReader<R> {
    /// Increase the size of the internal buffer if
    /// there is more input and it isn't already full.
    fn grow_buffer(&mut self) -> std::io::Result<usize> {
        if self.buffer().len() == self.buf.len() {
            return Ok(0); // internal buffer is full
        }

        // Actual code has to deal with uninitialized memory,
        // but this is the general idea:

        // Move the unconsumed bytes to the front to make room behind them.
        if self.pos != 0 {
            self.buf.copy_within(self.pos..self.cap, 0);
            self.cap -= self.pos;
            self.pos = 0;
        }
        // Read new data after the unconsumed bytes.
        let n = self.inner.read(&mut self.buf[self.cap..])?;
        self.cap += n;
        Ok(n)
    }
}
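
Since BufReader's fields are private, the idea can only be tried outside std on a separate type. Below is a minimal sketch under that assumption: `GrowableReader` and `fill_at_least` are hypothetical names, not std API, the field names simply mirror the pseudocode, and the buffer is zero-initialized to sidestep the uninitialized-memory handling that a real implementation would need.

```rust
use std::io::{self, Read};

/// Hypothetical stand-in for `BufReader` with the proposed extra method.
pub struct GrowableReader<R> {
    inner: R,
    buf: Box<[u8]>, // fixed capacity, zero-initialized for simplicity
    pos: usize,     // start of the unconsumed data
    cap: usize,     // end of the valid data
}

impl<R: Read> GrowableReader<R> {
    pub fn with_capacity(capacity: usize, inner: R) -> Self {
        Self { inner, buf: vec![0; capacity].into_boxed_slice(), pos: 0, cap: 0 }
    }

    /// Unconsumed bytes read so far, as one contiguous slice.
    pub fn buffer(&self) -> &[u8] {
        &self.buf[self.pos..self.cap]
    }

    /// Marks `amt` bytes as consumed (clamped to what is buffered).
    pub fn consume(&mut self, amt: usize) {
        self.pos = (self.pos + amt).min(self.cap);
    }

    /// Reads more input behind the unconsumed bytes. Returns the number of
    /// bytes added; 0 means the buffer is full or the source reported EOF.
    pub fn grow_buffer(&mut self) -> io::Result<usize> {
        if self.buffer().len() == self.buf.len() {
            return Ok(0); // no room left
        }
        if self.pos != 0 {
            // Compact: slide the unconsumed bytes to the front.
            self.buf.copy_within(self.pos..self.cap, 0);
            self.cap -= self.pos;
            self.pos = 0;
        }
        let n = self.inner.read(&mut self.buf[self.cap..])?;
        self.cap += n;
        Ok(n)
    }
}

/// Hypothetical helper: grow until at least `min` bytes are buffered.
/// Returns `false` if the buffer filled up or the input ended first.
pub fn fill_at_least<R: Read>(r: &mut GrowableReader<R>, min: usize) -> io::Result<bool> {
    while r.buffer().len() < min {
        if r.grow_buffer()? == 0 {
            return Ok(false);
        }
    }
    Ok(true)
}
```

A caller that needs 64 contiguous bytes could call `fill_at_least(&mut reader, 64)` and then work on `reader.buffer()` in place, with no second buffer. The sketch does not retry on `ErrorKind::Interrupted`, which a production version would probably want to handle.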

jkugelman · April 15, 2022, 2:15pm

By "increase the size of the internal buffer", I think you mean "read more data into the internal buffer", yes?

A BufReader's buffer isn't intended to grow like a Vec does. It's a fixed size array with a certain capacity.

If you want to consume 64 bytes at a time you don't need to—and shouldn't—access the buffer directly. fill_buf and consume are low-level methods that one shouldn't normally need to call. You can instead have all the reads use read_exact and not worry about the state or size of the buffer at all.

impl<R: Read> ReadHashed<R> {
    fn read(&mut self) -> std::io::Result<u64> {
        let mut v = vec![0; 64];
        self.reader.read_exact(&mut v)?;
        Ok(hash(&v))
    }
}

This means that ReadHashed doesn't even need to require the reader be a BufReader. It could very well just operate directly on the original R: Read object and let the user decide if buffering is required. That way if they've already got a BufReader it wouldn't be wrapped in another layer of buffering by ReadHashed.

struct ReadHashed<R: Read> {
    reader: R,
}
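
Put together, the read_exact approach might look like the following minimal sketch. Two things here are assumptions rather than part of the posts above: `hash` is filled in with std's `DefaultHasher` purely as a placeholder, and a stack array replaces the `vec!` to avoid the heap allocation, which only works because the block size is a compile-time constant.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;
use std::io::{self, Read};

/// Placeholder for the thread's undefined `hash` function.
fn hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(bytes);
    hasher.finish()
}

/// Works with any reader; the caller decides whether to wrap it in a
/// `BufReader` first.
struct ReadHashed<R: Read> {
    reader: R,
}

impl<R: Read> ReadHashed<R> {
    /// Reads exactly 64 bytes and returns their hash. Fails with
    /// `ErrorKind::UnexpectedEof` if fewer than 64 bytes remain.
    fn read(&mut self) -> io::Result<u64> {
        let mut block = [0u8; 64]; // stack buffer, no heap allocation
        self.reader.read_exact(&mut block)?;
        Ok(hash(&block))
    }
}
```

The 64 bytes are still copied out of whatever buffer the underlying reader uses, so this removes the allocation but not the copy.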

fintelia · April 15, 2022, 2:58pm

A BufReader's buffer isn't intended to grow like a Vec does. It's a fixed size array with a certain capacity.

I'm not imagining growing the buffer beyond the initially created capacity. Rather, if there are currently only 1 or 2 bytes left in the internal buffer, I'd like to be able to peek further into the stream without consuming those bytes first.

If you want to consume 64 bytes at a time you don't need to—and shouldn't—access the buffer directly. fill_buf and consume are low-level methods that one shouldn't normally need to call. You can instead have all the reads use read_exact and not worry about the state or size of the buffer at all.

The example is a bit contrived, but the point I'm trying to get at here is wanting to avoid the extra allocation + copy and be able to handle cases where the number of bytes to consume might not be so predictable.

Or viewed differently: I'd like to implement a type basically with the same functionality as BufReader. It should read big chunks from its input without necessarily knowing in advance the total input length, should be able to operate on the un-consumed data read so far in a single contiguous buffer, and should periodically consume prefixes of the input to pass along to a user.

One option (as you point out) is to simply implement this functionality myself. However, increasingly this is going to look just like a re-implementation of BufReader with the one additional method I'm proposing.

