I think I've found a hole in BufReader's API

abonander · July 25, 2015, 1:55am

This has bitten me in the butt before and I ended up having to work around it.

I’ve run into a situation where I need to express to BufReader that I need at least X bytes in the buffer without consuming them. Currently, the only way to get BufReader to read more bytes from the underlying stream is to consume all the bytes in the buffer; this is a problem if you want more data but want to keep what you have in the buffer already.

My use-case involves searching for specific byte sequences in HTTP streams:

github.com

abonander/multipart/blob/boundary_fix/src/server/boundary.rs

use server::buf_read::CustomBufReader;

use std::cmp;
use std::borrow::Borrow;

use std::io;
use std::io::prelude::*;

use std::ptr;

/// A struct implementing `Read` and `BufRead` that will yield bytes until it sees a given sequence.
#[derive(Debug)]
pub struct BoundaryReader<R> {
    buffer: CustomBufReader<R>,
    boundary: Vec<u8>,
    search_idx: usize,
    boundary_read: bool,
    at_end: bool,
}

This file has been truncated. show original

The idea is that this Read implementation will yield bytes up until the boundary sequence, which signals the end of a string or byte-stream in the HTTP request. Then, .consume_boundary() must be called on it to move it to the start of the next field.

My problem arises when a partial read cuts off the next boundary. I can’t naively consume all the bytes in the buffer because they may or may not be part of the boundary, but I can’t get any more data unless the buffer is empty.

The only way I could fix this now is by reading bytes into a temporary buffer, but I’ve tried this before. I ended up going so far as to reimplement the functionality of BufReader as part of the BoundaryReader struct, with the additional methods I needed to make it work. The result was not pleasant to look at or work on.

I’m thinking BufRead (trait) needs a method that allows the user to express when they need more data in the buffer without having to consume it. Perhaps something like:

pub trait BufRead: Read {
    fn fill_buf_min(&mut self, min: usize) -> io::Result<&[u8]>;
}

Here, if the inner buffer does not contain at least min bytes, the implementation should perform another bulk read from the source. The user should still check if the returned buffer is long enough; if it isn’t, they can decide to retry by calling this method again.

Alternately, this can start life as an inherent method on BufReader. Then, if the functionality is desired widely enough, it can be moved to the BufRead trait.

Edit: I copied BufReader to my repo in order to implement this method on it. You can see the implementation (minus some unnecessary documentation) here:

github.com

abonander/multipart/blob/boundary_fix/src/server/buf_read.rs

use std::cmp;
use std::fmt;

use std::io::prelude::*;
use std::io;

const DEFAULT_BUF_SIZE: usize = 64 * 1024;

/// A copy of `std::io::BufReader` with additional methods needed for multipart support.
pub struct CustomBufReader<R> {
    inner: R,
    buf: Vec<u8>,
    pos: usize,
    cap: usize,
}

impl<R: Read> CustomBufReader<R> { 
    #[doc(hidden)]
    pub fn new(inner: R) -> Self {
        CustomBufReader::with_capacity(DEFAULT_BUF_SIZE, inner)

This file has been truncated. show original

After refactoring for the new semantics, the problem has been fixed, along with a couple other bugs.

bkoropoff · July 25, 2015, 6:46pm

This seems like reasonable functionality to have. The minimum value could potentially be plumbed all the way to SO_RCVLOWAT on the underlying socket if doing so is advantageous.

seanmonstar · July 25, 2015, 8:10pm

Also see this RFC: https://github.com/rust-lang/rfcs/pull/1015

abonander · July 25, 2015, 9:07pm

I did put some thought into back-compat when drafting this proposal. There’s a couple options:

Provide a default impl that just calls BufRead::fill_buf() and ignores its argument, which downstream implementors can override at their leisure. Since the API contract explicitly does not guarantee the length of the returned buffer, this wouldn’t technically be misleading.
Implement it first as an inherent method on BufReader, since that’s the struct that’s giving me problems with its buffering behavior. Later, if the method is desired widely enough, it can be moved to BufRead, optionally with the above behavior by default. This would work just fine for my purposes.

I think I’m about ready to draft an actual RFC for this. Would you like to pre-read it when it’s done?

abonander · July 25, 2015, 10:10pm

I have drafted an RFC but I have not opened a PR yet:

https://github.com/cybergeek94/rfcs/blob/master/text/0000-fill-buf-min.md

system · March 25, 2019, 8:24am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Pre-RFC: Add `BufRead::fill_buf_min()` libs	1	1252	March 25, 2019
Add `BufReader::grow_buffer`	3	682	July 14, 2022
Extend io::BufRead to read multiple lines at once libs	4	2332	August 15, 2019
Idea aboute BufReaderWriter libs	8	1572	May 23, 2022
Add `is_at_eof()` to `BufRead` trait libs	19	2724	September 2, 2021

I think I've found a hole in BufReader's API

Related topics