I think I've found a hole in BufReader's API


#1

This has bitten me in the butt before and I ended up having to work around it.

I’ve run into a situation where I need to express to BufReader that I need at least X bytes in the buffer without consuming them. Currently, the only way to get BufReader to read more bytes from the underlying stream is to consume all the bytes in the buffer; this is a problem if you want more data but want to keep what you have in the buffer already.

My use-case involves searching for specific byte sequences in HTTP streams:

The idea is that this Read implementation will yield bytes up until the boundary sequence, which signals the end of a string or byte-stream in the HTTP request. Then, .consume_boundary() must be called on it to move it to the start of the next field.

My problem arises when a partial read cuts off the next boundary. I can’t naively consume all the bytes in the buffer because they may or may not be part of the boundary, but I can’t get any more data unless the buffer is empty.

The only way I could fix this now is by reading bytes into a temporary buffer, but I’ve tried this before. I ended up going so far as to reimplement the functionality of BufReader as part of the BoundaryReader struct, with the additional methods I needed to make it work. The result was not pleasant to look at or work on.

I’m thinking BufRead (trait) needs a method that allows the user to express when they need more data in the buffer without having to consume it. Perhaps something like:

pub trait BufRead: Read {
    fn fill_buf_min(&mut self, min: usize) -> io::Result<&[u8]>;
}

Here, if the inner buffer does not contain at least min bytes, the implementation should perform another bulk read from the source. The user should still check if the returned buffer is long enough; if it isn’t, they can decide to retry by calling this method again.

Alternately, this can start life as an inherent method on BufReader. Then, if the functionality is desired widely enough, it can be moved to the BufRead trait.

Edit: I copied BufReader to my repo in order to implement this method on it. You can see the implementation (minus some unnecessary documentation) here:

After refactoring for the new semantics, the problem has been fixed, along with a couple other bugs.


#2

This seems like reasonable functionality to have. The minimum value could potentially be plumbed all the way to SO_RCVLOWAT on the underlying socket if doing so is advantageous.


#3

Also see this RFC: https://github.com/rust-lang/rfcs/pull/1015


#4

I did put some thought into back-compat when drafting this proposal. There’s a couple options:

  • Provide a default impl that just calls BufRead::fill_buf() and ignores its argument, which downstream implementors can override at their leisure. Since the API contract explicitly does not guarantee the length of the returned buffer, this wouldn’t technically be misleading.

  • Implement it first as an inherent method on BufReader, since that’s the struct that’s giving me problems with its buffering behavior. Later, if the method is desired widely enough, it can be moved to BufRead, optionally with the above behavior by default. This would work just fine for my purposes.

I think I’m about ready to draft an actual RFC for this. Would you like to pre-read it when it’s done?


#5

I have drafted an RFC but I have not opened a PR yet:

https://github.com/cybergeek94/rfcs/blob/master/text/0000-fill-buf-min.md