Add `is_at_eof()` to `BufRead` trait

YuhanLiin · May 29, 2021, 5:18am

I don't think there exists a way to check whether an instance of Read has reached EOF without consuming data from the instance. A workaround I've been using is to wrap the instance in a BufReader and calling .fill_buf().is_empty()?. This technically does consume data from the reader under the hood but does not remove the data from the BufReader, so future reads are unaffected. It'd be nice to have a provided method like fn is_at_eof(&mut self) -> Result<bool> in the BufRead trait to indicate a way of checking for EOF.

CAD97 · May 29, 2021, 4:30pm

I assume a default implementation of .fill_buf().is_empty() would be provided? Then specific (basically all) implementations would be able to override the default to avoid the underlying read if unnecessary.

Assuming as such, this fits the "simple lib addition" criteria to just be added as unstable in a PR, and we encourage anyone to submit a PR adding it if they think it'd be useful, along with reasoning about how it's useful and meaningful for the stdlib to provide. (Though search and make sure there isn't an open PR for it already.)

zackw · May 30, 2021, 7:42pm

This is a troublesome feature because of the way the underlying I/O primitives work: read only tells you that you're at EOF when you try to read past the end of the file. The proposed implementation in terms of fill_buf might avoid the footgun that C has with its feof (see "Why is while(!feof(file)) always wrong?") but it is not apparent to me, from the documentation, whether fill_buf guarantees to return an empty buffer when and only when the "inner reader" has actually signaled EOF.

Even if it does, I'd be worried about people trying to optimize out the read and getting it wrong.

YuhanLiin · May 30, 2021, 10:54pm

fill_buf definitely is not a victim to the off-by-one bug present with feof, since it always returns an empty buffer when called on an empty reader, as can be seen here. The documentation for fill_buf states that the internal buffer is filled "with more data from the inner reader if it is empty". This to me means that if the buffer is empty, then fill_buf will always try to read more data from the internal reader to fill it, so it will never return an empty buffer as long as the reader still has data remaining. Therefore the only time an empty buffer is returned is when there's an EOF.

Faulty implementations are always a concern. but I don't believe this is any "trickier" to get right than fill_buf, which you'd also have to implement when you want to roll your own BufReader impl.

H2CO3 · May 31, 2021, 8:00am

I was going to ask the exact same thing. @YuhanLiin, what do you need this for?

YuhanLiin · May 31, 2021, 1:57pm

Continuously deserializing objects from a reader.

while !file.is_at_eof()? {
    let obj = rmp_serde::decode::from_read(&mut file)?;
    // rest of the loop
}

If I don't check for EOF then I get a deserialization error, and with some serde libs it's hard to tell whether that error is caused by EOF or something else.

NoamB · May 31, 2021, 3:34pm

You can call this method with 1 byte remaining in the stream and still get an error because of EOF. This new method doesn't help you here.

mbrubeck · May 31, 2021, 5:00pm

Presumably the semantics of this reader are such that a well-formed byte stream can have EOF immediately after a valid MessagePack object, but nowhere else. So at the start of the loop, EOF is not an error, but an unexpected EOF in the middle of MessagePack parsing should still be treated as an error.

YuhanLiin · June 1, 2021, 3:04am

That's an EOF error that occurs in the middle of deserialization, which shows that the input is invalid so getting an error is actually good. If the input is valid, then after the last object is deserialized the stream should only contain EOF, in which case having an EOF check prevents the deserializer from throwing an error.

jplatte · June 3, 2021, 10:45am

What would be the advantage of .is_at_eof() instead of .fill_buf().is_empty() in the provided example? If you're not at EOF you want to read the next bytes so AFAICT it's not at all a problem that they were read into the buffer already as part of the EOF check.

mbrubeck · June 3, 2021, 4:06pm

I think the main advantages are clarity and discoverability. It was certainly not obvious to me that .fill_buf().is_empty() is the right way to check whether a BufRead is at EOF.

H2CO3 · June 3, 2021, 6:06pm

However unclear that may be, it probably is the right solution. The EOF bit can usually only be obtained from a file descriptor by trying to read from it, so unless past-the-end reading was attempted, is_at_eof() would probably return false, resulting in the classic off-by-one error.

mathstuf · June 3, 2021, 6:43pm

Hmm. What happens with BufReader on top of stream readers (say a pipe or socket) that just don't have data yet and are in a non-blocking mode?

quinedot · June 3, 2021, 6:50pm

Error with io::ErrorKind::WouldBlock.

CAD97 · June 4, 2021, 6:05am

Nobody is asking for is_at_eof() to read the EOF bit without trying to continue to read. BufRead::is_at_eof() would just be a provided method alias for .fill_buf()?.is_empty().

It's also worth noting that many users of the BufRead trait never use fill_buf or consume, the only two required methods. Instead, they interact with the trait either by forwarding (a mut ref to) it to a consumer or by the provided read_until/read_lines methods.

While fill_buf()?.is_empty() is clearly the correct general way to check if we're at the end of the read stream, it's not something that we should necessarily require users to remember to do. We can just provide the is_at_eof() alias that does the right thing.

zackw · June 4, 2021, 1:46pm

On the whole I agree with this but I do have one concern: defining is_at_eof as fill_buf()?.is_empty() means that it can block and/or fail, which is surprising for a function whose name sounds like it just tests a flag. Can we think of a name that indicates that this operation may have to do more I/O, with everything that that implies?

CAD97 · June 4, 2021, 2:09pm

check_eof() might communicate the presence of IO better than just is_at_eof(), but might also imply the simpler flag check? One thing that helps is that we return io::Result<bool>, which does indicate the presence of a fallible IO operation.

Plus, I think the optimal general implementation would probably be is_empty() && fill_buf()?.is_empty(), which reduces the need for IO calls to just the case when the internal buffer has been exhausted. It obviously doesn't eliminate the need, but limits it. (Which may actually be detrimental, since it'd block more irregularly? )

YuhanLiin · June 4, 2021, 7:18pm

In my PR and the previous PR for this issue, naming was also an issue (specifically, names containing "eof" were frowned upon), so suggestions are welcome.

Calling is_empty() doesn't reduce IO calls since fill_buf() only calls the internal reader if the buffer is empty. Also, is_empty() isn't a part of the BufRead trait. The only way to access the internal buffer of a BufRead instance is fill_buf().

CAD97 · June 4, 2021, 9:10pm

D'oh, sorry, I overlooked that

system · September 2, 2021, 9:11pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Pre-RFC: Add `BufRead::fill_buf_min()` libs	1	1258	March 25, 2019
Add {BufWriter,BufReader}::buffer() to BufRead trait	4	524	March 22, 2021
Add `BufReader::grow_buffer`	3	696	July 14, 2022
read_exact and EOF libs	5	4764	March 25, 2019
Pre-RFC: Read::read_into_uninitialized libs	15	2316	March 25, 2019

Add `is_at_eof()` to `BufRead` trait

Related topics