Add `is_at_eof()` to `BufRead` trait

I don't think there exists a way to check whether an instance of Read has reached EOF without consuming data from the instance. A workaround I've been using is to wrap the instance in a BufReader and calling .fill_buf().is_empty()?. This technically does consume data from the reader under the hood but does not remove the data from the BufReader, so future reads are unaffected. It'd be nice to have a provided method like fn is_at_eof(&mut self) -> Result<bool> in the BufRead trait to indicate a way of checking for EOF.

3 Likes

I assume a default implementation of .fill_buf().is_empty() would be provided? Then specific (basically all) implementations would be able to override the default to avoid the underlying read if unnecessary.

Assuming as such, this fits the "simple lib addition" criteria to just be added as unstable in a PR, and we encourage anyone to submit a PR adding it if they think it'd be useful, along with reasoning about how it's useful and meaningful for the stdlib to provide. (Though search and make sure there isn't an open PR for it already.)

This is a troublesome feature because of the way the underlying I/O primitives work: read only tells you that you're at EOF when you try to read past the end of the file. The proposed implementation in terms of fill_buf might avoid the footgun that C has with its feof (see "Why is while(!feof(file)) always wrong?") but it is not apparent to me, from the documentation, whether fill_buf guarantees to return an empty buffer when and only when the "inner reader" has actually signaled EOF.

Even if it does, I'd be worried about people trying to optimize out the read and getting it wrong.

3 Likes

fill_buf definitely is not a victim to the off-by-one bug present with feof, since it always returns an empty buffer when called on an empty reader, as can be seen here. The documentation for fill_buf states that the internal buffer is filled "with more data from the inner reader if it is empty". This to me means that if the buffer is empty, then fill_buf will always try to read more data from the internal reader to fill it, so it will never return an empty buffer as long as the reader still has data remaining. Therefore the only time an empty buffer is returned is when there's an EOF.

Faulty implementations are always a concern. but I don't believe this is any "trickier" to get right than fill_buf, which you'd also have to implement when you want to roll your own BufReader impl.

I was going to ask the exact same thing. @YuhanLiin, what do you need this for?

Continuously deserializing objects from a reader.

while !file.is_at_eof()? {
    let obj = rmp_serde::decode::from_read(&mut file)?;
    // rest of the loop
}

If I don't check for EOF then I get a deserialization error, and with some serde libs it's hard to tell whether that error is caused by EOF or something else.

1 Like

You can call this method with 1 byte remaining in the stream and still get an error because of EOF. This new method doesn't help you here.

Presumably the semantics of this reader are such that a well-formed byte stream can have EOF immediately after a valid MessagePack object, but nowhere else. So at the start of the loop, EOF is not an error, but an unexpected EOF in the middle of MessagePack parsing should still be treated as an error.

3 Likes

That's an EOF error that occurs in the middle of deserialization, which shows that the input is invalid so getting an error is actually good. If the input is valid, then after the last object is deserialized the stream should only contain EOF, in which case having an EOF check prevents the deserializer from throwing an error.

What would be the advantage of .is_at_eof() instead of .fill_buf().is_empty() in the provided example? If you're not at EOF you want to read the next bytes so AFAICT it's not at all a problem that they were read into the buffer already as part of the EOF check.

1 Like

I think the main advantages are clarity and discoverability. It was certainly not obvious to me that .fill_buf().is_empty() is the right way to check whether a BufRead is at EOF.

5 Likes

However unclear that may be, it probably is the right solution. The EOF bit can usually only be obtained from a file descriptor by trying to read from it, so unless past-the-end reading was attempted, is_at_eof() would probably return false, resulting in the classic off-by-one error.

Hmm. What happens with BufReader on top of stream readers (say a pipe or socket) that just don't have data yet and are in a non-blocking mode?

Error with io::ErrorKind::WouldBlock.

2 Likes

Nobody is asking for is_at_eof() to read the EOF bit without trying to continue to read. BufRead::is_at_eof() would just be a provided method alias for .fill_buf()?.is_empty().

It's also worth noting that many users of the BufRead trait never use fill_buf or consume, the only two required methods. Instead, they interact with the trait either by forwarding (a mut ref to) it to a consumer or by the provided read_until/read_lines methods.

While fill_buf()?.is_empty() is clearly the correct general way to check if we're at the end of the read stream, it's not something that we should necessarily require users to remember to do. We can just provide the is_at_eof() alias that does the right thing.

2 Likes

On the whole I agree with this but I do have one concern: defining is_at_eof as fill_buf()?.is_empty() means that it can block and/or fail, which is surprising for a function whose name sounds like it just tests a flag. Can we think of a name that indicates that this operation may have to do more I/O, with everything that that implies?

2 Likes

check_eof() might communicate the presence of IO better than just is_at_eof(), but might also imply the simpler flag check? One thing that helps is that we return io::Result<bool>, which does indicate the presence of a fallible IO operation.

Plus, I think the optimal general implementation would probably be is_empty() && fill_buf()?.is_empty(), which reduces the need for IO calls to just the case when the internal buffer has been exhausted. It obviously doesn't eliminate the need, but limits it. (Which may actually be detrimental, since it'd block more irregularly? :grimacing:)

1 Like

In my PR and the previous PR for this issue, naming was also an issue (specifically, names containing "eof" were frowned upon), so suggestions are welcome.

Calling is_empty() doesn't reduce IO calls since fill_buf() only calls the internal reader if the buffer is empty. Also, is_empty() isn't a part of the BufRead trait. The only way to access the internal buffer of a BufRead instance is fill_buf().

D'oh, sorry, I overlooked that :sweat_smile:

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.