Poor naming of read()/write() I/O methods

Can you have must_use on an integer in the return, or would it need to be wrapped?

A bug made it to production in our Rust code at my work, caused by using read instead of read_exact. It was especially tricky because the method was called on a BufReader, and it worked fine except when the read happened to cross the end of the BufReader’s internal buffer. So the bug was triggered very rarely, and wasn’t caught in our automated or manual testing.

(The code originally called read directly on a File. Since it was reading only two bytes into a [u8; 2] buffer, it might even have been a semi-reasonable assumption that it would never result in a partial read. But when we later wrapped the File in a BufReader to improve performance, it violated this assumption which was buried elsewhere in the code.)

I was also thinking recently that read and write should have less-innocuous names. And we should use read_exact and write_all in docs wherever possible.

When sample code has to use read or write, it should always show how to use the return value. I fixed some documentation a while ago, but it looks like I missed some, like this one.

19 Likes

Currently there's no way to add must_use to the usize it returns. #[must_use] on a function applies to the return type as a whole (so here the Result), and #[must_use] on usize itself would wreck havoc.

However, somehow adding a must-use-like warning in the compiler about the incomplete length read/written would be helpful here.

3 Likes

Minor correction: The Read method is named read_exact, not read_all.

1 Like

My experience is exactly the opposite; I find that sockets, pipes, stdio, and types like BufReader are much more likely to return partial reads. For example, if you are trying to read a four-byte value from a socket, but its bytes have been split across a packet boundary.

In contrast, at least on the platforms and filesystems that I target, I've never seen File::read or File::write fail to process the maximum number of bytes. So I'd say this problem is mostly about non-File types.

7 Likes

I agree that it's a footgun. It would be nice to have a clippy lint that detects cases where the count returned by read/write is discarded. Renaming and deprecating an std method is not fast, and adding a clippy lint would be immediately benefitial. I'm not sure how easy it is, though: in addition to simple cases like io.write(slice)?;, you can also write if let Err(err) = io.write(slice) or match against Ok(_), etc.

7 Likes

I think the "correct" thing to do here would be to add more #[must_use] (and the required plumbing to make it work), so it'd be something like

fn read(&mut self, but: &mut [u8]) -> Result<#[must_use] usize>;

Or, without adding a new location for #[must_use], make

#[must_use]
fn read(&mut self, but: &mut [u8]) -> Result<usize>;

work such that the "must use" is applied "through" the Result as it's already #[must_use].

Of course, clippy can probably add this lint easier than rustc proper.

5 Likes

Right. I was a bit confused, and was thinking of read_to_end. My point was just that there's no such thing as reading to the end of many types, but I didn't understand it was about reading to the end of the buffer.

2 Likes

I think clippy already has this lint: https://rust-lang.github.io/rust-clippy/master/#unused_io_amount

6 Likes

It's great that this lint exists, but I somehow managed to make this mistake and haven't seen this lint (I'm probably not running Clippy often enough).

Could this be helped further? e.g. can this lint be added as a warning in rustc itself?

3 Likes

Is changing cargo to run clippy by default, at least for newly created projects, out of the question?

If there's a desire to run clippy by default, we might as well just uplift specific lints into the compiler proper (which I think we might want to do for some of the uncontroversial lints).

7 Likes

A compounding problem is that the docs don't help either. For example, when std::net::TcpStream implements std::io::Read, it does so without documentation. So when rustdoc renders the docs, it will use the generic documentation from the Read trait (which obviously don't include any specific info on TcpStream).

(Also, in current rustdoc methods like read() and write() are tucked away under a collapsed "hidden undocumented" section, which is how rustdoc signals "you should already know the method names in this trait" and what they do)

I know that your main comment was on the method names, but do you think that mistakes could be reduced if we had some better documentation? (I think we might also need rustdoc enhancements to allow trait implements to augment the method docs, instead of simply replacing them, which is their only option today)

3 Likes

If it was literally just running Clippy, then I wouldn't like that, because there are Clippy lints that are purely about idiomatic style, or approximate checks that may have false positives, so it would make a worse signal/noise ratio of the compiler warnings.

The way I understand it is that rustc warnings are about serious issues and warnings that are problems with high probability. Warnings about issues that are too subjective or too imprecise are delegated to Clippy.

In this case misused read()/write() call is not merely non-idiomatic code, but most certainly a bug, so it could be graduated from being a lint to a warning.

2 Likes

There also seems to be a small parallel with the recent #[must_bind] thread. In that thread, it was observed that not binding a guard is often wrong in subtle ways (and #[must_use] doesn't really help). In this thread, the #[must_use] on the return from Read::read() also doesn't really help, because it's often wrong to ignore the usize within the Ok variant.

4 Likes

Another thing that read and write don't do that other methods do is handle ErrorKind::Interrupted, which you'll probably never encounter until you do.

6 Likes

I'm pretty sure naming comes from libc's read/write so I don't really see problem there. Unless you're concerned about lazy users who doesn't read docs I do not see anything poor with this naming. Renaming methods just for the sake of it is pretty much pointless effort.

io traits should provide comprehensive documentation for user to understand each part of interface

To be completely fair to the users here: many uses of the Read/Write traits are done from concrete types, which on their documentation page just show that they implement Read::read/Write::write without any documentation. (Because the documentation is provided only on the trait page and not the concrete type's page.)

I agree that it would be quite beneficial for any "root" Read/Write type to provide specific documentation on these methods for whether and when they can provide partial reads/writes. I don't think a method rename is needed (or really all that desirable) here, and this can be handled by targeted docs on the impls.

1 Like

In my case it's not laziness or not reading the docs. I'm fully aware of these methods and the pitfall, and yet from time to time I forget and make this mistake, because nice and simple read/write names sound like what I want to do. They sound correct. They don't stand out in the code as doing something else than "simply" read/write.

I've mentioned earlier that I suspect it's a homage to libc::read/write, but I've never used them. For networking I always try to use anything higher-level (exactly because there are so many sharp edges), and for files there's fread/fwrite (not fread_exact/fwrite_all).

I wanted to say that other languages don't have that footgun, but Python actually does. It even has examples in the documentation that don't check if the data was written! OTOH in Node.js write() always writes everything.

11 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.