Poor naming of read()/write() I/O methods

I/O traits Read and Write have read() and write() methods. The problem is that it's not clear from these methods' names that they have specific behavior which doesn't guarantee reading/writing the full slice. Using these methods as if they were read_exact/write_all is a huge footgun. It's a nasty kind of bug, because depending on the underlying I/O implementation, wrong code may mostly work most of the time. I've made that mistake, I've seen others make that mistake.

These methods names look innocent, so the mistake doesn't stand out in code reviews. IMHO names of these methods should reflect their behavior better. I guess Rust's naming is supposed to reflect POSIX functions of the same name. However, "read" and "write" are very generic names, and not something that everyone will immediately understand as referring to very specific best-effort behavior of a couple of low-level system calls.

The functionality of these methods is fine and useful. It's just naming that's a problem. Partial reads/writes are a nice optimization. However, using this optimization correctly requires special care and proper integration into a larger I/O buffering algorithm, so I consider it a specialized use-case, and not something to be given to the shortest/simplest method names.

I suggest migrating away from "bare" read()/write() names. This might be done by adding an alias like read_partial(), and Clippy lints and eventually deprecation warnings that suggest using the more specific name, so that the programmer has to consciously decide whether they've meant read_exact or read_partial.

27 Likes

The logic of this sems specific to files. For sockets, pipes, stdio, etc, there is no notion of a "full" read/write.

1 Like

By "full" I mean read_all/write_all, which exists for all of these, even if it has to be implemented with multiple syscalls, etc.

I'm not objecting to behavior of read/write. I understand that there are very good reasons why these methods don't guarantee full reads/writes, and that it makes a very good I/O primitive. I'm not suggesting removing that functionality.

The issue is someone using write(slice) and mistakenly thinking it writes the content of the slice. The name of the method doesn't make its behavior clear. If this method was called write_some_but_maybe_not_all(slice), then the behavior would be clear to the programmer, and people wouldn't use it by accident where they should have used write_all.

5 Likes

I would say a clippy lint would be useful, however both read and write already return a Result that can't be easily implicitly ignored and thus make it obvious they only give/consume a set amount of bytes?

2 Likes

I don't think it's obvious they're partial, though, because it's easy to do foo.write(slice)?; and ignore the Ok value.

5 Likes

If it wasn't partial then why return the count?

And true, perhaps it can return the count wrapped in a newtype that has mustuse, that would make sense I'd think.

1 Like

Result doesn't help here.

io.write(slice)?;

is valid and used. You'd need #[must_use] on the integer it wrapped.

2 Likes

Can you have must_use on an integer in the return, or would it need to be wrapped?

A bug made it to production in our Rust code at my work, caused by using read instead of read_exact. It was especially tricky because the method was called on a BufReader, and it worked fine except when the read happened to cross the end of the BufReader’s internal buffer. So the bug was triggered very rarely, and wasn’t caught in our automated or manual testing.

(The code originally called read directly on a File. Since it was reading only two bytes into a [u8; 2] buffer, it might even have been a semi-reasonable assumption that it would never result in a partial read. But when we later wrapped the File in a BufReader to improve performance, it violated this assumption which was buried elsewhere in the code.)

I was also thinking recently that read and write should have less-innocuous names. And we should use read_exact and write_all in docs wherever possible.

When sample code has to use read or write, it should always show how to use the return value. I fixed some documentation a while ago, but it looks like I missed some, like this one.

19 Likes

Currently there's no way to add must_use to the usize it returns. #[must_use] on a function applies to the return type as a whole (so here the Result), and #[must_use] on usize itself would wreck havoc.

However, somehow adding a must-use-like warning in the compiler about the incomplete length read/written would be helpful here.

3 Likes

Minor correction: The Read method is named read_exact, not read_all.

1 Like

My experience is exactly the opposite; I find that sockets, pipes, stdio, and types like BufReader are much more likely to return partial reads. For example, if you are trying to read a four-byte value from a socket, but its bytes have been split across a packet boundary.

In contrast, at least on the platforms and filesystems that I target, I've never seen File::read or File::write fail to process the maximum number of bytes. So I'd say this problem is mostly about non-File types.

7 Likes

I agree that it's a footgun. It would be nice to have a clippy lint that detects cases where the count returned by read/write is discarded. Renaming and deprecating an std method is not fast, and adding a clippy lint would be immediately benefitial. I'm not sure how easy it is, though: in addition to simple cases like io.write(slice)?;, you can also write if let Err(err) = io.write(slice) or match against Ok(_), etc.

7 Likes

I think the "correct" thing to do here would be to add more #[must_use] (and the required plumbing to make it work), so it'd be something like

fn read(&mut self, but: &mut [u8]) -> Result<#[must_use] usize>;

Or, without adding a new location for #[must_use], make

#[must_use]
fn read(&mut self, but: &mut [u8]) -> Result<usize>;

work such that the "must use" is applied "through" the Result as it's already #[must_use].

Of course, clippy can probably add this lint easier than rustc proper.

5 Likes

Right. I was a bit confused, and was thinking of read_to_end. My point was just that there's no such thing as reading to the end of many types, but I didn't understand it was about reading to the end of the buffer.

2 Likes

I think clippy already has this lint: https://rust-lang.github.io/rust-clippy/master/#unused_io_amount

6 Likes

It's great that this lint exists, but I somehow managed to make this mistake and haven't seen this lint (I'm probably not running Clippy often enough).

Could this be helped further? e.g. can this lint be added as a warning in rustc itself?

3 Likes

Is changing cargo to run clippy by default, at least for newly created projects, out of the question?

If there's a desire to run clippy by default, we might as well just uplift specific lints into the compiler proper (which I think we might want to do for some of the uncontroversial lints).

7 Likes

A compounding problem is that the docs don't help either. For example, when std::net::TcpStream implements std::io::Read, it does so without documentation. So when rustdoc renders the docs, it will use the generic documentation from the Read trait (which obviously don't include any specific info on TcpStream).

(Also, in current rustdoc methods like read() and write() are tucked away under a collapsed "hidden undocumented" section, which is how rustdoc signals "you should already know the method names in this trait" and what they do)

I know that your main comment was on the method names, but do you think that mistakes could be reduced if we had some better documentation? (I think we might also need rustdoc enhancements to allow trait implements to augment the method docs, instead of simply replacing them, which is their only option today)

3 Likes