`read_line` is a beginner footgun. How to fix it?

Stdin::read_line is very commonly used in programs written by beginners in rust. (For example, see the guessing game program in the Book.) Therefore, I think that it should be as easy to use as possible.

Code using read_line tends to be more verbose than similar code in other languages (because you need to manually create an empty string yourself before using it). Furthermore, it can be error-prone because read_line returns the string including the trailing newline, tripping up many beginners. (This issue comes up pretty frequently in the rust community discord.)

To make it worse, the documentation of Stdin::read_line, including the example, doesn't give any hints about the behavior of returning a trailing newline. It does say "For detailed semantics of this method, see the documentation on BufRead::read_line", and documents the trailing newline behavior there, but most people aren't going to click through that link.

Additionally, trimming the newline, and nothing else, can't be done conveniently. Most people end up calling str::trim or str::trim_end. This will trim not just the newline, but also any other whitespace.

To fix this situation, I'd propose doing all of the following:

  1. Add the methods Stdin::next_line and BufRead::next_line. Instead of taking a &mut String, these methods will return an io::Result<Option<String>>. This string will not have a newline, and None will be returned in the case of EOF. This behavior is similar to the pre-existing Stdin::lines and BufRead::lines.
  2. Add a method str::trim_newline, which trims either one occurrence of "\r\n" or "\n" from the end of the string.
  3. The documentation for both Stdin::next_line and BufRead::next_line should link to the newly-created methods.
  4. Change the guessing game program in the Book to use next_line.

Any thoughts?

6 Likes

I believe that Rust's current line support strikes a good balance between usability and actually avoiding footguns. I feel that implicit trimming (especially if its behavior is system-dependent) to actually be quite subtle due to the different possible endings (CRLF, LF, EOF; maybe even legacy CR if you want to go that way).The current behavior avoids losing information and allows easier manual handling.

Importantly, it also matches the POSIX definition of a line:

3.206 Line

A sequence of zero or more non-<newline> characters plus a terminating <newline> character.

All of this means that joing the result of line reads returns the input unchanged. It's a strong property.

A str helper to trim the newline may be helpful however. But even there I don't think there's an unambiguous way to do it because some people would expect system dependent behavior or not.

In general I feel that this is the classic issue of oversimplifying for beginners VS asking the developer to be explicit in the semantics they want. For beginners BufRead::lines with str::trim feels sufficient.

I'm all for calling out the presence of the newline in the doc though.

2 Likes

I think the bigger footgun might be that it requires you to clear the String, while the equivalent c++ method getline doesn't.

Rust's way is better, because it's easier and zero cost to ask the caller to clear a string than to clear it for them, and there are reasonable use cases for appending (consider reading lines until you have an entire definition in a repl, reading chunks of an ini file to be processed in parallel, ...); but it's definitely going to bite a few early converts.

Anything to point new users towards the friendlier lines() is probably a good thing.

6 Likes

Someone can improve this right now with a PR.

I've never had a need to trim only the newline, but line.trim_end_matches(['\n', '\r']) seems pretty convenient to me.

Arguably, it is a bigger footgun to allocate for each line. The current behavior of those iterators is actually pretty unfortunate.

What I think could strike a good middle ground is a fn next_line(&self, &mut String) -> Io::Result<&str> that clears the string beforehand and returns the line with the newline trimmed, but still allows for allocation reuse.

4 Likes

Why does it need to return &str (io::Result<&str>)?

Returning &str allows for trimming the newline without modifying the String itself. It's handy to have the function return a value. Though it maybe should be Option<&str> to signal EOF

2 Likes

How much is this about beginners vs other uses?

Because for beginner use, that makes me think of RFC: Add std::io::inputln() by not-my-profile · Pull Request #3196 · rust-lang/rfcs · GitHub that's talking about easier restricted-form input. That link is specifically to my brainstorming there, where I discuss it probably including things like the FromStr too, to make the guessing game that much simpler.

I don't think this part is especially error-prone; I would venture a guess that most simple uses of read_line aren't likely to need spaces or tabs preserved at the beginning or end, and would be happy with trim. For the unusual cases that don't, I don't think having to use a more elaborate incantation is likely to be an issue.

This by far seems like the most error-prone part of read_line, yes.

Pointing users towards lines would help in the case of wanting to read all of stdin or a file line by line, where the performance of not reallocating a buffer seems particularly important.

Meanwhile, for reading a single line from stdin, possibly interactively, I think something like inputln or similar would be ideal, and buffer reuse seems less critical there.

buf.next_line() is meant to do exactly buf.lines().next().transpose() though, correct? Because this new invocation is not much shorter than just buf.lines().next(). The advantage would maybe be discoverability, but the soft introduction of the iterator concept means it might be useful to leave it as is. There is definite benefit of making things like inputln extremely accessible for people writing their 1st program.

There is maybe some benefit of rallying around one right way to do it, perhaps the guessing game should use

-    let mut guess = String::new();
-
-    io::stdin()
-        .read_line(&mut guess)
-        .expect("Failed to read line");
+    let guess = io::stdin()
+        .lines()
+        .next()
+        .unwrap()
+        .expect("Failed to read line");

Edit: It could also be updated for main to return a Result, making both versions clearer. I don't know if that omission is deliberate.

What would you prefer? Iterators can't hand out references to the internal buffer, as the caller can keep the result around, which would be both unsafe and cause incorrect behavior.

I'd prefer having some kind of LendingIterator support that avoids that issue.

3 Likes

I disagree, I think it would be a better default to just allocate and return String. It is the memory allocator's job to reuse the same memory when possible. Clearing and reusing the same String is a hack where you're doing a specific memory allocation algorithm manually: it might be useful, but it doesn't need to be the default.

2 Likes

That would result in a borrow error when trying to read multiple lines at the same time, arguably a bigger footgun for beginners.

Minor caveat: even if the global allocator does a good job of reusing and making short-lived allocations cheap, it's still a dynamic call into reasonably complex allocation machinery to do so, and this is non-negligible overhead[1] that isn't necessary. Although I do agree that generally this isn't particularly impactful in most cases[2], and that Rust on average encourages playing "alloc golf" a bit more than is typically beneficial.

There's an easy to overlook inconsistency here, though, since read_line maintains the terminating newline, but lines removes the newline.

There's also the fact that calling .trim() gives you &str when for beginner-level content you probably want to stick to owning String. It's far from uncommon to see people write essentially s = s.trim().to_string() to trim String for this reason.

It's a very interesting line to play, trying to balance encouraging an alloc efficient "pit of success" without unduly penalizing simple usage that is happy using short-lived allocations to present a simpler API. This is why I'm generally[3] fond of the fs::read style of convenience functions — the object/handle manipulating API exposes the lower level "bring your own buffer" APIs and free functions wrap that into convenient one-off APIs.


  1. Especially if allocation is done by OS call, such as on Windows that uses HeapAlloc in the process-global heap, instead of using a userspace allocator (e.g. jemalloc) layer around the OS-level allocation. ↩︎

  2. Although in managed/GC languages where this style of short-lived allocation is common, there's typically a specialized allocator that makes small short-lived allocations cheaper that Rust doesn't have. ↩︎

  3. I do remain (perhaps unfairly) unfond of taking impl AsRef<Path> arguments, since that consumes ownership of owning parameter types while being unable to take advantage of that, often leading devs via compiler errors' hints to write f(path.clone()) instead of equally acceptable f(&path). If the function can't benefit from taking ownership, it would be better to take &(impl AsRef<Path> + ?Sized). And IIUC, the choice to take owned impl AsRef was made solely because a pre-1.0 contributor found it more pleasing to write fn<P: AsRef<Path>>(path: P) than fn<P: AsRef<Path> + ?Sized>(path: &P). ↩︎

7 Likes

An error at compile time is a totally different case than a runtime bug.

2 Likes

Note that many Windows API calls, HeapAlloc included, are implemented in user space, so they're not necessarily any more expensive than compiled in functions (excluding inlining), unlike Linux system calls (and more like libc).

2 Likes

HeapAlloc is particularly problematic because it takes a lock, meaning all allocations are synchronized. Every so often we get bug reports or forum issue from people discovering this when using multiple threads.

3 Likes