You cannot distinguish between \r\n and \r without external state. Solutions that involve external state do not count.
Hmm… perhaps I’m misunderstanding what you mean by “external state”. The example I just presented only has internal state, as far as I can tell (at least, that’s the only additional state compared to how stdlib already functions). The state is entirely contained within the iterator. But maybe I’m still missing your point. If so, can you point out a use-case where the solution I presented doesn’t work, but the current Lines iterator does?
In any case, as I already noted, the solution I presented depends on a specific property of the Lines iterator: it doesn’t include the line endings in its returned strings. So this doesn’t work for read_line, which does include line endings. Is that what you’re getting at?
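To make the idea concrete, here is a minimal sketch of such an iterator. The name UniversalLines and the helper finish are made up for illustration and are not part of std; the only extra state is a single flag stored inside the iterator. Because the returned strings exclude the line ending, a line can be yielded as soon as a CR is seen, and an LF that directly follows is silently dropped on the next call.

```rust
use std::io::{self, BufRead};

/// Hypothetical line iterator (not part of std) that treats LF, CRLF and a
/// lone CR as line endings. All of its state lives inside the iterator.
pub struct UniversalLines<R> {
    reader: R,
    /// True when the previous line ended with CR. An LF that directly
    /// follows belongs to that same line ending and must be skipped.
    pending_cr: bool,
}

impl<R: BufRead> UniversalLines<R> {
    pub fn new(reader: R) -> Self {
        Self { reader, pending_cr: false }
    }
}

/// Converts the collected bytes to a String, reporting invalid UTF-8 as an
/// I/O error.
fn finish(bytes: Vec<u8>) -> io::Result<String> {
    String::from_utf8(bytes).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

impl<R: BufRead> Iterator for UniversalLines<R> {
    type Item = io::Result<String>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut line: Vec<u8> = Vec::new();
        loop {
            let buf = match self.reader.fill_buf() {
                Ok(buf) => buf,
                Err(ref e) if e.kind() == io::ErrorKind::Interrupted => continue,
                Err(e) => return Some(Err(e)),
            };
            if buf.is_empty() {
                // End of input: yield a final unterminated line, if any.
                self.pending_cr = false;
                return if line.is_empty() { None } else { Some(finish(line)) };
            }
            if self.pending_cr {
                // The previous line ended in CR; swallow the LF of a CRLF.
                self.pending_cr = false;
                if buf[0] == b'\n' {
                    self.reader.consume(1);
                    continue;
                }
            }
            match buf.iter().position(|&b| b == b'\n' || b == b'\r') {
                Some(i) => {
                    let is_cr = buf[i] == b'\r';
                    line.extend_from_slice(&buf[..i]);
                    self.reader.consume(i + 1);
                    // Yield right away; never wait to see what follows a CR.
                    self.pending_cr = is_cr;
                    return Some(finish(line));
                }
                None => {
                    // No line ending in this chunk; keep accumulating.
                    let n = buf.len();
                    line.extend_from_slice(buf);
                    self.reader.consume(n);
                }
            }
        }
    }
}
```

Wrapping any BufRead in UniversalLines::new(...) and iterating should yield the same strings for "a\r\nb" as for "a\nb" or "a\rb", even when the CR and LF arrive in separate buffer fills.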
However, it’s still worth keeping in mind that most people don’t seem to realise or understand that when you wrap a reader, either with a buffer or some other kind of hidden state, you fundamentally cannot extricate that reader ever again without potentially losing information.
Indeed. And this is why creating a BufReader from a reader of some kind consumes the reader, correct? In fact, producing a Lines iterator from a BufReader consumes the BufReader as well.
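A small sketch of the information loss being described, using only std types. The function name first_line_then_rest is made up for illustration. It reads one line through a BufReader, unwraps the inner reader with into_inner, and then reads whatever the inner reader still has to offer.

```rust
use std::io::{BufRead, BufReader, Cursor, Read};

/// Hypothetical helper: returns the first line read through a BufReader,
/// plus the bytes still available from the inner reader after unwrapping.
pub fn first_line_then_rest(data: &[u8], capacity: usize) -> (String, Vec<u8>) {
    let inner = Cursor::new(data.to_vec());
    // `inner` is moved into the BufReader and cannot be used directly anymore.
    let mut buffered = BufReader::with_capacity(capacity, inner);

    let mut first = String::new();
    buffered.read_line(&mut first).unwrap();

    // into_inner hands the reader back, but any bytes that were already
    // pulled into the buffer and not yet consumed are discarded.
    let mut inner = buffered.into_inner();
    let mut rest = Vec::new();
    inner.read_to_end(&mut rest).unwrap();
    (first, rest)
}
```

With the input "ab\ncd\nef\n" and a buffer larger than the input, the first line comes back as "ab\n" while the rest is empty: the remaining two lines were sitting in the buffer when it was thrown away.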
But I guess what I’m trying to get at is that just because we can’t give perfect Unicode support here in all situations, doesn’t mean we shouldn’t give the best Unicode support we reasonably can in each case. I’m suggesting that the Lines iterator can handle CR/CRLF just fine, and so it should. Whereas read_line cannot, and so it shouldn’t. And it should be documented what limitations (if any) each has, and why.
Alternatively, we could also give read_line a boolean parameter specifying whether it should wait for the next character (or EOF) after meeting a CR or not. Then the calling code can decide whether to have fully proper Unicode support vs better behavior in an interactive/streaming setting.
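As a rough sketch of what that could look like, written as a free function rather than a change to std: the name read_line_cr_aware and the parameter wait_after_cr are hypothetical. With the flag set, the function peeks at the byte after a CR (which may block on an interactive stream) so that CRLF stays in one line. With it unset, it returns immediately after the CR, and a following LF shows up as its own one-byte line on the next call.

```rust
use std::io::{self, BufRead};

/// Hypothetical variant of read_line that also treats CR as a line ending.
/// Appends the line, including its ending, to `out` and returns the number
/// of bytes read (0 at end of input). For brevity it does not retry on
/// ErrorKind::Interrupted.
pub fn read_line_cr_aware<R: BufRead>(
    reader: &mut R,
    out: &mut String,
    wait_after_cr: bool,
) -> io::Result<usize> {
    let mut bytes = Vec::new();
    loop {
        let buf = reader.fill_buf()?;
        if buf.is_empty() {
            break; // end of input
        }
        match buf.iter().position(|&b| b == b'\n' || b == b'\r') {
            Some(i) => {
                let is_cr = buf[i] == b'\r';
                bytes.extend_from_slice(&buf[..=i]);
                reader.consume(i + 1);
                if is_cr && wait_after_cr {
                    // This peek may block until more input (or EOF) arrives.
                    if reader.fill_buf()?.first() == Some(&b'\n') {
                        bytes.push(b'\n');
                        reader.consume(1);
                    }
                }
                break;
            }
            None => {
                // No line ending in this chunk; keep accumulating.
                let n = buf.len();
                bytes.extend_from_slice(buf);
                reader.consume(n);
            }
        }
    }
    let text = String::from_utf8(bytes)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
    out.push_str(&text);
    Ok(text.len())
}
```

For the input "a\r\nb", passing true yields "a\r\n" and then "b", while passing false yields "a\r", then "\n", then "b". The second behavior never blocks after a CR, at the cost of splitting CRLF into two lines.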