Issues in new I/O


#1

There are a number of issues with the new std::io that have been discussed here and elsewhere but didn’t get a dedicated thread. I’ll try to round them up here.

  1. There must be a close operation on types backed by POSIX file descriptors to be able to prevent silent data loss after writing (there’s nothing a program can do about the descriptor after a failed close, but it must be possible to observe the error). RFC PR #770 has some suggestions; I disagree with the idea of making it a part of Write, but there may be a need for some trait-based solution as well. It’s also not clear what to do when the object is dropped without close having been called; there is talk of RAII guards, but I don’t think blocking calls and the RAII pattern mix well.

  2. The contract of flush does not fit all use cases. It’s specified as a top-to-bottom flush, but methods like into_inner on composed writers call it before returning the underlying writer that you may want to continue writing to. It may be preferred in some cases (e.g. stateful encoding writers) to only finalize output on the topmost layer.

  3. The Write trait mixes two different concerns. It’s a trait for byte-oriented output, but it is also the trait that provides flush. There is a need for output traits specialized to certain data types, most notably UTF-8 character output. Those may need flush as well, but adding flush to each of them looks like silly duplication. It seems prudent to separate flush (or a pair of “shallow” and “deep” flush methods as proposed above) into its own trait that Write and similar traits would require.
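A minimal sketch of the split suggested in item 3, assuming hypothetical trait names (`Flush`, `ByteWrite`, `TextWrite`) and an in-memory `MemSink` used only for illustration; none of these exist in std. It shows one way a byte-oriented and a text-oriented trait could share a single `flush` through a common supertrait.

```rust
use std::io;

// Hypothetical trait layout; these names are assumptions, not std APIs.
trait Flush {
    fn flush(&mut self) -> io::Result<()>;
}

// Byte-oriented output, requiring Flush instead of declaring its own flush.
trait ByteWrite: Flush {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize>;
}

// UTF-8 text output, sharing the same Flush supertrait.
trait TextWrite: Flush {
    fn write_str(&mut self, s: &str) -> io::Result<()>;
}

// In-memory sink used only to exercise the traits.
struct MemSink {
    data: Vec<u8>,
    flushes: usize,
}

impl Flush for MemSink {
    fn flush(&mut self) -> io::Result<()> {
        self.flushes += 1;
        Ok(())
    }
}

impl ByteWrite for MemSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.data.extend_from_slice(buf);
        Ok(buf.len())
    }
}

impl TextWrite for MemSink {
    fn write_str(&mut self, s: &str) -> io::Result<()> {
        self.data.extend_from_slice(s.as_bytes());
        Ok(())
    }
}

// Code that only needs to flush can be written once for every writer kind.
fn finish<W: Flush>(w: &mut W) -> io::Result<()> {
    w.flush()
}
```

With this layout, `finish` accepts any writer regardless of the data type it specializes in, which is the duplication the item is trying to avoid.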

I’d like to get some thoughts on these issues before coming up with an RFC detailing solutions.


#2

In fact, this is the only motivation I have for a “shallow flush” at the moment, and this can be addressed by changing the contract on into_inner to only finalize writing, but not flush the returned writer, with the implication that the consumer should take care of the eventual flushing.
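A rough sketch of that contract, assuming a hypothetical `Buffered` layer (this is not std’s BufWriter) and a `CountingSink` test double that records flush calls. Here into_inner pushes its own pending bytes down but never calls flush on the writer it returns.

```rust
use std::io::{self, Write};

// Test double that records how often flush is called on it.
struct CountingSink {
    data: Vec<u8>,
    flushes: usize,
}

impl Write for CountingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.data.extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        self.flushes += 1;
        Ok(())
    }
}

// Hypothetical buffering layer, for illustration only.
struct Buffered<W: Write> {
    inner: W,
    buf: Vec<u8>,
}

impl<W: Write> Buffered<W> {
    fn new(inner: W) -> Self {
        Buffered { inner, buf: Vec::new() }
    }

    // Holds bytes back until the layer is finalized.
    fn write_buffered(&mut self, bytes: &[u8]) {
        self.buf.extend_from_slice(bytes);
    }

    // Finalizes this layer only: pending bytes are written to `inner`,
    // but `inner.flush()` is left to the caller.
    fn into_inner(mut self) -> io::Result<W> {
        self.inner.write_all(&self.buf)?;
        Ok(self.inner)
    }
}
```

The caller can keep writing to the returned writer and decide when the eventual flush happens.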


Pre-RFC: Separate reading/writing String from std::io::Read/std::io::Write
#3

cc @alexcrichton


#4

Does this have something to do with StdFS Tempdir ignoring errors on drop? That sure looks like a special case of this general problem: Dropping a value, while conveniently implicit, allows no side channel through which to deploy error information when something goes wrong.

As I stated in that thread, a flexible solution would involve adding a handler callback on creation of the Writer/Reader.

Requiring close() to handle errors fails to be as convenient as RAII. Likewise requiring an explicit guard. A callback that could be an empty function to opt out of handling drop errors seems the optimum in terms of user experience.

The callback may even return some value to the drop code to decide what to do next (like abort / retry / ignore?).
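To make the idea concrete, here is a sketch under stated assumptions: `Guarded`, `Action` and the retry limit are all hypothetical names and choices, and the cleanup step is a generic closure standing in for something like removing a temporary directory. (Retrying would not be appropriate for a failed POSIX close, where the descriptor is no longer usable.)

```rust
use std::io;

// What the hypothetical handler tells the destructor to do next.
#[allow(dead_code)]
enum Action {
    Retry,
    Ignore,
    Abort,
}

// Hypothetical guard: runs `cleanup` on drop and reports failures to `handler`.
struct Guarded<C, H>
where
    C: FnMut() -> io::Result<()>,
    H: FnMut(&io::Error) -> Action,
{
    cleanup: C,
    handler: H,
}

impl<C, H> Guarded<C, H>
where
    C: FnMut() -> io::Result<()>,
    H: FnMut(&io::Error) -> Action,
{
    fn new(cleanup: C, handler: H) -> Self {
        Guarded { cleanup, handler }
    }
}

impl<C, H> Drop for Guarded<C, H>
where
    C: FnMut() -> io::Result<()>,
    H: FnMut(&io::Error) -> Action,
{
    fn drop(&mut self) {
        // Arbitrary bound (an assumption) so that drop cannot loop forever.
        const MAX_RETRIES: u32 = 3;
        let mut retries = 0;
        loop {
            let err = match (self.cleanup)() {
                Ok(()) => return,
                Err(e) => e,
            };
            match (self.handler)(&err) {
                Action::Retry if retries < MAX_RETRIES => retries += 1,
                Action::Retry | Action::Ignore => return,
                Action::Abort => std::process::abort(),
            }
        }
    }
}
```

Opting out of error handling would then be a handler that always returns `Action::Ignore`. The sketch does nothing about a panicking handler, which is the open problem discussed next.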

Now let me be the first to admit that using a callback is not without problems. For example, what should we do if the callback panics? Should we require the callback to run in its own task? No, because that would add a dependency to IO that may or may not be available depending on the system we deploy on. Also, the callback would block the drop.

I still think that this problem is inherent in the nature of any side channel for error information. If handling the error panics or fails to return, the program was wrong in the first place.


#5

@llogiq: Yes, TempDir is yet another object that needs explicit close to observe potential errors.

There is a postponed RFC PR on linear types; I wonder if that could help if File, TempDir, etc. cannot be silently dropped without it causing a compiler warning, and need to be consumed by a fn close(self) -> io::Result<()>.
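A sketch of what such a consuming close could look like in today’s Rust, without linear types. `Handle` is a hypothetical stand-in (not File or TempDir), and `fail_on_close` merely simulates a deferred write error. Note that the destructor still has to swallow the error when close was never called, which is the gap linear types would be meant to cover.

```rust
use std::io;

// Hypothetical handle; `fail_on_close` simulates a deferred write error.
struct Handle {
    fail_on_close: bool,
    closed: bool,
}

impl Handle {
    // Shared finalization step used by both close and drop.
    fn finalize(&mut self) -> io::Result<()> {
        self.closed = true;
        if self.fail_on_close {
            Err(io::Error::new(io::ErrorKind::Other, "deferred write error"))
        } else {
            Ok(())
        }
    }

    // Consumes the handle and lets the caller observe the outcome.
    fn close(mut self) -> io::Result<()> {
        self.finalize()
    }
}

impl Drop for Handle {
    fn drop(&mut self) {
        if !self.closed {
            // No way to report this from a destructor: the error is lost.
            let _ = self.finalize();
        }
    }
}
```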


#6

Let me reiterate, from an ergonomics standpoint, linear types are not the solution. Yes, they can generate a compiler error if one forgets to close(), but they don’t stop me from ignoring errors on close(). Worse, if they did, we’d just be stuck with more catch-all code that developers are going to paste all over their file handling.

We should strive to make it as simple as possible to do the right thing while making it as hard as possible to do the wrong thing. But in this case the wrong thing (ignoring errors) in one case (high-security code) can be the right thing in another (other code, e.g. irrelevant temporaries).

Thus, since we cannot make it any harder to do the wrong thing, we might as well at least make it easier to do the right thing. I believe that error handler callbacks may provide such a solution, but I’m open to other suggestions.

Note that a way to require exception-free code would go some way to make those callbacks somewhat safe.


#7

But then, that is your explicit choice, and it is evident in the code (you must consume the Result somehow).


#8

Agreed. Yet including a callback in the constructor function also makes error handling explicit; it doesn’t require linear types, and it makes for shorter code in the case where errors can be ignored while still making the other case feasible.

Python, Java and Lisp have set a good precedent here: programmers welcome block-like semantics for their resource allocations.


#9

Having special callback error handling for certain cases seems inconsistent with how other error handling works in Rust.

I’m also unsure how it would look when you do want to handle the errors, since it seems that this would optimize for the “don’t handle errors” case. Could you give an example of what you’d like it to look like?


#10

Thanks for taking the time to write this up!

Yes, this is definitely something that we’re aware of, and the RFC you linked to is the current place this is being discussed. We’ve been operating under the assumption that it will be backwards compatible to add these features at a later date, which is why we haven’t been pressing too much on them. If you have something in mind though that’s backwards-incompatible, please let us know!

I would personally expect inherent methods on “immediately buffered” objects to do something like a shallow flush. I’m not sure how much we’d benefit from having it as a trait method.

In your use case of into_inner, however, I think it’s totally fine to not do a deep flush in that case and only write the remaining buffered data.

We’ve considered a Flush trait in the past but we don’t want to go too too crazy in this respect. By moving it to a separate trait it prevents Box<Write> from being flushable, for example, and it’s much less ergonomic to pass around Box<Write + Flush> in some circumstances. Just pointing out that there is a bit of an ergonomic loss in splitting the trait.


#11

Thank you for the thorough response.

But if Write requires Flush, that means Box<Write> has all methods of Flush as well? Does it make sense to have datatype-specific write traits, like TextWriter?
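On the first question: in Rust, a trait object exposes the methods of its supertraits, so the split does not by itself make a boxed writer unflushable. A small sketch with hypothetical `Flush`/`ByteWrite` traits and a `Sink` type (all assumptions for illustration; `dyn` is the current spelling of trait object types):

```rust
use std::io;

// Hypothetical traits; not part of std.
trait Flush {
    fn flush(&mut self) -> io::Result<()>;
}

trait ByteWrite: Flush {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize>;
}

#[derive(Default)]
struct Sink {
    data: Vec<u8>,
    flushed: bool,
}

impl Flush for Sink {
    fn flush(&mut self) -> io::Result<()> {
        self.flushed = true;
        Ok(())
    }
}

impl ByteWrite for Sink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.data.extend_from_slice(buf);
        Ok(buf.len())
    }
}

// Only `dyn ByteWrite` is named, yet flush from the supertrait is callable.
fn emit(w: &mut dyn ByteWrite) -> io::Result<()> {
    w.write(b"hi")?;
    w.flush()
}
```

The same holds for `Box<dyn ByteWrite>`; what is lost is only the ability to flush through a bound that names neither trait.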


#12

You are right, it definitely doesn’t look very Rusty.

I’ll retract my suggestion, though I’m not entirely happy with requiring close. I presume more high-level constructs (perhaps working with closures) will appear on cargo sooner or later anyway.


#13

I also don’t think that having a no-op flush in non-flushable implementations would be problematic.

However, implementations should certainly document their behavior, to avoid confusion about the necessity of flushing.


#14

Personally, I’d love linear types, since they’re nice for all “multiple-actions-should-all-complete” patterns. But for this case, I agree that something with a closure scope will help. If there were a consuming close method returning errors, you’d have the option of using a

let result = path.with_opened(|file| {
  try!(foo(file));
  try!(bar(file));
  ...
});

pattern (or something attached to OpenOptions).
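One possible shape for such a helper, as a sketch only: `Close`, `with_resource` and `Mem` are hypothetical names, the helper is a free function rather than a method on a path, and `?` is used where older code would use try!. The body’s error takes precedence; otherwise the error from close is returned.

```rust
use std::io;

// Hypothetical trait for resources with a consuming, fallible close.
trait Close {
    fn close(self) -> io::Result<()>;
}

// Runs `body` against the resource, then always closes it.
fn with_resource<R, T, F>(mut res: R, body: F) -> io::Result<T>
where
    R: Close,
    F: FnOnce(&mut R) -> io::Result<T>,
{
    let out = body(&mut res);
    let closed = res.close();
    let value = out?; // an error from the body wins
    closed?; // otherwise surface the error from close
    Ok(value)
}

// In-memory stand-in for a file, used only for illustration.
struct Mem {
    data: Vec<u8>,
    fail_close: bool,
}

impl Close for Mem {
    fn close(self) -> io::Result<()> {
        if self.fail_close {
            Err(io::Error::new(io::ErrorKind::Other, "close failed"))
        } else {
            Ok(())
        }
    }
}
```

This sketch does not close the resource if the body panics; handling that case is part of what makes the design hard.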

I believe people are excited for a linear type solution because Rust is already very close to it. But I understand that it’s hard to combine with unwinding. Personally, I’ve wondered why it even matters when you’ve already panicked and are unwinding. An error on close would then just be another error that a thread had.


#15

There are two concerns:

  1. Making sure that the programmer “does the right thing” to finalize usage of a resource. This can be addressed by linear types.
  2. Performing the necessary cleanup, preferably also in case of unwinding. This can be implemented explicitly, by spawning a scoped thread for processing that may panic and joining it before doing the cleanup. Cleanup may also be done in the destructor, but the possibilities here are limited. Perhaps blocking on POSIX close and potentially changing the errno value are acceptable effects.
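The explicit variant in item 2 can be sketched with `std::thread::scope` from current std (an API that postdates this discussion). `run_then_cleanup` and its String error type are assumptions made for illustration.

```rust
use std::thread;

// Runs `work` on a scoped thread, joins it, and only then runs `cleanup`,
// so that cleanup happens (and can report errors) even if `work` panicked.
fn run_then_cleanup<W, C>(work: W, cleanup: C) -> Result<(), String>
where
    W: FnOnce() + Send,
    C: FnOnce() -> Result<(), String>,
{
    // join() returns Err if the worker panicked; the panic stays contained.
    let panicked = thread::scope(|s| s.spawn(work).join().is_err());
    let cleaned = cleanup();
    if panicked {
        return Err("worker panicked".to_string());
    }
    cleaned
}
```

Because the cleanup runs on the calling thread after the join, its result is an ordinary return value rather than something a destructor has to discard.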

#16

I have submitted an RFC PR proposing amendments for flush and into_inner.