Tokio pseudo-RFC: eliminate `io::Error`

canndrew · July 21, 2018, 1:48am

Since tokio seems set to replace the way we do IO in Rust, I think we should take the time to reflect on the mistakes of the past and see if we can use this as an opportunity to correct them.

To me, there is one thing in particular that sticks out as a mistake - io::Error is a horrible error type that is extremely difficult to handle correctly, and it spreads horrible error-handling through anything it touches.

To justify this assertion, let’s look at Rust when error-handling is done right.

Error handling in Rust - the ideal

Rust’s enums offer a perfect solution for describing the return types of functions that can fail. More generally, if a function can result in any of n different kinds of thing, you should use an enum with n variants for its return type. For an example, take HashMap's get method:

fn get(&self) -> Option<&T>

When we try to get something from a hash map, there are two different things that can result. Either we have an item corresponding to the given key (and we return Some(&item)), or we don’t (and we return None). The Option<&T> return type faithfully represents the semantics of the function and forces us to handle all, and only, those possibilities that can result from calling it.

To appreciate how beautiful this is, have a look at this method in action:

match hash_map_of_ints.get(key) {
    Some(23) => ..,
    Some(..) => ..,
    None => ..,
};
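For readers who want to try this, here is a minimal compilable version of the match above. The `describe` helper and its return strings are made up for this sketch; only `HashMap::get` comes from the standard library.

```rust
use std::collections::HashMap;

/// Describes what the map holds for `key` (hypothetical helper).
fn describe(map: &HashMap<&str, i32>, key: &str) -> &'static str {
    match map.get(key) {
        // `get` returns `Option<&i32>`; the literal pattern matches through the reference.
        Some(23) => "exactly 23",
        Some(_) => "some other int",
        // Removing this arm is a compile error: the missing-key case cannot be forgotten.
        None => "missing",
    }
}
```

Deleting the `None` arm makes the compiler reject the function, which is exactly the guarantee being praised here.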

Now contrast this with javascript:

switch(hash_map_of_ints[key]) {
    case 23: ...
    case "lol i'm a string": ...
}

Javascript doesn’t force us to consider the possibility that the map doesn’t contain the key. It also allows us to handle absurd cases such as a map of ints containing a string. This is called dynamic typing, and it sucks.

Another Rust type that does error handling properly is SyncSender. Suppose I try to send some value val on a SyncSender using try_send. There are three things this can return:

Ok(())
Err(TrySendError::Full(val))
Err(TrySendError::Disconnected(val))

Note that, even though this is a two-tiered enum where the error is a separate type, the return type still manages to capture the semantics of the function. In the event that sending fails, it could either be because the channel is full or because it is disconnected. These correspond precisely to the two Err possibilities that we’re given.
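A small sketch of matching on all three outcomes, using the real `SyncSender::try_send` API. The `try_send_outcome` helper and its return strings are invented for illustration.

```rust
use std::sync::mpsc::{SyncSender, TrySendError};

/// Tries to send `val` and reports which of the three outcomes occurred
/// (hypothetical helper for this sketch).
fn try_send_outcome(tx: &SyncSender<u32>, val: u32) -> &'static str {
    match tx.try_send(val) {
        Ok(()) => "sent",
        // The bounded buffer is full; the value is handed back so we can retry later.
        Err(TrySendError::Full(_val)) => "full",
        // The receiver has been dropped; no later send can succeed.
        Err(TrySendError::Disconnected(_val)) => "disconnected",
    }
}
```

With a channel created by `sync_channel(1)`, the first call reports "sent", a second call before anything is received reports "full", and any call after the receiver is dropped reports "disconnected".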

It would rarely make sense to ignore the value of a TrySendError. Whether the channel is temporarily full or permanently disconnected is usually going to determine what we do next, so most code will match on this error and respond accordingly. Horrible code, however, might just stick a ? on the Result and send the TrySendError up the stack where all the higher-level code with no insight into the lower-level channel-based implementation will be completely unable to make sense of it. Thankfully, the design of TrySendError guides users away from doing this.

The truism that bears repeating here is that exceptions are not exceptional. If a function call can fail in n different ways then you should generally have n different branches to handle those cases. It only makes sense to forward an error if something higher-level is expected to be able to handle it, or if it’s critical enough that it needs to kill the whole program or be reported to the user.

Unfortunately, types like HashMap and SyncSender are the exceptions to the rule…

Useless error types - Rust in practice

Now that we’ve had a look at how error handling can be done right, let’s look at how it’s usually done instead.

io::Error is, in some sense, the canonical useless error type in Rust. It’s canonical in the sense that it is both the most common example of, and a major cause of, Rust’s useless error types.

One function that uses io::Error is File::open. Let’s see it in action:

let file = match File::open(path) {
    Ok(file) => file,
    Err(e) => ... // what goes here?
}

The problem is: how the hell are we meant to handle this error? io::Error's error variants include things like ConnectionRefused and InvalidInput which are clearly nonsensical in this context. Does that mean it’s safe to use unreachable!() on them? Are you brave enough to do that? Others, like NotFound clearly can occur and we can explicitly handle them, but how do you know once you’ve handled all the cases that are actually possible? The fact that this question makes sense shows that io::Error is not doing its job as a type.
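To make the complaint concrete, here is roughly what handling this error looks like with today's standard library. The `OpenOutcome` enum and `try_open` function are hypothetical names for this sketch; `File::open`, `io::Error::kind` and `ErrorKind` are real.

```rust
use std::fs::File;
use std::io::ErrorKind;
use std::path::Path;

/// Coarse outcome of trying to open a file (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum OpenOutcome {
    Opened,
    Missing,
    Denied,
    /// Everything we did not anticipate ends up here.
    Mystery(ErrorKind),
}

fn try_open(path: &Path) -> OpenOutcome {
    match File::open(path) {
        Ok(_file) => OpenOutcome::Opened,
        Err(e) => match e.kind() {
            ErrorKind::NotFound => OpenOutcome::Missing,
            ErrorKind::PermissionDenied => OpenOutcome::Denied,
            // `ErrorKind` is non-exhaustive, so a catch-all arm is mandatory,
            // and nothing in the types says which kinds can actually reach it.
            other => OpenOutcome::Mystery(other),
        },
    }
}
```

The catch-all arm is the "mystery" case: the compiler demands it, but the signature of `File::open` gives no hint about what it will contain.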

In practice what nearly all Rust code does is treat io::Error as opaque and throw it up the stack. Even if people check for specific cases (which they usually don’t) they still propagate an io::Error in order to handle any mystery error cases that they didn’t anticipate. Often this means wrapping io::Error in some other error type with a variant called Io or something. When this happens it spreads the disease of opaque non-handle-able-ness into this wrapper type as well and, at this point, most crate authors give up and use this single opaque type throughout their library. The alternative is to have a plethora of error types for different functions, but since they would all be somewhat opaque and un-handle-able, it’s hardly worth the effort to not just combine them into a single type. While it’s true that these types usually allow you to manually test for the relevant error conditions, they don’t tell you which conditions these are, the type system doesn’t force you to handle them, and the type system does force you to either plumb around errors that can never occur or add a lot of risky panics to your code. When it turns out there really are error conditions that the programmer didn’t anticipate, these can often crash the entire program as they get propagated all the way back up to main (since nowhere else in the stack knew how to handle them either).

In many cases though it’s not even true that you can test for the error conditions you need. Look at the signature of tokio::io::copy:

pub fn copy<R, W>(reader: R, writer: W) -> Copy<R, W>
    where R: AsyncRead,
          W: AsyncWrite;

type Copy: Future<Item = _, Error = io::Error>;

This function doesn’t give you any way to tell whether it was the reader or the writer that failed. Because both the reader and the writer throw the same useless, opaque error, the tokio authors decided to just squash these errors together, throwing away important and relevant information in the process. This would have been less inviting, or even impossible, if the reader and writer had more specific error types. Again, this shows how opaque error handling tends to spread like a virus.
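As a rough illustration of what keeping that information could look like, here is a blocking copy over `std::io::Read` and `std::io::Write` that tags each failure with the side it came from. This is not tokio's API; `CopyError` and `copy_tagged` are hypothetical names, and the synchronous traits are used only to keep the sketch self-contained.

```rust
use std::io::{self, Read, Write};

/// Hypothetical error type that records which side of the copy failed.
#[derive(Debug)]
enum CopyError {
    Read(io::Error),
    Write(io::Error),
}

/// Copies everything from `reader` to `writer`, returning the byte count.
fn copy_tagged<R: Read, W: Write>(mut reader: R, mut writer: W) -> Result<u64, CopyError> {
    let mut buf = [0u8; 4096];
    let mut total = 0u64;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => return Ok(total), // end of input
            Ok(n) => n,
            // Retry reads that were merely interrupted.
            Err(ref e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(CopyError::Read(e)),
        };
        // Any failure here is known to come from the writer.
        writer.write_all(&buf[..n]).map_err(CopyError::Write)?;
        total += n as u64;
    }
}
```

A caller can now match on `CopyError::Read` versus `CopyError::Write` and, for example, close only the half of a proxy connection that actually broke.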

io::Error is worse than just opaque though. In some cases it doesn’t even do its job as a programming abstraction. For example, the whole point of the standard library networking APIs is to provide a portable abstraction over the networking APIs of the various platforms it runs on. But there are cases where the same error can cause different io::Errors to be returned on different platforms, leading to platform specific code (and platform-specific programmer knowledge) to be needed for even the most simple error handling.

tokio can and should strive to do better than this.

Sane error handling for `tokio`

Earlier I used File::open as an example of poor error design. In a sane world, what might the error type of File::open look like?

What follows is my first approximation of an answer. Note that this was put together by just looking at the docs for open(2) and CreateFileW on Linux and Windows only. I’m aware that these docs are likely to be incomplete and so some cases are bound to be missing, and that tokio needs to support other operating systems as well. The idea here is to have a starting point for the sake of argument.

fn File::open<P: AsRef<Path>>(path: P) -> Result<File, OpenFileError>;

#[derive(Debug, Fail)]
pub enum OpenFileError {
    #[fail(display = "{}", _0)]
    FileAccess(FileAccessError),
    #[fail(display = "{}", _0)]
    ResourceLimit(ResourceLimitError),
}

#[derive(Debug, Fail)]
pub enum FileAccessError {
    #[fail(display = "{}", _0)]
    NotFound(NotFoundError),
    #[fail(display = "{}", _0)]
    FileUnreadable(FileUnreadableError),
}

#[cfg(unix)]
#[derive(Debug, Fail)]
pub enum NotFoundError {
    #[fail(display = "is a directory")]
    IsADirectory,
    #[fail(display = "too many symbolic links")]
    TooManySymbolicLinks,
    #[fail(display = "filename too long")]
    NameTooLong,
    #[fail(display = "parent directory is not a directory")]
    NotADirectory,
}

#[cfg(windows)]
#[derive(Debug, Fail)]
#[fail(display = "file not found")]
pub struct NotFoundError;

#[derive(Debug, Fail)]
pub enum FileUnreadableError {
    #[fail(display = "permission denied")]
    PermissionDenied,
    #[fail(display = "{}", _0)]
    Other(OtherFileUnreadableError),
}

#[derive(Debug, Fail)]
#[cfg(unix)]
pub enum OtherFileUnreadableError {
    #[fail(display = "no such device")]
    NoDevice,
}

#[derive(Debug, Fail)]
#[cfg(windows)]
pub enum OtherFileUnreadableError {
    #[fail(display = "sharing violation")]
    SharingViolation,
}

#[derive(Debug, Fail)]
pub enum ResourceLimitError {
    #[fail(display = "out of memory")]
    OutOfMemory,
    #[fail(display = "{}", _0)]
    FileDescriptorLimit(FileDescriptorLimitError),
}

#[cfg(unix)]
#[derive(Debug, Fail)]
pub enum FileDescriptorLimitError {
    #[fail(display = "process file descriptor limit hit")]
    ProcessLimitHit,
    #[fail(display = "system file descriptor limit hit")]
    SystemLimitHit,
}

#[cfg(windows)]
#[derive(Debug, Fail)]
#[fail(display = "process file handle limit hit")]
pub struct FileDescriptorLimitError;

The first thing to note is that there are, broadly, two ways opening a file can fail. Either the file can’t be accessed on the filesystem, or we’ve hit a system resource limit. This difference is important. In the former case we probably want to notify the user, in the latter we need to crash or shed load.

By presenting the user with just two variants we can encourage them to think about how to handle this error and not just mindlessly propagate it. Since resource limit problems are common to lots of IO operations this even enables us to come up with generic ways of handling them. For example, we could write a function that retries operations with exponential back-off waiting for file descriptors to become available:

trait MaybeResourceLimitError {
    type OtherError;

    fn try_into_resource_limit_error(self) -> Result<ResourceLimitError, Self::OtherError>;
}

impl MaybeResourceLimitError for OpenFileError {
    type OtherError = FileAccessError;

    fn try_into_resource_limit_error(self) -> Result<ResourceLimitError, FileAccessError> {
        match self {
            OpenFileError::FileAccess(e) => Err(e),
            OpenFileError::ResourceLimit(e) => Ok(e),
        }
    }
}

async fn retry_waiting_for_available_file_descriptors<T, E, F>(mut f: F)
    -> Result<T, <E as MaybeResourceLimitError>::OtherError>
where
    F: FnMut() -> TryFuture<T, E>,
    E: MaybeResourceLimitError,
{
    let mut duration = Duration::from_millis(1);
    loop {
        match await!(f()) {
            Ok(t) => return Ok(t),
            Err(e) => match e.try_into_resource_limit_error() {
                Ok(ResourceLimitError::OutOfMemory) => panic!("out of memory"),
                Ok(ResourceLimitError::FileDescriptorLimit(..)) => {
                    await!(tokio::sleep(duration));
                    duration *= 2;
                },
                Err(e) => return Err(e),
            }
        }
    }
}

This is an example of how the structure of OpenFileError allows us to de-structure it and extract a more specific error once we handle a specific case. And to do so in a way that is generic and applicable to other errors.
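The same idea can be tried out today without async or the `failure` crate. The following is a blocking sketch under simplifying assumptions: the error enums are flattened stand-ins for the hierarchy above, `std::thread::sleep` replaces the async sleep, and the function name is shortened. Like the original, it loops without an upper bound on retries.

```rust
use std::thread;
use std::time::Duration;

// Simplified stand-ins for the proposed error types (no nesting, no `failure` derive).
#[derive(Debug, PartialEq)]
enum ResourceLimitError {
    OutOfMemory,
    FileDescriptorLimit,
}

#[derive(Debug, PartialEq)]
enum FileAccessError {
    NotFound,
    FileUnreadable,
}

#[derive(Debug, PartialEq)]
enum OpenFileError {
    FileAccess(FileAccessError),
    ResourceLimit(ResourceLimitError),
}

trait MaybeResourceLimitError {
    type OtherError;
    fn try_into_resource_limit_error(self) -> Result<ResourceLimitError, Self::OtherError>;
}

impl MaybeResourceLimitError for OpenFileError {
    type OtherError = FileAccessError;
    fn try_into_resource_limit_error(self) -> Result<ResourceLimitError, FileAccessError> {
        match self {
            OpenFileError::FileAccess(e) => Err(e),
            OpenFileError::ResourceLimit(e) => Ok(e),
        }
    }
}

/// Blocking variant: retries `f` with exponential back-off while it fails
/// for lack of file descriptors, and returns the narrowed error otherwise.
fn retry_waiting_for_file_descriptors<T, E, F>(mut f: F) -> Result<T, E::OtherError>
where
    F: FnMut() -> Result<T, E>,
    E: MaybeResourceLimitError,
{
    let mut delay = Duration::from_millis(1);
    loop {
        match f() {
            Ok(t) => return Ok(t),
            Err(e) => match e.try_into_resource_limit_error() {
                Ok(ResourceLimitError::OutOfMemory) => panic!("out of memory"),
                Ok(ResourceLimitError::FileDescriptorLimit) => {
                    thread::sleep(delay);
                    delay *= 2; // double the wait before the next attempt
                }
                // Not a resource problem: hand back the more specific error.
                Err(other) => return Err(other),
            },
        }
    }
}
```

Note how the return type shrinks from `OpenFileError` to `FileAccessError`: once the resource-limit case has been handled, callers no longer have to think about it.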

If the file cannot be accessed on the filesystem, FileAccessError again subdivides the error into two possible cases - either no file could be found at that location or the file is unreadable. If we want to know in what sense no file could be found at that location, or why the file is unreadable, we can drill down even further into NotFoundError or FileUnreadableError, though at this point we’re getting into operating-system-specific errors.

Generally we should try to make available all the error information provided by the operating system. In cases where OSes differ in the level of information they provide we should bury this information behind a least-common-denominator error type so that people only need to write platform-specific code when they actually care about platform-specific behaviour.

As I’ve said, the error spec given above is definitely incomplete. For as long as this is the case tokio will need to panic and request a bug report whenever it encounters an OS error code that it doesn’t expect. While these holes get plugged, and as support for other OSes gets added, the error spec will expand and evolve. By stratifying the errors into lots of different levels by specificity we can keep these changes away from the top-level errors that people are likely to use in practice. As such, we should be able to start with something less-than-perfect and evolve it while causing minimal damage to the tokio ecosystem (and whatever we start with is bound to be better than what we currently have). In this way we can finally replace the god-awful, non-portable, operating system C API integer error codes with something sane, and by doing so create a knock-on effect that causes saner error handling to spread throughout the rest of the Rust ecosystem.

I expect identifying and classifying all these possible error conditions to be an annoying, laborious, and never-ending task. However, by taking on the responsibility of this task the tokio authors can spare every other user of tokio from partially and haphazardly going through the same process, or worse, not going through it and instead writing buggy code and exporting their own useless error types.

I urge the Rust/tokio devs to consider this. io::Error is a wart, let’s take this one opportunity we have to freeze it off.

sfackler · July 21, 2018, 3:58am

What code is trying to exhaustively match against an IO error? How does it expect to "handle" every possible bad thing that can happen when opening a file?

This seems to be arguing directly against your proposal, right? A very high effort attempt to make extremely detailed method specific error types.

What does this look like in the limit? Does every single method have its own error type that exhaustively enumerates every single operation it performs? Are those exhaustively enumerated errors expected to be propagated in a structured sense all the way through an entire application? This seems like it would impose an enormous maintenance burden and severely limit the ability of a library author to change internal implementation details without breaking backwards compatibility.

Why is tokio the place to do this? Does blocking code not care about errors in the same way?

Why is "every other user" trying to exhaustively enumerate all possible errors?

This kind of project is going to consume quite a bit of effort. What other work is going to have to be put on the back burner to exhaustively enumerate all errors that could ever be produced by any version of any operating system supported by Rust?

It does not seem acceptable to me to panic when an unexpected error is encountered. For example, your strawman error type above doesn't handle open returning EOPNOTSUPP. If someone cared about matching against that case, they would file a bug because it wasn't exposed, and if they don't care about it, they certainly wouldn't want the whole event loop to go down because some subroutine touched a weird filesystem.

The enums in the proposed hierarchy are all going to need to be non-exhaustive anyway to enable them to be fixed over time for the things that they inevitably leave out.
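For reference, this is roughly what a non-exhaustive error enum looks like with the `#[non_exhaustive]` attribute that current Rust provides. The enum is a flattened, hypothetical version of the proposed type, and `Action` and `react` are invented for the sketch.

```rust
/// Hypothetical top-level error, marked so that variants can be added later
/// without a breaking change.
#[non_exhaustive]
#[derive(Debug)]
pub enum OpenFileError {
    FileAccess,
    ResourceLimit,
}

/// What a caller might decide to do (names are illustrative only).
#[derive(Debug, PartialEq)]
pub enum Action {
    TellUser,
    ShedLoad,
    LogAndGiveUp,
}

pub fn react(err: &OpenFileError) -> Action {
    match err {
        OpenFileError::FileAccess => Action::TellUser,
        OpenFileError::ResourceLimit => Action::ShedLoad,
        // Downstream crates are forced to write this arm. Inside the defining
        // crate the attribute does not affect matching, so here the arm is
        // unreachable and only shown for illustration.
        #[allow(unreachable_patterns)]
        _ => Action::LogAndGiveUp,
    }
}
```

The consequence is that callers outside the defining crate can never match exhaustively on such a type, so they always need a fallback arm.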

canndrew · July 21, 2018, 4:37am

What code is trying to exhaustively match against an IO error? How does it expect to “handle” every possible bad thing that can happen when opening a file?

In my example error "every possible bad thing" is reduced to just two broad cases: there's a problem at the filesystem level, or we've run out of some system resource like memory or file descriptors. It's easy to handle just these failure modes - report to the user that the file can't be read, or else try to shed load or crash. The enum is split-up based on how the programmer might want to handle it.

If the programmer wants to match in more detail they can inspect the nested error types as well. I doubt any code out there is trying to exhaustively match on every subtly different error condition, but if they want to they can keep matching on nested errors all the way to the bottom of the error type.

This seems to be arguing directly against your proposal, right? A very high effort attempt to make extremely detailed method specific error types.

My point was that io::Error currently makes this hard because it obscures what the underlying error scenarios are. I'm not saying people shouldn't use extremely detailed method-specific types, they absolutely should! But right now I don't blame them for not doing so and instead passing the burden of unhandleable errors onto their own users.

What does this look like in the limit? Does every single method have its own error type that exhaustively enumerates every single operation it performs? Are those exhaustively enumerated errors expected to be propagated in a structured sense all the way through an entire application?

Yes.

This seems like it would impose an enormous maintenance burden and severely limit the ability of a library author to change internal implementation details without breaking backwards compatibility.

Backwards compatibility shouldn't be a big issue so long as the errors are properly nested into enums with only two or three variants. The outer-most error should not reflect any arbitrary implementation details. If a user is matching on a deeply-nested error then they obviously care about the fine-grained implementation details and, as such, changes to those details should break that user's code.

Why is tokio the place to do this? Does blocking code not care about errors in the same way?

It does, but the standard library APIs are set in stone. tokio has a chance to start fresh, and in the future I expect almost all IO code is going to end up using tokio.

Why is “every other user” trying to exhaustively enumerate all possible errors?

They're not, they're trying to enumerate the errors that are relevant to them. But io::Error gives no help at all in knowing what those errors are and forces them to plumb around io::Errors anyway. In practice, many people just ignore error scenarios that they can and should be handling, leading to broken code.

This kind of project is going to consume quite a bit of effort. What other work is going to have to be put on the back burner to exhaustively enumerate all errors that could ever be produced by any version of any operating system supported by Rust?

As long as we get the top layer or two of errors right the lower layers can evolve in time.

It does not seem acceptable to me to panic when an unexpected error is encountered. For example, your strawman error type above doesn’t handle open returning EOPNOTSUPP. If someone cared about matching against that case, they would file a bug because it wasn’t exposed, and if they don’t care about it, they certainly wouldn’t want the whole event loop to go down because some subroutine touched a weird filesystem.

The user can fork tokio, add the relevant error, submit a PR, then keep using their fork while they wait for the tokio devs to merge the error into the error hierarchy. Yes, a panic would suck for the user. But once it's fixed it's fixed for everyone, and we all get to enjoy much more stable code due to having error types that programmers can understand how to handle. I expect tokio to be able to round up all but the most bizarre and obscure errors pretty quickly - you've already spotted EOPNOTSUPP, so that's another one down.

The enums in the proposed hierarchy are all going to need to be non-exhaustive anyway to enable them to be fixed over time for the things that they inevitably leave out.

As long as the top level or two of the hierarchy can be settled and stabilized I think this is fine.

quadrupleslap · July 21, 2018, 4:48am

There are three obvious problems.

Maintaining this kind of thing would be pure torture.
This would get so complex that using it would also be pure torture.
You’ll always need to handle a “There’s nothing sensible to do here but throw” case when dealing with I/O, because, well, you’re talking to the real world, so I’m not sure what this provides over the current solution, beyond more granular errors.

The line “The enum is split-up based on how the programmer might want to handle it.” confuses me. If they wanted to panic, couldn’t they just call unwrap? Regardless, forcing a panic on an unknown error makes it literally impossible to make a robust application, so that’s a non-starter.

Anyway, do you have any concrete examples of people ignoring I/O errors that they should have handled?

canndrew · July 21, 2018, 5:11am

That's all the more reason to do it once, in one place, rather than forcing everyone everywhere to deal with C error codes.

I disagree. Almost all users will never venture beyond the first layer or two of errors. For them, handling errors will be made a lot simpler.

What about getting the remote address of a TcpStream? Couldn't this just be Option<SocketAddr> and not need an error type at all? And where there are highly unhandleable errors, eg. someone ripped out the network card from underneath your server app, it's best to make the corresponding types as small as possible so that they don't accidentally absorb errors that should have been handled.

I was referring to OpenFileError which has two variants because there are (as I see it) two different ways to handle the error at the broadest level.

I think this will make applications more robust since programmers are more likely to handle errors when the error types guide them to doing so. panics would represent a bug in tokio and would become extremely rare as tokio becomes stable.

Anyway, do you have any concrete examples of people ignoring I/O errors that they should have handled?

Not handling "destination unreachable" errors when reading from a UDP socket is a bug I had recently. And I don't think I've ever seen rust code where people handle file descriptor exhaustion gracefully.

sfackler · July 21, 2018, 5:19am

It is not true that "everyone" is dealing with C error codes.

Why would we want to throw the information about why the operation failed in the garbage? Weird uncommon failures are exactly the ones that you want the most information about when they occur.

Or they're going to be completely overwhelmed by the massive surface area of the error types and continue to not bother with anything. The API surface of "open a file" now involves an error hierarchy with 7 types!

How would this proposal solve either of those problems?

Tom-Phinney · July 21, 2018, 6:40am

Isn't that already the case with tokio today? To wit:

canndrew:

In many cases though it’s not even true that you can test for the error conditions you need. Look at the signature of tokio::io::copy :
pub fn copy<R, W>(reader: R, writer: W) -> Copy<R, W>
    where R: AsyncRead,
          W: AsyncWrite;

type Copy: Future<Item = _, Error = io::Error>;
This function doesn’t give you any way to tell whether it was the reader or the writer that failed.

Isn't providing an error status that's an undifferentiated union of reader errors and writer errors "throwing [away] information about why the operation failed"?

canndrew · July 21, 2018, 8:13am

Everyone who deals with io::Error is dealing with C error codes.

I wrote that on the assumption that the only reason TcpStream::remote_addr can fail is because the socket is disconnected. If that's not true then it was a bad example.

They're already completely overwhelmed by the surface area of io::Error which contains an indefinite number of variants. An enum with 2 variants is much more manageable. There's no reason for people to drill down into sub-sub-sub-errors if they don't need to.

Take the UDP error. I didn't know that that error was even a possibility until it came up in practice. Also it only shows itself on windows so I never found it during testing. If there was a UdpRecvError with two or three variants, one of them corresponding to this situation, I would have learned about it as soon as I looked at the signature for UdpSocket::poll_recv and looked at its error type. As it is, the type system does nothing at all to help me avoid these situations. All I knew is that UdpSocket::poll_recv could fail "for some reason" and that I needed to propagate the error coz what else was I supposed to do? It turns out the "destination unreachable" error could actually just be ignored in this case and it didn't need to crash my program.

The same applies for the file descriptor exhaustion thing. Suppose I'm writing a program that creates and writes n files to a directory in parallel. I write the call to File::create and then I need to think about what to do when it fails. The first possible reasons that come to mind are that the target directory is invalid or we don't have write permission or something. In either of these cases all we can do is bail out and tell the user. Since io::Error gives me no advice to the contrary I slap a ? on File::create and call it a day.

Then someone runs my program with n = 1000000 and it dies due to file descriptor exhaustion.

If File::create had its own error - and if we had a convention where errors were useful - I would have been much less likely to mindlessly propagate that error and make my big parallel function also return a CreateFileError. Instead I would have taken a few seconds to look at the error, seen that it has two variants, and seen that one of them is not the user's fault. I would have been forced to think about what could go wrong and I would probably have ended up writing the retry_waiting_for_available_file_descriptors function from my original post and wrapping every call to File::create in it.

Again, think of a comparison to HashMap. If HashMap::get returned Result<&T, io::Error> do you think people would reliably handle the case where an entry is missing? Because I think lots of people would either:

(a) completely neglect this edge case in their code and end up throwing io::Errors up the stack instead of doing whatever they should have done or (b) not .expect() even when their hash map must definitely contain the key (unless they have a logic error), and in doing so pollute their function's type signature with an io::Error that can never occur.

If that was how Rust's hash maps worked Rust programs would be so unstable that we might as well be programming in python or JS. But this is how Rust's IO operations work.

josh · July 21, 2018, 8:26am

While I absolutely understand what you’re going for here, and while it’d be nice if it were possible, IO errors don’t work that way. An open call, for instance, can fail with nearly any arbitrary errno code, and you cannot account for every possible case. In fact, io’s ErrorKind explicitly declares itself as non-exhaustive.

None of those errors are typically recoverable, except by saying “hey, user, this error happened while opening the file”. And code should definitely not be trying to think it can handle resource limit issues internally.

canndrew · July 21, 2018, 8:55am

In practice though? I know on Linux I could write a file system driver that returns ECONNREFUSED from open but surely that would be considered a bug, no? I think drivers are expected to conform to some kind of interface (?).

Why not? Why can't we keep expanding our error types until we're no longer getting panics? How much do we want to support that one, hypothetical, ultra-rare, buggy kernel driver at the expense of everyone else?

But sometimes they are recoverable. io::Error obscures the difference though between recoverable and unrecoverable errors and so people don't try to recover even when they could. I gave two examples of this above (UDP "destination unreachable" errors and file descriptor limit errors).

Why not? If I'm running some server app on a dedicated machine for example, I would hope it's handling resource limit issues internally. Or what if I want to write a million files in parallel?

canndrew · July 21, 2018, 9:02am

To be clear, I know that what I’m proposing would be really hard and would cause panics for people on the edge cases for a while. But I also think it would be well worth the effort and I’m not (yet) convinced that it’s impossible or that it couldn’t eventually be very stable.

josh · July 21, 2018, 10:04am

josh:

An open call, for instance, can fail with nearly any arbitrary errno code

In practice though? I know on Linux I could write a file system driver that returns ECONNREFUSED from open but surely that would be considered a bug, no? I think drivers are expected to conform to some kind of interface.

man 2 open already lists half the possible errno codes as potential return values. And no, it wouldn't necessarily be a bug. Consider, for instance, a FUSE filesystem designed to support paths like /sftp/user@hostname/path. Or consider the unlimited possibilities of what a read or write to a file in /dev could meaningfully return. Or consider the possibility that you're running in a WebAssembly environment and your "files" are being loaded from a site.

josh:

and you cannot account for every possible case

Why not? Why can't we keep expanding our error types until we're no longer getting panics? How much do we want to support that one, weird, ultra-rare, buggy kernel driver at the expense of everyone else?

How much do you want to "clean up" error handling for your own use cases at the expense of making it utterly unusable in cases you didn't anticipate?

You can always choose to handle a specific error differently if you wish. It's common to, for instance, handle ENOENT specially and exit with an error message for any other error. But you can't assume that you know every error the system could produce.

josh:

And code should definitely not be trying to think it can handle resource limit issues internally.

Why not? If I'm running some server app on a dedicated machine for example, I would hope it's handling resource limit issues internally. Or what if I want to write a million files in parallel?

As one of many possible failure modes: consider what happens if the problem isn't that your process is out of file descriptors but instead the system is out of file descriptors, and no amount of waiting on your process's part can possibly change that? The correct answer is to print an error message the user can immediately recognize and diagnose, not to sleep and give no indication of the problem.

You don't necessarily have to exit on error. For instance, if you're a GUI application and you're presenting a file open dialogue, you can simply print the message to the user and continue running. If you're a run-and-exit CLI application, you can print the error and exit. If you're a long-running system daemon you could log the error, drop the current request, and continue on to handle the next one. But in all three of those cases, you don't want to panic on an unknown error.

birkenfeld · July 21, 2018, 10:05am

Quoting ECONNREFUSED sounds like a strawman. I would quote the list of errors from http://man7.org/linux/man-pages/man2/open.2.html#ERRORS but it is far too long. I counted 25 distinct E… codes.

canndrew · July 21, 2018, 10:44am

Okay, maybe in the case of open we can't reasonably panic since people can make filesystems that do anything at all. But that doesn't mean we can't try to organize these errors based on how and whether to recover from them and just dump all incomprehensible errors into an "unrecoverable" bin buried a few layers deep into the error type. And it doesn't mean that there aren't other syscalls that do have a well-defined error API where we can just panic safely on impossible errnos.

How much do you want to enable preventable bugs by not cleaning up error handling? There's a trade-off here. And I still think the volume of "cases [I] didn't anticipate" could be made very small in the long run.

Cool, our hypothetical program is getting less buggy. The first iteration would crash if you gave it too big an n, the second iteration could survive this (thanks to sensible error handling in tokio) but would lock up completely if the system didn't even have a single free file descriptor left. And now (thanks to your insight) we know that we should also treat the very first file we open specially and bail out if we get a resource exhaustion error on that one. Each iteration here was an improvement over the previous.

canndrew · July 21, 2018, 10:53am

I deliberately chose it as an error that doesn't apply to open. The ones listed in open(2) should definitely be handled (well, except those that don't apply like EEXIST or EFAULT).

newpavlov · July 21, 2018, 2:12pm

I think a useful feature would be the ability to create enum aliases with a limited set of variants. Something like:

enum MyError {
    A(T1),
    B(T2),
    C,
}

// Can only have A and B variants, while having
// the same structure as MyError under the hood
// note that we do not repeat T1 and T2 here.
// Essentially it's just a type alias which influences
// exhaustiveness of match statements
enum MyError1 = MyError { A, B };
enum MyError2 = MyError { A, C };

// functionally equivalent to an empty enum,
// but will have the same size as MyError
enum MyError3 = MyError { };

fn foo() -> Result<(), MyError1> { .. }

fn bar() -> Result<(), MyError> {
    // note that we import variants from MyError
    use MyError::*;
    // this match is exhaustive
    match foo() {
        Ok(()) => (),
        Err(A(v)) => handle1(v),
        Err(B(v)) => handle2(v),
    }
    // MyError1 can be safely coerced to MyError
    foo()
}

fn baz() -> Result<(), MyError> {
    use MyError::*;
    match foo() {
        Ok(()) => do_stuff().map_err(|_| C),
        // the A case is handled locally, so it is not propagated
        Err(A(v)) => Ok(handle1(v)),
        Err(B(v)) => Err(B(v)),
    }
}
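The alias syntax above is not valid Rust today. A rough approximation in current Rust, sketched under the assumption that a separate enum plus a `From` impl is acceptable, is shown below. The payload types (`i32`, `String`) and the bodies of `foo` and `bar` are made up for illustration. Unlike the proposal, `MyError1` here is a distinct type with its own layout, and the widening to `MyError` is an explicit conversion performed by `?`.

```rust
#[derive(Debug, PartialEq)]
enum MyError {
    A(i32),
    B(String),
    C,
}

/// Stand-in for the proposed `enum MyError1 = MyError { A, B };`.
#[derive(Debug, PartialEq)]
enum MyError1 {
    A(i32),
    B(String),
}

impl From<MyError1> for MyError {
    fn from(e: MyError1) -> Self {
        match e {
            MyError1::A(v) => MyError::A(v),
            MyError1::B(v) => MyError::B(v),
        }
    }
}

/// Can only fail with A or B, and the type says so.
fn foo(n: i32) -> Result<(), MyError1> {
    if n < 0 {
        Err(MyError1::A(n))
    } else if n > 100 {
        Err(MyError1::B(format!("{} is too large", n)))
    } else {
        Ok(())
    }
}

fn bar(n: i32) -> Result<(), MyError> {
    // `?` uses the From impl to widen MyError1 into MyError.
    foo(n)?;
    if n == 0 { Err(MyError::C) } else { Ok(()) }
}
```

A caller matching on the result of `foo` only has to cover `A` and `B`, which is the exhaustiveness benefit the proposal is after; the cost is the duplicated variant list and the hand-written conversion.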

Tom-Phinney · July 21, 2018, 2:27pm

Since the examples of io::Error so far in this thread imply that io::Error handling is to at least some extent dependent on the class of called function that reported the error, could this “drill-down” issue not be handled by companion discriminator functions, to be called on the error path, that would partition the potential reported error codes into a) the general class of those errors that cannot be handled locally and b) disjoint subclasses of those errors for which local error recovery might be possible?

A crate of match-like helper error-analysis functions for specific classes of io::Error-throwing primary functions, together with appropriate enums to provide the classifications for those helper-function results, could provide much of the error-discriminating functionality being discussed in this thread for those who wish to employ it, without breaking changes and without burdening new users with the details of such fine-grain analysis until/unless they found themselves needing it.

@canndrew’s examples at the start of this thread show what the first level or so of those enums might be. I personally would prefer a flatter enum hierarchy for the causes, so that my a) case above – those that are locally unhandlable – and all the first-level b) cases above – those that might be handled locally – were in the primary enum, thus simplifying the match statements that I would write that would inspect the output of the associated io::Error-classifying functions.
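A minimal sketch of such a companion discriminator for file-open errors, with a flat enum as described, could look like this. The names `OpenFailure`, `classify_open_error` and `describe`, and the choice of which kinds count as locally handleable, are assumptions for illustration only. It uses nothing beyond the existing `io::Error::kind()` API, so it needs no breaking changes.

```rust
use std::io::{self, ErrorKind};

/// Hypothetical flat classification of errors from opening a file.
#[derive(Debug)]
enum OpenFailure {
    NotFound,
    PermissionDenied,
    /// Everything we do not know how to handle locally.
    Unhandled(io::Error),
}

/// Called on the error path to partition an io::Error into the classes above.
fn classify_open_error(e: io::Error) -> OpenFailure {
    match e.kind() {
        ErrorKind::NotFound => OpenFailure::NotFound,
        ErrorKind::PermissionDenied => OpenFailure::PermissionDenied,
        _ => OpenFailure::Unhandled(e),
    }
}

/// Example of the single-level match a caller would write.
fn describe(failure: &OpenFailure) -> String {
    match failure {
        OpenFailure::NotFound => "no such file".to_string(),
        OpenFailure::PermissionDenied => "permission denied".to_string(),
        OpenFailure::Unhandled(e) => format!("unhandled I/O error: {}", e),
    }
}
```

Code that does not care about the distinction keeps using `io::Error` directly and never sees these types.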

CAD97 · July 21, 2018, 3:37pm

Potential compromise strawman:

pub enum FileSystemError {
    FileAccess(..),
    ResourceExhaustion(..),
    TheSkyIsFalling(io::Error),
}

The semantics would be that any unexpected, strange error from the environment gets put in the TheSkyIsFalling case, but could later be moved into a more specific variant?

sfackler · July 21, 2018, 3:47pm

But it wouldn't be one of the two or three variants of some nice small error type; it'd be buried down in a platform-specific variant in the third level of some huge error hierarchy, at least going by the structure of the open errors above.

Your failure mode here is not going to be file descriptor exhaustion, it's going to be a failure to create 1,000,000 concurrent threads. Staring deeply at every single individual system call and trying to figure out how to locally "handle" every possible error is both extremely costly in terms of development time and probably not the right approach. I have never seen any program that automatically "handles" all open calls through something like retry_waiting_for_available_file_descriptors. Systems are instead designed to not run into hard resource limits in the first place, and if you do, it's either because the limit is below what's required to run the thing and the user needs to raise it, or because there's a deeper logic bug in the program.

What is unique here about Rust's hash maps in comparison to Python's or Javascript's? I'm not aware of any language that has a hashmap which returns an IO error on lookup.

Each iteration here also has nothing to do with the specific form of the error type.

birkenfeld · July 21, 2018, 4:41pm

I think it would be very useful for this proposal to take a medium-sized (no toy examples) real-world application (I guess it wouldn’t matter too much if it was std::io or tokio based) and to rewrite it using a mock-up of the proposed error types, and to introduce the assumed better local handling of individual errors. Then others could judge the introduced trade-offs much better.
