IMO the std::io::Error type is not sufficient, especially for handling accept() failures correctly.
The current std::io::ErrorKind::Other variant mixes fatal and nonfatal errors, and there is no way to tell them apart (except by parsing the message, which is obviously not portable).
Errors that can be returned by accept() on Linux:
Fatal errors
EBADF => ErrorKind::Other
EINVAL => ErrorKind::InvalidInput
EOPNOTSUPP => ErrorKind::Other
ENOTSOCK => ErrorKind::Other
EFAULT => ErrorKind::Other
All of those should be panics IMO.
Semi-transient errors/unclear
EPERM => ErrorKind::PermissionDenied
EMFILE => ErrorKind::Other
ENFILE => ErrorKind::Other
ENOBUFS, ENOMEM => ErrorKind::Other
EPROTO => ErrorKind::Other
Transient errors
EAGAIN or EWOULDBLOCK => ErrorKind::WouldBlock
ECONNABORTED => ErrorKind::ConnectionAborted
EINTR => ErrorKind::Interrupted
According to the man pages, accept() can also return errors that were already pending on the new socket. Those should be treated like EAGAIN, i.e. they are nonfatal. Many of them also map to ErrorKind::Other.
That means it’s currently impossible to handle this correctly… ErrorKind::Other could mean anything.
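To make the problem concrete, here is a minimal sketch of what matching on ErrorKind alone gives you. The names AcceptAction and classify_by_kind are made up for illustration; only io::Error::kind() and the ErrorKind variants come from the standard library.

```rust
use std::io::{self, ErrorKind};

/// What an accept loop could do with a failed `accept()` call
/// (hypothetical type, for illustration only).
#[derive(Debug, PartialEq, Eq)]
enum AcceptAction {
    /// Clearly transient: call `accept()` again.
    Retry,
    /// Cannot be decided from `ErrorKind` alone.
    Unknown,
}

/// Classify an error using only the portable `ErrorKind`.
fn classify_by_kind(err: &io::Error) -> AcceptAction {
    match err.kind() {
        ErrorKind::WouldBlock
        | ErrorKind::ConnectionAborted
        | ErrorKind::Interrupted => AcceptAction::Retry,
        // Per the list above, EBADF (fatal) and EMFILE (recoverable)
        // both land in this arm, so no decision is possible here.
        _ => AcceptAction::Unknown,
    }
}
```

Only the three transient kinds get a definite answer; everything else falls into the catch-all arm.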
Is there anything that can be done to clean this up a bit?
Sounds to me like your problem would be mostly solved if you could distinguish between the variety of resource-exhausted errors and other Other errors, plus perhaps a change from some of the Other errors to InvalidInput? That last one might be hard, since different error values can mean different things for different system calls and different platforms. So if you want to be true to a particular OS, you’ll need to interpret the raw code regardless (and then no longer be platform-independent).
Whoa, how could I miss that. That's a bit embarrassing now...
Specifically, on OS X the default maximum number of file handles is ridiculously low (256). I've run into this (ENFILE, or was it EMFILE..) and took it as motivation to improve the error handling in general.
Previously I just treated all unknown errors as fatal, but here I can obviously just carry on, and once enough file handles are free again, everything will be back to normal.
The same is true for other kinds of errors, at least on Linux apparently:
Error handling
Linux accept() (and accept4()) passes already-pending network errors
on the new socket as an error code from accept(). This behavior
differs from other BSD socket implementations. For reliable
operation the application should detect the network errors defined
for the protocol after accept() and treat them like EAGAIN by
retrying. In the case of TCP/IP, these are ENETDOWN, EPROTO,
ENOPROTOOPT, EHOSTDOWN, ENONET, EHOSTUNREACH, EOPNOTSUPP, and
ENETUNREACH.
OTOH I'm very hesitant to just retry on any unknown error; I fear that this will end in an endless loop eating up all my CPU.
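One way to keep retries from turning into a busy loop is to bound them and sleep between attempts. This is only a sketch: retry_with_backoff, its parameters, and the 1 ms / 100 ms delays are arbitrary choices for illustration, not anything the standard library provides.

```rust
use std::io;
use std::thread;
use std::time::Duration;

/// Call `op` until it succeeds, retrying only while `is_transient` accepts
/// the error and fewer than `max_retries` retries have been made.
/// Sleeps between attempts, doubling the delay up to a 100 ms cap.
fn retry_with_backoff<T>(
    mut op: impl FnMut() -> io::Result<T>,
    is_transient: impl Fn(&io::Error) -> bool,
    max_retries: u32,
) -> io::Result<T> {
    let mut delay = Duration::from_millis(1);
    let mut attempts = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(e) if attempts < max_retries && is_transient(&e) => {
                attempts += 1;
                // Sleeping here is what keeps the loop from spinning the CPU.
                thread::sleep(delay);
                delay = (delay * 2).min(Duration::from_millis(100));
            }
            // Non-transient error, or retry budget exhausted: give up.
            Err(e) => return Err(e),
        }
    }
}
```

With a TcpListener this would be called as something like retry_with_backoff(|| listener.accept(), some_predicate, 10), where the predicate is whatever classification you trust.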
Using the raw system error code is a last resort but it's also not ideal. I think it should be possible to handle this in a robust way by just using the standard library without additionally depending on libc or nix (for the error constants).
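For reference, the last-resort version might look roughly like the sketch below. The errno numbers are hard-coded assumptions taken from Linux on x86/ARM (other architectures and operating systems use different values), and Severity and classify_accept_errno are made-up names. The transient group follows the man page excerpt above, so EOPNOTSUPP is treated as retryable here.

```rust
use std::io;

// Linux errno values (x86/ARM). Hard-coded for illustration only;
// these numbers are NOT portable to other platforms.
const EINTR: i32 = 4;
const EBADF: i32 = 9;
const EAGAIN: i32 = 11; // same value as EWOULDBLOCK on Linux
const ENOMEM: i32 = 12;
const EFAULT: i32 = 14;
const EINVAL: i32 = 22;
const ENFILE: i32 = 23;
const EMFILE: i32 = 24;
const ENONET: i32 = 64;
const EPROTO: i32 = 71;
const ENOTSOCK: i32 = 88;
const ENOPROTOOPT: i32 = 92;
const EOPNOTSUPP: i32 = 95;
const ENETDOWN: i32 = 100;
const ENETUNREACH: i32 = 101;
const ECONNABORTED: i32 = 103;
const ENOBUFS: i32 = 105;
const EHOSTDOWN: i32 = 112;
const EHOSTUNREACH: i32 = 113;

/// Hypothetical classification of an `accept()` failure.
#[derive(Debug, PartialEq, Eq)]
enum Severity {
    /// Programming error; the listener is unusable.
    Fatal,
    /// Out of file handles or memory; may recover once resources are freed.
    ResourceExhausted,
    /// Safe to retry right away.
    Transient,
    /// No raw OS code, or one this table does not know about.
    Unknown,
}

fn classify_accept_errno(err: &io::Error) -> Severity {
    match err.raw_os_error() {
        Some(EBADF | EINVAL | ENOTSOCK | EFAULT) => Severity::Fatal,
        Some(EMFILE | ENFILE | ENOBUFS | ENOMEM) => Severity::ResourceExhausted,
        Some(
            EAGAIN | EINTR | ECONNABORTED | ENETDOWN | EPROTO | ENOPROTOOPT
            | EHOSTDOWN | ENONET | EHOSTUNREACH | EOPNOTSUPP | ENETUNREACH,
        ) => Severity::Transient,
        _ => Severity::Unknown,
    }
}
```

This works, but every constant in it is exactly the kind of platform detail I'd rather not maintain myself.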
Ideally the standard library would provide a classification of the error depending on the system call and the platform.