Pre-RFC: TCP keepalive


#1

Summary

Stabilize TcpStream.set_keepalive, and add a method, is_closed to check for connection status.

Motivation

Keep-alive is a common feature used in TCP, and should be exposed. Likewise, knowing when a TcpStream is dead is beneficial to know before reading so a new connection can be made. It’s currently possible to shutdown a TcpStream, but a later function is not able to query if the stream has been shutdown other than to try to read from it like business as usual.

Detailed design

Adjust and stabilize set_keepalive to this, following the example from the socket timeouts RFC:

impl TcpStream {
     fn set_keepalive(&mut self, duration: Duration) -> io::Result() { ... }
}

Separately, add a method to query the status of the connection. Being able to do this without reading out bytes would be useful, so as to know whether to continue to use this Stream, or to create a new one, before trying to read bytes out of it, or passing the stream to a more generic read method not designed to reconnect.

impl TcpStream {
     fn is_closed(&self) -> io::Result<bool> { ... }
}

The other methods that close/shutdown a Stream could set state on the TcpStream itself, and is_closed could short circuit based on that state, before trying a poll on the socket.

Drawbacks

Alternatives

Unresolved questions

Should is_closed return just a bool instead of a Result? If there is any error case, it can most likely be considered closed by the user. Any error they might care about should show it’s head when they try to TcpStream::connect again.

Similar to the sockets timeout RFC, would it make more sense for set_keepalive to receive a Option<Duration>, so turning off keepalive can be done by passing None, instead of Duration::zero()?


#2

It may be important to note in the documentation that is_closed is somewhat racy in the negative case; the stream can always be closed between the check and wherever it is used. That being said, I still think it’s valuable. The biggest issue I’ve had using TcpStream in the past is that the documentation is not clear on how TcpStream::read will respond when:

  • The stream has been shutdown, and there are more bytes to be read.
  • The stream has been shutdown, and there are no more bytes to be read.
  • The stream has been closed by the remote host, and there are more bytes to be read.
  • The stream has been closed by the remote host, and there are no more bytes to be read.

I think it would simplify TcpStream error handling immensely if these cases were well specified in the documentation. All of these need to be considered in code using TcpStream, even if an is_closed method is provided. Further complicating the issue is whether the four cases are handled differently on different platforms?


#3

cc @aturon @carllerche @reem


#4

Some questions I might have:

  • Right now the implementation of set_keepalive sets both the SO_KEEPALIVE and TCP_KEEPALIVE options, which may be unintuitive (doing two things at once). This behavior is just inherited from libuv and has not been scrutinized. Could this include some rationale as to why it should be stabilized this way?

  • It looks like the options for keepalive at the OS level take a seconds argument, but this RFC proposes taking a Duration. Can you clarify what it means to specify a fractional second? It would also be nice to clarify what it means to specify a 0-length duration.

  • How is is_closed implemented? I agree with @danburkert that this is a racy function to expose.

  • Why take &mut self vs &self for set_keepalive? (the &mut isn’t required by the OS)


#5

Right now the implementation of set_keepalive sets both the SO_KEEPALIVE and TCP_KEEPALIVE options, which may be unintuitive (doing two things at once). This behavior is just inherited from libuv and has not been scrutinized. Could this include some rationale as to why it should be stabilized this way?

Sure. Setting SO_KEEPALIVE will merely turn on that said socket should exhibit keepalive behavior. It will be set to use the default values provided by the OS. Passing the TCP_KEEPIDLE (TCP_KEEPALIVE on mac) option is need to adjust the timeout to something other than the default.

An implementation would be need to be added for Windows, setting SIO_KEEPALIVE_VALS, to do similarly on Windows.

It looks like the options for keepalive at the OS level take a seconds argument, but this RFC proposes taking a Duration. Can you clarify what it means to specify a fractional second? It would also be nice to clarify what it means to specify a 0-length duration.

The value is seconds on Linux, but it is milliseconds on Windows. For platforms that use seconds, the milliseconds could be truncated to the seconds place.

I couldn’t find anything specifically detailing when this value is 0. The Windows page puts the minimum value at 1, which is 1 millisecond.

So then, I’d suggest that passing Duration::zero() would turn off the keepalive feature, just as in the timeouts RFC.

How is is_closed implemented? I agree with @danburkert that this is a racy function to expose.

It didn’t sound like he meant the function itself is racy, just that trusting it could be. For example:

if !tcp.is_closed() {
    // tcp could have been closed at this point
    try!(tcp.read(buf));
}

In this case, that seems fine to me. I’d personally handle an ErrorKind::ConnectionAborted (or whichever it is) myself. I’d just rather be able to check that the stream is closed, such as in a ConnectionPool, before handing it out to be used. Once a timeout happens then, it can be a legitimate timeout.

Why take &mut self vs &self for set_keepalive? (the &mut isn’t required by the OS)

That was an oversight, I’m so used to setters taking &mut self. I just double checked the socket timeouts RFC, and it takes &self as well.


#6

Thanks for the explanation! So it sounds like there are three states a socket can be in:

  • No keepalive at all
  • Keepalive is turned on, but it’s set to the default value
  • Keepalive is turned on and it’s using a custom value

In the current proposal it sounds like it’s only possible to transition between the first and last states and the first state is signified with Duration::zero(). I’m personally ok with that, but it may be nice to add some text to the RFC about this.

Fascinating! Your recommendations here sound fine to me!

Could you clarify on how you envision is_closed being implemented?


#7

This part is where I’m least certain. I’ve done zero actual socket coding in C, so I’d welcome someone help fill in the details.

We can check for proper shutdown using SO_ERROR with getsockopt, so that may be part of it. However, the reading I did suggests this won’t catch improper shutdowns, such as power going out, or what-have-you. Then, the suggestions are that you need to actually test a read or write. This can be done with a non-blocking call and 0 bytes, and then checking the return value and errno.

It may make sense to keep a closed property on the TcpStream, which can short-circuit these checks once it already knows, so successive is_closed calls terminate quickly. The shutdown() method could also modify this property.


#8

A possible alternative to is_closed doing all those checks, would be to be able to pass a closure or fn to the TcpStream that would be invoked when the Stream is closed. This would be when shutdown() is called, or when the tcp keepalive goes unanswered.


#9

I personally don’t know how to implement such a function or what its semantic meaning would be, which is why I, too, was curious.

Unfortunately this will undermine the from_raw_fd function as you don’t know whether an incoming file descriptor is open or closed. This also doesn’t catch when you shutdown manually elsewhere (perhaps through as_raw_fd).


#10

It shouldn’t. Unless someone pulls a nasty trick (opens a new connection and uses dup2 to swap out the file descriptor), sockets should only ever transition from open to closed. All you have to do is assume that the socket is open unless you observe that it is closed; when you observe that it is closed, you can set the flag and never check again.