TcpStream always terminates connections successfully (even on panic)

(Originally posted here, but I was referred to "internals", so I'm reposting it here.)

TCP connections can be terminated either successfully with the FIN flag set, or aborted with a TCP reset (RST flag), in which case previously sent data may be discarded (e.g. if the RST packet arrives early).

Under normal conditions, it is important to terminate the connection with the FIN flag. However, on critical errors, sending a TCP reset may help the peer to detect that the connection didn't terminate successfully but that something went wrong.

I noticed that dropping a TcpStream always terminates open connections with a TCP FIN flag, even in the case of panics (via unwind or abort). In the case of a program abort, the FIN may even be sent out while some buffered data has not been written to the socket yet.

Consider the following program:

use std::io::{Write, BufWriter};
use std::net::{Shutdown, TcpListener, TcpStream};

fn handler(stream: TcpStream) -> Result<(), Box<dyn std::error::Error>> {
    let mut writer = BufWriter::new(&stream);
    writeln!(writer, "Hello.")?;
    Err("Some error here")?;
    // or: panic!("Some error here");
    // or: std::process::exit(0);
    writeln!(writer, "We are done!")?;
    writer.flush()?;
    stream.shutdown(Shutdown::Both)?;
    Ok(())
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    for accepted in TcpListener::bind("[::]:1234")?.incoming() {
        let stream = accepted?;
        match handler(stream) {
            Ok(_) => eprintln!("Handler successful"),
            Err(err) => eprintln!("Error in handler: {}", err),
        };
    }
    unreachable!();
}

Even when there is an error in the handler (either by returning prematurely or by causing a panic with stack unwinding enabled), the connection will be gracefully shut down, and the peer receives a TCP FIN.

Moreover, when I disable unwinding by setting panic = 'abort' in my Cargo.toml and cause a panic (or simply call std::process::exit(0)), the above code example yields a zero-byte response with proper termination (i.e. FIN) on my operating system (FreeBSD). In either case, a FIN might confuse network peers, as they see a properly terminated connection when in fact the program on the remote end crashed.

Looking into the documentation of TcpStream, there seems to be no way to reset a connection (i.e. send an RST packet), not even explicitly.

I personally don't like the current behavior. I would prefer if a dropped TcpStream resulted in a TCP RST by default, with only an explicit shutdown sending a FIN. To most other people, however, that might be unexpected and cause weird behavior (and break existing code). And maybe there are more downsides I haven't thought of yet.

So I would propose adding a method to TcpStream which allows setting the default behavior to TCP RST. On the operating system level, under FreeBSD and Linux this is achieved by calling setsockopt with SO_LINGER set to l_onoff = 1 and l_linger = 0.

What do you think?

6 Likes

For comparison, Tokio has a TcpStream::set_linger method, which makes the appropriate setsockopt call.
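
A minimal sketch of how that looks with Tokio (the address is a placeholder, a listener on the other end is assumed, and the tokio crate with its runtime/networking features is required):

use std::time::Duration;
use tokio::net::TcpStream;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let stream = TcpStream::connect("127.0.0.1:1234").await?;
    // A linger timeout of zero means: on close/drop, discard unsent data
    // and abort the connection with an RST instead of sending a FIN.
    stream.set_linger(Some(Duration::from_secs(0)))?;
    drop(stream); // the peer sees ECONNRESET instead of a clean EOF
    Ok(())
}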

5 Likes

For my project, I implemented it on my own with a tiny piece of C code:

#include <sys/socket.h>

/* A negative timeout disables lingering, so close() terminates the
 * connection gracefully (FIN). A timeout of 0 makes close() abort the
 * connection with an RST instead. */
int set_linger(int fd, int timeout) {
  struct linger lingerval = { 0, };
  if (timeout >= 0) {
    lingerval.l_onoff = 1;
    lingerval.l_linger = timeout;
  }
  return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lingerval, sizeof(lingerval));
}

And corresponding Rust bindings:

extern "C" {
    #[link_name = "set_linger"]
    fn set_linger_unsafe(fd: c_int, timeout: c_int) -> c_int;
}

pub fn set_linger(stream: &TcpStream, timeout: Option<u32>) -> io::Result<()> {
    let fd: RawFd = stream.as_raw_fd();
    let result: c_int = unsafe {
        set_linger_unsafe(
            fd,
            match timeout {
                None => -1,
                Some(secs) => c_int::try_from(secs).expect("integer out of bounds"),
            },
        )
    };
    if result == 0 {
        Ok(())
    } else {
        Err(io::Error::last_os_error())
    }
}
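
A usage sketch for this helper (the address is a placeholder, and a listening peer is assumed):

use std::io::Write;
use std::net::TcpStream;

fn main() -> std::io::Result<()> {
    let mut stream = TcpStream::connect("127.0.0.1:1234")?;
    writeln!(stream, "Hello.")?;
    // Arm RST-on-close: with a linger timeout of 0, dropping the stream
    // aborts the connection instead of closing it gracefully with a FIN.
    set_linger(&stream, Some(0))?;
    Ok(()) // stream dropped here; the peer sees a connection reset
}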

I feel like it's something that would be nice to have in the standard library. In either case, the issue might be worth mentioning in the documentation. I was confused to see a TCP FIN from a crashed program (though it's the same in C if you do not care about SO_LINGER).

Note that you can achieve this in pure Rust using the libc crate.
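
For example, a sketch of the same helper on top of the libc crate (Unix-only, untested):

use std::io;
use std::net::TcpStream;
use std::os::unix::io::AsRawFd;

pub fn set_linger(stream: &TcpStream, timeout: Option<libc::c_int>) -> io::Result<()> {
    // SO_LINGER with l_onoff = 1 and l_linger = 0 aborts the connection
    // (RST) on close; l_onoff = 0 restores the default graceful FIN.
    let lingerval = match timeout {
        Some(secs) => libc::linger { l_onoff: 1, l_linger: secs },
        None => libc::linger { l_onoff: 0, l_linger: 0 },
    };
    let result = unsafe {
        libc::setsockopt(
            stream.as_raw_fd(),
            libc::SOL_SOCKET,
            libc::SO_LINGER,
            &lingerval as *const libc::linger as *const libc::c_void,
            std::mem::size_of::<libc::linger>() as libc::socklen_t,
        )
    };
    if result == 0 {
        Ok(())
    } else {
        Err(io::Error::last_os_error())
    }
}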

1 Like

Yes, though the libc interface – by principle – doesn't feel very Rusty :wink: (plus it also requires unsafe code). So I would still like to see something in the standard library to control the RST vs. FIN behavior of TcpStream.

With the short C snippet, I avoided an extra dependency, but I might look into libc in the future. Thanks for the hint.

1 Like

Nominally, but not really. If it runs on a modern (non-RTOS) OS, libc is in there somewhere.

1 Like

Yes, the dependency issue is rather a minor thing, but that's not the real issue. The problem (apart from the ugly C interface) is that the libc interface is highly platform-dependent. Sockets in particular may behave differently on different platforms. It would be good to provide an abstraction layer that irons out these OS-specific differences, either through a crate or through Rust's standard library.

Since the standard library provides a TcpStream, one could expect closing a TCP stream (successfully or with an error status) to be part of the standard library as well.

For an example where a different aspect of TCP socket behavior depends on the platform, see this thread on stackoverflow.

Claudiu: When using a TCP socket, what does “shutdown(sock, SHUT_RD);” actually do? […]

[…]

user207421: It has two effects, one of them platform-dependent.

  1. recv() will return zero, indicating end of stream.
  2. Any further writes to the connection by the peer will either be (a) silently thrown away by the receiver (BSD), (b) be buffered by the receiver and eventually cause send() to block or return -1/ EAGAIN/EWOULDBLOCK (Linux), or (c) cause the receiver to send an RST (Windows).

I'm not sure how this affects the Rust interface, but TcpStream::shutdown already documents some (different) OS-specific behavior, which “may change in the future”:

Platform-specific behavior

Calling this function multiple times may result in different behavior, depending on the operating system. On Linux, the second call will return Ok(()), but on macOS, it will return ErrorKind::NotConnected. This may change in the future.

Any platform-specific behavior seen by the remote peer, however, is not documented, which could affect the usefulness of TcpStream::shutdown(Shutdown::Read).

But that is just one example. There are more examples (apart from networking) where OS-specific differences are passed through to the user of the standard library, for example in RwLock:

Struct std::sync::RwLock

[…]

The priority policy of the lock is dependent on the underlying operating system’s implementation, and this type does not guarantee that any particular policy will be used. In particular, a writer which is waiting to acquire the lock in write might or might not block concurrent calls to read.

In some cases, like maybe in the above case with RwLock, platform-specific behavior of the standard library may be justified (e.g. due to performance reasons), as long as these issues or undefined aspects are documented.

In case of TCP connections, I would prefer if basic operations such as opening and closing (or shutdown-ing) sockets would ideally show the same behavior on all operating systems regarding:

  • Whether a TCP FIN is sent out, or a TCP RST (that is currently the case, though maybe not on Windows after a Shutdown::Read)
  • Whether data from remote peers is silently dropped or causes blocking/congestion at the peer

Having to resort to libc when I want to signal a failure state to the remote peer doesn't seem to be a very nice interface (at least to me), and as an application programmer, I would have to care for potential operating system specific differences (that currently exist or may exist in the future).

If the standard library offers TCP networking in std::net, then it would IMHO be best if that interface behaved as platform-independently as possible on the network level (as TCP/IP is used to connect machines running on different platforms).

6 Likes

This is (to me at least) a novel use of RST.

If I understand it correctly, you want the peer to see an ECONNRESET error rather than EPIPE if the socket was discarded as part of an unwind.

How would the peer distinguish this from other cases of ECONNRESET? How useful would that be in practice? I think it would be, at best, a hint of a possible malfunction, but due to its unreliability, application-layer protocols would still need another mechanism to decide whether a particular request was responded to. If such a mechanism exists, however, any information conveyed in how the connection is terminated seems redundant.

I'd love to better understand the larger picture/background behind this idea, or who else is doing this.

I think the network behavior on panic should be the same as if you killed the process using the most abrupt process killing interface (i.e. SIGKILL on Linux, TerminateProcess on Windows). I'm not sure what that behavior is though.

On Linux (POSIX) it closes the file descriptor as if close(2) had been called. Read the specification:

  • All of the file descriptors, directory streams, conversion descriptors, and message catalog descriptors open in the calling process shall be closed.

It is important that the consequences of process termination as described occur regardless of whether the process called _exit() (perhaps indirectly through exit()) or instead was terminated due to a signal or for some other reason.

I do not believe this is correct. An RST segment is generated only if closing the socket immediately would result in a TCP data loss event. Whether this is the case depends on how quickly data has been sent over the network and acknowledged by the peer.

As far as I know, there is no reliable way in the BSD sockets API to generate an RST segment for an established connection.

I checked an old RFC (RFC 793); in chapter 3 (Functional Specification), section 3.8 (Interfaces), subsection "User/TCP Interface", it describes two different ways to terminate a connection:

"Close: This command causes the connection specified to be closed. […] Closing connections is intended to be a graceful operation in the sense that outstanding SENDs will be transmitted (and retransmitted), as flow control permits, until all have been serviced. […]"

"Abort: This command causes all pending SENDs and RECEIVES to be aborted, the TCB to be removed, and a special RESET message to be sent to the TCP on the other side of the connection. Depending on the implementation, users may receive abort indications for each outstanding SEND or RECEIVE, or may simply receive an ABORT-acknowledgment."

Thus (at least considering this old RFC), it was intended to provide users of the TCP stack the ability to abort a connection and send a "special reset message" to the other side of the connection.

An EPIPE would only be returned when sending to a peer that has reset the connection. More important to me is that a reader does not receive an EOF but an ECONNRESET (or, depending on the particular OS implementation, any other error, as long as it is an error and not an EOF).

It cannot be distinguished.

You instantly know when a response was interrupted due to an error. The operating system won't try to flush out any fragments of an already broken message. The remote peer can distinguish successful EOFs (e.g. due to half-close or full-close) from unhandled errors (e.g. panics in Rust).

Operating systems won't report a normal EOF when receiving a TCP RST. Thus receiving a TCP RST is a clear sign that something went wrong. However, you are right about the opposite case: receiving a TCP FIN isn't a clear sign that everything went okay (which is why I think it's bad practice to "close" a connection on a panic rather than "abort" it, in RFC 793's phrasing).
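
To illustrate, a minimal reader that shows the distinction (the address is a placeholder):

use std::io::Read;
use std::net::TcpStream;

fn main() -> std::io::Result<()> {
    let mut stream = TcpStream::connect("127.0.0.1:1234")?;
    let mut buf = [0u8; 4096];
    loop {
        match stream.read(&mut buf) {
            Ok(0) => { eprintln!("clean EOF: the peer sent a FIN"); break; }
            Ok(n) => eprintln!("read {} bytes", n),
            // A TCP RST surfaces as an error (ConnectionReset on most
            // platforms), never as a normal end-of-stream.
            Err(err) => { eprintln!("connection aborted: {}", err); break; }
        }
    }
    Ok(())
}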

You are right about the redundancy.

If (some) programs do "close" instead of an "abort" on error, then we need this redundant information and cannot rely on having received a "successful" EOF on the TCP layer.

Edit to clarify: What I meant is: if there are programs out there that "close" a connection even on error, then we need additional mechanisms to validate that a response is complete. In that case, the "reset information" is redundant. Yet it can still make sense to abort the connection instead of closing it (as it seems semantically more correct and can avoid unnecessary data processing, as explained in the next paragraph).

There are also many other cases in which this information is available redundantly (e.g. a CRC, etc.). But I don't think that's a good reason to keep things as they are, because in some application contexts the message might not contain a CRC or a "successful termination" marker. To name one example, consider HTTP connections with the Connection header set to close and responses that carry neither length information nor an additional content-transfer-encoding. Also, detecting an error early may avoid unnecessary data processing.

I agree, it would be nice to see how other high-level interfaces handle this. Though I don't think that should be the only consideration when deciding what's best to do.

I just tested it on FreeBSD. There is indeed no difference between closing the socket and killing the process. In both cases, SO_LINGER is honored.

1 Like

I have used SO_LINGER in the past, and it always resulted in a TCP RST when I enabled it with a timeout of zero.

I rechecked on Linux (5.4.0-80-generic #90-Ubuntu SMP Fri Jul 9 22:49:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux) and FreeBSD (12.2) with the following program:

#include <sys/socket.h>
#include <netdb.h>
#include <stdlib.h>
#include <netinet/in.h>
#include <stdio.h>
#include <unistd.h>
#include <signal.h>

static int set_linger(int fd, int timeout) {
  struct linger lingerval = { 0, };
  if (timeout >= 0) {
    lingerval.l_onoff = 1;
    lingerval.l_linger = timeout;
  }
  return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lingerval, sizeof(lingerval));
}

int main(int argc, char **argv) {
  const char *host = "::";
  const char *port = "1234";
  struct addrinfo hints = { 0, };
  struct addrinfo *ai;
  int sock;
  FILE *f;
  hints.ai_family = AF_UNSPEC;
  hints.ai_socktype = SOCK_STREAM;
  hints.ai_protocol = IPPROTO_TCP;
  hints.ai_flags = AI_ADDRCONFIG;
  if (getaddrinfo(host, port, &hints, &ai)) abort();
  sock = socket(ai->ai_family, ai->ai_socktype | SOCK_CLOEXEC, ai->ai_protocol);
  if (sock < 0) abort();
  /* A linger timeout of 0 makes close() abort the connection with an RST. */
  if (set_linger(sock, 0)) abort();
  if (connect(sock, ai->ai_addr, ai->ai_addrlen)) abort();
  freeaddrinfo(ai);
  f = fdopen(sock, "r+");
  if (!f) abort();
  fprintf(f, "Hello!\n");
  fflush(f);
  sleep(2);
  /* Uncomment to test abrupt process termination instead of a clean exit: */
  //kill(getpid(), SIGKILL);
  return 0;
}

Setting lingerval.l_onoff = 1 and lingerval.l_linger = 0 reliably causes a TCP RST to be sent out, both on Linux and on FreeBSD (even on a half-closed connection; I double-checked on Linux and FreeBSD using "socat -t 60 STDIO TCP6-LISTEN:1234 < /dev/null" on the other end of the connection).

The function __tcp_close in Linux first checks for a data loss event and, if one occurred, sends an RST. Right after that, however, it checks whether SO_LINGER is set with a linger time of 0:

	} else if (sock_flag(sk, SOCK_LINGER) && !sk->sk_lingertime) {
		/* Check zero linger _after_ checking for unread data. */
		sk->sk_prot->disconnect(sk, 0);
		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPABORTONDATA);

This should call tcp_disconnect, which, in turn, sends an RST if the connection is in a state that requires one, as checked via tcp_need_reset.

The source code credits antirez with this.

1 Like

Summarizing:

  • According to Internet Standard STD 7 (aka RFC 793), TCP connections can be closed by applications in two ways:
    • close (sending FIN)
    • abort (sending RST)
  • Peer applications can distinguish whether a connection was successfully closed (they receive an EOF) or was aborted (they receive an error).
  • Aborting a connection may cause data that has already been sent to be lost (which also avoids trying to flush out data that has not been confirmed by the peer yet).
  • Libc under Linux and FreeBSD provides a way to abort connections (using setsockopt with SO_LINGER).
  • The current implementation in Rust's standard library in combination with libc behavior on at least Linux and FreeBSD never aborts a connection (not even on panic) but always uses "close" (as defined in STD 7). Moreover, it is not possible to change this behavior without manually changing socket options using other libraries or C functions.

Thus my question is: should this behavior be changed? And if yes, how?

And: does anyone know how other high-level interfaces or applications typically handle this?

1 Like

My vote would be yes. It'd at least make errors noticeably clearer on the other end: instead of confusing internal-state errors along the lines of “connection closed unexpectedly”, it'd be “connection reset by peer”.

Does anyone know how node.js (on top-level exception bubbling) and golang (on panic) handle this?

2 Likes

I just tested how Python 3.7 handles it in the case of

  • a forced process termination (SIGKILL) and
  • dropping the value through del and waiting for garbage collection.

In both cases, I witnessed a FIN, i.e. a graceful close (the same as what Rust does).

Manually causing a connection abort also seems ugly, as this thread on Stack Overflow suggests:

You have to be careful to set the SO_LINGER option on the right sockets API level and to use the right encoding for the option value (it's a struct).

[…]

con.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', 1, 0))

Here, the application programmer has to pack binary data to achieve the desired behavior.

However, I still believe it can (and should) be done differently in Rust.

Variant 1

How about the following idea:

  • panics cause an "abort" (according to RFC 793),
  • calling TcpStream::shutdown with Shutdown::Write or Shutdown::Both causes a "close" (according to RFC 793),
  • a new method will be provided to allow an explicit abort, such as TcpStream::abort,
  • dropping a TcpStream causes a "close" by default (this ensures backward compatibility and also avoids data loss, e.g. when error messages are sent to the peer before dropping the stream),
  • it's possible to switch the default drop behavior to "abort" by calling something like TcpStream::abort_on_drop() (useful for error handling with the "?" operator).

Implementation might be tricky. It could be done by setting SO_LINGER to a timeout of 0 after creating the socket and (unless abort_on_drop has been called previously) disabling SO_LINGER on drop or shutdown. But that would mean default behavior could interfere with other libraries (such as the libc crate) which may also perform setsockopt operations on the underlying socket.
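
For what it's worth, the abort_on_drop part can be approximated in user space today. A sketch, reusing a libc-based set_linger helper like the one earlier in this thread (all names are hypothetical, this is not a std proposal in itself):

use std::io;
use std::net::{Shutdown, TcpStream};

// Hypothetical wrapper: the connection is aborted with an RST on drop
// unless close() is called explicitly first.
struct AbortingStream(TcpStream);

impl AbortingStream {
    fn new(stream: TcpStream) -> io::Result<Self> {
        set_linger(&stream, Some(0))?; // arm RST-on-close
        Ok(AbortingStream(stream))
    }

    fn close(self) -> io::Result<()> {
        set_linger(&self.0, None)?; // disarm: back to a graceful FIN
        self.0.shutdown(Shutdown::Both)
        // the socket is closed normally when self.0 is dropped here
    }
}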

So I'm not sure if this is the best idea.

Variant 2

  • panics cause an "abort" (according to RFC 793),
  • dropping the TcpStream causes an "abort" as well,
  • in order to achieve a graceful "close" (send an EOF to the peer and flush data out), the connection must be closed explicitly by calling a new method, something like TcpStream::close, possibly with an optional timeout parameter that indicates how long the operating system should attempt to flush data out.

This could keep interference with socket operations by other libraries to a minimum, as the only change would be to set SO_LINGER to a timeout of 0 after creating sockets. Other libraries may do whatever they want with the socket. Only calling TcpStream::close would set SO_LINGER (which would be documented, of course).

The downside is that it would break existing code and might be prone to cause errors in applications that forget to properly close a connection (in most cases it would work, but with bad timing, information might be lost… a horrible scenario).

Variant 3

Alternatively, there could just be one added method that allows modifying SO_LINGER manually (at least setting it to "close" or "abort" by disabling it or setting a timeout of 0, respectively). That way, an application programmer who wants to take care of properly closing or aborting the stream has the ability to do so, while other programmers aren't bothered.

However, this still bears the disadvantage of panics causing a graceful close by default, which seems semantically wrong and could cause "confusing internal state errors", as @ShadowJonathan pointed out in the previous reply. (Edit: Maybe it isn't that bad, and applications should/could always treat an EOF as potentially being a crashed peer. But it still feels semantically wrong when there is the possibility to properly report errors instead.)

Summarizing, I dislike all variants :weary:. (But variant 3 would at least be an improvement over the status quo.)

Some applications manage the entire lifecycle of such connections (custom protocols, etc.). Closing a connection would then be a “conscious”/intended act on the other side's part, while an abort would be a good catch-all for all kinds of irrecoverable errors (network reset, timeout, or, in this case, critical application failure). That distinction would let applications treat aborts distinctly, mapping them to exceptions, errors, and failed states, which is at least better than detecting that a failed state occurred only after a close, which is what I was getting at. An abort is definitively “abnormal”, while with a close it depends on the application.

I like variant 3 best, as it doesn’t cause backwards incompatibility, but it would possibly not be seen and used by many developers. Still, is there a way for a struct to detect that it is being dropped as part of an unwind? Or to detect inside drop that the current thread is panicking?

Changing the default behaviour should be discussed with the lang team, as it’s technically a backwards-incompatible change. I’d argue for variant 3 right now, and then for a later switch to something like variant 1 (with abort as the default behaviour), in coordination with the lang team.

Personally, I also think that dropping a connection is “bad manners”. With a “live connection”, it’d be like dropping an unfinished ice cream in the trash: if you’re going to throw it away, at least finish it and get all of the remains. The same applies here: if a connection still has queued data (for whatever reason), the other side might expect “this side” to have read that data, and confusion could occur when later bugs reveal that the data was never actually received, because the connection was in the process of being dropped. Making the abort explicit when dropping an (unclosed) connection would disincentivize developers from dropping connections implicitly. If the concern about making developers actually read the remaining buffer is real, then maybe close() could return the remaining buffer, though I don’t see that happening anytime soon, because the API is stable now.

(Though maybe this could be implemented as a new function, finalise(), which closes the connection and reads the remaining buffer until the other side has also sent its FIN-ACK after the remaining buffer.)

I’m not exactly sure what the library team’s opinion of “closing with unread data” is, but personally, I feel that such situations has data fall “through the cracks”, and so could be classified as a subtle uncommon footgun, similar to the abort/close behaviour this thread is addressing.

1 Like

Yes, that is surprisingly simple: std::thread::panicking
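
For illustration, a Drop implementation can branch on it (a sketch; Connection is a placeholder type):

use std::thread;

struct Connection; // stands in for a type wrapping a TcpStream

impl Drop for Connection {
    fn drop(&mut self) {
        if thread::panicking() {
            // e.g. set SO_LINGER to a timeout of 0 here, so that the
            // close performed below aborts the connection with an RST
            eprintln!("dropped during unwinding");
        }
        // normal cleanup: graceful close (FIN)
    }
}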

1 Like

Nice!

Then I don’t think introducing a variant of variant 1, where it’ll switch to abort on drop, would be a big deal, as I’d argue that (beyond the backend implementation), the consensus here seems to be that aborting on critical failure is okay-ish (please disagree with me if that’s not the case, though)