The stdlib does not have a way to terminate a worker thread other than from the thread itself. If the worker thread is blocked then it cannot check for any “terminate” message from the parent, and is effectively stuck (unless the entire process exits).
Of course, a child that never blocks doesn’t have this problem, but that approach brings its own API complexity, and in the case that prompted this post it doesn’t help: the Linux device-mapper DM_DEV_WAIT ioctl just…blocks the caller until an event happens (if one ever does). Yucky API. I don’t want my main thread to block forever, so I use a thread, but then there’s no way to kill the thread.
If I were writing in C, I would use pthread_cancel. There was some discussion on IRC about how the stdlib loops if it receives EINTR, but from the pthread_cancel manpage it doesn’t appear to me that the thread would ever return from the function that was the cancellation point. There does seem to be a mechanism for running cleanup handlers (see pthread_cleanup_push), so Rust may still be able to do what it needs to do before the thread dies.
Cancelling a thread is generally considered a poor idea, as I understand it. It’s a minefield of accidentally broken invariants. But that advice is usually aimed at non-Rust languages; maybe Rust’s type system mitigates some of those issues.
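For concreteness, here is a minimal std-only sketch of that workaround. The device-mapper ioctl is replaced by a hypothetical blocking_wait function that simply never returns (no real ioctl is called). The caller can stop waiting after a timeout, but nothing in std lets it stop the helper thread itself.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Hypothetical stand-in for a call like the DM_DEV_WAIT ioctl: it never
// returns. The real ioctl is not called here.
fn blocking_wait() -> u32 {
    loop {
        thread::sleep(Duration::from_secs(3600));
    }
}

/// Runs blocking_wait on a helper thread and waits at most `timeout` for
/// its result. On timeout this returns None, but the helper thread keeps
/// running: std has no API to stop it from the outside.
fn wait_with_timeout(timeout: Duration) -> Option<u32> {
    let (tx, rx) = mpsc::channel();
    // Dropping the JoinHandle detaches the thread; it does not stop it.
    thread::spawn(move || {
        let event = blocking_wait();
        // Never reached in this sketch; a real event would be sent here.
        let _ = tx.send(event);
    });
    rx.recv_timeout(timeout).ok()
}

fn main() {
    let result = wait_with_timeout(Duration::from_millis(50));
    assert_eq!(result, None);
    // The helper is still stuck in blocking_wait; it only goes away
    // because the whole process exits when main returns.
    println!("gave up waiting; the helper thread is still blocked");
}
```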
I agree with @steveklabnik in general that there are likely some tricky safety issues here.
Do you happen to know what similar functionality is available on Windows? If this were ever to go into std, it’d need strong cross-platform support. Regardless, this also seems like something that could easily live outside of std.
Windows’ TerminateThread is more from the “don’t do this unless you know exactly what you’re doing” camp, with no hooks for any cleanup, so you are right that there’s no way to do this cross-platform. OK.
I was chatting a bit about this on IRC with @agrover, and I think there are two relevant things that I haven't seen in previous discussions of Rust and thread cancellation:
The unwind-safety story is a little better defined (or at least better explored) after the huge discussion about catch_panic / recover, and there's a little more understanding of what Drop implementations have to do. So being able to tell a remote thread to start panicking is probably sufficiently well-defined now to be permissible.
This particular use case is an OS API that cannot be interrupted other than by sending it a signal, and cannot be made async except by running it in a thread. If you signal the remote thread (with any signal, as long as it has a handler installed; note that a handler installed with SA_RESTART can make the kernel restart the call instead), the ioctl is interrupted and returns EINTR. It seems useful for Rust to at least make that easy, even if it doesn't provide full cancellation support. If all OS interfaces supported a non-blocking / async mode, Rust could just tell people to use that.
I haven't yet found the discussions about why there is no thread cancellation support, other than a 2011 post from Graydon which talks at least somewhat favorably about cancellation in general, though not pthread_cancel as an implementation thereof.
> There was some discussion on irc about how the stdlib loops if it receives EINTR but from looking at the pthread_cancel manpage it doesn't appear to me that the thread would ever return from the function that was the cancellation point
Rust isn't bound to pthread semantics, but as far as I can tell, the glibc / NPTL implementation of pthread_cancel sends an unconditional signal to the target thread, which can make a blocking syscall return early with EINTR. glibc's syscall wrappers typically use the SYSCALL_CANCEL macro, which appears to temporarily set a flag telling the signal handler to actually start unwinding. But if you make a raw syscall that bypasses glibc's wrapper, it seems the call just returns early with EINTR, and the thread only starts unwinding at the next cancellation point (in libc code), when libc notices the thread was supposed to be cancelled.
If we were to implement cancellation, we would define its behavior in whatever way makes sense for Rust, which might or might not match pthreads. (On Windows, instead of TerminateThread, it might involve raising some kind of exception in the target thread and handling that exception as a regular Rust panic.)
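To make the deferred model concrete without any signals, here is a hypothetical cooperative sketch: cancel() only sets a flag, and the thread starts unwinding (via std::panic::resume_unwind) at its next cancellation_point() call, with a Drop guard standing in for a pthread_cleanup_push handler. CancelToken, Cancelled and Cleanup are made-up names; this is neither how glibc implements cancellation nor an existing std API.

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical panic payload that marks a cancelled thread.
struct Cancelled;

/// Hypothetical cancellation token. cancel() only sets a flag; the target
/// thread unwinds at its next cancellation_point(), loosely like pthreads'
/// deferred cancellation.
#[derive(Clone, Default)]
struct CancelToken(Arc<AtomicBool>);

impl CancelToken {
    fn cancel(&self) {
        self.0.store(true, Ordering::SeqCst);
    }

    /// Unwinds with a Cancelled payload if cancellation was requested.
    /// resume_unwind does not invoke the panic hook, so nothing is printed.
    fn cancellation_point(&self) {
        if self.0.load(Ordering::SeqCst) {
            panic::resume_unwind(Box::new(Cancelled));
        }
    }
}

/// Plays the role of a pthread_cleanup_push handler: its Drop runs while
/// the cancelled thread unwinds.
struct Cleanup(Arc<AtomicBool>);

impl Drop for Cleanup {
    fn drop(&mut self) {
        self.0.store(true, Ordering::SeqCst);
    }
}

/// Cancels a worker; returns (unwound with Cancelled, cleanup ran).
fn run_cancelled_worker() -> (bool, bool) {
    let token = CancelToken::default();
    let cleaned_up = Arc::new(AtomicBool::new(false));

    let worker = {
        let token = token.clone();
        let cleaned_up = Arc::clone(&cleaned_up);
        thread::spawn(move || {
            let _cleanup = Cleanup(cleaned_up);
            // Bounded so a missed cancellation ends the thread instead of hanging.
            for _ in 0..200 {
                thread::sleep(Duration::from_millis(5)); // stand-in for real work
                token.cancellation_point();
            }
        })
    };

    token.cancel();
    let unwound_with_cancelled = match worker.join() {
        Err(payload) => payload.downcast_ref::<Cancelled>().is_some(),
        Ok(()) => false,
    };
    (unwound_with_cancelled, cleaned_up.load(Ordering::SeqCst))
}

fn main() {
    let (cancelled, cleaned_up) = run_cancelled_worker();
    assert!(cancelled && cleaned_up);
    println!("worker unwound at a cancellation point and its cleanup ran");
}
```

Because the payload is a dedicated type, the joining thread can tell a cancellation apart from an ordinary panic by downcasting it. The unwind-safety concerns still apply to any code between cancellation points.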
In particular, for your use case, returning early from the syscall via EINTR without unwinding seems to solve the problem, and it also avoids all the tricky unwind-safety questions. That's why I think standardizing a way to signal a remote thread would be useful. (However, most libstd functions loop on EINTR, so there might need to be some way to control that.)
Hrm, that post mentions Thread.interrupt, which appears to be a way to make a remote thread throw an exception, but it only takes effect in blocking methods declared to throw InterruptedException, such as Thread.sleep and Object.wait. (Otherwise it just sets a flag.) There's probably a way to adapt that to Rust's safety semantics.
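Here is a small sketch of the retry policy in question, assuming interrupted calls surface as ErrorKind::Interrupted: retry_on_interrupt swallows every EINTR, roughly like the libstd loops mentioned above, while retry_unless takes a hypothetical should_stop hook so that a deliberate interruption reaches the caller. The "syscall" in main is just a closure.

```rust
use std::io::{self, ErrorKind};

/// Retries `op` while it fails with ErrorKind::Interrupted (EINTR), unless
/// `should_stop` reports that the interruption was a deliberate request to
/// give up. `should_stop` is a hypothetical hook; std has no such parameter.
fn retry_unless<T>(
    mut op: impl FnMut() -> io::Result<T>,
    should_stop: impl Fn() -> bool,
) -> io::Result<T> {
    loop {
        match op() {
            Err(e) if e.kind() == ErrorKind::Interrupted && !should_stop() => continue,
            other => return other,
        }
    }
}

/// The unconditional policy: every EINTR is swallowed and the call retried.
fn retry_on_interrupt<T>(op: impl FnMut() -> io::Result<T>) -> io::Result<T> {
    retry_unless(op, || false)
}

fn main() {
    // Fake syscall: interrupted twice, then succeeds on the third call.
    let mut calls = 0;
    let n = retry_on_interrupt(|| {
        calls += 1;
        if calls < 3 {
            Err(io::Error::from(ErrorKind::Interrupted))
        } else {
            Ok(calls)
        }
    });
    assert_eq!(n.unwrap(), 3);

    // With a stop request pending, the first EINTR reaches the caller.
    let r: io::Result<()> =
        retry_unless(|| Err(io::Error::from(ErrorKind::Interrupted)), || true);
    assert_eq!(r.unwrap_err().kind(), ErrorKind::Interrupted);
}
```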
In particular, only interrupting system calls is probably fine.
You need to be careful. It’s entirely fair for unsafe code to assume that blocking on a syscall won’t lead to a panic today (why would it?). Making such calls unwind would make that code unsound.
As mentioned before, you might be able to send a signal to the thread to make the syscall return. Another thing that might work (at least it does on Windows) is to close the file descriptor.
Ah, good point. What about making the Rust function return Err (with ErrorKind::Interrupted) instead of forcibly unwinding? And, like Java, set a flag that the next syscall wrapper checks and that user code can also check explicitly. Then this isn't thread cancellation any more, just syscall cancellation (unless you choose to .unwrap()). And the Rust syscall wrappers can already return Err.
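On Linux, as I read the close(2) man page, closing a descriptor does not interrupt a system call already blocked on it in another thread. For sockets specifically, shutting the socket down through a second handle does wake a blocked reader, at least on Linux; here is a std-only sketch of that narrower trick. It is not a fix for the device-mapper ioctl, and the loopback setup is only there to get a read that blocks.

```rust
use std::io::Read;
use std::net::{Shutdown, TcpListener, TcpStream};
use std::thread;

/// Blocks a reader thread on a socket, then unblocks it from outside by
/// shutting the socket down through a cloned handle. Returns the byte
/// count the blocked read reported.
fn unblock_by_shutdown() -> std::io::Result<usize> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    // Kept alive (and silent) so the reader never sees data or EOF from it.
    let _client = TcpStream::connect(listener.local_addr()?)?;
    let (mut server_side, _) = listener.accept()?;
    let control = server_side.try_clone()?;

    let reader = thread::spawn(move || {
        let mut buf = [0u8; 64];
        // No data ever arrives, so this blocks...
        server_side.read(&mut buf)
    });

    // ...until another thread shuts the socket down.
    control.shutdown(Shutdown::Both)?;
    reader.join().expect("reader thread panicked")
}

fn main() {
    let n = unblock_by_shutdown().expect("socket setup failed");
    assert_eq!(n, 0); // a shut-down socket reads as end-of-stream
    println!("blocked read returned {n} bytes");
}
```

If the shutdown happens before the reader starts its read, the read returns 0 immediately, so the outcome is the same either way.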
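A rough sketch of that idea, with a Condvar standing in for a blocking syscall: interrupting sets a flag and wakes the waiter, which then gets Err(ErrorKind::Interrupted) instead of unwinding, and user code can poll the flag too. Interrupter and its methods are hypothetical. The flag here stays set once raised; Java, by contrast, clears the interrupt status when it throws InterruptedException.

```rust
use std::io::{self, ErrorKind};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

#[derive(Default)]
struct State {
    interrupted: bool,
    event: Option<u32>,
}

/// Hypothetical interrupt handle, loosely modelled on Java's Thread.interrupt.
#[derive(Clone, Default)]
struct Interrupter {
    shared: Arc<(Mutex<State>, Condvar)>,
}

impl Interrupter {
    /// Sets the (sticky) flag and wakes any waiter.
    fn interrupt(&self) {
        let (lock, cvar) = &*self.shared;
        lock.lock().unwrap().interrupted = true;
        cvar.notify_all();
    }

    fn post_event(&self, event: u32) {
        let (lock, cvar) = &*self.shared;
        lock.lock().unwrap().event = Some(event);
        cvar.notify_all();
    }

    /// User code can also check the flag explicitly.
    fn is_interrupted(&self) -> bool {
        self.shared.0.lock().unwrap().interrupted
    }

    /// Stand-in for an interruptible blocking call such as a syscall wrapper.
    fn wait_for_event(&self) -> io::Result<u32> {
        let (lock, cvar) = &*self.shared;
        let mut state = lock.lock().unwrap();
        loop {
            if state.interrupted {
                return Err(io::Error::new(ErrorKind::Interrupted, "wait interrupted"));
            }
            if let Some(event) = state.event.take() {
                return Ok(event);
            }
            // Condvar waits may wake spuriously, hence the loop.
            state = cvar.wait(state).unwrap();
        }
    }
}

fn main() {
    let handle = Interrupter::default();
    let waiter = {
        let handle = handle.clone();
        thread::spawn(move || handle.wait_for_event())
    };
    handle.interrupt();
    let result = waiter.join().unwrap();
    assert_eq!(result.unwrap_err().kind(), ErrorKind::Interrupted);
    assert!(handle.is_interrupted());

    // Without an interrupt, an event is delivered normally.
    let other = Interrupter::default();
    other.post_event(7);
    assert_eq!(other.wait_for_event().unwrap(), 7);
}
```

The caller decides what an interruption means: propagate it with ?, retry, or .unwrap() it into a panic.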
It'd still be a function on a Thread, so maybe a better name than "cancel" is in order.
Oh nice, that looks close enough to the options on UNIX to make this work. In particular, CancelSynchronousIo takes one parameter, a thread. (CancelIoEx is a little more precise, since it takes a file handle, but I don't think there's a way to do that on UNIX.)
There’s a lot of direct use of system APIs outside of the standard library. Anything that proposes to cause unwinding from those is going to cause massive safety issues. This really should be handled in a more precise, use-case-specific way rather than through an API that blindly cancels whatever system call happens to be in progress. Plus, which system APIs can be cancelled varies a lot between platforms, which will lead to lots of frustrated users when it turns out CancelSynchronousIo doesn’t work on something that the Unix equivalent cancels just fine.