Pre-RFC: Fix spuriosity in thread::park

I think that allowing spurious wakeups is a terrible design decision: it is sometimes essential to preserve the order in which threads park, and spurious wakeups make that impossible.

See: https://doc.rust-lang.org/std/thread/fn.park.html

I don’t think that requiring strictness is a good idea, because it may make you sad on certain platforms. Plus, unless I’ve misread, park()-as-implemented is not spurious; it ends with a loop from which it does not return unless it is woken up non-spuriously.

How could it make you sad on some platforms? I cannot think of a single reason why you would want it to wake up before it was told to.

Sebastian Malton

The documentation does say that thread::park can be woken up spuriously, which is what @Nokel81 is referring to.

Hence “Requiring strictness.” We have it but do not require it.

Re why even: this boils down to an epic sadness of POSIX: your syscalls can be interrupted. The loop has a cost associated with it, since every spurious wakeup means another futex (or non-Linux equivalent) syscall; yes, in 99% of cases this cost is acceptable, but requiring it on every single platform is limiting. I’m sure someone who knows more about non-Linux-x86_64 platforms can elucidate.

I think that, rather than asserting that park() should guarantee strictness, it might be better to get context about what you’re doing. Maybe park() is the wrong shape.


I don’t recall where I saw this, but spurious wakeups are impossible to avoid if you want to guarantee that all wake-ups go through.

Basically: the OS has critical sections where it can receive but not handle a wake-up request. Since it knows it missed a request but not which parked thread it missed the wake-up for, it wakes them all up.

This applies to any un/park mechanism, whether epoll or park. If you’re getting pinged that an event happened, if you don’t want to be able to miss the ping, you have to handle spurious pings. (And this is a global choice for the notification system.)


The thing is that I have seen the opposite: that spurious wakeups are never required, even from an OS standpoint.

Sebastian Malton

I know that syscalls can be interrupted, but I don’t see why that necessitates spurious wakeups.

Sebastian Malton

Found the (a) source!

Just think of it... like any code, thread scheduler may experience temporary blackout due to something abnormal happening in underlying hardware / software. Of course, care should be taken for this to happen as rare as possible, but since there's no such thing as 100% robust software it is reasonable to assume this can happen and take care on the graceful recovery in case if scheduler detects this (e.g. by observing missing heartbeats).

Now, how could scheduler recover, taking into account that during blackout it could miss some signals intended to notify waiting threads? If scheduler does nothing, mentioned "unlucky" threads will just hang, waiting forever - to avoid this, scheduler would simply send a signal to all the waiting threads.

This makes it necessary to establish a "contract" that waiting thread can be notified without a reason. To be precise, there would be a reason - scheduler blackout - but since thread is designed (for a good reason) to be oblivious to scheduler internal implementation details, this reason is likely better to present as "spurious".


The pthread_cond_wait() function in Linux is implemented using the futex system call. Each blocking system call on Linux returns abruptly with EINTR when the process receives a signal. ... pthread_cond_wait() can't restart the waiting because it may miss a real wakeup in the little time it was outside the futex system call. This race condition can only be avoided by the caller checking for an invariant. A POSIX signal will therefore generate a spurious wakeup.

Summary: If a Linux process is signaled its waiting threads will each enjoy a nice, hot spurious wakeup.

The most common way you prevent spurious wake-ups is

while !i_should_be_awake() {
    i_sleep()
}

at some level. It doesn't have to be at your level, but the primitive shouldn't force this way on you automatically.
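For thread::park specifically, that loop might look like the minimal sketch below. It assumes the wake-up condition is a shared AtomicBool; the names park_until_set and run_demo are invented for this illustration and are not part of std.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: parks the current thread until `flag` is true.
/// Any spurious return from `park` is absorbed by re-checking the flag.
fn park_until_set(flag: &AtomicBool) {
    while !flag.load(Ordering::Acquire) {
        thread::park();
    }
}

/// Spawns a waiter, sets the flag, unparks the waiter, and returns the
/// flag value the waiter observed once it stopped waiting.
fn run_demo() -> bool {
    let flag = Arc::new(AtomicBool::new(false));
    let waiter_flag = Arc::clone(&flag);

    let waiter = thread::spawn(move || {
        park_until_set(&waiter_flag);
        waiter_flag.load(Ordering::Acquire)
    });

    // Give the waiter a moment to park; correctness does not depend on this.
    thread::sleep(Duration::from_millis(10));

    // Publish the condition first, then wake the waiter.
    flag.store(true, Ordering::Release);
    waiter.thread().unpark();

    waiter.join().unwrap()
}

fn main() {
    assert!(run_demo());
    println!("waiter woke up after the flag was set");
}
```

The order in the waker matters: the flag is stored before unpark is called, so a waiter that wakes for any reason either sees the flag set or parks again and is woken by the pending unpark.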


Which is precisely how std wraps the condvar it uses to implement park. =P

So park isn’t spurious because it is an abstraction.

Sebastian Malton

park is very much unlike a condvar though, and there was a lot of discussion about it last year:

Also see this fixing PR:

park/unpark is really an optimization for a spinning loop, not some weird form of condvar. From what I understood, it is not always possible or desirable to guarantee that every parked thread is woken up exactly once. You are arguing based on the current implementation, which might change at any time.

Even if spurious wakeups are not possible, that still does not help users of the API: to rely on “no spurious wakeups”, you’d also have to ensure "no spurious calls to unpark" in every user of this API, and that I think is not possible in general (a thread could use multiple different abstractions that do not know about each other but all use parking). Imagine implementing a lock on top of this, so there’s a wait list for the lock… and after a thread has woken up, but before it gets around to removing itself from the wait list, some other thread calls unpark on it.

In fact, the aforementioned PR is for a use of parking where there may certainly be spurious calls to unpark. That is an intended and allowed usage of this API. Your proposal would thus break backwards compatibility.
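The effect of a stale unpark can be seen in a small sketch. It relies only on the documented token behaviour of std (an unpark followed by a park makes the park return immediately); the function name park_after_stale_unpark is made up for this example, and the self-unpark stands in for some unrelated abstraction calling unpark.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical demo: measures how long a park blocks when an earlier,
/// unrelated `unpark` has already left a token on the current thread.
fn park_after_stale_unpark() -> Duration {
    // Stand-in for another abstraction (say, a lock's wait list) that
    // unparks this thread after it has already stopped waiting.
    thread::current().unpark();

    let start = Instant::now();
    // The stale token is consumed here, so this call does not block.
    // The timeout is only a safety net for the demo.
    thread::park_timeout(Duration::from_secs(5));
    start.elapsed()
}

fn main() {
    let blocked_for = park_after_stale_unpark();
    assert!(blocked_for < Duration::from_secs(1));
    println!("park returned after {:?}", blocked_for);
}
```

From the point of view of the code that parks, this early return is indistinguishable from a spurious wakeup, which is why callers re-check their own condition either way.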

Much of the implementation of this will change anyway with

Cc @Amanieu


Do we want to document this? If you squint at the shape of the API it looks a lot like a weird condvar, so I reason about it as though "this is a condvar".

We could probably document this better! We have the following in there

For example, it would be a valid, but inefficient, implementation to make both park and unpark return immediately without doing anything.

but that is clearly not sufficient.

I'd be happy to help review proposals for better docs. :slight_smile:

I don’t think there’s a conflict here. Most condvar implementations allow spurious wakeups; park allows spurious wakeups; so it is like a weird condvar.

In a sense, spurious wakeups are always possible because the primitive’s semantics are unlikely to be precisely the same as your application’s. Even if the primitive has a strict 1:1 wake to wakeup ratio, it’s likely that your higher-level code will evaluate its own state and find that it still can’t make progress and needs to sleep again.

You can put a lot of effort into making sure that your higher-level semantics are precisely the same as the primitive’s, but that ends up getting complex. Or simply tautological - you make the primitive have the semantics you want by giving it a callback which implements those semantics but the callback itself can be evaluated spuriously.

In practice I find reducing the number of spurious wakeups is a helpful performance optimization, but making “no spurious wakeups” a hard guarantee of an interface doesn’t turn out very useful.
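As a rough illustration of higher-level code re-evaluating its own state, here is a sketch with a shared job queue and std's Mutex and Condvar. The queue, take_job, and run_demo are all invented for the example; the point is only that a consumer can be woken legitimately and still find nothing to do.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

type Shared = (Mutex<VecDeque<u32>>, Condvar);

/// Hypothetical consumer: blocks until it can pop a job from the queue.
/// Every wakeup, spurious or not, is followed by a re-check of the queue.
fn take_job(shared: &Shared) -> u32 {
    let (lock, cvar) = shared;
    let mut queue = lock.lock().unwrap();
    loop {
        if let Some(job) = queue.pop_front() {
            return job;
        }
        // Nothing to do yet (or another consumer got there first): sleep again.
        queue = cvar.wait(queue).unwrap();
    }
}

/// Two consumers, two jobs. Each push wakes every waiting consumer.
fn run_demo() -> Vec<u32> {
    let shared: Arc<Shared> = Arc::new((Mutex::new(VecDeque::<u32>::new()), Condvar::new()));

    let consumers: Vec<_> = (0..2)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || take_job(&shared))
        })
        .collect();

    for job in 1..=2u32 {
        shared.0.lock().unwrap().push_back(job);
        // notify_all wakes all current waiters even though only one job
        // was added; a waiter that loses the race finds the queue empty.
        shared.1.notify_all();
    }

    let mut jobs: Vec<u32> = consumers.into_iter().map(|c| c.join().unwrap()).collect();
    jobs.sort();
    jobs
}

fn main() {
    assert_eq!(run_demo(), vec![1, 2]);
    println!("both jobs were taken exactly once");
}
```

Switching to notify_one would reduce the number of wasted wakeups, which matches the point above: fewer wakeups is an optimization, while the re-check in take_job is what keeps the code correct.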


That is a separate issue, especially if the condvar isn’t OS-backed.

Sebastian Malton
