Mutex Locking and Poisoning

mitsuhiko · May 12, 2015, 8:03pm

Since the scoped threads are temporarily gone I’m playing around with examples for mutexes now and I feel like API suffers from the poisoning a lot. There is a lot of mutex.lock().unwrap().

From what I understand the poisoning is really only there when you have threads that panic. So why not imply the .unwrap() and provide a better error when it fails? Most code should never see a poisoned mutex and if unwinding is disabled then it should never happen.

But maybe I’m misunderstanding the issue here? In any case the mutex as it exists currently is quite painful to use API wise.

brson · May 12, 2015, 9:51pm

I’ve also been bitten by poisoning recently, for other reasons. I had a program where thread panics were expected, so all my locks had to be ‘doubly unlocked’ by reaching into the poisoned inner lock (I don’t remember the exact API). It was annoying.

mitsuhiko · May 12, 2015, 11:20pm

I’m basically ignoring recovery from panics as I’m not a fan of unwinding. I did try to use the poison feature now and it does indeed seem very tricky to use properly. The docs also pretty much leave out how to properly use respond to poisoned mutexes.

Is there a canonical example how you are actually supposed to use this?

seanmonstar · May 12, 2015, 11:48pm

Mutexes used to panic on lock if they had been poisoned. They were changed to return a Result, because it was argued unacceptable that there was no way to gracefully handle poisoning.

rkjnsn · May 13, 2015, 6:22am

Do you know why a new method wasn’t added? Something like lock if you want to propagate panics, or try_lock if you wanted to handle it?

oli-obk · May 13, 2015, 7:04am

It wasn’t added due to the rust-principle that there should never be a panicking and a non-panicking function that do the same thing.

You can always write yourself a macro lock! and try_lock! that can be used directly on the mutex type…

mitsuhiko · May 13, 2015, 1:23pm

Mutexes used to panic on lock if they had been poisoned. They were changed to return a Result, because it was argued unacceptable that there was no way to gracefully handle poisoning.

I would understand that in many situations but not with locks. If you consider how you use them the only reason why their get poisoned is a panic. If you already subscribed to the idea of recovering from panics then you could just use a secondary API that gives you a result.

Right now everybody has to pay the price of unwrapping the lock result and the error messages are absolutely terrible. There is no indication that a mutex was poisoned, just that something panicked.

Sure, someone could .expect on it, but that really does not happen going by pretty much all uses of that mutex in the wild I could find (github, docs).

mitsuhiko · May 13, 2015, 1:29pm

Not sure if I was clear enough on this, but I think the idea that there should never be a panicking and a non panicking function on such a thing really does not apply to a mutex. The only reason it would panic is because something else already panicked. If you write code that does not panic then you will never encounter this situation, so I think it’s fair to make this thing panic by default and provide some other method for the 1% of people that need to handle this.

oli-obk · May 13, 2015, 3:57pm

every rule has it’s exception… And I agree that in this case there should be both, since the panicing version is the common one, but I guess it would need an RFC.

mitsuhiko · May 15, 2015, 1:54pm

Not a full RFC yet, but a start:

Feature Name: mutex-api
Start Date: 2015-05-13
RFC PR:
Rust Issue:

Summary

Currently mutexes in Rust suffer a bit from a less than ideal API caused by leaking out the internals of the poisoning system. This proposal changes this to a more user friendly API for the common case.

Motivation

Mutexes in Rust can be “poisoned” by a task panic. This essentially means that after a task failed another task will no longer be able to lock the mutex. This is a useful feature but for the vast majority of cases the poisoning state is actually not useful and the only reasonable action for the user is to propagate the failure by panicking the current task as well.

Right now for instance there is no real way to remove the poison flag from the mutex which either means that you need to continue using a poisoned mutex which makes the locking pattern a lot more complex or

Detailed design

The proposal of this RFC is to deprecate the current lock and try_lock methods and to introduce two new ones:

acquire(&self): this replaces the previous lock().unwrap() pattern. Unlike that however the default behavior is to panic with a reasonable error message. As poisoned locks are only encountered when a task already failed it means that task failure is already a considered behavior.
try_acquire(&self): same as acquire but does not block. This also propagates poisoned failures by panicking. This has the added advantage that code that handles the error case only needs to be considered with the failed acquisition of the lock and not the poison case.

In addition the old methods would survive with new names:

lock -> safe_acquire
try_lock -> safe_try_acquire

In addition the LockResult will gain a method into_inner_detox which converts the result into the guard but also releases the internal poison flag. This allows the recovering of the mutex without having to use a second mutex to guard the mutex swap.

Drawbacks

It deprecates a perfectly fine method name (lock).

Alternatives

Not doing anything.

Unresolved questions

The names of the new methods are not as nice as the old ones, maybe some alternatives can be found.

For the pattern where poisoning can be ignored it might be reasonable to add a force_acquire method that always returns the guard and ignores the poison flag. This can be useful

seanmonstar · May 15, 2015, 3:51pm

Looks like your thought cut off after "or".

Also, the first part of the statement is not true. It is possible to recover a poisoned mutex: the PoisonError has into_inner() to receive the poisoned value. You could use that to initialize a brand new value and insert it back. A sentinel thread could do this to make sure the value is cleaned up and still usable by other tasks, for example.

makoConstruct · January 25, 2016, 6:05am

Was the discussion logged? I’d like to see that argument. I’m not sure what could have convinced the implementers to go with an API that seems to violate the principle that panics are for unrecoverable situations. Everyone who learns this API is being told “if any of the other threads panicked, you’re supposed to handle that.” In practice, people don’t seem to bother most of the time, and for good reason.

I would like to see @mitsuhiko’s acquire added, to deftly express that you’re not expected to be handling poisoning all of the time.

nikomatsakis · January 25, 2016, 3:13pm

I don't believe there is a log of the conversation, however it's never quite as simple as "panics are for unrecoverable situations" -- in particular, we usually view threads as natural recovery points in the system, which makes mutex (and channels) a kind of middle ground.

system · March 25, 2019, 8:24am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Unpoisoning a Mutex libs	8	2304	July 24, 2022
Could `Mutex<T>` have been `Mutex<Poison<T>>`? libs	6	431	July 7, 2025
Cell vs catch_unwind	16	3257	March 25, 2019
Expose Mutex in core for targets that don't need OS support language design	7	1010	August 3, 2023
Nesting std::rt::unwind::try	3	1286	March 25, 2019