Idea: Type-Based Deadlock-Freedom

From the book (16.3):

Another detail to note is that Rust can’t protect you from all kinds of logic errors when you use Mutex<T> . [...] Similarly, Mutex<T> comes with the risk of creating deadlocks . These occur when an operation needs to lock two resources and two threads have each acquired one of the locks, causing them to wait for each other forever. If you’re interested in deadlocks, try creating a Rust program that has a deadlock; then research deadlock mitigation strategies for mutexes in any language and have a go at implementing them in Rust.

I followed the suggestion and did some research on deadlock prevention. One of the ways to prevent deadlocks i found, is to have a partial order on all locks. This means each lock is assigned a lock level and it is only allowed to take a lock, if all locks currently held have a lower lock level than the lock to be taken.

Idea

My idea would be to introduce lock levels in Rust to allow for a static analysis whether a program satisfies a partial order over all its locks. I imagine those lock levels could look somewhat similar to lifetimes. In this proposal i will use the notation °a but other/better ideas for a notation are always appreciated. Also annotations for marking lock and unlock operations would be introduced (see Alternatives below).

This would allow us to define Mutex similar to this:

struct Mutex<°level, T> {...}

impl Mutex<°level, T> {
    // can be called from any lock level < °level
    // and after calling this function the lock level will be °level
    #[lock(°level)]
    pub fn lock(&self) -> LockResult<MutexGuard<T>> {...}
   
    // This function is to be called by the drop implementation of `MutexGuard`
    #[unlock(°level)]
    fn unlock(&self) {...}
}

Then at some time during compilation (probably on MIR level after or during the borrow check) a lock level checker would be run. For this check each function would have constraints on the lock level under which it may be called and the lock level after the function returns. The check would collect constraints for the lock levels in the program (e.g. °a < °b, °c = °d, ...) and check whether they can be satisfied.

Drawbacks

  • I have not figured out how try_lock could be handled (yet). The issue here is, that that the lock level does not necessarily rise when calling try_lock. If the returned Option is later matched the None branch should be allowed all lock operations as if try_lock was not called. This issue might apply to Enums in general.
  • The deadlock detection will increase compilation times. My naïve guess is, it would roughly take as much time as borrow checking.
  • This will disallow some programs. The following program does not contain a deadlock, but it cannot be satisfied by any locking order of °a and °b:
    fn main() {
        let a: Mutex<°a> = Mutex::new(0);
        let b: Mutex<°b> = Mutex::new(0);
        {
            let guard_a = a.lock();
            let guard_b = b.lock();
        }
        {
            let guard_b = b.lock();
            let guard_a = a.lock();
        }
    }
  • This approach can only prevent deadlocks caused by circular waits. It cannot prevent deadlocks if a thread diverges while holding a lock. The following example would be accepted but can still cause a deadlock:
    static m: Mutex<°a, u32> = Mutex::new(0);
    fn in_thread_a() {
        let guard = m.lock();
        loop {}
        use(guard);
    }
    fn in_thread_b() {
         let guard = m.lock();
    }
  • This approach cannot detect lock primitives. If locks are defined by a crate they have to be annotated in order for this deadlock detection to work. This should not be too much of an issue however, since i believe the number of crates providing lock primitives is quite limited.

Possible Extension

It might make sense to provide a function which allows deadlock-free locking of multiple locks with the same lock level.

Alternatives

  • Do not change the language - do not perform deadlock prevention in the compiler (everything stays the same)
  • Make the syntactic changes to the language but only perform the checks via an opt-in at crate level.
  • Only use dynamic deadlock detection as in parking_lot::deadlock - i.e. do not make any changes at the language level
  • Instead of annotations for lock and unlock functions the user visible function syntax could be extended to allow the inclusion of the effect on lock levels in the function signature

Questions

Before trying to implement a prototype of this idea i would appreciate some feedback in general and in particular on the following questions:

  • Is static deadlock detection something we want in the language? Is it believed to be useful or to just add unnecessary overhead?
  • If yes, is this approach going in the right direction or are there fundamentally different approaches that should be considered?
  • Are there major (or minor) issues one can envision with this approach?
  • Any other feedback on this idea :slight_smile:

To me this is a textbook example of something worth writing a crate for, but not so universal or fundamental that it would make much sense in std.

I should probably mention that IIUC, the RTFM (Real Time For the Masses) embedded Rust framework does something very similar to this to not only achieve correctness, but also compile-time proofs that certain resources are uncontended or certain tasks cannot be hardware-interrupted, and thus elide synchronization overhead (which kinda blew my mind when I first read about it): https://blog.japaric.io/multicore-rtfm/

8 Likes

It's also worth mentioning that given sufficiently smart const generics, this doesn't need to be a language feature at all. I'm too lazy I don't have time to implement this as a PoC, but you should give that a try before resorting to changing the core language. It basically only requires addition and comparison over integers at compilation time. Might even be possible to do using typenum.

Please don't do that, that is highly problematic as it would result in configuration hell. There'd be misunderstanding as to which crate requires this opt-in feature, and one could forget to test their code with the checking turned on, resulting in the publication of code that sometimes doesn't / shouldn't compile.

3 Likes

Some actual drawbacks:

  • As written, it doesn't work with recursive functions or in general locking multiple instances of an mutex that is the same field in an object of the same type (e.g. walking a tree where each node is protected by a mutex)
  • Any global inference scheme like this doesn't work well with traits and introduces hidden API constraints and thus can result in any change breaking compatibility
  • Probably don't want to introduce a new sigil just for this. However, there's no need to add any new syntax unless you require all mutex-taking functions to specify it, which would be backwards-incompatible.
6 Likes

You are right, RTFM does look very interesting and the possibility to elide synchronization overhead is amazing. But as far as i understand it, it does come with some limitations like no preemption by lower level threads and mandatory pinning of threads/tasks to a specific CPU core. So i would still like to pursue the idea of deadlock prevention for a more general case.

Whether or not this makes sense in std is definitely debatable. I feel like it would add to the promise of 'Fearless Concurrency'. Anyhow, implementing this in a crate first before further discussion sounds reasonable to me and i will try to figure out how this analysis can be performed in a crate.