Rust - Deadlocks

2023-05-28

Previously published

This article was previously published on len-learns-rust.com. A full index of these articles can be found here.

One reason that access shared data using locks is a bad idea is that, in complex code, it may be possible to deadlock. At their simplest, deadlocks are caused when one thread (thread a) obtains and holds a lock which another thread (thread b) requires whilst itself being blocked from obtaining a lock that it requires because thread b already holds it… Unfortunately, whilst Rust has eliminated data races in multithreaded code, it doesn’t prevent the possibility of deadlocks.

A deadlock is pretty easy to cause if you have two or more independent locks in a piece of code.

fn main() {
    let data1 = Arc::new(Mutex::new("data1".to_string()));
    let data2 = Arc::new(Mutex::new("data2".to_string()));

    let shared_data1 = Arc::clone(&data1);
    let shared_data2 = Arc::clone(&data2);

    let mut thread = ChannelThread::new(move |message| {
        println!("got message {}", message);

        let _lock1 = shared_data1.lock().expect("failed to lock data1");

        sleep(Duration::from_millis(100));

        let _lock2 = shared_data2.lock().expect("failed to lock data2");

        return true;
    });

    for i in 1..15 {
        println!("sending {} to thread", i);
        thread.send(i.to_string());

        let _lock2 = data2.lock().expect("failed to data2");

        sleep(Duration::from_millis(100));

        let _lock1 = data1.lock().expect("failed to data1");
    }

    println!("close channel, signal thread we're done");

    thread.shutdown();

    println!("wait for thread to end");

    thread.join();

    println!("all done...");
}

The spawned thread locks the two locks in this order; lock1, lock2. Whilst the main thread locks the locks in the opposite order. The delays are to make the problem more likely to show up quickly. Without them, the code will likely run for ages without showing any sign of the potential problem.

Join in

The code for this example can be found here.

Unfortunately, std::sync::Mutex doesn’t detect deadlocks and so the code will just block these two threads forever. It’s reasonably easy to detect deadlocks when they happen, though it’s time and resource consuming. You need to track each lock and who owns it, and you need to be able to query this information when a thread attempts to obtain a lock and is unable to do so because another thread is already holding the lock. You only need to check for a deadlock if the thread that can’t obtain the new lock already holds another lock… Unfortunately, that’s not really good enough. Whilst it’s useful to have the call stacks for the threads involved in a deadlock that has occurred, it’s much more useful to be told, in advance, that a deadlock could occur. This again is reasonably easy, you need to build data on each lock acquisition pattern and then analyse the acquisition sequences and see if locks acquired in one sequence are also acquired in the reverse sequence. This is how my Windows deadlock detection tool works.

If the Rust compiler knew about any code that implemented a ’lock’ it could, conceivably, work out the potential lock acquisition sequences during compilation and disallow potentially deadlocking code. However, this doesn’t appear to be something that is being considered, and I get the feeling that the general feeling in the community is that “you should design your software so that you don’t have these issues”…

So, until such a day as the compiler has a deadlock checker that works in a similar way to the borrow checker, we’re left with using something like no_deadlocks or parking_lot; a locking replacement that has some deadlock detection support.

no_deadlocks is a Rust crate that replaces std::sync::Mutex, etc. with locks that can detect deadlocks. Using it is as simple as adding use no_deadlocks::prelude::*; at the top of your source file. See here for the adjusted example that uses it.
parking_lot is also a crate that replaces std::sync::Mutex. This is less of a drop-in replacement than no_deadlocks and the deadlock checking is an optional feature that requires a bit more work to use. See here for the adjusted example that uses it.

However, with both of these, bear in mind that they only detect actual deadlocks that have actually happened and not the potential to deadlock in code that didn’t deadlock during the test run. Your tests need to ensure that they cover all combinations of your code paths, and even then you’ll be testing with code that likely runs slower, and at the very least runs a bit differently and that in itself may be enough to perturb the threading to the point where the deadlocks don’t happen during the testing.

Join in

The code can be found here on GitHub each step on the journey will have one or more separate directories of code, so this article’s code is here, here and here this allows for easy comparison of changes at each stage.

Of course, there may be a better way; leave comments if you’d like to help me learn.