bitcoin: Improve deadlock detection

Background

If two or more threads acquire mutexes in different order, that could cause a deadlock. Currently we have two mechanisms for detecting that - our DEBUG_LOCKORDER and the thread sanitizer.

Problem

Both methods may fail to detect some deadlocks:

deadlock type	detected by `DEBUG_LOCKORDER`	detected by TSan	detected by the proposed solution
A => B => C => A	❌	✔️	✔️
test case `deadlock_unlock_not_last`	❌	✔️	✔️
test case `deadlock_3`	❌	❌ ^[1]	✔️
A => B, restart the program, B => A ^[2]	❌	❌	✔️ ^[3]

^[1] submitted as a bug report at https://github.com/google/sanitizers/issues/1258 ^[2] I guess this is how the bug which https://github.com/bitcoin/bitcoin/pull/19132 fixes sneaked in ^[3] as long as just B => A is executed

test cases

class Event
{
public:
    void signal()
    {
        std::unique_lock<std::mutex> lk(m_mutex);
        m_occurred = true;
        m_cond.notify_all();
    }

    void wait()
    {
        std::unique_lock<std::mutex> lk(m_mutex);
        m_cond.wait(lk, [&]() { return m_occurred; });
    }

private:
    bool m_occurred{false};
    std::condition_variable m_cond;
    std::mutex m_mutex;
};

static std::mutex printf_mutex;
void printf_sync(const char* format, ...)
{
    va_list ap;
    va_start(ap, format);

    {
        std::unique_lock<std::mutex> lk(printf_mutex);
        vprintf(format, ap);
    }

    va_end(ap);
}

BOOST_AUTO_TEST_CASE(deadlock_3)
{
#if 0
    // detected by DEBUG_LOCKORDER
    // not detected by the thread sanitizer (when DEBUG_LOCKORDER is disabled)
    constexpr size_t n_threads = 2;
#else
    // deadlock is not detected by either one
    constexpr size_t n_threads = 3;
#endif

    // t0: lock m0
    // t1: lock m1
    // t2: lock m2
    // t0: try to lock m1, waits for t1
    // t1: try to lock m2, waits for t2
    // t2: try to lock m0, waits for t0 => deadlock

    std::array<Mutex, n_threads> mutexes;
    std::array<std::thread, n_threads> threads;
    std::array<Event, n_threads> locked_own;
    std::array<Event, n_threads> try_to_lock_next;

    auto thread = [&](size_t i) {
        LOCK(mutexes[i]);
        printf_sync("thread%zu locked mutex%zu\n", i, i);
        locked_own[i].signal();

        try_to_lock_next[i].wait();

        const size_t next = (i + 1) % n_threads;
        printf_sync("thread%zu trying to lock mutex%zu\n", i, next);
        LOCK(mutexes[next]);
    };

    for (size_t i = 0; i < n_threads; ++i) {
        threads[i] = std::thread{thread, i};
    }

    for (size_t i = 0; i < n_threads; ++i) {
        locked_own[i].wait();
    }

    for (size_t i = 0; i < n_threads; ++i) {
        try_to_lock_next[i].signal();
    }

    for (size_t i = 0; i < n_threads; ++i) {
        threads[i].join();
    }
}

BOOST_AUTO_TEST_CASE(deadlock_unlock_not_last)
{
    // t0: lock m0
    // t0: lock m1
    // t0: unlock m0
    // t1: lock m0
    // t1: try to lock m1, waits for t0
    // t0: try to lock m0, waits for t1 => deadlock

    Mutex m0;
    Mutex m1;
    Event t1_locked_m0;

    std::thread t0{[&]() {
        ENTER_CRITICAL_SECTION(m0);
        LOCK(m1);
        LEAVE_CRITICAL_SECTION(m0);
        t1_locked_m0.wait();
        LOCK(m0);
    }};

    std::thread t1{[&]() {
        LOCK(m0);
        t1_locked_m0.signal();
        LOCK(m1);
    }};

    t0.join();
    t1.join();
}

Solution

Attach a predefined integer to each mutex that denotes the locking order in relation to other mutexes. So, whenever attempting to lock a mutex a thread would check if the previous mutex it acquired has a lower number.

This would require some thorough considerations when adding a new mutex - when is it going to be used, what other mutexes are going to be held at the time the new one is acquired and what other mutexes may be acquired while holding the new one. That is a good practice and should be done anyway. Such a mechanism will enforce it.

About this issue

Original URL
State: open
Created 4 years ago
Reactions: 1
Comments: 15 (15 by maintainers)

Most upvoted comments

FWIW, a custom hierarchical_mutex mutex type and its usage are provided by Anthony Williams in his C++ Concurrency in Action, SE, 2019 (3.2.4 Deadlock: the problem and a solution – Use a lock hierarchy).

hebasto on Jun 5, 2020

All, thanks for the input! Next I will see if I can turn this into some code, so that we have something concrete to discuss.

PS The cycle-finding algorithm is what TSan already does.

vasild on Jun 8, 2020