HIP: `hipFreeAsync` hangs

Hi, I’m experiencing hangs with hipFreeAsync and was wondering what could cause them. From my perspective it looks like some kind of race condition.

It consistently happens at the end of the test suite, when we start releasing the memory of the device arrays used in the process. This is in AMDGPU.jl, which provides an AMD GPU programming interface for the Julia language. Note that memory is freed many times during the tests; it is only at the end that it hangs. I made sure that we do not destroy the streams or their respective context. Also, freeing arrays uses the NULL stream, while other operations use other streams. I started seeing this issue with ROCm 5.6-5.7.1 on an RX7900XT.

Here’s the gdb output of the process when it hangs: hang

On ROCm 5.4 it was not observed and the whole test suite ran fine.

If you need any additional info, I’m happy to provide it.

About this issue

  • State: open
  • Created 7 months ago
  • Comments: 18 (4 by maintainers)

Most upvoted comments

Mixing default and non-default streams in hip*Async functions seems to cause hangs.

Here’s a C++ reproducer:

#include <hip/hip_runtime.h>
#include <thread>

void fn() {
    hipStream_t stream;
    hipStreamCreate(&stream);

    int n_elements = 1024 * 1024;
    int size = n_elements * sizeof(int);

    int *a = new int[n_elements];
    int *b = new int[n_elements];

    int *da, *db;
    hipMallocAsync(&da, size, stream);
    hipMallocAsync(&db, size, stream);

    hipMemcpyHtoDAsync(da, a, size, stream);
    hipMemcpyHtoDAsync(db, b, size, stream);

    /* hipFreeAsync(da, stream); */ // <--- Works fine.
    hipFreeAsync(da, nullptr); // <--- Mixing default stream with non-default causes hang!
    hipFreeAsync(db, stream);

    hipStreamSynchronize(stream);
    hipStreamDestroy(stream);

    delete[] a;
    delete[] b;
}

void thread_fn() {
    for (int i = 0; i < 1000; i++) {
        fn();
    }
}

int main() {
    std::thread t1(thread_fn);
    std::thread t2(thread_fn);
    std::thread t3(thread_fn);
    std::thread t4(thread_fn);

    t1.join();
    t2.join();
    t3.join();
    t4.join();
    return 0;
}