HIP: `hipFreeAsync` hangs
Hi, I’m experiencing hangs with hipFreeAsync and was wondering what could potentially cause that.
From my perspective it looks like some kind of race condition.
It consistently happens at the end of the test suite, when we start releasing the memory of the device arrays used in the process in AMDGPU.jl, which provides an AMD GPU programming interface for the Julia language. Note that memory is freed many times during the tests; it is only at the end that it hangs. I made sure that we do not destroy the streams or their respective context. Also, freeing arrays uses the NULL stream, while other operations use other streams. I started seeing this issue with ROCm 5.6-5.7.1 on an RX7900XT.
Here’s gdb output of the process when it hangs:
On ROCm 5.4 it was not observed and the whole test suite ran fine.
If you need any additional info, I’m happy to provide.
About this issue
- State: open
- Created 7 months ago
- Comments: 18 (4 by maintainers)
Mixing default and non-default streams in `hip*Async` functions seems to cause hangs. Here’s a C++ reproducer: