HIP: Device assert broken on gfx1030 with "Bus error", or hanging after synchronize
For example, if I add assert(0); to the kernel in the vectorAdd sample (https://github.com/ROCm-Developer-Tools/HIP-Examples/blob/master/vectorAdd/vectoradd_hip.cpp) via the following patch:
diff --git a/vectorAdd/vectoradd_hip.cpp b/vectorAdd/vectoradd_hip.cpp
index 0362c8a..a20bd2c 100644
--- a/vectorAdd/vectoradd_hip.cpp
+++ b/vectorAdd/vectoradd_hip.cpp
@@ -47,7 +47,7 @@ __global__ void
vectoradd_float(float* __restrict__ a, const float* __restrict__ b, const float* __restrict__ c, int width, int height)
{
-
+assert(0);
int x = hipBlockDim_x * hipBlockIdx_x + hipThreadIdx_x;
int y = hipBlockDim_y * hipBlockIdx_y + hipThreadIdx_y;
Then on MI250X I get the expected behavior:
......
vectoradd_hip.cpp:50: void vectoradd_float(float *__restrict, const float *__restrict, const float *__restrict, int, int): Device-side assertion `0' failed.
vectoradd_hip.cpp:50: void vectoradd_float(float *__restrict, const float *__restrict, const float *__restrict, int, int): Device-side assertion `0' failed.
:0:rocdevice.cpp :2778: 1891319054196 us: 83888: [tid:0x7fd1f8497700] Callback: Queue 0x7fcfcce00000 aborting with error : HSA_STATUS_ERROR_EXCEPTION: An HSAIL operation resulted in a hardware exception. code: 0x1016
make: *** [Makefile:24: test] Aborted (core dumped)
etc
However, on gfx1030 with Ubuntu 22.04.3 and ROCm 5.7.1 (an officially supported combination), I get:
System minor 3
System major 10
agent prop name AMD Radeon PRO W6800
hip Device prop succeeded
Bus error
That is, the device-side assertion diagnostic is missing and the process dies with “Bus error” instead.
About this issue
- State: open
- Created 7 months ago
- Comments: 27 (13 by maintainers)
Commits related to this issue
- Xfail all assert tests on hip. Due to https://github.com/ROCm/HIP/issues/3368 Signed-off-by: JackAKirk <jack.kirk@codeplay.com> — committed to JackAKirk/llvm by JackAKirk 4 months ago
Hi @JackAKirk, I managed to reproduce the hanging issue with assert on a W6800 machine on Windows. I will create an internal ticket to investigate and will come back as soon as I have more details.