tensorflow-upstream: Error polling for event status: failed to query event: hipError_t(600)
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Here’s a very simple example to trigger this error:
import tensorflow as tf
x = tf.constant(2)
y = tf.constant(3)
myquot = tf.divide(x, y)
print("tf.divide(x, y)", myquot.numpy())
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
Fedora 32, x86_64, using the ROCm 3.3.0 RPMs from http://repo.radeon.com/rocm/yum/rpm. The same error occurs on Ubuntu 20.04 LTS; see https://github.com/RadeonOpenCompute/ROCm/issues/1074#issuecomment-626272350
- TensorFlow installed from (source or binary):
Binary (pip3 install --user tensorflow-rocm==2.2.0rc4)
- TensorFlow version (use command below):
v2.2.0-rc2-117-gbca3875 2.2.0-rc4
- Python version:
3.8.2
- GPU model and memory:
Radeon RX580 8GB
Describe the current behavior
Running the above code leads to this error:
2020-05-14 19:06:43.531958: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libhip_hcc.so
2020-05-14 19:06:43.566203: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1579] Found device 0 with properties:
pciBusID: 0000:09:00.0 name: Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] ROCm AMD GPU ISA: gfx803
coreClock: 1.366GHz coreCount: 36 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: -1B/s
2020-05-14 19:06:43.610009: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocblas.so
2020-05-14 19:06:43.611542: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libMIOpen.so
2020-05-14 19:06:43.620835: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocfft.so
2020-05-14 19:06:43.632251: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocrand.so
2020-05-14 19:06:43.632388: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-05-14 19:06:43.632700: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
2020-05-14 19:06:43.639185: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3593325000 Hz
2020-05-14 19:06:43.639544: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x559d807b7840 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-05-14 19:06:43.639560: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-05-14 19:06:43.640635: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x559d80850160 initialized for platform ROCM (this does not guarantee that XLA will be used). Devices:
2020-05-14 19:06:43.640645: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Ellesmere [Radeon RX 470/480/570/570X/580/580X/590], AMDGPU ISA version: gfx803
2020-05-14 19:06:43.640737: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1579] Found device 0 with properties:
pciBusID: 0000:09:00.0 name: Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] ROCm AMD GPU ISA: gfx803
coreClock: 1.366GHz coreCount: 36 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: -1B/s
2020-05-14 19:06:43.640763: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocblas.so
2020-05-14 19:06:43.640772: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libMIOpen.so
2020-05-14 19:06:43.640780: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocfft.so
2020-05-14 19:06:43.640788: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocrand.so
2020-05-14 19:06:43.640829: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-05-14 19:06:43.640840: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-14 19:06:43.640844: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-05-14 19:06:43.640848: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-05-14 19:06:43.640910: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7459 MB memory) -> physical GPU (device: 0, name: Ellesmere [Radeon RX 470/480/570/570X/580/580X/590], pci bus id: 0000:09:00.0)
2020-05-14 19:06:43.729783: E tensorflow/stream_executor/rocm/rocm_event.cc:28] Error polling for event status: failed to query event: hipError_t(600)
2020-05-14 19:06:43.729818: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:273] Unexpected Event status: 1
Aborted (core dumped)
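Most of the output above is informational (`I`) logging; only the last two timestamped lines carry the error (`E`) and fatal (`F`) severities. The following is a minimal sketch for pulling those lines out of a captured log. The regular expression is an assumption inferred from the lines shown here, not a documented TensorFlow log format, and `find_failures` is a hypothetical helper name.

```python
import re

# Assumed layout, inferred from the log above:
# "<date> <time>: <severity letter> <source file>:<line>] <message>"
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): "
    r"(?P<severity>[IWEF]) "
    r"(?P<source>\S+):(?P<line>\d+)\] "
    r"(?P<message>.*)$"
)


def find_failures(log_text):
    """Return (severity, source, line, message) for every E or F log line."""
    failures = []
    for raw in log_text.splitlines():
        match = LOG_LINE.match(raw.strip())
        if match and match.group("severity") in ("E", "F"):
            failures.append(
                (
                    match.group("severity"),
                    match.group("source"),
                    int(match.group("line")),
                    match.group("message"),
                )
            )
    return failures


# Three lines copied from the report, plus the shell's abort message.
SAMPLE = """\
2020-05-14 19:06:43.640829: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-05-14 19:06:43.729783: E tensorflow/stream_executor/rocm/rocm_event.cc:28] Error polling for event status: failed to query event: hipError_t(600)
2020-05-14 19:06:43.729818: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:273] Unexpected Event status: 1
Aborted (core dumped)
"""

for severity, source, line, message in find_failures(SAMPLE):
    print(f"{severity} {source}:{line} {message}")
```

Lines that do not match the assumed layout, such as the device property lines and the final abort message, are skipped rather than treated as errors.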
Describe the expected behavior
The code should run without issues. BTW, the HIP-Examples from https://github.com/ROCm-Developer-Tools/HIP-Examples do run without issues.
Standalone code to reproduce the issue
See above.
About this issue
- State: closed
- Created 4 years ago
- Comments: 18 (3 by maintainers)
Great! Yes, the simple example does run without issues with tensorflow-rocm==2.2.0rc5. Thank you very much!
We believe we’ve fixed the issues with these release candidates. Could you please try this in a ROCm 3.3-based environment and let us know how it works with your example?
PyPI link: https://pypi.org/project/tensorflow-rocm/2.2.0rc5/#files
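Since the failing and the working builds differ only in the release-candidate suffix (2.2.0rc4 versus 2.2.0rc5), it can help to confirm which one is actually installed before re-running the example. This is a small sketch using the standard library's `importlib.metadata` (Python 3.8+); `installed_version` is a hypothetical helper name, and the sketch assumes the package was installed under the distribution name `tensorflow-rocm`, as in the pip command in the report.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("tensorflow-rocm")
if version is None:
    print("tensorflow-rocm is not installed in this environment")
else:
    print("tensorflow-rocm", version)
```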