tensorflow-upstream: Unable to find a suitable algorithm for doing forward convolution
Hi, when I run the session I get a weird error: "Unable to find a suitable algorithm for doing forward convolution".
From what I can tell, MIOpen compiles a kernel with -DLOCAL_MEM_SIZE=19008, which is not something coming from my code. Even with a batch size of 1 I get the same error.
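For context, even a bare TF 1.x convolution along the lines of the sketch below goes through the same MIOpen forward-convolution path (the shapes here are hypothetical placeholders, not my actual road-segmentation network; the stride > 1 only mirrors the -DSTRIDE_GT_1=1 flag in the failing kernel build):

```python
# Minimal, hypothetical sketch of a strided forward convolution in TF 1.x.
# Shapes are placeholders; any conv2d on the ROCm build is dispatched to MIOpen.
import numpy as np
import tensorflow as tf  # tensorflow-upstream / ROCm build

x = tf.placeholder(tf.float32, [1, 608, 608, 3])              # batch size 1
w = tf.Variable(tf.random_normal([3, 3, 3, 64]))              # 3x3 conv, 64 filters
y = tf.nn.conv2d(x, w, strides=[1, 2, 2, 1], padding="SAME")  # stride > 1

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    out = sess.run(y, feed_dict={x: np.zeros((1, 608, 608, 3), np.float32)})
    print(out.shape)
```

The full log from the container: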
ml_1 | 2018-08-23 21:03:11.045474: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1451] Found device 0 with properties:
ml_1 | name: Device 687f
ml_1 | AMDGPU ISA: gfx900
ml_1 | memoryClockRate (GHz) 1.63
ml_1 | pciBusID 0000:0c:00.0
ml_1 | Total memory: 7.98GiB
ml_1 | Free memory: 7.73GiB
ml_1 | 2018-08-23 21:03:11.045489: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1562] Adding visible gpu devices: 0
ml_1 | 2018-08-23 21:03:11.045503: I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Device interconnect StreamExecutor with strength 1 edge matrix:
ml_1 | 2018-08-23 21:03:11.045510: I tensorflow/core/common_runtime/gpu/gpu_device.cc:995] 0
ml_1 | 2018-08-23 21:03:11.045516: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1008] 0: N
ml_1 | 2018-08-23 21:03:11.045547: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1124] Created TensorFlow device (/device:GPU:0 with 7524 MB memory) -> physical GPU (device: 0, name: Device 687f, pci bus id: 0000:0c:00.0)
ml_1 | 2018-08-23 21:03:26.581328: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1562] Adding visible gpu devices: 0
ml_1 | 2018-08-23 21:03:26.581382: I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Device interconnect StreamExecutor with strength 1 edge matrix:
ml_1 | 2018-08-23 21:03:26.581396: I tensorflow/core/common_runtime/gpu/gpu_device.cc:995] 0
ml_1 | 2018-08-23 21:03:26.581407: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1008] 0: N
ml_1 | 2018-08-23 21:03:26.581440: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1124] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7524 MB memory) -> physical GPU (device: 0, name: Device 687f, pci bus id: 0000:0c:00.0)
ml_1 | 2018-08-23 21:04:20.430885: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
ml_1 | 2018-08-23 21:04:20.495395: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
ml_1 | 2018-08-23 21:04:20.557689: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
ml_1 | error: local memory limit exceeded (76032) in Im2Col
ml_1 | MIOpen Error: /data/repo/MIOpen/src/tmp_dir.cpp:18: Can't execute cd /tmp/miopen-MIOpenUtilKernels.cl-faa6-605d-295b-fc2e; /opt/rocm/bin/clang-ocl -DNUM_CH_PER_WG=1 -DNUM_IM_BLKS_X=3 -DNUM_IM_BLKS=9 -DLOCAL_MEM_SIZE=19008 -DSTRIDE_GT_1=1 -DTILE_SZ_X=32 -DTILE_SZ_Y=8 -DUSE_IM_OFF_GUARD=1 -DMIOPEN_USE_FP16=0 -DMIOPEN_USE_FP32=1 -mcpu=gfx900 -Wno-everything MIOpenUtilKernels.cl -o /tmp/miopen-MIOpenUtilKernels.cl-faa6-605d-295b-fc2e/MIOpenUtilKernels.cl.o
ml_1 | 2018-08-23 21:04:20.879002: F tensorflow/stream_executor/rocm/rocm_dnn.cc:1803] Check failed: status == miopenStatusSuccess (7 vs. 0)Unable to find a suitable algorithm for doing forward convolution
ml_1 | [I 21:04:21.291 NotebookApp] KernelRestarter: restarting kernel (1/5), keep random ports
ml_1 | WARNING:root:kernel 0d0fea33-23e8-4e97-8fa9-0bda0c19ea6f restarted
ml_1 | [I 21:04:37.435 NotebookApp] Saving file at /Road Segmentation.ipynb
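The numbers in the failure seem to line up: the Im2Col kernel is built with -DLOCAL_MEM_SIZE=19008 and -DMIOPEN_USE_FP32=1, and 19008 FP32 values take 76032 bytes, exactly the figure in "local memory limit exceeded (76032)" and more than the 64 KiB of LDS a gfx900 workgroup can use. A back-of-the-envelope check of that arithmetic (my interpretation, not something MIOpen reports directly):

```python
# Sanity check on the numbers in the clang-ocl error above (assumption:
# LOCAL_MEM_SIZE counts FP32 elements, since the kernel is built with
# -DMIOPEN_USE_FP32=1).
local_mem_elems = 19008             # from -DLOCAL_MEM_SIZE=19008
bytes_requested = local_mem_elems * 4
lds_limit = 64 * 1024               # 64 KiB of LDS per workgroup on gfx900
print(bytes_requested, bytes_requested > lds_limit)   # -> 76032 True
```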
About this issue
- State: closed
- Created: 6 years ago
- Comments: 30
Closing this issue for now; we are working to fix the clang-ocl version. Please open a new issue for a new problem. Thanks for the feedback – keep 'em coming.

Please delete your persistent kernel cache, typically located in ~/.cache/miopen, and try the test again. Make sure your path is pointing to the new clang-ocl.
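For anyone hitting the same thing, a hedged sketch of the cache cleanup suggested above (assumes the default per-user cache location; adjust the path if your setup differs):

```python
# Remove MIOpen's per-user compiled-kernel cache so kernels are rebuilt
# with the updated clang-ocl. Path taken from the comment above.
import shutil
from pathlib import Path

cache_dir = Path.home() / ".cache" / "miopen"
if cache_dir.is_dir():
    shutil.rmtree(cache_dir)
    print(f"removed {cache_dir}")
else:
    print(f"no MIOpen cache found at {cache_dir}")
```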