tensorflow: OP_REQUIRES failed at conv_ops.cc:1106 : Not found: No algorithm worked!

System information

  • Linux Ubuntu 20.04
  • TensorFlow installed from Docker tensorflow/tensorflow:2.4.0rc1
  • TensorFlow version: 2.4.0rc2
  • Python version: 3.6.9
  • Installed using Docker
  • CUDA/cuDNN version: CUDA 11.1 cuDNN v8
  • GPU model and memory: RTX 3080 FE 10GB

Describe the problem

While training a custom ResNet-50 model I get the following error:

2020-11-20 12:05:01.826720: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at conv_ops.cc:1106 : Not found: No algorithm worked!

I don’t think the code has any issues. It works fine when training with CPU.

Any other info / logs

2020-11-20 12:04:55.291380: I tensorflow/core/profiler/lib/profiler_session.cc:136] Profiler session initializing.
2020-11-20 12:04:55.291414: I tensorflow/core/profiler/lib/profiler_session.cc:155] Profiler session started.
2020-11-20 12:04:55.291455: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1365] Profiler found 1 GPUs
2020-11-20 12:04:55.360280: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcupti.so.11.0
2020-11-20 12:04:55.491657: I tensorflow/core/profiler/lib/profiler_session.cc:172] Profiler session tear down.
2020-11-20 12:04:55.491780: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1487] CUPTI activity buffer flushed
2020-11-20 12:04:56.592756: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2020-11-20 12:04:56.610956: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3899970000 Hz
Epoch 1/30
2020-11-20 12:04:58.010569: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2020-11-20 12:04:58.802284: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2020-11-20 12:04:58.807134: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2020-11-20 12:05:01.826720: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at conv_ops.cc:1106 : Not found: No algorithm worked!
Traceback (most recent call last):
  File "custom_resnet.py", line 131, in <module>
    train_model()
  File "custom_resnet.py", line 105, in train_model
    callbacks=[tensorboard_callback]
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py", line 1100, in fit
    tmp_logs = self.train_function(iterator)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 888, in _call
    return self._stateless_fn(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 2943, in __call__
    filtered_flat_args, captured_inputs=graph_function.captured_inputs)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1919, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 560, in call
    ctx=ctx)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.NotFoundError:  No algorithm worked!
	 [[node model/conv1/Conv2D (defined at custom_resnet.py:105) ]] [Op:__inference_train_function_8452]

Function call stack:
train_function

2020-11-20 12:05:01.905250: W tensorflow/core/kernels/data/generator_dataset_op.cc:107] Error occurred when finalizing GeneratorDataset iterator: Failed precondition: Python interpreter state is not initialized. The process may be terminated.
	 [[{{node PyFunc}}]]

nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.32.00    Driver Version: 455.32.00    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3080    On   | 00000000:2B:00.0  On |                  N/A |
|  0%   43C    P8    25W / 320W |    857MiB /  9995MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0

tf.test.is_gpu_available()

WARNING:tensorflow:From <stdin>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2020-11-20 12:10:11.234638: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2020-11-20 12:10:11.235502: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2020-11-20 12:10:11.269174: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-20 12:10:11.269569: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:2b:00.0 name: GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.71GHz coreCount: 68 deviceMemorySize: 9.76GiB deviceMemoryBandwidth: 707.88GiB/s
2020-11-20 12:10:11.269584: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2020-11-20 12:10:11.271142: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2020-11-20 12:10:11.271167: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2020-11-20 12:10:11.271830: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2020-11-20 12:10:11.271954: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2020-11-20 12:10:11.273538: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2020-11-20 12:10:11.273878: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2020-11-20 12:10:11.273963: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2020-11-20 12:10:11.274040: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-20 12:10:11.274432: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-20 12:10:11.274959: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2020-11-20 12:10:11.274975: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2020-11-20 12:10:11.593266: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-20 12:10:11.593303: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0 
2020-11-20 12:10:11.593309: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N 
2020-11-20 12:10:11.593483: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-20 12:10:11.593857: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-20 12:10:11.594195: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-20 12:10:11.594517: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/device:GPU:0 with 8743 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3080, pci bus id: 0000:2b:00.0, compute capability: 8.6)
True


About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 20 (10 by maintainers)

Most upvoted comments

Try adding this just after importing everything:

physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)
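The snippet in this comment configures only the first GPU and raises an IndexError when no GPU is visible. A minimal sketch of a more defensive variant follows. The helper name `enable_memory_growth` is hypothetical, and the TensorFlow module is passed in as an argument only to make the function easy to exercise on a machine without a GPU.

```python
def enable_memory_growth(tf):
    """Turn on memory growth for every visible GPU.

    Returns the number of GPUs that were configured.
    `tf` is the imported tensorflow module (passed in; an assumption of this sketch).
    """
    configured = 0
    for gpu in tf.config.list_physical_devices("GPU"):
        try:
            tf.config.experimental.set_memory_growth(gpu, True)
            configured += 1
        except RuntimeError as err:
            # TensorFlow raises RuntimeError if the GPU was already initialized,
            # so this must run before any op touches the device.
            print(f"Could not set memory growth on {gpu}: {err}")
    return configured


try:
    import tensorflow as tf
except ImportError:
    tf = None  # TensorFlow is not installed here; nothing to configure

if tf is not None:
    print("GPUs configured:", enable_memory_growth(tf))
```

Call it once, right after the imports and before building the model, since memory growth cannot be changed after the GPU has been initialized.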

I experienced this issue on an MSI GL65 with an RTX2070 on Ubuntu 20.04.

Dynamic libraries are the following:

In [1]: import tensorflow                                                                                                                                                                                          
2021-01-28 16:05:15.891481: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0

In [2]: tensorflow.__version__                                                                                                                                                                                     
Out[2]: '2.4.0'

In [3]: tensorflow.config.experimental.list_physical_devices('GPU')                                                                                                                                                
2021-01-28 16:06:40.579904: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-01-28 16:06:40.588165: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-01-28 16:06:40.619240: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-28 16:06:40.619800: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.455GHz coreCount: 36 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 327.88GiB/s
2021-01-28 16:06:40.619823: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-01-28 16:06:40.627330: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-01-28 16:06:40.627382: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-01-28 16:06:40.631550: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-01-28 16:06:40.633606: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-01-28 16:06:40.642000: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-01-28 16:06:40.644472: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-01-28 16:06:40.645649: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-01-28 16:06:40.645749: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-28 16:06:40.646153: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-28 16:06:40.646490: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
Out[3]: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

Adding the lines indicated by @king398 solved my issue.

Try adding this just after importing everything:

physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)

Same issue with nvidia/cuda:11.0-cudnn8-devel-ubuntu18.04 and an RTX 3080.

Using CUDA 11.1 causes:

Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory

Tried with rc0 through rc4.

Edit: Fixed

Docker image: nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
TF version: tf-nightly-gpu

You need to change LD_LIBRARY_PATH for the symlink to work:

ENV LD_LIBRARY_PATH=/usr/local/cuda-11.1/targets/x86_64-linux/lib

Create a symlink so that libcusolver.so.10 resolves:

RUN ln -s /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcusolver.so.11 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcusolver.so.10
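Before creating the symlink it can help to confirm which libcusolver versions are actually present on the loader path. A minimal sketch using only the Python standard library follows. The helper name `find_versions` is hypothetical, and it inspects only the directories listed in LD_LIBRARY_PATH, not the ldconfig cache or the default system directories.

```python
import glob
import os


def find_versions(lib_name, search_path=None):
    """Return files matching <lib_name>.so* in each directory of search_path.

    search_path is a colon-separated string; it defaults to LD_LIBRARY_PATH.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    found = []
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing ':'
        pattern = os.path.join(glob.escape(directory), lib_name + ".so*")
        found.extend(sorted(glob.glob(pattern)))
    return found


for path in find_versions("libcusolver"):
    print(path)
```

If only a `.so.11` file is listed, that matches the situation described in this comment, where TensorFlow looks for `libcusolver.so.10` and cannot find it.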

If you get a cuBLAS error, you can try this:

config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=config)
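The same allow-growth behaviour can also be requested without creating a session, through the TF_FORCE_GPU_ALLOW_GROWTH environment variable described in the TensorFlow GPU guide. A minimal sketch, assuming the variable is set before TensorFlow is first imported in the process:

```python
import os

# Set the variable before TensorFlow is imported, so it is already in place
# when the GPU is initialized.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# import tensorflow as tf   # the import belongs here, after the variable is set

print("TF_FORCE_GPU_ALLOW_GROWTH =", os.environ["TF_FORCE_GPU_ALLOW_GROWTH"])
```

In a Docker setup the variable can equally be set with an ENV line in the Dockerfile, which keeps the training script unchanged.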

TF 2.4 is built & tested against CUDA 11.0, not 11.1.
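To check a container against that requirement, the toolkit release can be read from `nvcc --version`, as in the output posted in the issue. Note that the "CUDA Version" shown by nvidia-smi is the highest version the driver supports, not the installed toolkit. A minimal sketch follows. The helper name `parse_nvcc_release` is hypothetical, and the expected value (11, 0) is taken from the comment above.

```python
import re
import shutil
import subprocess

EXPECTED_CUDA = (11, 0)  # per the comment above: TF 2.4 is built against CUDA 11.0


def parse_nvcc_release(text):
    """Extract (major, minor) from nvcc's 'release X.Y' line, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


nvcc = shutil.which("nvcc")
if nvcc is None:
    print("nvcc not found on PATH")
else:
    output = subprocess.run(
        [nvcc, "--version"], stdout=subprocess.PIPE, universal_newlines=True
    ).stdout
    release = parse_nvcc_release(output)
    if release == EXPECTED_CUDA:
        print("CUDA toolkit matches:", release)
    else:
        print("CUDA toolkit is", release, "but expected", EXPECTED_CUDA)
```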

I've found a temporary solution: using the software provided by Lambda Stack. It works on Ubuntu 20.04 for all RTX 30 series GPUs.