tensorflow: NotFoundError: No algorithm worked! when using Conv2D
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes, see below.
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Arch Linux fully updated
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: N/A
- TensorFlow installed from (source or binary): Installed from pacman
- TensorFlow version (use command below): 2.3.0
- Python version: 3.8.5
- Bazel version (if compiling from source): N/A
- GCC/Compiler version (if compiling from source): N/A
- CUDA/cuDNN version: CUDA 11.0, cuDNN 8.0.2
- GPU model and memory: RTX 2070 Super
Describe the current behavior
I am running the following notebook: https://github.com/davidADSP/GDL_code/blob/tensorflow_2/02_03_deep_learning_conv_neural_network.ipynb
In cell 11, when it calls fit, it fails and returns:
NotFoundError: No algorithm worked! [[node functional_1/conv2d/Conv2D (defined at <ipython-input-7-10b06c61fca5>:1) ]] [Op:__inference_train_function_2021]
Function call stack: train_function
This worked fine 1-2 months ago. I was testing some other Conv2D code today and it failed, so I went back to this example, which I know used to work, and it fails as well.
Describe the expected behavior
The model should train without errors, as it did previously.
Standalone code to reproduce the issue: see the link above; it is a Jupyter notebook on GitHub.
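For convenience, a minimal sketch approximating the failing notebook cell (the exact architecture is in the linked notebook; the CIFAR-10 data and the layer sizes here are assumptions made for reproduction only):

```python
# Minimal sketch approximating the failing notebook cell.
# Dataset and layer sizes are assumptions; see the linked notebook for the
# exact architecture.
import tensorflow as tf
from tensorflow.keras import layers

(x_train, y_train), _ = tf.keras.datasets.cifar10.load_data()
x_train = x_train.astype("float32") / 255.0

inputs = layers.Input(shape=(32, 32, 3))
x = layers.Conv2D(32, (3, 3), strides=1, padding="same", activation="relu")(inputs)
x = layers.Conv2D(64, (3, 3), strides=2, padding="same", activation="relu")(x)
x = layers.Flatten()(x)
outputs = layers.Dense(10, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# This is the call that raises "NotFoundError: No algorithm worked!"
model.fit(x_train, y_train, batch_size=32, epochs=10, shuffle=True)
```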
Other info / logs Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.
```
NotFoundError                             Traceback (most recent call last)
<ipython-input-7-10b06c61fca5> in <module>
----> 1 model.fit(x_train
      2           , y_train
      3           , batch_size=32
      4           , epochs=10
      5           , shuffle=True

/usr/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py in _method_wrapper(self, *args, **kwargs)
    106 def _method_wrapper(self, *args, **kwargs):
    107   if not self._in_multi_worker_mode():  # pylint: disable=protected-access
--> 108     return method(self, *args, **kwargs)
    109
    110   # Running inside run_distribute_coordinator already.

/usr/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
   1096           batch_size=batch_size):
   1097         callbacks.on_train_batch_begin(step)
-> 1098         tmp_logs = train_function(iterator)
   1099         if data_handler.should_sync:
   1100           context.async_wait()

/usr/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py in __call__(self, *args, **kwds)
    778     else:
    779       compiler = "nonXla"
--> 780       result = self._call(*args, **kwds)
    781
    782     new_tracing_count = self._get_tracing_count()

/usr/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py in _call(self, *args, **kwds)
    838       # Lifting succeeded, so variables are initialized and we can run the
    839       # stateless function.
--> 840       return self._stateless_fn(*args, **kwds)
    841     else:
    842       canon_args, canon_kwds = \

/usr/lib/python3.8/site-packages/tensorflow/python/eager/function.py in __call__(self, *args, **kwargs)
   2827     with self._lock:
   2828       graph_function, args, kwargs = self._maybe_define_function(args, kwargs)
-> 2829     return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
   2830
   2831   @property

/usr/lib/python3.8/site-packages/tensorflow/python/eager/function.py in _filtered_call(self, args, kwargs, cancellation_manager)
   1841       args and kwargs.
   1842     """
-> 1843     return self._call_flat(
   1844         [t for t in nest.flatten((args, kwargs), expand_composites=True)
   1845          if isinstance(t, (ops.Tensor,

/usr/lib/python3.8/site-packages/tensorflow/python/eager/function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
   1921         and executing_eagerly):
   1922       # No tape is watching; skip to running the function.
-> 1923       return self._build_call_outputs(self._inference_function.call(
   1924           ctx, args, cancellation_manager=cancellation_manager))
   1925     forward_backward = self._select_forward_and_backward_functions(

/usr/lib/python3.8/site-packages/tensorflow/python/eager/function.py in call(self, ctx, args, cancellation_manager)
    543     with _InterpolateFunctionError(self):
    544       if cancellation_manager is None:
--> 545         outputs = execute.execute(
    546             str(self.signature.name),
    547             num_outputs=self._num_outputs,

/usr/lib/python3.8/site-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     57   try:
     58     ctx.ensure_initialized()
---> 59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     60                                         inputs, attrs, num_outputs)
     61   except core._NotOkStatusException as e:

NotFoundError: No algorithm worked!
	 [[node functional_1/conv2d/Conv2D (defined at <ipython-input-7-10b06c61fca5>:1) ]] [Op:__inference_train_function_2021]

Function call stack:
train_function
```
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 47 (9 by maintainers)
Adding the following lines fixes the issue:
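A sketch of the kind of lines being referred to, assuming the TF 2.x memory-growth API (run before building any model):

```python
# Enable memory growth so TensorFlow does not grab all GPU memory up front.
# Sketch assuming the TF 2.x tf.config API; run before any model is built.
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices("GPU")
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
```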
Odd, because I didn't need them before.
This is needed on the computer with the RTX 2070 Super; the one with the GTX 1080 Ti doesn't need them. Same libraries.
Same issue here with the docker image tensorflow/tensorflow:2.4.0-gpu-jupyter on an RTX 2070 Super. P.S. Restarting the kernel and adding the memory-growth code above resolves the issue.

Yes!!! This solved my problem on an RTX 2070 card!
I faced this very same error, and it turns out that when you read a grayscale image as an RGB image (I was using ImageDataGenerator and did not pass color_mode="grayscale" as an argument to the generator when flowing the data), it throws this "No algorithm worked" sort of error. Hopefully it helps someone!
I had the exact same error, but from a sillier mistake than the ones mentioned above: the model was defined with input shape (height, width, 1), but the images were RGB, not grayscale.
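In other words, the channel dimension of the Input layer has to match the data; a minimal illustration (the spatial size is an assumption):

```python
# The channel count of the Input layer must match the images being fed in.
from tensorflow.keras import layers

# Wrong for RGB data: a single-channel input.
# inputs = layers.Input(shape=(64, 64, 1))

# Correct for RGB data: three channels.
inputs = layers.Input(shape=(64, 64, 3))
x = layers.Conv2D(16, 3, activation="relu")(inputs)
```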
I see there is a "solution", but this should never happen in a popular library in the first place. Why is this happening, and why do users need to add this extra code to get TensorFlow to work? I have another process that uses 6 GB of memory, and for some reason TF simply cannot coexist with it; when I kill that process, the super-helpful error message "UNIMPLEMENTED: DNN library is not found" disappears. I can understand that things can be more efficient if you map the whole GPU memory, but when something goes wrong it would at least be nice to fail with a clear error message. No issues with PyTorch. Ehh, I can only hope JAX works out much better.
Btw, for me the following worked (limiting the memory to 2 GB, single-GPU case):
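A sketch of that kind of hard memory cap, assuming the TF 2.x virtual-device API and GPU index 0 (the 2048 MB figure matches the comment):

```python
# Cap GPU memory at 2 GB (single-GPU case), TF 2.x style.
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices("GPU")
if gpus:
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=2048)],
    )
```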
Same here today; the extra config lines setting gpu_options.allow_growth = True fixed the "NotFoundError: No algorithm worked!" issue for me, for code that ran fine on one PC but gave the above problem on another PC with, AFAIK, the same TensorFlow configuration and libraries but a different GPU (both NVIDIA).
- OS: Windows 10 64-bit
- TF: 2.4.1
- GPU: NVIDIA Quadro T1000 with Max-Q Design (on the other, problem-free Windows 10 64-bit PC: NVIDIA Quadro P1000)
- Python: 3.7.6
- CUDA: 11.2
- nvcc --version: 11.2, V11.2.67 (on the problem-free PC: 11.2, V11.2.142)
- RAM: 32 GB (on the problem-free PC: 16 GB)
Why is this fix needed only in some cases and why does it work?
BTW, the shorter os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true' also worked fine for me.
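For reference, a sketch of that environment-variable variant; the ordering note (setting it before TensorFlow initializes the GPU) is my own assumption about why it sometimes appears not to take effect:

```python
# Environment-variable variant of the allow-growth fix.
# Set it before TensorFlow touches the GPU (ideally before the import).
import os
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

import tensorflow as tf  # imported after the variable is set
```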
This might be too late, but it may still be of help to post it here, since I too faced the exact same issue. For me, I just restricted the amount of GPU memory that the model was allowed to use. The way I did it is the older style I used in TF 1.15, but it works on the latest TF 2.4 as well. Below is the code:

set_tensorflow_config()

GPU: NVIDIA GeForce RTX 2070, 8 GB
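A sketch of what a TF1-style helper like set_tensorflow_config could look like (the function body and the 0.5 memory fraction here are assumptions, not the commenter's original code):

```python
# Sketch of a TF1-style (tf.compat.v1) helper that restricts GPU memory.
# The body and the 0.5 fraction are assumptions for illustration.
import tensorflow as tf

def set_tensorflow_config():
    config = tf.compat.v1.ConfigProto()
    config.gpu_options.allow_growth = True
    config.gpu_options.per_process_gpu_memory_fraction = 0.5  # use at most half the GPU
    session = tf.compat.v1.Session(config=config)
    tf.compat.v1.keras.backend.set_session(session)

set_tensorflow_config()
```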
Also double-check your CUDA version, because CUDA 11 support is on nightly. I don't know what the Arch Linux maintainers have done, but in that case it is an Arch Linux issue. See https://github.com/tensorflow/tensorflow/issues/40227
This solution worked for me… thanks for the help!
To check and will solve easily:
Steve
In my case, this error appears because GPU memory is full. Try checking nvidia-smi in a terminal. In my case, I use a cloud server with Jupyter; shutting down all kernels (not just closing the files, but actually shutting them down) and restarting solved the issue. Hope it helps anyone who stumbles upon this.
Worked for me: OS: Ubuntu 18.04 TensorFlow version: 2.4.1 Python version: 3.7.10 CUDA/cuDNN version: Cuda is 11.0, cuDNN is 8.0.4 GPU model and memory: GeForce GTX 1660 SUPER
Still doesn’t make any sense, thanks anyway 😃
Same issue on an RTX 2060 Ti with TF 2.4.1. If I upgrade from TF 2.4.1 to tf-nightly, the missing-CUDA-library error occurs instead.
The solution proposed above worked best for me, but it still has the drawback of limiting GPU memory, so it can only be a temporary workaround.
The same worked for me on a 2070 Super with TF 2.4.1. Thanks.
Closing as stale. Please reopen if you’d like to work on this further.